WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:02.910 [MUSIC PLAYING] 00:00:09.710 --> 00:00:12.530 DAVID MALAN: So odds are you're on the internet these days, 00:00:12.530 --> 00:00:14.300 but what does that actually mean? 00:00:14.300 --> 00:00:18.410 And indeed, this internet that we use very often these days for messaging, 00:00:18.410 --> 00:00:21.440 for email, for browsing the web and other services still, 00:00:21.440 --> 00:00:24.500 there's a whole infrastructure that underlies it that is increasingly 00:00:24.500 --> 00:00:28.130 powering new ideas, new start ups, new companies, new businesses 00:00:28.130 --> 00:00:31.040 as well as new forms of communication among humans. 00:00:31.040 --> 00:00:33.890 And yet, like most every topic we've explored, 00:00:33.890 --> 00:00:36.017 you'll realize that while it's very complex, 00:00:36.017 --> 00:00:37.850 perhaps, up here, or certainly seems complex 00:00:37.850 --> 00:00:40.580 up here, if we begin with some of the fundamentals and then layer 00:00:40.580 --> 00:00:43.670 and layer and layer on top of those, do we pretty quickly get 00:00:43.670 --> 00:00:46.970 back to today's technology but with a much better understanding of what's 00:00:46.970 --> 00:00:49.890 going on from the ground up. 00:00:49.890 --> 00:00:51.890 So here is a bit of alphabet soup. 00:00:51.890 --> 00:00:55.730 Odds are you might have seen one or more of these acronyms 00:00:55.730 --> 00:01:00.690 to date, IP, DHCP, DNS, TCP, UDP, ICMP, and so many more. 00:01:00.690 --> 00:01:03.410 These are all examples of something called protocols, 00:01:03.410 --> 00:01:06.560 where protocols are kind of like languages 00:01:06.560 --> 00:01:08.250 that computers speak with one another. 00:01:08.250 --> 00:01:10.250 They're not programming languages so they're not 00:01:10.250 --> 00:01:14.540 used by humans to make computers do things or follow instructions per se. 
00:01:14.540 --> 00:01:16.490 A protocol is really a set of conventions 00:01:16.490 --> 00:01:18.410 that two computers or two computer programs 00:01:18.410 --> 00:01:21.106 might use when intercommunicating. 00:01:21.106 --> 00:01:23.480 And so what's an example of a protocol in the real world? 00:01:23.480 --> 00:01:26.440 Well, we humans have some silly protocols, one of which 00:01:26.440 --> 00:01:29.490 here is, culturally, when you meet someone to extend your hand 00:01:29.490 --> 00:01:31.490 and then he or she presumably extends their hand 00:01:31.490 --> 00:01:33.680 and you do this for who knows what reason. 00:01:33.680 --> 00:01:37.190 And now you've sort of completed that social transaction. 00:01:37.190 --> 00:01:39.830 But it's a protocol in the sense that when I extend my hand, 00:01:39.830 --> 00:01:42.950 most any polite other person knows that they're probably 00:01:42.950 --> 00:01:45.830 supposed to extend their hand as well, embrace for a moment, 00:01:45.830 --> 00:01:47.150 and then complete. 00:01:47.150 --> 00:01:50.250 And the protocol says, too, you probably don't do this for terribly long. 00:01:50.250 --> 00:01:53.420 And so there's these rules of thumb or actual rules 00:01:53.420 --> 00:01:55.790 that you follow when implementing protocols. 00:01:55.790 --> 00:01:59.510 And so computers, great as they are at following rules, 00:01:59.510 --> 00:02:02.340 very often use protocols when they intercommunicate, 00:02:02.340 --> 00:02:04.980 in order to get data from one place to another. 00:02:04.980 --> 00:02:06.950 So let's tell exactly that story. 00:02:06.950 --> 00:02:10.259 If you're on the internet, right now, on the internet, 00:02:10.259 --> 00:02:12.050 what does that actually mean and how can it 00:02:12.050 --> 00:02:14.690 help us solve problems, ultimately, having access 00:02:14.690 --> 00:02:18.230 to this inter-networked infrastructure? 
00:02:18.230 --> 00:02:21.830 Well, let's consider what happens when I first visit my favorite web 00:02:21.830 --> 00:02:22.830 page, for instance. 00:02:22.830 --> 00:02:27.810 If I go ahead and visit something like Facebook.com, I go ahead and log in 00:02:27.810 --> 00:02:29.810 and I'm immediately presented with my news feed. 00:02:29.810 --> 00:02:32.476 Or maybe your favorite website is Gmail or your favorite website 00:02:32.476 --> 00:02:36.320 is Bing or maybe your favorite website is any number of other places 00:02:36.320 --> 00:02:40.970 you might go on the web, all of which take in as input a request from you 00:02:40.970 --> 00:02:44.870 and produce, ultimately, output, the screen that you ultimately see. 00:02:44.870 --> 00:02:48.290 But how does that data get from one location to another? 00:02:48.290 --> 00:02:50.240 Let's begin to draw a picture, perhaps. 00:02:50.240 --> 00:02:53.090 And this picture might be representative of your own home network 00:02:53.090 --> 00:02:56.070 or maybe your campus network or maybe your office network. 00:02:56.070 --> 00:02:58.400 But generally speaking, you are on the internet 00:02:58.400 --> 00:03:01.250 maybe with your phone or your laptop or your desktop device, 00:03:01.250 --> 00:03:04.610 and we'll just depict that as this sort of abstract laptop here. 00:03:04.610 --> 00:03:07.880 So that laptop somehow wants to communicate with a web server 00:03:07.880 --> 00:03:09.950 elsewhere, Facebook, Google, Bing, whatever. 00:03:09.950 --> 00:03:13.080 And we're just going to represent that as way over here in the picture 00:03:13.080 --> 00:03:15.680 in a really big corporate office building, perhaps. 00:03:15.680 --> 00:03:17.750 And inside of that building are the servers 00:03:17.750 --> 00:03:21.230 that compose that particular web site. 
00:03:21.230 --> 00:03:25.337 But how do I get data from that server, which, if it's Google or somewhere else 00:03:25.337 --> 00:03:27.920 might be all the way in California or halfway across the world 00:03:27.920 --> 00:03:29.240 and back to my laptop? 00:03:29.240 --> 00:03:32.480 Well, somehow I have to be able to send messages to it 00:03:32.480 --> 00:03:34.160 and receive messages from it. 00:03:34.160 --> 00:03:38.450 And of course in between me and this resulting website 00:03:38.450 --> 00:03:41.060 is what we'll generally call the internet. 00:03:41.060 --> 00:03:43.280 It's kind of conveniently drawn as a cloud 00:03:43.280 --> 00:03:45.710 here, which is another semi-technical term that's 00:03:45.710 --> 00:03:47.120 come into vogue in recent years. 00:03:47.120 --> 00:03:50.420 And the cloud really just refers to internet services these days. 00:03:50.420 --> 00:03:52.020 It's not a technical term unto itself. 00:03:52.020 --> 00:03:55.510 It's just a sexier term than saying, my business is on the internet. 00:03:55.510 --> 00:03:58.010 Oversimplification, and we'll come back to that before long. 00:03:58.010 --> 00:04:01.130 But you can assume here that the internet is somehow 00:04:01.130 --> 00:04:03.050 this delivery mechanism. 00:04:03.050 --> 00:04:06.420 It somehow gets data from point A to point B and back. 00:04:06.420 --> 00:04:08.420 But how does that work? 00:04:08.420 --> 00:04:10.910 If my data's coming in as input and it's reaching, 00:04:10.910 --> 00:04:12.920 eventually, its destination and then a response 00:04:12.920 --> 00:04:16.730 is coming back in this direction, what's actually going on underneath the hood 00:04:16.730 --> 00:04:20.329 there, especially since, in the story at hand, all I've typed 00:04:20.329 --> 00:04:24.500 is something like Facebook.com Gmail.com or the like? 
00:04:24.500 --> 00:04:28.460 Well, it turns out that your computer these days, when you first turn it on 00:04:28.460 --> 00:04:32.390 and you connect to the Wi-Fi in a room or you connect with an ethernet cable 00:04:32.390 --> 00:04:35.600 to the wired network, your computer receives some information 00:04:35.600 --> 00:04:36.590 automatically. 00:04:36.590 --> 00:04:42.650 Your computer speaks a protocol called DHCP, typically, Dynamic Host 00:04:42.650 --> 00:04:43.850 Configuration Protocol. 00:04:43.850 --> 00:04:46.820 But in most of these cases, the acronym isn't really what's important, 00:04:46.820 --> 00:04:50.360 certainly, it's what the protocol itself does. 00:04:50.360 --> 00:04:53.900 And in this case, this Dynamic Host Configuration Protocol 00:04:53.900 --> 00:04:58.460 dynamically configures hosts via a protocol, if you will. 00:04:58.460 --> 00:04:59.750 So what does this mean? 00:04:59.750 --> 00:05:02.885 Essentially DHCP says this, when you turn on your computer 00:05:02.885 --> 00:05:04.760 or you take out your phone for the first time 00:05:04.760 --> 00:05:08.550 and you're connected on Wi-Fi or to a wired network, it says, hello, world. 00:05:08.550 --> 00:05:09.400 I am alive. 00:05:09.400 --> 00:05:13.000 I would like to be given an address that I can communicate 00:05:13.000 --> 00:05:14.920 with other computers on the internet. 00:05:14.920 --> 00:05:18.280 It's not quite that verbose, perhaps, but it is a question. 00:05:18.280 --> 00:05:21.490 Hey, computers around me, please give me an address. 00:05:21.490 --> 00:05:25.630 And what it gives you is what's called an IP address, Internet Protocol. 00:05:25.630 --> 00:05:28.900 So just as in the real world where physical buildings have historically 00:05:28.900 --> 00:05:31.690 been uniquely addressed with postal addresses 00:05:31.690 --> 00:05:35.980 like Harvard's computer science building is at 33 Oxford Street 00:05:35.980 --> 00:05:38.980 Cambridge, Massachusetts, USA. 
00:05:38.980 --> 00:05:43.060 02138 is the more precise zip code as well. 00:05:43.060 --> 00:05:45.580 That uniquely identifies that building in the world. 00:05:45.580 --> 00:05:47.851 So does my computer need an address, and it's not 00:05:47.851 --> 00:05:50.100 going to be some free form address like that in words. 00:05:50.100 --> 00:05:51.933 It's actually going to be a numeric address. 00:05:51.933 --> 00:05:57.265 Specifically, I'm going to get an IP address of the form number dot number 00:05:57.265 --> 00:06:02.500 dot number dot number, so four numbers separated by dots. 00:06:02.500 --> 00:06:05.680 Each of those four numbers happens to be a byte long 00:06:05.680 --> 00:06:10.780 or eight bits, so each of these numbers, therefore is between 0 and 255, 00:06:10.780 --> 00:06:15.280 and so this means, long story short, that the total address is 32 bits-- 00:06:15.280 --> 00:06:19.669 8 plus 8 plus 8 plus 8-- and that means there's roughly four billion possible addresses 00:06:19.669 --> 00:06:20.210 in the world. 00:06:20.210 --> 00:06:23.140 And that's great because people have got a lot of computers and a lot of laptops 00:06:23.140 --> 00:06:24.760 and a lot of desktops and servers these days. 00:06:24.760 --> 00:06:26.740 But it turns out we're actually running out 00:06:26.740 --> 00:06:28.450 because we have so many such devices. 00:06:28.450 --> 00:06:31.270 So there's a newer version of IP that's increasingly 00:06:31.270 --> 00:06:33.850 being used called IP version 6. 00:06:33.850 --> 00:06:37.020 We're talking here about IP version 4, since it's so omnipresent. 00:06:37.020 --> 00:06:41.440 And IP version 6, just so you know, uses 128 bits for its addresses, 00:06:41.440 --> 00:06:45.550 way more than 32, so we'll be good to go for some time. 00:06:45.550 --> 00:06:48.450 But DHCP gives me this address, an IP address of the form something 00:06:48.450 --> 00:06:50.200 dot something dot something dot something. 
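To make that arithmetic concrete, here is a minimal Python sketch of the numbers just described: four numbers of eight bits each, 32 bits in total, and roughly four billion possible addresses. The helper name `to_integer` is made up for illustration, and 1.2.3.4 is only a placeholder address, not a real assignment.

```python
import ipaddress

# Each of the four dotted numbers is one byte (8 bits), so 0 through 255.
BITS_PER_NUMBER = 8
ipv4_bits = 4 * BITS_PER_NUMBER   # 8 + 8 + 8 + 8 = 32
ipv4_total = 2 ** ipv4_bits       # roughly four billion addresses
ipv6_total = 2 ** 128             # IP version 6 uses 128-bit addresses


def to_integer(dotted: str) -> int:
    """Pack a 'number.number.number.number' string into one 32-bit integer.

    Hypothetical helper, written out by hand to show the byte arithmetic.
    """
    parts = dotted.split(".")
    if len(parts) != 4:
        raise ValueError("expected four numbers separated by dots")
    value = 0
    for part in parts:
        number = int(part)
        if not 0 <= number <= 255:
            raise ValueError(f"{part} does not fit in one byte")
        value = value * 256 + number
    return value


example = "1.2.3.4"  # placeholder address for illustration only
print(ipv4_bits, ipv4_total)  # 32 4294967296
# The standard library's ipaddress module agrees with the hand-rolled version.
print(to_integer(example), int(ipaddress.IPv4Address(example)))  # 16909060 16909060
```

The hand-written loop is only there to show where the 32 bits come from. In practice the standard library's `ipaddress` module already does this parsing and validation.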
00:06:50.200 --> 00:06:53.860 And the purpose of this address is to help my data get from point A 00:06:53.860 --> 00:06:56.800 to point B. And indeed, anytime my computer sends 00:06:56.800 --> 00:06:59.470 a request on the internet like, Facebook, 00:06:59.470 --> 00:07:02.950 please show me my news feed, or Gmail, please show me my inbox, 00:07:02.950 --> 00:07:05.470 my computer has to use that IP address. 00:07:05.470 --> 00:07:08.180 So much like if sending a letter in the real world, 00:07:08.180 --> 00:07:10.390 you might have an otherwise blank envelope 00:07:10.390 --> 00:07:13.270 and you might want to send a message to someone else in the world, 00:07:13.270 --> 00:07:15.160 you might write their physical address. 00:07:15.160 --> 00:07:21.190 But in the computer world, we might write something like 1.2.3.4 00:07:21.190 --> 00:07:25.180 in the to field, assuming that this is the IP address to which we 00:07:25.180 --> 00:07:26.690 want to send this data. 00:07:26.690 --> 00:07:31.670 Meanwhile, my from address might be 5.6.7.8, 00:07:31.670 --> 00:07:33.610 so I'll write it in the top left hand corner 00:07:33.610 --> 00:07:36.970 by convention, whereby that indicates to the whole internet this 00:07:36.970 --> 00:07:39.500 is where this request came from. 00:07:39.500 --> 00:07:43.840 Now, I know my origin address, the source address here at top left 00:07:43.840 --> 00:07:45.610 because DHCP told me. 00:07:45.610 --> 00:07:48.310 How do I know one, two, three, four? 00:07:48.310 --> 00:07:52.690 How do I know the IP address of Facebook.com or Gmail.com, right? 00:07:52.690 --> 00:07:54.940 We don't live in the world of 800 numbers 00:07:54.940 --> 00:07:58.120 anymore, where you dial 1-800 something, something, something, something, 00:07:58.120 --> 00:08:00.900 something and you have to advertise your phone number, per se. 
00:08:00.900 --> 00:08:03.819 We don't necessarily live only in the world of 1-800-COLLECT 00:08:03.819 --> 00:08:06.610 any more where we had these mnemonics where you had letters mapping 00:08:06.610 --> 00:08:08.290 to numbers just to help remember it. 00:08:08.290 --> 00:08:10.990 We went full in on this idea of mnemonics 00:08:10.990 --> 00:08:19.210 such that now we have Facebook.com and Gmail.com and no numbers 00:08:19.210 --> 00:08:21.070 whatsoever for us humans to remember. 00:08:21.070 --> 00:08:24.820 So thankfully, it turns out there's another system in this world, 00:08:24.820 --> 00:08:29.150 another acronym, if you will, a new one now, called DNS, Domain Name System. 00:08:29.150 --> 00:08:31.600 So there are also in the world, not just DHCP servers that 00:08:31.600 --> 00:08:34.330 hand people IP addresses from their local network, 00:08:34.330 --> 00:08:37.030 there's also DNS servers whose purpose in life 00:08:37.030 --> 00:08:40.299 is to convert domain names to IP addresses and vice 00:08:40.299 --> 00:08:42.830 versa and a few other features as well. 00:08:42.830 --> 00:08:44.150 Now, what does that mean? 00:08:44.150 --> 00:08:46.790 That means that when my Mac or my PC sees 00:08:46.790 --> 00:08:48.775 little old me, the human, typing Facebook.com 00:08:48.775 --> 00:08:55.570 or Gmail.com, my laptop contacts a nearby DNS server and says, 00:08:55.570 --> 00:08:59.105 hey, my human has asked me for Facebook.com. 00:08:59.105 --> 00:09:00.380 What is its IP address? 00:09:00.380 --> 00:09:03.400 And the DNS server's purpose in life is to answer that question and say, 00:09:03.400 --> 00:09:07.030 oh, Facebook.com, it's 1.2.3.4. 00:09:07.030 --> 00:09:08.860 Use that address instead. 00:09:08.860 --> 00:09:11.110 Now, thankfully, my computer can now write 00:09:11.110 --> 00:09:14.140 that number on its virtual envelope, so to speak, and then 00:09:14.140 --> 00:09:16.330 pass that envelope out to the internet. 
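The question-and-answer exchange with a DNS server can be sketched in Python as a simple lookup table. This is a toy stand-in, not how real DNS servers are implemented: the `RECORDS` table, the function names, and both addresses are assumptions made up for illustration (1.2.3.4 is the lecture's placeholder, and 192.0.2.10 comes from a range reserved for documentation).

```python
# Toy stand-in for a DNS server's records. These are NOT the real
# addresses of these sites; they are placeholders for illustration.
RECORDS = {
    "facebook.com": "1.2.3.4",
    "gmail.com": "192.0.2.10",
}


def resolve(name: str) -> str:
    """Answer the question: what is the IP address of this domain name?"""
    key = name.lower().rstrip(".")
    if key not in RECORDS:
        raise LookupError(f"no record for {name}")
    return RECORDS[key]


def reverse(address: str) -> str:
    """Go the other way: which domain name uses this IP address?"""
    for name, ip in RECORDS.items():
        if ip == address:
            return name
    raise LookupError(f"no name for {address}")


def address_envelope(source_ip: str, name: str) -> dict:
    """Write the from and to fields on a virtual envelope."""
    return {"from": source_ip, "to": resolve(name)}


print(resolve("Facebook.com"))                      # 1.2.3.4
print(reverse("1.2.3.4"))                           # facebook.com
print(address_envelope("5.6.7.8", "Facebook.com"))  # {'from': '5.6.7.8', 'to': '1.2.3.4'}
```

For a real lookup, Python's standard library offers `socket.gethostbyname`, which asks the operating system's configured resolver and therefore needs a working network connection.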
00:09:16.330 --> 00:09:20.680 And because of these numeric addresses, it will be properly, hopefully, 00:09:20.680 --> 00:09:24.160 routed across the internet to its destination. 00:09:24.160 --> 00:09:28.090 Because it turns out inside of the internet here, 00:09:28.090 --> 00:09:31.330 interconnecting everything in between point 00:09:31.330 --> 00:09:35.860 A and B are things called routers or gateways. 00:09:35.860 --> 00:09:38.170 And I could draw this picture in any number of ways. 00:09:38.170 --> 00:09:41.596 But the point is that it's just so darn interconnected. 00:09:41.596 --> 00:09:43.720 And indeed, there might be even more pathways still 00:09:43.720 --> 00:09:45.040 or maybe even fewer pathways. 00:09:45.040 --> 00:09:47.290 Indeed, on the internet, there's often multiple ways 00:09:47.290 --> 00:09:50.590 for data to get from one point to another, some shorter, some longer. 00:09:50.590 --> 00:09:52.750 But there's this resilience, this redundancy, 00:09:52.750 --> 00:09:55.390 and this was a feature back in the day, especially in so far 00:09:55.390 --> 00:09:58.240 as the internet had militaristic origins. 00:09:58.240 --> 00:10:01.900 It was meant to be redundant so as to withstand failures of one or more 00:10:01.900 --> 00:10:04.540 of these nodes, these dots in the picture. 00:10:04.540 --> 00:10:07.450 Now, each of these dots is just a server, really, 00:10:07.450 --> 00:10:10.300 a special server called a router or gateway, whose purpose in life 00:10:10.300 --> 00:10:12.550 is to do exactly that, to route data. 00:10:12.550 --> 00:10:15.490 Upon receiving a virtual envelope like that one, 00:10:15.490 --> 00:10:19.930 it looks at the to address and realizes, oh, this is destined for 1.2.3.4. 00:10:19.930 --> 00:10:21.850 I know that that address is over this way. 00:10:21.850 --> 00:10:23.900 Meanwhile, if it gets another envelope from someone else, 00:10:23.900 --> 00:10:25.480 it might say, oh, this is some other address. 
00:10:25.480 --> 00:10:26.719 It's going to go this way. 00:10:26.719 --> 00:10:28.510 And so routers have multiple cables or they 00:10:28.510 --> 00:10:30.730 have multiple virtual network connections elsewhere 00:10:30.730 --> 00:10:33.670 or wireless connections, any number of possible connections 00:10:33.670 --> 00:10:35.210 might they have to other routers. 00:10:35.210 --> 00:10:39.070 And so it can route it to its next hop, so to speak. 00:10:39.070 --> 00:10:41.890 And generally on the internet, within 30 hops, 00:10:41.890 --> 00:10:45.850 within 30 transmissions from router, router, router 00:10:45.850 --> 00:10:48.190 will your data get from one point to another. 00:10:48.190 --> 00:10:50.340 And it might not follow the same path each time 00:10:50.340 --> 00:10:52.690 but it will traverse this so-called internet. 00:10:52.690 --> 00:10:54.760 And so that's kind of what the internet is. 00:10:54.760 --> 00:10:57.490 It's this collection of routers and it's this collection 00:10:57.490 --> 00:11:01.450 of networks, a network of networks that is incredibly 00:11:01.450 --> 00:11:03.940 interconnected in different ways. 00:11:03.940 --> 00:11:05.840 So DHCP gives me an IP address. 00:11:05.840 --> 00:11:07.220 So I have a unique IP address. 00:11:07.220 --> 00:11:10.270 DHCP, it turns out, also tells me what the IP address 00:11:10.270 --> 00:11:14.950 is of my local DNS server so I know whom to ask to convert domain names to IP 00:11:14.950 --> 00:11:15.520 addresses. 00:11:15.520 --> 00:11:18.640 But once I have that, I can now use a protocol 00:11:18.640 --> 00:11:27.000 called TCP to send my data reliably, typically, from one point to another. 00:11:27.000 --> 00:11:29.587 So whereas IP is responsible for a few things, 00:11:29.587 --> 00:11:31.670 one of its most important functions is this notion 00:11:31.670 --> 00:11:34.610 of addressing and standardizing how things are addressed. 
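The router behavior described a moment ago (look at the to address, decide which way it lies, pass the envelope to the next hop) can be sketched in Python as follows. The router names and the `TABLES` forwarding tables are hypothetical, and real routers match on address prefixes rather than exact addresses, so treat this as a simplified model only.

```python
HOP_LIMIT = 30  # the rule of thumb mentioned above for how many hops suffice

# Hypothetical forwarding tables: router name -> {destination IP: next hop}.
# Anything that is not a key in TABLES is treated as the final destination.
TABLES = {
    "router-a": {"1.2.3.4": "router-b"},
    "router-b": {"1.2.3.4": "router-c"},
    "router-c": {"1.2.3.4": "server"},
}


def route(destination: str, first_router: str) -> list:
    """Follow next hops from the first router until the envelope arrives."""
    path = [first_router]
    current = first_router
    while current in TABLES:
        if len(path) > HOP_LIMIT:
            raise RuntimeError("gave up: too many hops")
        next_hop = TABLES[current].get(destination)
        if next_hop is None:
            raise LookupError(f"{current} has no route to {destination}")
        path.append(next_hop)
        current = next_hop
    return path


print(route("1.2.3.4", "router-a"))
# ['router-a', 'router-b', 'router-c', 'server']
```

Each router only knows the next hop, not the whole path, which is part of why the path can differ from one trip to the next.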
00:11:34.610 --> 00:11:37.190 But TCP, one of its most salient features 00:11:37.190 --> 00:11:40.640 is to guarantee, with high probability, delivery. 00:11:40.640 --> 00:11:42.799 And what I mean by that is that bad stuff can 00:11:42.799 --> 00:11:44.340 happen in the middle of the internet. 00:11:44.340 --> 00:11:45.756 These routers can get really busy. 00:11:45.756 --> 00:11:48.020 They can get really congested and overloaded. 00:11:48.020 --> 00:11:49.360 And so routers might-- 00:11:49.360 --> 00:11:52.214 well, virtually drop packets. 00:11:52.214 --> 00:11:54.380 They might receive so many packets at once they just 00:11:54.380 --> 00:11:57.290 can't, like a human, deal with it all at one time because they 00:11:57.290 --> 00:12:01.040 have a finite amount of memory or RAM or disk space and so they drop them, 00:12:01.040 --> 00:12:01.610 so to speak. 00:12:01.610 --> 00:12:04.750 They just delete them and they expect the senders to resend them. 00:12:04.750 --> 00:12:08.570 TCP is a protocol, another agreement between computers, 00:12:08.570 --> 00:12:12.740 that if the receiving computer realizes, hmm, I got some of your packets 00:12:12.740 --> 00:12:16.940 but not all of them, TCP mandates, much like our human handshake, 00:12:16.940 --> 00:12:18.440 that something next should happen. 00:12:18.440 --> 00:12:23.600 TCP says, my laptop should retransmit that virtual envelope. 00:12:23.600 --> 00:12:26.510 But TCP allows us to do something more than guarantee 00:12:26.510 --> 00:12:28.400 with high probability delivery of data. 00:12:28.400 --> 00:12:32.750 It also allows us to multiplex among services, or put more simply, 00:12:32.750 --> 00:12:35.724 it allows a server to receive different types of data 00:12:35.724 --> 00:12:37.640 for different types of services, for instance, 00:12:37.640 --> 00:12:42.450 web services on the server, email services, chat services and the like. 
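The retransmission idea can be sketched in Python with a simulated lossy network. This is a deliberately simplified model, assuming a made-up `send_reliably` function and a random drop rate: real TCP relies on sequence numbers, acknowledgements, and timers rather than a simple retry loop.

```python
import random


def send_reliably(pieces, drop_rate=0.3, seed=0, max_tries=50):
    """Resend each piece until it gets through a simulated lossy network.

    Returns the pieces as received, in order, plus the number of
    transmissions it took. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    received = {}
    attempts = 0
    for number, piece in enumerate(pieces):
        for _ in range(max_tries):
            attempts += 1
            if rng.random() >= drop_rate:
                # The envelope made it through; move on to the next piece.
                received[number] = piece
                break
            # Otherwise a router dropped it, so loop around and resend.
        else:
            raise TimeoutError(f"piece {number} never got through")
    return [received[i] for i in range(len(pieces))], attempts


delivered, attempts = send_reliably(["please", "show", "my", "inbox"])
print(delivered)  # ['please', 'show', 'my', 'inbox']
print(attempts)   # at least 4; more if the simulated network dropped anything
```

The point of the sketch is that the receiver ends up with every piece even though individual transmissions may be lost, at the cost of extra transmissions.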
00:12:42.450 --> 00:12:47.330 And so it turns out that on this virtual envelope that gets sent from a computer 00:12:47.330 --> 00:12:49.760 to a server, it's actually not sufficient 00:12:49.760 --> 00:12:54.110 for there to be the return address and the IP address of the destination. 00:12:54.110 --> 00:12:56.870 I also need to specify what type of information 00:12:56.870 --> 00:13:00.140 is inside this envelope, or equivalently, what kind of service 00:13:00.140 --> 00:13:01.430 I'm trying to contact. 00:13:01.430 --> 00:13:05.030 And I could do this by specifying in words what's inside this envelope. 00:13:05.030 --> 00:13:07.305 Maybe it's something like HTTP, the prefix 00:13:07.305 --> 00:13:08.930 that you're familiar with from the web. 00:13:08.930 --> 00:13:09.650 Maybe it's an email. 00:13:09.650 --> 00:13:11.233 Maybe it's a chat message or the like. 00:13:11.233 --> 00:13:13.730 But if it is, in fact, something like HTTP, 00:13:13.730 --> 00:13:17.720 turns out the convention is not to use words but to use numbers. 00:13:17.720 --> 00:13:21.290 And so in fact, I need to put one other piece of information 00:13:21.290 --> 00:13:25.160 on this envelope, which is a so-called port number, a TCP port 00:13:25.160 --> 00:13:29.740 number, which is numerically printed after a colon on a virtual envelope like 00:13:29.740 --> 00:13:30.240 this. 00:13:30.240 --> 00:13:32.810 And in this case I wrote 80 because 80 happens to be, 00:13:32.810 --> 00:13:36.410 by human convention, the number we humans agreed on some years ago, 00:13:36.410 --> 00:13:39.680 identifies web services on servers. 00:13:39.680 --> 00:13:42.380 But this means that if the server I'm sending this to, 00:13:42.380 --> 00:13:46.070 1.2.3.4, actually has other services on it like a chat server 00:13:46.070 --> 00:13:49.520 and an email server and the like, this won't get confused with an email 00:13:49.520 --> 00:13:52.361 that I or someone else am sending to the server or a chat message. 
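Here is a minimal Python sketch of how a port number after the colon lets one server tell its services apart. The function names and the `SERVICES` table are made up for illustration. Port 80 for the web is the convention mentioned above; 443 (secure web) and 25 (email delivery) are other well-known conventions added here as examples.

```python
# Port numbers agreed on by convention, mapped to the service they identify.
SERVICES = {
    80: "web server",
    443: "web server (encrypted)",
    25: "email server",
}


def parse_destination(text: str):
    """Split 'address:port', as written on the virtual envelope."""
    address, _, port = text.rpartition(":")
    if not address or not port.isdigit():
        raise ValueError(f"expected address:port, got {text!r}")
    number = int(port)
    if not 0 <= number <= 65535:  # port numbers are 16 bits long
        raise ValueError(f"port {number} is out of range")
    return address, number


def dispatch(envelope_to: str) -> str:
    """Decide which service on the server should receive this envelope."""
    _, port = parse_destination(envelope_to)
    service = SERVICES.get(port)
    if service is None:
        raise LookupError(f"nothing is listening on port {port}")
    return service


print(parse_destination("1.2.3.4:80"))  # ('1.2.3.4', 80)
print(dispatch("1.2.3.4:80"))           # web server
print(dispatch("1.2.3.4:25"))           # email server
```

The same IP address appears in both requests; only the port number differs, and that is enough for the server to hand each envelope to the right program.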
00:13:52.361 --> 00:13:54.110 The server will know upon receipt of this, 00:13:54.110 --> 00:13:56.450 oh, this is a request for a web page. 00:13:56.450 --> 00:14:00.060 Let me send this virtual envelope to the web server. 00:14:00.060 --> 00:14:02.690 But TCP isn't the only such protocol. 00:14:02.690 --> 00:14:07.010 There is something called UDP, which is common in some circles as well. 00:14:07.010 --> 00:14:10.400 UDP works a little differently, in so far 00:14:10.400 --> 00:14:14.750 as its feature is to not guarantee delivery. 00:14:14.750 --> 00:14:18.480 If some data gets lost, packets get dropped, so to speak, 00:14:18.480 --> 00:14:22.100 for whatever reasons, malfunction, technical difficulties, 00:14:22.100 --> 00:14:25.250 routers are overloaded, UDP says, our protocol 00:14:25.250 --> 00:14:28.350 shall be not to retransmit that data. 00:14:28.350 --> 00:14:30.980 And that's a strange thing, because it sounds worse. 00:14:30.980 --> 00:14:34.700 And yet, this protocol's been around for quite some time, still used, 00:14:34.700 --> 00:14:38.020 quite appropriate in some contexts. 00:14:38.020 --> 00:14:42.120 But what context would you actually want to just forge ahead, irrespective 00:14:42.120 --> 00:14:44.740 of getting complete information? 00:14:44.740 --> 00:14:49.320 Well, a go-to here is something like videoconferencing or audio conferencing 00:14:49.320 --> 00:14:52.440 or live TV on the internet, watching a game like a football game, 00:14:52.440 --> 00:14:53.200 for instance. 00:14:53.200 --> 00:14:55.890 If you want to watch it in real time, you 00:14:55.890 --> 00:14:59.130 might prefer that the stream, the bits that 00:14:59.130 --> 00:15:02.970 are coming from the NFL or wherever to your computer don't actually buffer, 00:15:02.970 --> 00:15:04.110 don't actually stall. 
00:15:04.110 --> 00:15:06.840 You would rather miss a second so that at least you stay 00:15:06.840 --> 00:15:10.830 current in real time with that game, or video conferencing even more so. 00:15:10.830 --> 00:15:14.010 It'd kind of be annoying if you have a bad connection or some packets 00:15:14.010 --> 00:15:17.220 get dropped and you just have to wait and wait for the person's voice 00:15:17.220 --> 00:15:18.780 or image to be retransmitted. 00:15:18.780 --> 00:15:20.790 You'd rather just say, what did you say? 00:15:20.790 --> 00:15:22.260 Could you repeat yourself? 00:15:22.260 --> 00:15:23.190 Say again? 00:15:23.190 --> 00:15:26.050 You can just use human protocols to deal with that, too. 00:15:26.050 --> 00:15:30.321 So sometimes you want live streaming applications for whatever purpose 00:15:30.321 --> 00:15:32.070 and you want the data just to keep coming. 00:15:32.070 --> 00:15:34.920 As much of it as can make it through is great. 00:15:34.920 --> 00:15:38.380 But you don't necessarily want it to be resent. 00:15:38.380 --> 00:15:40.787 So data is going from one point to another, 00:15:40.787 --> 00:15:42.120 but how long does all this take? 00:15:42.120 --> 00:15:45.150 My god, this is kind of a long story just to get data there. 00:15:45.150 --> 00:15:46.770 Well, let's do an experiment. 00:15:46.770 --> 00:15:50.740 Let's go ahead and pull up a program that uses a different protocol 00:15:50.740 --> 00:15:51.762 altogether, ICMP. 00:15:51.762 --> 00:15:53.220 And there's other protocols, still. 00:15:53.220 --> 00:15:55.011 This one's a little more technical but it's 00:15:55.011 --> 00:15:57.870 wonderfully revealing in a few ways. 00:15:57.870 --> 00:16:00.270 I'm on my Mac here in the so-called terminal window 00:16:00.270 --> 00:16:02.970 that you can pull up something similar on Windows and other operating systems 00:16:02.970 --> 00:16:03.670 as well. 
00:16:03.670 --> 00:16:05.461 And what I'm going to do is literally trace 00:16:05.461 --> 00:16:08.260 the route between my laptop here and some foreign server, 00:16:08.260 --> 00:16:12.120 for instance, one on the west coast of the US, Berkeley's web server. 00:16:12.120 --> 00:16:18.750 So let me do that, traceroute, www.berkeley.edu, enter. 00:16:18.750 --> 00:16:22.920 And curiously, we start to see a whole bunch of lines of output, most of them 00:16:22.920 --> 00:16:23.670 numerical. 00:16:23.670 --> 00:16:26.590 And indeed, notice that each of these is an IP address. 00:16:26.590 --> 00:16:28.920 But what is it an IP address of? 00:16:28.920 --> 00:16:32.370 Well, we have 18 of these between me and Berkeley, apparently. 00:16:32.370 --> 00:16:36.532 Turns out those represent routers between me and Berkeley, California. 00:16:36.532 --> 00:16:38.490 Each of them has an IP address and each of them 00:16:38.490 --> 00:16:42.420 has a measurement of how long it took my data to get from my Mac to that router. 00:16:42.420 --> 00:16:43.344 It's highly variable. 00:16:43.344 --> 00:16:45.010 Notice, it's kind of all over the place. 00:16:45.010 --> 00:16:46.176 In fact, this is just weird. 00:16:46.176 --> 00:16:48.760 This took 3,000 milliseconds or three seconds, 00:16:48.760 --> 00:16:51.840 so I'm guessing that that router in row eight 00:16:51.840 --> 00:16:55.526 was congested for some reason, some kind of network issue there temporarily, 00:16:55.526 --> 00:16:57.150 but then my data actually went through. 00:16:57.150 --> 00:16:58.250 And it's not cumulative. 00:16:58.250 --> 00:17:01.230 These are individual tests from my Mac to each of these routers 00:17:01.230 --> 00:17:02.720 iteratively, one at a time. 00:17:02.720 --> 00:17:04.470 And you can kind of get an aggregate sense 00:17:04.470 --> 00:17:07.980 of how long it takes, therefore, for data to get from the east coast 00:17:07.980 --> 00:17:08.730 to the west coast. 
00:17:08.730 --> 00:17:11.438 If we look at some of the later numbers, they're kind of variable 00:17:11.438 --> 00:17:14.369 but they seem to be around 75 milliseconds. 00:17:14.369 --> 00:17:15.750 So this is kind of extraordinary. 00:17:15.750 --> 00:17:19.510 If you want to fly from Boston, Massachusetts to San Francisco, 00:17:19.510 --> 00:17:21.569 it's going to take you five, six, seven hours. 00:17:21.569 --> 00:17:23.940 You want to send an email or send a packet, 00:17:23.940 --> 00:17:26.119 it's going to take you 75 milliseconds. 00:17:26.119 --> 00:17:29.345 That's astonishing, how quickly the data can transmit. 00:17:29.345 --> 00:17:31.220 Now, notice this is not all that enlightening 00:17:31.220 --> 00:17:32.842 knowing these IP addresses. 00:17:32.842 --> 00:17:34.800 But eventually, some of them have domain names, 00:17:34.800 --> 00:17:37.770 just because the humans controlling those routers decided, 00:17:37.770 --> 00:17:41.220 we're going to give these routers actual names, domain names, 00:17:41.220 --> 00:17:43.650 as opposed to just having IP addresses. 00:17:43.650 --> 00:17:47.760 And you can often, but not always, infer from the domain names where they are. 00:17:47.760 --> 00:17:51.450 So I'm going to guess that at least row 11 00:17:51.450 --> 00:17:58.470 here, I don't know what XE7000.rtsw is, but losa.net, Los 00:17:58.470 --> 00:18:00.060 Angeles in California. 00:18:00.060 --> 00:18:03.330 I'm guessing my data kind of came into Southern California first. 00:18:03.330 --> 00:18:04.810 But then notice what happens next. 00:18:04.810 --> 00:18:07.950 A couple of nameless servers, LAX, so maybe that's the airport. 00:18:07.950 --> 00:18:10.260 Indeed, routers, for historical reasons, tend 00:18:10.260 --> 00:18:13.060 to be named after nearby airport codes. 00:18:13.060 --> 00:18:15.120 I'm not sure what this next one is here but I do 00:18:15.120 --> 00:18:18.510 recognize Oakland and UCB, UC Berkeley. 
00:18:18.510 --> 00:18:23.350 So I'm guessing one of the next routers is actually in Oakland or near Oakland. 00:18:23.350 --> 00:18:26.280 And so that's a pretty long cable or interconnection essentially 00:18:26.280 --> 00:18:28.110 between LA and Berkeley. 00:18:28.110 --> 00:18:30.750 But the result, ultimately, is that my data makes its way 00:18:30.750 --> 00:18:33.510 to Berkeley, this time via this path. 00:18:33.510 --> 00:18:35.750 If I ran it again now or in a day or a week, 00:18:35.750 --> 00:18:37.500 the path might be a little different based 00:18:37.500 --> 00:18:43.230 on congestion and interconnectivity, but the data actually gets there. 00:18:43.230 --> 00:18:46.940 And cutely enough, it looks like Berkeley's web server is called Cal web 00:18:46.940 --> 00:18:49.200 farm prod-- for production-- 00:18:49.200 --> 00:18:51.590 ist.berkeley.edu. 00:18:51.590 --> 00:18:53.520 75 milliseconds only. 00:18:53.520 --> 00:18:55.950 But what about this, what if we don't stop at the edge 00:18:55.950 --> 00:19:01.412 as we do at the edge of this continent but keep going? 00:19:01.412 --> 00:19:02.370 What's going to happen? 00:19:02.370 --> 00:19:07.410 Well, let me try to trace the route to, say, www.cnn.co.jp, 00:19:07.410 --> 00:19:10.170 the domain name for what I presume is going 00:19:10.170 --> 00:19:14.900 to be the Japanese version of CNN's web site in Japan. 00:19:14.900 --> 00:19:17.840 Here, too, we have a bunch of nameless servers just with IP addresses. 00:19:17.840 --> 00:19:19.214 Gets through them pretty quickly. 00:19:19.214 --> 00:19:21.020 We seem to have some lulls sometimes. 00:19:21.020 --> 00:19:24.140 This program won't-- sometimes the routers won't respond to these queries 00:19:24.140 --> 00:19:26.120 so they remain, essentially, anonymous. 00:19:26.120 --> 00:19:28.560 But now this is quite interesting. 00:19:28.560 --> 00:19:29.570 Oh, my god. 
00:19:29.570 --> 00:19:36.470 We went from routers 12, 13, 14, 15 taking about 63 milliseconds, 00:19:36.470 --> 00:19:40.400 give or take, to 193 milliseconds, which isn't a blip because it 00:19:40.400 --> 00:19:42.650 stays around that value, 180 milliseconds, 00:19:42.650 --> 00:19:46.110 160 milliseconds, 177 milliseconds. 00:19:46.110 --> 00:19:52.460 That's a big jump of 100-some milliseconds just between routers 15 00:19:52.460 --> 00:19:54.575 and 16. 00:19:54.575 --> 00:19:56.360 Why might that be? 00:19:56.360 --> 00:20:00.850 What could be between routers 15 and 16? 00:20:00.850 --> 00:20:04.834 Well, if you know your geography, it might very well be the Pacific Ocean. 00:20:04.834 --> 00:20:07.000 There's quite a bit of distance, there's quite a bit 00:20:07.000 --> 00:20:10.660 of cabling that actually connects the west coast of the country to Japan 00:20:10.660 --> 00:20:14.240 and other areas in Asia and beyond, and that's what's pretty amazing. 00:20:14.240 --> 00:20:17.440 Not only is there interconnectivity on the internet these days via cabling 00:20:17.440 --> 00:20:21.010 and via Wi-Fi signals and via satellite signals, 00:20:21.010 --> 00:20:24.280 via microwave signals and the like, you have so many different ways for data 00:20:24.280 --> 00:20:25.160 to be transmitted. 00:20:25.160 --> 00:20:28.060 And it's absolutely astonishing and exciting, dare I say, 00:20:28.060 --> 00:20:30.072 just how interconnected the world now is. 00:20:30.072 --> 00:20:32.530 In fact, thanks to this animation online, let's take a look 00:20:32.530 --> 00:20:36.776 and appreciate just how extensive this network actually is. 00:20:36.776 --> 00:20:39.752 [MUSIC PLAYING] 00:21:37.330 --> 00:21:40.607 All right, so let's actually solve a problem now with this internet. 00:21:40.607 --> 00:21:43.440 All right, the internet, as you probably heard, is filled with cats. 00:21:43.440 --> 00:21:45.810 And yet, these cat images can be pretty big.
00:21:45.810 --> 00:21:47.790 And indeed, bigger, still, than images are 00:21:47.790 --> 00:21:50.230 things like video files from Netflix and the like. 00:21:50.230 --> 00:21:53.070 And so there's huge amounts of traffic transmitting 00:21:53.070 --> 00:21:54.940 over those kinds of interconnections. 00:21:54.940 --> 00:21:57.522 So how do we ensure, at least with high probability, 00:21:57.522 --> 00:21:58.980 that data can actually get through? 00:21:58.980 --> 00:22:03.340 How can we ensure that there's some form of fairness, if not net neutrality, 00:22:03.340 --> 00:22:06.450 so that my data can get to its destination 00:22:06.450 --> 00:22:08.400 just as readily as your data can get there? 00:22:08.400 --> 00:22:10.710 Well, sometimes it's opportune to actually take 00:22:10.710 --> 00:22:13.590 big packets of information and chop them up. 00:22:13.590 --> 00:22:17.400 So indeed, what a computer will often do, thanks to TCP/IP, 00:22:17.400 --> 00:22:19.320 the combination of these protocols, is we'll 00:22:19.320 --> 00:22:22.180 take large files and large images, in this case, 00:22:22.180 --> 00:22:24.660 tear them up into, say, roughly-- 00:22:24.660 --> 00:22:29.610 oops-- equal sized parts like this here and then tear it down even further, 00:22:29.610 --> 00:22:32.940 perhaps, to get it into a smaller byte-sized piece 00:22:32.940 --> 00:22:37.170 and then send not only one packet of information over the internet. 00:22:37.170 --> 00:22:42.990 But instead, put one piece of information in this packet here. 00:22:42.990 --> 00:22:46.770 Put one other piece of information in this packet here, 00:22:46.770 --> 00:22:49.746 whose addressing, both to and from, is identical. 
00:22:49.746 --> 00:22:51.870 And then do the same thing for the two other pieces 00:22:51.870 --> 00:22:55.290 so that ultimately we have four packets, each of which 00:22:55.290 --> 00:22:58.100 contains one portion, one quarter, in this case, 00:22:58.100 --> 00:23:03.250 of the resulting message, all of which are destined for the same destination. 00:23:03.250 --> 00:23:06.970 But the problem to be solved, now, is what do you do with this information? 00:23:06.970 --> 00:23:09.150 If I have four seemingly identical envelopes 00:23:09.150 --> 00:23:11.880 but inside of which are disparate pieces of information 00:23:11.880 --> 00:23:13.950 that somehow need to be reassembled-- 00:23:13.950 --> 00:23:16.950 let's put on our proverbial engineering hats-- 00:23:16.950 --> 00:23:18.210 how do you solve this problem? 00:23:18.210 --> 00:23:21.209 Is this sufficient information on the envelopes so 00:23:21.209 --> 00:23:24.000 that if I send this out on the internet toward Berkeley or Stanford 00:23:24.000 --> 00:23:29.190 or Facebook or wherever, how does that recipient know what to do with it? 00:23:29.190 --> 00:23:31.110 What would you, the human, do if you have 00:23:31.110 --> 00:23:33.900 not virtual but physical envelopes? 00:23:33.900 --> 00:23:35.910 Well, here, too, and here's an opportunity 00:23:35.910 --> 00:23:39.000 really to bring to bear human intuition to a problem that 00:23:39.000 --> 00:23:43.500 seems fairly technical and well beyond one's own technical understanding. 00:23:43.500 --> 00:23:47.220 And yet, it really is just a technical manifestation of a real world problem. 00:23:47.220 --> 00:23:49.260 I need to keep these in order somehow. 00:23:49.260 --> 00:23:50.100 So you know what? 00:23:50.100 --> 00:23:55.630 I'm going to say something like one of four on the first one, like this. 00:23:55.630 --> 00:23:59.700 The next one, I'm going to say two of four on the next one, like this. 
00:23:59.700 --> 00:24:03.210 And then I'm going to say three of four and then on the next one 00:24:03.210 --> 00:24:06.150 here, I'm going to put four of four. 00:24:06.150 --> 00:24:07.380 And what's the takeaway, now? 00:24:07.380 --> 00:24:11.010 Now, whoever is the recipient of these several envelopes 00:24:11.010 --> 00:24:13.110 as I send them out on the internet-- and indeed, 00:24:13.110 --> 00:24:14.776 they don't have to follow the same path. 00:24:14.776 --> 00:24:15.690 One can go this way. 00:24:15.690 --> 00:24:17.031 One can be routed that way. 00:24:17.031 --> 00:24:18.280 Another can go to this router. 00:24:18.280 --> 00:24:19.530 Another can go to that router. 00:24:19.530 --> 00:24:22.113 Because they're all addressed and because all of these routers 00:24:22.113 --> 00:24:24.360 are somehow interconnected, all four of those packets 00:24:24.360 --> 00:24:27.490 will hopefully get to their destination. 00:24:27.490 --> 00:24:29.880 But if they don't, the recipient can look 00:24:29.880 --> 00:24:33.686 at that additional detail I wrote on the envelope and see, oh, I got part one. 00:24:33.686 --> 00:24:34.310 I got part two. 00:24:34.310 --> 00:24:35.018 I got part three. 00:24:35.018 --> 00:24:37.290 But where is part four of four? 00:24:37.290 --> 00:24:38.970 It didn't arrive because of congestion. 00:24:38.970 --> 00:24:41.480 Literally got dropped on the floor or not picked up. 00:24:41.480 --> 00:24:43.710 So the computer that's supposed to be receiving 00:24:43.710 --> 00:24:47.850 that data, thanks to TCP, recall, can say, hey, please send me again 00:24:47.850 --> 00:24:49.650 packet four of four.
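The "one of four, two of four" idea can be sketched in a few lines of Python. This is only an illustration of the envelope analogy, not how TCP is actually implemented (real TCP numbers bytes rather than envelopes and uses acknowledgments); the function names and the sample message are made up for this sketch.

```python
import random


def split_into_packets(message: bytes, parts: int) -> list[dict]:
    """Chop a message into roughly equal pieces, each labeled 'seq of total'."""
    size = max(1, -(-len(message) // parts))  # ceiling division, at least 1 byte
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    total = len(chunks)
    return [
        {"seq": n, "total": total, "data": chunk}
        for n, chunk in enumerate(chunks, start=1)
    ]


def missing_packets(received: list[dict]) -> list[int]:
    """Return the sequence numbers that have not arrived yet."""
    if not received:
        return []
    total = received[0]["total"]
    seen = {packet["seq"] for packet in received}
    return [n for n in range(1, total + 1) if n not in seen]


def reassemble(received: list[dict]) -> bytes:
    """Put the pieces back in order, whatever order they arrived in."""
    missing = missing_packets(received)
    if missing:
        raise ValueError(f"still waiting for packets {missing}")
    ordered = sorted(received, key=lambda packet: packet["seq"])
    return b"".join(packet["data"] for packet in ordered)


message = b"a picture of a cat, in bytes"
packets = split_into_packets(message, 4)

# Simulate the network: packet 4 is dropped and the rest arrive out of order.
arrived = [packet for packet in packets if packet["seq"] != 4]
random.shuffle(arrived)
print(missing_packets(arrived))  # [4]

# The recipient asks for packet 4 again, and the sender re-sends it.
arrived.append(packets[3])
print(reassemble(arrived) == message)  # True
```

The labels are what make both recovery steps possible: the recipient can sort by `seq` no matter which path each envelope took, and a gap in the numbers says exactly which piece to ask for again.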
00:24:49.650 --> 00:24:52.080 And so as technical as the internet might seem, 00:24:52.080 --> 00:24:55.860 it really, again, is just some fairly intuitive solutions 00:24:55.860 --> 00:24:59.250 to problems like this, albeit translated to more technical contexts, more 00:24:59.250 --> 00:25:02.310 technical protocols, and more technical languages. 00:25:02.310 --> 00:25:05.670 But let's look at some more user-facing protocols. 00:25:05.670 --> 00:25:08.760 The ones we've discussed thus far are fairly low level, if you will. 00:25:08.760 --> 00:25:11.010 And indeed, there's this whole internet hierarchy 00:25:11.010 --> 00:25:13.710 of protocols layered on protocols layered on protocols 00:25:13.710 --> 00:25:17.700 so that what we humans really tend to care about, if we're not the engineers 00:25:17.700 --> 00:25:21.210 but we're really the software developers and we're the users of applications, 00:25:21.210 --> 00:25:24.210 we care about application layer protocols that 00:25:24.210 --> 00:25:28.050 sit right between the human and all of those lower level protocols. 00:25:28.050 --> 00:25:32.037 For instance, these, at least one of which has got to jump out at you, HTTP. 00:25:32.037 --> 00:25:33.120 Odds are you've seen this. 00:25:33.120 --> 00:25:34.900 Odds are you've typed this, though decreasingly 00:25:34.900 --> 00:25:37.858 do you have to still type it because browsers will just add it for you, 00:25:37.858 --> 00:25:38.550 HTTP. 00:25:38.550 --> 00:25:43.460 The secure or encrypted version, HTTPS, IMAP for email inbound, 00:25:43.460 --> 00:25:50.040 SMTP for email outbound, SFTP for Secure File Transfer, SSH for Secure Shell, 00:25:50.040 --> 00:25:54.090 an encrypted textual channel between two computers, and many more. 00:25:54.090 --> 00:25:59.430 But HTTP, let's focus on that one because that is Hypertext Transfer 00:25:59.430 --> 00:26:00.810 Protocol.
00:26:00.810 --> 00:26:04.860 Or HTTPS, the same but the S stands for-- 00:26:04.860 --> 00:26:09.670 not savings-- secure, so it's actually encrypted in this case. 00:26:09.670 --> 00:26:11.590 So what does this actually mean? 00:26:11.590 --> 00:26:14.580 Well, at the end of the day, HTTP is a protocol 00:26:14.580 --> 00:26:19.770 that governs what kinds of messages go inside of those envelopes 00:26:19.770 --> 00:26:22.500 that I've been preparing for the internet, what kinds of messages 00:26:22.500 --> 00:26:24.420 go inside of those envelopes. 00:26:24.420 --> 00:26:28.620 And it turns out the simplest message that a computer sends 00:26:28.620 --> 00:26:33.230 through this whole internet, ultimately, inside of a virtual envelope 00:26:33.230 --> 00:26:38.330 is quite often, thanks to HTTP, inside of this virtual envelope, 00:26:38.330 --> 00:26:40.490 if I'm trying to request a cat from the internet, 00:26:40.490 --> 00:26:43.490 might literally be a message like this, get me, 00:26:43.490 --> 00:26:49.451 for instance, slash cat.jpg for JPEG. 00:26:49.451 --> 00:26:51.200 And maybe some additional text after that, 00:26:51.200 --> 00:26:54.020 maybe some additional text below that, but at the end of the day inside 00:26:54.020 --> 00:26:57.186 the virtual envelope, if I am on the internet and I'm going on Google Images 00:26:57.186 --> 00:27:00.260 and I want to find a picture of a cat, inside of my envelope, 00:27:00.260 --> 00:27:05.240 if I am a web browser speaking HTTP, is going to literally be a textual message 00:27:05.240 --> 00:27:11.210 that says GET /cat.jpg, if I know that's where the image is on some server.
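To make that textual message concrete, here is a minimal sketch of what a browser might put inside the envelope for an HTTP/1.1 request. The host name `example.com` and the helper name `build_get_request` are placeholders chosen for illustration; a real browser sends many more header lines than this.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # the request line: method, path, protocol version
        f"Host: {host}",         # which web site on that server we want
        "Connection: close",     # ask the server to hang up after responding
        "",                      # blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with carriage return + line feed.
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("example.com", "/cat.jpg")
print(request.decode("ascii"))
```

Everything after the first line is the "additional text below that": more lines in the same envelope, ending with a blank line so the server knows the request is complete.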
00:27:11.210 --> 00:27:13.670 The response is going to be what was just inside of those 00:27:13.670 --> 00:27:15.859 four envelopes back from the server to me, 00:27:15.859 --> 00:27:18.650 chopped up maybe into multiple pieces but in a way where I can then 00:27:18.650 --> 00:27:21.108 realize, oh, wait a minute, you sent me only three of four. 00:27:21.108 --> 00:27:22.460 Please send me the fourth one. 00:27:22.460 --> 00:27:25.210 So it works in both ways, whether it's me sending a cat to someone 00:27:25.210 --> 00:27:27.080 or receiving a cat from someone. 00:27:27.080 --> 00:27:31.850 This protocol, HTTP, governs how the messages are formatted 00:27:31.850 --> 00:27:35.990 and what language, so to speak, is spoken between web browser and server. 00:27:35.990 --> 00:27:39.830 So indeed, HTTP is entirely about having a web 00:27:39.830 --> 00:27:42.650 browser communicate with a server. 00:27:42.650 --> 00:27:44.427 And we can see this in action. 00:27:44.427 --> 00:27:47.260 I'm going to go ahead and pull up a so-called terminal window again, 00:27:47.260 --> 00:27:49.580 this textual command prompt on my computer. 00:27:49.580 --> 00:27:52.257 And I'm going to pretend to be a browser. 00:27:52.257 --> 00:27:54.590 So I'm not going to just trace the route between point A 00:27:54.590 --> 00:27:57.890 and point B. I'm actually going to request a web 00:27:57.890 --> 00:28:00.980 page as though I am Chrome or Edge or Firefox or Safari 00:28:00.980 --> 00:28:03.020 or whatever your favorite browser is. 00:28:03.020 --> 00:28:05.840 But of course, as before, all I know is that I 00:28:05.840 --> 00:28:10.030 want to visit my favorite web site, Facebook.com, for instance. 00:28:10.030 --> 00:28:12.260 But I don't know its IP address necessarily, 00:28:12.260 --> 00:28:14.100 so let's go through that step. 00:28:14.100 --> 00:28:15.650 How do I look up its IP address? 00:28:15.650 --> 00:28:19.520 Well, my Mac already has an IP address because of DHCP.
00:28:19.520 --> 00:28:20.660 I'm already powered up. 00:28:20.660 --> 00:28:22.820 I'm already connected to the Wi-Fi here on campus, 00:28:22.820 --> 00:28:25.490 and so I already have my own IP address, and I also 00:28:25.490 --> 00:28:27.290 have the IP address of a DNS server. 00:28:27.290 --> 00:28:29.040 So my Mac just knows that. 00:28:29.040 --> 00:28:34.501 But I can use that capability now to look up the IP address 00:28:34.501 --> 00:28:37.000 for the name, Facebook, and I'm going to do that as follows, 00:28:37.000 --> 00:28:41.000 nslookup, for name server lookup. 00:28:41.000 --> 00:28:46.820 And I'm going to go ahead and type in www.facebook.com, enter. 00:28:46.820 --> 00:28:49.640 And interestingly, we get back this somewhat cryptic response 00:28:49.640 --> 00:28:50.990 but let's make some sense of it. 00:28:50.990 --> 00:28:54.915 So it looks like the server that this response came back from is 10.0.0.2, 00:28:54.915 --> 00:28:58.040 which happens to be a private IP address here on campus that you might have 00:28:58.040 --> 00:29:00.780 in your own company or university or even home network. 00:29:00.780 --> 00:29:03.980 Then a non-authoritative answer is this, www.facebook.com, 00:29:03.980 --> 00:29:09.980 whose canonical name is, curiously, star-mini.c10r.facebook.com. 00:29:09.980 --> 00:29:12.200 Well, it turns out that companies like Facebook 00:29:12.200 --> 00:29:14.900 absolutely have many, many, many different web servers, 00:29:14.900 --> 00:29:17.270 and they might not necessarily have just one IP address. 00:29:17.270 --> 00:29:19.081 But we might just be seeing one IP address 00:29:19.081 --> 00:29:21.830 depending on where I am in the world and depending on how Facebook 00:29:21.830 --> 00:29:23.930 has configured its infrastructure.
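The same kind of lookup can be sketched in Python with the standard library, as a rough stand-in for nslookup rather than a reimplementation of it. The function names `resolve` and `is_private` are made up for this sketch. `resolve` is defined but not called here because it needs a network connection, and the addresses it returns will vary by location and over time.

```python
import ipaddress
import socket


def resolve(name: str) -> list[str]:
    """Ask the operating system's resolver (and so DNS) for a name's IPv4 addresses."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result ends with a (address, port) pair; keep just the unique addresses.
    return sorted({info[4][0] for info in infos})


def is_private(address: str) -> bool:
    """Report whether an address is reserved for private networks."""
    return ipaddress.ip_address(address).is_private


# The DNS server's address from the lecture is private; Facebook's is public.
print(is_private("10.0.0.2"))     # True
print(is_private("31.13.65.36"))  # False
```

Calling `resolve("www.facebook.com")` on a connected machine would return one or more addresses, though not necessarily the one shown in the lecture.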
00:29:23.930 --> 00:29:28.070 The takeaway, then, is that apparently so far as my Mac is concerned, 00:29:28.070 --> 00:29:33.020 www.facebook.com is an alias for or a synonym for this 00:29:33.020 --> 00:29:36.950 longer less well marketed domain name here. 00:29:36.950 --> 00:29:40.340 But what we really care about, if I'm about to pretend to be a browser, 00:29:40.340 --> 00:29:41.510 is this IP address. 00:29:41.510 --> 00:29:45.890 Facebook's IP address is apparently 31.13.65.36. 00:29:45.890 --> 00:29:47.450 And I can see this, in fact. 00:29:47.450 --> 00:29:52.560 Let me go into Google Chrome, or any browser for that matter, 00:29:52.560 --> 00:30:00.640 and go to http://31.13.65.36, enter. 00:30:00.640 --> 00:30:02.880 And voila, I made it to Facebook. 00:30:02.880 --> 00:30:04.980 Now of course no one in their right mind is 00:30:04.980 --> 00:30:08.866 going to advertise the IP address as 31.13.65.36. 00:30:08.866 --> 00:30:09.990 No one would remember that. 00:30:09.990 --> 00:30:13.590 We're not in the age of phone numbers on the side of billboards anymore. 00:30:13.590 --> 00:30:18.000 Now we have the Domain Name System, or DNS, which does this conversion for us. 00:30:18.000 --> 00:30:21.690 But now that I know that IP address, I can use this information 00:30:21.690 --> 00:30:24.630 and pretend to be a browser and not just see the response in Chrome 00:30:24.630 --> 00:30:27.324 as we just did, but I can see it in my textual window 00:30:27.324 --> 00:30:28.740 so I can look inside the envelope. 00:30:28.740 --> 00:30:31.500 Indeed, this terminal window is going to let me pretend to-- 00:30:31.500 --> 00:30:35.380 well, actually send a message as though I'm a browser pretending to be one. 00:30:35.380 --> 00:30:39.150 But it's going to let me see inside of the response that comes back. 00:30:39.150 --> 00:30:40.380 Here's what I'm going to do.
00:30:40.380 --> 00:30:45.030 I'm going to go ahead and type in cURL dash I, 00:30:45.030 --> 00:30:49.710 and I'm going to go ahead and type http:// and then that IP address 00:30:49.710 --> 00:30:51.630 and I'm going to hit enter. 00:30:51.630 --> 00:30:55.530 And notice, uh-oh, Facebook has moved permanently. 00:30:55.530 --> 00:30:56.760 But this is a good thing. 00:30:56.760 --> 00:30:58.680 To where has Facebook moved? 00:30:58.680 --> 00:31:01.230 Well, apparently we've gotten back a response, 00:31:01.230 --> 00:31:12.370 via version 1.1 of HTTP that Facebook, per this status code, so 00:31:12.370 --> 00:31:16.880 to speak, has moved permanently. 00:31:16.880 --> 00:31:20.870 Has moved permanently, which sounds scary, but where has it moved to? 00:31:20.870 --> 00:31:24.620 Oh, they don't want people visiting their IP address, even though it works. 00:31:24.620 --> 00:31:28.520 They want to redirect people, so to speak, to their domain name. 00:31:28.520 --> 00:31:32.180 So we seem to be kind of in a cyclical situation here where, wait a minute, 00:31:32.180 --> 00:31:34.860 I thought I had to convert my domain name to an IP address. 00:31:34.860 --> 00:31:37.970 And indeed, I do, but it turns out cURL is pretending 00:31:37.970 --> 00:31:40.460 to be a text-based browser here, effectively, 00:31:40.460 --> 00:31:43.950 and it is already going to do this DNS lookup for me so this is OK. 00:31:43.950 --> 00:31:52.940 I'm going to go ahead now and do cURL dash I, http://www.facebook.com, enter. 00:31:52.940 --> 00:31:53.630 Oh, my god. 00:31:53.630 --> 00:31:55.930 Facebook moved again. 00:31:55.930 --> 00:31:57.740 But where did they move this time? 00:31:57.740 --> 00:32:00.010 Well, it seems that Facebook would prefer 00:32:00.010 --> 00:32:04.030 that we visit https://www.facebook.com, which 00:32:04.030 --> 00:32:05.800 is the secure, the encrypted version. 00:32:05.800 --> 00:32:07.120 OK, I can oblige.
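A browser-like program notices a redirect by reading the first line of the response and then the header that says where to go next. The sketch below parses a hand-written sample response; it is illustrative text shaped like what a server might send, not Facebook's actual output, and the function name `parse_response_head` is made up for this example.

```python
def parse_response_head(raw: str) -> tuple[str, int, str, dict[str, str]]:
    """Split an HTTP response into version, status code, reason, and headers."""
    head = raw.split("\r\n\r\n", 1)[0]  # everything before the blank line
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Each header is a name, a colon, and a value.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


# Hypothetical sample response, written by hand for this example.
SAMPLE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://www.facebook.com/\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

version, code, reason, headers = parse_response_head(SAMPLE)
if code in (301, 302, 307, 308):
    # A redirect: the Location header names the URL to request instead.
    print(f"{code} {reason}: go to {headers['location']}")
```

A browser would then repeat the whole request against the URL in `Location`, which is why the redirect is invisible when you browse normally.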
00:32:07.120 --> 00:32:08.440 Let's go ahead and do that. 00:32:08.440 --> 00:32:15.830 cURL dash I of the HTTPS version, which I've just pasted in, enter, and voila. 00:32:15.830 --> 00:32:21.620 Now, this looks overwhelming, but what's really important is this message here. 00:32:21.620 --> 00:32:23.867 It turns out everything is OK. 00:32:23.867 --> 00:32:25.700 And indeed, what's come back from the server 00:32:25.700 --> 00:32:29.240 is a virtual envelope, inside of which is this message here saying, 00:32:29.240 --> 00:32:30.410 hey, no big deal. 00:32:30.410 --> 00:32:31.340 Everything is OK. 00:32:31.340 --> 00:32:34.080 And you never see this number when you visit web pages, 00:32:34.080 --> 00:32:36.830 unless you're a software developer and you know what tools to use. 00:32:36.830 --> 00:32:40.340 Instead, some of us out there, some of us normal humans 00:32:40.340 --> 00:32:43.520 occasionally see a different number, maybe the one number 00:32:43.520 --> 00:32:45.180 you associate with the web. 00:32:45.180 --> 00:32:46.650 Let me simulate it as follows. 00:32:46.650 --> 00:32:49.980 Let me go ahead and request this completely bogus page. 00:32:49.980 --> 00:32:53.690 Hopefully that's not actually someone's user name and hit enter. 00:32:53.690 --> 00:32:55.880 Scroll back up a bit. 00:32:55.880 --> 00:32:57.920 What do you notice this time? 00:32:57.920 --> 00:33:01.610 If you've ever wondered what 404 means, it 00:33:01.610 --> 00:33:06.050 is the numeric code inside of a virtual envelope coming back from a server 00:33:06.050 --> 00:33:08.690 when you have requested some nonsensical URL because 00:33:08.690 --> 00:33:10.730 of a typographical error or just nonsense 00:33:10.730 --> 00:33:14.865 that I typed that's now having the server tell you, uh-uh, not found, 404. 00:33:14.865 --> 00:33:16.490 So this is just a special numeric code. 
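Python's standard library happens to include a table of these numeric codes, which makes it easy to see how a number maps to a meaning. This is a small sketch using `http.HTTPStatus`; the helper name `describe` and the rough grouping labels are choices made for this example.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Turn a numeric HTTP status code into its standard phrase and a rough category."""
    status = HTTPStatus(code)  # raises ValueError for codes the table doesn't know
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif code >= 500:
        kind = "server error"
    else:
        kind = "informational"
    return f"{code} {status.phrase} ({kind})"


for code in (200, 301, 404):
    print(describe(code))
# 200 OK (success)
# 301 Moved Permanently (redirect)
# 404 Not Found (client error)
```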
00:33:16.490 --> 00:33:18.290 And this is common in programming to have 00:33:18.290 --> 00:33:21.200 numbers correspond to different types of things that can go wrong 00:33:21.200 --> 00:33:25.190 or, better yet, that can go well, as in the case of 200 OK. 00:33:25.190 --> 00:33:27.590 Now, all of this stuff is called HTTP headers. 00:33:27.590 --> 00:33:29.640 So I was oversimplifying earlier when I said 00:33:29.640 --> 00:33:33.800 HTTP is just this handshake of sorts between servers where you say, 00:33:33.800 --> 00:33:38.060 get me a cat picture and then you get back the response as per those four 00:33:38.060 --> 00:33:38.690 envelopes. 00:33:38.690 --> 00:33:40.170 There's more headers. 00:33:40.170 --> 00:33:43.970 There's more key value pairs, words with colons, words with colons, 00:33:43.970 --> 00:33:46.730 words with colons, and then values to the right of those. 00:33:46.730 --> 00:33:50.270 And that is just additional metadata, more information from the server that 00:33:50.270 --> 00:33:52.460 tells you a little something about it. 00:33:52.460 --> 00:33:57.260 But if I instead run that same command one final time, 00:33:57.260 --> 00:34:04.160 this time doing cURL and then specifying not dash I but just the URL itself 00:34:04.160 --> 00:34:08.900 and hit enter, this craziness comes back. 00:34:08.900 --> 00:34:11.840 And this looks like a whole lot of programming language in something 00:34:11.840 --> 00:34:15.199 called JavaScript or big JSON object. 00:34:15.199 --> 00:34:17.790 And my god, look how much data came back from the server. 00:34:17.790 --> 00:34:20.449 But notice, I'm starting to see some structure. 00:34:20.449 --> 00:34:23.670 Open bracket div and the word label here. 00:34:23.670 --> 00:34:25.969 And if I go up here, input here. 00:34:25.969 --> 00:34:30.156 And indeed, what you are seeing is a language called HTML. 
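One way to see the structure hiding in that wall of text is to have a program list the tags it contains. The sketch below runs Python's built-in HTML parser over a tiny hand-written page; the page is a made-up example for illustration, not Facebook's markup, and the class name `TagLister` is likewise invented here.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect the name of every opening tag, in the order it appears."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# A tiny hypothetical page, squeezed onto one line the way large sites often do.
PAGE = (
    "<!DOCTYPE html><html><head><title>hello</title></head>"
    '<body><div><label>Email</label><input type="text"></div></body></html>'
)

lister = TagLister()
lister.feed(PAGE)
print(lister.tags)
# ['html', 'head', 'title', 'body', 'div', 'label', 'input']
```

The same `div`, `label`, and `input` tags visible in the terminal output show up here, just without the thousands of characters around them.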
00:34:30.156 --> 00:34:32.989 Inside of the virtual envelope, if you're requesting not a cat image 00:34:32.989 --> 00:34:36.980 but a web page that has your news feed or your inbox from Gmail or your search 00:34:36.980 --> 00:34:40.400 results from Google is a language called HTML. 00:34:40.400 --> 00:34:42.540 And HTML's not a programming language. 00:34:42.540 --> 00:34:45.020 And indeed, it's not as cryptic looking as this. 00:34:45.020 --> 00:34:46.679 Google is being very-- 00:34:46.679 --> 00:34:48.560 or, Facebook is being very efficient when 00:34:48.560 --> 00:34:50.560 it comes to showing me this information and just 00:34:50.560 --> 00:34:52.550 getting rid of as much formatting as they 00:34:52.550 --> 00:34:57.240 can to save space, to save on internet bandwidth or transmission thereof. 00:34:57.240 --> 00:34:59.990 But it's a language that comes back in this virtual envelope 00:34:59.990 --> 00:35:01.670 that a browser knows how to display. 00:35:01.670 --> 00:35:03.545 It's a markup language in the sense that it's 00:35:03.545 --> 00:35:07.010 going to tell the browser what to show on the screen, where to show the cat, 00:35:07.010 --> 00:35:10.970 where to put words, whether to make those words big or bold or italics 00:35:10.970 --> 00:35:12.960 or centered or any number of other things. 00:35:12.960 --> 00:35:18.845 And indeed, what you are seeing is this. 00:35:18.845 --> 00:35:23.480 This is www.facebook.com graphically, as we see it in the browser. 00:35:23.480 --> 00:35:28.460 Underneath the hood is that black and white seemingly nonsensical Greek, 00:35:28.460 --> 00:35:31.610 if you will, that at first glance, there's no way most of us 00:35:31.610 --> 00:35:32.810 would understand it. 00:35:32.810 --> 00:35:34.940 But that's because we're looking at it here. 
00:35:34.940 --> 00:35:37.430 We need to dive in a little deeper, take a look 00:35:37.430 --> 00:35:40.190 at what HTML is, how it's actually structured, 00:35:40.190 --> 00:35:44.270 make the simplest of web pages, a hello world of web pages, if you will. 00:35:44.270 --> 00:35:46.310 And then can we realize and build back up 00:35:46.310 --> 00:35:49.700 to this point exactly what composes pages like Facebook and Gmail 00:35:49.700 --> 00:35:51.380 and Google and Bing and others. 00:35:51.380 --> 00:35:53.588 Because at that point, we'll have understood not only 00:35:53.588 --> 00:35:56.690 how the internet works, but how you can use it 00:35:56.690 --> 00:36:00.920 as a delivery vehicle for your ideas, for your programs, for your products, 00:36:00.920 --> 00:36:06.710 for your companies and more and actually deliver information and deliver cats 00:36:06.710 --> 00:36:11.350 and much more to your users on this internet.