1 00:00:00,000 --> 00:00:02,910 [MUSIC PLAYING] 2 00:00:02,910 --> 00:00:09,710 3 00:00:09,710 --> 00:00:12,530 DAVID MALAN: So odds are you're on the internet these days, 4 00:00:12,530 --> 00:00:14,300 but what does that actually mean? 5 00:00:14,300 --> 00:00:18,410 And indeed, this internet that we use very often these days for messaging, 6 00:00:18,410 --> 00:00:21,440 for email, for browsing the web and other services still, 7 00:00:21,440 --> 00:00:24,500 there's a whole infrastructure that underlies it that is increasingly 8 00:00:24,500 --> 00:00:28,130 powering new ideas, new start ups, new companies, new businesses 9 00:00:28,130 --> 00:00:31,040 as well as new forms of communication among humans. 10 00:00:31,040 --> 00:00:33,890 And yet, like most every topic we've explored, 11 00:00:33,890 --> 00:00:36,017 you'll realize that while it's very complex, 12 00:00:36,017 --> 00:00:37,850 perhaps, up here, or certainly seems complex 13 00:00:37,850 --> 00:00:40,580 up here, if we begin with some of the fundamentals and then layer 14 00:00:40,580 --> 00:00:43,670 and layer and layer on top of those, do we pretty quickly get 15 00:00:43,670 --> 00:00:46,970 back to today's technology but with a much better understanding of what's 16 00:00:46,970 --> 00:00:49,890 going on from the ground up. 17 00:00:49,890 --> 00:00:51,890 So here is a bit of alphabet soup. 18 00:00:51,890 --> 00:00:55,730 Odds are you might have seen one or more of these acronyms 19 00:00:55,730 --> 00:01:00,690 to date, IP, DHCP, DNS, TCP, UDP, ICMP, and so many more. 20 00:01:00,690 --> 00:01:03,410 These are all examples of something called protocols, 21 00:01:03,410 --> 00:01:06,560 where protocols are kind of like languages 22 00:01:06,560 --> 00:01:08,250 that computers speak with one another. 23 00:01:08,250 --> 00:01:10,250 They're not programming languages so they're not 24 00:01:10,250 --> 00:01:14,540 used by humans to make computers do things or follow instructions per se. 
25 00:01:14,540 --> 00:01:16,490 A protocol is really a set of conventions 26 00:01:16,490 --> 00:01:18,410 that two computers or two computer programs 27 00:01:18,410 --> 00:01:21,106 might use when intercommunicating. 28 00:01:21,106 --> 00:01:23,480 And so what's an example of a protocol in the real world? 29 00:01:23,480 --> 00:01:26,440 Well, we humans have some silly protocols, one of which 30 00:01:26,440 --> 00:01:29,490 here is, culturally, when you meet someone to extend your hand 31 00:01:29,490 --> 00:01:31,490 and then he or she presumably extends their hand 32 00:01:31,490 --> 00:01:33,680 and you do this for who knows what reason. 33 00:01:33,680 --> 00:01:37,190 And now you've sort of completed that social transaction. 34 00:01:37,190 --> 00:01:39,830 But it's a protocol in the sense that when I extend my hand, 35 00:01:39,830 --> 00:01:42,950 most any polite other person knows that they're probably 36 00:01:42,950 --> 00:01:45,830 supposed to extend their hand as well, embrace for a moment, 37 00:01:45,830 --> 00:01:47,150 and then complete. 38 00:01:47,150 --> 00:01:50,250 And the protocol says, too, you probably don't do this for terribly long. 39 00:01:50,250 --> 00:01:53,420 And so there's these rules of thumb or actual rules 40 00:01:53,420 --> 00:01:55,790 that you follow when implementing protocols. 41 00:01:55,790 --> 00:01:59,510 And so computers, great as they are at following rules, 42 00:01:59,510 --> 00:02:02,340 very often use protocols when they intercommunicate, 43 00:02:02,340 --> 00:02:04,980 in order to get data from one place to another. 44 00:02:04,980 --> 00:02:06,950 So let's tell exactly that story. 45 00:02:06,950 --> 00:02:10,259 If you're on the internet, right now, on the internet, 46 00:02:10,259 --> 00:02:12,050 what does that actually mean and how can it 47 00:02:12,050 --> 00:02:14,690 help us solve problems, ultimately, having access 48 00:02:14,690 --> 00:02:18,230 to this inter-networked infrastructure? 
49 00:02:18,230 --> 00:02:21,830 Well, let's consider what happens when I first visit my favorite web 50 00:02:21,830 --> 00:02:22,830 page, for instance. 51 00:02:22,830 --> 00:02:27,810 If I go ahead and visit something like Facebook.com, I go ahead and log in 52 00:02:27,810 --> 00:02:29,810 and I'm immediately presented with my news feed. 53 00:02:29,810 --> 00:02:32,476 Or maybe your favorite website is Gmail or your favorite website 54 00:02:32,476 --> 00:02:36,320 is Bing or maybe your favorite website is any number of other places 55 00:02:36,320 --> 00:02:40,970 you might go on the web, all of which take in as input a request from you 56 00:02:40,970 --> 00:02:44,870 and produce, ultimately, output, the screen that you ultimately see. 57 00:02:44,870 --> 00:02:48,290 But how does that data get from one location to another? 58 00:02:48,290 --> 00:02:50,240 Let's begin to draw a picture, perhaps. 59 00:02:50,240 --> 00:02:53,090 And this picture might be representative of your own home network 60 00:02:53,090 --> 00:02:56,070 or maybe your campus network or maybe your office network. 61 00:02:56,070 --> 00:02:58,400 But generally speaking, you are on the internet 62 00:02:58,400 --> 00:03:01,250 maybe with your phone or your laptop or your desktop device, 63 00:03:01,250 --> 00:03:04,610 and we'll just depict that as this sort of abstract laptop here. 64 00:03:04,610 --> 00:03:07,880 So that laptop somehow wants to communicate with a web server 65 00:03:07,880 --> 00:03:09,950 elsewhere, Facebook, Google, Bing, whatever. 66 00:03:09,950 --> 00:03:13,080 And we're just going to represent that as way over here in the picture 67 00:03:13,080 --> 00:03:15,680 in a really big corporate office building, perhaps. 68 00:03:15,680 --> 00:03:17,750 And inside of that building are the servers 69 00:03:17,750 --> 00:03:21,230 that compose that particular web site. 
70 00:03:21,230 --> 00:03:25,337 But how do I get data from that server, which, if it's Google or somewhere else 71 00:03:25,337 --> 00:03:27,920 might be all the way in California or halfway across the world 72 00:03:27,920 --> 00:03:29,240 and back to my laptop? 73 00:03:29,240 --> 00:03:32,480 Well, somehow I have to be able to send messages to it 74 00:03:32,480 --> 00:03:34,160 and receive messages from it. 75 00:03:34,160 --> 00:03:38,450 And of course in between me and this resulting website 76 00:03:38,450 --> 00:03:41,060 is what we'll generally call the internet. 77 00:03:41,060 --> 00:03:43,280 It's kind of conveniently drawn as a cloud 78 00:03:43,280 --> 00:03:45,710 here, which is another semi-technical term that's 79 00:03:45,710 --> 00:03:47,120 come into vogue in recent years. 80 00:03:47,120 --> 00:03:50,420 And the cloud really just refers to internet services these days. 81 00:03:50,420 --> 00:03:52,020 It's not a technical term unto itself. 82 00:03:52,020 --> 00:03:55,510 It's just a sexier term than saying, my business is on the internet. 83 00:03:55,510 --> 00:03:58,010 Oversimplification, and we'll come back to that before long. 84 00:03:58,010 --> 00:04:01,130 But you can assume here that the internet is somehow 85 00:04:01,130 --> 00:04:03,050 this delivery mechanism. 86 00:04:03,050 --> 00:04:06,420 It somehow gets data from point A to point B and back. 87 00:04:06,420 --> 00:04:08,420 But how does that work? 88 00:04:08,420 --> 00:04:10,910 If my data's coming in as input and it's reaching, 89 00:04:10,910 --> 00:04:12,920 eventually, its destination and then a response 90 00:04:12,920 --> 00:04:16,730 is coming back in this direction, what's actually going on underneath the hood 91 00:04:16,730 --> 00:04:20,329 there, especially since, in the story at hand, all I've typed 92 00:04:20,329 --> 00:04:24,500 is something like Facebook.com Gmail.com or the like? 
93 00:04:24,500 --> 00:04:28,460 Well, it turns out that your computer these days, when you first turn it on 94 00:04:28,460 --> 00:04:32,390 and you connect to the Wi-Fi in a room or you connect with an ethernet cable 95 00:04:32,390 --> 00:04:35,600 to the wired network, your computer receives some information 96 00:04:35,600 --> 00:04:36,590 automatically. 97 00:04:36,590 --> 00:04:42,650 Your computer speaks a protocol called DHCP, typically, Dynamic Host 98 00:04:42,650 --> 00:04:43,850 Configuration Protocol. 99 00:04:43,850 --> 00:04:46,820 But in most of these cases, the acronym isn't really what's important, 100 00:04:46,820 --> 00:04:50,360 certainly, it's what the protocol itself does. 101 00:04:50,360 --> 00:04:53,900 And in this case, this Dynamic Host Configuration Protocol 102 00:04:53,900 --> 00:04:58,460 dynamically configures hosts via a protocol, if you will. 103 00:04:58,460 --> 00:04:59,750 So what does this mean? 104 00:04:59,750 --> 00:05:02,885 Essentially DHCP says this, when you turn on your computer 105 00:05:02,885 --> 00:05:04,760 or you take out your phone for the first time 106 00:05:04,760 --> 00:05:08,550 and you're connected on Wi-Fi or to a wired network, it says, hello, world. 107 00:05:08,550 --> 00:05:09,400 I am alive. 108 00:05:09,400 --> 00:05:13,000 I would like to be given an address that I can communicate 109 00:05:13,000 --> 00:05:14,920 with other computers on the internet. 110 00:05:14,920 --> 00:05:18,280 It's not quite that verbose, perhaps, but it is a question. 111 00:05:18,280 --> 00:05:21,490 Hey, computers around me, please give me an address. 112 00:05:21,490 --> 00:05:25,630 And what it gives you is what's called an IP address, Internet Protocol. 
113 00:05:25,630 --> 00:05:28,900 So just as in the real world where physical buildings have historically 114 00:05:28,900 --> 00:05:31,690 been uniquely addressed with postal addresses 115 00:05:31,690 --> 00:05:35,980 like Harvard's computer science building is at 33 Oxford Street 116 00:05:35,980 --> 00:05:38,980 Cambridge, Massachusetts, USA. 117 00:05:38,980 --> 00:05:43,060 02138 is the more precise zip code as well. 118 00:05:43,060 --> 00:05:45,580 That uniquely identifies that building in the world. 119 00:05:45,580 --> 00:05:47,851 So, too, does my computer need an address, and it's not 120 00:05:47,851 --> 00:05:50,100 going to be some free form address like that in words. 121 00:05:50,100 --> 00:05:51,933 It's actually going to be a numeric address. 122 00:05:51,933 --> 00:05:57,265 Specifically, I'm going to get an IP address of the form number dot number 123 00:05:57,265 --> 00:06:02,500 dot number dot number, so four numbers separated by dots. 124 00:06:02,500 --> 00:06:05,680 Each of those four numbers happens to be a byte long 125 00:06:05,680 --> 00:06:10,780 or eight bits, so each of these numbers, therefore, is between 0 and 255, 126 00:06:10,780 --> 00:06:15,280 and so this means, long story short, that the total address is 32 bits-- 127 00:06:15,280 --> 00:06:19,669 8 plus 8 plus 8 plus 8-- and that means there are roughly four billion possible addresses 128 00:06:19,669 --> 00:06:20,210 in the world. 129 00:06:20,210 --> 00:06:23,140 And that's great because people have got a lot of computers and a lot of laptops 130 00:06:23,140 --> 00:06:24,760 and a lot of desktops and servers these days. 131 00:06:24,760 --> 00:06:26,740 But it turns out we're actually running out 132 00:06:26,740 --> 00:06:28,450 because we have so many such devices. 133 00:06:28,450 --> 00:06:31,270 So there's a newer version of IP that's increasingly 134 00:06:31,270 --> 00:06:33,850 being used called IP version 6. 
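The arithmetic behind "four numbers, 8 bits each, 32 bits in total" can be checked with a short sketch. This is only an illustration using Python's standard `ipaddress` module; the helper name `describe_ipv4` is made up for this example, and 1.2.3.4 is the lecture's placeholder address, not a claim about any real server.

```python
import ipaddress

# Total number of distinct addresses for each address size mentioned in the lecture.
TOTAL_IPV4 = 2 ** 32    # 32-bit addresses: roughly four billion
TOTAL_IPV6 = 2 ** 128   # 128-bit addresses: vastly more


def describe_ipv4(dotted: str) -> dict:
    """Break a dotted IPv4 address into its four bytes and its 32-bit integer value.

    Hypothetical helper for illustration; raises ValueError if any of the
    four numbers falls outside 0..255.
    """
    addr = ipaddress.IPv4Address(dotted)          # validates the address
    octets = [int(part) for part in dotted.split(".")]
    return {
        "octets": octets,                         # four numbers, each one byte
        "as_integer": int(addr),                  # the same address as one 32-bit number
        "bits": addr.max_prefixlen,               # 32 for IPv4
    }


if __name__ == "__main__":
    info = describe_ipv4("1.2.3.4")
    print(info["octets"], info["as_integer"], info["bits"])
    print(f"{TOTAL_IPV4:,} possible IPv4 addresses")
```

Writing the address as a single integer makes the 8 + 8 + 8 + 8 layout visible: each of the four numbers occupies its own byte of the 32-bit value.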
135 00:06:33,850 --> 00:06:37,020 We're talking here about IP version 4, since it's so omnipresent. 136 00:06:37,020 --> 00:06:41,440 And IP version 6, just so you know, uses 128 bits for its addresses, 137 00:06:41,440 --> 00:06:45,550 way more than 32, so we'll be good to go for some time. 138 00:06:45,550 --> 00:06:48,450 But DHCP gives me this address, an IP address of the form something 139 00:06:48,450 --> 00:06:50,200 dot something dot something dot something. 140 00:06:50,200 --> 00:06:53,860 And the purpose of this address is to help my data get from point A 141 00:06:53,860 --> 00:06:56,800 to point B. And indeed, anytime my computer sends 142 00:06:56,800 --> 00:06:59,470 a request on the internet like, Facebook, 143 00:06:59,470 --> 00:07:02,950 please show me my news feed, or Gmail, please show me my inbox, 144 00:07:02,950 --> 00:07:05,470 my computer has to use that IP address. 145 00:07:05,470 --> 00:07:08,180 So much like if sending a letter in the real world, 146 00:07:08,180 --> 00:07:10,390 you might have an otherwise blank envelope 147 00:07:10,390 --> 00:07:13,270 and you might want to send a message to someone else in the world, 148 00:07:13,270 --> 00:07:15,160 you might write their physical address. 149 00:07:15,160 --> 00:07:21,190 But in the computer world, we might write something like 1.2.3.4 150 00:07:21,190 --> 00:07:25,180 in the to field, assuming that this is the IP address to which we 151 00:07:25,180 --> 00:07:26,690 want to send this data. 152 00:07:26,690 --> 00:07:31,670 Meanwhile, my from address might be 5.6.7.8, 153 00:07:31,670 --> 00:07:33,610 so I'll write it in the top left hand corner 154 00:07:33,610 --> 00:07:36,970 by convention, whereby that indicates to the whole internet this 155 00:07:36,970 --> 00:07:39,500 is where this request came from. 156 00:07:39,500 --> 00:07:43,840 Now, I know my origin address, the source address here at top left 157 00:07:43,840 --> 00:07:45,610 because DHCP told me. 
158 00:07:45,610 --> 00:07:48,310 How do I know 1.2.3.4? 159 00:07:48,310 --> 00:07:52,690 How do I know the IP address of Facebook.com or Gmail.com, right? 160 00:07:52,690 --> 00:07:54,940 We don't live in the world of 800 numbers 161 00:07:54,940 --> 00:07:58,120 anymore, where you dial 1-800 something, something, something, something, 162 00:07:58,120 --> 00:08:00,900 something and you have to advertise your phone number, per se. 163 00:08:00,900 --> 00:08:03,819 We don't necessarily live only in the world of 1-800-COLLECT 164 00:08:03,819 --> 00:08:06,610 any more where we had these mnemonics where you had letters mapping 165 00:08:06,610 --> 00:08:08,290 to numbers just to help remember it. 166 00:08:08,290 --> 00:08:10,990 We went all in on this idea of mnemonics 167 00:08:10,990 --> 00:08:19,210 such that now we have Facebook.com and Gmail.com and no numbers 168 00:08:19,210 --> 00:08:21,070 whatsoever for us humans to remember. 169 00:08:21,070 --> 00:08:24,820 So thankfully, it turns out there's another system in this world, 170 00:08:24,820 --> 00:08:29,150 another acronym, if you will, a new one now, called DNS, Domain Name System. 171 00:08:29,150 --> 00:08:31,600 So there are also in the world, not just DHCP servers that 172 00:08:31,600 --> 00:08:34,330 hand people IP addresses from their local network, 173 00:08:34,330 --> 00:08:37,030 there's also DNS servers whose purpose in life 174 00:08:37,030 --> 00:08:40,299 is to convert domain names to IP addresses and vice 175 00:08:40,299 --> 00:08:42,830 versa and a few other features as well. 176 00:08:42,830 --> 00:08:44,150 Now, what does that mean? 177 00:08:44,150 --> 00:08:46,790 That means that when my Mac or my PC sees 178 00:08:46,790 --> 00:08:48,775 little old me, the human, typing Facebook.com 179 00:08:48,775 --> 00:08:55,570 or Gmail.com, my laptop contacts a nearby DNS server and says, 180 00:08:55,570 --> 00:08:59,105 hey, my human has asked me for Facebook.com. 
181 00:08:59,105 --> 00:09:00,380 What is its IP address? 182 00:09:00,380 --> 00:09:03,400 And a DNS server's purpose in life is to answer that question and say, 183 00:09:03,400 --> 00:09:07,030 oh, Facebook.com, it's 1.2.3.4. 184 00:09:07,030 --> 00:09:08,860 Use that address instead. 185 00:09:08,860 --> 00:09:11,110 Now, thankfully, my computer can now write 186 00:09:11,110 --> 00:09:14,140 that number on its virtual envelope, so to speak, and then 187 00:09:14,140 --> 00:09:16,330 pass that envelope out to the internet. 188 00:09:16,330 --> 00:09:20,680 And because of these numeric addresses, it will be properly, hopefully, 189 00:09:20,680 --> 00:09:24,160 routed across the internet to its destination. 190 00:09:24,160 --> 00:09:28,090 Because it turns out inside of the internet here, 191 00:09:28,090 --> 00:09:31,330 interconnecting everything in between point 192 00:09:31,330 --> 00:09:35,860 A and B are things called routers or gateways. 193 00:09:35,860 --> 00:09:38,170 And I could draw this picture in any number of ways. 194 00:09:38,170 --> 00:09:41,596 But the point is that it's just so darn interconnected. 195 00:09:41,596 --> 00:09:43,720 And indeed, there might be even more pathways still 196 00:09:43,720 --> 00:09:45,040 or maybe even fewer pathways. 197 00:09:45,040 --> 00:09:47,290 Indeed, on the internet, there's often multiple ways 198 00:09:47,290 --> 00:09:50,590 for data to get from one point to another, some shorter, some longer. 199 00:09:50,590 --> 00:09:52,750 But there's this resilience, this redundancy, 200 00:09:52,750 --> 00:09:55,390 and this was a feature back in the day, especially in so far 201 00:09:55,390 --> 00:09:58,240 as the internet had militaristic origins. 202 00:09:58,240 --> 00:10:01,900 It was meant to be redundant so as to withstand failures of one or more 203 00:10:01,900 --> 00:10:04,540 of these nodes, these dots in the picture. 
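The DNS lookup and the addressing of the virtual envelope described earlier can be sketched as follows. This is a toy model under stated assumptions: `TOY_DNS_TABLE`, `Envelope`, `resolve`, and `address_envelope` are hypothetical names invented for this sketch, and the addresses 1.2.3.4 and 5.6.7.8 are the lecture's placeholders rather than real ones. A real program would ask an actual DNS server, for example through Python's `socket.getaddrinfo`.

```python
from dataclasses import dataclass

# Toy stand-in for a DNS server's records; the address is the lecture's placeholder.
TOY_DNS_TABLE = {"facebook.com": "1.2.3.4"}


@dataclass
class Envelope:
    """A virtual envelope: where it is going, where it came from, and its contents."""
    to_addr: str
    from_addr: str
    payload: str


def resolve(name: str, table: dict = TOY_DNS_TABLE) -> str:
    """Convert a domain name to an IP address, as a DNS server would.

    Raises KeyError when the toy table has no record for the name.
    """
    return table[name.lower().rstrip(".")]


def address_envelope(domain: str, my_ip: str, payload: str) -> Envelope:
    """Look up the destination's IP address and write both addresses on the envelope."""
    return Envelope(to_addr=resolve(domain), from_addr=my_ip, payload=payload)


if __name__ == "__main__":
    # 5.6.7.8 plays the role of the address DHCP handed to the laptop.
    print(address_envelope("Facebook.com", "5.6.7.8", "please show me my news feed"))
```

The human only ever types the name; the numeric address that routers act on comes from the lookup.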
204 00:10:04,540 --> 00:10:07,450 Now, each of these dots is just a server, really, 205 00:10:07,450 --> 00:10:10,300 a special server called a router or gateway, whose purpose in life 206 00:10:10,300 --> 00:10:12,550 is to do exactly that, to route data. 207 00:10:12,550 --> 00:10:15,490 Upon receiving a virtual envelope like that one, 208 00:10:15,490 --> 00:10:19,930 it looks at the to address and realizes, oh, this is destined for 1.2.3.4. 209 00:10:19,930 --> 00:10:21,850 I know that that address is over this way. 210 00:10:21,850 --> 00:10:23,900 Meanwhile, if it gets another envelope from someone else, 211 00:10:23,900 --> 00:10:25,480 it might say, oh, this is some other address. 212 00:10:25,480 --> 00:10:26,719 It's going to go this way. 213 00:10:26,719 --> 00:10:28,510 And so routers have multiple cables or they 214 00:10:28,510 --> 00:10:30,730 have multiple virtual network connections elsewhere 215 00:10:30,730 --> 00:10:33,670 or wireless connections, any number of possible connections 216 00:10:33,670 --> 00:10:35,210 might they have to other routers. 217 00:10:35,210 --> 00:10:39,070 And so it can route it to its next hop, so to speak. 218 00:10:39,070 --> 00:10:41,890 And generally on the internet, within 30 hops, 219 00:10:41,890 --> 00:10:45,850 within 30 transmissions from router to router to router, 220 00:10:45,850 --> 00:10:48,190 will your data get from one point to another. 221 00:10:48,190 --> 00:10:50,340 And it might not follow the same path each time 222 00:10:50,340 --> 00:10:52,690 but it will traverse this so-called internet. 223 00:10:52,690 --> 00:10:54,760 And so that's kind of what the internet is. 224 00:10:54,760 --> 00:10:57,490 It's this collection of routers and it's this collection 225 00:10:57,490 --> 00:11:01,450 of networks, a network of networks that is incredibly 226 00:11:01,450 --> 00:11:03,940 interconnected in different ways. 227 00:11:03,940 --> 00:11:05,840 So DHCP gives me an IP address. 
228 00:11:05,840 --> 00:11:07,220 So I have a unique IP address. 229 00:11:07,220 --> 00:11:10,270 DHCP, it turns out, also tells me what the IP address 230 00:11:10,270 --> 00:11:14,950 is of my local DNS server so I know whom to ask to convert domain names to IP 231 00:11:14,950 --> 00:11:15,520 addresses. 232 00:11:15,520 --> 00:11:18,640 But once I have that, I can now use a protocol 233 00:11:18,640 --> 00:11:27,000 called TCP to send my data reliably, typically, from one point to another. 234 00:11:27,000 --> 00:11:29,587 So whereas IP is responsible for a few things, 235 00:11:29,587 --> 00:11:31,670 one of its most important functions is this notion 236 00:11:31,670 --> 00:11:34,610 of addressing and standardizing how things are addressed. 237 00:11:34,610 --> 00:11:37,190 But TCP, one of its most salient features 238 00:11:37,190 --> 00:11:40,640 is to guarantee, with high probability, delivery. 239 00:11:40,640 --> 00:11:42,799 And what I mean by that is that bad stuff can 240 00:11:42,799 --> 00:11:44,340 happen in the middle of the internet. 241 00:11:44,340 --> 00:11:45,756 These routers can get really busy. 242 00:11:45,756 --> 00:11:48,020 They can get really congested and overloaded. 243 00:11:48,020 --> 00:11:49,360 And so routers might-- 244 00:11:49,360 --> 00:11:52,214 well, virtually drop packets. 245 00:11:52,214 --> 00:11:54,380 They might receive so many packets at once they just 246 00:11:54,380 --> 00:11:57,290 can't, like a human, deal with it all at one time because they 247 00:11:57,290 --> 00:12:01,040 have a finite amount of memory or RAM or disk space and so they drop them, 248 00:12:01,040 --> 00:12:01,610 so to speak. 249 00:12:01,610 --> 00:12:04,750 They just delete them and they expect the senders to resend them. 
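The drop-and-resend behavior can be sketched with a small simulation. This is a deliberately simplified model, not how TCP is actually implemented: real TCP relies on sequence numbers, acknowledgments, and timers, whereas here the sender simply retries each packet until the channel reports success. The names `send_reliably` and `make_lossy_router` are hypothetical, and the drop pattern is fixed in advance so the run is repeatable.

```python
def make_lossy_router(drops: dict):
    """Build a toy channel that drops a packet a fixed number of times.

    `drops` maps a packet's position to how many attempts should be lost
    before one gets through. Returns the delivery function and the list
    that collects whatever actually arrives.
    """
    remaining = dict(drops)
    received = []

    def deliver(seq: int, data: str) -> bool:
        if remaining.get(seq, 0) > 0:
            remaining[seq] -= 1      # router is congested: the packet is dropped
            return False
        received.append(data)        # packet made it to the destination
        return True

    return deliver, received


def send_reliably(packets, deliver, max_attempts: int = 5):
    """Send each packet, retransmitting until it is delivered.

    Returns the number of attempts each packet needed; raises RuntimeError
    if a packet is still undelivered after `max_attempts` tries.
    """
    attempts_needed = []
    for seq, data in enumerate(packets):
        for attempt in range(1, max_attempts + 1):
            if deliver(seq, data):
                attempts_needed.append(attempt)
                break
        else:
            raise RuntimeError(f"packet {seq} undelivered after {max_attempts} attempts")
    return attempts_needed


if __name__ == "__main__":
    deliver, received = make_lossy_router({1: 2})   # second packet is dropped twice
    print(send_reliably(["a", "b", "c"], deliver))
    print(received)
```

Despite the two drops, everything arrives, in order, at the cost of extra transmissions.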
250 00:12:04,750 --> 00:12:08,570 TCP is a protocol, another agreement between computers, 251 00:12:08,570 --> 00:12:12,740 that if the receiving computer realizes, hmm, I got some of your packets 252 00:12:12,740 --> 00:12:16,940 but not all of them, TCP mandates, much like our human handshake, 253 00:12:16,940 --> 00:12:18,440 that something next should happen. 254 00:12:18,440 --> 00:12:23,600 TCP says, my laptop should retransmit that virtual envelope. 255 00:12:23,600 --> 00:12:26,510 But TCP allows us to do something more than guarantee 256 00:12:26,510 --> 00:12:28,400 with high probability delivery of data. 257 00:12:28,400 --> 00:12:32,750 It also allows us to multiplex among services, or put more simply, 258 00:12:32,750 --> 00:12:35,724 it allows a server to receive different types of data 259 00:12:35,724 --> 00:12:37,640 for different types of services, for instance, 260 00:12:37,640 --> 00:12:42,450 web services on the server, email services, chat services and the like. 261 00:12:42,450 --> 00:12:47,330 And so it turns out that on this virtual envelope that gets sent from a computer 262 00:12:47,330 --> 00:12:49,760 to a server, it's actually not sufficient 263 00:12:49,760 --> 00:12:54,110 for there to be the return address and the IP address of the destination. 264 00:12:54,110 --> 00:12:56,870 I also need to specify what type of information 265 00:12:56,870 --> 00:13:00,140 is inside this envelope, or equivalently, what kind of service 266 00:13:00,140 --> 00:13:01,430 I'm trying to contact. 267 00:13:01,430 --> 00:13:05,030 And I could do this by specifying in words what's inside this envelope. 268 00:13:05,030 --> 00:13:07,305 Maybe it's something like HTTP, the prefix 269 00:13:07,305 --> 00:13:08,930 that you're familiar with from the web. 270 00:13:08,930 --> 00:13:09,650 Maybe it's an email. 271 00:13:09,650 --> 00:13:11,233 Maybe it's a chat message or the like. 
272 00:13:11,233 --> 00:13:13,730 But if it is, in fact, something like HTTP, 273 00:13:13,730 --> 00:13:17,720 turns out the convention is not to use words but to use numbers. 274 00:13:17,720 --> 00:13:21,290 And so in fact, I need to put one other piece of information 275 00:13:21,290 --> 00:13:25,160 on this envelope, which is a so-called port number, a TCP port 276 00:13:25,160 --> 00:13:29,740 number, which is numerically printed after a colon on a virtual envelope like 277 00:13:29,740 --> 00:13:30,240 this. 278 00:13:30,240 --> 00:13:32,810 And in this case I wrote 80 because 80 happens to be, 279 00:13:32,810 --> 00:13:36,410 by human convention, the number we humans agreed on some years ago, 280 00:13:36,410 --> 00:13:39,680 identifies web services on servers. 281 00:13:39,680 --> 00:13:42,380 But this means that if the server I'm sending this to, 282 00:13:42,380 --> 00:13:46,070 1.2.3.4, actually has other services on it like a chat server 283 00:13:46,070 --> 00:13:49,520 and an email server and the like, this won't get confused with an email 284 00:13:49,520 --> 00:13:52,361 that I or someone else am sending to the server or a chat message. 285 00:13:52,361 --> 00:13:54,110 The server will know upon receipt of this, 286 00:13:54,110 --> 00:13:56,450 oh, this is a request for a web page. 287 00:13:56,450 --> 00:14:00,060 Let me send this virtual envelope to the web server. 288 00:14:00,060 --> 00:14:02,690 But TCP isn't the only such protocol. 289 00:14:02,690 --> 00:14:07,010 There's something called UDP, which is common in some circles as well. 290 00:14:07,010 --> 00:14:10,400 UDP works a little differently, in so far 291 00:14:10,400 --> 00:14:14,750 as its feature is to not guarantee delivery. 
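The port-number convention described above for TCP can be sketched as a lookup the receiving server performs. This is a minimal illustration, not a real server: `SERVICES_BY_PORT` and `dispatch` are hypothetical names, port 80 for web traffic comes from the lecture, and port 25 for email (SMTP) is added here as an assumption for the sake of a second entry.

```python
# Which service on the machine should receive an envelope, keyed by port number.
SERVICES_BY_PORT = {
    80: "web server",     # from the lecture: 80 identifies web services
    25: "email server",   # assumption added for illustration: 25 is the usual SMTP port
}


def dispatch(destination: str) -> tuple:
    """Split 'address:port' and decide which local service gets the envelope.

    Returns (address, service name). Raises KeyError if nothing is
    registered for that port in this toy table.
    """
    host, _, port_text = destination.rpartition(":")
    port = int(port_text)
    return host, SERVICES_BY_PORT[port]


if __name__ == "__main__":
    print(dispatch("1.2.3.4:80"))
    print(dispatch("1.2.3.4:25"))
```

Because the number after the colon differs, two envelopes addressed to the same machine end up at different programs on it.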
292 00:14:14,750 --> 00:14:18,480 If some data gets lost, packets get dropped, so to speak, 293 00:14:18,480 --> 00:14:22,100 for whatever reasons, malfunction, technical difficulties, 294 00:14:22,100 --> 00:14:25,250 routers are overloaded, UDP says, our protocol 295 00:14:25,250 --> 00:14:28,350 shall be not to retransmit that data. 296 00:14:28,350 --> 00:14:30,980 And that's a strange thing, because it sounds worse. 297 00:14:30,980 --> 00:14:34,700 And yet, this protocol's been around for quite some time, still used, 298 00:14:34,700 --> 00:14:38,020 quite appropriate in some contexts. 299 00:14:38,020 --> 00:14:42,120 But in what context would you actually want to just forge ahead, irrespective 300 00:14:42,120 --> 00:14:44,740 of getting complete information? 301 00:14:44,740 --> 00:14:49,320 Well, a go-to here is something like videoconferencing or audio conferencing 302 00:14:49,320 --> 00:14:52,440 or live TV on the internet, watching a game like a football game, 303 00:14:52,440 --> 00:14:53,200 for instance. 304 00:14:53,200 --> 00:14:55,890 If you want to watch it in real time, you 305 00:14:55,890 --> 00:14:59,130 might prefer that the stream, the bits that 306 00:14:59,130 --> 00:15:02,970 are coming from the NFL or wherever to your computer don't actually buffer, 307 00:15:02,970 --> 00:15:04,110 don't actually stall. 308 00:15:04,110 --> 00:15:06,840 You would rather miss a second so that at least you stay 309 00:15:06,840 --> 00:15:10,830 current in real time with that game, or video conferencing even more so. 310 00:15:10,830 --> 00:15:14,010 It'd kind of be annoying if you have a bad connection or some packets 311 00:15:14,010 --> 00:15:17,220 get dropped and you just have to wait and wait for the person's voice 312 00:15:17,220 --> 00:15:18,780 or image to be retransmitted. 313 00:15:18,780 --> 00:15:20,790 You'd rather just say, what did you say? 314 00:15:20,790 --> 00:15:22,260 Could you repeat yourself? 
315 00:15:22,260 --> 00:15:23,190 Say again? 316 00:15:23,190 --> 00:15:26,050 You can just use human protocols to deal with that, too. 317 00:15:26,050 --> 00:15:30,321 So sometimes you want live streaming applications for whatever purpose 318 00:15:30,321 --> 00:15:32,070 and you want the data just to keep coming. 319 00:15:32,070 --> 00:15:34,920 As much of it as can make it through is great. 320 00:15:34,920 --> 00:15:38,380 But you don't necessarily want it to be resent. 321 00:15:38,380 --> 00:15:40,787 So data is going from one point to another, 322 00:15:40,787 --> 00:15:42,120 but how long does all this take? 323 00:15:42,120 --> 00:15:45,150 My god, this is kind of a long story just to get data there. 324 00:15:45,150 --> 00:15:46,770 Well, let's do an experiment. 325 00:15:46,770 --> 00:15:50,740 Let's go ahead and pull up a program that uses a different protocol 326 00:15:50,740 --> 00:15:51,762 altogether, ICMP. 327 00:15:51,762 --> 00:15:53,220 And there's other protocols, still. 328 00:15:53,220 --> 00:15:55,011 This one's a little more technical but it's 329 00:15:55,011 --> 00:15:57,870 wonderfully revealing in a few ways. 330 00:15:57,870 --> 00:16:00,270 I'm on my Mac here in the so-called terminal window 331 00:16:00,270 --> 00:16:02,970 that you can pull up something similar on Windows and other operating systems 332 00:16:02,970 --> 00:16:03,670 as well. 333 00:16:03,670 --> 00:16:05,461 And what I'm going to do is literally trace 334 00:16:05,461 --> 00:16:08,260 the route between my laptop here and some foreign server, 335 00:16:08,260 --> 00:16:12,120 for instance, one on the west coast of the US, Berkeley's web server. 336 00:16:12,120 --> 00:16:18,750 So let me do that, traceroute, www.berkeley.edu, enter. 337 00:16:18,750 --> 00:16:22,920 And curiously, we start to see a whole bunch of lines of output, most of them 338 00:16:22,920 --> 00:16:23,670 numerical. 
339 00:16:23,670 --> 00:16:26,590 And indeed, notice that each of these is an IP address. 340 00:16:26,590 --> 00:16:28,920 But what is it an IP address of? 341 00:16:28,920 --> 00:16:32,370 Well, we have 18 of these between me and Berkeley, apparently. 342 00:16:32,370 --> 00:16:36,532 Turns out those represent routers between me and Berkeley, California. 343 00:16:36,532 --> 00:16:38,490 Each of them has an IP address and each of them 344 00:16:38,490 --> 00:16:42,420 has a measurement of how long it took my data to get from my Mac to that router. 345 00:16:42,420 --> 00:16:43,344 It's highly variable. 346 00:16:43,344 --> 00:16:45,010 Notice, it's kind of all over the place. 347 00:16:45,010 --> 00:16:46,176 In fact, this is just weird. 348 00:16:46,176 --> 00:16:48,760 This took 3,000 milliseconds or three seconds, 349 00:16:48,760 --> 00:16:51,840 so I'm guessing that that router in row eight 350 00:16:51,840 --> 00:16:55,526 was congested for some reason, some kind of network issue there temporarily, 351 00:16:55,526 --> 00:16:57,150 but then my data actually went through. 352 00:16:57,150 --> 00:16:58,250 And it's not cumulative. 353 00:16:58,250 --> 00:17:01,230 These are individual tests from my Mac to each of these routers 354 00:17:01,230 --> 00:17:02,720 iteratively, one at a time. 355 00:17:02,720 --> 00:17:04,470 And you can kind of get an aggregate sense 356 00:17:04,470 --> 00:17:07,980 of how long it takes, therefore, for data to get from the east coast 357 00:17:07,980 --> 00:17:08,730 to the west coast. 358 00:17:08,730 --> 00:17:11,438 If we look at some of the later numbers, they're kind of variable 359 00:17:11,438 --> 00:17:14,369 but they seem to be around 75 milliseconds. 360 00:17:14,369 --> 00:17:15,750 So this is kind of extraordinary. 361 00:17:15,750 --> 00:17:19,510 If you want to fly from Boston, Massachusetts to San Francisco, 362 00:17:19,510 --> 00:17:21,569 it's going to take you five, six, seven hours. 
363 00:17:21,569 --> 00:17:23,940 You want to send an email or send a packet, 364 00:17:23,940 --> 00:17:26,119 it's going to take you 75 milliseconds. 365 00:17:26,119 --> 00:17:29,345 That's astonishing, how quickly the data can transmit. 366 00:17:29,345 --> 00:17:31,220 Now, notice this is not all that enlightening 367 00:17:31,220 --> 00:17:32,842 knowing these IP addresses. 368 00:17:32,842 --> 00:17:34,800 But eventually, some of them have domain names, 369 00:17:34,800 --> 00:17:37,770 just because the humans controlling those routers decided, 370 00:17:37,770 --> 00:17:41,220 we're going to give these routers actual names, domain names, 371 00:17:41,220 --> 00:17:43,650 as opposed to just having IP addresses. 372 00:17:43,650 --> 00:17:47,760 And you can often, but not always, infer from the domain names where they are. 373 00:17:47,760 --> 00:17:51,450 So I'm going to guess that at least row 11 374 00:17:51,450 --> 00:17:58,470 here, I don't know what XE7000.rtsw is, but losa.net, Los 375 00:17:58,470 --> 00:18:00,060 Angeles in California. 376 00:18:00,060 --> 00:18:03,330 I'm guessing my data kind of came into Southern California first. 377 00:18:03,330 --> 00:18:04,810 But then notice what happens next. 378 00:18:04,810 --> 00:18:07,950 A couple of nameless servers, LAX, so maybe that's the airport. 379 00:18:07,950 --> 00:18:10,260 Indeed, routers, for historical reasons, tend 380 00:18:10,260 --> 00:18:13,060 to be named after nearby airport codes. 381 00:18:13,060 --> 00:18:15,120 I'm not sure what this next one is here but I do 382 00:18:15,120 --> 00:18:18,510 recognize Oakland and UCB, UC Berkeley. 383 00:18:18,510 --> 00:18:23,350 So I'm guessing one of the next routers is actually in Oakland or near Oakland. 384 00:18:23,350 --> 00:18:26,280 And so that's a pretty long cable or interconnection essentially 385 00:18:26,280 --> 00:18:28,110 between LA and Berkeley. 
386 00:18:28,110 --> 00:18:30,750 But the result, ultimately, is that my data makes its way 387 00:18:30,750 --> 00:18:33,510 to Berkeley, this time via this path. 388 00:18:33,510 --> 00:18:35,750 If I ran it again now or in a day or a week, 389 00:18:35,750 --> 00:18:37,500 the path might be a little different based 390 00:18:37,500 --> 00:18:43,230 on congestion and interconnectivity, but the data actually gets there. 391 00:18:43,230 --> 00:18:46,940 And cutely enough, it looks like Berkeley's web server is called Cal web 392 00:18:46,940 --> 00:18:49,200 farm prod-- for production-- 393 00:18:49,200 --> 00:18:51,590 ist.berkeley.edu. 394 00:18:51,590 --> 00:18:53,520 75 milliseconds only. 395 00:18:53,520 --> 00:18:55,950 But what about this, what if we don't stop at the edge 396 00:18:55,950 --> 00:19:01,412 as we do at the edge of this continent but keep going? 397 00:19:01,412 --> 00:19:02,370 What's going to happen? 398 00:19:02,370 --> 00:19:07,410 Well, let me try to trace the route to, say, www.cnn.co.jp, 399 00:19:07,410 --> 00:19:10,170 the domain name for what I presume is going 400 00:19:10,170 --> 00:19:14,900 to be the Japanese version of CNN's web site in Japan. 401 00:19:14,900 --> 00:19:17,840 Here, too, we have a bunch of nameless servers just with IP addresses. 402 00:19:17,840 --> 00:19:19,214 Gets through them pretty quickly. 403 00:19:19,214 --> 00:19:21,020 We seem to have some lulls sometimes. 404 00:19:21,020 --> 00:19:24,140 This program won't-- sometimes the routers won't respond to these queries 405 00:19:24,140 --> 00:19:26,120 so they remain, essentially, anonymous. 406 00:19:26,120 --> 00:19:28,560 But now this is quite interesting. 407 00:19:28,560 --> 00:19:29,570 Oh, my god. 
408 00:19:29,570 --> 00:19:36,470 We went from routers 12, 13, 14, 15 taking about 63 milliseconds, 409 00:19:36,470 --> 00:19:40,400 give or take, to 193 milliseconds, which isn't a blip because it 410 00:19:40,400 --> 00:19:42,650 stays around that value, 180 milliseconds, 411 00:19:42,650 --> 00:19:46,110 160 milliseconds, 177 milliseconds. 412 00:19:46,110 --> 00:19:52,460 That's a big jump of 100-some milliseconds just between routers 15 413 00:19:52,460 --> 00:19:54,575 and 16. 414 00:19:54,575 --> 00:19:56,360 Why might that be? 415 00:19:56,360 --> 00:20:00,850 What could be between routers 15 and 16? 416 00:20:00,850 --> 00:20:04,834 Well, if you know your geography, it might very well be the Pacific Ocean. 417 00:20:04,834 --> 00:20:07,000 There's quite a bit of distance, there's quite a bit 418 00:20:07,000 --> 00:20:10,660 of cabling that actually connects the west coast of the country to Japan 419 00:20:10,660 --> 00:20:14,240 and other areas in Asia and beyond, and that's what's pretty amazing. 420 00:20:14,240 --> 00:20:17,440 Not only is there interconnectivity on the internet these days via cabling 421 00:20:17,440 --> 00:20:21,010 and via Wi-Fi signals and via satellite signals, 422 00:20:21,010 --> 00:20:24,280 via microwave signals and the like, you have so many different ways for data 423 00:20:24,280 --> 00:20:25,160 to be transmitted. 424 00:20:25,160 --> 00:20:28,060 And it's absolutely astonishing and exciting, dare I say, 425 00:20:28,060 --> 00:20:30,072 just how interconnected the world now is. 426 00:20:30,072 --> 00:20:32,530 In fact, thanks to this animation online, let's take a look 427 00:20:32,530 --> 00:20:36,776 and appreciate just how extensive this network actually is. 428 00:20:36,776 --> 00:20:39,752 [MUSIC PLAYING] 429 00:20:39,752 --> 00:21:37,330 430 00:21:37,330 --> 00:21:40,607 All right, so let's actually solve a problem now with this internet.
431 00:21:40,607 --> 00:21:43,440 All right, the internet, as you probably heard, is filled with cats. 432 00:21:43,440 --> 00:21:45,810 And yet, these cat images can be pretty big. 433 00:21:45,810 --> 00:21:47,790 And indeed, bigger, still, than images are 434 00:21:47,790 --> 00:21:50,230 things like video files from Netflix and the like. 435 00:21:50,230 --> 00:21:53,070 And so there's huge amounts of traffic transmitting 436 00:21:53,070 --> 00:21:54,940 over those kinds of interconnections. 437 00:21:54,940 --> 00:21:57,522 So how do we ensure, at least with high probability, 438 00:21:57,522 --> 00:21:58,980 that data can actually get through? 439 00:21:58,980 --> 00:22:03,340 How can we ensure that there's some form of fairness, if not net neutrality, 440 00:22:03,340 --> 00:22:06,450 so that my data can get to its destination 441 00:22:06,450 --> 00:22:08,400 just as readily as your data can get there? 442 00:22:08,400 --> 00:22:10,710 Well, sometimes it's opportune to actually take 443 00:22:10,710 --> 00:22:13,590 big packets of information and chop them up. 444 00:22:13,590 --> 00:22:17,400 So indeed, what a computer will often do, thanks to TCP/IP, 445 00:22:17,400 --> 00:22:19,320 the combination of these protocols, is we'll 446 00:22:19,320 --> 00:22:22,180 take large files and large images, in this case, 447 00:22:22,180 --> 00:22:24,660 tear them up into, say, roughly-- 448 00:22:24,660 --> 00:22:29,610 oops-- equal sized parts like this here and then tear it down even further, 449 00:22:29,610 --> 00:22:32,940 perhaps, to get it into a smaller byte-sized piece 450 00:22:32,940 --> 00:22:37,170 and then send not only one packet of information over the internet. 451 00:22:37,170 --> 00:22:42,990 But instead, put one piece of information in this packet here. 452 00:22:42,990 --> 00:22:46,770 Put one other piece of information in this packet here, 453 00:22:46,770 --> 00:22:49,746 whose addressing, both to and from, is identical. 
454 00:22:49,746 --> 00:22:51,870 And then do the same thing for the two other pieces 455 00:22:51,870 --> 00:22:55,290 so that ultimately we have four packets, each of which 456 00:22:55,290 --> 00:22:58,100 contains one portion, one quarter, in this case, 457 00:22:58,100 --> 00:23:03,250 of the resulting message, all of which are destined for the same destination. 458 00:23:03,250 --> 00:23:06,970 But the problem to be solved, now, is what do you do with this information? 459 00:23:06,970 --> 00:23:09,150 If I have four seemingly identical envelopes 460 00:23:09,150 --> 00:23:11,880 but inside of which are disparate pieces of information 461 00:23:11,880 --> 00:23:13,950 that somehow need to be reassembled-- 462 00:23:13,950 --> 00:23:16,950 let's put on our proverbial engineering hats-- 463 00:23:16,950 --> 00:23:18,210 how do you solve this problem? 464 00:23:18,210 --> 00:23:21,209 Is this sufficient information on the envelopes so 465 00:23:21,209 --> 00:23:24,000 that if I send this out on the internet toward Berkeley or Stanford 466 00:23:24,000 --> 00:23:29,190 or Facebook or wherever, how does that recipient know what to do with it? 467 00:23:29,190 --> 00:23:31,110 What would you, the human, do if you have 468 00:23:31,110 --> 00:23:33,900 not virtual but physical envelopes? 469 00:23:33,900 --> 00:23:35,910 Well, here, too, and here's an opportunity 470 00:23:35,910 --> 00:23:39,000 really to bring to bear human intuition to a problem that 471 00:23:39,000 --> 00:23:43,500 seems fairly technical and well beyond one's own technical understanding. 472 00:23:43,500 --> 00:23:47,220 And yet, it really is just a technical manifestation of a real world problem. 473 00:23:47,220 --> 00:23:49,260 I need to keep these in order somehow. 474 00:23:49,260 --> 00:23:50,100 So you know what? 475 00:23:50,100 --> 00:23:55,630 I'm going to say something like one of four on the first one, like this. 
476 00:23:55,630 --> 00:23:59,700 The next one, I'm going to say two of four on the next one, like this. 477 00:23:59,700 --> 00:24:03,210 And then I'm going to say three of four and then on the next one 478 00:24:03,210 --> 00:24:06,150 here, I'm going to put four of four. 479 00:24:06,150 --> 00:24:07,380 And what's the takeaway, now? 480 00:24:07,380 --> 00:24:11,010 Now, whoever is the recipient of these several envelopes 481 00:24:11,010 --> 00:24:13,110 as I send them out on the internet-- and indeed, 482 00:24:13,110 --> 00:24:14,776 they don't have to follow the same path. 483 00:24:14,776 --> 00:24:15,690 One can go this way. 484 00:24:15,690 --> 00:24:17,031 One can be routed that way. 485 00:24:17,031 --> 00:24:18,280 Another can go to this router. 486 00:24:18,280 --> 00:24:19,530 Another can go to that router. 487 00:24:19,530 --> 00:24:22,113 Because they're all addressed and because all of these routers 488 00:24:22,113 --> 00:24:24,360 are somehow interconnected, all four of those packets 489 00:24:24,360 --> 00:24:27,490 will hopefully get to their destination. 490 00:24:27,490 --> 00:24:29,880 But if they don't, the recipient can look 491 00:24:29,880 --> 00:24:33,686 at that additional detail I wrote on the envelope and see, oh, I got part one. 492 00:24:33,686 --> 00:24:34,310 I got part two. 493 00:24:34,310 --> 00:24:35,018 I got part three. 494 00:24:35,018 --> 00:24:37,290 But where is part four of four? 495 00:24:37,290 --> 00:24:38,970 It didn't arrive because of congestion. 496 00:24:38,970 --> 00:24:41,480 Literally got dropped on the floor or not picked up. 497 00:24:41,480 --> 00:24:43,710 So the computer that's supposed to be receiving 498 00:24:43,710 --> 00:24:47,850 that data, thanks to TCP, recall, can say, hey, please send me again 499 00:24:47,850 --> 00:24:49,650 packet four of four.
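The envelope-numbering idea above can be sketched in a few lines of Python. This is only an illustration of the lecture's analogy, not how TCP is actually implemented: real TCP numbers bytes with sequence numbers rather than writing "one of four" on each segment. The function names and the dictionary layout below are hypothetical, chosen for this sketch.

```python
import random


def split_into_packets(data: bytes, count: int) -> list[dict]:
    """Chop data into `count` roughly equal pieces, each labeled 'seq of total'."""
    size = -(-len(data) // count)  # ceiling division so no bytes are left over
    return [
        {"seq": i + 1, "total": count, "payload": data[i * size:(i + 1) * size]}
        for i in range(count)
    ]


def missing_packets(received: list[dict]) -> list[int]:
    """Return the sequence numbers the recipient should ask for again."""
    if not received:
        return []  # nothing arrived, so the total is unknown in this toy model
    total = received[0]["total"]
    seen = {packet["seq"] for packet in received}
    return [n for n in range(1, total + 1) if n not in seen]


def reassemble(received: list[dict]) -> bytes:
    """Put the pieces back in order, regardless of the order they arrived in."""
    gaps = missing_packets(received)
    if gaps:
        raise ValueError(f"cannot reassemble, still missing packets {gaps}")
    ordered = sorted(received, key=lambda packet: packet["seq"])
    return b"".join(packet["payload"] for packet in ordered)


if __name__ == "__main__":
    cat = b"pretend these bytes are a cat picture..."
    packets = split_into_packets(cat, 4)
    random.shuffle(packets)  # packets may take different routes and arrive out of order
    arrived = [p for p in packets if p["seq"] != 4]  # packet four of four got dropped
    print("please resend:", missing_packets(arrived))
    print("intact after resend:", reassemble(packets) == cat)
```

The recipient never needs to know which route each piece took; the label on the envelope is enough both to restore the order and to notice a gap.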
500 00:24:49,650 --> 00:24:52,080 And so as technical as the internet might seem, 501 00:24:52,080 --> 00:24:55,860 it really, again, is just some fairly intuitive solutions 502 00:24:55,860 --> 00:24:59,250 to problems like this, albeit translated to more technical contexts, more 503 00:24:59,250 --> 00:25:02,310 technical protocols, and more technical languages. 504 00:25:02,310 --> 00:25:05,670 But let's look at some more user-facing protocols. 505 00:25:05,670 --> 00:25:08,760 The ones we've discussed thus far are fairly low level, if you will. 506 00:25:08,760 --> 00:25:11,010 And indeed, there's this whole internet hierarchy 507 00:25:11,010 --> 00:25:13,710 of protocols layered on protocols layered on protocols 508 00:25:13,710 --> 00:25:17,700 so that what we humans really tend to care about, if we're not the engineers 509 00:25:17,700 --> 00:25:21,210 but we're really the software developers and we're the users of applications, 510 00:25:21,210 --> 00:25:24,210 we care about application layer protocols that 511 00:25:24,210 --> 00:25:28,050 sit right between the human and all of those lower level protocols. 512 00:25:28,050 --> 00:25:32,037 For instance, these, at least one of which has got to jump out at you, HTTP. 513 00:25:32,037 --> 00:25:33,120 Odds are you've seen this. 514 00:25:33,120 --> 00:25:34,900 Odds are you've typed this, though decreasingly 515 00:25:34,900 --> 00:25:37,858 do you have to still type it because browsers will just add it for you, 516 00:25:37,858 --> 00:25:38,550 HTTP. 517 00:25:38,550 --> 00:25:43,460 The secure or encrypted version, HTTPS, IMAP for email inbound, 518 00:25:43,460 --> 00:25:50,040 SMTP for email outbound, SFTP for Secure File Transfer, SSH for Secure Shell, 519 00:25:50,040 --> 00:25:54,090 an encrypted textual channel between two computers, and many more. 520 00:25:54,090 --> 00:25:59,430 But HTTP, let's focus on that one because that is Hypertext Transfer 521 00:25:59,430 --> 00:26:00,810 Protocol.
522 00:26:00,810 --> 00:26:04,860 Or HTTPS, the same but the S stands for-- 523 00:26:04,860 --> 00:26:09,670 not savings-- secure, so it's actually encrypted in this case. 524 00:26:09,670 --> 00:26:11,590 So what does this actually mean? 525 00:26:11,590 --> 00:26:14,580 Well, at the end of the day, HTTP is a protocol 526 00:26:14,580 --> 00:26:19,770 that governs what kinds of messages go inside of those envelopes 527 00:26:19,770 --> 00:26:22,500 that I've been preparing for the internet, what kinds of messages 528 00:26:22,500 --> 00:26:24,420 go inside of those envelopes. 529 00:26:24,420 --> 00:26:28,620 And it turns out the simplest message that a computer sends 530 00:26:28,620 --> 00:26:33,230 through this whole internet, ultimately, inside of a virtual envelope 531 00:26:33,230 --> 00:26:38,330 is quite often, thanks to HTTP, inside of this virtual envelope, 532 00:26:38,330 --> 00:26:40,490 if I'm trying to request a cat from the internet, 533 00:26:40,490 --> 00:26:43,490 might literally be a message like this, get me, 534 00:26:43,490 --> 00:26:49,451 for instance, slash cat.jpg for JPEG. 535 00:26:49,451 --> 00:26:51,200 And maybe some additional text after that, 536 00:26:51,200 --> 00:26:54,020 maybe some additional text below that, but at the end of the day inside 537 00:26:54,020 --> 00:26:57,186 the virtual envelope, if I am on the internet and I'm going on Google Images 538 00:26:57,186 --> 00:27:00,260 and I want to find a picture of a cat, inside of my envelope, 539 00:27:00,260 --> 00:27:05,240 if I am a web browser speaking HTTP, is going to literally be a textual message 540 00:27:05,240 --> 00:27:11,210 that says GET /cat.jpg, if I know that's where the image is on some server.
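To make "a textual message inside the envelope" concrete, here is a minimal Python sketch that builds such a request as plain text. The host name `example.com` and the helper name `build_get_request` are assumptions made for illustration; the lecture only names the path `/cat.jpg`. Nothing is sent over the network here.

```python
def build_get_request(host: str, path: str) -> str:
    """Return the text an HTTP/1.1 client would put 'inside the envelope'."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # which site on that server we want
        "Connection: close",     # ask the server to hang up after replying
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)    # HTTP separates lines with carriage return + line feed


if __name__ == "__main__":
    # example.com is a placeholder host, not one used in the lecture
    print(build_get_request("example.com", "/cat.jpg"))
```

The "additional text below" that the lecture mentions corresponds to the header lines after the first one; only the request line carries the verb and the path.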
541 00:27:11,210 --> 00:27:13,670 The response is going to be what was just inside of those 542 00:27:13,670 --> 00:27:15,859 four envelopes back from the server to me, 543 00:27:15,859 --> 00:27:18,650 chopped up maybe into multiple pieces but in a way where I can then 544 00:27:18,650 --> 00:27:21,108 realize, oh, wait a minute, you sent me only three of four. 545 00:27:21,108 --> 00:27:22,460 Please send me the fourth one. 546 00:27:22,460 --> 00:27:25,210 So it works in both ways, whether it's me sending a cat to someone 547 00:27:25,210 --> 00:27:27,080 or receiving a cat from someone. 548 00:27:27,080 --> 00:27:31,850 This protocol, HTTP, governs how the messages are formatted 549 00:27:31,850 --> 00:27:35,990 and what language, so to speak, is spoken between web browser and server. 550 00:27:35,990 --> 00:27:39,830 So indeed, HTTP is entirely about having a web 551 00:27:39,830 --> 00:27:42,650 browser communicate with a server. 552 00:27:42,650 --> 00:27:44,427 And we can see this in action. 553 00:27:44,427 --> 00:27:47,260 I'm going to go ahead and pull up a so-called terminal window again, 554 00:27:47,260 --> 00:27:49,580 this textual command prompt on my computer. 555 00:27:49,580 --> 00:27:52,257 And I'm going to pretend to be a browser. 556 00:27:52,257 --> 00:27:54,590 So I'm not going to just trace the route between point A 557 00:27:54,590 --> 00:27:57,890 and point B. I'm actually going to request a web 558 00:27:57,890 --> 00:28:00,980 page as though I am Chrome or Edge or Firefox or Safari 559 00:28:00,980 --> 00:28:03,020 or whatever your favorite browser is. 560 00:28:03,020 --> 00:28:05,840 But of course, as before, all I know is that I 561 00:28:05,840 --> 00:28:10,030 want to visit my favorite web site, Facebook.com, for instance. 562 00:28:10,030 --> 00:28:12,260 But I don't know its IP address necessarily, 563 00:28:12,260 --> 00:28:14,100 so let's go through that step. 564 00:28:14,100 --> 00:28:15,650 How do I look up its IP address?
565 00:28:15,650 --> 00:28:19,520 Well, my Mac already has an IP address because of DHCP. 566 00:28:19,520 --> 00:28:20,660 I'm already powered up. 567 00:28:20,660 --> 00:28:22,820 I'm already connected to the Wi-Fi here on campus, 568 00:28:22,820 --> 00:28:25,490 and so I already have my own IP address, and I also 569 00:28:25,490 --> 00:28:27,290 have the IP address of a DNS server. 570 00:28:27,290 --> 00:28:29,040 So my Mac just knows that. 571 00:28:29,040 --> 00:28:34,501 But I can use that capability now to look up the IP address 572 00:28:34,501 --> 00:28:37,000 for the name, Facebook, and I'm going to do that as follows, 573 00:28:37,000 --> 00:28:41,000 nslookup, for name server lookup. 574 00:28:41,000 --> 00:28:46,820 And I'm going to go ahead and type in www.facebook.com, enter. 575 00:28:46,820 --> 00:28:49,640 And interestingly, we get back this somewhat cryptic response 576 00:28:49,640 --> 00:28:50,990 but let's make some sense of it. 577 00:28:50,990 --> 00:28:54,915 So it looks like the server that this response came back from is 10.0.0.2, 578 00:28:54,915 --> 00:28:58,040 which happens to be a private IP address here on campus that you might have 579 00:28:58,040 --> 00:29:00,780 in your own company or university or even home network. 580 00:29:00,780 --> 00:29:03,980 Then a non-authoritative answer is this, www.facebook.com, 581 00:29:03,980 --> 00:29:09,980 whose canonical name is, curiously, star-mini.c10r.facebook.com. 582 00:29:09,980 --> 00:29:12,200 Well, it turns out that companies like Facebook 583 00:29:12,200 --> 00:29:14,900 absolutely have many, many, many different web servers, 584 00:29:14,900 --> 00:29:17,270 and they might not necessarily have just one IP address. 585 00:29:17,270 --> 00:29:19,081 But we might just be seeing one IP address 586 00:29:19,081 --> 00:29:21,830 depending on where I am in the world and depending on how Facebook 587 00:29:21,830 --> 00:29:23,930 has configured its infrastructure.
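The alias-then-address shape of that nslookup answer can be sketched as a tiny lookup table in Python. This is a toy model under stated assumptions, not a real DNS client: the table holds only the two records visible in the lecture's output (the canonical name above and the address 31.13.65.36 that the same output reports), and the names `RECORDS` and `resolve` are hypothetical.

```python
# Toy record table: name -> (record type, value).
# "CNAME" means "this name is an alias for another name";
# "A" means "this name has this IPv4 address".
RECORDS = {
    "www.facebook.com": ("CNAME", "star-mini.c10r.facebook.com"),
    "star-mini.c10r.facebook.com": ("A", "31.13.65.36"),
}


def resolve(name: str, records: dict = RECORDS, max_hops: int = 8) -> str:
    """Follow aliases until an address record is found."""
    for _ in range(max_hops):
        kind, value = records[name.lower()]  # KeyError if the name is unknown
        if kind == "A":
            return value
        name = value  # CNAME: look up the canonical name next
    raise RuntimeError("too many aliases, giving up")


if __name__ == "__main__":
    print(resolve("www.facebook.com"))
```

On a machine with network access, Python's standard `socket.gethostbyname("www.facebook.com")` asks the system's configured DNS server for a real answer, which may well differ from the address shown in the lecture.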
588 00:29:23,930 --> 00:29:28,070 The takeaway, then, is that apparently so far as my Mac is concerned, 589 00:29:28,070 --> 00:29:33,020 www.facebook.com is an alias for or a synonym for this 590 00:29:33,020 --> 00:29:36,950 longer less well marketed domain name here. 591 00:29:36,950 --> 00:29:40,340 But what we really care about, if I'm about to pretend to be a browser, 592 00:29:40,340 --> 00:29:41,510 is this IP address. 593 00:29:41,510 --> 00:29:45,890 Facebook's IP address is apparently 31.13.65.36. 594 00:29:45,890 --> 00:29:47,450 And I can see this, in fact. 595 00:29:47,450 --> 00:29:52,560 Let me go into Google Chrome, or any browser for that matter, 596 00:29:52,560 --> 00:30:00,640 and go to http://31.13.65.36, enter. 597 00:30:00,640 --> 00:30:02,880 And voila, I made it to Facebook. 598 00:30:02,880 --> 00:30:04,980 Now of course no one in their right mind is 599 00:30:04,980 --> 00:30:08,866 going to advertise the IP address as 31.13.65.36. 600 00:30:08,866 --> 00:30:09,990 No one would remember that. 601 00:30:09,990 --> 00:30:13,590 We're not in the age of phone numbers on the side of billboards anymore. 602 00:30:13,590 --> 00:30:18,000 Now we have the Domain Name System, or DNS, which does this conversion for us. 603 00:30:18,000 --> 00:30:21,690 But now that I know that IP address, I can use this information 604 00:30:21,690 --> 00:30:24,630 and pretend to be a browser and not just see the response in Chrome 605 00:30:24,630 --> 00:30:27,324 as we just did, but I can see it in my textual window 606 00:30:27,324 --> 00:30:28,740 so I can look inside the envelope. 607 00:30:28,740 --> 00:30:31,500 Indeed, this terminal window is going to let me pretend to-- 608 00:30:31,500 --> 00:30:35,380 well, actually send a message as though I'm a browser pretending to be one. 609 00:30:35,380 --> 00:30:39,150 But it's going to let me see inside of the response that comes back. 610 00:30:39,150 --> 00:30:40,380 Here's what I'm going to do.
611 00:30:40,380 --> 00:30:45,030 I'm going to go ahead and type in cURL dash I, 612 00:30:45,030 --> 00:30:49,710 and I'm going to go ahead and type http:// and then that IP address 613 00:30:49,710 --> 00:30:51,630 and I'm going to hit enter. 614 00:30:51,630 --> 00:30:55,530 And notice, uh-oh, Facebook has moved permanently. 615 00:30:55,530 --> 00:30:56,760 But this is a good thing. 616 00:30:56,760 --> 00:30:58,680 To where has Facebook moved? 617 00:30:58,680 --> 00:31:01,230 Well, apparently we've gotten back a response, 618 00:31:01,230 --> 00:31:12,370 via version 1.1 of HTTP that Facebook, per this status code, so 619 00:31:12,370 --> 00:31:16,880 to speak, has moved permanently. 620 00:31:16,880 --> 00:31:20,870 Has moved permanently, which sounds scary, but where has it moved to? 621 00:31:20,870 --> 00:31:24,620 Oh, they don't want people visiting their IP address, even though it works. 622 00:31:24,620 --> 00:31:28,520 They want to redirect people, so to speak, to their domain name. 623 00:31:28,520 --> 00:31:32,180 So we seem to be kind of in a cyclical situation here where, wait a minute, 624 00:31:32,180 --> 00:31:34,860 I thought I had to convert my domain name to an IP address. 625 00:31:34,860 --> 00:31:37,970 And indeed, I do, but it turns out cURL is pretending 626 00:31:37,970 --> 00:31:40,460 to be a text-based browser here, effectively, 627 00:31:40,460 --> 00:31:43,950 and it is already going to do this DNS lookup for me so this is OK. 628 00:31:43,950 --> 00:31:52,940 I'm going to go ahead now and do cURL dash I, http://www.facebook.com, enter. 629 00:31:52,940 --> 00:31:53,630 Oh, my god. 630 00:31:53,630 --> 00:31:55,930 Facebook moved again. 631 00:31:55,930 --> 00:31:57,740 But where did they move this time? 632 00:31:57,740 --> 00:32:00,010 Well, it seems that Facebook would prefer 633 00:32:00,010 --> 00:32:04,030 that we visit https://www.facebook.com, which 634 00:32:04,030 --> 00:32:05,800 is the secure, the encrypted version.
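The chain of "moved permanently" answers can be replayed offline with a short Python sketch. The canned responses below are assumptions written to mirror the sequence described in the lecture (IP address, then the plain HTTP name, then the HTTPS name); they are not captured from Facebook's servers, and `FAKE_SERVER`, `parse_response_head`, and `follow_redirects` are hypothetical names.

```python
# Canned response heads standing in for what `curl -I` would print.
FAKE_SERVER = {
    "http://31.13.65.36/":
        "HTTP/1.1 301 Moved Permanently\r\nLocation: http://www.facebook.com/\r\n\r\n",
    "http://www.facebook.com/":
        "HTTP/1.1 301 Moved Permanently\r\nLocation: https://www.facebook.com/\r\n\r\n",
    "https://www.facebook.com/":
        "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n",
}

REDIRECT_CODES = {301, 302, 307, 308}


def parse_response_head(raw: str) -> tuple[int, str, dict]:
    """Split a response head into status code, reason phrase, and headers."""
    lines = raw.split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)  # e.g. HTTP/1.1 301 Moved Permanently
    headers = {}
    for line in lines[1:]:
        if not line:  # blank line ends the headers
            break
        key, _, value = line.partition(":")  # split at the first colon only
        headers[key.strip().lower()] = value.strip()
    return int(code), reason, headers


def follow_redirects(url: str, server: dict = FAKE_SERVER, limit: int = 5):
    """Keep asking for whatever the Location header points to."""
    visited = [url]
    for _ in range(limit):
        code, _reason, headers = parse_response_head(server[url])
        if code not in REDIRECT_CODES:
            return code, visited
        url = headers["location"]
        visited.append(url)
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    status, path = follow_redirects("http://31.13.65.36/")
    print(status, " -> ".join(path))
```

A browser does this hopping automatically, which is why the redirects are normally invisible; with curl, the `-L` option asks it to follow them as well.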
635 00:32:05,800 --> 00:32:07,120 OK, I can oblige. 636 00:32:07,120 --> 00:32:08,440 Let's go ahead and do that. 637 00:32:08,440 --> 00:32:15,830 cURL dash I of the HTTPS version, which I've just pasted in, enter, and voila. 638 00:32:15,830 --> 00:32:21,620 Now, this looks overwhelming, but what's really important is this message here. 639 00:32:21,620 --> 00:32:23,867 It turns out everything is OK. 640 00:32:23,867 --> 00:32:25,700 And indeed, what's come back from the server 641 00:32:25,700 --> 00:32:29,240 is a virtual envelope, inside of which is this message here saying, 642 00:32:29,240 --> 00:32:30,410 hey, no big deal. 643 00:32:30,410 --> 00:32:31,340 Everything is OK. 644 00:32:31,340 --> 00:32:34,080 And you never see this number when you visit web pages, 645 00:32:34,080 --> 00:32:36,830 unless you're a software developer and you know what tools to use. 646 00:32:36,830 --> 00:32:40,340 Instead, some of us out there, some of us normal humans 647 00:32:40,340 --> 00:32:43,520 occasionally see a different number, maybe the one number 648 00:32:43,520 --> 00:32:45,180 you associate with the web. 649 00:32:45,180 --> 00:32:46,650 Let me simulate it as follows. 650 00:32:46,650 --> 00:32:49,980 Let me go ahead and request this completely bogus page. 651 00:32:49,980 --> 00:32:53,690 Hopefully that's not actually someone's user name and hit enter. 652 00:32:53,690 --> 00:32:55,880 Scroll back up a bit. 653 00:32:55,880 --> 00:32:57,920 What do you notice this time? 654 00:32:57,920 --> 00:33:01,610 If you've ever wondered what 404 means, it 655 00:33:01,610 --> 00:33:06,050 is the numeric code inside of a virtual envelope coming back from a server 656 00:33:06,050 --> 00:33:08,690 when you have requested some nonsensical URL because 657 00:33:08,690 --> 00:33:10,730 of a typographical error or just nonsense 658 00:33:10,730 --> 00:33:14,865 that I typed that's now having the server tell you, uh-uh, not found, 404. 
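Python's standard library happens to ship a table of these numeric codes, which makes the three seen so far easy to inspect. This is a small sketch; the helper `describe` and the wording of the family labels are assumptions for illustration, while `http.HTTPStatus` itself is part of the standard library.

```python
from http import HTTPStatus

# The first digit of a status code indicates its general family.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard phrase, and the family it belongs to."""
    status = HTTPStatus(code)  # raises ValueError for codes Python does not know
    return f"{code} {status.phrase} ({FAMILIES[code // 100]})"


if __name__ == "__main__":
    for code in (200, 301, 404):
        print(describe(code))
```

The family labels show why 404 is blamed on the request rather than the server: codes in the 400s mean the client asked for something the server could not satisfy.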
659 00:33:14,865 --> 00:33:16,490 So this is just a special numeric code. 660 00:33:16,490 --> 00:33:18,290 And this is common in programming to have 661 00:33:18,290 --> 00:33:21,200 numbers correspond to different types of things that can go wrong 662 00:33:21,200 --> 00:33:25,190 or, better yet, that can go well, as in the case of 200 OK. 663 00:33:25,190 --> 00:33:27,590 Now, all of this stuff is called HTTP headers. 664 00:33:27,590 --> 00:33:29,640 So I was oversimplifying earlier when I said 665 00:33:29,640 --> 00:33:33,800 HTTP is just this handshake of sorts between servers where you say, 666 00:33:33,800 --> 00:33:38,060 get me a cat picture and then you get back the response as per those four 667 00:33:38,060 --> 00:33:38,690 envelopes. 668 00:33:38,690 --> 00:33:40,170 There's more headers. 669 00:33:40,170 --> 00:33:43,970 There's more key value pairs, words with colons, words with colons, 670 00:33:43,970 --> 00:33:46,730 words with colons, and then values to the right of those. 671 00:33:46,730 --> 00:33:50,270 And that is just additional metadata, more information from the server that 672 00:33:50,270 --> 00:33:52,460 tells you a little something about it. 673 00:33:52,460 --> 00:33:57,260 But if I instead run that same command one final time, 674 00:33:57,260 --> 00:34:04,160 this time doing cURL and then specifying not dash I but just the URL itself 675 00:34:04,160 --> 00:34:08,900 and hit enter, this craziness comes back. 676 00:34:08,900 --> 00:34:11,840 And this looks like a whole lot of programming language in something 677 00:34:11,840 --> 00:34:15,199 called JavaScript or a big JSON object. 678 00:34:15,199 --> 00:34:17,790 And my god, look how much data came back from the server. 679 00:34:17,790 --> 00:34:20,449 But notice, I'm starting to see some structure. 680 00:34:20,449 --> 00:34:23,670 Open bracket div and the word label here. 681 00:34:23,670 --> 00:34:25,969 And if I go up here, input here.
682 00:34:25,969 --> 00:34:30,156 And indeed, what you are seeing is a language called HTML. 683 00:34:30,156 --> 00:34:32,989 Inside of the virtual envelope, if you're requesting not a cat image 684 00:34:32,989 --> 00:34:36,980 but a web page that has your news feed or your inbox from Gmail or your search 685 00:34:36,980 --> 00:34:40,400 results from Google is a language called HTML. 686 00:34:40,400 --> 00:34:42,540 And HTML's not a programming language. 687 00:34:42,540 --> 00:34:45,020 And indeed, it's not as cryptic looking as this. 688 00:34:45,020 --> 00:34:46,679 Google is being very-- 689 00:34:46,679 --> 00:34:48,560 or, Facebook is being very efficient when 690 00:34:48,560 --> 00:34:50,560 it comes to showing me this information and just 691 00:34:50,560 --> 00:34:52,550 getting rid of as much formatting as they 692 00:34:52,550 --> 00:34:57,240 can to save space, to save on internet bandwidth or transmission thereof. 693 00:34:57,240 --> 00:34:59,990 But it's a language that comes back in this virtual envelope 694 00:34:59,990 --> 00:35:01,670 that a browser knows how to display. 695 00:35:01,670 --> 00:35:03,545 It's a markup language in the sense that it's 696 00:35:03,545 --> 00:35:07,010 going to tell the browser what to show on the screen, where to show the cat, 697 00:35:07,010 --> 00:35:10,970 where to put words, whether to make those words big or bold or italics 698 00:35:10,970 --> 00:35:12,960 or centered or any number of other things. 699 00:35:12,960 --> 00:35:18,845 And indeed, what you are seeing is this. 700 00:35:18,845 --> 00:35:23,480 This is www.facebook.com graphically, as we see it in the browser. 701 00:35:23,480 --> 00:35:28,460 Underneath the hood is that black and white seemingly nonsensical Greek, 702 00:35:28,460 --> 00:35:31,610 if you will, that at first glance, there's no way most of us 703 00:35:31,610 --> 00:35:32,810 would understand it. 704 00:35:32,810 --> 00:35:34,940 But that's because we're looking at it here. 
705 00:35:34,940 --> 00:35:37,430 We need to dive in a little deeper, take a look 706 00:35:37,430 --> 00:35:40,190 at what HTML is, how it's actually structured, 707 00:35:40,190 --> 00:35:44,270 make the simplest of web pages, a hello world of web pages, if you will. 708 00:35:44,270 --> 00:35:46,310 And then can we realize and build back up 709 00:35:46,310 --> 00:35:49,700 to this point exactly what composes pages like Facebook and Gmail 710 00:35:49,700 --> 00:35:51,380 and Google and Bing and others. 711 00:35:51,380 --> 00:35:53,588 Because at that point, we'll have understood not only 712 00:35:53,588 --> 00:35:56,690 how the internet works, but how you can use it 713 00:35:56,690 --> 00:36:00,920 as a delivery vehicle for your ideas, for your programs, for your products, 714 00:36:00,920 --> 00:36:06,710 for your companies and more and actually deliver information and deliver cats 715 00:36:06,710 --> 00:36:11,350 and much more to your users on this internet. 716 00:36:11,350 --> 00:36:13,293
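As a small preview of the structure glimpsed in that response (the div, label, and input tags), here is a Python sketch that lists the opening tags in a snippet of HTML using the standard library's parser. The `SAMPLE` markup is a made-up fragment for illustration, not Facebook's actual page, and `TagCollector` and `list_tags` are hypothetical names.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every opening tag, in the order encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def list_tags(markup: str) -> list[str]:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Hypothetical fragment with no whitespace between tags, the way a site might
# strip formatting to save bandwidth.
SAMPLE = '<div><label for="email">Email</label><input id="email" type="text"></div>'

if __name__ == "__main__":
    print(list_tags(SAMPLE))
```

Even with the formatting stripped out, the markup still has a regular structure that a program, and therefore a browser, can walk through tag by tag.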