1 00:00:00,000 --> 00:00:02,480 2 00:00:02,480 --> 00:00:16,368 [MUSIC PLAYING] 3 00:00:16,368 --> 00:00:19,760 SPEAKER: You type an address into a browser, you send an email, 4 00:00:19,760 --> 00:00:22,613 you perhaps have a video conference or a chat online. 5 00:00:22,613 --> 00:00:24,530 Have you ever stopped to consider what exactly 6 00:00:24,530 --> 00:00:28,068 is going on underneath the hood, so to speak, of those pieces of software? 7 00:00:28,068 --> 00:00:30,110 And really the entire infrastructure that somehow 8 00:00:30,110 --> 00:00:33,470 connects you to the person or persons with whom you're communicating. 9 00:00:33,470 --> 00:00:36,080 Well it turns out that there's a whole stack, so to speak, 10 00:00:36,080 --> 00:00:39,770 of internet technologies that underlie the software that you and I use 11 00:00:39,770 --> 00:00:41,390 these days, every day. 12 00:00:41,390 --> 00:00:43,670 And indeed, the software that we use, browsers 13 00:00:43,670 --> 00:00:46,520 and email clients and the like, are really abstractions, 14 00:00:46,520 --> 00:00:50,420 very user friendly abstractions, on top of some lower level implementation 15 00:00:50,420 --> 00:00:51,200 details. 16 00:00:51,200 --> 00:00:54,800 And these days, too, have we built abstractions even above those so known 17 00:00:54,800 --> 00:00:58,280 as the cloud, an abstraction on top of this underlying infrastructure 18 00:00:58,280 --> 00:01:02,450 that enables us to do most anything we want computationally without even 19 00:01:02,450 --> 00:01:04,220 having that hardware locally. 20 00:01:04,220 --> 00:01:08,110 So let's see if we can't distill what goes on when you do type an address 21 00:01:08,110 --> 00:01:12,200 or a URL into the address bar of a browser and then hit Enter. 22 00:01:12,200 --> 00:01:13,550 Or you type out an email-- 23 00:01:13,550 --> 00:01:15,890 specify someone's email address and then hit Enter. 
24 00:01:15,890 --> 00:01:18,770 What exactly is going on underneath the hood? 25 00:01:18,770 --> 00:01:22,160 Well, at the end of the day, I dare say that what your laptop, and my laptop, 26 00:01:22,160 --> 00:01:25,070 and our desktops, and even our servers are capable of really 27 00:01:25,070 --> 00:01:29,390 is just sending messages in envelopes back and forth across the internet. 28 00:01:29,390 --> 00:01:31,100 Virtual envelopes, if you will. 29 00:01:31,100 --> 00:01:34,550 Now in our human world, an envelope needs a few things on the outside. 30 00:01:34,550 --> 00:01:37,790 If you want to send a letter or a card or something old school to someone, 31 00:01:37,790 --> 00:01:39,260 you need to address it, of course. 32 00:01:39,260 --> 00:01:42,600 And you need to put, perhaps in the middle, the recipient's name, 33 00:01:42,600 --> 00:01:44,330 and address, and other details. 34 00:01:44,330 --> 00:01:47,510 You might put in the top left hand corner, by convention, your own name 35 00:01:47,510 --> 00:01:48,443 and or address. 36 00:01:48,443 --> 00:01:50,360 You might even put a little memo in the bottom 37 00:01:50,360 --> 00:01:53,990 that specifies what's inside or fragile or some other annotation. 38 00:01:53,990 --> 00:01:56,390 So this metaphor of the physical world is actually 39 00:01:56,390 --> 00:02:00,140 pretty apt for what's going on underneath the hood in computers. 
40 00:02:00,140 --> 00:02:02,960 When you have a computer plugged into a network 41 00:02:02,960 --> 00:02:05,630 or connected wirelessly to a network, it really 42 00:02:05,630 --> 00:02:08,765 is just sending and receiving envelopes, virtual envelopes, 43 00:02:08,765 --> 00:02:11,390 that at the end of the day are just patterns of zeros and ones, 44 00:02:11,390 --> 00:02:15,750 but collectively, those zeros and ones represent your email or the request 45 00:02:15,750 --> 00:02:18,500 that you've made of a web server, the response you're getting back 46 00:02:18,500 --> 00:02:19,940 from that web server. 47 00:02:19,940 --> 00:02:23,630 So let's see if we can't formalize exactly what these lower level 48 00:02:23,630 --> 00:02:27,200 primitives are, consider exactly how they're layered on top of one another, 49 00:02:27,200 --> 00:02:30,290 because thereafter we can build almost anything we want 50 00:02:30,290 --> 00:02:33,770 on top of this infrastructure once we understand what those underlying 51 00:02:33,770 --> 00:02:36,020 building blocks actually are. 52 00:02:36,020 --> 00:02:38,990 So let's consider how we actually address 53 00:02:38,990 --> 00:02:40,950 this envelope in the first place. 54 00:02:40,950 --> 00:02:43,940 After all, when I turn on my laptop or turn on my phone 55 00:02:43,940 --> 00:02:47,570 or open up my desktop in the morning, how does that computer or that phone 56 00:02:47,570 --> 00:02:50,420 even know what its own address is on the internet? 57 00:02:50,420 --> 00:02:52,610 Because just as in our human world, wherein 58 00:02:52,610 --> 00:02:56,000 you need to be uniquely addressable in the physical world 59 00:02:56,000 --> 00:02:59,720 in order to even receive an envelope or a card or a package, 60 00:02:59,720 --> 00:03:04,220 so do computers need to be uniquely identifiable on the internet. 
61 00:03:04,220 --> 00:03:07,490 Now for our purposes, now we can consider the internet just 62 00:03:07,490 --> 00:03:11,810 to be an internetworked collection of computers connected 63 00:03:11,810 --> 00:03:13,730 via wires, connected wirelessly. 64 00:03:13,730 --> 00:03:16,610 There's some kind of interconnectivity among all of these devices 65 00:03:16,610 --> 00:03:19,730 and these days our phones and internet of things devices and other things 66 00:03:19,730 --> 00:03:20,360 still. 67 00:03:20,360 --> 00:03:22,580 So let's just stipulate that somehow or other there's 68 00:03:22,580 --> 00:03:25,550 a physical connection, or even a wireless connection, 69 00:03:25,550 --> 00:03:28,100 between all of these various devices. 70 00:03:28,100 --> 00:03:31,340 So those devices all need unique addresses, 71 00:03:31,340 --> 00:03:34,550 just like a building in the human world needs an address. 72 00:03:34,550 --> 00:03:37,970 For instance, the computer science building here on campus is at 33 Oxford 73 00:03:37,970 --> 00:03:41,780 Street, Cambridge, Massachusetts, 02138, USA. 74 00:03:41,780 --> 00:03:47,872 With that precise information, can you send us real mail or a package 75 00:03:47,872 --> 00:03:50,330 or anything else through the physical world in order for it 76 00:03:50,330 --> 00:03:51,920 to arrive on our doorstep. 77 00:03:51,920 --> 00:03:54,490 But what if you, instead, wanted to send us an email 78 00:03:54,490 --> 00:03:56,240 and get it to that building, or really me, 79 00:03:56,240 --> 00:04:00,470 wherever I am physically in the world on my internetworked device? 80 00:04:00,470 --> 00:04:02,810 You need to know my computer's address, you 81 00:04:02,810 --> 00:04:05,570 need to know my phone's address, or at least the mail server 82 00:04:05,570 --> 00:04:08,660 that's responsible for receiving that message from you. 
83 00:04:08,660 --> 00:04:12,710 Well, it turns out that most any network on a campus, in a corporation, 84 00:04:12,710 --> 00:04:16,279 even at home these days has a DHCP server. 85 00:04:16,279 --> 00:04:19,040 Stands for a Dynamic Host Configuration Protocol, 86 00:04:19,040 --> 00:04:22,940 and that's just a fancy way of describing a server that is constantly 87 00:04:22,940 --> 00:04:26,540 listening for new laptops, new desktops, new phones, new other devices, to wake 88 00:04:26,540 --> 00:04:31,310 up or be turned on and to shout out the digital equivalent of hello, world, 89 00:04:31,310 --> 00:04:32,690 what is my address? 90 00:04:32,690 --> 00:04:35,250 Because the purpose in life of these DHCP 91 00:04:35,250 --> 00:04:37,610 servers is to answer that question. 92 00:04:37,610 --> 00:04:41,990 To say, David you're going to go ahead and be address 1.2.3.4 today. 93 00:04:41,990 --> 00:04:47,570 Or David, you're going to be 4.5.6.7 or 5.6.7.8. 94 00:04:47,570 --> 00:04:51,560 Any number of possibilities can be used to represent 95 00:04:51,560 --> 00:04:53,840 uniquely my particular device. 96 00:04:53,840 --> 00:04:57,890 So DHCP servers are run by the system administrators on a campus, 97 00:04:57,890 --> 00:05:00,530 in a company, in an internet service provider. 98 00:05:00,530 --> 00:05:03,770 More generally, they're run by whoever provides us 99 00:05:03,770 --> 00:05:05,540 with our internet connectivity. 100 00:05:05,540 --> 00:05:07,400 They just exist on our network. 101 00:05:07,400 --> 00:05:10,250 But these DHCP servers also give us other information. 102 00:05:10,250 --> 00:05:14,300 After all, it's not really sufficient just to know what my own address is. 103 00:05:14,300 --> 00:05:16,950 How do I know where anyone else in the world is? 
104 00:05:16,950 --> 00:05:18,870 Well, it turns out that the internet is filled 105 00:05:18,870 --> 00:05:22,350 with devices called routers whose purpose in life, 106 00:05:22,350 --> 00:05:25,770 as their name suggests, is to route information from point A 107 00:05:25,770 --> 00:05:28,690 to point B to point C and so on. 108 00:05:28,690 --> 00:05:31,470 And those routers, similarly, need to know these addresses 109 00:05:31,470 --> 00:05:34,350 so that they know upon receiving some packet of information, 110 00:05:34,350 --> 00:05:37,810 some virtual envelope, in which direction to send it off. 111 00:05:37,810 --> 00:05:43,350 So these DHCP servers also tell me not just my address, but also the address 112 00:05:43,350 --> 00:05:45,360 of the next hop, so to speak. 113 00:05:45,360 --> 00:05:47,940 I, as a little old laptop or phone or a desktop, 114 00:05:47,940 --> 00:05:51,840 I have no idea where 99.999 percent of the computers in the world 115 00:05:51,840 --> 00:05:53,790 are, even higher than that perhaps. 116 00:05:53,790 --> 00:05:57,900 But I do need to know where the next computer is on the internet, 117 00:05:57,900 --> 00:06:01,260 so that if I want to send information that leaves this room, 118 00:06:01,260 --> 00:06:03,960 it needs to go to a router whose purpose in life 119 00:06:03,960 --> 00:06:06,690 is to, again, route it further along. 120 00:06:06,690 --> 00:06:09,840 And generally there might be one, two, maybe even 30 steps 121 00:06:09,840 --> 00:06:14,550 or hops in between me and my destination for that email or virtual envelope, 122 00:06:14,550 --> 00:06:17,610 and those routers are all configured by people who aren't me, 123 00:06:17,610 --> 00:06:21,060 system administrators beyond this, beyond these walls 124 00:06:21,060 --> 00:06:22,800 to know how to route that data. 
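The next-hop idea described above can be sketched in a few lines of Python. This is only an illustration, not how any particular operating system implements it: the function name `next_hop`, the `/24` local network, and the example addresses are assumptions made for the sketch. A device delivers directly to destinations on its own network and hands everything else to the default router that DHCP told it about.

```python
import ipaddress


def next_hop(destination: str, local_network: str, default_gateway: str) -> str:
    """Return the address this device should hand the envelope to.

    Hypothetical helper: if the destination is on the local network, deliver
    directly; otherwise give the envelope to the default gateway (the router).
    """
    dest = ipaddress.ip_address(destination)
    network = ipaddress.ip_network(local_network)
    if dest in network:
        return destination
    return default_gateway


if __name__ == "__main__":
    # Assumed values for illustration: a home network and its router.
    local = "192.168.1.0/24"
    gateway = "192.168.1.1"
    print(next_hop("192.168.1.50", local, gateway))  # stays on the local network
    print(next_hop("1.2.3.4", local, gateway))       # goes to the router first
```

The device never needs to know hops two, three, and beyond; each router along the way makes the same kind of decision with its own table.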
125 00:06:22,800 --> 00:06:26,158 So we can actually see evidence of this that you yourself 126 00:06:26,158 --> 00:06:28,200 have had underneath your fingertips all this time 127 00:06:28,200 --> 00:06:29,970 and you might not have ever poked around. 128 00:06:29,970 --> 00:06:32,610 For instance, if you want to see your own address, 129 00:06:32,610 --> 00:06:34,950 keep an eye out for a number of this form. 130 00:06:34,950 --> 00:06:39,330 It's a number dot number dot number dot number, and each of those place holders 131 00:06:39,330 --> 00:06:44,520 represents a specific value, either starting at zero or ending at 255. 132 00:06:44,520 --> 00:06:49,530 In other words, each of these hashes can be any value between 0 and 255, 133 00:06:49,530 --> 00:06:54,960 and that range 0 to 255 well that's 256 total possible values. 134 00:06:54,960 --> 00:06:55,770 That's eight bits. 135 00:06:55,770 --> 00:07:00,300 Ergo, each of these place holders represents 8 bits, 8 more bits, 8 more, 136 00:07:00,300 --> 00:07:01,020 8 more. 137 00:07:01,020 --> 00:07:05,070 So an IP address, by definition, is 32 bits. 138 00:07:05,070 --> 00:07:05,760 And there it is. 139 00:07:05,760 --> 00:07:07,950 IP, an acronym you've probably seen somewhere, 140 00:07:07,950 --> 00:07:10,270 even if you've not thought hard about what it is, 141 00:07:10,270 --> 00:07:11,940 stands for Internet Protocol. 142 00:07:11,940 --> 00:07:15,360 Internet Protocol mandates that every computer on the internet, 143 00:07:15,360 --> 00:07:19,920 at the risk of oversimplification, has a unique address called an IP address. 144 00:07:19,920 --> 00:07:22,860 And those IP addresses look like this. 145 00:07:22,860 --> 00:07:27,720 If these IP addresses are composed of 32 bits, how many possible IPs are there 146 00:07:27,720 --> 00:07:30,840 and therefore how many possible machines can we have on our internet? 
147 00:07:30,840 --> 00:07:36,060 Well, 2 times 2 times 2, 2 to the 32, so that's four billion, give or take. 148 00:07:36,060 --> 00:07:40,170 By design of IP addresses, you can have four billion, 149 00:07:40,170 --> 00:07:44,858 give or take, possible permutations of zeros and ones if you have 32 in total, 150 00:07:44,858 --> 00:07:46,650 and that gives you four billion, maximally, 151 00:07:46,650 --> 00:07:51,180 computers and phones and internet of things devices, and the like. 152 00:07:51,180 --> 00:07:53,760 Now that sounds big, but not when each of us 153 00:07:53,760 --> 00:07:57,780 personally probably carries one IP address in our pocket in our phone, 154 00:07:57,780 --> 00:08:00,572 maybe another on our wrist these days, one or more computers 155 00:08:00,572 --> 00:08:03,780 in our life, not to mention all of the other devices and servers in the world 156 00:08:03,780 --> 00:08:05,190 that need these addresses, too. 157 00:08:05,190 --> 00:08:08,250 So long story short, this is version 4 of IP. 158 00:08:08,250 --> 00:08:12,090 It's decades old, but there's also a newcomer on the field 159 00:08:12,090 --> 00:08:15,180 called IPv6, version 6. 160 00:08:15,180 --> 00:08:17,250 There isn't really to be a version 5. 161 00:08:17,250 --> 00:08:19,830 And IPv6 is only finally gaining traction 162 00:08:19,830 --> 00:08:22,290 because we're running so short on IPs that it's 163 00:08:22,290 --> 00:08:25,530 becoming a problem for campuses, for companies, and beyond. 164 00:08:25,530 --> 00:08:30,310 But IPv6 will use 128 bits instead of 32, 165 00:08:30,310 --> 00:08:33,630 which gives us many, many, many, many, more possibilities, bigger 166 00:08:33,630 --> 00:08:35,530 numbers than I can even pronounce. 167 00:08:35,530 --> 00:08:38,370 So that should cut it for quite some time. 
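The arithmetic above can be checked with a short Python sketch. The helper name `dotted_quad_to_int` is made up for this example; it packs the four 0 to 255 numbers into one 32-bit value, and the standard `ipaddress` module is used only to cross-check the result.

```python
import ipaddress


def dotted_quad_to_int(address: str) -> int:
    """Pack four numbers in the range 0-255 into a single 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        # Each number contributes 8 bits: shift left by 8, then add it in.
        value = (value << 8) | part
    return value


IPV4_ADDRESS_COUNT = 2 ** 32    # four billion, give or take
IPV6_ADDRESS_COUNT = 2 ** 128   # far larger than is easy to pronounce

if __name__ == "__main__":
    print(dotted_quad_to_int("1.2.3.4"))
    print(int(ipaddress.IPv4Address("1.2.3.4")))  # same value via the standard library
    print(f"{IPV4_ADDRESS_COUNT:,}")
    print(f"{IPV6_ADDRESS_COUNT:,}")
```

Going from 32 bits to 128 bits does not multiply the address space by four; it multiplies it by 2 to the 96th power.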
168 00:08:38,370 --> 00:08:43,110 But not every computer on the internet needs a public IP address, only 169 00:08:43,110 --> 00:08:47,970 those envelopes, so to speak, that need to leave my pocket, or my home, 170 00:08:47,970 --> 00:08:50,100 or my campus, or my company. 171 00:08:50,100 --> 00:08:54,540 It turns out, as a short term mechanism to squeeze a bit more utility out 172 00:08:54,540 --> 00:08:58,050 of our 32-bit addresses, which are still omnipresent 173 00:08:58,050 --> 00:09:00,780 and the most popular among the versions, well 174 00:09:00,780 --> 00:09:02,850 we can actually distinguish between public IP 175 00:09:02,850 --> 00:09:06,300 addresses that do actually go out on the internet and private addresses. 176 00:09:06,300 --> 00:09:08,790 And indeed, if your own IP address happens 177 00:09:08,790 --> 00:09:11,820 to start with the number 10 and then a dot or the number 178 00:09:11,820 --> 00:09:17,640 172.16 and then a dot, or the number 192.168 and then a dot, 179 00:09:17,640 --> 00:09:21,810 and then something else, well, odds are, your computer has a private IP address. 180 00:09:21,810 --> 00:09:25,860 And this is just a feature of the little router that's probably in your home, 181 00:09:25,860 --> 00:09:28,920 or the bigger router on your campus or corporate network, 182 00:09:28,920 --> 00:09:33,690 that enables you to have an IP address that's only used within the company, 183 00:09:33,690 --> 00:09:36,790 only used within your home, and cannot, by definition, 184 00:09:36,790 --> 00:09:40,410 be routed publicly beyond your company, beyond your home, 185 00:09:40,410 --> 00:09:42,390 because the router will stop it. 186 00:09:42,390 --> 00:09:45,750 And so here we actually have the beginnings of a firewalling mechanism, 187 00:09:45,750 --> 00:09:46,292 if you will. 
188 00:09:46,292 --> 00:09:48,000 In the real world, a firewall is a device 189 00:09:48,000 --> 00:09:51,690 that prevents fire from going from one store to another, for instance. 190 00:09:51,690 --> 00:09:54,330 In the virtual world, a firewall is a piece of software 191 00:09:54,330 --> 00:09:57,480 that prevents zeros and ones from going from one place to another. 192 00:09:57,480 --> 00:09:59,940 And in this case do we already have a mechanism 193 00:09:59,940 --> 00:10:03,840 via public and private addresses of keeping some data securely, 194 00:10:03,840 --> 00:10:06,690 or with high probability securely, within our company 195 00:10:06,690 --> 00:10:09,490 versus allowing it to go out on the internet. 196 00:10:09,490 --> 00:10:13,980 So we'll see now some screenshots of some actual computers from Mac OS 197 00:10:13,980 --> 00:10:16,165 and Windows alike that reveal their IP addresses, 198 00:10:16,165 --> 00:10:18,290 and you yourself can see this on your own machines. 199 00:10:18,290 --> 00:10:20,860 For instance, here on Windows 10 is a screenshot 200 00:10:20,860 --> 00:10:24,050 of what your Network Preferences, so to speak, might look like. 201 00:10:24,050 --> 00:10:26,680 And if you focus down here, it's a bit arcane at first glance, 202 00:10:26,680 --> 00:10:32,470 but IPv4 address is 192.168.1.139 when we took that screenshot. 203 00:10:32,470 --> 00:10:35,230 And indeed, it starts with 192.168, which means it's private, 204 00:10:35,230 --> 00:10:38,260 and indeed, I took this screenshot while we were within a home network, 205 00:10:38,260 --> 00:10:41,980 and so that suggests it can be used to route among computers in that home 206 00:10:41,980 --> 00:10:43,480 but not beyond. 
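The private-prefix check described above can be written as a small Python sketch. The function name `is_private_ipv4` is made up for this example. The three ranges are the ones reserved for private use in RFC 1918; note that the second range is slightly broader than the "172.16" prefix mentioned in the lecture, since it runs from 172.16 through 172.31.

```python
import ipaddress

# The three IPv4 ranges reserved for private use (RFC 1918).
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.x.x.x
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.x.x through 172.31.x.x
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.x.x
]


def is_private_ipv4(address: str) -> bool:
    """Return True if the address falls inside one of the private ranges."""
    ip = ipaddress.IPv4Address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


if __name__ == "__main__":
    for candidate in ("192.168.1.139", "10.254.16.242", "1.2.3.4"):
        print(candidate, is_private_ipv4(candidate))
```

The standard library also offers an `is_private` attribute on address objects, but it covers additional reserved ranges such as loopback, so the explicit list above stays closer to the three prefixes discussed here.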
207 00:10:43,480 --> 00:10:45,730 You'll see, too, if we move on to the next screen 208 00:10:45,730 --> 00:10:47,740 where you see more advanced network properties, 209 00:10:47,740 --> 00:10:51,390 you can also see the mention of this default gateway, which is 210 00:10:51,390 --> 00:10:53,830 synonymous with router, default router. 211 00:10:53,830 --> 00:10:56,740 192.168.1.1. 212 00:10:56,740 --> 00:11:00,340 So a default router or default gateway is that first hop, 213 00:11:00,340 --> 00:11:03,370 so that if I want to send an email outside of my home, 214 00:11:03,370 --> 00:11:05,890 I want to visit a web page outside of my company, 215 00:11:05,890 --> 00:11:09,820 all I need do is hand that virtual envelope containing that email 216 00:11:09,820 --> 00:11:14,050 or that web request off to the machine on the local network that 217 00:11:14,050 --> 00:11:15,340 has that IP address. 218 00:11:15,340 --> 00:11:19,450 I have no idea where it's going to go thereafter, to hops two and three 219 00:11:19,450 --> 00:11:22,150 and beyond, but that's why we have this whole internet 220 00:11:22,150 --> 00:11:23,710 and even more routers out there. 221 00:11:23,710 --> 00:11:27,280 They, the routers, intercommunicate and relay that data, 222 00:11:27,280 --> 00:11:31,000 hop to hop to hop, until it finally reaches its destination. 223 00:11:31,000 --> 00:11:33,310 Now where did I get my IPv4 address from, 224 00:11:33,310 --> 00:11:35,215 where did I get my default gateway from? 225 00:11:35,215 --> 00:11:39,610 From the DHCP server in my home, in my company, or whatever network 226 00:11:39,610 --> 00:11:40,740 I happen to be on. 227 00:11:40,740 --> 00:11:41,800 And Mac OS is the same. 228 00:11:41,800 --> 00:11:44,560 If these screens are unfamiliar, you might recognize this, 229 00:11:44,560 --> 00:11:46,210 under System Preferences in Mac OS. 
230 00:11:46,210 --> 00:11:48,790 Here, while connected to Harvard University's network, 231 00:11:48,790 --> 00:11:53,560 you can actually see that my IP address was 10.254.16.242. 232 00:11:53,560 --> 00:11:57,010 That number, too, starting with one of those internal or private prefixes, 233 00:11:57,010 --> 00:11:59,350 indicative of the fact that even within Harvard, 234 00:11:59,350 --> 00:12:03,190 we're keeping all of our Harvard traffic internal to Harvard, 235 00:12:03,190 --> 00:12:05,328 and then not exposing that externally. 236 00:12:05,328 --> 00:12:07,870 And indeed, if we look in the more advanced preferences here, 237 00:12:07,870 --> 00:12:13,060 we can see that the router for my Mac was 10.254.16.1. 238 00:12:13,060 --> 00:12:16,390 Which is to say this Mac, when it's ready to send something off campus, 239 00:12:16,390 --> 00:12:20,890 simply hands that envelope off to this particular router here. 240 00:12:20,890 --> 00:12:25,480 And the router's job, ultimately, that first hop, a border gateway or border 241 00:12:25,480 --> 00:12:27,400 router, literally referring to a computer 242 00:12:27,400 --> 00:12:30,730 that physically or metaphorically is on the edge of a campus 243 00:12:30,730 --> 00:12:35,230 or company, its purpose in life is to simply change 244 00:12:35,230 --> 00:12:38,110 what's on that envelope initially from the private IP 245 00:12:38,110 --> 00:12:41,800 address to one or more public IP addresses, thereby 246 00:12:41,800 --> 00:12:43,180 maintaining this mapping. 247 00:12:43,180 --> 00:12:46,720 So this might result in everyone else in the world thinking 248 00:12:46,720 --> 00:12:49,930 that I and you and everyone else in my company or campus 249 00:12:49,930 --> 00:12:53,980 are all actually at the same IP address, but that's not true. 
250 00:12:53,980 --> 00:12:56,950 Each of our personal devices has a private IP address 251 00:12:56,950 --> 00:12:59,830 and that router can actually translate via something 252 00:12:59,830 --> 00:13:03,190 called network address translation, or NAT, from a private address 253 00:13:03,190 --> 00:13:05,320 to public and back. 254 00:13:05,320 --> 00:13:09,400 And so in this way, too, can a company help mask the origin 255 00:13:09,400 --> 00:13:13,000 or the identity of whoever it is that's accessing some internet-based service. 256 00:13:13,000 --> 00:13:16,840 Of course, that same company could log what it is that's leaving the company 257 00:13:16,840 --> 00:13:19,720 and coming back in, so via subpoena or another mechanism, 258 00:13:19,720 --> 00:13:22,630 could someone certainly figure out who was accessing 259 00:13:22,630 --> 00:13:25,532 that service at a particular time, but the outside world 260 00:13:25,532 --> 00:13:26,740 would need help knowing that. 261 00:13:26,740 --> 00:13:29,600 And so here, even within Harvard, it's done perhaps for that reason. 262 00:13:29,600 --> 00:13:32,710 But also perhaps in order to use one public IP 263 00:13:32,710 --> 00:13:36,340 address among hundreds or thousands of university affiliates, 264 00:13:36,340 --> 00:13:39,680 so that frankly we just don't need as many IP addresses. 265 00:13:39,680 --> 00:13:42,100 So what might be both a technological motivation 266 00:13:42,100 --> 00:13:46,550 can also have these policy side effects as well. 267 00:13:46,550 --> 00:13:48,400 So IP itself. 268 00:13:48,400 --> 00:13:49,030 Protocol. 269 00:13:49,030 --> 00:13:50,488 Well, what does that actually mean? 270 00:13:50,488 --> 00:13:53,820 A protocol-- it's not a language, per se, it's not a programming language. 271 00:13:53,820 --> 00:13:55,570 It's really just a set of conventions that 272 00:13:55,570 --> 00:13:58,210 govern how computers intercommunicate. 
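The private-to-public mapping that a NAT router maintains can be sketched as a small table in Python. Everything here is an illustrative assumption rather than a description of any real router: the class name `ToyNat`, the public address 203.0.113.5 (taken from a range reserved for documentation), and the choice to tell devices apart by port number, which is one common approach but is not spelled out in the lecture.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

PrivateEndpoint = Tuple[str, int]  # (private IP address, private port)


@dataclass
class ToyNat:
    """A toy translation table: many private endpoints share one public IP."""
    public_ip: str
    next_port: int = 40000  # assumed starting point for public-side ports
    outbound: Dict[PrivateEndpoint, int] = field(default_factory=dict)
    inbound: Dict[int, PrivateEndpoint] = field(default_factory=dict)

    def translate_out(self, private_ip: str, private_port: int) -> Tuple[str, int]:
        """Rewrite the sender on an outgoing envelope, remembering the mapping."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.outbound[key])

    def translate_in(self, public_port: int) -> Optional[PrivateEndpoint]:
        """Find which private device a reply belongs to, or None if unknown."""
        return self.inbound.get(public_port)


if __name__ == "__main__":
    nat = ToyNat(public_ip="203.0.113.5")
    print(nat.translate_out("10.254.16.242", 51000))  # first device
    print(nat.translate_out("10.254.16.243", 51000))  # second device, same public IP
    print(nat.translate_in(40000))                    # reply routed back to the first
```

To the outside world both devices appear to be at the same public address; only the router's table, which it could also log, records who was who.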
273 00:13:58,210 --> 00:14:01,840 IP, specifically, says that if you want to send a message on the internet, 274 00:14:01,840 --> 00:14:07,480 you shall write a sender address on the envelope and a recipient address 275 00:14:07,480 --> 00:14:10,420 on the envelope, and that will ensure that the routers know 276 00:14:10,420 --> 00:14:12,670 what to do with it, and they'll send it back and forth 277 00:14:12,670 --> 00:14:14,140 in the appropriate directions. 278 00:14:14,140 --> 00:14:17,600 IP gives us some other features, as well, fragmentation among them. 279 00:14:17,600 --> 00:14:20,680 It turns out for efficiency if you've got a really big email or a really 280 00:14:20,680 --> 00:14:23,830 big file, whether a PowerPoint file or video file, 281 00:14:23,830 --> 00:14:27,070 it's not really fair to everyone else to kind of jam that onto the network 282 00:14:27,070 --> 00:14:30,820 and to the exclusion of other people's data at any given point in time. 283 00:14:30,820 --> 00:14:34,960 And so IP tends to fragment big files into smaller pieces 284 00:14:34,960 --> 00:14:37,540 and send them in multiple envelopes that eventually 285 00:14:37,540 --> 00:14:41,210 get reassembled at the other end, so that there is a compelling feature as 286 00:14:41,210 --> 00:14:41,710 well. 287 00:14:41,710 --> 00:14:44,710 But this leads, of course, to a slippery slope of implications 288 00:14:44,710 --> 00:14:47,560 for net neutrality and for companies or governments 289 00:14:47,560 --> 00:14:50,710 to actually then start to distinguish between quality 290 00:14:50,710 --> 00:14:53,890 of service of this type of data and this other type of data. 291 00:14:53,890 --> 00:14:54,850 Why can they do that? 292 00:14:54,850 --> 00:14:57,423 Well, it's all quantized at a very small unit of measure, 293 00:14:57,423 --> 00:14:59,590 and within these packets are additional information. 
294 00:14:59,590 --> 00:15:04,510 Not just those addresses, but hints as to what type of data is in the packet. 295 00:15:04,510 --> 00:15:08,320 Is it an email, is it a web page, is it a video conference, is it Netflix, 296 00:15:08,320 --> 00:15:10,030 is it some competitor service? 297 00:15:10,030 --> 00:15:12,880 And so ISPs or companies or governments can certainly 298 00:15:12,880 --> 00:15:18,130 distinguish among these types of packets and treat them theoretically, and all 299 00:15:18,130 --> 00:15:20,350 too really these days, differently. 300 00:15:20,350 --> 00:15:23,230 So that they're derived simply from these basic primitives. 301 00:15:23,230 --> 00:15:25,900 Now we can very quickly go pretty low level. 302 00:15:25,900 --> 00:15:28,990 If you actually look back at the formal definition 303 00:15:28,990 --> 00:15:33,580 that humans crafted decades ago for what IP is, this is how they drew it. 304 00:15:33,580 --> 00:15:36,700 You might call this ASCII art, to borrow a phrase from our look 305 00:15:36,700 --> 00:15:38,140 at computational thinking. 306 00:15:38,140 --> 00:15:41,140 It's sort of an artist's rendition of some structure 307 00:15:41,140 --> 00:15:43,960 just by using the keys on his or her keyboard. 308 00:15:43,960 --> 00:15:45,970 And so these dashes and pluses, really, just 309 00:15:45,970 --> 00:15:48,430 are meant to draw a rectangular picture, nothing more. 310 00:15:48,430 --> 00:15:52,700 The numbers on top represent units of 10 bits at a time. 311 00:15:52,700 --> 00:15:57,610 Here's bit 0, here's 10, here's 20, and over here is the 32nd such bit. 312 00:15:57,610 --> 00:16:02,140 So start at zero, and you count as high as 31, so that's our 32nd bit. 313 00:16:02,140 --> 00:16:04,810 And we can see a few details within here. 
314 00:16:04,810 --> 00:16:08,320 We can see details like the source address, 315 00:16:08,320 --> 00:16:12,190 and it's the whole width of this picture indicating that indeed this 316 00:16:12,190 --> 00:16:17,210 is a 32-bit value that composes the source address or the sender address. 317 00:16:17,210 --> 00:16:20,290 Destination address is just as wide, so there's another 32 bits. 318 00:16:20,290 --> 00:16:22,240 There's options and other time to live. 319 00:16:22,240 --> 00:16:25,420 You can specify just how many routers this can be handed off 320 00:16:25,420 --> 00:16:28,300 to before the router should say, we just don't know where 321 00:16:28,300 --> 00:16:30,370 this destination is, we shall give up. 322 00:16:30,370 --> 00:16:32,710 And there's other fields as well in here. 323 00:16:32,710 --> 00:16:34,330 Now what are we really looking at? 324 00:16:34,330 --> 00:16:36,430 This is just an artist's rendition of what 325 00:16:36,430 --> 00:16:38,140 it means to send a pattern of bits. 326 00:16:38,140 --> 00:16:41,020 The first few bits somehow relate to version. 327 00:16:41,020 --> 00:16:44,580 The next few bits relate to IHL and type of service and total length. 328 00:16:44,580 --> 00:16:46,330 Eventually, the pattern of bits represents 329 00:16:46,330 --> 00:16:48,500 source address and destination address. 330 00:16:48,500 --> 00:16:52,600 So any computer that's receiving just a series of bits wirelessly 331 00:16:52,600 --> 00:16:57,160 or over the wire in the form of wavelengths of light or of electricity 332 00:16:57,160 --> 00:17:01,990 on a wire, simply needs to realize, oh, once I've received this many bits, 333 00:17:01,990 --> 00:17:04,270 I can infer that those bits were my source address, 334 00:17:04,270 --> 00:17:06,069 those were my destination address. 335 00:17:06,069 --> 00:17:09,250 But again, this is so low level, it's a lot more pleasant to sort of think 336 00:17:09,250 --> 00:17:11,230 about things at the virtual level. 
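The idea of inferring fields from a fixed pattern of bits can be sketched in Python with the standard `struct` module. The layout below follows the 20-byte fixed portion of the IPv4 header as the lecture's diagram describes it; the function names, the sample addresses, and the zeroed checksum are assumptions made for illustration, and any options beyond the first 20 bytes are ignored.

```python
import socket
import struct

# Fixed part of the IPv4 header, in network byte order: version+IHL (1 byte),
# type of service (1), total length (2), identification (2),
# flags+fragment offset (2), time to live (1), protocol (1),
# header checksum (2), source address (4), destination address (4).
IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"


def parse_ipv4_header(raw: bytes) -> dict:
    """Read the first 20 bytes of an envelope and name the fields found there."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (version_ihl, _tos, total_length, _identification, _flags_fragment,
     ttl, protocol, _checksum, src, dst) = struct.unpack(IPV4_HEADER_FORMAT, raw[:20])
    return {
        "version": version_ihl >> 4,   # the first 4 bits
        "ihl": version_ihl & 0x0F,     # the next 4 bits: header length in 32-bit words
        "total_length": total_length,
        "ttl": ttl,                    # how many more hops before routers give up
        "protocol": protocol,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


def build_sample_header() -> bytes:
    """Build a made-up header for demonstration; the checksum is left as zero."""
    return struct.pack(
        IPV4_HEADER_FORMAT,
        (4 << 4) | 5, 0, 20, 0, 0, 64, 6, 0,
        socket.inet_aton("10.254.16.242"),
        socket.inet_aton("1.2.3.4"),
    )


if __name__ == "__main__":
    print(parse_ipv4_header(build_sample_header()))
```

The receiver does nothing more clever than counting: after this many bits comes the source address, after that many more comes the destination.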
337 00:17:11,230 --> 00:17:14,589 An envelope that just has this information written on it, and let's 338 00:17:14,589 --> 00:17:18,640 not worry about an abstraction level below this one, wherein 339 00:17:18,640 --> 00:17:21,250 we get into the weeds of this data. 340 00:17:21,250 --> 00:17:25,776 But it turns out that IP is not the only protocol that drives the internet. 341 00:17:25,776 --> 00:17:28,359 In fact there's several, but perhaps the other most common one 342 00:17:28,359 --> 00:17:30,410 that you've heard of is that one here. 343 00:17:30,410 --> 00:17:31,630 TCP. 344 00:17:31,630 --> 00:17:33,460 Transmission Control Protocol. 345 00:17:33,460 --> 00:17:36,700 Now this is just a protocol that solves a different problem. 346 00:17:36,700 --> 00:17:40,660 Rather than simply focus on addressing computers on the internet 347 00:17:40,660 --> 00:17:43,090 and ensuring data gets from one point to another, 348 00:17:43,090 --> 00:17:46,630 TCP is about, among other things, guaranteeing delivery. 349 00:17:46,630 --> 00:17:52,330 TCP adds some additional zeros and ones to that envelope on the outside of it 350 00:17:52,330 --> 00:17:55,450 that helps us get that envelope to its destination with much 351 00:17:55,450 --> 00:17:56,737 higher probability. 352 00:17:56,737 --> 00:17:58,570 In other words, the internet's a busy place. 353 00:17:58,570 --> 00:18:01,870 Servers are constantly getting new users, 354 00:18:01,870 --> 00:18:05,260 routers are receiving any number of packets at any given time, 355 00:18:05,260 --> 00:18:07,360 and sometimes there are spikes in connectivity. 356 00:18:07,360 --> 00:18:10,630 People might all be tuning into some news broadcast online streaming 357 00:18:10,630 --> 00:18:13,630 lots of video, or downloading the latest news all at once, 358 00:18:13,630 --> 00:18:15,745 or everyone's playing the latest game online, 359 00:18:15,745 --> 00:18:17,620 and so there can be these bursts of activity. 
360 00:18:17,620 --> 00:18:22,690 And honestly humans don't necessarily engineer with those bursts of activity 361 00:18:22,690 --> 00:18:26,680 in mind, and so routers get busy, computers get busy. 362 00:18:26,680 --> 00:18:29,770 And when they get busy, they might receive an envelope of information 363 00:18:29,770 --> 00:18:32,395 and realize, wait a minute, I don't have enough hands for this, 364 00:18:32,395 --> 00:18:35,200 and packets get dropped, so to speak. 365 00:18:35,200 --> 00:18:39,070 In fact that's a term of art, to drop a packet just means to ignore it. 366 00:18:39,070 --> 00:18:42,490 You don't have enough memory, enough RAM inside of your system 367 00:18:42,490 --> 00:18:46,360 to hang onto it for any length of time, so you just ignore it. 368 00:18:46,360 --> 00:18:49,570 Now this would be pretty darn frustrating if you send an email 369 00:18:49,570 --> 00:18:52,210 and only with some probability does it go through. 370 00:18:52,210 --> 00:18:54,370 Now in practice that might feel like it happens, 371 00:18:54,370 --> 00:18:56,787 especially when things get caught up in spam and the like, 372 00:18:56,787 --> 00:19:00,520 but in practice you really do want emails that are sent to be received. 373 00:19:00,520 --> 00:19:03,190 When you request a web page, you want the entire web page. 374 00:19:03,190 --> 00:19:06,880 And even if those are big emails or big web pages that are therefore 375 00:19:06,880 --> 00:19:10,780 chopped into fragments, you really want to receive all of the fragments 376 00:19:10,780 --> 00:19:13,720 and not just only some of the paragraphs in the email, 377 00:19:13,720 --> 00:19:16,540 or only some sections of the web page. 378 00:19:16,540 --> 00:19:21,670 So TCP ensures that you get all of that data at the end of the day. 379 00:19:21,670 --> 00:19:24,620 Well hopefully not at the end of the day, but ultimately. 
380 00:19:24,620 --> 00:19:29,140 And so what TCP adds to the envelope is essentially a little mental note 381 00:19:29,140 --> 00:19:32,740 that this is packet number one of two, or one of three, or one of four, 382 00:19:32,740 --> 00:19:34,480 in the case of an even larger file. 383 00:19:34,480 --> 00:19:39,467 And so when the recipient of this email or this web request gets the envelopes 384 00:19:39,467 --> 00:19:42,550 and realizes, wait a minute, I've got numbers two and three and four, wait 385 00:19:42,550 --> 00:19:44,620 a minute, I'm missing the first envelope, 386 00:19:44,620 --> 00:19:47,800 TCP tells that Mac or PC or other computer, 387 00:19:47,800 --> 00:19:50,410 go ahead and send a message back to the sender saying, hey, 388 00:19:50,410 --> 00:19:53,850 I got everything except packet one, please resend. 389 00:19:53,850 --> 00:19:55,850 That's going to take a little bit of extra time, 390 00:19:55,850 --> 00:19:59,800 but that packet can be resent and TCP knows 391 00:19:59,800 --> 00:20:01,600 how to reassemble them in the proper order 392 00:20:01,600 --> 00:20:06,370 so that the human ultimately sees their entire email or that entire web page 393 00:20:06,370 --> 00:20:09,530 and not just some portion thereof. 394 00:20:09,530 --> 00:20:11,072 So what does TCP really look like? 395 00:20:11,072 --> 00:20:13,780 Well, let's just take a quick peek underneath the hood here, too. 396 00:20:13,780 --> 00:20:18,150 And here we see a similar pattern of bits but not addresses, that, again, 397 00:20:18,150 --> 00:20:22,530 is handled by IP itself, but you see mention of source port, 398 00:20:22,530 --> 00:20:23,820 and destination port. 399 00:20:23,820 --> 00:20:25,980 Sequence number, which helps with the delivery, 400 00:20:25,980 --> 00:20:28,320 and then other options as well, all of which 401 00:20:28,320 --> 00:20:31,660 relate to the delivery of that information.
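The "packet one of four" idea above can be sketched in a few lines of Python. This is only an illustration of the lecture's simplified model, not how TCP is actually implemented: real TCP sequence numbers count bytes rather than whole packets, and the function names `missing_packets` and `reassemble` are made up for this example.

```python
def missing_packets(received, total):
    """Return the sequence numbers (1..total) that have not arrived yet."""
    return [n for n in range(1, total + 1) if n not in received]


def reassemble(received, total):
    """Join the payloads in sequence order once every packet is present."""
    missing = missing_packets(received, total)
    if missing:
        raise ValueError(f"please resend packets {missing}")
    return "".join(received[n] for n in range(1, total + 1))


# Packets 2, 3, and 4 arrived (out of order); packet 1 was dropped.
received = {3: "lo, ", 2: "el", 4: "world"}
print(missing_packets(received, 4))  # [1]

# After the sender resends packet 1, the message can be rebuilt in order.
received[1] = "H"
print(reassemble(received, 4))  # Hello, world
```

The dictionary keyed by sequence number is what makes arrival order irrelevant: the receiver only needs to know which numbers are still absent.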
402 00:20:31,660 --> 00:20:35,820 But these two up here, looks like 16 bits each, source port 403 00:20:35,820 --> 00:20:37,080 and destination port. 404 00:20:37,080 --> 00:20:39,750 Those two have value, because TCP does something else. 405 00:20:39,750 --> 00:20:42,820 It doesn't just guarantee that data gets from one point to another, 406 00:20:42,820 --> 00:20:47,670 it also helps servers distinguish one type of data from another, and in turn 407 00:20:47,670 --> 00:20:52,170 allows companies and universities and internet service providers 408 00:20:52,170 --> 00:20:54,720 or governments to distinguish different types of data 409 00:20:54,720 --> 00:20:57,510 because it's right there on the outside of the envelope. 410 00:20:57,510 --> 00:21:01,890 In particular, TCP specifies what protocol 411 00:21:01,890 --> 00:21:06,210 is being used to convey this packet of information from one computer 412 00:21:06,210 --> 00:21:07,090 to another. 413 00:21:07,090 --> 00:21:09,632 In other words, there's lots of internet services these days. 414 00:21:09,632 --> 00:21:12,190 There's email, there's chat, there's video conferencing, 415 00:21:12,190 --> 00:21:13,650 there's web browsers, and more. 416 00:21:13,650 --> 00:21:16,930 So that's a lot of possibilities, a lot of patterns of zeros and ones 417 00:21:16,930 --> 00:21:18,490 that can be in these envelopes. 418 00:21:18,490 --> 00:21:23,470 So how, upon receiving an envelope, does a server know what type of information 419 00:21:23,470 --> 00:21:24,030 is in it? 420 00:21:24,030 --> 00:21:25,200 Especially big companies. 421 00:21:25,200 --> 00:21:27,480 Google, for instance, supports all of those services. 422 00:21:27,480 --> 00:21:29,550 Video conferencing, email, chat, and more. 
423 00:21:29,550 --> 00:21:32,760 So when Google's servers receive a packet of information, 424 00:21:32,760 --> 00:21:35,610 how does Google know that this is an email from you, 425 00:21:35,610 --> 00:21:40,350 as opposed to a chat message from you, as opposed to a video from you 426 00:21:40,350 --> 00:21:41,872 that you're uploading to YouTube? 427 00:21:41,872 --> 00:21:44,580 You need to be able to distinguish these various services because 428 00:21:44,580 --> 00:21:47,010 at the end of the day, they're just patterns of bits. 429 00:21:47,010 --> 00:21:49,560 Well, if we reserve some of those bits, or really 430 00:21:49,560 --> 00:21:53,820 some of the markings on this virtual envelope, for just one more number 431 00:21:53,820 --> 00:21:55,700 we can distinguish services pretty easily. 432 00:21:55,700 --> 00:21:59,130 In fact, HTTP, an acronym that you might not 433 00:21:59,130 --> 00:22:01,530 know what it means but you've surely seen it a lot, 434 00:22:01,530 --> 00:22:03,720 stands for hypertext transfer protocol and it's 435 00:22:03,720 --> 00:22:07,890 the conventions via which browsers and servers send web pages back and forth. 436 00:22:07,890 --> 00:22:09,870 Well, by convention, humans decided years 437 00:22:09,870 --> 00:22:12,600 ago to call that service number 80. 438 00:22:12,600 --> 00:22:15,180 TCP port 80, so to speak. 439 00:22:15,180 --> 00:22:19,920 And the secure version of that, HTTPS, they decided to number that 443, 440 00:22:19,920 --> 00:22:23,070 just because they'd already used quite a few numbers in between those two 441 00:22:23,070 --> 00:22:23,700 values. 442 00:22:23,700 --> 00:22:27,630 IMAP is the protocol via which you can receive emails or check 443 00:22:27,630 --> 00:22:30,930 your email, and that uses different ports depending on whether you're using it 444 00:22:30,930 --> 00:22:33,480 insecurely or securely, like 143 or 993.
445 00:22:33,480 --> 00:22:38,820 SMTP, which is outbound email, can use similarly 25, 465, or 587. 446 00:22:38,820 --> 00:22:41,970 And then, if familiar, there's something called SSH, secure shell. 447 00:22:41,970 --> 00:22:44,070 This is what developers might use at a lower level 448 00:22:44,070 --> 00:22:47,730 to connect from one computer, say a laptop, to a remote server. 449 00:22:47,730 --> 00:22:49,750 That tends to use port 22. 450 00:22:49,750 --> 00:22:51,750 And there's hundreds, there's actually thousands 451 00:22:51,750 --> 00:22:55,920 of others, as many as 65,000 possibilities, but only some of those 452 00:22:55,920 --> 00:22:57,720 are actually standardized. 453 00:22:57,720 --> 00:22:59,910 So this is to say what ultimately is going 454 00:22:59,910 --> 00:23:03,600 on the outside of an envelope is not just a user's address 455 00:23:03,600 --> 00:23:07,170 but when I as a computer send a message to some other server 456 00:23:07,170 --> 00:23:11,700 and for instance my address is 5.6.7.8 I'll write 457 00:23:11,700 --> 00:23:13,570 that in the top corner of the envelope. 458 00:23:13,570 --> 00:23:17,430 If the recipients of this envelope are supposed to be 1.2.3.4 459 00:23:17,430 --> 00:23:19,810 I do write that in the middle of the envelope, 460 00:23:19,810 --> 00:23:25,860 but I need to further specify IP address 1.2.3.4 but port number, 461 00:23:25,860 --> 00:23:28,530 let's say, 80, if it's a request for a web page. 462 00:23:28,530 --> 00:23:31,433 So conventionally you would do :80 to distinguish that service. 463 00:23:31,433 --> 00:23:34,100 And then of course because of TCP I need to number these things, 464 00:23:34,100 --> 00:23:37,380 so if it's a big request or a big response I better write one of two, 465 00:23:37,380 --> 00:23:39,760 one of three, or one of four, or the like. 
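As a small sketch of the addressing convention just described, the port numbers mentioned above can be kept in a lookup table and appended to an IP address after a colon. The table name `WELL_KNOWN_PORTS` and the helper `destination` are hypothetical, chosen for this illustration; the addresses are the lecture's own placeholder values.

```python
# Port numbers mentioned in the lecture, keyed by service name.
WELL_KNOWN_PORTS = {
    "http": 80,
    "https": 443,
    "imap": 143,   # unencrypted IMAP
    "imaps": 993,  # IMAP over an encrypted connection
    "smtp": 25,    # 465 and 587 are also used for outbound email
    "ssh": 22,
}

MAX_PORT = 2**16 - 1  # a 16-bit field allows ports 0 through 65535


def destination(ip, service):
    """Format an address the way it is conventionally written: IP:port."""
    return f"{ip}:{WELL_KNOWN_PORTS[service]}"


print(destination("1.2.3.4", "http"))  # 1.2.3.4:80
print(MAX_PORT)  # 65535
```

The 16-bit width of the port fields is where the "as many as 65,000 possibilities" figure comes from.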
466 00:23:39,760 --> 00:23:41,760 And so the envelope I'm ultimately left with 467 00:23:41,760 --> 00:23:44,250 is something a little more like this. 468 00:23:44,250 --> 00:23:47,010 On the outside is the recipient's address, on the outside 469 00:23:47,010 --> 00:23:49,290 is the sender's address, and on the outside 470 00:23:49,290 --> 00:23:52,770 is the sequence number of some sort that specifies 471 00:23:52,770 --> 00:23:56,830 how many packets I've actually sent and hopefully will be received. 472 00:23:56,830 --> 00:24:00,840 So TCP then allows the recipient to see this envelope, realize, oh this 473 00:24:00,840 --> 00:24:01,800 is for my web server. 474 00:24:01,800 --> 00:24:04,470 Google can hand it off to the appropriate piece of software 475 00:24:04,470 --> 00:24:06,420 that governs its web servers and so it's not 476 00:24:06,420 --> 00:24:09,330 confused for something else like an email, a chat message, a voice 477 00:24:09,330 --> 00:24:11,910 conference, or the like. 478 00:24:11,910 --> 00:24:14,400 And again, all of these features derive quite 479 00:24:14,400 --> 00:24:17,400 simply from these patterns of bits that esoterically happen 480 00:24:17,400 --> 00:24:20,340 to be laid out in this way, but if we abstract away from that 481 00:24:20,340 --> 00:24:24,340 and just think about it like the real world with an envelope, 482 00:24:24,340 --> 00:24:27,150 it's really just these numeric values that somehow help 483 00:24:27,150 --> 00:24:31,770 us get data from one point to another. 484 00:24:31,770 --> 00:24:36,690 Collectively now, these two protocols, which are so often used hand in hand, 485 00:24:36,690 --> 00:24:39,570 are generally abbreviated TCP/IP. 486 00:24:39,570 --> 00:24:41,970 It's two separate protocols, two separate conventions 487 00:24:41,970 --> 00:24:43,398 used in conjunction.
488 00:24:43,398 --> 00:24:45,940 Some of this information is just written in different places, 489 00:24:45,940 --> 00:24:49,650 if you will, on the virtual envelope, but TCP/IP settings are 490 00:24:49,650 --> 00:24:51,990 what you might look for on a Mac or PC or server 491 00:24:51,990 --> 00:24:55,020 to actually configure this level of detail. 492 00:24:55,020 --> 00:24:57,870 But of course, I've taken some liberties here. 493 00:24:57,870 --> 00:25:01,170 If my goal is to send a message from one computer 494 00:25:01,170 --> 00:25:04,230 to another, a chat message, an email, anything else, 495 00:25:04,230 --> 00:25:07,320 you know what, I'm pretty sure I have no idea what 496 00:25:07,320 --> 00:25:09,570 the IP address is of any colleague. 497 00:25:09,570 --> 00:25:12,960 And I have no idea what the IP address is of Google or Facebook 498 00:25:12,960 --> 00:25:16,740 or any number of popular websites that I might even visit daily. 499 00:25:16,740 --> 00:25:20,310 I don't even know people's phone numbers anymore but that's another matter. 500 00:25:20,310 --> 00:25:23,280 In the context of words, though, on the internet all of us, 501 00:25:23,280 --> 00:25:26,700 of course, type words, not numbers, when we want to reach some destination. 502 00:25:26,700 --> 00:25:30,900 We go to facebook.com or gmail.com or google.com or bing.com 503 00:25:30,900 --> 00:25:34,520 or any number of other domain names, so to speak. 504 00:25:34,520 --> 00:25:36,270 And of course, that's what you would write 505 00:25:36,270 --> 00:25:38,640 on the outside of an envelope in the human world, 506 00:25:38,640 --> 00:25:43,530 ideally as many words as possible, not just numbers let alone bits alone. 507 00:25:43,530 --> 00:25:46,170 And ideally our computers would similarly 508 00:25:46,170 --> 00:25:49,770 express exactly what we humans know, which is these domain 509 00:25:49,770 --> 00:25:52,380 names that are part of URLs. 
510 00:25:52,380 --> 00:25:56,460 So it turns out we need the help of at least one more service among all 511 00:25:56,460 --> 00:25:57,930 of these internet technologies. 512 00:25:57,930 --> 00:26:01,580 We need the help of a service called DNS, domain name system. 513 00:26:01,580 --> 00:26:07,020 A DNS server is a server that quite simply translates domain names 514 00:26:07,020 --> 00:26:11,670 like gmail.com and bing.com and google.com into their corresponding IP 515 00:26:11,670 --> 00:26:12,390 addresses. 516 00:26:12,390 --> 00:26:15,023 We, the humans, might have no idea what they are, 517 00:26:15,023 --> 00:26:17,940 but odds are there's at least one human or more in the world, probably 518 00:26:17,940 --> 00:26:20,220 who works for those companies, that does know. 519 00:26:20,220 --> 00:26:22,620 And provided he or she configures their DNS 520 00:26:22,620 --> 00:26:26,370 servers to know that association of domain name to IP 521 00:26:26,370 --> 00:26:29,940 address, the equivalent of just an Excel file with one column with names 522 00:26:29,940 --> 00:26:33,990 and the other column with numbers, IP addresses well their server can then 523 00:26:33,990 --> 00:26:36,330 answer questions from little old me. 524 00:26:36,330 --> 00:26:39,810 And indeed what my phone knows how to do these days, what my Mac, my PC knows 525 00:26:39,810 --> 00:26:44,430 how to do is when my human types in gmail.com and hits enter, 526 00:26:44,430 --> 00:26:47,730 the very first thing that my browser, and in turn my operating 527 00:26:47,730 --> 00:26:51,510 system like Mac OS or Windows does, is it asks the local DNS 528 00:26:51,510 --> 00:26:55,680 server for the conversion of whatever I typed in, gmail.com, 529 00:26:55,680 --> 00:26:58,200 to the corresponding IP address. 530 00:26:58,200 --> 00:27:04,050 And hopefully, my own network be it at home or on campus or in work, 531 00:27:04,050 --> 00:27:05,870 has the answer to that question. 
532 00:27:05,870 --> 00:27:07,590 But the world's a big place, and odds are 533 00:27:07,590 --> 00:27:11,100 my home does not know the IP address of every server in the world. 534 00:27:11,100 --> 00:27:14,670 Odds are my campus or company doesn't know the IP address of every server 535 00:27:14,670 --> 00:27:17,730 in the world, especially since they're surely changing continually 536 00:27:17,730 --> 00:27:21,300 as new sites are coming online and others are going offline. 537 00:27:21,300 --> 00:27:22,590 So how do we know? 538 00:27:22,590 --> 00:27:25,530 Well DNS is a whole hierarchical system whereby 539 00:27:25,530 --> 00:27:30,360 you might have a small DNS server, so to speak conceptually here on site. 540 00:27:30,360 --> 00:27:34,170 But then your internet service provider or ISP, Comcast, Verizon, 541 00:27:34,170 --> 00:27:38,730 or some other entity, they probably have a bigger DNS server with more memory, 542 00:27:38,730 --> 00:27:42,273 with a longer list of domain names and IP addresses. 543 00:27:42,273 --> 00:27:44,440 And you know what, even if they don't know everyone, 544 00:27:44,440 --> 00:27:47,280 there are probably what are called root servers in the world, 545 00:27:47,280 --> 00:27:50,430 that much like the root of a tree, is where everything starts. 546 00:27:50,430 --> 00:27:53,190 And indeed, you can find out from these actual root 547 00:27:53,190 --> 00:27:56,100 servers on the internet, the mapping, effectively, 548 00:27:56,100 --> 00:27:58,890 between all of the dot coms and their IP addresses. 549 00:27:58,890 --> 00:28:01,680 All of the dot govs or the dot nets and their IP addresses. 550 00:28:01,680 --> 00:28:05,700 And frankly, even if they don't know the answer by definition of root server 551 00:28:05,700 --> 00:28:08,790 they will be configured to know who knows. 552 00:28:08,790 --> 00:28:13,050 And so DNS is very hierarchical, and it's also recursive. 
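The chain of "ask someone who might know" can be sketched as a toy simulation. Everything here is hypothetical: the three server names, the `SERVERS` table, and the `resolve` function are invented for illustration, and the addresses are the lecture's placeholder values rather than real records. Actual root servers refer queries onward to other servers instead of holding every record themselves.

```python
# Hypothetical servers: each has its own table of names and an upstream to ask.
SERVERS = {
    "home": {"records": {}, "upstream": "isp"},
    "isp": {"records": {"example.com": "1.2.3.4"}, "upstream": "root"},
    "root": {"records": {"example.org": "5.6.7.8"}, "upstream": None},
}


def resolve(name, server="home"):
    """Ask `server`; if it does not know, recursively ask its upstream."""
    if server is None:
        return None  # nobody in the chain knew the answer
    entry = SERVERS[server]
    if name in entry["records"]:
        return entry["records"][name]
    return resolve(name, entry["upstream"])


print(resolve("example.com"))      # 1.2.3.4 (answered by the ISP's server)
print(resolve("example.org"))      # 5.6.7.8 (answered further up the chain)
print(resolve("nowhere.invalid"))  # None
```

Each call either answers from its own table or delegates one level up, which is the recursive shape the lecture describes.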
553 00:28:13,050 --> 00:28:17,040 You might ask a local server, which might ask a more remote server, which 554 00:28:17,040 --> 00:28:19,110 might ask an even further away server. 555 00:28:19,110 --> 00:28:22,440 That server might say, wait a minute, I know, this server knows, and then 556 00:28:22,440 --> 00:28:24,870 the answer eventually bubbles its way back to you. 557 00:28:24,870 --> 00:28:26,673 And long story short, we can be efficient. 558 00:28:26,673 --> 00:28:28,590 We don't have to constantly ask this question. 559 00:28:28,590 --> 00:28:30,660 We can cache those results locally. 560 00:28:30,660 --> 00:28:34,590 Remember them in my browser, in my Mac or my PC. 561 00:28:34,590 --> 00:28:36,480 There's downsides there, though, too. 562 00:28:36,480 --> 00:28:39,630 By remembering that mapping of domain name to IP address, 563 00:28:39,630 --> 00:28:43,680 I can save myself the trouble of asking that same question multiple times a day 564 00:28:43,680 --> 00:28:45,810 or even per week or even per minute. 565 00:28:45,810 --> 00:28:48,690 The catch, though, is that if Google changes something, or Facebook 566 00:28:48,690 --> 00:28:51,300 reconfigures something and that IP changes, 567 00:28:51,300 --> 00:28:53,110 caching might actually be a bad thing. 568 00:28:53,110 --> 00:28:55,530 And so here, too, even at the level of the internet 569 00:28:55,530 --> 00:28:57,300 do we see these series of trade-offs. 570 00:28:57,300 --> 00:29:01,860 You might save time by caching, but you might sacrifice correctness, 571 00:29:01,860 --> 00:29:06,990 because now the server's recollection of that IP address might become outdated.
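The time-versus-correctness trade-off can be made concrete with a small cache that forgets answers after a fixed lifetime. This is a sketch under assumptions: the class name `DnsCache`, the `ttl` parameter (a lifetime in seconds), and the injectable `clock` are choices made for this example, and the lookup table stands in for a real DNS server.

```python
import time


class DnsCache:
    """Remember answers for `ttl` seconds, then ask again."""

    def __init__(self, resolver, ttl, clock=time.monotonic):
        self.resolver = resolver  # any function mapping a name to an IP string
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # name -> (ip, time the answer was stored)

    def lookup(self, name):
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]  # fast, but possibly outdated
        ip = self.resolver(name)  # slower, but current
        self.entries[name] = (ip, now)
        return ip


table = {"example.com": "1.2.3.4"}  # stand-in for a real DNS server
now = [0.0]  # a fake clock so the example runs instantly
cache = DnsCache(table.__getitem__, ttl=60, clock=lambda: now[0])

print(cache.lookup("example.com"))  # 1.2.3.4
table["example.com"] = "5.6.7.8"    # the site moves to a new address
print(cache.lookup("example.com"))  # 1.2.3.4 (stale answer from the cache)
now[0] = 61.0                       # more than 60 seconds later
print(cache.lookup("example.com"))  # 5.6.7.8
```

A shorter lifetime means fewer stale answers but more questions asked; a longer one means the opposite. To try this against a live network, `socket.gethostbyname` from Python's standard library could be passed as the resolver.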
572 00:29:06,990 --> 00:29:09,000 And so this is a whole can of worms, ultimately, 573 00:29:09,000 --> 00:29:11,130 and speaks to what it really means to be an engineer 574 00:29:11,130 --> 00:29:13,880 in the world of internet technologies: to anticipate, to think about, 575 00:29:13,880 --> 00:29:15,930 and ultimately to solve these problems. 576 00:29:15,930 --> 00:29:21,000 There is no surefire solution other than to expect that you'll need 577 00:29:21,000 --> 00:29:23,650 to accommodate these changes over time. 578 00:29:23,650 --> 00:29:25,440 So in Windows, can you see this yourself? 579 00:29:25,440 --> 00:29:28,500 Well, if you open up those same Wi-Fi properties or wired 580 00:29:28,500 --> 00:29:31,470 properties that you have, you'll see again, not only your IPv4 address, 581 00:29:31,470 --> 00:29:35,550 but, it was there all this time, your IPv4 DNS servers, one 582 00:29:35,550 --> 00:29:39,750 or more IP addresses. Turns out it's exactly the same, by coincidence 583 00:29:39,750 --> 00:29:44,640 but also by design on this computer, as my router or my default gateway, 584 00:29:44,640 --> 00:29:47,070 192.168.1.1. 585 00:29:47,070 --> 00:29:50,950 Which is to say that if this PC needs to know an answer to the question, 586 00:29:50,950 --> 00:29:55,830 what is gmail.com's IP address, it is simply going to ask the local server 587 00:29:55,830 --> 00:29:59,310 that has that address, and that DNS server, and this is important, 588 00:29:59,310 --> 00:30:00,900 cannot have itself a name. 589 00:30:00,900 --> 00:30:04,320 We need to know what its IP address is, otherwise, of course, 590 00:30:04,320 --> 00:30:05,580 we get into this endless loop. 591 00:30:05,580 --> 00:30:08,640 If we know only the name of our DNS server but only the DNS server 592 00:30:08,640 --> 00:30:12,370 can convert that to an IP address, we'll never actually answer that question. 593 00:30:12,370 --> 00:30:14,430 It's more of a catch-22.
594 00:30:14,430 --> 00:30:16,380 And even if it does have a name, you need 595 00:30:16,380 --> 00:30:21,370 to know manually, via your DHCP server somehow, what its IP address actually 596 00:30:21,370 --> 00:30:21,870 is. 597 00:30:21,870 --> 00:30:22,800 Mac OS, the same. 598 00:30:22,800 --> 00:30:26,370 And here on campus, Harvard happens to have redundancy like most any company. 599 00:30:26,370 --> 00:30:29,470 They don't have just one DNS server they have at least three here, 600 00:30:29,470 --> 00:30:33,250 128.103.1.1, and a couple of others, as well. 601 00:30:33,250 --> 00:30:36,300 And again, I got these automatically when I turned on my Mac or my phone 602 00:30:36,300 --> 00:30:41,220 or my PC via that local DHCP server. 603 00:30:41,220 --> 00:30:44,550 So let's see if we can't mimic what it is my Mac, 604 00:30:44,550 --> 00:30:48,330 your PC, your phone is doing everyday all day long, but rather 605 00:30:48,330 --> 00:30:49,710 unbeknownst to us. 606 00:30:49,710 --> 00:30:51,750 Here I have what's called a terminal window. 607 00:30:51,750 --> 00:30:54,300 This is just a textual interface to my computer here. 608 00:30:54,300 --> 00:30:57,660 Can exist on Macs, or PCs, or other operating systems, as well. 609 00:30:57,660 --> 00:31:01,260 And it allows me to execute by typing commands textually, 610 00:31:01,260 --> 00:31:04,620 only at my keyboard, no mouse, exactly the types of commands 611 00:31:04,620 --> 00:31:08,850 that your browser and other software are effectively executing or running 612 00:31:08,850 --> 00:31:09,570 for you. 613 00:31:09,570 --> 00:31:11,430 For instance, suppose I genuinely do want 614 00:31:11,430 --> 00:31:12,990 to know the IP address of gmail.com. 615 00:31:12,990 --> 00:31:16,020 I can ask this program as follows. 616 00:31:16,020 --> 00:31:19,170 nslookup, for name server look up, and then I can go ahead 617 00:31:19,170 --> 00:31:22,140 and type literally gmail.com and hit Enter. 
618 00:31:22,140 --> 00:31:28,620 Here, visually, we see on the screen one answer that it's 172.217.3.37. 619 00:31:28,620 --> 00:31:31,740 And this comes from a server whose IP address in this room 620 00:31:31,740 --> 00:31:36,270 is 10.0.0.2, which we know now to be a private IP address, 621 00:31:36,270 --> 00:31:38,640 and indeed, here on campus we have servers 622 00:31:38,640 --> 00:31:42,480 that are local only to this room, this building, or this set of buildings 623 00:31:42,480 --> 00:31:42,987 here. 624 00:31:42,987 --> 00:31:44,820 Now this is a little interesting because I'm 625 00:31:44,820 --> 00:31:48,000 pretty sure business is good for Google, and surely they 626 00:31:48,000 --> 00:31:51,000 don't have just one server and therefore one IP address. 627 00:31:51,000 --> 00:31:54,720 Well, it turns out that there's a whole hierarchy of servers out there, most 628 00:31:54,720 --> 00:31:59,520 likely, that my data goes to and thereafter through on Google's end. 629 00:31:59,520 --> 00:32:05,160 The one IP address that they're telling me is theirs is 172.217.3.37, 630 00:32:05,160 --> 00:32:08,850 but once my packet of information gets there to Mountain View, California, 631 00:32:08,850 --> 00:32:11,580 or wherever their servers happen to be closest to me, 632 00:32:11,580 --> 00:32:16,290 then they might have any number of servers, dozens, hundreds, thousands, 633 00:32:16,290 --> 00:32:18,800 that can actually receive that packet next. 634 00:32:18,800 --> 00:32:22,380 This just happens to be the outward facing IP that my own Mac or PC 635 00:32:22,380 --> 00:32:24,480 or phone actually sees. 636 00:32:24,480 --> 00:32:28,260 Well, let's see if we can't trace the route to gmail.com via another command, 637 00:32:28,260 --> 00:32:32,550 literally traceroute, can I see the packets of information line 638 00:32:32,550 --> 00:32:36,840 by line leaving my computer and making their way, ultimately, to Google. 
639 00:32:36,840 --> 00:32:39,270 I'm going to go ahead and do this once, so dash q1 640 00:32:39,270 --> 00:32:41,502 means do one query, please, at a time. 641 00:32:41,502 --> 00:32:43,710 And then I'm going to go ahead and say, quite simply, 642 00:32:43,710 --> 00:32:45,990 gmail.com, and then Enter. 643 00:32:45,990 --> 00:32:50,400 And we will see, line by line, the sequence of IP addresses 644 00:32:50,400 --> 00:32:54,930 of every router that is to say hop between me and Gmail. 645 00:32:54,930 --> 00:32:57,253 On occasion we'll see these asterisks instead, 646 00:32:57,253 --> 00:32:59,670 which indicates that that router isn't having any of this, 647 00:32:59,670 --> 00:33:03,660 it's not responding to my requests, so we can't see its IP or anything 648 00:33:03,660 --> 00:33:04,650 else about it. 649 00:33:04,650 --> 00:33:09,420 But we can see that in 17 steps does data leave my laptop 650 00:33:09,420 --> 00:33:12,150 and end up at gmail.com, and along the way 651 00:33:12,150 --> 00:33:16,830 it encounters all of these routers that have these unique IP addresses but not 652 00:33:16,830 --> 00:33:18,870 names, it seems, and the amount of time it 653 00:33:18,870 --> 00:33:21,420 takes for my data to get from my laptop to gmail.com 654 00:33:21,420 --> 00:33:26,220 is, oh my, 0.967 milliseconds. 655 00:33:26,220 --> 00:33:29,940 Less than one millisecond is required to get data or an email 656 00:33:29,940 --> 00:33:32,760 from my computer to gmail.com itself. 657 00:33:32,760 --> 00:33:36,450 Now what about all of these other measurements of time up above? 658 00:33:36,450 --> 00:33:38,580 Each of these represents the number of milliseconds 659 00:33:38,580 --> 00:33:43,410 it took during this process for data to go from my laptop to this router, 660 00:33:43,410 --> 00:33:45,940 then to this router, then to this router. 
661 00:33:45,940 --> 00:33:48,390 Now, of course, it seems strange that it takes more time 662 00:33:48,390 --> 00:33:52,390 to get these to these close routers than it does to these further away. 663 00:33:52,390 --> 00:33:54,410 But there, too, if I ran this all day long 664 00:33:54,410 --> 00:33:56,160 I would get different numbers continually, 665 00:33:56,160 --> 00:33:58,737 it depends how busy those routers are at that moment in time. 666 00:33:58,737 --> 00:34:00,570 It depends what else everyone here on campus 667 00:34:00,570 --> 00:34:03,840 is doing, or other people in the world at that moment in time. 668 00:34:03,840 --> 00:34:06,660 Routers might be a little slow to respond because they're 669 00:34:06,660 --> 00:34:07,920 busy doing something else. 670 00:34:07,920 --> 00:34:10,050 My data might get dropped in other contexts 671 00:34:10,050 --> 00:34:12,400 and need to be resent, which is just going to take time, 672 00:34:12,400 --> 00:34:15,030 and I don't even see that happening on the screen. 673 00:34:15,030 --> 00:34:18,870 But it's fair to say that these give us a sense of the range of times 674 00:34:18,870 --> 00:34:21,270 it might take to go from a point A to point B, 675 00:34:21,270 --> 00:34:26,040 and let's say 1 to 20 milliseconds or even 32 milliseconds, somewhere 676 00:34:26,040 --> 00:34:29,370 in there is our average, and that can vary over time. 677 00:34:29,370 --> 00:34:32,489 But that's pretty fast, and indeed, even though it took a moment 678 00:34:32,489 --> 00:34:36,120 to run this whole test, this is why an email can be sent from your computer 679 00:34:36,120 --> 00:34:38,460 and be received nearly instantly by someone 680 00:34:38,460 --> 00:34:41,760 around the world, because at the end of the day, we're limited, really, 681 00:34:41,760 --> 00:34:44,880 ultimately, by the speed of light and little more. 
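The speed-of-light limit can be turned into a rough back-of-the-envelope number. The figures below are assumptions for illustration rather than values from the lecture: light in glass fiber is taken to travel at about two thirds of its vacuum speed, and 20,000 km stands in for roughly halfway around the Earth. Since traceroute reports round-trip times, the sketch doubles the one-way delay.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # in a vacuum
FIBER_FRACTION = 2 / 3             # assumed: light in fiber travels at roughly 2/3 c
HALFWAY_KM = 20_000                # assumed: about half the Earth's circumference


def min_round_trip_ms(distance_km, fraction=FIBER_FRACTION):
    """Lower bound on round-trip time over a straight cable of this length."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * fraction)
    return 2 * one_way_s * 1000


print(round(min_round_trip_ms(HALFWAY_KM)))                # 200
print(round(min_round_trip_ms(HALFWAY_KM, fraction=1.0)))  # 133
```

Real paths are longer than a straight line and every router adds delay, so measured times sit above this bound, but the bound shows why no amount of hardware makes a distant server feel as close as a nearby one.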
682 00:34:44,880 --> 00:34:48,449 Well, to be fair, hardware and cost and everything in between, 683 00:34:48,449 --> 00:34:52,469 but you can certainly transmit your data faster than you can yourself. 684 00:34:52,469 --> 00:34:55,139 But what if we want to go farther away than gmail.com? 685 00:34:55,139 --> 00:34:57,690 Odds are they probably do have servers in California, 686 00:34:57,690 --> 00:35:01,380 but probably here on the east coast of the US as well, let alone abroad. 687 00:35:01,380 --> 00:35:04,050 What if I deliberately try to access a domain that is, 688 00:35:04,050 --> 00:35:06,150 in fact, abroad and go there? 689 00:35:06,150 --> 00:35:09,160 Well, let me go ahead and visit via traceroute, 690 00:35:09,160 --> 00:35:16,230 say, www.cnn.co.jp, the domain name for CNN's Japanese website. 691 00:35:16,230 --> 00:35:18,390 And then we'll add just dash q1 this time 692 00:35:18,390 --> 00:35:22,020 at the end, which is fine, too, to query the server just once. 693 00:35:22,020 --> 00:35:24,360 And here we see the sequence of steps, one 694 00:35:24,360 --> 00:35:27,900 after another, whereby the data's leaving my laptop and in turn campus, 695 00:35:27,900 --> 00:35:30,300 and then we see some anonymous routers in between. 696 00:35:30,300 --> 00:35:36,540 But the 30th there seems to be just in time, because within it 697 00:35:36,540 --> 00:35:40,710 seems 178 milliseconds do we make our way to Japan. 698 00:35:40,710 --> 00:35:44,220 Now that's quite a few milliseconds more, but that rather makes sense. 
699 00:35:44,220 --> 00:35:47,400 Whereas it might take one to 20 to 32 milliseconds 700 00:35:47,400 --> 00:35:50,940 to get from here to Gmail either on the east coast or west coast, 701 00:35:50,940 --> 00:35:54,120 I'm kind of not surprised that it takes an order of magnitude 702 00:35:54,120 --> 00:35:59,070 more, almost a factor of 10, to get to Japan, because there's not only 703 00:35:59,070 --> 00:36:01,920 a whole continent between us here in Cambridge and Japan, 704 00:36:01,920 --> 00:36:04,770 there's also an entire Pacific Ocean between us. 705 00:36:04,770 --> 00:36:09,030 And indeed, there are Transatlantic, Transpacific, and transoceanic cables 706 00:36:09,030 --> 00:36:12,000 all around the world these days that actually transmit our data, 707 00:36:12,000 --> 00:36:15,450 not to mention all of the wireless technologies we have, satellites 708 00:36:15,450 --> 00:36:16,385 and below. 709 00:36:16,385 --> 00:36:19,260 And so it does stand to reason that even though none of these routers 710 00:36:19,260 --> 00:36:22,230 were paying attention to me at that moment for privacy sake, 711 00:36:22,230 --> 00:36:26,460 this last one indicates that 200 milliseconds later we can get halfway 712 00:36:26,460 --> 00:36:28,950 across the world digitally. 713 00:36:28,950 --> 00:36:32,310 And so that does rather speak to just how quickly these low level 714 00:36:32,310 --> 00:36:35,340 primitives operate, and we can talk far longer 715 00:36:35,340 --> 00:36:38,400 about how these things work than it actually takes time 716 00:36:38,400 --> 00:36:40,890 to actually get the data there. 717 00:36:40,890 --> 00:36:44,940 So then together we have TCP/IP. Via DHCP can we 718 00:36:44,940 --> 00:36:49,410 get the addresses that we need to use to address my envelopes and others, 719 00:36:49,410 --> 00:36:49,980 as well. 720 00:36:49,980 --> 00:36:54,180 Via DNS can we convert those domain names into IP addresses and even back.
721 00:36:54,180 --> 00:36:56,910 And those internet technologies are ultimately 722 00:36:56,910 --> 00:37:02,890 what govern how our data gets from point A to point B. But what is the data? 723 00:37:02,890 --> 00:37:05,700 Indeed, everything thus far is really just metadata. 724 00:37:05,700 --> 00:37:09,150 Information that helps our actual data that we care about get from one 725 00:37:09,150 --> 00:37:10,192 point to another. 726 00:37:10,192 --> 00:37:12,900 But it's the data at the end of the day that I really care about. 727 00:37:12,900 --> 00:37:16,140 The contents of my email, the contents of my chat message, 728 00:37:16,140 --> 00:37:19,860 the voice that I'm sending over a video conference, or even just 729 00:37:19,860 --> 00:37:21,275 the contents of a web page. 730 00:37:21,275 --> 00:37:24,150 Indeed, perhaps the most popular service that you and I use every day 731 00:37:24,150 --> 00:37:27,270 is just that, pulling up pages on the web. 732 00:37:27,270 --> 00:37:32,250 So just how is a web page specifically requested and received? 733 00:37:32,250 --> 00:37:36,300 Well, it turns out that http:// that you've surely seen, 734 00:37:36,300 --> 00:37:39,450 but probably not typed for some time, because your browser, odds are, 735 00:37:39,450 --> 00:37:42,840 just inserts it automatically or even invisibly for you. 736 00:37:42,840 --> 00:37:47,970 That HTTP is yet another protocol in this stack of internet technologies. 737 00:37:47,970 --> 00:37:50,550 Hypertext transfer protocol. 738 00:37:50,550 --> 00:37:55,830 A set of conventions that browsers and web servers have agreed upon long ago 739 00:37:55,830 --> 00:37:57,900 to use when intercommunicating. 740 00:37:57,900 --> 00:38:00,540 And to be clear, then, what exactly is a protocol? 741 00:38:00,540 --> 00:38:01,760 Well, it's just a convention. 742 00:38:01,760 --> 00:38:04,620 We humans have protocols even though we might not call them such. 
743 00:38:04,620 --> 00:38:07,453 When I meet someone new on the street I might reach out to him or her 744 00:38:07,453 --> 00:38:09,420 and say, hello, my name is David. 745 00:38:09,420 --> 00:38:13,020 And that protocol results in that other person, 746 00:38:13,020 --> 00:38:16,770 if polite, extending their hand too, reaching into mine 747 00:38:16,770 --> 00:38:20,430 and probably saying as well, hello, nice to meet you or how are you. 748 00:38:20,430 --> 00:38:23,190 That's a human protocol that we were taught some time ago, 749 00:38:23,190 --> 00:38:26,820 and culturally we have all agreed here in the US to, generally speaking, 750 00:38:26,820 --> 00:38:28,740 greet each other in that manner. 751 00:38:28,740 --> 00:38:31,020 Computers, similarly, have standardized what 752 00:38:31,020 --> 00:38:35,850 goes not only on the outside of these envelopes but what goes in the inside, 753 00:38:35,850 --> 00:38:37,020 as well. 754 00:38:37,020 --> 00:38:39,270 And so if, for instance, the goal at hand 755 00:38:39,270 --> 00:38:44,490 is to request a web page of a canonical website like www.example.com, 756 00:38:44,490 --> 00:38:47,610 let's consider exactly what is inside of this envelope. 757 00:38:47,610 --> 00:38:51,810 Well, first of all here we have a proper URL, uniform resource locator. 758 00:38:51,810 --> 00:38:56,010 These days, your browser, whether it's Chrome or Safari or Edge or Firefox, 759 00:38:56,010 --> 00:38:58,950 probably doesn't even show you all of this information. 760 00:38:58,950 --> 00:39:02,280 In the interests of simpler user interfaces or UIs, 761 00:39:02,280 --> 00:39:05,370 browsers have started to hide the so-called protocol here 762 00:39:05,370 --> 00:39:09,210 at the left, even the www here, the hostname in the middle, 763 00:39:09,210 --> 00:39:12,780 leaving you oftentimes with just example.com or the equivalent 764 00:39:12,780 --> 00:39:14,657 somewhere at the top of your screen.
765 00:39:14,657 --> 00:39:16,740 But if you click on that address, typically you'll 766 00:39:16,740 --> 00:39:18,930 see more information such as that here. 767 00:39:18,930 --> 00:39:22,020 And sometimes there's more information that's just implicit. 768 00:39:22,020 --> 00:39:26,670 It turns out if you try to visit http://www.example.com 769 00:39:26,670 --> 00:39:31,310 or any similar domain name, what you're likely reaching for is 770 00:39:31,310 --> 00:39:33,290 a very specific file on that server. 771 00:39:33,290 --> 00:39:34,550 But how do we reach it? 772 00:39:34,550 --> 00:39:37,850 Well, highlighted in yellow here is what's called the domain name itself, 773 00:39:37,850 --> 00:39:39,260 example.com. 774 00:39:39,260 --> 00:39:41,690 This is something that you buy, or really rent, 775 00:39:41,690 --> 00:39:45,620 on an annual basis via an internet registrar, a company, that 776 00:39:45,620 --> 00:39:48,860 via the associations on the internet that govern IP addresses 777 00:39:48,860 --> 00:39:51,560 and domain names has been authorized to sell, or really 778 00:39:51,560 --> 00:39:55,490 rent, you and anyone else a domain name for some amount of time, 779 00:39:55,490 --> 00:39:58,280 usually one year or two years or 10 or anywhere 780 00:39:58,280 --> 00:40:00,860 in between, for some dollar amount. 781 00:40:00,860 --> 00:40:03,680 And what you get, then, is the ability, for that amount of time 782 00:40:03,680 --> 00:40:07,250 renewable thereafter, to use that specific domain name. 783 00:40:07,250 --> 00:40:10,070 It might be dot com, or dot net or dot org, 784 00:40:10,070 --> 00:40:15,050 or any number of hundreds of others of TLDs, or top level domains. 785 00:40:15,050 --> 00:40:18,943 Indeed, that suffix there is what represents the type of website, 786 00:40:18,943 --> 00:40:20,360 at least historically, that it is.
787 00:40:20,360 --> 00:40:24,680 Dot com for commercial, dot net for network, dot edu for education, 788 00:40:24,680 --> 00:40:26,460 or dot gov for government. 789 00:40:26,460 --> 00:40:28,820 Of course, all of those TLDs, or top level domains, 790 00:40:28,820 --> 00:40:32,120 were very US centric by design, insofar as it 791 00:40:32,120 --> 00:40:36,350 was generally a cohort of Americans that designed a lot of this system 792 00:40:36,350 --> 00:40:37,070 initially. 793 00:40:37,070 --> 00:40:39,890 Of course, other countries have shorter TLDs. 794 00:40:39,890 --> 00:40:44,150 Country codes, dot US, dot JP and others that signify 795 00:40:44,150 --> 00:40:46,100 the specific country that they're in. 796 00:40:46,100 --> 00:40:49,430 And these days anyone can buy a dot com or dot net, 797 00:40:49,430 --> 00:40:53,930 but not everyone can buy a dot gov or dot edu, or several other top level 798 00:40:53,930 --> 00:40:54,650 domains, as well. 799 00:40:54,650 --> 00:40:58,700 It depends on whoever controls that particular suffix. 800 00:40:58,700 --> 00:41:03,380 This here we might call the hostname, the name of the specific server 801 00:41:03,380 --> 00:41:06,380 that you were trying to visit that lives within that domain name. 802 00:41:06,380 --> 00:41:09,050 In other contexts, you might call this a subdomain, 803 00:41:09,050 --> 00:41:12,770 indicating what subdivision of a company or university you're actually 804 00:41:12,770 --> 00:41:14,450 trying to access. 805 00:41:14,450 --> 00:41:19,070 And then down here on the right, implicitly so to speak, is a file name. 806 00:41:19,070 --> 00:41:21,830 It is human convention, but not required, that the name 807 00:41:21,830 --> 00:41:27,440 of the file that contains the web page that a server serves up by default, 808 00:41:27,440 --> 00:41:30,350 happens to be traditionally index.html.
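As a rough sketch of the URL anatomy just described, the Python below splits a URL into protocol, hostname, domain name, TLD, and path. The function name `describe_url` is made up for illustration, and treating the last two labels as the domain name is a simplifying assumption: it works for example.com or harvard.edu, but suffixes such as .co.uk would need a proper public-suffix list.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts named in the lecture (naive sketch)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "protocol": parts.scheme,
        # Assumption: anything to the left of the last two labels is the hostname.
        "hostname": ".".join(labels[:-2]),
        # Assumption: the domain name is exactly the last two labels.
        "domain": ".".join(labels[-2:]),
        "tld": labels[-1],
        # An empty path means the default page, conventionally "/".
        "path": parts.path or "/",
    }


if __name__ == "__main__":
    print(describe_url("http://www.example.com/"))
    print(describe_url("http://harvard.edu"))
```

Note that the default file name (index.html or otherwise) never appears in the URL here; which file a bare slash maps to is decided by the server.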
809 00:41:30,350 --> 00:41:35,060 It could also be index.htm or any number of other names or extensions, 810 00:41:35,060 --> 00:41:37,140 but this is among the most common. 811 00:41:37,140 --> 00:41:40,450 So if you don't mention that via just a slash, it's implied, 812 00:41:40,450 --> 00:41:43,070 and it's that file or any other file, that's 813 00:41:43,070 --> 00:41:48,230 implied or even specified explicitly that is inside of this envelope. 814 00:41:48,230 --> 00:41:51,740 That's the whole point of this virtual packet of information, 815 00:41:51,740 --> 00:41:56,450 to encapsulate the request for a page and the actual page itself. 816 00:41:56,450 --> 00:41:59,600 At the end of the day, it's HTML, Hypertext Markup Language, 817 00:41:59,600 --> 00:42:03,620 an actual language in which pages are written, that's inside that envelope, 818 00:42:03,620 --> 00:42:06,740 but it's transmitted there via HTTP. 819 00:42:06,740 --> 00:42:10,820 The protocol, the set of conventions via which browser and server agree 820 00:42:10,820 --> 00:42:14,850 to send and receive that information. 821 00:42:14,850 --> 00:42:16,770 So what does that information look like? 822 00:42:16,770 --> 00:42:19,670 And just what have these computers agreed on? 823 00:42:19,670 --> 00:42:22,400 It turns out that inside of this envelope, 824 00:42:22,400 --> 00:42:27,080 when it represents a request for a web page like my URL there, 825 00:42:27,080 --> 00:42:29,030 are these lines here. 826 00:42:29,030 --> 00:42:33,230 GET / HTTP/1.1, where GET is clearly a verb, 827 00:42:33,230 --> 00:42:35,900 by definition in all caps in this protocol, 828 00:42:35,900 --> 00:42:40,500 slash means the default page of the website index.html or something else. 829 00:42:40,500 --> 00:42:44,060 And then often a mention of Host colon and then the name of the host 830 00:42:44,060 --> 00:42:45,650 that you're actually looking for.
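To make those lines concrete, here is a minimal Python sketch of the text a browser might place inside the envelope. The helper name `build_request` is hypothetical, and the `Connection: close` header is an extra assumption added for illustration; real browsers send many more headers than this.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Assemble the text of a bare-bones HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, path, protocol version
        f"Host: {host}",         # which website on this server we want
        "Connection: close",     # assumption: ask the server to hang up afterwards
        "",                      # blank line marks the end of the headers
        "",
    ]
    # HTTP/1.1 separates lines with carriage return plus line feed.
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_request("www.example.com").decode("ascii"))
```

Nothing is sent over the network here; the function only builds the bytes that would travel inside a TCP/IP envelope.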
831 00:42:45,650 --> 00:42:47,990 Because it turns out servers can do so many things. 832 00:42:47,990 --> 00:42:51,110 Not just Google servers with voice and chat and other services, 833 00:42:51,110 --> 00:42:54,410 one web server can actually serve up multiple websites. 834 00:42:54,410 --> 00:42:59,660 Example.com, acme.com, Harvard.edu, google.com, all of us 835 00:42:59,660 --> 00:43:04,070 can actually have shared tenancies, so to speak, on the same server in theory. 836 00:43:04,070 --> 00:43:08,510 And so by mentioning what actual website you want inside of the envelope, 837 00:43:08,510 --> 00:43:12,560 the recipient of this envelope can make sure that it serves you my home page 838 00:43:12,560 --> 00:43:14,030 and not someone else's. 839 00:43:14,030 --> 00:43:17,210 But beyond that, there needs to be additional information, as well. 840 00:43:17,210 --> 00:43:20,030 You might explicitly specify the name of the file. 841 00:43:20,030 --> 00:43:23,750 And again, we humans have nothing to do with any of this, ultimately, 842 00:43:23,750 --> 00:43:25,760 we have just typed that URL. 843 00:43:25,760 --> 00:43:29,000 But it's our browser, on macOS or Windows or phones, 844 00:43:29,000 --> 00:43:32,870 that's packaging up this information inside of a virtual envelope 845 00:43:32,870 --> 00:43:36,470 and sending it out, ultimately, on our behalf. 846 00:43:36,470 --> 00:43:40,430 And indeed, if all goes well and that envelope reaches point B 847 00:43:40,430 --> 00:43:44,780 and it's opened up and it represents the name of a web page that does, 848 00:43:44,780 --> 00:43:47,540 in fact, exist, the response that I hope to get back 849 00:43:47,540 --> 00:43:51,230 in another envelope from point B to point A 850 00:43:51,230 --> 00:43:54,590 is going to contain an HTTP message like this. 851 00:43:54,590 --> 00:43:59,480 Literally the name of the protocol again, HTTP/1.1, and then a number, 852 00:43:59,480 --> 00:44:01,140 and optionally a phrase.
853 00:44:01,140 --> 00:44:03,800 200 is perhaps a number you've never actually seen, 854 00:44:03,800 --> 00:44:06,490 even though it is the best possible response to get. 855 00:44:06,490 --> 00:44:09,560 200 means, quite literally, OK. 856 00:44:09,560 --> 00:44:11,480 The web page you requested has been found 857 00:44:11,480 --> 00:44:15,170 and has been delivered in this response envelope, OK. 858 00:44:15,170 --> 00:44:19,220 The type of content you've received is in this case text/html. 859 00:44:19,220 --> 00:44:23,660 Which is to say inside of that envelope is a clue to your browser 860 00:44:23,660 --> 00:44:26,240 what kind of content is inside deeper. 861 00:44:26,240 --> 00:44:29,120 Is it text/html, like the contents of a web page? 862 00:44:29,120 --> 00:44:33,570 Is it an image/png like a graphic, or image/gif, something animated, 863 00:44:33,570 --> 00:44:39,720 or video/mp4, an actual video file, this so-called MIME type or content 864 00:44:39,720 --> 00:44:43,080 type is inside of the envelope for your browser so as to provide a hint, 865 00:44:43,080 --> 00:44:45,660 so as to know how to display it on the screen. 866 00:44:45,660 --> 00:44:48,750 There's so many other headers, as well, but these two alone 867 00:44:48,750 --> 00:44:51,330 really specify almost as much information 868 00:44:51,330 --> 00:44:54,950 as you need in order to render that response for the user. 869 00:44:54,950 --> 00:44:56,700 Now as an aside, there are other versions. 870 00:44:56,700 --> 00:44:59,370 And increasingly in vogue, though not yet omnipresent, 871 00:44:59,370 --> 00:45:03,450 is HTTP/2 which has additional features, particularly for performance 872 00:45:03,450 --> 00:45:05,820 and getting data to you even more quickly. 873 00:45:05,820 --> 00:45:08,580 It simply replaces that 1.1 with a 2 and the response, 874 00:45:08,580 --> 00:45:11,880 though, comes back almost the same.
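The status line and the Content-Type header described above can be pulled apart with a few lines of Python. This is only a sketch under stated assumptions: the function name `parse_response_head` and the sample bytes are invented for illustration, and a real client would also handle repeated headers, character sets, and the HTTP/2 binary format, none of which is attempted here.

```python
def parse_response_head(raw: bytes):
    """Return (version, status code, phrase, headers) from a raw HTTP/1.1 response."""
    # Headers end at the first blank line; whatever follows is the body.
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, a number, and optionally a phrase.
    version, code, *phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), (phrase[0] if phrase else ""), headers


if __name__ == "__main__":
    # Made-up sample response, not captured from any real server.
    sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
    version, code, phrase, headers = parse_response_head(sample)
    print(version, code, phrase, headers["content-type"])
```

Header names are lowercased in the dictionary because HTTP treats them as case-insensitive.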
875 00:45:11,880 --> 00:45:15,840 So let's consider an example then, such as harvard.edu. 876 00:45:15,840 --> 00:45:23,310 It turns out that http://harvard.edu is not where Harvard wants you to be. 877 00:45:23,310 --> 00:45:25,890 In fact, let me go ahead and pull up my browser here 878 00:45:25,890 --> 00:45:28,630 and visit precisely that URL. 879 00:45:28,630 --> 00:45:33,480 http://harvard.edu, Enter. 880 00:45:33,480 --> 00:45:37,200 And within seconds do I find myself not at harvard.edu, but rather 881 00:45:37,200 --> 00:45:45,480 at www.harvard.edu and moreover at https://www.harvard.edu. 882 00:45:45,480 --> 00:45:48,990 In other words, even though I specified a protocol of HTTP, 883 00:45:48,990 --> 00:45:53,320 a domain name of harvard.edu, and no hostname, so to speak, 884 00:45:53,320 --> 00:45:55,890 I have actually been whisked away, seemingly magically, 885 00:45:55,890 --> 00:46:01,260 to this URL instead, for reasons both technical and perhaps marketing alike. 886 00:46:01,260 --> 00:46:05,250 For today, though, let's focus on exactly how this came to pass. 887 00:46:05,250 --> 00:46:08,100 Well, it turns out that inside of the envelope with which Harvard, 888 00:46:08,100 --> 00:46:11,850 or any server, replies to me can be additional metadata, as well. 889 00:46:11,850 --> 00:46:15,180 Not just 200 OK, but really the equivalent of uh-uh, 890 00:46:15,180 --> 00:46:18,360 there's nothing to see here, go here instead. 891 00:46:18,360 --> 00:46:22,410 So let me go ahead and run a program, again in that black and white window 892 00:46:22,410 --> 00:46:24,480 known as my terminal window, whereby I can 893 00:46:24,480 --> 00:46:27,240 pretend to be a browser without all of the graphics 894 00:46:27,240 --> 00:46:29,970 and without all of the distraction and focus only 895 00:46:29,970 --> 00:46:32,760 on the contents of those digital envelopes.
896 00:46:32,760 --> 00:46:36,840 Here the program I'm going to run is called curl for connect to a URL, 897 00:46:36,840 --> 00:46:41,820 and I'm going to specify dash I which is to say I only want the HTTP headers. 898 00:46:41,820 --> 00:46:48,240 I'm going to go ahead now and say http://harvard.edu, nothing more. 899 00:46:48,240 --> 00:46:51,030 When I hit Enter now, here are the complete headers 900 00:46:51,030 --> 00:46:53,100 that come back from the server. 901 00:46:53,100 --> 00:46:56,130 No dot dot dot this time, we see everything, in fact, here, 902 00:46:56,130 --> 00:46:57,510 but notice the first line. 903 00:46:57,510 --> 00:47:02,100 It's not 200 OK, but rather 301 moved permanently. 904 00:47:02,100 --> 00:47:03,510 Like, where did Harvard go? 905 00:47:03,510 --> 00:47:07,410 Well, it turns out that Harvard has specified its new location 906 00:47:07,410 --> 00:47:13,950 down here as https://www.harvard.edu. 907 00:47:13,950 --> 00:47:16,080 Now there's other lines of headers there, 908 00:47:16,080 --> 00:47:19,410 HTTP headers as they're called, each of which starts with a word, 909 00:47:19,410 --> 00:47:23,010 perhaps with some punctuation, and a colon, followed by the value. 910 00:47:23,010 --> 00:47:27,930 Location, value, go to this location is the general paradigm there. 911 00:47:27,930 --> 00:47:31,650 But why might Harvard not want to show me their web page at the address 912 00:47:31,650 --> 00:47:32,670 that I typed? 913 00:47:32,670 --> 00:47:36,420 Well, it turns out that HTTP is by definition insecure. 914 00:47:36,420 --> 00:47:39,210 The extent to which the message is encoded 915 00:47:39,210 --> 00:47:41,740 is quite literally in English or English-like syntax, 916 00:47:41,740 --> 00:47:43,980 such as that we've been looking at here. 917 00:47:43,980 --> 00:47:47,050 It's just that text that's inside the envelope.
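The 301 and Location header in that curl output suggest how a client can act on a redirect. The Python below is a sketch of that loop, not how any particular browser is written: the names `follow` and `fake_fetch` are hypothetical, and the canned responses stand in for a real network so the example runs offline. Only the 301 status and the https://www.harvard.edu location come from the lecture; everything else in the table is made up.

```python
# Canned (status, headers) pairs keyed by URL; a stand-in for the real network.
CANNED = {
    "http://harvard.edu/": (301, {"Location": "https://www.harvard.edu/"}),
    "https://www.harvard.edu/": (200, {"Content-Type": "text/html"}),
}


def fake_fetch(url):
    """Pretend to request a URL and return (status code, headers)."""
    return CANNED[url]


def follow(url, fetch, max_hops=5):
    """Follow Location headers until a non-redirect status comes back."""
    hops = [url]
    for _ in range(max_hops):
        status, headers = fetch(url)
        if status in (301, 302) and "Location" in headers:
            url = headers["Location"]  # go where the server says instead
            hops.append(url)
            continue
        return status, hops
    # Guard against two URLs that redirect to each other forever.
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    print(follow("http://harvard.edu/", fake_fetch))
```

The hop limit is a design choice for the sketch: without it, a misconfigured pair of redirects would loop indefinitely.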
918 00:47:47,050 --> 00:47:50,850 If instead, though, you want to encrypt those contents so that no one knows 919 00:47:50,850 --> 00:47:53,370 what web page you're requesting or receiving, 920 00:47:53,370 --> 00:47:57,690 and your employer and your university administrator or your internet service 921 00:47:57,690 --> 00:48:00,900 provider or country does not know what you're doing, 922 00:48:00,900 --> 00:48:04,380 be it for personal reasons, financial, or otherwise, well then 923 00:48:04,380 --> 00:48:06,300 you want to use HTTPS. 924 00:48:06,300 --> 00:48:09,030 And Harvard University, like so many companies today, 925 00:48:09,030 --> 00:48:12,030 is insistent that you actually visit them securely, 926 00:48:12,030 --> 00:48:14,850 if only because it's best practice, but it also 927 00:48:14,850 --> 00:48:18,390 prevents potentially private information from leaking. 928 00:48:18,390 --> 00:48:20,640 And so here with this location line is Harvard saying, 929 00:48:20,640 --> 00:48:24,930 no, we will not respond to you with OK via HTTP, 930 00:48:24,930 --> 00:48:28,980 we have moved permanently to a secure address at HTTPS, 931 00:48:28,980 --> 00:48:31,020 where the S denotes secure. 932 00:48:31,020 --> 00:48:32,320 But why the www? 933 00:48:32,320 --> 00:48:34,710 Back in the day, you probably did have to type 934 00:48:34,710 --> 00:48:40,230 for many companies, www.example.com instead of just going to example.com 935 00:48:40,230 --> 00:48:43,200 and hoping that you end up in the right place. 
936 00:48:43,200 --> 00:48:45,780 Well, humans have gotten more comfortable with the internet 937 00:48:45,780 --> 00:48:48,240 over the past years, over the past decades, and indeed, 938 00:48:48,240 --> 00:48:52,080 whereas years ago, in order to advertise yourself effectively on the web, 939 00:48:52,080 --> 00:48:54,810 you might have indeed needed to go to press on your business card 940 00:48:54,810 --> 00:49:00,500 or advertisement with http://www.something.com. 941 00:49:00,500 --> 00:49:05,505 But all of us have kind of seen HTTP enough, if not HTTPS as well, 942 00:49:05,505 --> 00:49:07,130 you don't need to tell me to type that. 943 00:49:07,130 --> 00:49:09,660 And indeed, my browser no longer requires me to type 944 00:49:09,660 --> 00:49:13,080 that, so now you see business cards and advertisements 945 00:49:13,080 --> 00:49:15,805 with just www.something.com. 946 00:49:15,805 --> 00:49:18,480 But you know what, I'm not new to the internet. 947 00:49:18,480 --> 00:49:22,140 I know what www is, and I know what dot com is as well, 948 00:49:22,140 --> 00:49:27,780 don't even bother showing me or telling me on your card or your website or ad 949 00:49:27,780 --> 00:49:32,740 that it's www.something.com, just tell me something.com. 950 00:49:32,740 --> 00:49:35,328 And so browsers have been getting more user friendly 951 00:49:35,328 --> 00:49:37,120 and humans have been getting more familiar, 952 00:49:37,120 --> 00:49:40,540 and so we tend not to see those prefixes anymore. 953 00:49:40,540 --> 00:49:44,500 But it turns out that for technical reasons, for security reasons, 954 00:49:44,500 --> 00:49:47,370 it tends to be useful to have a subdomain. 955 00:49:47,370 --> 00:49:49,120 As an aside, for things like cookies, it's 956 00:49:49,120 --> 00:49:51,520 useful to keep cookies in a subdomain as opposed 957 00:49:51,520 --> 00:49:56,330 to the domain itself just to narrow the scope via which they can be accessed.
958 00:49:56,330 --> 00:49:58,960 But also for marketing sake, it would be nice 959 00:49:58,960 --> 00:50:04,120 if everyone in the world, whether they type harvard.edu or www.harvard.edu, 960 00:50:04,120 --> 00:50:08,020 ultimately end up in the same location just because that's how we 961 00:50:08,020 --> 00:50:09,940 want to present ourselves to the world. 962 00:50:09,940 --> 00:50:13,150 And so for both technical and marketing and security reasons alike might 963 00:50:13,150 --> 00:50:18,100 Harvard or a company want to redirect to a URL like this one here. 964 00:50:18,100 --> 00:50:19,810 Now what does your browser know to do? 965 00:50:19,810 --> 00:50:23,440 Well, when your browser receives not 200 OK, in which case 966 00:50:23,440 --> 00:50:27,460 it just shows you the page, but instead receives 301 moved permanently, 967 00:50:27,460 --> 00:50:31,390 it instead looks for that location line and takes you there instead, 968 00:50:31,390 --> 00:50:35,920 at which point then you'll get that 200 OK. 969 00:50:35,920 --> 00:50:38,350 And so this, again, is what browsers do. 970 00:50:38,350 --> 00:50:42,820 HTTP is what they understand, and know by definition of that protocol 971 00:50:42,820 --> 00:50:45,160 how to handle these cases. 972 00:50:45,160 --> 00:50:50,140 But not everything is always OK and not always has something moved permanently. 973 00:50:50,140 --> 00:50:52,670 Sometimes something's just not found. 974 00:50:52,670 --> 00:50:55,210 And in fact, of all of these numbers we've seen thus far, 975 00:50:55,210 --> 00:51:00,160 odds are you've not seen or cared about 200 or even 301, but most of us 976 00:51:00,160 --> 00:51:03,070 have probably at least once seen 404. 977 00:51:03,070 --> 00:51:04,060 Why?
978 00:51:04,060 --> 00:51:06,640 Why in the world is that the number we somehow 979 00:51:06,640 --> 00:51:11,140 see anytime you visit a web page that's gone, or anytime you mistype an address 980 00:51:11,140 --> 00:51:12,610 and you reach a dead end? 981 00:51:12,610 --> 00:51:16,300 Well, for better or for worse the designers of websites for years 982 00:51:16,300 --> 00:51:20,360 have exposed this value to end users even though it's not all that useful. 983 00:51:20,360 --> 00:51:22,720 But it's indeed the unique value that humans 984 00:51:22,720 --> 00:51:27,550 decided some years ago would uniquely represent the notion of a page 985 00:51:27,550 --> 00:51:28,720 not being found. 986 00:51:28,720 --> 00:51:33,070 So if inside of that virtual envelope comes back a message 404 not found, 987 00:51:33,070 --> 00:51:36,030 the browser can say that literally or perhaps display 988 00:51:36,030 --> 00:51:41,140 a cute message to that effect, but the reason that you're seeing that 404 989 00:51:41,140 --> 00:51:44,200 is because quite literally and mind numbingly that 990 00:51:44,200 --> 00:51:49,480 is just the low level status code that has come back from an HTTP server. 991 00:51:49,480 --> 00:51:50,890 And there's more of these, too. 992 00:51:50,890 --> 00:51:53,260 In fact, 200 OK is the best you might get. 993 00:51:53,260 --> 00:51:55,300 301 moved permanently we've seen. 994 00:51:55,300 --> 00:51:57,910 302 found is another form of redirection, 995 00:51:57,910 --> 00:51:59,640 but a temporary one instead. 996 00:51:59,640 --> 00:52:04,480 304 not modified is a response that a server can send for efficiency. 
997 00:52:04,480 --> 00:52:07,240 If you visited a web page just a moment ago 998 00:52:07,240 --> 00:52:09,520 and you happen to hit reload or click on a link 999 00:52:09,520 --> 00:52:13,060 and get back the same content again, it's not terribly efficient or good 1000 00:52:13,060 --> 00:52:16,750 business for a company to incur the time and perhaps financial cost 1001 00:52:16,750 --> 00:52:20,740 to retransmit all of those bits to you, and so it might instead 1002 00:52:20,740 --> 00:52:25,540 respond with an envelope more succinctly with 304 not modified 1003 00:52:25,540 --> 00:52:30,310 without anything else deeper in that envelope, no additional content. 1004 00:52:30,310 --> 00:52:34,030 And so this way your browser will just rely on its own cache, its own copy, 1005 00:52:34,030 --> 00:52:36,550 so to speak, of the original request. 1006 00:52:36,550 --> 00:52:39,990 Meanwhile, if you're not allowed to visit some web page because you've not 1007 00:52:39,990 --> 00:52:41,990 logged in or you don't have authorization there, 1008 00:52:41,990 --> 00:52:45,400 too, well 401 unauthorized might instead come back. 1009 00:52:45,400 --> 00:52:46,960 As might 403 forbidden. 1010 00:52:46,960 --> 00:52:49,420 404 not found means there's just nothing there. 1011 00:52:49,420 --> 00:52:52,750 418 I'm a teapot was an April Fool's joke some years 1012 00:52:52,750 --> 00:52:55,270 ago where someone went to the lengths of actually writing 1013 00:52:55,270 --> 00:53:00,340 a formal specification for what a server should say when it is in fact a teapot.
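The codes listed so far follow a pattern: the first digit names a family. As a small sketch, Python's standard library already carries this table in `http.HTTPStatus`; the helper `describe` and the family labels below are my own wording for illustration, not part of the protocol text.

```python
from http import HTTPStatus

# Assumption: these family labels are informal descriptions chosen for this sketch.
FAMILIES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}


def describe(code: int) -> str:
    """Return the code, its standard phrase, and the family it belongs to."""
    status = HTTPStatus(code)
    family = FAMILIES.get(code // 100, "other")
    return f"{code} {status.phrase} ({family})"


if __name__ == "__main__":
    for code in (200, 301, 302, 304, 401, 403, 404):
        print(describe(code))
```

So a client that has never heard of a particular code can still react sensibly by looking at its first digit alone.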
1014 00:53:00,340 --> 00:53:03,350 But the worst error you might see, and most users would never see this, 1015 00:53:03,350 --> 00:53:08,620 but developers of software would is five zero zero, 500, which 1016 00:53:08,620 --> 00:53:12,520 represents an internal server error, and almost always represents 1017 00:53:12,520 --> 00:53:17,290 a logical or a syntactic error in the code that someone has written, 1018 00:53:17,290 --> 00:53:21,410 be it in Python or any number of other languages. 1019 00:53:21,410 --> 00:53:24,340 And now a fun example, perhaps, to bring all this home. 1020 00:53:24,340 --> 00:53:28,510 It turns out that safetyschool.org is an actual address on the web. 1021 00:53:28,510 --> 00:53:30,940 And indeed, it happens to have been bought or rented 1022 00:53:30,940 --> 00:53:33,790 for years now by some Harvard alum. 1023 00:53:33,790 --> 00:53:36,580 And indeed, if you visit safetyschool.org, 1024 00:53:36,580 --> 00:53:40,330 you shall find yourself at this website here. 1025 00:53:40,330 --> 00:53:44,170 http://safetyschool.org. 1026 00:53:44,170 --> 00:53:49,060 We find ourselves whisked away to www.yale.edu. 1027 00:53:49,060 --> 00:53:50,320 But how is that implemented? 1028 00:53:50,320 --> 00:53:52,420 Well, let's again turn to our terminal window, 1029 00:53:52,420 --> 00:53:55,870 where we can see really the contents of that virtual envelope. 1030 00:53:55,870 --> 00:53:59,710 And if in here in my terminal window I again type curl dash I, 1031 00:53:59,710 --> 00:54:05,470 http://safetyschool.org, well I see all of the headers 1032 00:54:05,470 --> 00:54:07,300 that are exactly coming back. 1033 00:54:07,300 --> 00:54:11,800 And indeed, here, safetyschool.org has permanently moved for years now 1034 00:54:11,800 --> 00:54:17,740 to this location, http://www.yale.edu. 1035 00:54:17,740 --> 00:54:21,220 A fun jab at our rivals that some alum has been paying now 1036 00:54:21,220 --> 00:54:25,760 for years on an annual basis.
1037 00:54:25,760 --> 00:54:29,140 So we now have a pair of protocols, TCP and IP, 1038 00:54:29,140 --> 00:54:32,020 via which we can get data, any data, from point A 1039 00:54:32,020 --> 00:54:33,890 to point B on the internet. 1040 00:54:33,890 --> 00:54:38,800 Sometimes that data is itself HTTP data that is a request for a web page 1041 00:54:38,800 --> 00:54:41,080 or a response with a web page. 1042 00:54:41,080 --> 00:54:45,880 But what if there are so many others trying to access data at point B-- 1043 00:54:45,880 --> 00:54:49,360 that is to say, business is good, and a web server out there is receiving 1044 00:54:49,360 --> 00:54:54,400 so many packets per second that the server cannot quite yet keep up? 1045 00:54:54,400 --> 00:54:57,700 The routers in between might very well be able to handle that load perfectly 1046 00:54:57,700 --> 00:55:02,320 because those are much bigger servers, conceptually and physically, with far 1047 00:55:02,320 --> 00:55:05,650 more CPUs and RAM and therefore can handle that load, 1048 00:55:05,650 --> 00:55:10,930 but some business' server out there is only finite in capacity. 1049 00:55:10,930 --> 00:55:15,250 And so what happens when you need to scale to handle more users? 1050 00:55:15,250 --> 00:55:17,680 Well, you might have initially just one server 1051 00:55:17,680 --> 00:55:20,150 such as that Dell server pictured here. 1052 00:55:20,150 --> 00:55:22,450 This is what's called a rack server, insofar 1053 00:55:22,450 --> 00:55:26,170 as it's designed to exist on a rack that you slide this thing into, 1054 00:55:26,170 --> 00:55:29,630 and it happens to be one rack unit or 1.5 inches, 1055 00:55:29,630 --> 00:55:32,230 which is simply a standardization thereof. 
1056 00:55:32,230 --> 00:55:35,500 Inside of this rack server is its hard drive, and RAM, 1057 00:55:35,500 --> 00:55:39,340 and CPU, and more pieces, but it's exactly the same technology 1058 00:55:39,340 --> 00:55:41,770 that you might have in a box under your desk 1059 00:55:41,770 --> 00:55:45,910 or even in the form factor of a laptop, just bigger and faster. 1060 00:55:45,910 --> 00:55:48,400 And, to be fair, more expensive. 1061 00:55:48,400 --> 00:55:52,150 But it's only so big, indeed, it's only 1.5 inches tall 1062 00:55:52,150 --> 00:55:54,790 and some number of inches deep, which is to say there's only 1063 00:55:54,790 --> 00:55:57,160 a finite amount of RAM in there. 1064 00:55:57,160 --> 00:55:59,770 There's only a fixed number of CPUs in there, 1065 00:55:59,770 --> 00:56:04,540 and there's only so many gigabytes, presumably, of disk storage space. 1066 00:56:04,540 --> 00:56:07,480 At some point or other, we're going to run out of one or more 1067 00:56:07,480 --> 00:56:08,830 of those resources. 1068 00:56:08,830 --> 00:56:13,600 And even though we've not really gotten into the weeds of how a server handles 1069 00:56:13,600 --> 00:56:16,360 and reads these envelopes, it certainly stands 1070 00:56:16,360 --> 00:56:19,780 to reason that it can only read with finite resources 1071 00:56:19,780 --> 00:56:23,620 some finite number of packets per unit of time, 1072 00:56:23,620 --> 00:56:26,290 be it seconds or minutes or days. 1073 00:56:26,290 --> 00:56:28,750 And so at some point if business is booming, 1074 00:56:28,750 --> 00:56:32,080 we might receive at any given point more packets 1075 00:56:32,080 --> 00:56:35,380 of information than we can handle and indeed, like some routers, if they're 1076 00:56:35,380 --> 00:56:38,890 overwhelmed, we might just drop these incoming packets, 1077 00:56:38,890 --> 00:56:43,210 or worse yet, not expect them and just crash or freeze or somehow 1078 00:56:43,210 --> 00:56:45,580 behave unpredictably.
1079 00:56:45,580 --> 00:56:47,630 And that's probably not good for our business. 1080 00:56:47,630 --> 00:56:49,960 So how can we go about solving this problem? 1081 00:56:49,960 --> 00:56:54,430 Well, the easiest way, quite simply, is to scale vertically, so to speak. 1082 00:56:54,430 --> 00:56:56,020 That is don't use that server. 1083 00:56:56,020 --> 00:57:01,000 Instead, buy one that's bigger with more RAM and more CPUs and more disk space 1084 00:57:01,000 --> 00:57:03,970 and faster internet connectivity, and really just 1085 00:57:03,970 --> 00:57:06,550 avoid that problem altogether. 1086 00:57:06,550 --> 00:57:08,030 Why is this compelling? 1087 00:57:08,030 --> 00:57:11,560 Well, cost of it aside, you don't have to change your code, 1088 00:57:11,560 --> 00:57:14,260 you needn't change your configuration in software, 1089 00:57:14,260 --> 00:57:20,050 you need only throw hardware and in turn, to be fair, money at the problem. 1090 00:57:20,050 --> 00:57:23,470 Now that in and of itself might alone be a deal breaker, the money alone, 1091 00:57:23,470 --> 00:57:27,640 but at some point if we want to handle that business, we've got to scale up, 1092 00:57:27,640 --> 00:57:29,830 but even this is shortsighted. 1093 00:57:29,830 --> 00:57:35,620 Because at the end of the day, Dell only sells servers that operate so quickly 1094 00:57:35,620 --> 00:57:37,660 and have so much disk space. 1095 00:57:37,660 --> 00:57:40,210 Those resources, too, are ultimately finite. 1096 00:57:40,210 --> 00:57:42,790 And while next year there might be an even bigger version 1097 00:57:42,790 --> 00:57:44,980 of this same machine out there, this year 1098 00:57:44,980 --> 00:57:47,020 you might have the top of the line. 1099 00:57:47,020 --> 00:57:50,800 So at some point, one server, even with so many resources, 1100 00:57:50,800 --> 00:57:54,910 might not be able to handle all of the packets and business you're getting. 
1101 00:57:54,910 --> 00:57:56,980 So what do you then do? 1102 00:57:56,980 --> 00:58:01,660 Well, there is an opportunity to scale not vertically, so to speak, 1103 00:58:01,660 --> 00:58:04,120 but horizontally instead. 1104 00:58:04,120 --> 00:58:07,930 Focusing not on the top tier machines, but instead, 1105 00:58:07,930 --> 00:58:12,820 two of the smaller ones, or as needed three or four or more of the same. 1106 00:58:12,820 --> 00:58:15,940 In other words, spending lower on that cost curve, 1107 00:58:15,940 --> 00:58:18,520 getting more hardware, hopefully, for your money, 1108 00:58:18,520 --> 00:58:21,700 but such that the net effect is even more CPU power 1109 00:58:21,700 --> 00:58:25,630 and more disk space and more RAM than you might have gotten with that one 1110 00:58:25,630 --> 00:58:27,240 souped up machine itself. 1111 00:58:27,240 --> 00:58:28,990 And heck, if you really need the capacity, 1112 00:58:28,990 --> 00:58:32,200 you can buy any number of these big servers, 1113 00:58:32,200 --> 00:58:35,980 but you do somehow ultimately have to interconnect them. 1114 00:58:35,980 --> 00:58:38,260 And here now is where there's a trade-off. 1115 00:58:38,260 --> 00:58:40,660 Whereas money was really the only barrier 1116 00:58:40,660 --> 00:58:44,140 to solving this problem initially, though easier said than done, 1117 00:58:44,140 --> 00:58:48,370 now we have to re-engineer our system, because no longer 1118 00:58:48,370 --> 00:58:51,910 are packets of internet data coming in from our customers 1119 00:58:51,910 --> 00:58:54,670 and ending up in one place, they now have to somehow 1120 00:58:54,670 --> 00:58:58,750 be spread across multiple servers. 1121 00:58:58,750 --> 00:59:00,910 So how might we do this back in the day? 
1122 00:59:00,910 --> 00:59:04,780 Well, back in the late 90s, when Larry and Sergey of Google fame 1123 00:59:04,780 --> 00:59:07,270 built out their first cluster of servers, 1124 00:59:07,270 --> 00:59:09,970 they didn't have those pretty Dell boxes, rather, 1125 00:59:09,970 --> 00:59:12,940 this was, now in Google's museum, reportedly 1126 00:59:12,940 --> 00:59:15,550 one of their first racks of servers. 1127 00:59:15,550 --> 00:59:18,100 Notice there's no shiny cases, let alone logos, 1128 00:59:18,100 --> 00:59:22,180 but instead, lots of circuit boards on which are hard drive after hard drive 1129 00:59:22,180 --> 00:59:26,650 after hard drive and suffice to say so many wires connecting everything. 1130 00:59:26,650 --> 00:59:30,240 And even though, ironically, this picture seems to be vertical, 1131 00:59:30,240 --> 00:59:36,150 this is, perhaps, one of the earliest examples in our internet era of scaling 1132 00:59:36,150 --> 00:59:37,230 horizontally. 1133 00:59:37,230 --> 00:59:40,110 Each of these servers, which is represented by each of these boards, 1134 00:59:40,110 --> 00:59:43,260 is somehow interconnected in such a way that those servers 1135 00:59:43,260 --> 00:59:45,100 can intercommunicate. 1136 00:59:45,100 --> 00:59:46,140 But how? 1137 00:59:46,140 --> 00:59:48,540 Well, let's consider, with the proverbial engineering hat 1138 00:59:48,540 --> 00:59:51,750 on, how servers might somehow intercommunicate. 1139 00:59:51,750 --> 00:59:54,690 If up here, for instance, is just some artist's rendition 1140 00:59:54,690 --> 00:59:58,590 of someone's laptop, a potential customer who's sending us packets, 1141 00:59:58,590 --> 01:00:03,780 that customer might previously have been accessing our server, which we'll 1142 01:00:03,780 --> 01:00:06,270 represent here with a box and just call it 1143 01:00:06,270 --> 01:00:09,250 A. Server A. 
There are no other servers involved, 1144 01:00:09,250 --> 01:00:12,600 but there is some internet in between us here, so we'll assume that this 1145 01:00:12,600 --> 01:00:14,880 is the so-called cloud, so to speak. 1146 01:00:14,880 --> 01:00:17,730 And I, as this laptop or the customer, have a connection 1147 01:00:17,730 --> 01:00:20,940 and that one and only server has the same. 1148 01:00:20,940 --> 01:00:22,570 This picture is fairly straightforward. 1149 01:00:22,570 --> 01:00:24,720 Now you request a web page via this browser, 1150 01:00:24,720 --> 01:00:27,360 it somehow traverses the internet via routers, 1151 01:00:27,360 --> 01:00:31,150 and then ultimately ends up at that server A. 1152 01:00:31,150 --> 01:00:33,180 But what if, instead, there's not just A, 1153 01:00:33,180 --> 01:00:35,580 even if it's top of the line, because that's not enough, 1154 01:00:35,580 --> 01:00:40,710 but instead there are servers A and B together here for this website? 1155 01:00:40,710 --> 01:00:44,790 Well, here we might now have two boxes, the same size or bigger or smaller, 1156 01:00:44,790 --> 01:00:46,860 but ultimately finite, as well. 1157 01:00:46,860 --> 01:00:51,330 And somehow we need to now decide how to route information 1158 01:00:51,330 --> 01:00:54,460 from customer to server A or B. In other words, 1159 01:00:54,460 --> 01:00:57,420 there is now virtually a fork in the road, 1160 01:00:57,420 --> 01:01:00,700 left or right, that packets need to traverse. 1161 01:01:00,700 --> 01:01:02,790 So how can we implement this building block? 1162 01:01:02,790 --> 01:01:05,580 Well, again, as always, go back to first principles. 1163 01:01:05,580 --> 01:01:07,710 We know from our stack of internet technologies 1164 01:01:07,710 --> 01:01:10,710 that we already have a mechanism via which to translate 1165 01:01:10,710 --> 01:01:13,080 domain names into IP addresses.
1166 01:01:13,080 --> 01:01:15,300 And if each of these servers, by definition of IP, 1167 01:01:15,300 --> 01:01:21,180 has its own IP address, why not just use DNS to solve this problem? 1168 01:01:21,180 --> 01:01:25,680 When a customer requests example.com, perhaps answer that request 1169 01:01:25,680 --> 01:01:29,790 with the IP address of A. And then when a second customer somewhere else out 1170 01:01:29,790 --> 01:01:34,110 there on his or her laptop asks for example.com next, 1171 01:01:34,110 --> 01:01:39,060 return to them the IP address of B and vice versa, again and again. 1172 01:01:39,060 --> 01:01:41,580 Literally adopting a round robin technique 1173 01:01:41,580 --> 01:01:45,330 of sorts, whereby one time you answer A, the next time you answer B, 1174 01:01:45,330 --> 01:01:47,580 and back and forth you go. 1175 01:01:47,580 --> 01:01:51,690 On average, you would like to think that this uniform distribution of answers 1176 01:01:51,690 --> 01:01:55,110 will give you 50% load, that is to say traffic, 1177 01:01:55,110 --> 01:01:58,020 on one server and 50% on the other. 1178 01:01:58,020 --> 01:02:00,840 But perhaps this customer is more of a shopper than this one, 1179 01:02:00,840 --> 01:02:03,960 and they end up imposing even more load on A than on B, 1180 01:02:03,960 --> 01:02:08,067 so there, with that simple heuristic, you can get skew. 1181 01:02:08,067 --> 01:02:10,650 You might not even use round robin, you could just use random, 1182 01:02:10,650 --> 01:02:15,390 but there, on average, yes, you'll send 50% traffic left and 50% right, 1183 01:02:15,390 --> 01:02:19,170 but some of those users might be heavier users than others. 1184 01:02:19,170 --> 01:02:22,020 So perhaps we should have some form of feedback loop, 1185 01:02:22,020 --> 01:02:24,910 and DNS alone might not be sufficient.
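The round robin idea, and the skew it can produce, might be sketched as follows. This is only a toy simulation, not a real DNS server: the two IP addresses, the `round_robin_resolver` helper, and the per-customer request counts are all assumptions made up for illustration.

```python
from itertools import cycle

# Hypothetical addresses for servers A and B (documentation-range IPs).
SERVERS = {"A": "192.0.2.10", "B": "192.0.2.20"}


def round_robin_resolver(addresses):
    """Return a resolve() function that hands out addresses in rotation."""
    rotation = cycle(addresses)

    def resolve(domain):
        # A real DNS server would look the domain up; in this sketch every
        # query for the one site simply gets the next address in rotation.
        return next(rotation)

    return resolve


resolve = round_robin_resolver(list(SERVERS.values()))

# Four customers ask for example.com; the third is a much heavier shopper.
requests_per_customer = [5, 5, 50, 5]
load = {ip: 0 for ip in SERVERS.values()}
answers = []
for n_requests in requests_per_customer:
    ip = resolve("example.com")
    answers.append(ip)
    load[ip] += n_requests

print(answers)  # answers alternate between A's address and B's address
print(load)     # requests that actually landed on each server
```

Each server was handed out exactly twice, yet A ends up with far more requests than B, because the resolver never learns how busy each customer turns out to be.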
1186 01:02:24,910 --> 01:02:28,710 We really need there to be a middle man, such as this dot here, 1187 01:02:28,710 --> 01:02:33,840 that decides more intelligently whether to send data to A or to B. 1188 01:02:33,840 --> 01:02:37,560 And we'll call this thing here, this dot now, a load balancer. 1189 01:02:37,560 --> 01:02:40,030 Aptly named insofar as it balances load that's 1190 01:02:40,030 --> 01:02:42,270 incoming across multiple servers. 1191 01:02:42,270 --> 01:02:43,410 But how? 1192 01:02:43,410 --> 01:02:47,970 Well, if these connections between A and B and this load balancer 1193 01:02:47,970 --> 01:02:50,460 are not unidirectional, but bidirectional somehow, 1194 01:02:50,460 --> 01:02:54,360 literally a cable that allows bits to flow left and right, 1195 01:02:54,360 --> 01:02:58,920 could we perhaps have A just continually report back to that load balancer 1196 01:02:58,920 --> 01:03:02,040 saying, I have capacity, I have capacity? 1197 01:03:02,040 --> 01:03:04,800 Whereas B might say, I've got too many customers. 1198 01:03:04,800 --> 01:03:08,850 And logically, then, this load balancer can just start sending no traffic to B 1199 01:03:08,850 --> 01:03:11,280 and send all of it to A or vice versa. 1200 01:03:11,280 --> 01:03:13,350 Of course, logically, we could find ourselves 1201 01:03:13,350 --> 01:03:18,270 in a situation where both A and B are too busy, what then do we do? 1202 01:03:18,270 --> 01:03:21,390 Well, at some point we have to throw money at the problem 1203 01:03:21,390 --> 01:03:23,760 and solve it by just adding hardware. 1204 01:03:23,760 --> 01:03:27,000 And so C might be added to the mix with that same logic, 1205 01:03:27,000 --> 01:03:30,990 but the load balancer just has to know about it.
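That feedback loop might look something like the sketch below. The `LoadBalancer` class, its method names, and the capacity numbers are hypothetical, chosen only to mirror the description: servers report how busy they are, the balancer picks whichever one has the most room, and a new server helps only once the balancer has been told about it.

```python
class LoadBalancer:
    """Toy load balancer that routes based on what servers report back."""

    def __init__(self):
        self.capacity = {}       # server name -> max customers it can handle
        self.reported_load = {}  # server name -> customers it last reported

    def register(self, name, capacity):
        # The balancer "just has to know about" each server.
        self.capacity[name] = capacity
        self.reported_load[name] = 0

    def report(self, name, active_customers):
        # A server reporting back: "I have capacity" or "I've got too many".
        self.reported_load[name] = active_customers

    def pick(self):
        # Spare room on every server that is not yet full.
        spare = {
            name: self.capacity[name] - load
            for name, load in self.reported_load.items()
            if load < self.capacity[name]
        }
        if not spare:
            return None  # every known server is too busy
        return max(spare, key=spare.get)


lb = LoadBalancer()
lb.register("A", capacity=100)
lb.register("B", capacity=100)

lb.report("A", 10)    # A: "I have capacity"
lb.report("B", 100)   # B: "I've got too many customers"
first_choice = lb.pick()

lb.report("A", 100)   # now A is full as well
when_all_busy = lb.pick()

lb.register("C", capacity=100)  # throw hardware at the problem
after_adding_c = lb.pick()

print(first_choice, when_all_busy, after_adding_c)
```

Returning `None` when everything is full is just one possible choice for this sketch; a production balancer would more likely queue or reject the request.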
1206 01:03:30,990 --> 01:03:32,910 So all fine and good, we seem to have solved 1207 01:03:32,910 --> 01:03:37,680 the problem in a very straightforward way, but as with computer science 1208 01:03:37,680 --> 01:03:40,740 more generally, there's probably a price paid and a trade-off, 1209 01:03:40,740 --> 01:03:42,570 and not just financial. 1210 01:03:42,570 --> 01:03:47,760 Unfortunately, even though I have two, maybe even three servers now, therefore 1211 01:03:47,760 --> 01:03:50,490 seemingly having high availability of service, 1212 01:03:50,490 --> 01:03:53,340 that is any one of these servers theoretically could go down 1213 01:03:53,340 --> 01:03:57,090 and I've still got 2/3 of my capacity, 1214 01:03:57,090 --> 01:04:02,520 there's a single point of failure here, an SPOF so to speak, 1215 01:04:02,520 --> 01:04:05,070 that could really derail the whole process. 1216 01:04:05,070 --> 01:04:09,150 What happens if this load balancer, which pictorially is just a dot, 1217 01:04:09,150 --> 01:04:13,470 is actually a server underneath the hood itself? 1218 01:04:13,470 --> 01:04:15,990 What if that load balancer goes down, or what if that load 1219 01:04:15,990 --> 01:04:17,670 balancer itself gets overwhelmed? 1220 01:04:17,670 --> 01:04:22,290 It does not matter how many servers you have here, A through Z, if none of them 1221 01:04:22,290 --> 01:04:23,710 can be reached. 1222 01:04:23,710 --> 01:04:28,050 So this simple architecture alone is not a solution. 1223 01:04:28,050 --> 01:04:32,470 And indeed, this is what is meant by architecting a network itself. 1224 01:04:32,470 --> 01:04:36,320 This design is probably not the best, especially for business. 1225 01:04:36,320 --> 01:04:40,300 And so let's start anew, at least down here inside this company, 1226 01:04:40,300 --> 01:04:44,650 and consider if one load balancer is not great, what's better than one? 1227 01:04:44,650 --> 01:04:46,510 Well, honestly, two.
1228 01:04:46,510 --> 01:04:48,550 And so let's now draw them a bit bigger, where 1229 01:04:48,550 --> 01:04:51,820 here we have a load balancer on the left, and here on the right, 1230 01:04:51,820 --> 01:04:53,770 and we'll number them 1 and 2. 1231 01:04:53,770 --> 01:04:58,330 Whereas our servers we'll continue to name A and B and C and perhaps even 1232 01:04:58,330 --> 01:05:02,170 through Z. And now we just have to ensure that we have connections 1233 01:05:02,170 --> 01:05:04,300 to both load balancers, and that each load 1234 01:05:04,300 --> 01:05:09,460 balancer can connect to each server in this sort of mesh network here. 1235 01:05:09,460 --> 01:05:12,730 It's wonderfully redundant now, albeit a bit complex. 1236 01:05:12,730 --> 01:05:15,340 But because we have all of these interconnections now, 1237 01:05:15,340 --> 01:05:21,280 we can ensure that even if one or two go down, data can still reach A, B, or C. 1238 01:05:21,280 --> 01:05:25,660 But how to know whether load balancer 1 should be doing all of this 1239 01:05:25,660 --> 01:05:27,040 or load balancer 2? 1240 01:05:27,040 --> 01:05:30,850 You know what, why don't we draw another connection between 1 and 2? 1241 01:05:30,850 --> 01:05:35,410 And a very common paradigm in systems is heartbeats, quite simply. 1242 01:05:35,410 --> 01:05:40,120 Much like you and I have every second or so a heartbeat saying we are alive, 1243 01:05:40,120 --> 01:05:42,730 we are alive, hello world, hello world, if you will, 1244 01:05:42,730 --> 01:05:48,340 so here might load balancers 1 and 2, themselves just servers, 1245 01:05:48,340 --> 01:05:53,050 say I am alive, I am alive, hello number 2, hello number 2. 1246 01:05:53,050 --> 01:05:57,010 And if 2 does not hear 1 eventually, or if 1 does not hear 2, 1247 01:05:57,010 --> 01:05:59,800 the other can just commandeer that role. 
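The heartbeat scheme can be sketched with a simulated clock rather than real network traffic. Everything here is an assumption for illustration: the `LoadBalancerNode` class, the three-second timeout, and the moment at which load balancer 1 goes silent.

```python
class LoadBalancerNode:
    """Toy load balancer that listens for its peer's heartbeat."""

    def __init__(self, name, active, timeout=3.0):
        self.name = name
        self.active = active    # is this node currently balancing load?
        self.timeout = timeout  # seconds of silence tolerated from the peer
        self.last_heard = 0.0   # time of the peer's most recent heartbeat

    def receive_heartbeat(self, now):
        # The peer just said "I am alive".
        self.last_heard = now

    def check_peer(self, now):
        # A standby node commandeers the role once the peer has been
        # silent for longer than the timeout.
        if not self.active and now - self.last_heard > self.timeout:
            self.active = True
        return self.active


lb1 = LoadBalancerNode("1", active=True)
lb2 = LoadBalancerNode("2", active=False)

takeover_time = None
for t in range(11):   # simulated seconds 0 through 10
    if t < 5:         # assume 1 is alive until t=5, then goes offline
        lb2.receive_heartbeat(t)
    if lb2.check_peer(t) and takeover_time is None:
        takeover_time = t

print(takeover_time)  # the second at which 2 takes over
```

This sketch covers only 2 watching 1. In practice both would watch each other, and a real system would also have to guard against the case where the link between them fails while both are still alive.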
1248 01:05:59,800 --> 01:06:03,100 By default, only 1 will be load balancing, but if it goes offline, 1249 01:06:03,100 --> 01:06:05,470 2 will presume to take over. 1250 01:06:05,470 --> 01:06:08,680 And now we have this property generally known as high availability. 1251 01:06:08,680 --> 01:06:12,070 Even if we lose one or more servers can we still stay up, 1252 01:06:12,070 --> 01:06:16,000 and there is no single point of failure, at least here in this picture, 1253 01:06:16,000 --> 01:06:18,770 because we now have that second load balancer. 1254 01:06:18,770 --> 01:06:21,250 But if we look a little higher, it would seem 1255 01:06:21,250 --> 01:06:24,440 that we do actually have another single point of failure in here, 1256 01:06:24,440 --> 01:06:26,380 and now we go down the rabbit hole. 1257 01:06:26,380 --> 01:06:29,500 If this line here to the cloud, the internet, 1258 01:06:29,500 --> 01:06:33,520 represents my internet connection, my ISP, 1259 01:06:33,520 --> 01:06:38,110 what if that ISP, Comcast, Verizon or any other, itself goes down, 1260 01:06:38,110 --> 01:06:42,790 a big storm and a loss of power might take my whole business offline. 1261 01:06:42,790 --> 01:06:44,830 Well, the best way to solve that would be 1262 01:06:44,830 --> 01:06:48,040 to access someone else's internet connectivity 1263 01:06:48,040 --> 01:06:49,820 and make sure you're connected to that. 1264 01:06:49,820 --> 01:06:54,460 And in fact, if we keep going with load balancer 3 or even 4 1265 01:06:54,460 --> 01:06:59,770 or server D, E, and F, this picture very quickly starts to get so intertwined. 1266 01:06:59,770 --> 01:07:01,240 But this is how you do it. 1267 01:07:01,240 --> 01:07:06,580 And not too long ago was this done entirely with wires and hardware. 1268 01:07:06,580 --> 01:07:10,450 But these days this topology, if you will, this architecture, 1269 01:07:10,450 --> 01:07:12,640 is increasingly done in software. 
1270 01:07:12,640 --> 01:07:15,730 And indeed, the whole thing is done in the cloud. 1271 01:07:15,730 --> 01:07:19,420 Less frequently do staff of companies find themselves crawling 1272 01:07:19,420 --> 01:07:22,630 along the floor and in wiring closets and in data centers, 1273 01:07:22,630 --> 01:07:25,510 so to speak, making these connections possible, 1274 01:07:25,510 --> 01:07:28,390 but rather they do it virtually in software. 1275 01:07:28,390 --> 01:07:30,920 And indeed, thus was born the cloud. 1276 01:07:30,920 --> 01:07:33,910 Well, it turns out that as Moore's law, so to speak, 1277 01:07:33,910 --> 01:07:37,930 helps us in each passing year, we seem to have computers that are 1278 01:07:37,930 --> 01:07:40,330 half as expensive and twice as fast. 1279 01:07:40,330 --> 01:07:44,230 Can we ride that sort of curve of innovation in such a way 1280 01:07:44,230 --> 01:07:47,830 that we can solve even more problems each year more quickly? 1281 01:07:47,830 --> 01:07:50,610 And yet, with each passing year, I the human 1282 01:07:50,610 --> 01:07:53,560 am not getting any better or faster at checking my email 1283 01:07:53,560 --> 01:07:57,490 or using the web, so we increasingly have on our laptops and desks 1284 01:07:57,490 --> 01:08:00,370 and our server rooms more computational power 1285 01:08:00,370 --> 01:08:03,440 frankly than we really know what to do. 1286 01:08:03,440 --> 01:08:08,500 And so increasingly in vogue these days is to virtualize that hardware, 1287 01:08:08,500 --> 01:08:13,330 and to take physical hardware with so many CPUs and so much RAM and so much 1288 01:08:13,330 --> 01:08:16,149 disk space and to write software that runs 1289 01:08:16,149 --> 01:08:22,359 on it that creates the illusion that one computer is two, or one computer is 10. 
1290 01:08:22,359 --> 01:08:25,359 That is to say, through software can you write 1291 01:08:25,359 --> 01:08:31,210 code that virtualizes that hardware, thereby creating the illusion that you 1292 01:08:31,210 --> 01:08:36,340 can have one server per customer but all 10 of those customers 1293 01:08:36,340 --> 01:08:38,529 are on the same machine. 1294 01:08:38,529 --> 01:08:42,120 Virtualization includes products like VMware and Parallels 1295 01:08:42,120 --> 01:08:43,870 and other companies as well, and it's just 1296 01:08:43,870 --> 01:08:45,580 software that runs on top of the hardware 1297 01:08:45,580 --> 01:08:50,439 and creates this illusion, which then is all the better for business. 1298 01:08:50,439 --> 01:08:53,200 If you can sell one piece of hardware multiple times 1299 01:08:53,200 --> 01:08:55,960 but not necessarily in a way that you're over-provisioning it 1300 01:08:55,960 --> 01:09:00,220 to multiple customers, but rather you're isolating each of those customers 1301 01:09:00,220 --> 01:09:03,880 from one another, giving them not only the illusion of their own machine 1302 01:09:03,880 --> 01:09:07,180 but indeed, the constraints whereby my data can't 1303 01:09:07,180 --> 01:09:12,850 be accessed by another customer who only has cloud access there, too. 1304 01:09:12,850 --> 01:09:17,470 And indeed, this is really in part, why we have now this cloud. 1305 01:09:17,470 --> 01:09:20,000 The cloud is more of a buzz word than anything technical. 1306 01:09:20,000 --> 01:09:23,770 Indeed, using the cloud just means using servers somewhere else 1307 01:09:23,770 --> 01:09:25,750 that someone else is managing. 1308 01:09:25,750 --> 01:09:28,930 No longer do companies with as much frequency have their own server 1309 01:09:28,930 --> 01:09:32,580 room in their office, or their own data center in some warehouse somewhere. 
1310 01:09:32,580 --> 01:09:35,790 Rather, they virtualized even that piece of their product 1311 01:09:35,790 --> 01:09:38,729 using Amazon or Microsoft or Google or others 1312 01:09:38,729 --> 01:09:41,207 out there that provide you with access to servers 1313 01:09:41,207 --> 01:09:43,290 that they themselves control, but they provide you 1314 01:09:43,290 --> 01:09:49,319 with access to the illusion of your very own servers known as virtual machines. 1315 01:09:49,319 --> 01:09:52,590 And via this process can we take ever more advantage 1316 01:09:52,590 --> 01:09:56,130 of so many of those new CPUs and disk space and RAM 1317 01:09:56,130 --> 01:09:58,020 that otherwise might frankly go to waste, 1318 01:09:58,020 --> 01:10:02,250 because there's only so much we can typically do with one such machine. 1319 01:10:02,250 --> 01:10:06,540 Hence you might now think of this design, this stack, so to speak, 1320 01:10:06,540 --> 01:10:07,200 as follows. 1321 01:10:07,200 --> 01:10:10,170 In green here pictured is infrastructure, the physical hardware 1322 01:10:10,170 --> 01:10:11,130 that you have bought. 1323 01:10:11,130 --> 01:10:14,340 Here in blue is the hypervisor, the software 1324 01:10:14,340 --> 01:10:18,660 called VMware or Parallels or something else, that virtualizes this hardware 1325 01:10:18,660 --> 01:10:21,240 and creates the illusion that you actually have 1326 01:10:21,240 --> 01:10:23,605 three machines, for instance, on one. 1327 01:10:23,605 --> 01:10:25,980 And within each of those machines, which you can think of 1328 01:10:25,980 --> 01:10:29,220 as just a separate window on that computer, double click 1329 01:10:29,220 --> 01:10:32,610 to open computer A so to speak and computer B and computer 1330 01:10:32,610 --> 01:10:37,170 C, each of those virtual machines, you can install your own operating system 1331 01:10:37,170 --> 01:10:39,750 differently in each of those virtual machines.
1332 01:10:39,750 --> 01:10:42,360 Some version of Windows in A, another in B, 1333 01:10:42,360 --> 01:10:46,590 and maybe Linux or Unix or something else in C. And then within A, B, and C 1334 01:10:46,590 --> 01:10:51,240 can you install your own apps, your own software, or so can your customers, 1335 01:10:51,240 --> 01:10:56,190 thereby being isolated, not so much physically but virtually, 1336 01:10:56,190 --> 01:10:57,750 from everyone else. 1337 01:10:57,750 --> 01:10:59,970 But of course, there's always a price. 1338 01:10:59,970 --> 01:11:02,820 While this might take better advantage of the increasing 1339 01:11:02,820 --> 01:11:06,180 computational resources that we have in these boxes, 1340 01:11:06,180 --> 01:11:08,820 there seems to be some duplication here. 1341 01:11:08,820 --> 01:11:10,950 And indeed, in computer science, anytime you 1342 01:11:10,950 --> 01:11:14,130 start duplicating resources or efforts there's 1343 01:11:14,130 --> 01:11:16,710 probably an opportunity for better design. 1344 01:11:16,710 --> 01:11:19,650 And while this technology itself is still nascent, 1345 01:11:19,650 --> 01:11:22,680 there's a newcomer to the field called containerization, 1346 01:11:22,680 --> 01:11:24,180 and it exists in multiple forms. 1347 01:11:24,180 --> 01:11:28,140 But containerization shares more software, in some sense, 1348 01:11:28,140 --> 01:11:31,680 underneath the hood, so that you might install an operating system not three 1349 01:11:31,680 --> 01:11:36,870 times but once, and share it across those machines but in such a way that 1350 01:11:36,870 --> 01:11:39,330 one cannot access the other. 
1351 01:11:39,330 --> 01:11:42,630 And on top of that layer, here called Docker, 1352 01:11:42,630 --> 01:11:44,970 one of the most popular incarnations thereof, 1353 01:11:44,970 --> 01:11:47,730 you have as before your infrastructure, the actual hardware, 1354 01:11:47,730 --> 01:11:51,510 on top of which is your own operating system, be it Windows or Linux. 1355 01:11:51,510 --> 01:11:55,230 On top of that is this program called Docker that provides you 1356 01:11:55,230 --> 01:11:58,350 then with the ability to run A through F apps 1357 01:11:58,350 --> 01:12:02,640 instead of say, just three because the overhead, so to speak, 1358 01:12:02,640 --> 01:12:06,780 computationally is not quite as much as with virtual machines. 1359 01:12:06,780 --> 01:12:10,620 Here we have three operating systems, each installed independently 1360 01:12:10,620 --> 01:12:14,010 on the same hardware, which surely consume time, 1361 01:12:14,010 --> 01:12:17,820 whereas here you have just one operating system theoretically and then 1362 01:12:17,820 --> 01:12:21,000 more room for more apps thereupon. 1363 01:12:21,000 --> 01:12:24,660 So whereas containerization allows you ultimately 1364 01:12:24,660 --> 01:12:27,300 to isolate one app from another, and virtual machines 1365 01:12:27,300 --> 01:12:30,390 allow you to isolate one machine from another, 1366 01:12:30,390 --> 01:12:34,410 they do this through different techniques and with disparate overhead. 1367 01:12:34,410 --> 01:12:37,200 And surely in the years to come will this overhead only 1368 01:12:37,200 --> 01:12:39,660 get chipped away at as we humans get better 1369 01:12:39,660 --> 01:12:44,940 about running more and more software on less and less but more capable 1370 01:12:44,940 --> 01:12:46,290 hardware.
1371 01:12:46,290 --> 01:12:49,260 There, then, we have these internet technologies all the way up 1372 01:12:49,260 --> 01:12:52,830 to cloud computing itself, whereas the technologies we've looked at 1373 01:12:52,830 --> 01:12:57,360 are fairly low level protocols that simply get zeros and ones from point A 1374 01:12:57,360 --> 01:13:02,340 to point B. Once we have that ability and we can stipulate that we can do it, 1375 01:13:02,340 --> 01:13:05,510 we can build any number of abstractions on top of it. 1376 01:13:05,510 --> 01:13:09,300 In HTTP, for instance, do we have effectively an application, 1377 01:13:09,300 --> 01:13:14,220 known as web browsing, via which we can transmit text and images and sounds 1378 01:13:14,220 --> 01:13:15,420 and so much more. 1379 01:13:15,420 --> 01:13:18,270 And via the cloud itself do we have the ability now 1380 01:13:18,270 --> 01:13:21,400 to slice up individual machines as though they are multiple 1381 01:13:21,400 --> 01:13:25,200 and that picture before can be implemented not with two load 1382 01:13:25,200 --> 01:13:30,990 balancers and three servers physically, but maybe, just maybe, with just one. 1383 01:13:30,990 --> 01:13:33,780 One server that's been so virtualized or in turn 1384 01:13:33,780 --> 01:13:38,310 containerized so that you can have different parts of its hardware each 1385 01:13:38,310 --> 01:13:41,490 implementing different pieces of functionality that collectively 1386 01:13:41,490 --> 01:13:43,070 implement that architecture.
1387 01:13:43,070 --> 01:13:45,570 And so whereas back in the day might you actually physically 1388 01:13:45,570 --> 01:13:49,080 wire all of those disparate types of machines together, now 1389 01:13:49,080 --> 01:13:52,980 can you do it virtually in software literally with keystrokes and mouse 1390 01:13:52,980 --> 01:13:56,430 clicks because someone has written software that abstracts away 1391 01:13:56,430 --> 01:14:00,690 that underlying hardware in such a way that you can think about it virtually. 1392 01:14:00,690 --> 01:14:03,810 Now at the end of the day, the servers in Google's and Microsoft's 1393 01:14:03,810 --> 01:14:06,750 and Amazon's closets are still completely physical themselves 1394 01:14:06,750 --> 01:14:11,610 with so many cables, but you can reroute information, those zeros and ones, 1395 01:14:11,610 --> 01:14:15,000 different ways virtually thanks to these layers 1396 01:14:15,000 --> 01:14:19,550 that we've built on top of these internet technologies. 1397 01:14:19,550 --> 01:14:20,738