1 00:00:00,000 --> 00:00:00,499 2 00:00:00,499 --> 00:00:04,540 DAVID J. MALAN: All right, so the overarching question now, 3 00:00:04,540 --> 00:00:10,310 and we started down this road with our look at Dropbox, is the internet. 4 00:00:10,310 --> 00:00:14,550 So let me try to ask a loaded question deliberately. 5 00:00:14,550 --> 00:00:15,500 What is the internet? 6 00:00:15,500 --> 00:00:18,340 7 00:00:18,340 --> 00:00:20,430 >> Surely you all use it. 8 00:00:20,430 --> 00:00:21,287 >> AUDIENCE: Network? 9 00:00:21,287 --> 00:00:22,370 DAVID J. MALAN: A network? 10 00:00:22,370 --> 00:00:23,856 OK, what is a network? 11 00:00:23,856 --> 00:00:28,184 >> AUDIENCE: A connectivity between different systems. 12 00:00:28,184 --> 00:00:31,100 DAVID J. MALAN: OK, connectivity between different people and systems. 13 00:00:31,100 --> 00:00:33,430 All right, and what makes the internet an internet as 14 00:00:33,430 --> 00:00:38,484 opposed to just a network as we might have in just a building or a classroom? 15 00:00:38,484 --> 00:00:39,400 AUDIENCE: It's global. 16 00:00:39,400 --> 00:00:39,810 DAVID J. MALAN: It's global. 17 00:00:39,810 --> 00:00:42,360 All right, so it's a network of networks, if you will. 18 00:00:42,360 --> 00:00:46,720 Internet denoting connections across individual networks. 19 00:00:46,720 --> 00:00:48,457 And of course, there's different services 20 00:00:48,457 --> 00:00:50,040 that the internet provides these days. 21 00:00:50,040 --> 00:00:54,420 >> There's, of course, the world wide web with which all of us are familiar. 22 00:00:54,420 --> 00:00:56,560 There's services like email. 23 00:00:56,560 --> 00:00:59,620 There's services like chat or Google Chat. 24 00:00:59,620 --> 00:01:02,090 Or there's things like voice over IP. 25 00:01:02,090 --> 00:01:07,270 There's things like Skype, and Google Hangouts, and FaceTime, and the like. 26 00:01:07,270 --> 00:01:09,620 >> And so there's this layering concept in the internet. 
27 00:01:09,620 --> 00:01:12,390 And indeed, this too is a fundamental concept 28 00:01:12,390 --> 00:01:15,650 in computer science of layering, or abstraction, 29 00:01:15,650 --> 00:01:17,407 where you build one thing down here. 30 00:01:17,407 --> 00:01:19,240 Then, you build something else on top of it, 31 00:01:19,240 --> 00:01:21,660 and then, something else on top of it, on top of it, on top of it. 32 00:01:21,660 --> 00:01:25,170 And so we'll see some manifestations of that in this discussion and, perhaps, 33 00:01:25,170 --> 00:01:26,600 others moving forward. 34 00:01:26,600 --> 00:01:29,400 >> So let's start to paint a picture of some of the technologies 35 00:01:29,400 --> 00:01:33,040 all around us by considering what is, perhaps, in most everyone's home 36 00:01:33,040 --> 00:01:35,900 here, and use that as a point of departure for a conversation more 37 00:01:35,900 --> 00:01:38,900 generally about how all of this stuff works, and what some of the issues 38 00:01:38,900 --> 00:01:42,090 underlying design decisions have to be when building networks 39 00:01:42,090 --> 00:01:43,800 and when using the internet. 40 00:01:43,800 --> 00:01:48,680 So back at home, we'll go back to my little laptop here. 41 00:01:48,680 --> 00:01:53,040 You probably have one or more computers, and maybe one or more phones, 42 00:01:53,040 --> 00:01:55,504 that are connected these days via Wi-Fi. 43 00:01:55,504 --> 00:01:57,170 Maybe once upon a time, you had a cable. 44 00:01:57,170 --> 00:02:00,020 Maybe you do still have a desktop computer at home that has a cable. 45 00:02:00,020 --> 00:02:03,340 But our story's not really going to change that much there. 46 00:02:03,340 --> 00:02:06,400 >> Here is the so-called cloud, or internet. 
47 00:02:06,400 --> 00:02:11,620 And there are bunches of other things on the internet like Amazon.com, 48 00:02:11,620 --> 00:02:14,690 and Facebook, and Google, and Microsoft, and other such companies 49 00:02:14,690 --> 00:02:16,990 on the internet, and certainly people as well. 50 00:02:16,990 --> 00:02:21,660 But there's a whole lot of stuff that goes on between you and the internet. 51 00:02:21,660 --> 00:02:23,770 >> So let's first tease apart that. 52 00:02:23,770 --> 00:02:30,260 What is your computer, if wirelessly, connected to at home? 53 00:02:30,260 --> 00:02:34,402 What kind of device gets you on the internet these days? 54 00:02:34,402 --> 00:02:35,290 >> AUDIENCE: Router. 55 00:02:35,290 --> 00:02:36,331 >> DAVID J. MALAN: A router. 56 00:02:36,331 --> 00:02:40,840 So you have this home device called a router, whose purpose in life, 57 00:02:40,840 --> 00:02:43,650 ultimately, is to route information at the simplest form. 58 00:02:43,650 --> 00:02:48,860 If this is the internet over here, your computer has connectivity between it. 59 00:02:48,860 --> 00:02:51,280 And the router, meanwhile, somehow has connectivity 60 00:02:51,280 --> 00:02:53,420 between the rest of the internet. 61 00:02:53,420 --> 00:02:55,800 >> But there's even more going on inside of here. 62 00:02:55,800 --> 00:02:57,760 So let's dive in a little deeper. 63 00:02:57,760 --> 00:02:59,050 You go home. 64 00:02:59,050 --> 00:03:03,110 You open your laptop's lid or turn on your desktop for the first time ever, 65 00:03:03,110 --> 00:03:04,810 the first time in a while. 66 00:03:04,810 --> 00:03:06,340 What happens? 67 00:03:06,340 --> 00:03:10,550 >> What kinds of steps have to happen before you can actually 68 00:03:10,550 --> 00:03:12,260 get on the internet? 69 00:03:12,260 --> 00:03:13,540 Well, it turns out-- oh, yeah? 70 00:03:13,540 --> 00:03:14,500 Nakissa? 71 00:03:14,500 --> 00:03:15,163 Sorry? 72 00:03:15,163 --> 00:03:15,990 >> AUDIENCE: User ID. 
73 00:03:15,990 --> 00:03:16,636 >> DAVID J. MALAN: A user ID. 74 00:03:16,636 --> 00:03:18,344 So you might have to log in to something. 75 00:03:18,344 --> 00:03:20,650 Although, typically at home, most typically 76 00:03:20,650 --> 00:03:22,320 this would just work these days. 77 00:03:22,320 --> 00:03:24,640 >> But as we just saw, in environments like universities, companies, 78 00:03:24,640 --> 00:03:25,431 you have to log in. 79 00:03:25,431 --> 00:03:28,320 So let's avoid the login scenario for now. 80 00:03:28,320 --> 00:03:30,000 Keep it simple. 81 00:03:30,000 --> 00:03:31,380 >> AUDIENCE: Open up a browser. 82 00:03:31,380 --> 00:03:33,255 >> DAVID J. MALAN: You might open a web browser. 83 00:03:33,255 --> 00:03:34,002 Or what, Pat? 84 00:03:34,002 --> 00:03:35,252 >> AUDIENCE: Number or passcode. 85 00:03:35,252 --> 00:03:36,960 DAVID J. MALAN: Ah, a number or passcode. 86 00:03:36,960 --> 00:03:39,251 So let's go with number, not so much passcode just yet. 87 00:03:39,251 --> 00:03:41,880 Let's not worry about security for this particular discussion. 88 00:03:41,880 --> 00:03:42,950 But a number. 89 00:03:42,950 --> 00:03:47,130 >> So, yeah, in fact, much like all of our homes or a building like this 90 00:03:47,130 --> 00:03:48,420 has a physical address. 91 00:03:48,420 --> 00:03:54,910 This building is One Brattle Square in Cambridge, Massachusetts, 02138, USA. 92 00:03:54,910 --> 00:04:00,400 That address uniquely identifies us, in theory, in the whole world. 93 00:04:00,400 --> 00:04:01,360 >> AUDIENCE: An IP. 94 00:04:01,360 --> 00:04:04,710 >> DAVID J. MALAN: An IP address, exactly, is the analog in the computer world 95 00:04:04,710 --> 00:04:07,700 that uniquely addresses a computer. 96 00:04:07,700 --> 00:04:13,159 So an IP address, or internet protocol address, is just a numeric address. 
97 00:04:13,159 --> 00:04:15,450 Computers prefer things that are a little simpler, that 98 00:04:15,450 --> 00:04:19,130 are easier to read than long phrases like One Brattle Square, Cambridge, 99 00:04:19,130 --> 00:04:20,110 Mass., and so forth. 100 00:04:20,110 --> 00:04:24,560 >> And so an IP address is a number of the form something 101 00:04:24,560 --> 00:04:29,160 dot something dot something dot something. 102 00:04:29,160 --> 00:04:33,890 And each of these somethings, as denoted by the pound sign here, 103 00:04:33,890 --> 00:04:37,720 is a number between 0 and 255. 104 00:04:37,720 --> 00:04:40,510 And so it's a four-dotted decimal number-- something 105 00:04:40,510 --> 00:04:42,260 dot something dot something dot something. 106 00:04:42,260 --> 00:04:45,270 >> And this numeric address, in theory, uniquely 107 00:04:45,270 --> 00:04:48,010 identifies a computer on the internet. 108 00:04:48,010 --> 00:04:50,420 So at the risk of oversimplifying, let's now 109 00:04:50,420 --> 00:04:55,450 assume that when I connect to Wi-Fi or via cable, at home, 110 00:04:55,450 --> 00:05:01,070 my home router is what is somehow giving me an IP address. 111 00:05:01,070 --> 00:05:03,690 Because gone are the days for the most part, 112 00:05:03,690 --> 00:05:06,560 at least locally here, where when you sign up 113 00:05:06,560 --> 00:05:11,000 for Comcast, or RCN, or your local internet service provider, 114 00:05:11,000 --> 00:05:14,220 no longer does a technician have to come to your house with a printout, 115 00:05:14,220 --> 00:05:19,020 and then have you, or him, or her type in your IP address into your computer. 116 00:05:19,020 --> 00:05:21,200 >> Rather, this is all discovered dynamically. 117 00:05:21,200 --> 00:05:23,576 When you open your laptop's lid or turn on your computer, 118 00:05:23,576 --> 00:05:26,158 your computer just starts broadcasting a message, essentially. 119 00:05:26,158 --> 00:05:26,900 It says, hello. 
120 00:05:26,900 --> 00:05:27,610 I'm awake. 121 00:05:27,610 --> 00:05:29,550 What should my IP address be? 122 00:05:29,550 --> 00:05:32,640 >> And the purpose in life of a home router these days, among them, 123 00:05:32,640 --> 00:05:35,260 is to give you exactly one of these addresses. 124 00:05:35,260 --> 00:05:39,630 And the mechanism by which it does it, just to tease apart some jargon, 125 00:05:39,630 --> 00:05:42,660 is called a DHCP server. 126 00:05:42,660 --> 00:05:45,497 Fancy way of saying Dynamic Host Configuration Protocol. 127 00:05:45,497 --> 00:05:47,205 It's just a really fancy way of saying it 128 00:05:47,205 --> 00:05:52,640 is a piece of software running inside of our home router 129 00:05:52,640 --> 00:05:54,700 that, upon hearing your request-- hello. 130 00:05:54,700 --> 00:05:55,480 I'm online. 131 00:05:55,480 --> 00:05:58,214 Please give me an IP address-- responds with exactly that. 132 00:05:58,214 --> 00:06:01,380 And it tells you to use something dot something dot something dot something. 133 00:06:01,380 --> 00:06:04,057 And then, your Mac or PC does exactly that. 134 00:06:04,057 --> 00:06:05,890 And just to make this a little more concrete 135 00:06:05,890 --> 00:06:09,620 before we take your question, on Mac OS, and there's 136 00:06:09,620 --> 00:06:15,100 a comparable window in Windows, if I go to Network, 137 00:06:15,100 --> 00:06:18,280 I can actually see here that my laptop is connected 138 00:06:18,280 --> 00:06:20,080 to Harvard University, which is the Wi-Fi, 139 00:06:20,080 --> 00:06:23,870 and has the IP address 10.254.25.237. 140 00:06:23,870 --> 00:06:27,560 >> If I'm more curious, I can click Advanced on my Mac. 141 00:06:27,560 --> 00:06:31,660 I can go up to TCP/IP. 142 00:06:31,660 --> 00:06:37,030 And notice what is now familiar, perhaps. 143 00:06:37,030 --> 00:06:40,040 What protocol, what feature is my laptop using 144 00:06:40,040 --> 00:06:43,010 to do exactly what we've just described? 
145 00:06:43,010 --> 00:06:43,842 DHCP. 146 00:06:43,842 --> 00:06:44,800 I can't even change it. 147 00:06:44,800 --> 00:06:46,508 Because I'm already configured right now. 148 00:06:46,508 --> 00:06:47,610 It's locked, this setting. 149 00:06:47,610 --> 00:06:50,410 But my computer's configured using DHCP. 150 00:06:50,410 --> 00:06:54,300 And it looks like what Harvard's DHCP server 151 00:06:54,300 --> 00:07:01,062 has given me is an IP address-- 10.254.25.237-- a subnet mask, 152 00:07:01,062 --> 00:07:02,270 which we won't go into today. 153 00:07:02,270 --> 00:07:04,580 >> But a subnet mask is just an additional number 154 00:07:04,580 --> 00:07:06,590 that specifies what network you're on. 155 00:07:06,590 --> 00:07:07,747 Maybe it's this room's. 156 00:07:07,747 --> 00:07:09,080 Maybe it's a different building. 157 00:07:09,080 --> 00:07:10,704 Maybe it's a different part of Harvard. 158 00:07:10,704 --> 00:07:13,600 It's a way of segmenting a local network. 159 00:07:13,600 --> 00:07:16,270 >> Router, that word sounds familiar. 160 00:07:16,270 --> 00:07:18,320 Because we were just talking about it here. 161 00:07:18,320 --> 00:07:21,070 And even though I'm on Harvard's network, not like a home network, 162 00:07:21,070 --> 00:07:23,250 the principles are still the same here. 163 00:07:23,250 --> 00:07:28,620 >> Harvard has also told me the IP address of a router-- 10.254.16.1. 164 00:07:28,620 --> 00:07:32,920 And as an aside, generally as a convention, but it's not required, 165 00:07:32,920 --> 00:07:38,250 a router's IP address does tend to end with .1, which is a useful signal, 166 00:07:38,250 --> 00:07:39,420 just to know this. 167 00:07:39,420 --> 00:07:41,610 So what do these things do? 168 00:07:41,610 --> 00:07:45,800 >> The IPv4 address, version 4, which is sort of the older but most popular 169 00:07:45,800 --> 00:07:49,760 version of internet protocol these days, is that address. 
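To make the dotted-decimal form and the subnet mask a little more concrete, here is a minimal Python sketch using the standard `ipaddress` module. The two addresses are the ones read off the screen in the lecture; the subnet mask 255.255.240.0 is an assumption made only for illustration, since the lecture never states the actual mask.

```python
import ipaddress

# Addresses quoted in the lecture.
laptop = ipaddress.IPv4Address("10.254.25.237")
router = ipaddress.IPv4Address("10.254.16.1")

# ASSUMPTION: the real subnet mask is not given in the lecture.
# 255.255.240.0 is a hypothetical value under which both addresses
# fall on the same local network.
network = ipaddress.IPv4Network("10.254.16.0/255.255.240.0")


def octets(addr):
    """Split a dotted-decimal address into its four numbers (each 0-255)."""
    return [int(part) for part in str(addr).split(".")]


def same_network(a, b, net):
    """True if both addresses belong to the given network."""
    return a in net and b in net


if __name__ == "__main__":
    print(octets(laptop))
    print(same_network(laptop, router, network))
```

Under that assumed mask, the laptop and the router sit on the same segment, which is what lets the laptop hand its outbound traffic directly to the router.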
170 00:07:49,760 --> 00:07:50,980 I've got a router address. 171 00:07:50,980 --> 00:07:53,920 So why do I need to know a router's address? 172 00:07:53,920 --> 00:07:55,880 >> Isn't it sufficient to know where I am? 173 00:07:55,880 --> 00:07:57,946 174 00:07:57,946 --> 00:08:00,112 AUDIENCE: That's [INAUDIBLE] related to my question. 175 00:08:00,112 --> 00:08:02,354 So if you have two routers in the same room 176 00:08:02,354 --> 00:08:04,595 so we can get connected to each other, then you 177 00:08:04,595 --> 00:08:06,504 will get a separate IP address because it's 178 00:08:06,504 --> 00:08:07,832 going to be associated with a network. 179 00:08:07,832 --> 00:08:09,390 >> DAVID J. MALAN: Ah, so this is where we actually 180 00:08:09,390 --> 00:08:12,240 have to start teasing apart what we really mean by router. 181 00:08:12,240 --> 00:08:14,910 Because the term, certainly in the consumer market, is overused. 182 00:08:14,910 --> 00:08:17,680 So in this room alone, we have what most people would 183 00:08:17,680 --> 00:08:19,790 call two routers, these things with antennas 184 00:08:19,790 --> 00:08:21,960 and the blue lights on either side of the wall. 185 00:08:21,960 --> 00:08:25,087 >> But router, in this case, they're not. 186 00:08:25,087 --> 00:08:26,420 These aren't quite home routers. 187 00:08:26,420 --> 00:08:29,640 But let's just suppose, for simplicity, we do have two such things here. 188 00:08:29,640 --> 00:08:33,500 If you had two access points, as they're more properly called 189 00:08:33,500 --> 00:08:37,789 because of the antennas-- a wireless access point or AP-- 190 00:08:37,789 --> 00:08:41,309 they should be configured in a way that they, in turn, connect 191 00:08:41,309 --> 00:08:45,420 to one central device, whose purpose in life is to do what you're describing, 192 00:08:45,420 --> 00:08:46,840 to give out the IP address. 
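The idea of one central device handing out addresses can be sketched as a toy program. This is a deliberately simplified illustration, not how a real DHCP server is implemented: actual DHCP involves a multi-message exchange and leases that expire, both of which are omitted here. The class name, the 192.168.1.0/24 network, and the client names are all hypothetical.

```python
import ipaddress


class ToyDhcpServer:
    """Hands out one address per client from a pool (hypothetical sketch)."""

    def __init__(self, network, router_ip):
        self.network = ipaddress.IPv4Network(network)
        self.router_ip = ipaddress.IPv4Address(router_ip)
        self.leases = {}  # client identifier -> assigned address
        # Every usable host address except the router's own.
        self._free = (h for h in self.network.hosts() if h != self.router_ip)

    def request(self, client_id):
        """Answer a client's 'what should my IP address be?' message."""
        if client_id not in self.leases:
            # Pool exhaustion is not handled in this sketch.
            self.leases[client_id] = next(self._free)
        return {
            "ip": str(self.leases[client_id]),
            "subnet_mask": str(self.network.netmask),
            "router": str(self.router_ip),
        }


if __name__ == "__main__":
    server = ToyDhcpServer("192.168.1.0/24", "192.168.1.1")
    print(server.request("laptop"))
    print(server.request("phone"))
```

Note that the reply carries the same three pieces of information seen in the Mac's network settings: an address for the client, a subnet mask, and the router's address.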
193 00:08:46,840 --> 00:08:49,160 >> If you did have two of these kinds of devices at home, 194 00:08:49,160 --> 00:08:53,950 maybe two Linksys devices, two D-Link devices, two AirPort Extremes at home, 195 00:08:53,950 --> 00:08:55,290 or AirPort Expresses. 196 00:08:55,290 --> 00:08:57,440 You can configure all of those products, even 197 00:08:57,440 --> 00:09:00,720 if you have two identical models, to make one the primary, 198 00:09:00,720 --> 00:09:02,390 and then the other the secondary. 199 00:09:02,390 --> 00:09:04,717 So that you run a wire between them, typically, 200 00:09:04,717 --> 00:09:07,050 or you have someone come do it for you behind the walls. 201 00:09:07,050 --> 00:09:08,320 >> And then, one is the primary. 202 00:09:08,320 --> 00:09:11,780 One is in charge of giving out IP addresses. 203 00:09:11,780 --> 00:09:14,610 And the other one is just responsible for extending 204 00:09:14,610 --> 00:09:16,510 the range of your wireless signal. 205 00:09:16,510 --> 00:09:18,990 In fact, at home I have two such things. 206 00:09:18,990 --> 00:09:21,220 >> We have in our office five such things, all of which 207 00:09:21,220 --> 00:09:22,470 are physically wired together. 208 00:09:22,470 --> 00:09:24,470 But it's just to give us more wireless coverage. 209 00:09:24,470 --> 00:09:26,570 But one of them is in charge. 210 00:09:26,570 --> 00:09:30,500 >> OK, so with that said, why does my Mac in this room right now, 211 00:09:30,500 --> 00:09:34,430 need to know what the IP address of the router is? 212 00:09:34,430 --> 00:09:37,234 Isn't it sufficient just to be told what my address is? 213 00:09:37,234 --> 00:09:38,400 AUDIENCE: But it can change. 214 00:09:38,400 --> 00:09:40,969 If you get connected to the VPN, it's going to be different. 215 00:09:40,969 --> 00:09:44,010 DAVID J. MALAN: Oh, now you're using another word I don't know yet-- VPN. 216 00:09:44,010 --> 00:09:44,750 So let's not go there. 
217 00:09:44,750 --> 00:09:46,300 Because VPN's going to complicate it. 218 00:09:46,300 --> 00:09:50,640 I just want to get, little old me wants to get on the internet right now. 219 00:09:50,640 --> 00:09:53,715 Well, this really invites the question, how does the internet work? 220 00:09:53,715 --> 00:09:55,200 >> All right, I might have an address. 221 00:09:55,200 --> 00:09:56,590 That's all fine and good. 222 00:09:56,590 --> 00:09:58,590 But why do I have an address? 223 00:09:58,590 --> 00:10:01,665 >> Well, let's consider what really is going on on the internet. 224 00:10:01,665 --> 00:10:04,740 I'll use a different picture for the moment. 225 00:10:04,740 --> 00:10:12,930 And in the actual internet, we might have me over here on my laptop. 226 00:10:12,930 --> 00:10:15,160 We might have the internet over here. 227 00:10:15,160 --> 00:10:20,460 And then, we might have, let's say, Amazon.com this time. 228 00:10:20,460 --> 00:10:22,150 >> And this is me. 229 00:10:22,150 --> 00:10:26,440 And, somehow, I want to connect to Amazon.com, through the internet, 230 00:10:26,440 --> 00:10:30,710 and get my data from point A to point B. Or I guess, in Amazon, 231 00:10:30,710 --> 00:10:32,840 from point A to point Z in Amazon's case. 232 00:10:32,840 --> 00:10:35,410 >> So what is inside of this internet? 233 00:10:35,410 --> 00:10:39,450 It turns out, there's a whole bunch of things called routers. 234 00:10:39,450 --> 00:10:41,000 And now, we're mixing terms. 235 00:10:41,000 --> 00:10:43,442 But we'll see how even home routers relate to the dots 236 00:10:43,442 --> 00:10:44,900 that I've just drawn on the screen. 237 00:10:44,900 --> 00:10:48,429 >> A router on the internet is generally like a medium-sized device. 238 00:10:48,429 --> 00:10:49,720 It's not like an old mainframe. 239 00:10:49,720 --> 00:10:53,234 But it's a device that's probably this wide, maybe this tall, maybe this tall, 240 00:10:53,234 --> 00:10:53,900 maybe this tall. 
241 00:10:53,900 --> 00:10:55,870 Depends on how expensive a model you have. 242 00:10:55,870 --> 00:10:59,203 >> And it's got a lot of cables coming into it and a lot of cables going out of it. 243 00:10:59,203 --> 00:11:02,980 And at the risk of oversimplifying, you can think of a router's purpose in life 244 00:11:02,980 --> 00:11:08,540 as being to take in data from this cable here, look at the information that's 245 00:11:08,540 --> 00:11:10,130 come in, and look at its address. 246 00:11:10,130 --> 00:11:13,240 Where is this information being sent? 247 00:11:13,240 --> 00:11:15,660 And then say, OK, I'm going to send this along this way. 248 00:11:15,660 --> 00:11:17,660 If I get another piece of information over here, 249 00:11:17,660 --> 00:11:19,160 it's destined for a different address. 250 00:11:19,160 --> 00:11:21,400 I'm going to send it this way, instead, up this cable. 251 00:11:21,400 --> 00:11:23,180 And if I see another piece of information destined 252 00:11:23,180 --> 00:11:25,980 for yet a different address, I'm going to send it out this cable, 253 00:11:25,980 --> 00:11:26,940 over in this way. 254 00:11:26,940 --> 00:11:30,440 >> So a router's purpose in life is to truly route information. 255 00:11:30,440 --> 00:11:34,740 And in its simplest form, a router just has a big Excel file inside of it 256 00:11:34,740 --> 00:11:38,181 that says any IP address starting with the number 1, send it this way. 257 00:11:38,181 --> 00:11:40,680 Any IP address starting with the number 2, send it this way. 258 00:11:40,680 --> 00:11:41,804 Number 3, send it this way. 259 00:11:41,804 --> 00:11:43,460 Number 4, send it that way. 260 00:11:43,460 --> 00:11:47,080 >> Oversimplifying, but it uses those numbers and, specifically, 261 00:11:47,080 --> 00:11:50,990 prefixes of numbers, typically, to decide to go left, right, back, 262 00:11:50,990 --> 00:11:51,742 forward. 
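The "big Excel file" of prefixes can be sketched as a small lookup table. The version below is a minimal illustration, assuming a made-up table: the prefixes and the cable names are hypothetical, and a real router's table is far larger and built by routing protocols rather than typed in by hand. When several prefixes match, the sketch picks the most specific one, which is the usual rule for prefix-based forwarding.

```python
import ipaddress

# HYPOTHETICAL routing table: prefixes and cable names are invented.
ROUTING_TABLE = [
    (ipaddress.IPv4Network("1.0.0.0/8"), "cable-A"),   # addresses starting with 1
    (ipaddress.IPv4Network("2.0.0.0/8"), "cable-B"),   # addresses starting with 2
    (ipaddress.IPv4Network("2.5.0.0/16"), "cable-C"),  # a more specific prefix
    (ipaddress.IPv4Network("0.0.0.0/0"), "default"),   # everything else
]


def next_hop(destination):
    """Return the outgoing cable for a destination address.

    Among all matching prefixes, the longest (most specific) one wins.
    """
    dest = ipaddress.IPv4Address(destination)
    matches = [(net, cable) for net, cable in ROUTING_TABLE if dest in net]
    _, cable = max(matches, key=lambda entry: entry[0].prefixlen)
    return cable


if __name__ == "__main__":
    for address in ["1.2.3.4", "2.5.9.9", "2.6.0.1", "9.9.9.9"]:
        print(address, "->", next_hop(address))
```

The catch-all `0.0.0.0/0` entry plays the role of the default route: anything the table does not know about more specifically goes that way.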
263 00:11:51,742 --> 00:11:54,700 Because a router, typically, has multiple connections to other routers. 264 00:11:54,700 --> 00:11:56,920 In fact, I've not drawn them here. 265 00:11:56,920 --> 00:12:01,560 >> But you can imagine this being a web, not to be confused with the web we use, 266 00:12:01,560 --> 00:12:06,740 but a web of devices, all of which are interconnected very deliberately so. 267 00:12:06,740 --> 00:12:09,810 In fact, the origins of the internet are militaristic in design. 268 00:12:09,810 --> 00:12:14,350 And one of the designing principles was that if a router, or worse, a city 269 00:12:14,350 --> 00:12:17,550 were taken out in a military sense, you want the data to be 270 00:12:17,550 --> 00:12:19,260 able to route around that problem. 271 00:12:19,260 --> 00:12:22,670 >> And so what happens when I send a request to Amazon.com for their home 272 00:12:22,670 --> 00:12:27,080 page, my data might leave my computer, go to my default router, 273 00:12:27,080 --> 00:12:29,580 or default gateway as it's often called. 274 00:12:29,580 --> 00:12:34,200 Then, maybe that router will decide to send it here, here, here, here, here, 275 00:12:34,200 --> 00:12:37,770 here, here, and then on its way to Amazon. 276 00:12:37,770 --> 00:12:40,540 >> And that was an arbitrary path I drew. 277 00:12:40,540 --> 00:12:45,620 But what's noteworthy about the red line I just drew? 278 00:12:45,620 --> 00:12:48,330 How would you describe it? 279 00:12:48,330 --> 00:12:49,710 >> AUDIENCE: It's not direct. 280 00:12:49,710 --> 00:12:51,043 >> DAVID J. MALAN: It's not direct. 281 00:12:51,043 --> 00:12:57,880 So contrary to the popular saying, "The shortest distance between two points 282 00:12:57,880 --> 00:13:00,980 is a straight line," it's not necessarily true on the internet 283 00:13:00,980 --> 00:13:02,780 when it comes to routing information. 
284 00:13:02,780 --> 00:13:05,980 Because geographic distance isn't necessarily the only metric 285 00:13:05,980 --> 00:13:07,030 you care about. 286 00:13:07,030 --> 00:13:11,530 Rather, what else might govern what direction the data should take in order 287 00:13:11,530 --> 00:13:13,564 to get from point A to point B? 288 00:13:13,564 --> 00:13:14,230 AUDIENCE: Speed? 289 00:13:14,230 --> 00:13:15,146 DAVID J. MALAN: Speed. 290 00:13:15,146 --> 00:13:20,550 So it turns out you might configure a router to favor a faster connection. 291 00:13:20,550 --> 00:13:22,960 Even if you might have to go a few hundred extra miles, 292 00:13:22,960 --> 00:13:25,870 maybe it's just faster to go this way than over, maybe, 293 00:13:25,870 --> 00:13:29,100 an old school satellite connection this way just to get from one point 294 00:13:29,100 --> 00:13:29,600 to another. 295 00:13:29,600 --> 00:13:32,571 It doesn't even have to be physical devices on the ground. 296 00:13:32,571 --> 00:13:35,070 It can be physical devices in the sky, for instance, or even 297 00:13:35,070 --> 00:13:37,200 underwater these days, or so forth. 298 00:13:37,200 --> 00:13:38,420 >> So that's true. 299 00:13:38,420 --> 00:13:42,814 What else might dictate that a company, an internet service provider, or ISP, 300 00:13:42,814 --> 00:13:45,855 want to send data this way instead of that way, even though it's farther? 
301 00:13:45,855 --> 00:13:50,470 302 00:13:50,470 --> 00:13:54,960 >> Well, it turns out the way the internet itself is governed commercially 303 00:13:54,960 --> 00:13:57,770 is that there's a lot of big players out here on the internet, 304 00:13:57,770 --> 00:14:02,327 whether it's Comcast, or Verizon, or Level 3, or more arcane names that you 305 00:14:02,327 --> 00:14:04,910 might not have heard of but that are fairly big infrastructure 306 00:14:04,910 --> 00:14:09,240 companies that compose the internet's backbone-- the wiring, the routers, 307 00:14:09,240 --> 00:14:11,930 the cabling that you just don't really see or care about. 308 00:14:11,930 --> 00:14:14,820 Because it's all in the inside run commercially. 309 00:14:14,820 --> 00:14:17,010 >> Well, there are things called peering points 310 00:14:17,010 --> 00:14:20,320 whereby a big ISP might have some server, 311 00:14:20,320 --> 00:14:22,950 might have some routers and some cables in a data center. 312 00:14:22,950 --> 00:14:25,000 And other ISPs might have the same. 313 00:14:25,000 --> 00:14:27,820 And other ISPs might have the same all inside the same data center. 314 00:14:27,820 --> 00:14:28,740 >> And they interconnect. 315 00:14:28,740 --> 00:14:31,970 It's a peering point in so far as they all connect. 316 00:14:31,970 --> 00:14:33,240 That's where peers connect. 317 00:14:33,240 --> 00:14:35,350 >> And by nature of financial arrangements, it 318 00:14:35,350 --> 00:14:38,740 might be the case that Comcast has agreed to send as much of its data 319 00:14:38,740 --> 00:14:41,830 as it can this way instead of this way. 320 00:14:41,830 --> 00:14:43,740 Because, maybe, the vendor over here is going 321 00:14:43,740 --> 00:14:48,089 to charge them more per gigabyte to send their data over in that direction. 322 00:14:48,089 --> 00:14:51,130 So it might be financial decisions that govern which direction things go. 
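The point that the preferred route is not necessarily the most direct one can be illustrated with a small cost-based path search. This is only a sketch using Dijkstra's algorithm on an invented network: the router names and link costs are hypothetical, and real routing between ISPs is decided by routing protocols and business policy rather than by a single global calculation like this. The "cost" on each link could stand for delay, price per gigabyte, or any other metric.

```python
import heapq

# HYPOTHETICAL network: router names and link costs are invented.
LINKS = {
    "me": {"r1": 1},
    "r1": {"r2": 10, "r3": 2},  # r1 -> r2 is direct but slow or expensive
    "r2": {"amazon": 1},
    "r3": {"r4": 2},
    "r4": {"amazon": 2},
    "amazon": {},
}


def cheapest_path(links, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in links[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


if __name__ == "__main__":
    print(cheapest_path(LINKS, "me", "amazon"))
```

In this made-up network the route through `r2` has fewer hops, but the route through `r3` and `r4` has the lower total cost, so the search prefers the longer path.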
323 00:14:51,130 --> 00:14:54,270 >> It might just be performance implications, even more commonly. 324 00:14:54,270 --> 00:14:55,450 Routers get overloaded. 325 00:14:55,450 --> 00:14:57,430 If a lot of people get home at 5:00 PM 326 00:14:57,430 --> 00:15:00,860 and start getting on the internet, maybe there's congestion on the internet. 327 00:15:00,860 --> 00:15:03,380 And the algorithms, the software running on routers, 328 00:15:03,380 --> 00:15:05,590 generally will say, if I start to get overloaded, 329 00:15:05,590 --> 00:15:08,030 I should provide some feedback to other routers near me 330 00:15:08,030 --> 00:15:10,400 so that they, hopefully, go in another direction, 331 00:15:10,400 --> 00:15:12,560 much like you would avoid a traffic jam. 332 00:15:12,560 --> 00:15:16,540 >> So this is not all that unlikely of a path that data might take from point A 333 00:15:16,540 --> 00:15:18,920 to point B. And in fact, you can generally 334 00:15:18,920 --> 00:15:23,080 assume that your data is going to take 30 or fewer such hops from point A 335 00:15:23,080 --> 00:15:27,340 to point B. That is there might be as many as 30 or so routers between you 336 00:15:27,340 --> 00:15:28,400 and point B. 337 00:15:28,400 --> 00:15:29,850 >> And we can, sometimes, see this. 338 00:15:29,850 --> 00:15:31,820 Let me see if the network here cooperates. 339 00:15:31,820 --> 00:15:35,000 Otherwise, I'll try a different example. 340 00:15:35,000 --> 00:15:38,170 Let me see if I can do it on this network. 341 00:15:38,170 --> 00:15:38,950 And I can. 342 00:15:38,950 --> 00:15:47,310 >> So I have just run, let me simplify my outputs slightly. 343 00:15:47,310 --> 00:15:52,640 I'm going to do not that. 344 00:15:52,640 --> 00:15:53,910 Here, OK. 345 00:15:53,910 --> 00:15:57,106 >> So I'm going to do the following command called traceroute. 346 00:15:57,106 --> 00:15:58,480 So right now, I'm just on my Mac. 
347 00:15:58,480 --> 00:16:01,146 I'm in an old school black and white interface, nothing like DOS 348 00:16:01,146 --> 00:16:01,860 from yesteryear. 349 00:16:01,860 --> 00:16:03,720 But I just want to see some textual output. 350 00:16:03,720 --> 00:16:06,050 >> And I, literally, here at Harvard University 351 00:16:06,050 --> 00:16:10,650 want to trace the route between me and www.cnn.com. 352 00:16:10,650 --> 00:16:13,077 So let's see what happens now when I hit Enter. 353 00:16:13,077 --> 00:16:15,410 A whole bunch of stuff starts flashing up on the screen. 354 00:16:15,410 --> 00:16:18,090 >> And let's see if we can't make some sense of this. 355 00:16:18,090 --> 00:16:22,720 So 1, 2, 3, 4, 5, 6, 7, and it's kind of hanging right now. 356 00:16:22,720 --> 00:16:24,930 We'll see if it completes this process or not. 357 00:16:24,930 --> 00:16:27,900 It turns out that each of the lines of output, on the screen, 358 00:16:27,900 --> 00:16:29,380 represent something. 359 00:16:29,380 --> 00:16:32,170 And based on our leading discussion thus far, 360 00:16:32,170 --> 00:16:36,500 what do each of these lines of output, numbered 1 through 11 at the moment, 361 00:16:36,500 --> 00:16:37,430 represent? 362 00:16:37,430 --> 00:16:38,614 >> AUDIENCE: Different routers. 363 00:16:38,614 --> 00:16:41,280 DAVID J. MALAN: Different routers, different dots on the screen. 364 00:16:41,280 --> 00:16:43,196 And so what this program, traceroute, is doing 365 00:16:43,196 --> 00:16:45,760 is it's literally tracing the route between me and CNN.com. 366 00:16:45,760 --> 00:16:52,160 So in this case, step 1 is, apparently, a router whose IP address is what? 367 00:16:52,160 --> 00:16:54,229 >> AUDIENCE: [INAUDIBLE] 368 00:16:54,229 --> 00:16:56,520 DAVID J. MALAN: Yeah, but specifically, its IP address. 369 00:16:56,520 --> 00:16:58,040 Remember, its IP address is numeric. 
370 00:16:58,040 --> 00:17:00,520 So to just make sure we're all on the same page, what's 371 00:17:00,520 --> 00:17:03,360 the IP address of the first router between me and Harvard? 372 00:17:03,360 --> 00:17:06,800 I mean, sorry, between me and CNN? 373 00:17:06,800 --> 00:17:07,691 >> AUDIENCE: [INAUDIBLE] 374 00:17:07,691 --> 00:17:08,690 DAVID J. MALAN: Perfect. 375 00:17:08,690 --> 00:17:09,670 AUDIENCE: [INAUDIBLE] 376 00:17:09,670 --> 00:17:10,180 DAVID J. MALAN: Exactly. 377 00:17:10,180 --> 00:17:12,170 We're just inferring this from the reality 378 00:17:12,170 --> 00:17:15,115 that this first hop, so to speak, just has that address. 379 00:17:15,115 --> 00:17:16,740 It doesn't have a name for some reason. 380 00:17:16,740 --> 00:17:19,448 But that's just because the humans decided not to give it a name. 381 00:17:19,448 --> 00:17:20,170 And so be it. 382 00:17:20,170 --> 00:17:22,951 >> Step 2 is another router. 383 00:17:22,951 --> 00:17:24,450 But again, I said it was convention. 384 00:17:24,450 --> 00:17:26,720 It's not required that routers' IPs end in .1. 385 00:17:26,720 --> 00:17:27,920 This one does not. 386 00:17:27,920 --> 00:17:32,200 The second router's IP is this. 387 00:17:32,200 --> 00:17:35,310 >> Now, it looks like the humans got a little more organized 388 00:17:35,310 --> 00:17:37,690 and have started naming their routers with what 389 00:17:37,690 --> 00:17:40,064 look like URLs or portions of URLs. 390 00:17:40,064 --> 00:17:40,730 But they're not. 391 00:17:40,730 --> 00:17:43,040 They're just the names that humans give to things. 392 00:17:43,040 --> 00:17:46,610 >> And it, apparently, is the case that this router, not surprisingly, 393 00:17:46,610 --> 00:17:49,392 is owned by whom probably? 394 00:17:49,392 --> 00:17:50,600 It's probably Harvard, right? 395 00:17:50,600 --> 00:17:53,550 Because the name of the thing ends in harvard.edu. 396 00:17:53,550 --> 00:17:54,550 What is the name? 
397 00:17:54,550 --> 00:17:58,990 coregw1, core just means important, in the middle. 398 00:17:58,990 --> 00:18:01,984 gw is-- I said it earlier. 399 00:18:01,984 --> 00:18:02,810 >> AUDIENCE: Gateway. 400 00:18:02,810 --> 00:18:06,120 >> DAVID J. MALAN: Gateway, just a synonym for router. 401 00:18:06,120 --> 00:18:09,010 So this is the very important core gateway number 1. 402 00:18:09,010 --> 00:18:10,290 I don't know what te means. 403 00:18:10,290 --> 00:18:11,411 3-5, don't know. 404 00:18:11,411 --> 00:18:12,910 core, probably means the same thing. 405 00:18:12,910 --> 00:18:15,890 >> .net.harvard.edu, doesn't necessarily look clean. 406 00:18:15,890 --> 00:18:18,770 But it's useful to some system administrator somewhere at Harvard. 407 00:18:18,770 --> 00:18:22,710 Step 4, I'm inferring from convention. 408 00:18:22,710 --> 00:18:24,816 What do you think 4 represents? 409 00:18:24,816 --> 00:18:26,950 It's still a router. 410 00:18:26,950 --> 00:18:31,280 >> What does bdr probably, what does it sound like? 411 00:18:31,280 --> 00:18:31,880 Border. 412 00:18:31,880 --> 00:18:36,040 So this is probably a router that's physically on the border of Harvard 413 00:18:36,040 --> 00:18:39,470 and the rest of the world, so on the edge of the campus somewhere. 414 00:18:39,470 --> 00:18:43,070 >> Step 5 is interesting. 415 00:18:43,070 --> 00:18:45,660 Step 5 still says harvard. 416 00:18:45,660 --> 00:18:49,300 But NoX tends to stand for Northern Crossroads, which 417 00:18:49,300 --> 00:18:53,710 is a very popular peering point-- as I described earlier, a data center where 418 00:18:53,710 --> 00:18:57,230 lots of different people, Harvard and other big ISPs, come together 419 00:18:57,230 --> 00:19:00,640 and interconnect their cabling so that data can go out elsewhere 420 00:19:00,640 --> 00:19:01,590 on the internet. 421 00:19:01,590 --> 00:19:04,740 >> And now, things get a little more interesting. 
422 00:19:04,740 --> 00:19:06,940 I don't know where this is just yet. 423 00:19:06,940 --> 00:19:11,322 Apparently, rtr, I'm guessing, is router. 424 00:19:11,322 --> 00:19:15,080 Equinix in New York is possibly the origin of that. 425 00:19:15,080 --> 00:19:21,300 But internet2 is a super fast internet connectivity among universities, 426 00:19:21,300 --> 00:19:21,860 especially. 427 00:19:21,860 --> 00:19:23,943 So that seems to be what we're connected to there. 428 00:19:23,943 --> 00:19:27,460 For whatever reason, the routers in steps 7, 8, and 9 429 00:19:27,460 --> 00:19:28,610 are just not answering us. 430 00:19:28,610 --> 00:19:30,790 That's probably because of either misconfiguration 431 00:19:30,790 --> 00:19:31,920 or conscious configuration. 432 00:19:31,920 --> 00:19:35,250 Whoever runs those routers doesn't care to disclose information. 433 00:19:35,250 --> 00:19:38,230 >> But step 10 is interesting enough. 434 00:19:38,230 --> 00:19:43,540 Because I can guess from this, with some probability, 435 00:19:43,540 --> 00:19:48,370 that my data, the data leaving my laptop, by step 436 00:19:48,370 --> 00:19:53,020 10-- 10 steps later-- has entered what geography? 437 00:19:53,020 --> 00:19:54,270 New York. 438 00:19:54,270 --> 00:19:58,040 >> And how fast did it take my data, from my laptop, to get to New York 439 00:19:58,040 --> 00:20:00,760 on its way to CNN would you guess? 440 00:20:00,760 --> 00:20:02,240 28 milliseconds. 441 00:20:02,240 --> 00:20:04,020 And this tool not only traces the route. 442 00:20:04,020 --> 00:20:05,380 It also times things. 443 00:20:05,380 --> 00:20:06,630 >> And things can get congested. 444 00:20:06,630 --> 00:20:10,222 So the numbers could sometimes jump up or down a little unexpectedly. 
445 00:20:10,222 --> 00:20:12,680 But if you think, now, how long it takes to get to New York 446 00:20:12,680 --> 00:20:16,050 from here, which is probably about four or so hours by car or train, 447 00:20:16,050 --> 00:20:18,945 it's much faster to send yourself electronically 448 00:20:18,945 --> 00:20:22,732 if it takes just 28 milliseconds to get from here to there. 449 00:20:22,732 --> 00:20:25,440 Now unfortunately, the other routers don't seem to be disclosing. 450 00:20:25,440 --> 00:20:26,356 Let's try another one. 451 00:20:26,356 --> 00:20:30,030 Just for kicks, let's try Amazon.com and see 452 00:20:30,030 --> 00:20:32,715 if the routers are a little more cooperative, knowing that it 453 00:20:32,715 --> 00:20:34,340 could take a completely different path. 454 00:20:34,340 --> 00:20:36,992 So maybe we won't hit as many blockages there. 455 00:20:36,992 --> 00:20:38,910 >> It looks a little different here. 456 00:20:38,910 --> 00:20:41,940 I don't think we saw aws sum1 net. 457 00:20:41,940 --> 00:20:44,790 And in fact, aws is Amazon Web Services. 458 00:20:44,790 --> 00:20:47,517 Harvard has a service called Direct Connect with Amazon, 459 00:20:47,517 --> 00:20:49,350 where we pay a little bit of money to Amazon 460 00:20:49,350 --> 00:20:51,410 to get faster connectivity to Amazon's network. 461 00:20:51,410 --> 00:20:53,659 So we use a lot of their cloud services, some of which 462 00:20:53,659 --> 00:20:55,120 we might talk about a little later. 463 00:20:55,120 --> 00:20:57,560 >> Seems the routers here, too, are being a little shy. 464 00:20:57,560 --> 00:20:59,560 So we don't see all that much more. 465 00:20:59,560 --> 00:21:02,380 But let's see if we can glean a little something more 466 00:21:02,380 --> 00:21:04,600 by going a different direction altogether. 467 00:21:04,600 --> 00:21:07,807 >> Let's try our friends at Stanford.edu. 468 00:21:07,807 --> 00:21:08,890 See if we get any farther.
469 00:21:08,890 --> 00:21:12,066 470 00:21:12,066 --> 00:21:15,430 No, still being a little private. 471 00:21:15,430 --> 00:21:18,060 472 00:21:18,060 --> 00:21:22,514 Seems this same path is hiding itself a little bit. 473 00:21:22,514 --> 00:21:24,930 So we'll try one more if this doesn't yield juicy results. 474 00:21:24,930 --> 00:21:31,150 But you can kind of see those IPs, I can make an inference here. 475 00:21:31,150 --> 00:21:35,830 What might you conclude, even if you're not a network engineer, 476 00:21:35,830 --> 00:21:40,260 is true based on the numbers you're seeing in step 7 through 9 and 12 477 00:21:40,260 --> 00:21:42,110 through 15? 478 00:21:42,110 --> 00:21:43,780 >> What's an educated guess here? 479 00:21:43,780 --> 00:21:46,690 What's a true statement? 480 00:21:46,690 --> 00:21:49,515 >> AUDIENCE: Something around the 205 [INAUDIBLE]. 481 00:21:49,515 --> 00:21:52,320 >> DAVID J. MALAN: True, and I'm looking at the numbers to the right. 482 00:21:52,320 --> 00:21:57,210 Where are these routers, even though they don't seem to have names? 483 00:21:57,210 --> 00:22:00,150 >> AUDIENCE: Somewhere further away than [INAUDIBLE]. 484 00:22:00,150 --> 00:22:01,330 >> DAVID J. MALAN: Yeah. 485 00:22:01,330 --> 00:22:02,640 And I don't know where. 486 00:22:02,640 --> 00:22:05,330 But notice step 7 says 123 milliseconds. 487 00:22:05,330 --> 00:22:09,310 But just three hops prior, it only took 3 milliseconds. 488 00:22:09,310 --> 00:22:10,509 >> AUDIENCE: So [INAUDIBLE] 489 00:22:10,509 --> 00:22:11,800 DAVID J. MALAN: Not here, yeah. 490 00:22:11,800 --> 00:22:13,430 So maybe it is middle of the country. 491 00:22:13,430 --> 00:22:14,846 Maybe it's the West Coast already. 492 00:22:14,846 --> 00:22:16,840 I really don't know, completely guessing. 
493 00:22:16,840 --> 00:22:20,890 >> But given that every other hop thereafter also took more time, 494 00:22:20,890 --> 00:22:23,410 feels reasonable to conclude that there's just 495 00:22:23,410 --> 00:22:26,390 physical geography between us and them. 496 00:22:26,390 --> 00:22:30,700 And to be clear, each of these numbers isn't pairwise. 497 00:22:30,700 --> 00:22:33,230 It doesn't mean each hop takes 100 milliseconds. 498 00:22:33,230 --> 00:22:36,660 >> Each of these numbers represents from point A to that intermediate hop. 499 00:22:36,660 --> 00:22:39,842 So in general, they should just be incrementing ever so slightly. 500 00:22:39,842 --> 00:22:42,550 So the fact that all of these, now, are roughly 100 milliseconds, 501 00:22:42,550 --> 00:22:44,490 feels like it's got to be farther away. 502 00:22:44,490 --> 00:22:45,870 And I'll try one last one. 503 00:22:45,870 --> 00:22:48,480 >> But I'm guessing we're going to see a bunch of stars. 504 00:22:48,480 --> 00:22:52,545 Let's try the Japanese version of CNN's website. 505 00:22:52,545 --> 00:22:54,180 Oh, OK, now it's getting juicy. 506 00:22:54,180 --> 00:22:59,010 Because apparently it really has taken a different path through the US. 507 00:22:59,010 --> 00:23:00,990 >> Let's take a look at, oh, this is great. 508 00:23:00,990 --> 00:23:01,970 This one finished. 509 00:23:01,970 --> 00:23:03,860 So this is powerful. 510 00:23:03,860 --> 00:23:11,203 In steps 1 through 4, what town are we probably in? 511 00:23:11,203 --> 00:23:12,037 >> AUDIENCE: Cambridge. 512 00:23:12,037 --> 00:23:13,119 DAVID J. MALAN: Cambridge. 513 00:23:13,119 --> 00:23:14,170 And why do you say that? 514 00:23:14,170 --> 00:23:15,680 It's all harvard.edu. 515 00:23:15,680 --> 00:23:18,330 In step 5, where might we be? 516 00:23:18,330 --> 00:23:18,890 Boston. 517 00:23:18,890 --> 00:23:20,550 In step 6, where might we be? 518 00:23:20,550 --> 00:23:21,350 >> AUDIENCE: Number 6. 519 00:23:21,350 --> 00:23:22,812 >> DAVID J. 
MALAN: And where is San Jose? 520 00:23:22,812 --> 00:23:23,960 >> AUDIENCE: It's in California. 521 00:23:23,960 --> 00:23:24,740 >> DAVID J. MALAN: California? 522 00:23:24,740 --> 00:23:27,448 It's probably the San Jose, California, which is kind of amazing. 523 00:23:27,448 --> 00:23:28,500 Now, why do we say that? 524 00:23:28,500 --> 00:23:30,770 So one, San Jose-- that's the only San Jose I know of. 525 00:23:30,770 --> 00:23:32,020 But I'm sure there are others. 526 00:23:32,020 --> 00:23:36,756 But corroborating that hunch is what other piece of data? 527 00:23:36,756 --> 00:23:38,789 >> AUDIENCE: The geographical. 528 00:23:38,789 --> 00:23:40,580 DAVID J. MALAN: The geographical path feels 529 00:23:40,580 --> 00:23:42,940 like that's the direction we probably are going to go 530 00:23:42,940 --> 00:23:45,250 to get to Japan over the Pacific Ocean. 531 00:23:45,250 --> 00:23:48,320 And what further piece of data corroborates that, 532 00:23:48,320 --> 00:23:52,660 yeah, we just took a left turn to California? 533 00:23:52,660 --> 00:23:53,950 The time really jumps. 534 00:23:53,950 --> 00:24:02,550 >> Notice we go from 1.989 milliseconds, in row 5, to 74 milliseconds in row 6, 535 00:24:02,550 --> 00:24:05,300 which suggests there's probably some big body of land. 536 00:24:05,300 --> 00:24:10,590 So there's also some really expensive, powerful cable, it would seem, 537 00:24:10,590 --> 00:24:15,370 going across the entire country leading from Boston to San Jose in this case. 538 00:24:15,370 --> 00:24:16,740 Don't know where step 7 is. 539 00:24:16,740 --> 00:24:20,030 >> But it gets really cool when we look, now, at step 8 and 9 onward. 540 00:24:20,030 --> 00:24:22,100 Where are those routers? 541 00:24:22,100 --> 00:24:23,090 Probably Japan. 542 00:24:23,090 --> 00:24:27,706 So what is between step 7 and 8 most likely? 543 00:24:27,706 --> 00:24:28,680 >> AUDIENCE: London. 544 00:24:28,680 --> 00:24:30,846 >> DAVID J.
MALAN: Yeah, so there's also trans-Pacific, 545 00:24:30,846 --> 00:24:35,750 transatlantic, transoceanic cabling that really big ships just 546 00:24:35,750 --> 00:24:38,950 roll out and put on the bottom of the ocean, that carries 547 00:24:38,950 --> 00:24:40,460 all of this internet connectivity. 548 00:24:40,460 --> 00:24:42,440 And that's why our network connection gets 549 00:24:42,440 --> 00:24:44,520 so much slower, relatively speaking. 550 00:24:44,520 --> 00:24:46,687 And I mentioned earlier, generally, and well, this 551 00:24:46,687 --> 00:24:49,020 is something a web developer might want to keep in mind. 552 00:24:49,020 --> 00:24:50,770 >> We won't go into too much detail tomorrow. 553 00:24:50,770 --> 00:24:54,090 But generally, a human will start to notice delays on a web page 554 00:24:54,090 --> 00:24:56,775 if something takes 200 or more milliseconds to load. 555 00:24:56,775 --> 00:24:59,670 I mean, that's still super fast-- a fifth of a second. 556 00:24:59,670 --> 00:25:02,270 But this is one of the metrics that a web developer 557 00:25:02,270 --> 00:25:05,290 should keep in mind when designing a page, when he or she is 558 00:25:05,290 --> 00:25:10,360 creating graphics, or adding in third-party software-- advertisements, 559 00:25:10,360 --> 00:25:10,970 perhaps. 560 00:25:10,970 --> 00:25:12,900 >> You don't want to slow down the page load. 561 00:25:12,900 --> 00:25:15,320 You, ideally, want to keep it as fast as possible. 562 00:25:15,320 --> 00:25:18,440 And if you start having page load times of 200 plus milliseconds, 563 00:25:18,440 --> 00:25:21,420 the human's going to notice that it's not truly instant. 564 00:25:21,420 --> 00:25:24,770 And so these numbers aren't all that unfamiliar to us. 565 00:25:24,770 --> 00:25:29,340 >> So this, then, captures a little more quantitatively what's going on here. 
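As a rough illustration of reading those traceroute timings, here is a minimal Python sketch. It assumes each figure is a round trip measured from the laptop to that hop, as described in the lecture. Only the hop 5 and hop 6 values (1.989 and 74 milliseconds) come from the output discussed; the other values, and the names `rtt_ms`, `biggest_jump`, and `NOTICEABLE_MS`, are made up for the example.

```python
# Round-trip times in milliseconds, each measured from the laptop to that hop.
# Hops 5 and 6 use the figures quoted in the lecture; hops 4 and 7 are invented.
rtt_ms = {4: 1.2, 5: 1.989, 6: 74.0, 7: 75.5}

# Rough point at which a person starts to notice a delay (figure from the lecture).
NOTICEABLE_MS = 200


def biggest_jump(rtts):
    """Return (hop, increase) for the largest increase over the previous hop."""
    hops = sorted(rtts)
    jumps = [
        (hops[i], rtts[hops[i]] - rtts[hops[i - 1]])
        for i in range(1, len(hops))
    ]
    return max(jumps, key=lambda pair: pair[1])


hop, increase = biggest_jump(rtt_ms)
print(f"largest jump is at hop {hop}: +{increase:.3f} ms")
print("noticeable to a human:", max(rtt_ms.values()) >= NOTICEABLE_MS)
```

A large increase between two consecutive hops is only a hint of physical distance; congestion can also move these numbers around.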
566 00:25:29,340 --> 00:25:31,870 And it truly is, even though I'm sort of bemoaning 567 00:25:31,870 --> 00:25:33,545 how slow it is to get to Japan. 568 00:25:33,545 --> 00:25:36,050 I mean, it's still less than half a second 569 00:25:36,050 --> 00:25:38,310 to get your data halfway around the world, 570 00:25:38,310 --> 00:25:42,730 whether that's an email, a web page, or anything else along these lines. 571 00:25:42,730 --> 00:25:47,500 >> All right, so how does this, then, relate to where we were going earlier. 572 00:25:47,500 --> 00:25:49,120 We were talking about an IP address. 573 00:25:49,120 --> 00:25:52,500 And every computer, on the internet, has a unique address, we'll say for now-- 574 00:25:52,500 --> 00:25:54,660 but a bit of a white lie-- called an IP address. 575 00:25:54,660 --> 00:25:56,890 And that IP address is used how? 576 00:25:56,890 --> 00:26:00,230 >> It's used by these routers to decide whether the data should go here, here, 577 00:26:00,230 --> 00:26:01,280 here, or here. 578 00:26:01,280 --> 00:26:04,256 And I simplified things by saying it just looks at the first digit. 579 00:26:04,256 --> 00:26:05,380 But that's not really true. 580 00:26:05,380 --> 00:26:08,060 It looks at more of the digits, typically, to figure this out. 581 00:26:08,060 --> 00:26:11,310 >> And either humans have decided or computer algorithms 582 00:26:11,310 --> 00:26:13,980 have decided what the best route is for that data. 583 00:26:13,980 --> 00:26:15,950 So that, hopefully, within 30 or so hops, 584 00:26:15,950 --> 00:26:18,850 it eventually gets to its destination. 585 00:26:18,850 --> 00:26:22,270 Once I've requested Amazon's home page, how does Amazon 586 00:26:22,270 --> 00:26:26,330 know to whom to send the home page? 587 00:26:26,330 --> 00:26:28,680 >> Right, in old school form, I send a postcard 588 00:26:28,680 --> 00:26:31,500 to Amazon saying, please send me your home page. 
589 00:26:31,500 --> 00:26:35,350 Amazon's going to respond with some kind of message, some kind of postcard, 590 00:26:35,350 --> 00:26:36,970 some kind of envelope of its own. 591 00:26:36,970 --> 00:26:39,560 So let's do exactly this just to visualize this for a moment. 592 00:26:39,560 --> 00:26:41,700 >> So the internet these days, as you may have heard, 593 00:26:41,700 --> 00:26:44,200 seems to be filled with cats and pictures of cats. 594 00:26:44,200 --> 00:26:48,300 So suppose that someone's trying to visit not Amazon.com, but some website 595 00:26:48,300 --> 00:26:49,790 to download a picture of a cat. 596 00:26:49,790 --> 00:26:53,805 So my laptop wants to send a request, via the web, to some websites saying, 597 00:26:53,805 --> 00:26:55,560 give me today's picture of a cat. 598 00:26:55,560 --> 00:26:58,780 >> And this cat, hopefully, has to then get downloaded to my computer. 599 00:26:58,780 --> 00:27:00,094 So what's really happening? 600 00:27:00,094 --> 00:27:01,510 Well, let me go ahead and do this. 601 00:27:01,510 --> 00:27:04,430 I've got four old school envelopes here. 602 00:27:04,430 --> 00:27:05,680 And this is a useful metaphor. 603 00:27:05,680 --> 00:27:08,260 Because this is, essentially, electronically what 604 00:27:08,260 --> 00:27:10,570 happens underneath the hood when I send a message. 605 00:27:10,570 --> 00:27:15,850 >> So for the sake of discussion, let's say this is no longer Amazon. 606 00:27:15,850 --> 00:27:18,200 This is cats.com or something. 607 00:27:18,200 --> 00:27:24,250 And my IP address, I'm going to say for simplicity, is 1.2.3.4. 608 00:27:24,250 --> 00:27:29,950 And the cat website will be 5.6.7.8. 609 00:27:29,950 --> 00:27:33,090 >> And what this means for me is the following. 610 00:27:33,090 --> 00:27:40,840 I am going to put 1.2.3.4, 1.2.3.4. 611 00:27:40,840 --> 00:27:43,555 And I'll hold these up in a second. 612 00:27:43,555 --> 00:27:46,350 1.2.3.4. 
613 00:27:46,350 --> 00:27:51,087 I'm going to put my return address on all of these envelopes, 614 00:27:51,087 --> 00:27:52,920 in the top left-hand corner as you typically 615 00:27:52,920 --> 00:27:54,211 would when mailing an envelope. 616 00:27:54,211 --> 00:27:58,905 And now, just take a guess what needs to go in the main part of the envelope. 617 00:27:58,905 --> 00:27:59,780 AUDIENCE: [INAUDIBLE] 618 00:27:59,780 --> 00:28:00,430 DAVID J. MALAN: Yeah, yeah. 619 00:28:00,430 --> 00:28:00,930 That's all. 620 00:28:00,930 --> 00:28:03,600 So 5.6.7.8. 621 00:28:03,600 --> 00:28:13,970 So 5.6.7.8, 5.6.7.8, 5.6.7.8, 5.6.7.8. 622 00:28:13,970 --> 00:28:18,450 >> And now, this cat here, by design, is going 623 00:28:18,450 --> 00:28:21,030 to be chomped up into multiple pieces after I request it. 624 00:28:21,030 --> 00:28:22,960 So let's say, for the sake of this story, 625 00:28:22,960 --> 00:28:24,890 I've already sent out an envelope of my own 626 00:28:24,890 --> 00:28:28,114 to cats.com saying, please give me today's cats. 627 00:28:28,114 --> 00:28:30,280 So what we're talking about, now, is the latter half 628 00:28:30,280 --> 00:28:35,450 of the transaction, when the reply comes back from cats.com to little old me. 629 00:28:35,450 --> 00:28:39,380 >> So it turns out that the protocol, that these computers speak, 630 00:28:39,380 --> 00:28:44,470 is generally something called TCP/IP, which you probably have seen somewhere 631 00:28:44,470 --> 00:28:48,670 or other on your Mac, or PC, or media, or on a movie, 632 00:28:48,670 --> 00:28:50,040 or a TV show, or the like. 633 00:28:50,040 --> 00:28:51,370 So what does this all mean? 634 00:28:51,370 --> 00:28:53,900 This is actually a combination of two protocols. 635 00:28:53,900 --> 00:28:57,050 >> And a protocol is just a language that two computers speak. 636 00:28:57,050 --> 00:28:59,620 In fact, a protocol in the human world, hello. 637 00:28:59,620 --> 00:29:00,370 My name's David. 
638 00:29:00,370 --> 00:29:01,570 >> AUDIENCE: Hello. 639 00:29:01,570 --> 00:29:02,945 >> DAVID J. MALAN: Nice to meet you. 640 00:29:02,945 --> 00:29:05,930 So this is a fairly stupid human protocol, where I extend my hand. 641 00:29:05,930 --> 00:29:07,320 And Arwa extends her hand. 642 00:29:07,320 --> 00:29:09,050 And we meet and greet. 643 00:29:09,050 --> 00:29:11,150 And then, the transaction is complete. 644 00:29:11,150 --> 00:29:13,980 >> But it's a protocol in so far as it's a set of steps 645 00:29:13,980 --> 00:29:18,900 that it's a script that both of us know how to act out. 646 00:29:18,900 --> 00:29:19,900 And there's a beginning. 647 00:29:19,900 --> 00:29:21,060 And there's an end to it. 648 00:29:21,060 --> 00:29:24,170 Similarly, when it comes to computers, they 649 00:29:24,170 --> 00:29:27,350 have protocols-- sets of conventions that, in fairness, have 650 00:29:27,350 --> 00:29:28,830 been decided by humans. 651 00:29:28,830 --> 00:29:33,220 But they're used by computers that dictate how computers intercommunicate. 652 00:29:33,220 --> 00:29:38,490 >> IP is the half of this pair of protocols that governs how you address computers. 653 00:29:38,490 --> 00:29:39,860 How do you address computers? 654 00:29:39,860 --> 00:29:42,790 Exactly like this. 655 00:29:42,790 --> 00:29:45,380 >> So IP is a set of conventions that says make 656 00:29:45,380 --> 00:29:47,370 sure you have an IP address of the recipient 657 00:29:47,370 --> 00:29:49,000 and an IP address of the sender. 658 00:29:49,000 --> 00:29:51,070 And use it in dotted, something dot something 659 00:29:51,070 --> 00:29:52,700 dot something dot something format, for instance. 660 00:29:52,700 --> 00:29:58,820 TCP is a different protocol, used in conjunction with IP, 661 00:29:58,820 --> 00:30:01,410 that generally guarantees delivery. 662 00:30:01,410 --> 00:30:04,590 IP just tells computers how to address each other.
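To make the addressing convention concrete, here is a small Python sketch of an envelope carrying a sender and a recipient address in dotted format. It is only an illustration under stated assumptions: the `Envelope` class and its field names are hypothetical, the addresses are the simplified ones written on the envelopes in the lecture, and the standard library's `ipaddress` module stands in for the validation a real network stack would do.

```python
from dataclasses import dataclass
from ipaddress import AddressValueError, IPv4Address


@dataclass(frozen=True)
class Envelope:
    """Hypothetical stand-in for the addressing part of a packet."""
    sender: IPv4Address     # return address, top left-hand corner
    recipient: IPv4Address  # destination, main part of the envelope


# Addresses as written on the envelopes in the lecture.
envelope = Envelope(sender=IPv4Address("1.2.3.4"), recipient=IPv4Address("5.6.7.8"))
print(envelope.sender, "->", envelope.recipient)

# The dotted format is four numbers from 0 to 255, which together are
# just one 32-bit number underneath.
print(int(envelope.sender))

# Anything that is not "something dot something dot something dot something"
# is rejected.
try:
    IPv4Address("1.2.3")
except AddressValueError as error:
    print("not a valid address:", error)
```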
663 00:30:04,590 --> 00:30:07,560 >> It's just when I said David, you said Arwa. 664 00:30:07,560 --> 00:30:11,860 That was our IP equivalent, our steps for addressing each other. 665 00:30:11,860 --> 00:30:15,970 But to confirm delivery, computers use a protocol 666 00:30:15,970 --> 00:30:19,960 called TCP, Transmission Control Protocol, which is just 667 00:30:19,960 --> 00:30:23,690 a fancy way of saying there are additional features used 668 00:30:23,690 --> 00:30:28,650 by computers to ensure that all of these envelopes I keep holding up actually 669 00:30:28,650 --> 00:30:30,140 get to their destination. 670 00:30:30,140 --> 00:30:32,810 >> And one mechanism for that is as follows. 671 00:30:32,810 --> 00:30:37,870 I seem to have how many envelopes here at the moment? 672 00:30:37,870 --> 00:30:38,659 >> AUDIENCE: Four. 673 00:30:38,659 --> 00:30:39,700 DAVID J. MALAN: OK, four. 674 00:30:39,700 --> 00:30:43,890 So feels like, just to be a little tidy about this all, I'm going to number 675 00:30:43,890 --> 00:30:46,900 them in the bottom left-hand corner, like the memo field. 676 00:30:46,900 --> 00:30:52,940 And I'm just going to say 1, 2, 3, 4. 677 00:30:52,940 --> 00:30:56,050 But now, start thinking a bit more like an engineer. 678 00:30:56,050 --> 00:30:59,700 >> Have I jotted down as much information as I actually have? 679 00:30:59,700 --> 00:31:02,850 Can I be even more uptight than this when it comes 680 00:31:02,850 --> 00:31:04,150 to specifying these numbers? 681 00:31:04,150 --> 00:31:07,342 What more could I put on the envelope that just maybe is useful? 682 00:31:07,342 --> 00:31:08,580 >> AUDIENCE: [INAUDIBLE]. 683 00:31:08,580 --> 00:31:08,920 >> DAVID J. MALAN: What's that? 684 00:31:08,920 --> 00:31:10,636 >> AUDIENCE: The number of total envelopes that you have. 685 00:31:10,636 --> 00:31:12,270 >> DAVID J. MALAN: Yeah, the total number. 
686 00:31:12,270 --> 00:31:15,514 I feel like I'm not capturing as much available information as I have. 687 00:31:15,514 --> 00:31:17,180 So, you know, I probably should do that. 688 00:31:17,180 --> 00:31:22,660 So 1 out of 4, 2 out of 4, 3 out of 4, 4 out of 4. 689 00:31:22,660 --> 00:31:23,840 >> And now, why is that? 690 00:31:23,840 --> 00:31:27,530 What's the intuition behind also jotting down the total number of envelopes 691 00:31:27,530 --> 00:31:29,380 I'm about to send? 692 00:31:29,380 --> 00:31:31,130 AUDIENCE: Find out if something's missing. 693 00:31:31,130 --> 00:31:32,129 DAVID J. MALAN: Exactly. 694 00:31:32,129 --> 00:31:34,100 So TCP leverages this. 695 00:31:34,100 --> 00:31:36,450 It uses something called a sequence number, very similar 696 00:31:36,450 --> 00:31:38,730 in spirit to what we're drawing here. 697 00:31:38,730 --> 00:31:42,381 But it needs to know how many packets, or envelopes, there're supposed to be. 698 00:31:42,381 --> 00:31:44,130 Because otherwise, how do you know if when 699 00:31:44,130 --> 00:31:46,870 you get 1, 2, and 3 should there have been a 4? 700 00:31:46,870 --> 00:31:50,950 >> You can infer if you get 1, 2, and 4, wait a minute. 701 00:31:50,950 --> 00:31:52,700 There probably was a number 3. 702 00:31:52,700 --> 00:31:55,020 And in fact, that's closer to how TCP works. 703 00:31:55,020 --> 00:31:58,240 But for our purposes now, let's just be super precise and say this is 1 of 4, 704 00:31:58,240 --> 00:32:01,690 2 of 4, 3 of 4, 4 of 4 so that we know at the end of the process, 705 00:32:01,690 --> 00:32:05,750 the end of the handshake if you will, if the whole thing is actually complete. 706 00:32:05,750 --> 00:32:09,220 >> Now, it turns out TCP does one other thing. 707 00:32:09,220 --> 00:32:14,520 TCP also allows a computer to provide multiple services. 708 00:32:14,520 --> 00:32:19,050 And by services I mean web, email, chats, voice over IP. 
709 00:32:19,050 --> 00:32:22,500 There's bunches of different things the internet and servers on the internet 710 00:32:22,500 --> 00:32:23,570 can do these days. 711 00:32:23,570 --> 00:32:28,699 >> So for instance, just thinking hypothetically, if I hand this to Arwa, 712 00:32:28,699 --> 00:32:31,240 how do you know what's going to be inside of these envelopes? 713 00:32:31,240 --> 00:32:33,130 Is it going to be a request for a web page? 714 00:32:33,130 --> 00:32:34,090 Is it an email? 715 00:32:34,090 --> 00:32:35,680 Is it an instant message? 716 00:32:35,680 --> 00:32:37,450 >> You don't know based on this information. 717 00:32:37,450 --> 00:32:41,730 All you know is who it's from, who it's to, and what number of envelope 718 00:32:41,730 --> 00:32:42,230 this is. 719 00:32:42,230 --> 00:32:43,965 So we need one more piece of information. 720 00:32:43,965 --> 00:32:45,840 And we're talking about the web in this case, 721 00:32:45,840 --> 00:32:47,090 just because it's pictures of cats. 722 00:32:47,090 --> 00:32:48,320 But it could be anything. 723 00:32:48,320 --> 00:32:50,440 >> So I could write web on it. 724 00:32:50,440 --> 00:32:53,950 Or more properly, I could write HTTP, which 725 00:32:53,950 --> 00:32:58,250 is the protocol used by web browsers and servers to communicate. 726 00:32:58,250 --> 00:32:59,560 More on that in a moment. 727 00:32:59,560 --> 00:33:02,480 But I'm going to be even more computer-oriented than that. 728 00:33:02,480 --> 00:33:06,510 >> It turns out that humans, some time ago, decided 729 00:33:06,510 --> 00:33:10,090 to assign unique numbers to popular internet services. 730 00:33:10,090 --> 00:33:15,020 HTTP happens to use the number 80, or as we'll see, 443. 731 00:33:15,020 --> 00:33:17,770 But 80 is fine for now. 732 00:33:17,770 --> 00:33:22,530 >> SMTP, which is a fancy way of saying outbound email. 733 00:33:22,530 --> 00:33:24,910 This is Simple Mail Transfer Protocol. 
734 00:33:24,910 --> 00:33:27,810 Just the set of conventions that governs how computers send email 735 00:33:27,810 --> 00:33:29,200 from one computer to another. 736 00:33:29,200 --> 00:33:33,430 Happens to use the number 25. 737 00:33:33,430 --> 00:33:37,710 >> FTP, with which some of you might be familiar, what does FTP do? 738 00:33:37,710 --> 00:33:39,001 >> AUDIENCE: File transfer. 739 00:33:39,001 --> 00:33:42,000 DAVID J. MALAN: Yeah, File Transfer Protocol should not be used anymore. 740 00:33:42,000 --> 00:33:44,082 If your company still uses it, you're probably 741 00:33:44,082 --> 00:33:46,040 using it without encryption, which means you've 742 00:33:46,040 --> 00:33:49,140 been sending your username and password across the internet all of this time. 743 00:33:49,140 --> 00:33:50,223 Probably shouldn't use it. 744 00:33:50,223 --> 00:33:51,890 Because secure versions exist. 745 00:33:51,890 --> 00:33:53,820 It uses port 21. 746 00:33:53,820 --> 00:33:56,762 And there's bunches of other examples like this. 747 00:33:56,762 --> 00:33:58,470 So in other words, humans, some time ago, 748 00:33:58,470 --> 00:34:01,820 decided that, hey, let's just assign numbers to all these services 749 00:34:01,820 --> 00:34:03,280 to keep everything nice and tidy. 750 00:34:03,280 --> 00:34:05,571 But what that really means, even though this envelope's 751 00:34:05,571 --> 00:34:09,530 starting to look a little arcane, I can now put on the end of it, 752 00:34:09,530 --> 00:34:11,989 for instance, colon 80. 753 00:34:11,989 --> 00:34:13,780 And I'm just going to use a colon here just 754 00:34:13,780 --> 00:34:16,969 because that's computer convention. 755 00:34:16,969 --> 00:34:21,440 I'm going to add a colon 80 to the end of the address 756 00:34:21,440 --> 00:34:27,260 just to arcanely capture the fact that this is destined for 5.6.7.8 port 80. 
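The port convention can be sketched in a few lines of Python. This is an illustrative sketch only: the table lists just the port numbers named in the lecture (443 is the one used for HTTP over an encrypted connection), and the function names `split_destination` and `service_for` are made up for the example.

```python
# Port numbers mentioned in the lecture, mapped to the service they identify.
WELL_KNOWN_PORTS = {
    21: "FTP",     # file transfer
    25: "SMTP",    # outbound email
    80: "HTTP",    # the web
    443: "HTTPS",  # the web, encrypted
}


def split_destination(text):
    """Split an 'address:port' string such as '5.6.7.8:80' into its parts."""
    host, _, port = text.rpartition(":")
    return host, int(port)


def service_for(text):
    """Name the service that an 'address:port' destination is meant for."""
    _, port = split_destination(text)
    return WELL_KNOWN_PORTS.get(port, "unknown")


host, port = split_destination("5.6.7.8:80")
print(host, port, service_for("5.6.7.8:80"))
```

The address says which computer should receive the envelope; the port says which program on that computer should open it.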
757 00:34:27,260 --> 00:34:31,610 So now, when I hand it to Arwa, assuming she is running an email server, a web 758 00:34:31,610 --> 00:34:33,864 server, an instant message server, she now 759 00:34:33,864 --> 00:34:37,301 knows that upon seeing the number 80, oh, this should go into this bucket. 760 00:34:37,301 --> 00:34:38,800 Or this should go into this mailbox. 761 00:34:38,800 --> 00:34:41,380 Or this should be handed off to this service that's 762 00:34:41,380 --> 00:34:43,659 running on her particular server. 763 00:34:43,659 --> 00:34:45,650 >> So now, the last piece of it, this is the cat. 764 00:34:45,650 --> 00:34:47,250 And why do I have four envelopes? 765 00:34:47,250 --> 00:34:51,810 Well, one of the features offered by IP, in addition to addressing, 766 00:34:51,810 --> 00:34:54,179 is also the ability to fragment requests. 767 00:34:54,179 --> 00:34:55,830 >> This is a pretty big cat. 768 00:34:55,830 --> 00:35:02,910 And in fact, for efficiency and to maximize throughput, so to speak, 769 00:35:02,910 --> 00:35:07,110 what fragmentation is good for is taking big files like this 770 00:35:07,110 --> 00:35:11,070 and tearing them up into smaller pieces or fragments, 771 00:35:11,070 --> 00:35:14,240 we'll say in this case, the upside of which 772 00:35:14,240 --> 00:35:17,800 is that just because one person is monopolizing 773 00:35:17,800 --> 00:35:20,480 your network by downloading really big video files, 774 00:35:20,480 --> 00:35:24,110 those video files are still going to be chopped up into super small pieces 775 00:35:24,110 --> 00:35:26,950 and transmitted one or more at a time. 776 00:35:26,950 --> 00:35:29,750 So that little old me with my cat, or my email, 777 00:35:29,750 --> 00:35:32,900 or my instant message, or something more important than any of those things 778 00:35:32,900 --> 00:35:37,604 can also have an opportunity to go out from your computer or your home 779 00:35:37,604 --> 00:35:38,770 to the rest of the internet.
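The tearing-up step can be sketched in Python as follows. This is a simplified model, not how real headers look: it labels each piece "n of total" the way the envelopes in the lecture are labeled, whereas actual IP and TCP headers record offsets and byte counts. The function name `fragment` and the sample bytes standing in for the cat picture are assumptions for the example.

```python
def fragment(data, pieces):
    """Split data into at most `pieces` chunks, each labeled (number, total, chunk)."""
    # Ceiling division, so that every byte lands in some chunk.
    size = max(1, -(-len(data) // pieces))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    total = len(chunks)
    return [(number, total, chunk) for number, chunk in enumerate(chunks, start=1)]


# Stand-in for the picture of the cat.
cat = b"picture-of-a-cat"

envelopes = fragment(cat, 4)
for number, total, chunk in envelopes:
    print(f"{number} of {total}: {chunk!r}")
```

Because each piece is small, pieces belonging to different downloads can be interleaved on the same connection, which is the fairness point made above.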
780 00:35:38,770 --> 00:35:40,100 >> And it's up to the software and the routers 781 00:35:40,100 --> 00:35:41,970 to decide how to send these things out. 782 00:35:41,970 --> 00:35:44,370 But eventually, they will all get to their destinations. 783 00:35:44,370 --> 00:35:49,950 As an aside, if you've ever thought about the issue of, or read about, 784 00:35:49,950 --> 00:35:52,162 the issue of net neutrality? 785 00:35:52,162 --> 00:35:55,120 Net neutrality, this was in vogue for quite some time, in this country, 786 00:35:55,120 --> 00:35:58,970 where politically it became a hotbed issue. 787 00:35:58,970 --> 00:36:02,930 Because some companies, for instance, wanted to prioritize certain traffic 788 00:36:02,930 --> 00:36:03,870 over others. 789 00:36:03,870 --> 00:36:06,610 For instance, people were worried that maybe 790 00:36:06,610 --> 00:36:12,160 Microsoft with Skype, or Google with Hangouts, or maybe Netflix with videos 791 00:36:12,160 --> 00:36:15,840 would, maybe, be willing to pay Comcast, or Verizon, 792 00:36:15,840 --> 00:36:19,567 or who knows, even the government more money to prioritize their traffic. 793 00:36:19,567 --> 00:36:21,650 Now, what does that actually mean technologically? 794 00:36:21,650 --> 00:36:25,980 That might mean that an ISP, upon seeing certain IP addresses, 795 00:36:25,980 --> 00:36:28,500 might give those packets, those envelopes, priority. 796 00:36:28,500 --> 00:36:32,960 Upon seeing certain port numbers, might give those packets priority and, then, 797 00:36:32,960 --> 00:36:35,840 slow down my e-mail, or slow down my service. 798 00:36:35,840 --> 00:36:42,780 And it really just boils down to prioritizing or quality of service 799 00:36:42,780 --> 00:36:44,647 for these various different services. 800 00:36:44,647 --> 00:36:46,980 So and that's how it would be done on a technical level. 801 00:36:46,980 --> 00:36:49,021 >> So in any case, we now have these four envelopes. 
802 00:36:49,021 --> 00:36:54,000 I'm going to put one quarter of the cat in this envelope, one 803 00:36:54,000 --> 00:37:02,370 quarter of the cat in this envelope, one quarter in this envelope. 804 00:37:02,370 --> 00:37:10,440 And now, suppose my goal is to send these, let's say, to Jeffery. 805 00:37:10,440 --> 00:37:13,890 Recall that just like the picture up here suggests, 806 00:37:13,890 --> 00:37:16,270 they don't all necessarily have to take the same route. 807 00:37:16,270 --> 00:37:20,467 >> So if I am the cats.com server, I'm responding to Jeffery's request 808 00:37:20,467 --> 00:37:21,050 in this story. 809 00:37:21,050 --> 00:37:22,510 I'm going to pass one off here. 810 00:37:22,510 --> 00:37:24,250 They probably start in the same location. 811 00:37:24,250 --> 00:37:26,980 So Arwa, if you want to decide whom to route this to next, 812 00:37:26,980 --> 00:37:28,690 you can go ahead and send it that way. 813 00:37:28,690 --> 00:37:31,120 And don't send it to the same router every time. 814 00:37:31,120 --> 00:37:31,640 >> [CHUCKLING] 815 00:37:31,640 --> 00:37:33,139 >> So Dan's getting a little congested. 816 00:37:33,139 --> 00:37:36,342 817 00:37:36,342 --> 00:37:37,920 There you go. 818 00:37:37,920 --> 00:37:39,670 All right. 819 00:37:39,670 --> 00:37:41,837 And so those need to make their way around the room. 820 00:37:41,837 --> 00:37:44,378 And again, you as a router generally know Jeffery's that way. 821 00:37:44,378 --> 00:37:45,840 So just keep sending it that way. 822 00:37:45,840 --> 00:37:53,170 823 00:37:53,170 --> 00:37:55,340 And now, suppose Dan didn't quite make it. 824 00:37:55,340 --> 00:37:59,290 And so this packet got dropped along the way, if I can steal that away from you 825 00:37:59,290 --> 00:38:00,193 forcefully, sorry. 826 00:38:00,193 --> 00:38:06,150 827 00:38:06,150 --> 00:38:06,760 >> Very nice. 828 00:38:06,760 --> 00:38:09,119 It's not necessarily the most geographic direct route. 
829 00:38:09,119 --> 00:38:10,410 Still trying to get to Jeffery. 830 00:38:10,410 --> 00:38:11,959 And complete. 831 00:38:11,959 --> 00:38:13,000 Now, this was deliberate. 832 00:38:13,000 --> 00:38:14,875 I didn't mean to hit your hand when I did it. 833 00:38:14,875 --> 00:38:17,720 But packet 4 of 4 did get lost or dropped. 834 00:38:17,720 --> 00:38:20,550 And maybe that happened because there was a hardware error. 835 00:38:20,550 --> 00:38:23,864 Maybe that's because Dan got overloaded or Andrew got overloaded. 836 00:38:23,864 --> 00:38:24,530 But it happened. 837 00:38:24,530 --> 00:38:26,488 So if, Jefferey, you'd like to reassemble that. 838 00:38:26,488 --> 00:38:29,700 What picture do you have in front of you right now? 839 00:38:29,700 --> 00:38:32,144 If you'd like to take the messages out of the envelopes. 840 00:38:32,144 --> 00:38:33,840 >> AUDIENCE: 1, 2, 3. 841 00:38:33,840 --> 00:38:37,570 >> DAVID J. MALAN: OK, go ahead and open them up and take the pieces of cat out. 842 00:38:37,570 --> 00:38:39,390 >> AUDIENCE: [INAUDIBLE]. 843 00:38:39,390 --> 00:38:42,360 >> DAVID J. MALAN: All right, so we have the top left of the cat, 844 00:38:42,360 --> 00:38:47,760 the bottom right, and the bottom left. 845 00:38:47,760 --> 00:38:49,910 So we're missing the top right of the cat. 846 00:38:49,910 --> 00:38:53,770 So TCP, again, is this protocol that kicks in here. 847 00:38:53,770 --> 00:38:59,190 So Jeffery, upon receiving 1, and 2, and 3 of 4, in this scenario, 848 00:38:59,190 --> 00:39:03,370 somehow sends a message back to me, via some route-- 849 00:39:03,370 --> 00:39:05,840 could be any number of different hops here-- that says, 850 00:39:05,840 --> 00:39:06,798 hey, but wait a minute. 851 00:39:06,798 --> 00:39:08,670 Resend 4 out of 4. 852 00:39:08,670 --> 00:39:12,480 >> And so what I have to go and do is-- it's all electronic data. 853 00:39:12,480 --> 00:39:15,740 So I can very easily copy the cat inside of my own RAM or memory. 
854 00:39:15,740 --> 00:39:17,950 I can come up with another envelope, put another copy 855 00:39:17,950 --> 00:39:19,640 of just this fragment for efficiency. 856 00:39:19,640 --> 00:39:21,181 I don't have to resend the whole cat. 857 00:39:21,181 --> 00:39:23,500 I can put it in a new envelope, send it all around. 858 00:39:23,500 --> 00:39:26,290 And some number of milliseconds later, Jeffrey, hopefully, 859 00:39:26,290 --> 00:39:28,640 has the entirety of the packet. 860 00:39:28,640 --> 00:39:30,860 So it took a little time to tell this story. 861 00:39:30,860 --> 00:39:32,610 And that's not unreasonable. 862 00:39:32,610 --> 00:39:35,150 >> Because there is a lot of complexity going on here. 863 00:39:35,150 --> 00:39:36,530 These protocols aren't simple. 864 00:39:36,530 --> 00:39:39,040 But if you want to guarantee delivery in this way, 865 00:39:39,040 --> 00:39:42,540 you need to have those extra measures, that extra metadata, if you will. 866 00:39:42,540 --> 00:39:45,230 >> And just to toss a term out there, data that we care about 867 00:39:45,230 --> 00:39:46,860 is like the cat inside the envelope. 868 00:39:46,860 --> 00:39:50,227 Metadata, which is data that's useful but not what I actually 869 00:39:50,227 --> 00:39:52,310 care about at the end of the day, is all the stuff 870 00:39:52,310 --> 00:39:54,184 that I wrote on the outside of the envelope-- 871 00:39:54,184 --> 00:39:57,850 the address, the destination, the port number, the sequence numbers. 872 00:39:57,850 --> 00:39:58,850 All of that is metadata. 873 00:39:58,850 --> 00:39:59,560 It's useful. 874 00:39:59,560 --> 00:40:02,591 But it's not what I ultimately want out of that whole transaction. 875 00:40:02,591 --> 00:40:04,840 Now, this seems pretty compelling that no matter what, 876 00:40:04,840 --> 00:40:07,310 Jeffrey will get a copy of that cat, assuming we 877 00:40:07,310 --> 00:40:10,160 have a physical connection to him at the end of the day. 
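The retransmission story above can be sketched as follows. This is a simplified illustration, not real TCP: actual TCP numbers bytes and uses acknowledgments and timers, whereas this sketch assumes each envelope carries hypothetical `seq` and `total` fields, like "4 of 4" written on the outside.

```python
def missing_fragments(received):
    """Return the sequence numbers that have not arrived yet."""
    if not received:
        return []  # no metadata seen yet, so nothing can be requested
    total = received[0]["total"]
    seen = {envelope["seq"] for envelope in received}
    return [n for n in range(1, total + 1) if n not in seen]


def reassemble(received):
    """Join the payloads in sequence order once nothing is missing."""
    if missing_fragments(received):
        raise ValueError("cannot reassemble: fragments still missing")
    ordered = sorted(received, key=lambda envelope: envelope["seq"])
    return b"".join(envelope["data"] for envelope in ordered)


# Envelopes 1, 3, and 2 arrive (out of order); 4 of 4 was dropped.
arrived = [
    {"seq": 1, "total": 4, "data": b"top-left|"},
    {"seq": 3, "total": 4, "data": b"bottom-left|"},
    {"seq": 2, "total": 4, "data": b"top-right|"},
]
to_resend = missing_fragments(arrived)  # the receiver asks for these again

# The sender resends only the missing fragment, not the whole cat.
arrived.append({"seq": 4, "total": 4, "data": b"bottom-right"})
picture = reassemble(arrived)
```

Note that `seq` and `total` are metadata in the sense described above: useful for delivery, but not the data anyone actually wants.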
878 00:40:10,160 --> 00:40:12,680 But are there certain types of applications 879 00:40:12,680 --> 00:40:16,980 where guaranteeing delivery would be a bad design 880 00:40:16,980 --> 00:40:21,424 decision and an undesirable feature? 881 00:40:21,424 --> 00:40:24,270 882 00:40:24,270 --> 00:40:27,280 Do you always want to retransmit like I proposed just now? 883 00:40:27,280 --> 00:40:31,935 884 00:40:31,935 --> 00:40:33,740 >> AUDIENCE: Pay for it, I guess. 885 00:40:33,740 --> 00:40:36,182 >> DAVID J. MALAN: If you pay, what might you mean? 886 00:40:36,182 --> 00:40:38,070 >> AUDIENCE: [INAUDIBLE]. 887 00:40:38,070 --> 00:40:40,270 >> DAVID J. MALAN: Oh, OK, good question. 888 00:40:40,270 --> 00:40:42,620 Might you get double charged if it's like checking out 889 00:40:42,620 --> 00:40:44,700 of Amazon or something? 890 00:40:44,700 --> 00:40:46,090 Short answer, no. 891 00:40:46,090 --> 00:40:50,410 Because these fragments are, so to speak, at a lower level. 892 00:40:50,410 --> 00:40:53,910 And they need to be reassembled before you could actually be charged. 893 00:40:53,910 --> 00:40:56,046 So good thought but not worrisome in this case. 894 00:40:56,046 --> 00:41:01,930 895 00:41:01,930 --> 00:41:03,150 >> Let's reason backwards. 896 00:41:03,150 --> 00:41:06,484 So retransmitting required a little more effort. 897 00:41:06,484 --> 00:41:07,900 That didn't feel like a huge deal. 898 00:41:07,900 --> 00:41:10,370 But it does require a little more time. 899 00:41:10,370 --> 00:41:13,030 >> Because now, Jeffrey has to wait a few more milliseconds 900 00:41:13,030 --> 00:41:16,340 to get that fourth piece of data again. 901 00:41:16,340 --> 00:41:18,256 Minor blip, but it will slow things down. 902 00:41:18,256 --> 00:41:19,880 And maybe the internet's super crowded. 903 00:41:19,880 --> 00:41:22,760 >> And maybe Andrew keeps dropping packets on the floor. 904 00:41:22,760 --> 00:41:25,360 So these delays start to accumulate.
905 00:41:25,360 --> 00:41:29,320 So after a while, this cat doesn't take 74 milliseconds to get there. 906 00:41:29,320 --> 00:41:31,390 It takes 1.5 seconds. 907 00:41:31,390 --> 00:41:35,100 >> And maybe the next picture of a cat takes half a second, two seconds. 908 00:41:35,100 --> 00:41:37,850 In other words, we start bogging things down. 909 00:41:37,850 --> 00:41:42,380 What applications might be annoying to bog down in this way? 910 00:41:42,380 --> 00:41:43,790 >> AUDIENCE: Video streams or voice. 911 00:41:43,790 --> 00:41:47,110 >> DAVID J. MALAN: Yeah, so what if you're watching a baseball game online, 912 00:41:47,110 --> 00:41:51,760 or what if you're Skyping with someone, or FaceTime, 913 00:41:51,760 --> 00:41:56,060 especially in the case of video conferencing, kind of not acceptable, 914 00:41:56,060 --> 00:42:01,260 at some point, to start hearing your human response a second late. 915 00:42:01,260 --> 00:42:05,160 Wouldn't it be better to just leave that packet on the ground, 916 00:42:05,160 --> 00:42:09,230 only show 3/4 of the cat, or in this case, a video conferencing, 917 00:42:09,230 --> 00:42:13,030 show 3/4 of my face with my mouth moving as I'm talking, 918 00:42:13,030 --> 00:42:16,097 and just let the audio, at least, go through, for instance. 919 00:42:16,097 --> 00:42:17,930 So there's this notion of quality of service 920 00:42:17,930 --> 00:42:20,010 here, more generally, where you know what, 921 00:42:20,010 --> 00:42:23,210 for real-time applications-- whether it's streaming a sporting event 922 00:42:23,210 --> 00:42:26,490 or streaming video conferencing-- maybe you don't need all of the bits. 923 00:42:26,490 --> 00:42:29,140 And maybe it's actually better to just bite your tongue 924 00:42:29,140 --> 00:42:33,630 and just keep plowing forward with more and more data, never looking back. 
925 00:42:33,630 --> 00:42:36,620 Because the human will figure it out in his or her own mind 926 00:42:36,620 --> 00:42:37,730 what they actually missed. 927 00:42:37,730 --> 00:42:40,911 >> And it would be more annoying to buffer, buffer. 928 00:42:40,911 --> 00:42:41,410 Right? 929 00:42:41,410 --> 00:42:44,110 There's this thing, with which we're all familiar, 930 00:42:44,110 --> 00:42:51,140 where I just start talking while being, that's just annoying to actually have 931 00:42:51,140 --> 00:42:52,540 that, to wait for me to catch up. 932 00:42:52,540 --> 00:42:55,210 >> Maybe it's better if you just miss a few seconds of what I say. 933 00:42:55,210 --> 00:42:56,587 But then, it comes back strong. 934 00:42:56,587 --> 00:42:57,920 So it's again, it's a trade-off. 935 00:42:57,920 --> 00:43:03,300 And in fact, the protocol that allows you to do that would not be TCP, 936 00:43:03,300 --> 00:43:09,290 but something called UDP, which is simply a different protocol used 937 00:43:09,290 --> 00:43:12,690 sometimes for those contexts. 938 00:43:12,690 --> 00:43:13,440 Yeah, question. 939 00:43:13,440 --> 00:43:21,990 >> AUDIENCE: [INAUDIBLE] certain [INAUDIBLE] protocol slow [INAUDIBLE]? 940 00:43:21,990 --> 00:43:24,949 >> DAVID J. MALAN: To stop slow in what sense? 941 00:43:24,949 --> 00:43:28,200 >> AUDIENCE: I want to send my data as fast as possible. 942 00:43:28,200 --> 00:43:29,200 >> DAVID J. MALAN: OK. 943 00:43:29,200 --> 00:43:32,700 >> AUDIENCE: If somebody doesn't want [INAUDIBLE] 944 00:43:32,700 --> 00:43:36,940 transfer to stop [INAUDIBLE]. 945 00:43:36,940 --> 00:43:41,490 >> DAVID J. MALAN: Oh, you absolutely can interfere with any of this data. 946 00:43:41,490 --> 00:43:44,810 For instance, between all of the hops, between point A and B, 947 00:43:44,810 --> 00:43:49,140 all of these hops here can decide just to blacklist all UDP data. 948 00:43:49,140 --> 00:43:50,210 They could just stop. 
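The TCP-versus-UDP trade-off described above can be simulated without touching a real network. The sketch below is an assumption-laden toy: `play_stream`, the frame names, and the 74-millisecond resend cost (borrowed from the round-trip figure mentioned earlier) are illustrative only, and no real sockets are involved.

```python
def play_stream(frames, lost, retransmit, resend_ms=74):
    """Simulate a stream; return (frames shown, total added delay in ms)."""
    shown, delay_ms = [], 0
    for index, frame in enumerate(frames):
        if index in lost:
            if retransmit:
                # TCP-like: stall until the lost frame is sent again.
                delay_ms += resend_ms
                shown.append(frame)
            # UDP-like: skip the lost frame and keep plowing forward.
        else:
            shown.append(frame)
    return shown, delay_ms


frames = [f"frame{n}" for n in range(8)]
lost = {2, 5}  # hypothetical frames dropped along the way

tcp_like = play_stream(frames, lost, retransmit=True)
udp_like = play_stream(frames, lost, retransmit=False)
```

In real Python code the choice shows up as the socket type: `socket.SOCK_STREAM` for TCP and `socket.SOCK_DGRAM` for UDP.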
949 00:43:50,210 --> 00:43:52,924 They could copy it knowing that this is video data that they 950 00:43:52,924 --> 00:43:53,840 might want to look at. 951 00:43:53,840 --> 00:43:58,770 So in short, anyone with access to the wireless or wired connectivity 952 00:43:58,770 --> 00:44:01,660 between two points could absolutely stop it if they want. 953 00:44:01,660 --> 00:44:03,570 >> And in fact, even in our home routers, which 954 00:44:03,570 --> 00:44:05,540 is the story we'll come back to now, might 955 00:44:05,540 --> 00:44:08,890 have settings where you can enable or disable certain services whether it's 956 00:44:08,890 --> 00:44:11,190 for parental reasons, or just not wanting 957 00:44:11,190 --> 00:44:16,890 your kids to watch online videos, or for corporate reasons as well. 958 00:44:16,890 --> 00:44:18,970 So in fact, let's rein things back in. 959 00:44:18,970 --> 00:44:21,580 >> Because we've allowed ourselves to look, now, 960 00:44:21,580 --> 00:44:24,230 at all of the servers inside of the internet here. 961 00:44:24,230 --> 00:44:27,720 But if, at the end of the day, I'm just trying to get to Amazon, 962 00:44:27,720 --> 00:44:31,060 what is that little home router actually doing for me? 963 00:44:31,060 --> 00:44:36,310 Well, it turns out that the home router, that we described earlier, that's 964 00:44:36,310 --> 00:44:42,720 all drawn disproportionately large here, has a whole bunch of services built in. 965 00:44:42,720 --> 00:44:46,650 >> It has, typically, a DHCP server built in. 966 00:44:46,650 --> 00:44:49,400 It often has an access point built in. 967 00:44:49,400 --> 00:44:52,560 And that's often because it has these antennas, like these things here. 968 00:44:52,560 --> 00:44:55,590 It often has a firewall built in. 969 00:44:55,590 --> 00:45:00,900 >> It often has a router, which is its own distinct piece of functionality, 970 00:45:00,900 --> 00:45:02,270 built in.
971 00:45:02,270 --> 00:45:06,530 It might have something called a DNS server built in, 972 00:45:06,530 --> 00:45:07,931 if not even other functions. 973 00:45:07,931 --> 00:45:10,430 So let's tease apart just the couple of remaining ones here. 974 00:45:10,430 --> 00:45:15,030 DHCP, just to recap, does what? 975 00:45:15,030 --> 00:45:16,150 >> AUDIENCE: Assigns the IP. 976 00:45:16,150 --> 00:45:16,530 >> DAVID J. MALAN: Exactly. 977 00:45:16,530 --> 00:45:18,196 Assigns an IP address and a few other things. 978 00:45:18,196 --> 00:45:21,940 It will also tell my Mac or PC what my default router is 979 00:45:21,940 --> 00:45:24,560 and a few other details, like we saw on my Mac screen. 980 00:45:24,560 --> 00:45:27,694 Access point just means, these days, that it supports Wi-Fi. 981 00:45:27,694 --> 00:45:29,860 And it wirelessly will allow people to connect, just 982 00:45:29,860 --> 00:45:32,260 like a physical cable from yesteryear. 983 00:45:32,260 --> 00:45:36,380 >> Firewall between two buildings or two stores in a building, 984 00:45:36,380 --> 00:45:39,990 it's a physical device that, ideally, prevents fire 985 00:45:39,990 --> 00:45:42,440 from spreading from one store to another. 986 00:45:42,440 --> 00:45:47,480 In the virtual world, it prevents data from getting from one place to another. 987 00:45:47,480 --> 00:45:49,740 So in fact, if your home network, or even 988 00:45:49,740 --> 00:45:52,800 your corporate or university network, have somehow 989 00:45:52,800 --> 00:45:59,050 blacklisted, let's say, all access to Facebook.com, 990 00:45:59,050 --> 00:46:03,450 deeming it a waste of time, how might your university, or home, 991 00:46:03,450 --> 00:46:07,380 or company do that in the context of envelopes like these?
992 00:46:07,380 --> 00:46:12,190 >> In other words, if all of my computers here-- my laptop and any other-- 993 00:46:12,190 --> 00:46:14,900 is somehow talking to the internet through this home 994 00:46:14,900 --> 00:46:20,460 router, or this corporate router, or this university router, 995 00:46:20,460 --> 00:46:25,362 what information would a firewall use in order to stop traffic from flowing? 996 00:46:25,362 --> 00:46:27,350 >> AUDIENCE: [INAUDIBLE]. 997 00:46:27,350 --> 00:46:29,740 >> DAVID J. MALAN: Yeah, so if they know that Facebook's web 998 00:46:29,740 --> 00:46:33,170 server, on the internet, has the IP address 5.6.7.8, 999 00:46:33,170 --> 00:46:37,840 it is trivial for a system administrator to configure a firewall to just deny 1000 00:46:37,840 --> 00:46:40,870 and drop all envelopes destined for that IP address. 1001 00:46:40,870 --> 00:46:44,290 In reality, Facebook has a few different IPs, maybe dozens, maybe hundreds. 1002 00:46:44,290 --> 00:46:47,020 But so long as those are publicly known, an administrator 1003 00:46:47,020 --> 00:46:48,620 can actually blacklist all of those. 1004 00:46:48,620 --> 00:46:52,505 >> Or if that's not possible, just because Facebook, maybe, has too many IPs 1005 00:46:52,505 --> 00:46:55,440 or they change too frequently, well, it turns out, as we'll see, 1006 00:46:55,440 --> 00:46:57,440 any time you make a request for a web page, 1007 00:46:57,440 --> 00:47:00,621 like Facebook.com, instead of there being a cat in the envelope, 1008 00:47:00,621 --> 00:47:01,870 there's going to be a mention. 1009 00:47:01,870 --> 00:47:07,780 Oh, this user wants Facebook.com/MarkZuckerberg.php 1010 00:47:07,780 --> 00:47:09,360 or whatever the file may be. 1011 00:47:09,360 --> 00:47:12,590 >> So you can just look inside the envelope and see, oh, this is for Facebook. 1012 00:47:12,590 --> 00:47:13,650 I'm going to drop it now. 1013 00:47:13,650 --> 00:47:16,610 You can look inside of the envelope as a firewall as well.
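A firewall that decides based on what is written on, or inside, an envelope might be sketched like this in Python. Everything here is hypothetical: the `allow` function, the packet field names, and the blocklists (5.6.7.8 is the made-up Facebook address from above, and port 25 is the conventional email port) are assumptions for illustration, not a real firewall's configuration format.

```python
# Hypothetical blocklists an administrator might maintain.
BLOCKED_IPS = {"5.6.7.8"}
BLOCKED_PORTS = {25}
BLOCKED_HOSTS = {"facebook.com"}


def allow(packet: dict) -> bool:
    """Return True if the envelope may pass, False if it is dropped."""
    if packet["dst_ip"] in BLOCKED_IPS:
        return False  # outside of the envelope: destination address
    if packet["dst_port"] in BLOCKED_PORTS:
        return False  # outside of the envelope: port number
    contents = packet["payload"].lower()
    # Inside of the envelope: does the request mention a blocked site?
    return not any(host in contents for host in BLOCKED_HOSTS)


by_ip = allow({"dst_ip": "5.6.7.8", "dst_port": 80, "payload": "GET /"})
by_port = allow({"dst_ip": "192.0.2.7", "dst_port": 25, "payload": "hello"})
by_contents = allow({
    "dst_ip": "192.0.2.7",
    "dst_port": 80,
    "payload": "GET /MarkZuckerberg.php Host: Facebook.com",
})
harmless = allow({"dst_ip": "192.0.2.7", "dst_port": 80, "payload": "GET /cat.jpg"})
```

The content check is the most fragile of the three, since it only works when the inside of the envelope is readable.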
1014 00:47:16,610 --> 00:47:20,560 >> So a firewall, in short, can look at the IP address. 1015 00:47:20,560 --> 00:47:22,240 It can look at the port number. 1016 00:47:22,240 --> 00:47:26,560 It can look at the inside of the envelope. 1017 00:47:26,560 --> 00:47:29,360 >> And by port number, this is an interesting one too. 1018 00:47:29,360 --> 00:47:33,410 A firewall, therefore, could block, it seems, all web access, if it wants, 1019 00:47:33,410 --> 00:47:37,060 just by blacklisting any envelopes that have the number 80 on them, 1020 00:47:37,060 --> 00:47:43,600 or all email by blacklisting port 25, or blocking FTP, by blocking port 21. 1021 00:47:43,600 --> 00:47:45,250 And the list goes on and on. 1022 00:47:45,250 --> 00:47:49,810 >> As an aside, do any of you use Google's DNS server-- 8.8.8.8? 1023 00:47:49,810 --> 00:47:51,420 Does this sound familiar? 1024 00:47:51,420 --> 00:47:51,950 No? 1025 00:47:51,950 --> 00:47:56,615 >> So turns out you can configure your computer to use custom addresses. 1026 00:47:56,615 --> 00:47:58,490 And we'll come back to this in just a moment. 1027 00:47:58,490 --> 00:48:01,100 And it's very common for corporate networks and hotel networks 1028 00:48:01,100 --> 00:48:03,750 to block that kind of thing, as we'll soon see. 1029 00:48:03,750 --> 00:48:06,460 >> So the last bit of functionality, then, here is a router and DNS. 1030 00:48:06,460 --> 00:48:08,116 A router, again, very simple idea. 1031 00:48:08,116 --> 00:48:09,990 It just routes data left, right, up, and down 1032 00:48:09,990 --> 00:48:12,156 based on the wires and the connectivity that it has, 1033 00:48:12,156 --> 00:48:16,470 whether it's a small network at home or a bigger one on the internet itself. 1034 00:48:16,470 --> 00:48:20,540 So DNS is the last of the big acronyms here. 1035 00:48:20,540 --> 00:48:24,030 >> What does a DNS server do? 1036 00:48:24,030 --> 00:48:27,338 It's very useful functionality often built into a home router. 
1037 00:48:27,338 --> 00:48:31,750 1038 00:48:31,750 --> 00:48:34,350 Well, we haven't quite connected two dots here. 1039 00:48:34,350 --> 00:48:40,300 When I type out Amazon.com or cats.com into my browser, somehow or other 1040 00:48:40,300 --> 00:48:43,810 that ends up on an envelope, maybe, with Amazon or cats.com 1041 00:48:43,810 --> 00:48:47,560 on the inside of the envelope, as I proposed with Facebook. 1042 00:48:47,560 --> 00:48:51,157 >> But what has to go on the outside, have we been saying? 1043 00:48:51,157 --> 00:48:52,240 AUDIENCE: The IP address-- 1044 00:48:52,240 --> 00:48:53,040 DAVID J. MALAN: The IP address. 1045 00:48:53,040 --> 00:48:54,560 AUDIENCE: [INAUDIBLE] named to the IP address. 1046 00:48:54,560 --> 00:48:55,560 DAVID J. MALAN: Exactly. 1047 00:48:55,560 --> 00:49:00,090 A DNS server, Domain Name System server, its sole purpose in life 1048 00:49:00,090 --> 00:49:04,350 is to translate domain names to IP addresses and vice versa. 1049 00:49:04,350 --> 00:49:08,180 And so it, too, you can think of like a big Excel file with two columns-- 1050 00:49:08,180 --> 00:49:11,520 domain names in one and IP addresses in the other. 1051 00:49:11,520 --> 00:49:13,280 But it's a particularly big file. 1052 00:49:13,280 --> 00:49:17,490 >> And it turns out that when I turn on my AirPort Extreme, or my Linksys 1053 00:49:17,490 --> 00:49:20,890 device, or my D-Link device, or whatever you have at home, 1054 00:49:20,890 --> 00:49:24,170 surely, that little device does not know about, in advance, 1055 00:49:24,170 --> 00:49:27,332 all possible IP addresses and all possible domain names in the world. 1056 00:49:27,332 --> 00:49:28,040 Because it can't. 1057 00:49:28,040 --> 00:49:31,290 Because what if someone buys a domain name tomorrow, puts it on the internet? 1058 00:49:31,290 --> 00:49:33,581 >> It'd be nice if your home router could still access it. 1059 00:49:33,581 --> 00:49:34,800 And surely, it can.
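The two-column picture of DNS above maps naturally onto a pair of dictionaries. In this minimal sketch the table contents are placeholders: the 192.0.2.x addresses come from a range reserved for documentation, not the real addresses of these sites, and `resolve` and `reverse_lookup` are hypothetical helper names.

```python
# Column one: domain names. Column two: IP addresses (placeholders).
FORWARD = {
    "cats.com": "192.0.2.10",
    "amazon.com": "192.0.2.20",
}
# The "vice versa" direction: IP address back to domain name.
REVERSE = {ip: name for name, ip in FORWARD.items()}


def resolve(name: str):
    """Translate a domain name to an IP address, or None if unknown."""
    return FORWARD.get(name.lower().rstrip("."))


def reverse_lookup(ip: str):
    """Translate an IP address back to a domain name, or None if unknown."""
    return REVERSE.get(ip)


cats_ip = resolve("Cats.com")  # names are case-insensitive
amazon_name = reverse_lookup("192.0.2.20")
unknown = resolve("bought-tomorrow.example")
```

The `None` result for an unknown name is exactly the gap a small home device would have for a domain registered tomorrow.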
1060 00:49:34,800 --> 00:49:38,210 So it turns out there's a whole hierarchy of DNS servers in the world. 1061 00:49:38,210 --> 00:49:39,800 >> Your home router, typically, has one. 1062 00:49:39,800 --> 00:49:42,540 But it just is a caching DNS server. 1063 00:49:42,540 --> 00:49:47,020 And by cache I mean C-A-C-H-E, where it just stores copies of information 1064 00:49:47,020 --> 00:49:48,020 temporarily. 1065 00:49:48,020 --> 00:49:52,090 But if I have internet service through Comcast, or Verizon, or RCN, 1066 00:49:52,090 --> 00:49:55,210 very popular vendors locally in the US, or any other company, 1067 00:49:55,210 --> 00:49:58,500 or even Harvard University, Harvard, and Comcast, and Verizon, 1068 00:49:58,500 --> 00:50:01,090 and your local ISP all have their own DNS servers. 1069 00:50:01,090 --> 00:50:03,080 >> And they, too, cache information. 1070 00:50:03,080 --> 00:50:06,960 But there's also some special big DNS servers in the world, at least 13, 1071 00:50:06,960 --> 00:50:11,420 so-called root servers that know where all the dot coms are, and know where 1072 00:50:11,420 --> 00:50:13,470 all the dot nets are, and all the dot orgs, 1073 00:50:13,470 --> 00:50:17,000 and all of the dozens and dozens of other top level domains these days. 1074 00:50:17,000 --> 00:50:19,010 And so there's this whole hierarchical system 1075 00:50:19,010 --> 00:50:26,480 to DNS such that if you don't know and your higher up doesn't, hopefully, 1076 00:50:26,480 --> 00:50:28,250 your higher up's higher up knows. 1077 00:50:28,250 --> 00:50:30,449 Because the buck ultimately stops up here. 1078 00:50:30,449 --> 00:50:32,490 And so, as we'll see, when you buy a domain name, 1079 00:50:32,490 --> 00:50:35,980 you're essentially informing one of these top folks. 1080 00:50:35,980 --> 00:50:39,450 And the information trickles down to all other computers on the internet. 1081 00:50:39,450 --> 00:50:40,550 But there's a danger here.
1082 00:50:40,550 --> 00:50:47,600 >> Suppose that Comcast is suddenly taken over by someone who doesn't, Comcast 1083 00:50:47,600 --> 00:50:49,344 wants to put Facebook out of business. 1084 00:50:49,344 --> 00:50:51,260 How does Comcast go about putting Facebook out 1085 00:50:51,260 --> 00:50:54,490 of business for quite a few people? 1086 00:50:54,490 --> 00:50:56,430 What does it configure its DNS server to do? 1087 00:50:56,430 --> 00:51:02,090 1088 00:51:02,090 --> 00:51:02,840 What would you do? 1089 00:51:02,840 --> 00:51:03,840 >> AUDIENCE: Just block it. 1090 00:51:03,840 --> 00:51:04,500 Just block it. 1091 00:51:04,500 --> 00:51:05,916 >> DAVID J. MALAN: Just block, right? 1092 00:51:05,916 --> 00:51:08,840 So if I'm Comcast, and maybe I'm the nontechnical CEO, 1093 00:51:08,840 --> 00:51:12,680 I have just announced a decree, don't let our customers go to Facebook.com. 1094 00:51:12,680 --> 00:51:14,770 Because for whatever business reason, we're 1095 00:51:14,770 --> 00:51:16,810 not playing nicely with them right now. 1096 00:51:16,810 --> 00:51:17,720 >> Well, what do you do? 1097 00:51:17,720 --> 00:51:19,540 It's a pretty trivial implementation. 1098 00:51:19,540 --> 00:51:21,640 You just have to ask some system administrator 1099 00:51:21,640 --> 00:51:25,770 to tweak the DNS server to say, if you receive requests for Facebook.com, 1100 00:51:25,770 --> 00:51:30,746 don't respond with an IP address, or respond with a bogus one-- 1.2.3.4, 1101 00:51:30,746 --> 00:51:31,620 which is meaningless. 1102 00:51:31,620 --> 00:51:33,340 Because it doesn't belong to Facebook. 
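The chain of caching servers, and the tampering just described, can be sketched as a small Python class. This is a deliberately simplified model under stated assumptions: the `Resolver` class and its fields are hypothetical, the top of the chain holds the answers directly (real root servers instead point to other servers), and the addresses are the made-up ones used in the discussion.

```python
class Resolver:
    """A toy DNS server that caches answers and asks its higher-up."""

    def __init__(self, records=None, upstream=None, overrides=None):
        self.records = dict(records or {})      # answers this server owns
        self.upstream = upstream                # the "higher up", if any
        self.overrides = dict(overrides or {})  # tampered answers, if any
        self.cache = {}                         # temporary copies

    def lookup(self, name):
        if name in self.overrides:
            return self.overrides[name]  # bogus answer wins
        if name in self.cache:
            return self.cache[name]
        answer = self.records.get(name)
        if answer is None and self.upstream is not None:
            answer = self.upstream.lookup(name)  # ask the higher-up
        if answer is not None:
            self.cache[name] = answer
        return answer


top = Resolver(records={"facebook.com": "5.6.7.8", "cats.com": "192.0.2.10"})
honest_isp = Resolver(upstream=top)
tampering_isp = Resolver(upstream=top, overrides={"facebook.com": "1.2.3.4"})

home = Resolver(upstream=honest_isp)
blocked_home = Resolver(upstream=tampering_isp)

real = home.lookup("facebook.com")
bogus = blocked_home.lookup("facebook.com")
untouched = blocked_home.lookup("cats.com")
```

Only the overridden name is affected; every other lookup still flows up the chain and back down as usual.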
1103 00:51:33,340 --> 00:51:35,500 >> And in fact, certain countries have been known 1104 00:51:35,500 --> 00:51:38,162 to do this, where if they've wanted to blacklist 1105 00:51:38,162 --> 00:51:40,620 certain sites-- this sort of great firewall of China, which 1106 00:51:40,620 --> 00:51:42,410 can be implemented in any number of ways-- 1107 00:51:42,410 --> 00:51:45,560 might do exactly this just based on DNS alone. 1108 00:51:45,560 --> 00:51:48,680 So if you tweak your user's DNS server to just respond 1109 00:51:48,680 --> 00:51:54,000 with no or bogus DNS responses, you can very easily block access. 1110 00:51:54,000 --> 00:51:57,730 >> Now, as I alluded to earlier, and this is only 1111 00:51:57,730 --> 00:52:00,630 how a naive network would do this, I can actually 1112 00:52:00,630 --> 00:52:03,730 go in my Mac, click DNS, which notice now is, hopefully, 1113 00:52:03,730 --> 00:52:04,750 another familiar tab. 1114 00:52:04,750 --> 00:52:09,200 Perhaps a bit ago, you only knew what the term Wi-Fi meant. 1115 00:52:09,200 --> 00:52:11,280 Now, hopefully, we know a bit more about TCP/IP. 1116 00:52:11,280 --> 00:52:12,820 Now, we have DNS. 1117 00:52:12,820 --> 00:52:16,400 >> These, it seems, are the DNS servers that Harvard has automatically 1118 00:52:16,400 --> 00:52:17,680 assigned to my computer. 1119 00:52:17,680 --> 00:52:21,130 When I said earlier that DHCP gives me more than just an IP address, 1120 00:52:21,130 --> 00:52:22,640 it gives my router's address. 1121 00:52:22,640 --> 00:52:26,370 Also gives me one or more DNS servers that I'm supposed to use when 1122 00:52:26,370 --> 00:52:27,840 here on Harvard's network. 1123 00:52:27,840 --> 00:52:31,086 >> I can actually override this by clicking, oh, I can't. 1124 00:52:31,086 --> 00:52:32,460 Because I'm on the guest account. 1125 00:52:32,460 --> 00:52:36,730 OK, so if I could actually physically click this plus sign, 1126 00:52:36,730 --> 00:52:39,310 I could type in any DNS server I want.
1127 00:52:39,310 --> 00:52:45,060 >> A popular one to use is 8.8.8.8, which Google bought some time ago. 1128 00:52:45,060 --> 00:52:50,220 And if my Mac let me, I could then tell my own Mac here, 1129 00:52:50,220 --> 00:52:51,900 don't use Harvard's DNS servers. 1130 00:52:51,900 --> 00:52:54,610 Use Google's instead. 1131 00:52:54,610 --> 00:52:58,617 >> So this is a common way of avoiding either one system restrictions, 1132 00:52:58,617 --> 00:52:59,950 like the ones we just described. 1133 00:52:59,950 --> 00:53:03,810 If they're poorly implemented, you can just use a different DNS server. 1134 00:53:03,810 --> 00:53:07,250 Very much in vogue on home ISPs, and perhaps you too, 1135 00:53:07,250 --> 00:53:09,990 if you've ever made a typo when typing out a domain name, 1136 00:53:09,990 --> 00:53:12,370 you should just get an error message from your browser. 1137 00:53:12,370 --> 00:53:13,828 That's what they're designed to do. 1138 00:53:13,828 --> 00:53:16,080 404 or, actually in this case, something different, 1139 00:53:16,080 --> 00:53:18,580 you could get an invalid response page. 1140 00:53:18,580 --> 00:53:22,620 But some of you, do you ever see advertisements if you make a typo 1141 00:53:22,620 --> 00:53:23,890 and mistype a domain name? 1142 00:53:23,890 --> 00:53:27,600 If so, it's possible, and Comcast has been known to do this. 1143 00:53:27,600 --> 00:53:33,470 They, very obnoxiously, will intercept incorrect DNS lookups. 1144 00:53:33,470 --> 00:53:36,380 >> If you type Facebook.com but make a typo, 1145 00:53:36,380 --> 00:53:40,030 they'll return an IP address to you, not Facebook's but one 1146 00:53:40,030 --> 00:53:42,880 of Comcast's advertising servers' IP addresses 1147 00:53:42,880 --> 00:53:45,540 so that you, then, suddenly see ads, and maybe suggested 1148 00:53:45,540 --> 00:53:47,250 misspellings, and the like. 1149 00:53:47,250 --> 00:53:50,420 So some people might use Google to work around that. 
1150 00:53:50,420 --> 00:53:53,645 Sometimes it's very common in hotels, and airports, and the like 1151 00:53:53,645 --> 00:53:55,960 where the DNS servers are just bad. 1152 00:53:55,960 --> 00:53:56,940 Or they're just broken. 1153 00:53:56,940 --> 00:53:58,210 Or they're dysfunctional. 1154 00:53:58,210 --> 00:54:00,710 >> So very often, if I'm not getting internet connectivity 1155 00:54:00,710 --> 00:54:03,270 but my icon suggests I should be on the network, 1156 00:54:03,270 --> 00:54:05,706 I'll manually change my DNS server to Google's just 1157 00:54:05,706 --> 00:54:06,830 to see if it starts working. 1158 00:54:06,830 --> 00:54:10,540 And two times out of 10, that seems to solve the problem. 1159 00:54:10,540 --> 00:54:14,320 And the takeaway here is not so much all these silly little work-arounds 1160 00:54:14,320 --> 00:54:15,840 but why they actually work. 1161 00:54:15,840 --> 00:54:19,920 >> You're just telling your computer to talk to some other device instead. 1162 00:54:19,920 --> 00:54:24,100 So this home router, that you might have paid 0 or more dollars for 1163 00:54:24,100 --> 00:54:28,560 to put in your home, is doing all of this functionality and even more 1164 00:54:28,560 --> 00:54:30,300 all just in this tiny little box. 1165 00:54:30,300 --> 00:54:33,740 But when we explode this story to the whole internet, 1166 00:54:33,740 --> 00:54:36,260 it tends to be dedicated servers and computers doing 1167 00:54:36,260 --> 00:54:38,460 each of those individual services. 1168 00:54:38,460 --> 00:54:41,201 But our homes are just little microcosms of the whole story. 1169 00:54:41,201 --> 00:54:44,730 1170 00:54:44,730 --> 00:54:45,950 >> Any questions? 1171 00:54:45,950 --> 00:54:47,871 Yeah. 1172 00:54:47,871 --> 00:54:48,720 Yeah, Dan? 1173 00:54:48,720 --> 00:54:52,330 >> AUDIENCE: Earlier, you talked about the ports, the specific ports, 1174 00:54:52,330 --> 00:54:54,614 but it's specific services.
1175 00:54:54,614 --> 00:54:59,476 So for instance, you said if I don't block a certain service, 1176 00:54:59,476 --> 00:55:02,248 I say don't log that port? 1177 00:55:02,248 --> 00:55:06,620 Is it possible for a service to be completed through the port? 1178 00:55:06,620 --> 00:55:08,410 >> DAVID J. MALAN: Absolutely. 1179 00:55:08,410 --> 00:55:10,939 Yes, in fact, you will often find on a network 1180 00:55:10,939 --> 00:55:13,230 that the only ports that are allowed are, for instance, 1181 00:55:13,230 --> 00:55:15,135 port 80 and 443-- web traffic. 1182 00:55:15,135 --> 00:55:18,420 This is very common in hotels or airports 1183 00:55:18,420 --> 00:55:22,317 where they presumptuously think, eh, 90 plus percent of our users 1184 00:55:22,317 --> 00:55:23,650 only need these services anyway. 1185 00:55:23,650 --> 00:55:24,970 Let's block everything else. 1186 00:55:24,970 --> 00:55:29,590 >> And that leaves people like me out cold, out to dry, hung out to dry. 1187 00:55:29,590 --> 00:55:34,040 Because I can't access certain servers at Harvard, which use different ports. 1188 00:55:34,040 --> 00:55:36,840 I could, preemptively before leaving campus, 1189 00:55:36,840 --> 00:55:40,720 change my special server to use port 80 or 443. 1190 00:55:40,720 --> 00:55:44,560 Even though humanity has decided that should be for web traffic, 1191 00:55:44,560 --> 00:55:45,666 it doesn't have to be. 1192 00:55:45,666 --> 00:55:47,540 I can send my email through that or the like. 1193 00:55:47,540 --> 00:55:50,668 >> AUDIENCE: So that was my second question to it. 1194 00:55:50,668 --> 00:55:52,060 So humanity decided. 1195 00:55:52,060 --> 00:55:55,992 Is there a published list somewhere that say these are best practices before? 1196 00:55:55,992 --> 00:55:56,950 DAVID J. MALAN: Indeed. 1197 00:55:56,950 --> 00:56:04,480 And in fact, if I go here, common TCP port, here we go. 1198 00:56:04,480 --> 00:56:07,230 On Wikipedia itself is the first hit. 
1199 00:56:07,230 --> 00:56:08,790 Here's well-known ports. 1200 00:56:08,790 --> 00:56:13,480 >> So the list, up to essentially 1,024, is very standardized, 1201 00:56:13,480 --> 00:56:14,630 and even some beyond that. 1202 00:56:14,630 --> 00:56:16,750 So there's a lot of services that-- 1203 00:56:16,750 --> 00:56:20,220 >> AUDIENCE: So if you were developing a service, in theory, 1204 00:56:20,220 --> 00:56:24,711 you should go there and decide what port lines for that service? 1205 00:56:24,711 --> 00:56:25,710 DAVID J. MALAN: Correct. 1206 00:56:25,710 --> 00:56:28,330 And if you've come up with some new application, like Napster 1207 00:56:28,330 --> 00:56:31,977 back in the day or like WhatsApp more modernly, you would generally, 1208 00:56:31,977 --> 00:56:34,810 if you're a good designer, you would take a look at a list like this 1209 00:56:34,810 --> 00:56:37,580 and make sure you're choosing a number that is within a range 1210 00:56:37,580 --> 00:56:39,455 that you should be choosing from, essentially 1211 00:56:39,455 --> 00:56:43,445 a big enough number that no one else has chosen. 1212 00:56:43,445 --> 00:56:45,756 >> AUDIENCE: That would be about port designs, correct? 1213 00:56:45,756 --> 00:56:47,130 DAVID J. MALAN: Correct, correct. 1214 00:56:47,130 --> 00:56:47,879 And there's a lot. 1215 00:56:47,879 --> 00:56:50,130 I mean, a port number is generally a 16-bit number, 1216 00:56:50,130 --> 00:56:53,800 which gives you 65,536 possibilities. 1217 00:56:53,800 --> 00:56:56,170 And only a few of those are actually standardized. 1218 00:56:56,170 --> 00:57:00,420 >> And the reality is there's only so many popular services these days. 1219 00:57:00,420 --> 00:57:02,594 So there really isn't that much contention. 1220 00:57:02,594 --> 00:57:03,760 So it's not such a big deal. 
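To make the port arithmetic above concrete, here is a minimal Python sketch. The helper name `port_range` is made up for illustration, and the range boundaries (well-known ports below 1,024, registered ports up to 49,151, dynamic ports above that) are the usual IANA convention, which the lecture only gestures at with "up to essentially 1,024."

```python
# A port number is a 16-bit value, so there are 2**16 possibilities.
PORT_BITS = 16
TOTAL_PORTS = 2 ** PORT_BITS  # 65,536 values: ports 0 through 65,535


def port_range(port: int) -> str:
    """Classify a port by the conventional IANA ranges (hypothetical helper)."""
    if not 0 <= port < TOTAL_PORTS:
        raise ValueError(f"port must fit in {PORT_BITS} bits: {port}")
    if port < 1024:
        return "well-known"   # e.g. 80 and 443 for web traffic
    if port < 49152:
        return "registered"   # where a new service would typically pick a number
    return "dynamic"          # usually left for short-lived client connections


print(TOTAL_PORTS)       # 65536
print(port_range(443))   # well-known
print(port_range(8080))  # registered
```

A designer of a new service would, roughly speaking, look for an unclaimed number in the "registered" range rather than reuse a well-known one.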
1221 00:57:03,760 --> 00:57:08,690 >> But from a clever undergraduate's perspective or dissident 1222 00:57:08,690 --> 00:57:13,430 within a country, you might indeed, if a country, or a corporate entity, 1223 00:57:13,430 --> 00:57:16,630 or university is blocking certain traffic, what's very commonly 1224 00:57:16,630 --> 00:57:20,300 done, by sophisticated enough people, would be to tunnel, so to speak, 1225 00:57:20,300 --> 00:57:22,720 to route all of their traffic with envelopes 1226 00:57:22,720 --> 00:57:26,860 that don't say what they should say, but instead just using 80 for everything. 1227 00:57:26,860 --> 00:57:31,080 Even if it is FaceTime, or Skype, or financial transactions, or whatever, 1228 00:57:31,080 --> 00:57:33,687 you just make it look like it's actually web traffic. 1229 00:57:33,687 --> 00:57:35,770 And better still is another solution that Victoria 1230 00:57:35,770 --> 00:57:38,070 alluded to earlier, which is a VPN. 1231 00:57:38,070 --> 00:57:41,720 >> And quite often is VPN traffic allowed on a network. 1232 00:57:41,720 --> 00:57:45,500 In fact, I found myself commonly in airports, and hotels, and on planes 1233 00:57:45,500 --> 00:57:48,030 where I can't access certain secure servers at Harvard. 1234 00:57:48,030 --> 00:57:52,520 Because they're running on fairly unusual port numbers-- 555 or whatever 1235 00:57:52,520 --> 00:57:53,800 the number might be. 1236 00:57:53,800 --> 00:57:59,090 >> But if I first connect via VPN from that airplane or hotel to Harvard 1237 00:57:59,090 --> 00:58:01,650 University, what a VPN does is what? 1238 00:58:01,650 --> 00:58:04,470 Do you know what it does for you underneath the hood, Victoria? 1239 00:58:04,470 --> 00:58:08,520 >> AUDIENCE: Well, it will presumably change the server [INAUDIBLE]. 1240 00:58:08,520 --> 00:58:09,520 DAVID J. MALAN: It does. 1241 00:58:09,520 --> 00:58:10,020 It does. 
1242 00:58:10,020 --> 00:58:13,062 It makes it look, to someone else, like you're coming from another place. 1243 00:58:13,062 --> 00:58:15,561 It looks like you're coming from your corporate headquarters 1244 00:58:15,561 --> 00:58:16,780 when visiting some sites. 1245 00:58:16,780 --> 00:58:20,830 And what it also does is it tunnels, so to speak, all of your traffic, 1246 00:58:20,830 --> 00:58:24,010 whether it is email, or web, or printing, or the like all 1247 00:58:24,010 --> 00:58:26,580 through this encrypted channel between you 1248 00:58:26,580 --> 00:58:28,890 and your corporate headquarters, typically, 1249 00:58:28,890 --> 00:58:35,230 so that no one-- including the local country, or airline, or cafe-- 1250 00:58:35,230 --> 00:58:37,694 knows what's inside of your encrypted tunnel. 1251 00:58:37,694 --> 00:58:39,110 And so it looks like random noise. 1252 00:58:39,110 --> 00:58:41,318 And so very often, a VPN will work around those kinds 1253 00:58:41,318 --> 00:58:44,700 of port restrictions, too, if the VPN port itself is not 1254 00:58:44,700 --> 00:58:47,450 blocked, which is sometimes the case. 1255 00:58:47,450 --> 00:58:49,740 And Dacosta, you were about to say? 1256 00:58:49,740 --> 00:58:55,765 >> AUDIENCE: What time [INAUDIBLE] jump especially 1257 00:58:55,765 --> 00:59:08,710 using the [INAUDIBLE] can jump group of [INAUDIBLE] Is this cloud different? 1258 00:59:08,710 --> 00:59:12,670 What [INAUDIBLE] to jump? [INAUDIBLE] value [INAUDIBLE] 1259 00:59:12,670 --> 00:59:15,535 1260 00:59:15,535 --> 00:59:17,785 DAVID J. MALAN: And by jump, what do you mean exactly? 1261 00:59:17,785 --> 00:59:19,659 AUDIENCE: That they would block, [INAUDIBLE]. 1262 00:59:19,659 --> 00:59:25,662 1263 00:59:25,662 --> 00:59:28,120 DAVID J. MALAN: Oh, and it's broken within a given country? 1264 00:59:28,120 --> 00:59:29,060 AUDIENCE: Yes, it's blocked. 1265 00:59:29,060 --> 00:59:29,700 DAVID J. MALAN: Oh, blocked.
1266 00:59:29,700 --> 00:59:32,070 So it can be implemented in any number of ways. 1267 00:59:32,070 --> 00:59:37,670 The simplest, again, would be that the country and anyone in it, via DNS, 1268 00:59:37,670 --> 00:59:42,140 they just don't return the IP address to you when you visit Facebook.com. 1269 00:59:42,140 --> 00:59:45,090 Two, they can actually look inside everyone's envelopes 1270 00:59:45,090 --> 00:59:47,640 and see if those requests are headed to Facebook.com. 1271 00:59:47,640 --> 00:59:50,734 In which case, they would similarly block the traffic as well. 1272 00:59:50,734 --> 00:59:52,400 AUDIENCE: You can block the [INAUDIBLE]. 1273 00:59:52,400 --> 00:59:52,870 DAVID J. MALAN: Indeed. 1274 00:59:52,870 --> 00:59:53,500 And it depends. 1275 00:59:53,500 --> 00:59:58,200 I mean, so long as there are relatively few internet connections 1276 00:59:58,200 --> 01:00:01,030 coming into the country-- so dozens, or hundreds, 1277 01:00:01,030 --> 01:00:03,450 not thousands or tens of thousands-- then yes, 1278 01:00:03,450 --> 01:00:06,290 so long as they have control over all wires, wireless, 1279 01:00:06,290 --> 01:00:10,720 or otherwise coming into the country, absolutely, they can block everything. 1280 01:00:10,720 --> 01:00:16,290 >> So and worse yet, and a very possible attack 1281 01:00:16,290 --> 01:00:19,255 is if, for instance, we're all here on Harvard's network. 1282 01:00:19,255 --> 01:00:21,880 And therefore, your computers, by the story we've been telling, 1283 01:00:21,880 --> 01:00:24,139 are all using Harvard's DHCP server. 1284 01:00:24,139 --> 01:00:25,930 Some of you might have, in a tab right now, 1285 01:00:25,930 --> 01:00:31,347 Facebook.com open, or Gmail.com, or some other random website. 1286 01:00:31,347 --> 01:00:33,680 Do you necessarily know you're at the real Facebook.com? 
1287 01:00:33,680 --> 01:00:37,610 >> I mean, maybe you're subjects in a Harvard psychology experiment 1288 01:00:37,610 --> 01:00:40,160 here, where we're feeding you fake Facebook information. 1289 01:00:40,160 --> 01:00:43,470 Or we're telling you you've been poked by someone you haven't been. 1290 01:00:43,470 --> 01:00:47,280 Or we're changing messages to sound angrier than they actually are. 1291 01:00:47,280 --> 01:00:50,310 >> I mean, really when you have control over the network, 1292 01:00:50,310 --> 01:00:53,960 you have control over quite a few aspects of the user's experience. 1293 01:00:53,960 --> 01:00:56,710 Now, thankfully, it's not as frightening as that. 1294 01:00:56,710 --> 01:00:59,880 Because most of you, in your URL bars, of any such tabs, 1295 01:00:59,880 --> 01:01:00,940 probably start with what? 1296 01:01:00,940 --> 01:01:06,340 HTTPS, hopefully. 1297 01:01:06,340 --> 01:01:09,140 Because the S does designate secure. 1298 01:01:09,140 --> 01:01:11,650 >> And in theory, what that means is that you do actually 1299 01:01:11,650 --> 01:01:15,310 have an encrypted connection between you and Facebook, you and Amazon, you 1300 01:01:15,310 --> 01:01:17,760 and Gmail.com, or wherever you are. 1301 01:01:17,760 --> 01:01:19,280 And that's a good thing. 1302 01:01:19,280 --> 01:01:21,410 Because there's this whole system of trust. 1303 01:01:21,410 --> 01:01:24,570 >> And this is actually a good segue to web traffic specifically. 1304 01:01:24,570 --> 01:01:28,540 There's this whole system of trust, in the world, that allows us 1305 01:01:28,540 --> 01:01:32,485 with some reassurance to trust that if I go to Facebook.com, 1306 01:01:32,485 --> 01:01:35,600 and I see a little padlock icon in my browser, 1307 01:01:35,600 --> 01:01:38,850 I am very, very, very likely to be actually connected 1308 01:01:38,850 --> 01:01:40,486 to the real Facebook.com. 1309 01:01:40,486 --> 01:01:42,000 Now, why is that? 
1310 01:01:42,000 --> 01:01:46,297 >> So it turns out that when you put a website on the world wide web, 1311 01:01:46,297 --> 01:01:47,880 you need an IP address, it would seem. 1312 01:01:47,880 --> 01:01:49,270 Your server needs an IP address. 1313 01:01:49,270 --> 01:01:50,950 And you probably need a domain name. 1314 01:01:50,950 --> 01:01:52,250 So what does that involve? 1315 01:01:52,250 --> 01:01:55,770 >> Well, have any of you ever bought a domain name before? 1316 01:01:55,770 --> 01:01:56,270 Yes? 1317 01:01:56,270 --> 01:01:56,580 Yeah? 1318 01:01:56,580 --> 01:01:57,079 OK. 1319 01:01:57,079 --> 01:02:00,100 And what websites have you used or looked at for buying domain names? 1320 01:02:00,100 --> 01:02:02,400 >> Any in particular come to mind? 1321 01:02:02,400 --> 01:02:04,470 OK, GoDaddy is pretty popular. 1322 01:02:04,470 --> 01:02:08,160 And there's others-- Namecheap, Network Solutions, others. 1323 01:02:08,160 --> 01:02:11,240 >> And so if I want to go to something like, 1324 01:02:11,240 --> 01:02:17,096 if I want to buy a domain like ComputerScienceforBusinessLeaders.com-- 1325 01:02:17,096 --> 01:02:19,600 awful name because it's atrocious to type. 1326 01:02:19,600 --> 01:02:21,850 It doesn't even fit on one line, apparently. 1327 01:02:21,850 --> 01:02:24,560 For $11.99, I can buy that domain name. 1328 01:02:24,560 --> 01:02:26,690 >> Now, what does that mean? 1329 01:02:26,690 --> 01:02:30,340 If I click Select and put this into my Shopping Cart, let me first caution. 1330 01:02:30,340 --> 01:02:32,340 GoDaddy is atrocious about trying to upsell you. 1331 01:02:32,340 --> 01:02:34,256 So you will be asked if you want email, if you 1332 01:02:34,256 --> 01:02:36,860 want web hosting, if you want a phone call for all this stuff. 1333 01:02:36,860 --> 01:02:39,130 It's hard to check out at GoDaddy. 
1334 01:02:39,130 --> 01:02:41,860 >> But when you finally get there, you will own that domain name 1335 01:02:41,860 --> 01:02:44,460 for a period of one year, typically, or two, or three years. 1336 01:02:44,460 --> 01:02:45,400 You have to renew these things. 1337 01:02:45,400 --> 01:02:47,170 So it's more like renting a domain name. 1338 01:02:47,170 --> 01:02:49,350 >> But once you own that domain name, you need 1339 01:02:49,350 --> 01:02:51,960 to tell GoDaddy something, typically. 1340 01:02:51,960 --> 01:02:57,580 You need to tell GoDaddy what your web servers, DNS servers shall be. 1341 01:02:57,580 --> 01:03:00,550 How do you know what your servers, DNS servers are going to be? 1342 01:03:00,550 --> 01:03:02,820 >> Well, typically, in another tab, you have 1343 01:03:02,820 --> 01:03:05,387 to buy, or pay, for web hosting if you don't actually 1344 01:03:05,387 --> 01:03:08,470 physically own your own servers, and your own company, or in your own data 1345 01:03:08,470 --> 01:03:09,270 center. 1346 01:03:09,270 --> 01:03:11,190 So you'd go to a web hosting company. 1347 01:03:11,190 --> 01:03:12,190 And it could be GoDaddy. 1348 01:03:12,190 --> 01:03:14,620 They offer the same service as one of their upsells. 1349 01:03:14,620 --> 01:03:16,910 >> But there's hundreds, thousands of web hosting 1350 01:03:16,910 --> 01:03:18,640 companies of varying quality out there. 1351 01:03:18,640 --> 01:03:20,930 And when you pay someone else for web hosting, 1352 01:03:20,930 --> 01:03:24,570 you get a username, and a password, and some amount of space 1353 01:03:24,570 --> 01:03:27,390 in the cloud, so to speak, to which you can upload your files, 1354 01:03:27,390 --> 01:03:30,810 and create your web pages, and put your website online. 1355 01:03:30,810 --> 01:03:33,110 So essentially, you have to tell GoDaddy what 1356 01:03:33,110 --> 01:03:36,990 the DNS servers are that that web hosting company has provided to you. 
1357 01:03:36,990 --> 01:03:39,770 Probably in an e-mail or in a web page, they inform you. 1358 01:03:39,770 --> 01:03:43,600 >> And then GoDaddy's responsibility is to tell the rest of the world 1359 01:03:43,600 --> 01:03:46,630 by way of those root servers and other DNS servers. 1360 01:03:46,630 --> 01:03:48,520 So that, the next day, when someone tries 1361 01:03:48,520 --> 01:03:51,290 to visit ComputerScienceforBusinessLeaders.com, 1362 01:03:51,290 --> 01:03:53,410 their DNS server probably doesn't know the answer. 1363 01:03:53,410 --> 01:03:54,785 Because it's a brand new website. 1364 01:03:54,785 --> 01:03:57,000 So their DNS server asks this one, asks this one. 1365 01:03:57,000 --> 01:03:58,090 This one knows. 1366 01:03:58,090 --> 01:04:02,490 And then, the information propagates back down to the rest of the world. 1367 01:04:02,490 --> 01:04:08,030 So this is how, too, if you don't pay the bill for renewing your domain name, 1368 01:04:08,030 --> 01:04:09,510 all of this can just kind of stop. 1369 01:04:09,510 --> 01:04:13,000 >> Because GoDaddy, for instance, can delete those DNS records 1370 01:04:13,000 --> 01:04:16,540 so that no one in the world knows whom to ask where is your website. 1371 01:04:16,540 --> 01:04:18,130 What is your IP address? 1372 01:04:18,130 --> 01:04:20,530 And so that's how they enforce this kind of control. 1373 01:04:20,530 --> 01:04:25,320 >> But what GoDaddy also sells, I want to see here if we can chat with them here. 1374 01:04:25,320 --> 01:04:28,360 They want our business. 1375 01:04:28,360 --> 01:04:32,720 If we go to All Products, this is overwhelming. 1376 01:04:32,720 --> 01:04:38,750 >> I want to buy SSL. 1377 01:04:38,750 --> 01:04:40,730 Here we go, Web Security. 1378 01:04:40,730 --> 01:04:41,910 So, oh, it's on sale. 1379 01:04:41,910 --> 01:04:42,410 Nice. 1380 01:04:42,410 --> 01:04:43,270 >> OK. 1381 01:04:43,270 --> 01:04:49,690 So here, too, this is kind of overwhelming at first glance for folks.
1382 01:04:49,690 --> 01:04:55,270 So there's different types of SSL certificates as they're called. 1383 01:04:55,270 --> 01:04:59,520 So it's not just enough to have a domain name or have a web hosting account. 1384 01:04:59,520 --> 01:05:02,880 If you want to have encryption, which, frankly, is just a given nowadays-- 1385 01:05:02,880 --> 01:05:06,630 and this is becoming de facto practice-- 1386 01:05:06,630 --> 01:05:09,290 >> you should also buy an SSL certificate. 1387 01:05:09,290 --> 01:05:11,540 Unfortunately, it can be hard to navigate all of this. 1388 01:05:11,540 --> 01:05:14,749 But let's see where this leads to this sort of system of trust. 1389 01:05:14,749 --> 01:05:17,040 So if I just have one domain name, 1390 01:05:17,040 --> 01:05:23,860 www.ComputerScienceforBusinessLeaders.com, I'm going to go ahead and just buy the $62.99 1391 01:05:23,860 --> 01:05:24,690 version here. 1392 01:05:24,690 --> 01:05:26,110 However, even this is expensive. 1393 01:05:26,110 --> 01:05:29,830 You can go on other websites, like Namecheap.com and a few others, 1394 01:05:29,830 --> 01:05:31,500 with varying degrees of reputation. 1395 01:05:31,500 --> 01:05:33,170 But you can spend even less than this. 1396 01:05:33,170 --> 01:05:34,070 Beware. 1397 01:05:34,070 --> 01:05:40,240 >> And in fact, let's go somewhere we shouldn't-- Verisign.com. 1398 01:05:40,240 --> 01:05:47,130 This is a global leader in domain names and internet security apparently. 1399 01:05:47,130 --> 01:05:50,610 And you know it's expensive when they don't just say what they sell. 1400 01:05:50,610 --> 01:05:54,410 1401 01:05:54,410 --> 01:06:01,950 Verisign SSL certificate, you can see how many competitors they have, 1402 01:06:01,950 --> 01:06:04,350 who are advertising for that same query. 1403 01:06:04,350 --> 01:06:07,600 >> All right, so via Google, I found this page I wanted. 1404 01:06:07,600 --> 01:06:09,140 So let's see. 1405 01:06:09,140 --> 01:06:10,660 Oh, here we go.
1406 01:06:10,660 --> 01:06:14,520 >> So it looks like if I want a Secure Site, 1407 01:06:14,520 --> 01:06:18,640 their SSL certificates start at $399. 1408 01:06:18,640 --> 01:06:23,240 If I want more security, with EV, which I think is extended validation 1409 01:06:23,240 --> 01:06:27,190 or enhanced validation, that's $995.00. 1410 01:06:27,190 --> 01:06:29,960 Or Secure Site Pro with EV, $1,500. 1411 01:06:29,960 --> 01:06:33,290 Almost all of this is atrocious and, also, unnecessary. 1412 01:06:33,290 --> 01:06:36,320 >> But let's understand what the tradeoffs here are and how it all works. 1413 01:06:36,320 --> 01:06:40,080 At the end of the day, the math and the fundamental cryptography 1414 01:06:40,080 --> 01:06:43,565 underlying your website security is all the same, for the most part. 1415 01:06:43,565 --> 01:06:47,470 All of this is upsells and, largely, marketing things. 1416 01:06:47,470 --> 01:06:51,620 >> Oh, and please, don't ever put something like this on your website, 1417 01:06:51,620 --> 01:06:53,750 even if the consultant proposes that you do. 1418 01:06:53,750 --> 01:06:55,180 It means absolutely nothing. 1419 01:06:55,180 --> 01:06:58,400 You'll see, later today or tomorrow, it is absolutely trivial 1420 01:06:58,400 --> 01:07:02,390 to add an image to a website and simply saying you are Norton secured 1421 01:07:02,390 --> 01:07:03,570 means absolutely nothing. 1422 01:07:03,570 --> 01:07:05,960 >> And all you're doing is training your customers, 1423 01:07:05,960 --> 01:07:08,610 or humanity more generally, to look for that symbol, which 1424 01:07:08,610 --> 01:07:12,080 surely a bad guy could put on his or her own website and just claim they, 1425 01:07:12,080 --> 01:07:13,320 too, are Norton secured. 1426 01:07:13,320 --> 01:07:17,360 So we've gotten into some bad habits, as humans, as embodied even right here.
1427 01:07:17,360 --> 01:07:23,140 So just as an aside, the reason there are different styles of certificates, 1428 01:07:23,140 --> 01:07:25,520 they keep wanting to talk to us. 1429 01:07:25,520 --> 01:07:30,110 >> You can buy an SSL certificate for just one domain name, 1430 01:07:30,110 --> 01:07:32,586 dub dub dub dot ComputerScienceforBusinessLeaders.com. 1431 01:07:32,586 --> 01:07:35,027 Multiple websites, suppose I had dub dub dub dot 1432 01:07:35,027 --> 01:07:36,610 ComputerScienceforBusinessLeaders.com. 1433 01:07:36,610 --> 01:07:39,750 But I also wanted users to be able to visit 1434 01:07:39,750 --> 01:07:42,394 ComputerScienceforBusinessLeaders.com without the www. 1435 01:07:42,394 --> 01:07:44,852 Or, maybe, I have a third domain, like 1436 01:07:44,852 --> 01:07:45,851 email.ComputerScienceforBusinessLeaders.com. 1437 01:07:45,851 --> 01:07:48,170 1438 01:07:48,170 --> 01:07:50,550 So if I have multiple domain names, they actually each 1439 01:07:50,550 --> 01:07:52,633 need a different type of certificate, potentially. 1440 01:07:52,633 --> 01:07:55,830 So I might as well get this version, which allows exactly that. 1441 01:07:55,830 --> 01:08:00,180 >> Or all subdomains, if you just want to have, and this is for fancier setups, 1442 01:08:00,180 --> 01:08:05,070 if you want to have 10 or 20 different websites or servers that 1443 01:08:05,070 --> 01:08:08,550 start with something, dot ComputerScienceforBusinessLeaders.com, 1444 01:08:08,550 --> 01:08:10,890 then you get what's called a wildcard certificate. 1445 01:08:10,890 --> 01:08:13,800 And it supports all of those variations. 1446 01:08:13,800 --> 01:08:16,670 >> Now, once you buy this, you install. 1447 01:08:16,670 --> 01:08:18,040 It's a file that you download.
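The "something dot" idea behind a wildcard certificate can be sketched in a few lines of Python. This is a simplified illustration only, not how any particular browser implements it: the function name `covers` is made up, and the assumption that a leading `*` stands in for exactly one label reflects common practice rather than anything stated in the lecture.

```python
def covers(cert_name: str, hostname: str) -> bool:
    """Simplified check: does a certificate name cover a hostname?

    Assumption for this sketch: a leading "*" matches exactly one label,
    so "*.example.com" covers "www.example.com" but not "example.com".
    """
    cert_labels = cert_name.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # Wildcard: everything after the first label must match exactly.
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


print(covers("*.example.com", "www.example.com"))    # True
print(covers("*.example.com", "email.example.com"))  # True
print(covers("*.example.com", "example.com"))        # False
```

Under this assumption the bare domain without the www is not covered by the wildcard alone, which is one reason a certificate may need to list more than one name.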
1448 01:08:18,040 --> 01:08:19,748 And that file, essentially, just contains 1449 01:08:19,748 --> 01:08:22,716 a really big, random number that has some mathematical relationship 1450 01:08:22,716 --> 01:08:24,840 to some other number that you've already generated. 1451 01:08:24,840 --> 01:08:28,490 We'll call it a public key and a private key, as I did just before. 1452 01:08:28,490 --> 01:08:31,790 >> And the idea here is that you install into your web server 1453 01:08:31,790 --> 01:08:34,250 by just using FTP or some other protocol, 1454 01:08:34,250 --> 01:08:36,370 dragging and dropping or copying and pasting 1455 01:08:36,370 --> 01:08:38,497 these really big numbers into your own web server. 1456 01:08:38,497 --> 01:08:41,330 And you follow the instructions consistent with your server software 1457 01:08:41,330 --> 01:08:42,359 to do this. 1458 01:08:42,359 --> 01:08:45,270 And your web server, henceforth, any time someone 1459 01:08:45,270 --> 01:08:49,920 visits your business' website-- www.ComputerScienceforBusinessLeaders.com-- 1460 01:08:49,920 --> 01:08:51,901 your web server automatically, because this 1461 01:08:51,901 --> 01:08:53,859 is built-in functionality these days, will just 1462 01:08:53,859 --> 01:08:56,459 tell the world what its public key is. 1463 01:08:56,459 --> 01:08:59,250 And remember that the public key has this mathematical relationship 1464 01:08:59,250 --> 01:09:01,000 with a so-called private key. 1465 01:09:01,000 --> 01:09:05,109 And so when users, customers talk securely to your server, 1466 01:09:05,109 --> 01:09:07,680 their envelopes, like the ones we've been passing around, 1467 01:09:07,680 --> 01:09:10,950 have seeming nonsense inside of them. 1468 01:09:10,950 --> 01:09:12,970 Because the contents are encrypted.
1469 01:09:12,970 --> 01:09:15,710 >> And only your business' private key, which 1470 01:09:15,710 --> 01:09:19,340 you generated as part of this process of buying an SSL certificate, 1471 01:09:19,340 --> 01:09:21,790 can actually decrypt. 1472 01:09:21,790 --> 01:09:23,819 And all of that happens transparently. 1473 01:09:23,819 --> 01:09:26,950 But you can only buy these certificates from a finite number 1474 01:09:26,950 --> 01:09:28,760 of companies in the world. 1475 01:09:28,760 --> 01:09:33,330 >> Because Microsoft, who makes IE and Edge, and Google, who makes Chrome, 1476 01:09:33,330 --> 01:09:36,470 and Mozilla, who makes Firefox, and a few other players 1477 01:09:36,470 --> 01:09:40,020 have all decided to ship their browsers. 1478 01:09:40,020 --> 01:09:43,890 When you install any of those browsers-- IE, Edge, Firefox, Mozilla, Opera, 1479 01:09:43,890 --> 01:09:50,180 or any others, Chrome-- they come with a finite number of certificates, 1480 01:09:50,180 --> 01:09:52,010 so to speak, built into them. 1481 01:09:52,010 --> 01:09:57,420 A finite list of, let's call them, companies whose SSL certificates should 1482 01:09:57,420 --> 01:10:00,330 be allowed and considered secure. 1483 01:10:00,330 --> 01:10:04,105 >> So this means that I, David Malan, can't just go on DavidMalan.com 1484 01:10:04,105 --> 01:10:06,050 and start selling SSL certificates. 1485 01:10:06,050 --> 01:10:08,210 Because if I don't have some kind of relationship 1486 01:10:08,210 --> 01:10:12,810 with Google, and Microsoft, and Mozilla, or contractors of theirs, 1487 01:10:12,810 --> 01:10:17,250 no one's browsers will trust David Malan's certificates, 1488 01:10:17,250 --> 01:10:19,830 even if I sell them at a discount versus everyone else. 1489 01:10:19,830 --> 01:10:21,370 I can make them mathematically. 1490 01:10:21,370 --> 01:10:25,430 But I can't trick the browsers into trusting them. 1491 01:10:25,430 --> 01:10:26,940 >> And what do I mean by trust? 
1492 01:10:26,940 --> 01:10:27,660 Well, notice. 1493 01:10:27,660 --> 01:10:29,690 We are on GoDaddy.com. 1494 01:10:29,690 --> 01:10:34,450 And as is the case with many websites, notice the padlock up at top right. 1495 01:10:34,450 --> 01:10:38,420 What does that padlock presumably indicate, either prior 1496 01:10:38,420 --> 01:10:40,830 to today's discussion or as of now? 1497 01:10:40,830 --> 01:10:41,970 >> AUDIENCE: It's secure. 1498 01:10:41,970 --> 01:10:43,344 >> DAVID J. MALAN: That it's secure. 1499 01:10:43,344 --> 01:10:46,390 That just means that I am using some kind of cryptography, 1500 01:10:46,390 --> 01:10:48,190 encryption between me and GoDaddy.com. 1501 01:10:48,190 --> 01:10:49,690 And it doesn't have to be a GoDaddy. 1502 01:10:49,690 --> 01:10:50,690 Let's go somewhere else. 1503 01:10:50,690 --> 01:10:52,182 Let's go to Facebook.com. 1504 01:10:52,182 --> 01:10:55,420 And notice I end up at HTTPS colon slash slash. 1505 01:10:55,420 --> 01:10:59,090 So even if you don't type HTTPS, increasingly, are websites 1506 01:10:59,090 --> 01:11:03,910 today redirecting you to the secure version of the website. 1507 01:11:03,910 --> 01:11:08,612 This was often true when you typed in your passwords for quite some time. 1508 01:11:08,612 --> 01:11:11,320 But then, you would often get the insecure version of the website 1509 01:11:11,320 --> 01:11:14,370 after you logged in or after you checked out with your shopping cart and credit 1510 01:11:14,370 --> 01:11:14,910 card. 1511 01:11:14,910 --> 01:11:19,010 >> Nowadays, increasingly, are websites-- because it's getting easier and cheaper 1512 01:11:19,010 --> 01:11:23,520 to use this kind of encryption, and it's becoming expected-- are just 1513 01:11:23,520 --> 01:11:25,399 using it for absolutely every web page. 1514 01:11:25,399 --> 01:11:26,440 And this is a good thing.
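The redirect to the secure version of a site, as described above, amounts to rewriting the address you typed. Here is a minimal sketch in Python using only the standard library. The function name `secure_version` is hypothetical, and a real web server would send this address back to the browser in a redirect response rather than compute it locally like this.

```python
from urllib.parse import urlsplit, urlunsplit


def secure_version(url: str) -> str:
    """Return the HTTPS form of a URL (hypothetical helper for illustration)."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Swap only the scheme; host, path, and query stay the same.
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(secure_version("http://example.com/login?next=home"))
# https://example.com/login?next=home
```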
1515 01:11:26,440 --> 01:11:28,190 Because this means, for instance, when you 1516 01:11:28,190 --> 01:11:31,710 go to Google, who also has started enabling SSL by default, 1517 01:11:31,710 --> 01:11:33,940 this means when you search for something on Google, 1518 01:11:33,940 --> 01:11:36,310 it's absolutely true that Google knows everything 1519 01:11:36,310 --> 01:11:39,370 you're searching for on the internet, for all time unless you 1520 01:11:39,370 --> 01:11:40,560 delete your history. 1521 01:11:40,560 --> 01:11:43,000 And even then, hopefully, it actually deletes. 1522 01:11:43,000 --> 01:11:46,030 >> But no one in between you and Google, in theory, 1523 01:11:46,030 --> 01:11:47,370 knows what you're searching for. 1524 01:11:47,370 --> 01:11:50,380 So if you're searching for something private, or medical, or whatnot, 1525 01:11:50,380 --> 01:11:53,990 so long as that bar is green, and you see the padlock, and the URL is HTTPS, 1526 01:11:53,990 --> 01:11:56,924 and you're connected to Google, hopefully, your employer 1527 01:11:56,924 --> 01:11:58,090 can't see what you're doing. 1528 01:11:58,090 --> 01:12:00,170 Your university can't see what you're doing. 1529 01:12:00,170 --> 01:12:02,290 >> Now, if someone looks over your shoulder, they might still. 1530 01:12:02,290 --> 01:12:05,165 And if it ends up in your browser's history, people might still know. 1531 01:12:05,165 --> 01:12:09,960 But at least that tunnel between you and Google, in this case, is secure. 1532 01:12:09,960 --> 01:12:11,390 And we can see this a little more. 1533 01:12:11,390 --> 01:12:12,765 And you can do this at home, too. 1534 01:12:12,765 --> 01:12:14,744 If I click on the padlock, on Chrome at least, 1535 01:12:14,744 --> 01:12:16,660 there's a bunch of technical information here. 
1536 01:12:16,660 --> 01:12:20,200 If I click Connection, notice that, "Chrome verified that the DigiCert 1537 01:12:20,200 --> 01:12:24,100 SHA2 High Assurance Server CA," certificate authority, 1538 01:12:24,100 --> 01:12:25,740 "Issued this website's certificate." 1539 01:12:25,740 --> 01:12:28,260 >> Let's click on the Certificate Information. 1540 01:12:28,260 --> 01:12:32,350 And we can see that Facebook, someone at Facebook bought this certificate. 1541 01:12:32,350 --> 01:12:33,330 And notice the star. 1542 01:12:33,330 --> 01:12:35,350 That's the wildcard that I alluded to earlier, 1543 01:12:35,350 --> 01:12:37,570 the something dot Facebook.com. 1544 01:12:37,570 --> 01:12:41,680 Notice that their certificate expires when? 1545 01:12:41,680 --> 01:12:45,512 >> December, so Facebook better pay the SSL bill over the next few months. 1546 01:12:45,512 --> 01:12:48,470 And they're going to have to install new certificates on their servers. 1547 01:12:48,470 --> 01:12:51,901 And if I really want to get curious, I can click on Details. 1548 01:12:51,901 --> 01:12:53,900 And this is going to be more arcane than I want. 1549 01:12:53,900 --> 01:12:55,608 >> But you can see that this is, apparently, 1550 01:12:55,608 --> 01:12:58,900 bought by Facebook, Inc. in Menlo Park. 1551 01:12:58,900 --> 01:13:01,550 This is some technical information, where they bought it from. 1552 01:13:01,550 --> 01:13:05,190 SHA-256 refers to something similar to encryption. 1553 01:13:05,190 --> 01:13:06,090 It's called a hash. 1554 01:13:06,090 --> 01:13:09,200 RSA is the encryption if you've heard of RSA. 1555 01:13:09,200 --> 01:13:12,280 >> And then, there's even more fancy stuff in here. 1556 01:13:12,280 --> 01:13:16,470 Elliptic Curve Public, this refers to a type of cryptography. 1557 01:13:16,470 --> 01:13:19,760 Most of this is way more information than you actually need.
1558 01:13:19,760 --> 01:13:23,300 But you can see that this is the technical detail underlying 1559 01:13:23,300 --> 01:13:24,620 Facebook certificate. 1560 01:13:24,620 --> 01:13:27,900 >> Now, unfortunately, just to speak to social engineering, 1561 01:13:27,900 --> 01:13:32,030 this now is a pretty useful indicator of the fact 1562 01:13:32,030 --> 01:13:35,090 that someone, one, has a secure connection and, in turn, 1563 01:13:35,090 --> 01:13:37,950 that the server you visited paid for that certificate. 1564 01:13:37,950 --> 01:13:42,870 But it wasn't that long ago that websites could have default icons. 1565 01:13:42,870 --> 01:13:45,574 In fact, do you notice these icons in Chrome's tabs right now? 1566 01:13:45,574 --> 01:13:47,490 And browsers have kind of learned their lesson 1567 01:13:47,490 --> 01:13:51,190 and put these icons up there, the logo for a website? 1568 01:13:51,190 --> 01:13:54,230 >> It wasn't that long ago that these fav icons, 1569 01:13:54,230 --> 01:13:57,480 or favorite icons as they're called, were right there next to the address. 1570 01:13:57,480 --> 01:14:00,570 In fact, I did a search during our break. 1571 01:14:00,570 --> 01:14:07,500 For instance, not that long ago, let me open this one. 1572 01:14:07,500 --> 01:14:09,750 Just on Google Images. 1573 01:14:09,750 --> 01:14:11,010 >> Let me zoom out. 1574 01:14:11,010 --> 01:14:12,970 Come on. 1575 01:14:12,970 --> 01:14:18,720 So not that long ago, browsers were doing this. 1576 01:14:18,720 --> 01:14:22,050 Not only did they put the favorite icon up here in the tab, 1577 01:14:22,050 --> 01:14:24,420 they also put it right next to the address bar. 1578 01:14:24,420 --> 01:14:24,920 Why? 1579 01:14:24,920 --> 01:14:26,060 Just, eh, it looked good. 1580 01:14:26,060 --> 01:14:26,893 >> It was kind of nice. 1581 01:14:26,893 --> 01:14:29,530 You see the company's logo right next to its URL. 
1582 01:14:29,530 --> 01:14:32,650 So now, think from the perspective of an adversary, a bad guy. 1583 01:14:32,650 --> 01:14:35,850 If you were a bad guy and the browsers were dumb enough 1584 01:14:35,850 --> 01:14:39,660 to allow you to put a custom icon right next to the browser's URL, 1585 01:14:39,660 --> 01:14:42,220 what icon would you choose for your fake website 1586 01:14:42,220 --> 01:14:46,919 that's trying to phish for people's credit card information and such? 1587 01:14:46,919 --> 01:14:48,210 AUDIENCE: The original website. 1588 01:14:48,210 --> 01:14:49,640 DAVID J. MALAN: The original website, certainly, 1589 01:14:49,640 --> 01:14:51,450 if you're mimicking one website. 1590 01:14:51,450 --> 01:14:55,150 What else might you put there that's even more deceitful? 1591 01:14:55,150 --> 01:15:00,020 A padlock icon, which looks like a padlock and semantically suggests 1592 01:15:00,020 --> 01:15:03,500 this site is secure, but has no technical meaning whatsoever, 1593 01:15:03,500 --> 01:15:06,550 and which is to say you're conditioning people. 1594 01:15:06,550 --> 01:15:09,720 >> We, as a society, are conditioning people when you see padlock, 1595 01:15:09,720 --> 01:15:10,970 assume site is secure. 1596 01:15:10,970 --> 01:15:13,430 And that same logic can be completely reversed 1597 01:15:13,430 --> 01:15:15,615 and manipulated so that people, now, are tricked 1598 01:15:15,615 --> 01:15:16,990 into thinking something's secure. 1599 01:15:16,990 --> 01:15:18,823 And the worst offenders, frankly, are people 1600 01:15:18,823 --> 01:15:22,210 like banks, who idiotically, to this day-- let's see if Bank of America, 1601 01:15:22,210 --> 01:15:25,970 a popular local one or national one, is doing the same. 1602 01:15:25,970 --> 01:15:27,000 >> OK. 1603 01:15:27,000 --> 01:15:27,875 So what is this? 1604 01:15:27,875 --> 01:15:28,750 What do you see here? 1605 01:15:28,750 --> 01:15:32,080 This is the login form for their website.
1606 01:15:32,080 --> 01:15:33,710 They've done the exact same thing. 1607 01:15:33,710 --> 01:15:35,780 You're training humans to think when you see 1608 01:15:35,780 --> 01:15:38,430 a button on a website with a padlock that that 1609 01:15:38,430 --> 01:15:40,460 means the connection is secure. 1610 01:15:40,460 --> 01:15:42,940 >> That means only that there is a graphic designer who 1611 01:15:42,940 --> 01:15:46,260 knows how to make a picture of a padlock and put it on a website. 1612 01:15:46,260 --> 01:15:50,890 Now, in this case, it is true, that the website is secure. 1613 01:15:50,890 --> 01:15:53,000 Because notice the green padlock up here. 1614 01:15:53,000 --> 01:15:55,380 And I'm using a new enough version of Chrome 1615 01:15:55,380 --> 01:15:58,660 that I can't just put an arbitrary logo next to the URL. 1616 01:15:58,660 --> 01:16:01,410 Now, only the secure icon goes there or not. 1617 01:16:01,410 --> 01:16:04,420 >> But this is absolutely meaningless here. 1618 01:16:04,420 --> 01:16:06,890 And we humans continue to make these kinds of mistakes. 1619 01:16:06,890 --> 01:16:09,650 Because we condition people to look for certain cues 1620 01:16:09,650 --> 01:16:11,330 and infer meaning from them. 1621 01:16:11,330 --> 01:16:13,520 But again, that same meaning can be abused. 1622 01:16:13,520 --> 01:16:15,654 >> So when building one's own corporate website, 1623 01:16:15,654 --> 01:16:17,320 these signals are generally a bad thing. 1624 01:16:17,320 --> 01:16:19,430 And even in emails, too, we have, as a society, 1625 01:16:19,430 --> 01:16:22,340 conditioned people to click links on emails. 1626 01:16:22,340 --> 01:16:26,080 And so it's not surprising that bad guys send out fake emails from PayPal, 1627 01:16:26,080 --> 01:16:28,672 from Bank of America with links. 1628 01:16:28,672 --> 01:16:30,880 Because we've trained people to click links in email. 
1629 01:16:30,880 --> 01:16:33,530 >> A far better practice would be for Bank of America, 1630 01:16:33,530 --> 01:16:38,720 when emailing its customers, say only, please visit Bank of America's website 1631 01:16:38,720 --> 01:16:40,070 at your earliest convenience. 1632 01:16:40,070 --> 01:16:41,797 And don't give people the URL. 1633 01:16:41,797 --> 01:16:43,880 Because otherwise, they're just going to click it. 1634 01:16:43,880 --> 01:16:44,580 Let it go. 1635 01:16:44,580 --> 01:16:48,460 Let them search for it or, actually, go to it manually. 1636 01:16:48,460 --> 01:16:50,450 >> All right, so a bit of a sidetrack there. 1637 01:16:50,450 --> 01:16:54,620 But the goal here was to paint the picture of this system of trust. 1638 01:16:54,620 --> 01:16:57,170 With browsers, there are these things in the world 1639 01:16:57,170 --> 01:17:00,450 called certificate authorities-- companies, a finite number of them, 1640 01:17:00,450 --> 01:17:02,710 that are allowed to issue SSL certificates. 1641 01:17:02,710 --> 01:17:08,740 Or, in turn, they are allowed to validate other third-party contractors 1642 01:17:08,740 --> 01:17:10,244 to issue SSL certificates. 1643 01:17:10,244 --> 01:17:12,660 If you're not on that list, though, you can mathematically 1644 01:17:12,660 --> 01:17:16,310 create these big, random numbers that work for cryptography. 1645 01:17:16,310 --> 01:17:18,700 >> But the browser is, generally, going to yell at you. 1646 01:17:18,700 --> 01:17:22,090 In fact, can I go to a website? 1647 01:17:22,090 --> 01:17:22,710 Let me see. 1648 01:17:22,710 --> 01:17:24,940 This site is not secure. 1649 01:17:24,940 --> 01:17:30,070 If we just look for a Google image here, you might see screens like this. 1650 01:17:30,070 --> 01:17:32,180 Browser manufacturers keep changing them. 1651 01:17:32,180 --> 01:17:34,040 >> This is typically what you would see. 1652 01:17:34,040 --> 01:17:38,226 You see a red line in the URL, where HTTPS is crossed out. 
1653 01:17:38,226 --> 01:17:39,600 Because it's trying to be secure. 1654 01:17:39,600 --> 01:17:41,040 But something's going on. 1655 01:17:41,040 --> 01:17:44,090 And here it says, "This is probably not the site you're looking for!" 1656 01:17:44,090 --> 01:17:47,110 >> And this is either malicious, or it's because of a misconfiguration. 1657 01:17:47,110 --> 01:17:50,940 Someone's using the wrong SSL certificate on the server for the site 1658 01:17:50,940 --> 01:17:53,276 that the user is actually trying to visit. 1659 01:17:53,276 --> 01:17:56,520 1660 01:17:56,520 --> 01:17:58,870 Any questions? 1661 01:17:58,870 --> 01:18:03,600 >> Well, let's take, before we break for lunch, one last look at what 1662 01:18:03,600 --> 01:18:05,650 can be inside of these envelopes. 1663 01:18:05,650 --> 01:18:08,434 I'm going to go into a clean browser tab here. 1664 01:18:08,434 --> 01:18:09,350 And this is a feature. 1665 01:18:09,350 --> 01:18:11,399 If you use Chrome, or most any other browser, 1666 01:18:11,399 --> 01:18:12,690 you actually have this feature. 1667 01:18:12,690 --> 01:18:14,120 >> I'm going to go to the Menu. 1668 01:18:14,120 --> 01:18:18,810 I'm going to go to More Tools and Developer Tools. 1669 01:18:18,810 --> 01:18:21,450 Though you sometimes have to enable this special menu. 1670 01:18:21,450 --> 01:18:23,400 And we'll see more of this in a little bit. 1671 01:18:23,400 --> 01:18:25,090 >> And I'm going to go down here to the bottom left. 1672 01:18:25,090 --> 01:18:26,580 And I'm going to click on Network. 1673 01:18:26,580 --> 01:18:28,397 So this is just something an engineer would 1674 01:18:28,397 --> 01:18:31,230 use when he or she wants to look underneath the hood at what's going 1675 01:18:31,230 --> 01:18:34,400 on between a browser and a server. 1676 01:18:34,400 --> 01:18:35,710 >> And let's go ahead and do this. 1677 01:18:35,710 --> 01:18:39,240 I'm going to go to, click Preserve Log. 
1678 01:18:39,240 --> 01:18:41,760 In other words, I want to save everything that's going on, 1679 01:18:41,760 --> 01:18:42,718 what we're about to do. 1680 01:18:42,718 --> 01:18:49,850 And I'm going to type in HTTP colon slash slash www.Stanford.edu 1681 01:18:49,850 --> 01:18:51,050 for Stanford University. 1682 01:18:51,050 --> 01:18:53,500 I'm going to clear again just so we can start fresh. 1683 01:18:53,500 --> 01:18:55,490 >> And here we go. 1684 01:18:55,490 --> 01:18:57,410 So here is Stanford's home page-- whole bunch 1685 01:18:57,410 --> 01:19:00,900 of text, whole bunch of pictures, maybe some videos, and some other stuff. 1686 01:19:00,900 --> 01:19:05,480 And this web page-- here, I'm going to reload now. 1687 01:19:05,480 --> 01:19:07,980 Because I broke it by heading back. 1688 01:19:07,980 --> 01:19:10,787 >> This web page is written in a language called HTML 1689 01:19:10,787 --> 01:19:12,370 that we'll take a brief look at later. 1690 01:19:12,370 --> 01:19:14,459 And HTML is not a programming language. 1691 01:19:14,459 --> 01:19:16,000 It's what's called a markup language. 1692 01:19:16,000 --> 01:19:18,490 So we'll see it's just English-like syntax that 1693 01:19:18,490 --> 01:19:21,615 tells the web page what to look like, what colors to use, what text to use, 1694 01:19:21,615 --> 01:19:22,440 and the like. 1695 01:19:22,440 --> 01:19:26,510 >> But juicier is in this special Developer tab, 1696 01:19:26,510 --> 01:19:29,620 I can actually see everything that just went on underneath the hood. 1697 01:19:29,620 --> 01:19:34,010 For instance, in this web page, about how many images are there? 1698 01:19:34,010 --> 01:19:39,940 I see 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, on the right, 11. 1699 01:19:39,940 --> 01:19:43,230 So there's a dozen or more images on this web page. 1700 01:19:43,230 --> 01:19:47,010 >> Each of those images is a file on Stanford's web server.
1701 01:19:47,010 --> 01:19:49,950 And this home page, written in this language called HTML, 1702 01:19:49,950 --> 01:19:52,960 is also a file on Stanford's web server. 1703 01:19:52,960 --> 01:19:56,540 So it turns out that a browser is smart enough to know, 1704 01:19:56,540 --> 01:20:00,300 and we'll see this afternoon, when you receive the home page for a website, 1705 01:20:00,300 --> 01:20:03,190 look at that HTML language, as we'll soon see. 1706 01:20:03,190 --> 01:20:07,170 >> And if you notice the names of images inside of it, go get those as well. 1707 01:20:07,170 --> 01:20:09,850 Send additional requests, additional envelopes. 1708 01:20:09,850 --> 01:20:14,560 So we might have gotten back, now, one, maybe 13 or more envelopes 1709 01:20:14,560 --> 01:20:17,830 containing text, and images, maybe some other stuff that we, then, 1710 01:20:17,830 --> 01:20:20,940 assemble inside of my browser to present this entire web page. 1711 01:20:20,940 --> 01:20:25,000 >> And notice down here the very first of those 1712 01:20:25,000 --> 01:20:30,810 was a request just for HTTP colon slash slash www.Stanford.edu itself. 1713 01:20:30,810 --> 01:20:35,440 And if I click on this row, I'm going to see some pretty arcane information. 1714 01:20:35,440 --> 01:20:37,960 But let me scroll down and see if I can understand 1715 01:20:37,960 --> 01:20:39,990 exactly what's going on here. 1716 01:20:39,990 --> 01:20:44,920 >> Let me make this a little bigger so we can see more at a time. 1717 01:20:44,920 --> 01:20:47,570 And notice this. 1718 01:20:47,570 --> 01:20:52,040 If I click on View Source, this text here, that I just highlighted, 1719 01:20:52,040 --> 01:20:57,360 when I send, my browser sends that first envelope from here in Cambridge 1720 01:20:57,360 --> 01:21:02,180 to Stanford, saying give me your home page, what is inside this envelope 1721 01:21:02,180 --> 01:21:04,520 is exactly what I've highlighted there. 
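For anyone following along without the screen, the highlighted request text can be sketched roughly as below. This is a minimal Python illustration, not the browser's actual code: the function name build_request is made up for this example, and a real browser sends more header lines than the two shown here.

```python
def build_request(host: str, path: str = "/") -> str:
    """Return the text of a minimal HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, what to get, protocol version
        f"Host: {host}",         # which website is being requested
        "",                      # an empty line marks the end of the headers
        "",
    ]
    # HTTP separates lines with carriage return + line feed
    return "\r\n".join(lines)


request_text = build_request("www.stanford.edu")
print(request_text)
```

The first two printed lines correspond to the two most important lines of the envelope's contents: the request line and the Host line.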
1722 01:21:04,520 --> 01:21:08,520 >> HTTP, Hypertext Transfer Protocol, is the set of conventions 1723 01:21:08,520 --> 01:21:11,660 that a web browser uses when requesting web pages of a server. 1724 01:21:11,660 --> 01:21:14,450 So just as I reached out with my hand to Arwa earlier, 1725 01:21:14,450 --> 01:21:19,590 this is the digital equivalent of my browser reaching out digitally 1726 01:21:19,590 --> 01:21:22,760 to Stanford's web server, putting this message inside this envelope. 1727 01:21:22,760 --> 01:21:25,500 The most important line is the very first. 1728 01:21:25,500 --> 01:21:29,457 >> GET is a standard verb, used in this convention, 1729 01:21:29,457 --> 01:21:31,290 that literally just means get the following. 1730 01:21:31,290 --> 01:21:31,876 Get slash. 1731 01:21:31,876 --> 01:21:34,010 Slash is just the default home page. 1732 01:21:34,010 --> 01:21:35,660 It's nothing more specific than that. 1733 01:21:35,660 --> 01:21:38,820 And use the version of HTTP known as 1.1. 1734 01:21:38,820 --> 01:21:40,970 It's got some newer features than 1.0 had. 1735 01:21:40,970 --> 01:21:44,370 >> And the second most important line is this one-- Host colon 1736 01:21:44,370 --> 01:21:46,050 dub dub dub dot Stanford.edu. 1737 01:21:46,050 --> 01:21:49,590 When I mentioned earlier that a firewall could look inside of an envelope 1738 01:21:49,590 --> 01:21:52,990 and figure out what website is being requested-- maybe it's Facebook. 1739 01:21:52,990 --> 01:21:54,330 And we want to blacklist it. 1740 01:21:54,330 --> 01:21:59,910 >> The reason is the browser is very kindly telling us, inside the envelope, what 1741 01:21:59,910 --> 01:22:01,380 it is requesting. 1742 01:22:01,380 --> 01:22:04,370 And then, there's some less interesting stuff that's more technical. 1743 01:22:04,370 --> 01:22:07,840 But slightly interesting, if not a little unnerving at first, 1744 01:22:07,840 --> 01:22:12,122 is that also inside this envelope is apparently what information? 
1745 01:22:12,122 --> 01:22:13,185 >> AUDIENCE: [INAUDIBLE]. 1746 01:22:13,185 --> 01:22:15,310 DAVID J. MALAN: Yeah, what kind of computer I have. 1747 01:22:15,310 --> 01:22:16,370 So I have a Mac. 1748 01:22:16,370 --> 01:22:19,940 It's running Mac OS 10.11.2, it seems. 1749 01:22:19,940 --> 01:22:22,730 And if I read farther down, it tells the server 1750 01:22:22,730 --> 01:22:25,470 that I'm using a certain version of Chrome, in fact. 1751 01:22:25,470 --> 01:22:26,762 >> So that's mildly disconcerting. 1752 01:22:26,762 --> 01:22:29,470 But slightly more disconcerting should be the fact that I already 1753 01:22:29,470 --> 01:22:30,990 told Stanford what my IP address is. 1754 01:22:30,990 --> 01:22:34,450 So they can already figure out, perhaps, a little bit more about me from that. 1755 01:22:34,450 --> 01:22:36,325 And then, there's some other stuff there too. 1756 01:22:36,325 --> 01:22:38,080 Now, let me scroll up slightly. 1757 01:22:38,080 --> 01:22:40,830 Here is what Stanford responded with. 1758 01:22:40,830 --> 01:22:44,380 Inside of this envelope was, first and foremost, 1759 01:22:44,380 --> 01:22:47,830 the web page itself, the HTML that we'll see later this afternoon. 1760 01:22:47,830 --> 01:22:52,790 But also inside Stanford's envelope to me is everything I've highlighted here. 1761 01:22:52,790 --> 01:22:56,050 >> The juiciest of lines of which is the top, which says, 1762 01:22:56,050 --> 01:22:59,140 OK, yep, I speak HTTP 1.1. 1763 01:22:59,140 --> 01:23:02,290 200 is my status code, OK. 1764 01:23:02,290 --> 01:23:06,630 Now, you might not have ever seen the number 200 before, which makes sense. 1765 01:23:06,630 --> 01:23:09,690 Because 200, indeed, means OK, all is well. 1766 01:23:09,690 --> 01:23:13,920 >> But you probably have seen a number, on your web browser, that 1767 01:23:13,920 --> 01:23:16,710 was sent to you from some server inside of an envelope that's 1768 01:23:16,710 --> 01:23:17,690 not the number 200. 
1769 01:23:17,690 --> 01:23:21,198 What numbers have you seen that spring to mind? 1770 01:23:21,198 --> 01:23:22,152 >> AUDIENCE: 404. 1771 01:23:22,152 --> 01:23:23,220 >> DAVID J. MALAN: 404. 1772 01:23:23,220 --> 01:23:27,740 So if you've ever wondered where is this 404 convention coming from, of all 1773 01:23:27,740 --> 01:23:31,320 the arcane things to tell me, 404 file not found, 1774 01:23:31,320 --> 01:23:34,900 that simply means that a web server, if you request this page that doesn't 1775 01:23:34,900 --> 01:23:38,670 exist, it's not there, file's not found, this message in blue 1776 01:23:38,670 --> 01:23:44,310 is going to say HTTP 1.1 space 404 not found. 1777 01:23:44,310 --> 01:23:47,217 And your browser notices that and, then, presents it 1778 01:23:47,217 --> 01:23:49,550 to you, maybe in a bigger font, bigger, bold information 1779 01:23:49,550 --> 01:23:51,025 with some explanatory text. 1780 01:23:51,025 --> 01:23:51,650 But that's all. 1781 01:23:51,650 --> 01:23:54,358 >> And then, the rest of the information is more arcane information, 1782 01:23:54,358 --> 01:23:58,330 from the server to you, just telling your browser where it came from. 1783 01:23:58,330 --> 01:24:00,530 Every single request you make over the internet 1784 01:24:00,530 --> 01:24:02,740 contains information like this. 1785 01:24:02,740 --> 01:24:05,200 This is both useful for technical reasons. 1786 01:24:05,200 --> 01:24:07,200 >> It's also useful for logging reasons, to know 1787 01:24:07,200 --> 01:24:09,800 who's visiting your website, what browser they're using, 1788 01:24:09,800 --> 01:24:11,770 maybe what browser you should be optimizing 1789 01:24:11,770 --> 01:24:13,820 your website for if everyone's using Chrome these days. 1790 01:24:13,820 --> 01:24:15,910 Maybe you don't need to support Internet Explorer anymore. 1791 01:24:15,910 --> 01:24:16,820 How do you know that?
1792 01:24:16,820 --> 01:24:19,990 You can just log all of the information that's coming in these requests. 1793 01:24:19,990 --> 01:24:22,830 >> Conversely, this clearly means that every time 1794 01:24:22,830 --> 01:24:26,970 you visit any website on the internet, not only do they know your IP address, 1795 01:24:26,970 --> 01:24:30,070 because you gave it to them in the top left corner of the envelope, 1796 01:24:30,070 --> 01:24:33,890 they also know what your browser is, what time of day it is, 1797 01:24:33,890 --> 01:24:35,520 what pages you're requesting. 1798 01:24:35,520 --> 01:24:39,247 >> And increasingly, especially on websites that have advertisements, 1799 01:24:39,247 --> 01:24:41,205 more worrisome here is if you've got a company, 1800 01:24:41,205 --> 01:24:44,440 and this is super common these days, that sells advertisements 1801 01:24:44,440 --> 01:24:47,660 for this website, let's call it A.com, and also 1802 01:24:47,660 --> 01:24:50,100 on this website, B.com, and this website, 1803 01:24:50,100 --> 01:24:56,980 C.com, A and B and C.com might not know that they have a customer in common. 1804 01:24:56,980 --> 01:25:00,560 >> But if this third-party advertising company 1805 01:25:00,560 --> 01:25:05,082 is seeing requests from the same IP address visiting A.com, B.com, 1806 01:25:05,082 --> 01:25:06,640 and C.com, why? 1807 01:25:06,640 --> 01:25:10,490 Because the advertising server's being asked to serve up ads to all three 1808 01:25:10,490 --> 01:25:11,490 of these websites. 1809 01:25:11,490 --> 01:25:14,270 And therefore, it will be provided with your IP address 1810 01:25:14,270 --> 01:25:17,800 so that your web page, your browser sees the ad. 1811 01:25:17,800 --> 01:25:20,330 >> There are these middlemen, so to speak, on the internet that 1812 01:25:20,330 --> 01:25:24,080 know even more about you than the websites you're visiting.
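To make the middleman point concrete, here is a small Python sketch of how an ad server could connect visits across sites. Everything in it is hypothetical: the log entries, the reserved documentation IP addresses, and the function name sites_by_visitor are invented for illustration, and real ad networks rely on more than IP addresses.

```python
from collections import defaultdict

# Hypothetical ad-server log: (visitor IP address, site that embedded the ad)
ad_requests = [
    ("203.0.113.7", "A.com"),
    ("198.51.100.4", "B.com"),
    ("203.0.113.7", "B.com"),
    ("203.0.113.7", "C.com"),
]


def sites_by_visitor(requests):
    """Group the sites seen in the log by the IP address that requested ads."""
    seen = defaultdict(set)
    for ip, site in requests:
        seen[ip].add(site)
    return dict(seen)


profile = sites_by_visitor(ad_requests)
print(sorted(profile["203.0.113.7"]))  # ['A.com', 'B.com', 'C.com']
```

None of A.com, B.com, or C.com holds this combined list on its own; only the party serving ads to all three can assemble it.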
1813 01:25:24,080 --> 01:25:27,150 And Google is certainly among the biggest offenders, or featurerers, 1814 01:25:27,150 --> 01:25:27,901 along those lines. 1815 01:25:27,901 --> 01:25:29,775 And in fact, when I mentioned their DNS server 1816 01:25:29,775 --> 01:25:32,660 before, you might think at first glance, oh, this is a handy feature. 1817 01:25:32,660 --> 01:25:34,661 Google provides the world with a free DNS server 1818 01:25:34,661 --> 01:25:36,285 that sometimes helps me solve problems. 1819 01:25:36,285 --> 01:25:36,790 Mm-mm. 1820 01:25:36,790 --> 01:25:40,430 Now, you're telling Google not only every page you're searching for, 1821 01:25:40,430 --> 01:25:42,880 but every page you're going to directly. 1822 01:25:42,880 --> 01:25:45,846 Because you're saying, hey, Google, I want to go to Z.com. 1823 01:25:45,846 --> 01:25:47,860 What's its IP address? 1824 01:25:47,860 --> 01:25:52,350 >> And this all boils down to these very simple requests and responses 1825 01:25:52,350 --> 01:25:55,630 that we've now seen from top to bottom. 1826 01:25:55,630 --> 01:25:57,510 So why don't we pause here for an hour. 1827 01:25:57,510 --> 01:25:59,116 Return at 1:30 for lunch. 1828 01:25:59,116 --> 01:26:00,490 I'm going to disappear for a bit. 1829 01:26:00,490 --> 01:26:03,710 And we'll resume with a hands-on look and some more concepts. 1830 01:26:03,710 --> 01:26:06,860 And happy to stick around, for a few minutes, with questions individually. 1831 01:26:06,860 --> 01:26:09,364
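As a recap of the responses discussed in this segment, the first line of a response can be pulled apart into its three pieces. The Python below is only a sketch: parse_status_line is a made-up helper, and it assumes a well-formed status line such as the 200 and 404 examples mentioned in the lecture.

```python
def parse_status_line(line: str):
    """Split a line like 'HTTP/1.1 404 Not Found' into version, code, and reason."""
    # Split on the first two spaces only, since the reason may contain spaces
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 200 OK"))         # ('HTTP/1.1', 200, 'OK')
print(parse_status_line("HTTP/1.1 404 Not Found"))  # ('HTTP/1.1', 404, 'Not Found')
```

A browser does something along these lines before deciding whether to render the page or show an error screen.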