1 00:00:00,000 --> 00:00:00,207 2 00:00:00,207 --> 00:00:03,040 DAVID J. MALAN: But why don't we begin now with internet technology. 3 00:00:03,040 --> 00:00:07,420 The goal of the session being to better understand 4 00:00:07,420 --> 00:00:11,730 how this thing works that most of us use every day, how we can leverage it, 5 00:00:11,730 --> 00:00:15,910 and most importantly, try to build it conceptually from the ground up. 6 00:00:15,910 --> 00:00:18,160 So that after today and after tomorrow, you 7 00:00:18,160 --> 00:00:21,617 have not only the practical benefit of being able to diagnose things 8 00:00:21,617 --> 00:00:23,450 in the real world a little more effectively, 9 00:00:23,450 --> 00:00:25,500 you've sort of had the opportunity to think through, 10 00:00:25,500 --> 00:00:28,680 with the proverbial engineering hat, how you could go about building things. 11 00:00:28,680 --> 00:00:30,969 That at first glance might seem so terribly complex, 12 00:00:30,969 --> 00:00:33,260 but at the end of the day are the result of some fairly 13 00:00:33,260 --> 00:00:38,320 logical well-defined decisions, again layering from bottom to top. 14 00:00:38,320 --> 00:00:40,610 So with that said, what is the internet? 15 00:00:40,610 --> 00:00:41,511 Let's start there. 16 00:00:41,511 --> 00:00:46,860 17 00:00:46,860 --> 00:00:53,000 Half of you are on it right now, so you must know what is the internet? 18 00:00:53,000 --> 00:00:53,930 What is the internet? 19 00:00:53,930 --> 00:00:54,555 Yes. 20 00:00:54,555 --> 00:00:55,055 Sean. 21 00:00:55,055 --> 00:01:00,419 AUDIENCE: It's a way to access different websites, I guess, [INAUDIBLE]. 22 00:01:00,419 --> 00:01:01,210 DAVID J. MALAN: OK. 23 00:01:01,210 --> 00:01:03,989 So a way to access different websites, network structure. 24 00:01:03,989 --> 00:01:04,489 OK. 25 00:01:04,489 --> 00:01:06,330 Let's dive in a little more physically. 26 00:01:06,330 --> 00:01:07,880 So that's what it does. 
27 00:01:07,880 --> 00:01:09,593 What is it actually? 28 00:01:09,593 --> 00:01:13,380 29 00:01:13,380 --> 00:01:15,295 What do you got, [? Abi ?]? 30 00:01:15,295 --> 00:01:16,170 AUDIENCE: [INAUDIBLE] 31 00:01:16,170 --> 00:01:17,320 DAVID J. MALAN: OK. 32 00:01:17,320 --> 00:01:17,820 Good. 33 00:01:17,820 --> 00:01:18,400 More precise. 34 00:01:18,400 --> 00:01:19,800 So a network of devices. 35 00:01:19,800 --> 00:01:22,230 And a network is just an interconnection of things. 36 00:01:22,230 --> 00:01:24,730 And those devices, we'll just simplify and say they're computers-- 37 00:01:24,730 --> 00:01:27,559 though very much in vogue these days is IoT or internet of things. 38 00:01:27,559 --> 00:01:29,850 Which means like every stupid little thing in the world 39 00:01:29,850 --> 00:01:32,010 is going to be on the internet for some reason. 40 00:01:32,010 --> 00:01:33,170 So that's trendy right now. 41 00:01:33,170 --> 00:01:35,687 But for today we'll just focus on devices or computers. 42 00:01:35,687 --> 00:01:37,270 And what does it mean to network them? 43 00:01:37,270 --> 00:01:40,640 Well, back in the day it meant physically connecting them with cables, 44 00:01:40,640 --> 00:01:44,017 so that you could actually have a physical connection between devices. 45 00:01:44,017 --> 00:01:45,850 Nowadays, we, of course, have these antennas 46 00:01:45,850 --> 00:01:49,007 on the wall called access points, or Wi-Fi routers, 47 00:01:49,007 --> 00:01:50,590 or any number of other names for them. 48 00:01:50,590 --> 00:01:53,700 But that allows us to all transmit data wirelessly. 49 00:01:53,700 --> 00:01:58,730 So that might give us locally what we call a local area network or LAN. 50 00:01:58,730 --> 00:02:00,870 People don't say that as much anymore but that's 51 00:02:00,870 --> 00:02:03,120 what a local area network is. 
52 00:02:03,120 --> 00:02:07,730 You also have WLAN, which is a wireless local area network, which probably 53 00:02:07,730 --> 00:02:09,850 even fewer people say these days. 54 00:02:09,850 --> 00:02:12,120 But these are indeed local networks. 55 00:02:12,120 --> 00:02:13,240 So what is the internet? 56 00:02:13,240 --> 00:02:15,990 Well, the internet is really just a network of networks. 57 00:02:15,990 --> 00:02:19,680 So we might think of this campus, of course, as being Harvard.edu. 58 00:02:19,680 --> 00:02:21,010 Down the road is MIT.edu. 59 00:02:21,010 --> 00:02:22,990 Across the country is stanford.edu. 60 00:02:22,990 --> 00:02:26,169 Not to mention all of the many companies and universities and homes 61 00:02:26,169 --> 00:02:26,960 that are out there. 62 00:02:26,960 --> 00:02:28,970 And so all of those, if they're interconnected, 63 00:02:28,970 --> 00:02:32,420 indeed give us the internet or inter-network. 64 00:02:32,420 --> 00:02:33,170 All right. 65 00:02:33,170 --> 00:02:35,290 So how does it work? 66 00:02:35,290 --> 00:02:37,590 So I've just walked into this room a little bit ago. 67 00:02:37,590 --> 00:02:39,540 I open my laptop screen. 68 00:02:39,540 --> 00:02:42,020 And somehow I'm magically on the internet. 69 00:02:42,020 --> 00:02:43,870 I'm connected to the internet. 70 00:02:43,870 --> 00:02:45,370 But a few steps had to precede that. 71 00:02:45,370 --> 00:02:48,870 Some of you engaged in those steps while you were here on campus this weekend, 72 00:02:48,870 --> 00:02:51,370 or first thing this morning by following some of the instructions. 73 00:02:51,370 --> 00:02:53,286 What was one of the first things you had to do 74 00:02:53,286 --> 00:02:56,990 according to the instructions in this booklet today? 75 00:02:56,990 --> 00:02:58,240 You had to connect and log on. 
76 00:02:58,240 --> 00:03:02,531 So most of you probably have the intuitive-- the instincts these days, 77 00:03:02,531 --> 00:03:05,030 when you want to connect on Wi-Fi, you go to your Wi-Fi menu 78 00:03:05,030 --> 00:03:07,375 in the top right or bottom right corner of your screen, 79 00:03:07,375 --> 00:03:09,000 whatever operating system you're using. 80 00:03:09,000 --> 00:03:12,850 And you choose either the most familiar or the most unlocked network 81 00:03:12,850 --> 00:03:15,127 that you possibly can and try to get online. 82 00:03:15,127 --> 00:03:16,710 Sometimes you're asked for a password. 83 00:03:16,710 --> 00:03:17,630 Sometimes you aren't. 84 00:03:17,630 --> 00:03:19,840 And that may or may not have some actual implications 85 00:03:19,840 --> 00:03:21,840 that we'll perhaps scratch the surface of today. 86 00:03:21,840 --> 00:03:24,570 Or talk about in more detail tomorrow morning with security. 87 00:03:24,570 --> 00:03:27,090 But what then happened thereafter? 88 00:03:27,090 --> 00:03:30,650 Well, it turns out that every computer on the internet has a unique address. 89 00:03:30,650 --> 00:03:32,340 A unique IP address. 90 00:03:32,340 --> 00:03:35,870 I'll just toss up jargon here as we discuss some of these things. 91 00:03:35,870 --> 00:03:38,230 IP stands for internet protocol. 92 00:03:38,230 --> 00:03:41,820 And it's just a convention for assigning numeric addresses 93 00:03:41,820 --> 00:03:44,020 to most every computer on the internet. 94 00:03:44,020 --> 00:03:47,500 So we right now are at One Brattle Square in Cambridge, Massachusetts, 95 00:03:47,500 --> 00:03:49,440 02138. 96 00:03:49,440 --> 00:03:52,360 But that pretty much uniquely identifies us. 97 00:03:52,360 --> 00:03:56,010 Especially if we tack on the room number or the floor number and so forth. 98 00:03:56,010 --> 00:03:58,910 And so that's how the mail-- the post office 99 00:03:58,910 --> 00:04:02,010 actually gets mail uniquely to this particular destination. 
100 00:04:02,010 --> 00:04:06,300 So my laptop too isn't addressed, of course, by nature of postal addresses, 101 00:04:06,300 --> 00:04:08,180 but by way of numeric addresses. 102 00:04:08,180 --> 00:04:11,610 Just because computers kind of prefer numeric addresses. 103 00:04:11,610 --> 00:04:16,936 Turns out, fun fact, that these are 32-bit addresses mostly. 104 00:04:16,936 --> 00:04:19,769 And that means we can have how many total computers on the internet? 105 00:04:19,769 --> 00:04:22,730 106 00:04:22,730 --> 00:04:24,050 2 to the 32. 107 00:04:24,050 --> 00:04:26,020 Nice way to punt there. 108 00:04:26,020 --> 00:04:27,490 Four billion roughly. 109 00:04:27,490 --> 00:04:29,310 So 2 to the 32 or four billion. 110 00:04:29,310 --> 00:04:31,662 Which actually these days isn't all that many, 111 00:04:31,662 --> 00:04:33,620 if you consider all of the humans in the world, 112 00:04:33,620 --> 00:04:36,370 and all of the laptops and desktops and phones. 113 00:04:36,370 --> 00:04:40,520 And again, because of internet of things, every toaster and thermostats. 114 00:04:40,520 --> 00:04:42,640 And any number of other devices. 115 00:04:42,640 --> 00:04:48,040 So the world has been transitioning away from something called IPv4, which 116 00:04:48,040 --> 00:04:52,630 is version 4, the version we've been using for decades right now, to IPv6. 117 00:04:52,630 --> 00:04:55,120 Which has been around for a while, but no one really 118 00:04:55,120 --> 00:04:58,390 got around to implementing it until really recent years. 119 00:04:58,390 --> 00:05:01,950 At least on any kind of scale, with big companies like Google and Comcast 120 00:05:01,950 --> 00:05:06,400 and the like finally starting to give people not just 32-bit addresses, 121 00:05:06,400 --> 00:05:11,660 but-- anyone want to take a guess what comes after 64-- uh, dammit. 122 00:05:11,660 --> 00:05:15,170 What comes after 32-bit addresses? 123 00:05:15,170 --> 00:05:15,860 No. 
124 00:05:15,860 --> 00:05:17,750 128-bit addresses, actually. 125 00:05:17,750 --> 00:05:20,020 So I don't even know why I said that. 126 00:05:20,020 --> 00:05:21,640 128-bit addresses. 127 00:05:21,640 --> 00:05:22,940 Which is kind of unprecedented. 128 00:05:22,940 --> 00:05:25,560 Rarely do humans actually have the foresight 129 00:05:25,560 --> 00:05:30,200 to not just go one notch above but two notches above where we currently are. 130 00:05:30,200 --> 00:05:35,240 And so 128-bit addresses means that we really can have an internet of things. 131 00:05:35,240 --> 00:05:38,420 If I pull up my massive calculator here. 132 00:05:38,420 --> 00:05:41,100 2 to the 32, of course, gives us four billion computers. 133 00:05:41,100 --> 00:05:44,130 2 to the 128, also unpronounceable by me. 134 00:05:44,130 --> 00:05:45,930 That's a lot of devices. 135 00:05:45,930 --> 00:05:48,440 That's a lot of things on the internet. 136 00:05:48,440 --> 00:05:51,340 So we've got a lot of IP addresses available, 137 00:05:51,340 --> 00:05:52,939 at least in the works right now. 138 00:05:52,939 --> 00:05:54,730 So what does it mean to have an IP address? 139 00:05:54,730 --> 00:05:57,050 Well, once my computer has an IP address-- 140 00:05:57,050 --> 00:05:59,080 and how it gets that we'll come back to-- I 141 00:05:59,080 --> 00:06:00,990 can now communicate on the network. 142 00:06:00,990 --> 00:06:03,320 And indeed the way devices on the network work 143 00:06:03,320 --> 00:06:05,280 is essentially you can think of computers 144 00:06:05,280 --> 00:06:08,040 as sending envelopes of information. 145 00:06:08,040 --> 00:06:10,300 So here's like an old school envelope here. 146 00:06:10,300 --> 00:06:12,740 And on this in the human world we might typically 147 00:06:12,740 --> 00:06:14,855 put a to address and a from address. 148 00:06:14,855 --> 00:06:16,730 And those would be physical postal addresses. 
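The two address-space sizes mentioned a moment ago, 2 to the 32 and 2 to the 128, can be worked out directly. What follows is a small sketch in Python rather than anything shown in the session; the variable names are made up for illustration.

```python
# Address-space sizes for the two IP versions mentioned in the lecture.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS   # "four billion roughly"
ipv6_addresses = 2 ** IPV6_BITS   # the "unpronounceable" one

print(f"IPv4: {ipv4_addresses:,} addresses")
print(f"IPv6: {ipv6_addresses:,} addresses")

# How many IPv4-sized internets would fit inside the IPv6 space.
print(f"Ratio: 2**{IPV6_BITS - IPV4_BITS} = {ipv6_addresses // ipv4_addresses:,}")
```

The first figure comes out just under 4.3 billion, which is where the "four billion roughly" comes from.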
149 00:06:16,730 --> 00:06:18,440 But in the digital world, of course, it's 150 00:06:18,440 --> 00:06:21,570 going to be a numeric address, in both the to field and the from field. 151 00:06:21,570 --> 00:06:24,480 Which is to say that both I and the recipient of any information I 152 00:06:24,480 --> 00:06:27,200 send on the internet has got to have such a numeric address. 153 00:06:27,200 --> 00:06:29,800 So Amazon has such an address, Google has such an address. 154 00:06:29,800 --> 00:06:32,170 And actually those bigger fish in the world 155 00:06:32,170 --> 00:06:35,230 tend to often have multiple IP addresses. 156 00:06:35,230 --> 00:06:36,280 And we can see this. 157 00:06:36,280 --> 00:06:40,720 So first on Mac OS, let me go to System Preferences and Network. 158 00:06:40,720 --> 00:06:43,070 And Windows has an analog as well. 159 00:06:43,070 --> 00:06:46,660 I'm going to click Advanced, and I'm going to click on TCP/IP. 160 00:06:46,660 --> 00:06:50,320 And indeed, you can see that my computer has currently the IP address 161 00:06:50,320 --> 00:06:54,690 10.254.16.128 and a whole bunch of other things. 162 00:06:54,690 --> 00:06:57,530 So that is the unique address for my computer right now. 163 00:06:57,530 --> 00:07:00,080 And I should really say unique address. 164 00:07:00,080 --> 00:07:02,780 Because it turns out that even though much of the world 165 00:07:02,780 --> 00:07:06,720 still uses IPv4 or 32-bit addresses, the world also 166 00:07:06,720 --> 00:07:11,470 has started using a technology for some time now called NAT. 167 00:07:11,470 --> 00:07:13,430 Does anyone use this technology at home? 168 00:07:13,430 --> 00:07:17,040 169 00:07:17,040 --> 00:07:18,360 David? 170 00:07:18,360 --> 00:07:19,060 I do. 171 00:07:19,060 --> 00:07:20,601 This is one of those trick questions. 172 00:07:20,601 --> 00:07:23,030 Like who has internet at home? 173 00:07:23,030 --> 00:07:27,475 OK so with very-- and how many people have Wi-Fi at home? 
174 00:07:27,475 --> 00:07:28,150 Almost everyone? 175 00:07:28,150 --> 00:07:28,650 OK. 176 00:07:28,650 --> 00:07:32,430 So with very, very high probability, all of us in this room use NAT. 177 00:07:32,430 --> 00:07:34,920 NAT is network address translation. 178 00:07:34,920 --> 00:07:39,050 And what this means is that whether your ISP-- internet service 179 00:07:39,050 --> 00:07:45,106 provider-- is Comcast or RCN or Verizon or any number of other companies, 180 00:07:45,106 --> 00:07:46,980 probably when they installed it in your house 181 00:07:46,980 --> 00:07:48,688 they gave you some kind of little device. 182 00:07:48,688 --> 00:07:50,622 A little router as it might be called. 183 00:07:50,622 --> 00:07:52,080 Although they have different names. 184 00:07:52,080 --> 00:07:54,246 Sometimes they're wireless, sometimes they're wired. 185 00:07:54,246 --> 00:07:57,740 But connected to that is probably a telephone line or a coaxial cable, 186 00:07:57,740 --> 00:07:59,400 like your cable box and so forth. 187 00:07:59,400 --> 00:08:02,510 And that is what gives your house internet access. 188 00:08:02,510 --> 00:08:06,430 And that device comes with typically one IP address. 189 00:08:06,430 --> 00:08:09,050 And that IP address is associated with that machine. 190 00:08:09,050 --> 00:08:13,200 The problem, of course, is that most of us have a phone and a laptop or desktop 191 00:08:13,200 --> 00:08:16,740 or roommates or parents or siblings or kids, all of whom 192 00:08:16,740 --> 00:08:18,041 themselves have devices. 193 00:08:18,041 --> 00:08:19,790 And so it would be unfortunate if you only 194 00:08:19,790 --> 00:08:23,830 had one unique address for all of you if only one of you 195 00:08:23,830 --> 00:08:25,950 could therefore be on the internet at a time. 
196 00:08:25,950 --> 00:08:28,870 NAT came about-- network address translation-- 197 00:08:28,870 --> 00:08:31,040 because it's a feature of modern hardware 198 00:08:31,040 --> 00:08:34,020 that allows you to have one IP address for that one 199 00:08:34,020 --> 00:08:37,460 device, the router you bought or leased from Comcast or the like. 200 00:08:37,460 --> 00:08:40,590 But behind that device, your home network-- a.k.a. 201 00:08:40,590 --> 00:08:44,400 LAN-- you can have dozens of devices, 202 00:08:44,400 --> 00:08:47,840 if not hundreds of devices, all having not that public IP 203 00:08:47,840 --> 00:08:50,410 address, but private IP addresses. 204 00:08:50,410 --> 00:08:53,930 And so network address translation essentially works like this. 205 00:08:53,930 --> 00:09:00,090 If this here is your home, and you have a little box from Verizon or Comcast. 206 00:09:00,090 --> 00:09:03,130 And this here is the internet. 207 00:09:03,130 --> 00:09:05,560 This device here is your home router. 208 00:09:05,560 --> 00:09:08,380 209 00:09:08,380 --> 00:09:11,360 And it has a public IP on this side. 210 00:09:11,360 --> 00:09:18,850 And inside the house are let's say many private IP addresses. 211 00:09:18,850 --> 00:09:21,150 One for each of the devices in your home. 212 00:09:21,150 --> 00:09:26,396 And so any time you send a request from your laptop for CNN.com or Google.com 213 00:09:26,396 --> 00:09:30,020 or Facebook.com or the like, that request initially, 214 00:09:30,020 --> 00:09:34,230 by nature of the wires or Wi-Fi in your house, first goes through that device. 215 00:09:34,230 --> 00:09:36,430 Because it's the only point coming in or going out 216 00:09:36,430 --> 00:09:38,190 to your internet service provider. 217 00:09:38,190 --> 00:09:42,080 It looks at your from address, which is going to be a private address. 218 00:09:42,080 --> 00:09:44,690 Which just means it's not meant to be public on the internet. 
219 00:09:44,690 --> 00:09:46,620 It quickly crosses that out. 220 00:09:46,620 --> 00:09:49,580 This device puts its own public IP address 221 00:09:49,580 --> 00:09:53,110 that you get from Comcast or Verizon where the from field 222 00:09:53,110 --> 00:09:54,452 used to be on that envelope. 223 00:09:54,452 --> 00:09:55,660 Sends it out on the internet. 224 00:09:55,660 --> 00:09:57,850 Google or Facebook or whoever respond. 225 00:09:57,850 --> 00:10:00,540 That response comes to that little device in your home. 226 00:10:00,540 --> 00:10:03,310 Your little device in your home checks its records. 227 00:10:03,310 --> 00:10:07,120 Saying hm, who requested this piece of information a split second ago? 228 00:10:07,120 --> 00:10:11,140 It crosses out the to field, which was sent to the public IP. 229 00:10:11,140 --> 00:10:14,070 Puts in the private address of the laptop or desktop or phone 230 00:10:14,070 --> 00:10:16,220 in your home that originally requested it. 231 00:10:16,220 --> 00:10:18,910 And all of this happens nearly instantaneously. 232 00:10:18,910 --> 00:10:20,710 So we don't even notice the difference. 233 00:10:20,710 --> 00:10:25,270 But as a result of this network address translation from public IP 234 00:10:25,270 --> 00:10:28,980 to private IP addresses, plural, do we have the ability 235 00:10:28,980 --> 00:10:32,250 to put multiple devices on the network at once. 236 00:10:32,250 --> 00:10:35,220 It wasn't all that long ago, 20 or so years ago, 237 00:10:35,220 --> 00:10:38,090 when if you wanted to have two computers on the network at home, 238 00:10:38,090 --> 00:10:41,920 someone like Comcast or RCN would just charge you twice or three times 239 00:10:41,920 --> 00:10:44,354 to actually give you dedicated devices or IP addresses. 240 00:10:44,354 --> 00:10:46,020 So this is actually a wonderful feature. 241 00:10:46,020 --> 00:10:49,900 And it has the side effect also of firewalling your computer. 
242 00:10:49,900 --> 00:10:52,670 And this is why these terms get so commingled these days. 243 00:10:52,670 --> 00:10:55,875 A firewall in the real world is like a physical wall between stores, 244 00:10:55,875 --> 00:10:57,500 especially in strip malls and the like. 245 00:10:57,500 --> 00:11:00,440 So that if one shop gets on fire, the others next to it 246 00:11:00,440 --> 00:11:02,440 with high probability are safe. 247 00:11:02,440 --> 00:11:04,800 Because the fire can't get through the firewall. 248 00:11:04,800 --> 00:11:06,990 In the virtual world, you have firewalls that 249 00:11:06,990 --> 00:11:09,350 are software, that are designed to keep data 250 00:11:09,350 --> 00:11:13,510 from the outside coming in or perhaps from the inside going out. 251 00:11:13,510 --> 00:11:18,490 And so the fact that there is this translation from public to private 252 00:11:18,490 --> 00:11:21,690 also means as a side effect, typically, that if you 253 00:11:21,690 --> 00:11:24,310 have some adversary, some bad guy on the internet trying 254 00:11:24,310 --> 00:11:29,220 to get at your laptop or desktop in your home, he or she can't actually do that. 255 00:11:29,220 --> 00:11:31,780 Because the only way you can talk to a private IP 256 00:11:31,780 --> 00:11:35,940 address is if the request were initiated from the inside out. 257 00:11:35,940 --> 00:11:39,620 Now, you can firewall your home and your business even more securely than that. 258 00:11:39,620 --> 00:11:42,340 And we'll come back to that as we discuss security itself. 259 00:11:42,340 --> 00:11:44,990 And there are ways around the scenario I just described. 260 00:11:44,990 --> 00:11:46,500 But that tends to be a side effect. 261 00:11:46,500 --> 00:11:49,416 It's a good thing, actually, there's this network address translation. 262 00:11:49,416 --> 00:11:53,940 Because it means people can't very easily get in from the outside. 
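The crossing-out of the from and to fields described above, along with the firewall side effect, can be sketched as a toy model. This is only an illustration, not how any particular router is implemented. The lecture says just that the device "checks its records"; keying those records by port number is an assumption here, and the class name, port numbers, and IP addresses (taken from ranges reserved for documentation and private use) are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str       # the "from" address on the envelope
    dst: str       # the "to" address on the envelope
    src_port: int  # assumption: ports are how the router tells requests apart
    dst_port: int


class HomeRouter:
    """Toy NAT device: one public IP in front of many private ones."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.records = {}  # public port -> (private ip, private port)

    def outbound(self, pkt: Packet) -> Packet:
        # Remember who asked, then cross out the private "from" address.
        port = self.next_port
        self.next_port += 1
        self.records[port] = (pkt.src, pkt.src_port)
        return replace(pkt, src=self.public_ip, src_port=port)

    def inbound(self, pkt: Packet):
        # Only replies matching a remembered request are let back in.
        entry = self.records.get(pkt.dst_port)
        if entry is None:
            return None  # unsolicited traffic is dropped
        private_ip, private_port = entry
        return replace(pkt, dst=private_ip, dst_port=private_port)


router = HomeRouter("203.0.113.5")

# A laptop inside the home asks a server for something.
request = Packet("192.168.1.10", "198.51.100.7", 51000, 443)
sent = router.outbound(request)

# The server replies to the public address it saw.
reply = Packet("198.51.100.7", sent.src, 443, sent.src_port)
delivered = router.inbound(reply)

# Someone on the outside tries to start a conversation on their own.
stray = router.inbound(Packet("198.51.100.9", "203.0.113.5", 443, 40999))

print(sent)
print(delivered)
print(stray)
```

The stray packet comes back as None because nothing inside the home asked for it, which is the side effect the firewall comparison is getting at.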
263 00:11:53,940 --> 00:11:56,680 The downside of that is that if you're trying to run your own web 264 00:11:56,680 --> 00:11:58,630 server-- for instance, you're trying to start a company 265 00:11:58,630 --> 00:12:01,630 or you're building a website, and you want friends or associates or just 266 00:12:01,630 --> 00:12:04,980 customers to be able to access your website and it's inside your home, 267 00:12:04,980 --> 00:12:06,750 odds are they're not going to be able to. 268 00:12:06,750 --> 00:12:09,770 Because you can't talk to a private IP address 269 00:12:09,770 --> 00:12:14,020 from the outside world in unless you configure your home network 270 00:12:14,020 --> 00:12:16,190 or your office network especially for that. 271 00:12:16,190 --> 00:12:19,650 So there's some interesting layers of protection that are in here. 272 00:12:19,650 --> 00:12:22,060 Also on the screen here are a couple of other addresses. 273 00:12:22,060 --> 00:12:22,820 So router. 274 00:12:22,820 --> 00:12:23,790 Let's come here. 275 00:12:23,790 --> 00:12:26,350 I just drew the internet as this big cloud. 276 00:12:26,350 --> 00:12:28,790 Sort of which has become cloud computing these days. 277 00:12:28,790 --> 00:12:30,490 But what is the internet itself? 278 00:12:30,490 --> 00:12:33,090 Or what really gets data from point A to point B? 279 00:12:33,090 --> 00:12:37,670 If this is me at home, and this is Amazon.com, 280 00:12:37,670 --> 00:12:41,360 how do I actually get data to and from Amazon.com? 281 00:12:41,360 --> 00:12:44,770 Well, I certainly don't have a Wi-Fi connection to Amazon in Seattle 282 00:12:44,770 --> 00:12:45,990 or wherever they are. 283 00:12:45,990 --> 00:12:49,830 I certainly don't have a dedicated wire from my laptop to Amazon. 284 00:12:49,830 --> 00:12:52,060 So once the data leaves my laptop, where does it 285 00:12:52,060 --> 00:12:54,400 go if I'm trying to shop on Amazon.com? 
286 00:12:54,400 --> 00:12:57,865 287 00:12:57,865 --> 00:12:58,780 To a cable? 288 00:12:58,780 --> 00:12:59,280 OK. 289 00:12:59,280 --> 00:13:00,491 So some cable somewhere. 290 00:13:00,491 --> 00:13:00,990 Indeed. 291 00:13:00,990 --> 00:13:03,120 And my laptop probably is talking wireless 292 00:13:03,120 --> 00:13:04,870 to some of these antennas in the room. 293 00:13:04,870 --> 00:13:06,870 Those antennas-- even though you don't see them, 294 00:13:06,870 --> 00:13:09,245 because they've been mounted pretty cleanly on the wall-- 295 00:13:09,245 --> 00:13:12,360 do have an ethernet cable that is quite like this thing here. 296 00:13:12,360 --> 00:13:16,060 It's like a thicker phone cable, if you remember what phone cables are like. 297 00:13:16,060 --> 00:13:17,580 This is an RJ45 connector. 298 00:13:17,580 --> 00:13:21,190 Phone cables are RJ11, which just means their size and specs. 299 00:13:21,190 --> 00:13:24,530 But that thing in the wall is somehow connected probably 300 00:13:24,530 --> 00:13:26,530 to a device called a switch. 301 00:13:26,530 --> 00:13:29,500 So that thing is called an access point or AP. 302 00:13:29,500 --> 00:13:32,990 Switches are fairly dumb devices that just 303 00:13:32,990 --> 00:13:35,570 have a lot of phone jack like connectors, 304 00:13:35,570 --> 00:13:37,004 a lot of ethernet connectors. 305 00:13:37,004 --> 00:13:38,920 That once you plug in all the ethernet cables, 306 00:13:38,920 --> 00:13:43,020 the devices can all inter-communicate with minimal amount of security. 307 00:13:43,020 --> 00:13:49,250 And from there, the switch is connected to probably one of Harvard's routers. 308 00:13:49,250 --> 00:13:52,270 What is a router? 309 00:13:52,270 --> 00:13:55,110 Guess even if you've never heard the term. 310 00:13:55,110 --> 00:13:58,300 A router routes information. 311 00:13:58,300 --> 00:13:58,800 Yeah. 312 00:13:58,800 --> 00:13:59,690 It really is that. 
313 00:13:59,690 --> 00:14:02,000 So you can think of the internet as being 314 00:14:02,000 --> 00:14:06,670 speckled with a whole bunch of routers represented by those dots there. 315 00:14:06,670 --> 00:14:09,960 And there are edges between these dots or these nodes. 316 00:14:09,960 --> 00:14:11,720 And what's interesting about the internet 317 00:14:11,720 --> 00:14:17,140 is that data doesn't necessarily travel one predictable path or even 318 00:14:17,140 --> 00:14:20,220 one same path on subsequent requests. 319 00:14:20,220 --> 00:14:23,740 It's sort of this mesh network, whereby each of those nodes is a router, 320 00:14:23,740 --> 00:14:25,090 each of those dots is a router. 321 00:14:25,090 --> 00:14:27,640 Each of the edges is just a point of connectivity, 322 00:14:27,640 --> 00:14:29,870 wired typically or wireless. 323 00:14:29,870 --> 00:14:34,930 And so data can travel from point A to point B by getting relayed 324 00:14:34,930 --> 00:14:36,460 by all of these different routers. 325 00:14:36,460 --> 00:14:39,450 And so when a router receives a packet like this one here, 326 00:14:39,450 --> 00:14:44,280 it looks at the to address and sees, oh, this is destined for 1.2.3.4. 327 00:14:44,280 --> 00:14:45,500 I know where that server is. 328 00:14:45,500 --> 00:14:46,500 It's that way. 329 00:14:46,500 --> 00:14:49,050 And so the router routes the packet this way. 330 00:14:49,050 --> 00:14:50,960 The next router in turn might look at this 331 00:14:50,960 --> 00:14:52,980 and be like, oh, I know where this machine is. 332 00:14:52,980 --> 00:14:53,650 It's that way. 333 00:14:53,650 --> 00:14:56,160 And so it might hand this off to the next router. 334 00:14:56,160 --> 00:14:58,490 And by nature of how the internet is configured, 335 00:14:58,490 --> 00:15:05,120 typically it requires a max of 30 or so hops, hand-offs, to get from one point 336 00:15:05,120 --> 00:15:07,060 A to one point B around the entire world. 
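The hand-off from router to router described above can be sketched as a lookup that each router performs on its own. This is a simplified illustration, not a real routing protocol: the router names and the table are invented, and only the destination 1.2.3.4 and the ceiling of roughly 30 hops come from the lecture.

```python
# Each router only knows which neighbor is "that way" for a destination.
# Router names and table contents are made up for illustration.
NEXT_HOP = {
    "home": {"1.2.3.4": "isp"},
    "isp": {"1.2.3.4": "backbone"},
    "backbone": {"1.2.3.4": "dest-edge"},
    "dest-edge": {"1.2.3.4": None},  # None: the destination is attached here
}

MAX_HOPS = 30  # the rough ceiling mentioned in the lecture


def route(start: str, dst: str) -> list[str]:
    """Return the routers a packet passes through on its way to dst."""
    path = [start]
    current = start
    for _ in range(MAX_HOPS):
        nxt = NEXT_HOP[current][dst]  # look at the "to" address, pick a direction
        if nxt is None:
            return path
        path.append(nxt)
        current = nxt
    raise RuntimeError("too many hops; packet dropped")


print(route("home", "1.2.3.4"))
```

No single router knows the whole path. Each one makes only the next left-or-right decision, and the path emerges from the sequence of hand-offs.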
337 00:15:07,060 --> 00:15:09,550 And typically it's far fewer than that. 338 00:15:09,550 --> 00:15:15,300 And it is the routers that decide how to get the data with high probability 339 00:15:15,300 --> 00:15:17,170 closer and closer and closer. 340 00:15:17,170 --> 00:15:18,824 Not necessarily geographically. 341 00:15:18,824 --> 00:15:20,990 Sometimes it's faster to take a different direction. 342 00:15:20,990 --> 00:15:23,710 Sometimes it's less expensive to take a different direction. 343 00:15:23,710 --> 00:15:28,160 But eventually the data actually makes its way from point A to point B. 344 00:15:28,160 --> 00:15:33,330 Indeed, you can see here the IP address of a router. 345 00:15:33,330 --> 00:15:36,580 Which of the routers in this story is this IP address? 346 00:15:36,580 --> 00:15:38,637 10.254.16.1. 347 00:15:38,637 --> 00:15:39,720 And notice the similarity. 348 00:15:39,720 --> 00:15:43,100 My IP address is almost the same, but ends with 128. 349 00:15:43,100 --> 00:15:46,840 Router IP addresses often, by human convention, end with the number .1. 350 00:15:46,840 --> 00:15:49,850 So you know that they're on the same network. 351 00:15:49,850 --> 00:15:50,880 Where is this router? 352 00:15:50,880 --> 00:15:52,290 Whose router is this? 353 00:15:52,290 --> 00:15:56,140 Or where in the story is it would you think? 354 00:15:56,140 --> 00:15:56,800 Harvard, yeah. 355 00:15:56,800 --> 00:15:59,200 It's probably quite proximal to this building. 356 00:15:59,200 --> 00:16:00,390 Maybe it's in the basement. 357 00:16:00,390 --> 00:16:01,840 Maybe it's around the corner. 358 00:16:01,840 --> 00:16:04,000 It's one of Harvard's routers presumably. 359 00:16:04,000 --> 00:16:08,140 And in turn, that has some kind of connectivity to these other routers 360 00:16:08,140 --> 00:16:08,760 as well. 361 00:16:08,760 --> 00:16:12,317 Typically these edges, these left right decisions are dynamically configured. 
362 00:16:12,317 --> 00:16:15,400 So that if you unplug a router-- which wouldn't typically happen but could 363 00:16:15,400 --> 00:16:20,070 happen-- or if one router gets congested, special protocols, software 364 00:16:20,070 --> 00:16:22,150 that these routers are running will dynamically 365 00:16:22,150 --> 00:16:26,100 figure out which is the better or newer or correct way to send the data. 366 00:16:26,100 --> 00:16:27,970 And so it's all very adaptive without humans 367 00:16:27,970 --> 00:16:29,650 necessarily having to get involved. 368 00:16:29,650 --> 00:16:30,480 Yeah? 369 00:16:30,480 --> 00:16:31,355 AUDIENCE: [INAUDIBLE] 370 00:16:31,355 --> 00:16:36,297 371 00:16:36,297 --> 00:16:37,630 DAVID J. MALAN: Totally depends. 372 00:16:37,630 --> 00:16:39,130 And it's different people along the way. 373 00:16:39,130 --> 00:16:41,450 So Harvard, of course, owns its one or more routers. 374 00:16:41,450 --> 00:16:43,960 I, in my home, might own my tiny little router. 375 00:16:43,960 --> 00:16:46,050 Which really can just get data out of my home. 376 00:16:46,050 --> 00:16:50,390 In the middle are lots of internet service providers, big and small. 377 00:16:50,390 --> 00:16:52,640 Level 3 is a very big one. 378 00:16:52,640 --> 00:16:55,570 Verizon and Comcast have their own networks as well. 379 00:16:55,570 --> 00:16:57,040 There are yet other bigger fish. 380 00:16:57,040 --> 00:16:59,069 Google has its own fiber network and so forth-- 381 00:16:59,069 --> 00:16:59,944 AUDIENCE: [INAUDIBLE] 382 00:16:59,944 --> 00:17:03,964 383 00:17:03,964 --> 00:17:04,880 DAVID J. MALAN: Money. 384 00:17:04,880 --> 00:17:06,520 They're peering points, so to speak. 
385 00:17:06,520 --> 00:17:09,359 So these larger internet service providers typically 386 00:17:09,359 --> 00:17:11,579 have financial agreements between them that 387 00:17:11,579 --> 00:17:14,250 govern how much they will pay to send their data this way 388 00:17:14,250 --> 00:17:17,500 or that way in order to get data from one place to another. 389 00:17:17,500 --> 00:17:20,092 They themselves are internally incentivized to actually relay 390 00:17:20,092 --> 00:17:21,800 the data to someone if they actually want 391 00:17:21,800 --> 00:17:24,849 to have customers whose data can go from point A to point B. 392 00:17:24,849 --> 00:17:28,550 And so sometimes the decisions to go either to the left or to the right, 393 00:17:28,550 --> 00:17:31,300 so to speak, might be made not so much on technological decisions 394 00:17:31,300 --> 00:17:33,200 but just on business decisions. 395 00:17:33,200 --> 00:17:35,420 With whom do we have a peering connection? 396 00:17:35,420 --> 00:17:38,150 And so even if you as a business owner, for instance, 397 00:17:38,150 --> 00:17:40,820 want to bring your-- run your own servers, which 398 00:17:40,820 --> 00:17:44,060 isn't as common an instinct these days with cloud computing all 399 00:17:44,060 --> 00:17:47,770 the rage-- more on that later-- if you might have your own servers in a data 400 00:17:47,770 --> 00:17:52,410 center in some warehouse out in Western Massachusetts or New Jersey 401 00:17:52,410 --> 00:17:54,720 or those kinds of places, you would typically 402 00:17:54,720 --> 00:17:58,200 decide for yourself who do you want to pay to physically run 403 00:17:58,200 --> 00:18:01,870 a cable into your servers, into your part of the data center, 404 00:18:01,870 --> 00:18:04,340 to establish exactly one of those connections. 405 00:18:04,340 --> 00:18:08,220 And that too would be a financial decision and a reputation decision. 406 00:18:08,220 --> 00:18:11,440 And not so much a technology one. 
407 00:18:11,440 --> 00:18:12,311 Yeah, David? 408 00:18:12,311 --> 00:18:13,186 AUDIENCE: [INAUDIBLE] 409 00:18:13,186 --> 00:18:21,230 410 00:18:21,230 --> 00:18:22,860 DAVID J. MALAN: Ah, a good question. 411 00:18:22,860 --> 00:18:26,950 Why wouldn't you just have one router and lots of switches 412 00:18:26,950 --> 00:18:30,160 for the whole campus? 413 00:18:30,160 --> 00:18:32,350 Part of it is distributed management. 414 00:18:32,350 --> 00:18:34,090 So Harvard, for instance, is a big place. 415 00:18:34,090 --> 00:18:37,360 So let me oversimplify and say each of the schools to some extent 416 00:18:37,360 --> 00:18:39,370 might run its own local network so that they 417 00:18:39,370 --> 00:18:42,244 can have their own policies, their own infrastructure, and so forth. 418 00:18:42,244 --> 00:18:44,410 But they want to interconnect to the rest of campus. 419 00:18:44,410 --> 00:18:46,860 These days, Harvard has been transitioning 420 00:18:46,860 --> 00:18:48,792 to having more of a monolithic infrastructure. 421 00:18:48,792 --> 00:18:50,500 But there are still side effects of this. 422 00:18:50,500 --> 00:18:52,958 For instance, in a couple of the offices that I spend time, 423 00:18:52,958 --> 00:18:57,300 we can't actually-- we can have the offices talk to one another certainly. 424 00:18:57,300 --> 00:18:59,720 But we can't create the illusion of what's 425 00:18:59,720 --> 00:19:05,700 called a VLAN or virtual local area network, whereby two separate buildings 426 00:19:05,700 --> 00:19:07,360 appear to be the same network. 427 00:19:07,360 --> 00:19:12,060 Simply because of legacy and actual hardware limitations. 428 00:19:12,060 --> 00:19:14,760 There's also performance. 429 00:19:14,760 --> 00:19:17,410 For instance internal to campus, there's only so much traffic. 430 00:19:17,410 --> 00:19:19,993 But there's certainly a bottleneck when you're leaving campus. 
431 00:19:19,993 --> 00:19:22,350 So you might want to have a separate router, a more 432 00:19:22,350 --> 00:19:24,880 souped up router, that can actually handle that outbound traffic. 433 00:19:24,880 --> 00:19:27,338 Whereas you have smaller less expensive routers internally. 434 00:19:27,338 --> 00:19:32,330 And so it boils down to those kinds of economic and logistical decisions. 435 00:19:32,330 --> 00:19:33,240 Good question. 436 00:19:33,240 --> 00:19:34,900 There's also security implications too. 437 00:19:34,900 --> 00:19:37,152 A switch typically operates technologically 438 00:19:37,152 --> 00:19:38,860 at a certain level that doesn't allow you 439 00:19:38,860 --> 00:19:42,220 the same amount of control over what comes in and out of your network. 440 00:19:42,220 --> 00:19:46,720 Whereas a router is more of a deliberate bottleneck that you have more 441 00:19:46,720 --> 00:19:48,399 control over. 442 00:19:48,399 --> 00:19:50,440 But the line is blurred to some extent these days 443 00:19:50,440 --> 00:19:52,470 between routers and switches and their features. 444 00:19:52,470 --> 00:19:53,040 As an aside. 445 00:19:53,040 --> 00:19:54,260 This is a more arcane detail. 446 00:19:54,260 --> 00:19:58,660 But does anyone-- has anyone probably seen a subnet mask before? 447 00:19:58,660 --> 00:20:00,869 Someone know what a subnet mask is? 448 00:20:00,869 --> 00:20:02,910 We don't have to get too far into the weeds here. 449 00:20:02,910 --> 00:20:06,640 But that is simply a number that allows the local computer-- 450 00:20:06,640 --> 00:20:12,730 my Mac in this case-- to decide when it is sending data from point A 451 00:20:12,730 --> 00:20:18,640 to some other point B, if that other computer is on the same local network 452 00:20:18,640 --> 00:20:21,260 or if it's elsewhere on the internet. 453 00:20:21,260 --> 00:20:25,980 And so essentially this subnet mask, 255.255.240.0, 454 00:20:25,980 --> 00:20:28,200 represents a pattern of ones and zeros.
455 00:20:28,200 --> 00:20:32,260 It uses that pattern of ones and zeros to determine, hm, 456 00:20:32,260 --> 00:20:35,710 I am trying to request or send information to this other server. 457 00:20:35,710 --> 00:20:39,040 If that pattern of ones and zeros tells my Mac 458 00:20:39,040 --> 00:20:42,030 that, ooh, that other computer is on the local network, what's nice 459 00:20:42,030 --> 00:20:45,730 is my computer will use a different approach, a different protocol, 460 00:20:45,730 --> 00:20:48,550 Ethernet specifically, to actually get the data from point A 461 00:20:48,550 --> 00:20:51,180 to point B. It will never go out on the public internet. 462 00:20:51,180 --> 00:20:53,380 By contrast, if that number reveals, oh, this 463 00:20:53,380 --> 00:20:56,010 is actually a computer that's far away, that's 464 00:20:56,010 --> 00:20:59,310 how the computer decides to send it not to the local network, LAN, 465 00:20:59,310 --> 00:21:02,160 but to the next router instead, to that IP. 466 00:21:02,160 --> 00:21:04,710 So it all boils down to what do you write on the envelope? 467 00:21:04,710 --> 00:21:07,870 The local address or the router's address instead. 468 00:21:07,870 --> 00:21:09,440 So let's see if we can't see this. 469 00:21:09,440 --> 00:21:11,290 Let me try to pull up. 470 00:21:11,290 --> 00:21:15,960 Give me just one moment to see if I can get into a server here 471 00:21:15,960 --> 00:21:18,270 that will let us do this. 472 00:21:18,270 --> 00:21:19,422 Nope, that won't do it. 473 00:21:19,422 --> 00:21:23,060 474 00:21:23,060 --> 00:21:23,990 That doesn't work. 475 00:21:23,990 --> 00:21:31,085 476 00:21:31,085 --> 00:21:32,420 Give me one moment. 477 00:21:32,420 --> 00:21:44,850 478 00:21:44,850 --> 00:21:45,430 Come on. 479 00:21:45,430 --> 00:21:50,504 480 00:21:50,504 --> 00:21:51,545 All right, let's do this. 481 00:21:51,545 --> 00:21:58,800 482 00:21:58,800 --> 00:21:59,300 All right.
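The subnet-mask decision described above can be sketched in a few lines of Python. This is only an illustration using the standard `ipaddress` module; the addresses (`10.0.0.5`, `10.0.0.1`, and the two destinations) and the function names `mask_bits` and `next_hop` are assumptions made up for the example, not values from the session. Only the mask `255.255.240.0` comes from the talk.

```python
import ipaddress


def mask_bits(mask: str) -> str:
    """Show a dotted-decimal mask as its 32-bit pattern of ones and zeros."""
    return f"{int(ipaddress.ip_address(mask)):032b}"


def next_hop(my_ip: str, mask: str, dest_ip: str, router_ip: str) -> str:
    """Pick the address to 'write on the envelope'.

    If the destination is on my local network, address it directly;
    otherwise hand the data to the router.
    """
    # strict=False lets us pass a host address and derive its network from the mask.
    local_net = ipaddress.ip_network(f"{my_ip}/{mask}", strict=False)
    if ipaddress.ip_address(dest_ip) in local_net:
        return dest_ip
    return router_ip


MASK = "255.255.240.0"  # the mask mentioned in the talk
print(mask_bits(MASK))  # twenty ones followed by twelve zeros

# Hypothetical addresses, chosen only to show both outcomes.
print(next_hop("10.0.0.5", MASK, "10.0.15.200", "10.0.0.1"))  # local: deliver directly
print(next_hop("10.0.0.5", MASK, "10.0.16.1", "10.0.0.1"))    # remote: send to the router
```

With this mask, the first twenty bits of two addresses must match for them to count as the same local network, which is why `10.0.15.200` is local to `10.0.0.5` but `10.0.16.1` is not.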
483 00:21:59,300 --> 00:22:00,370 Let's see if this works. 484 00:22:00,370 --> 00:22:02,350 And then I'll explain what we're doing here. 485 00:22:02,350 --> 00:22:02,850 Whoops. 486 00:22:02,850 --> 00:22:07,700 487 00:22:07,700 --> 00:22:08,420 Transfer out. 488 00:22:08,420 --> 00:22:18,640 489 00:22:18,640 --> 00:22:19,140 All right. 490 00:22:19,140 --> 00:22:22,680 Let's see if this gives me what I want. 491 00:22:22,680 --> 00:22:24,084 Turn this around. 492 00:22:24,084 --> 00:22:26,980 493 00:22:26,980 --> 00:22:27,750 OK, this works. 494 00:22:27,750 --> 00:22:32,980 495 00:22:32,980 --> 00:22:33,720 OK. 496 00:22:33,720 --> 00:22:36,790 So let's go ahead and try this as follows. 497 00:22:36,790 --> 00:22:37,290 -q 1. 498 00:22:37,290 --> 00:22:39,880 499 00:22:39,880 --> 00:22:40,950 OK, perfect. 500 00:22:40,950 --> 00:22:43,050 MIT is seven hops away. 501 00:22:43,050 --> 00:22:43,960 What did I just do? 502 00:22:43,960 --> 00:22:47,490 So this is a command line program, a text-based program on my Mac-- 503 00:22:47,490 --> 00:22:53,680 though an equivalent exists for PCs as well-- and I ran traceroute -q 1. 504 00:22:53,680 --> 00:22:57,940 So just give me, send one request at a time, to www.MIT.edu. 505 00:22:57,940 --> 00:23:00,910 Because I am technically interested in how data gets from 506 00:23:00,910 --> 00:23:05,170 point A here at my laptop to point B at MIT down the road. 507 00:23:05,170 --> 00:23:08,080 And it turns out that as we saw a moment ago, 508 00:23:08,080 --> 00:23:12,790 the first hop that my data takes in order to talk to MIT's web server 509 00:23:12,790 --> 00:23:13,610 is to that address. 510 00:23:13,610 --> 00:23:16,310 Which is the IP of what again? 511 00:23:16,310 --> 00:23:19,270 Yeah, the router that my Mac was preconfigured to use. 512 00:23:19,270 --> 00:23:20,910 I don't recognize hop two. 513 00:23:20,910 --> 00:23:22,930 It's just some other IP address.
514 00:23:22,930 --> 00:23:24,310 But I do know it's also private. 515 00:23:24,310 --> 00:23:27,400 It turns out that any IP address that starts with the number 10 516 00:23:27,400 --> 00:23:28,840 is a private IP address. 517 00:23:28,840 --> 00:23:32,960 So you know it's being administered locally by Harvard or by a family 518 00:23:32,960 --> 00:23:34,750 member or someone in your company. 519 00:23:34,750 --> 00:23:37,570 And if you're curious, this is an inexhaustive list. 520 00:23:37,570 --> 00:23:42,260 But anything starting with 10 dot something, anything starting with 521 00:23:42,260 --> 00:23:52,510 172.16.something, anything starting with 192.168.something. 522 00:23:52,510 --> 00:23:55,390 So indeed if you go home tonight or this coming week 523 00:23:55,390 --> 00:23:59,720 and try to find your own Mac or PC's IP address at home, or even at work, 524 00:23:59,720 --> 00:24:02,410 perhaps, odds are it starts with one of these values. 525 00:24:02,410 --> 00:24:05,780 These are private IPs that are, by human decision years 526 00:24:05,780 --> 00:24:07,670 ago, never to appear on the public internet. 527 00:24:07,670 --> 00:24:10,400 They're meant to be used in homes and businesses and campuses and the like. 528 00:24:10,400 --> 00:24:12,691 So I know that one and two are somewhere here on campus 529 00:24:12,691 --> 00:24:14,490 because they have private IP addresses. 530 00:24:14,490 --> 00:24:17,890 But then step 3 and 4 get interesting because they have 531 00:24:17,890 --> 00:24:19,290 what are called host names. 532 00:24:19,290 --> 00:24:24,180 Semi-human friendly names that look like domain names, and indeed they are. 533 00:24:24,180 --> 00:24:26,719 And I know it's Harvard for sure. 534 00:24:26,719 --> 00:24:28,510 But I don't really know where this is. 535 00:24:28,510 --> 00:24:30,120 I do see its IP address. 536 00:24:30,120 --> 00:24:32,580 This is a public IP address at this point.
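To make the private-versus-public distinction concrete, here is a small Python sketch that checks an address against the three private blocks reserved by RFC 1918. The function name `is_rfc1918` and the sample addresses are assumptions for illustration. Note that the middle block is `172.16.0.0/12`, which spans 172.16 through 172.31, so an address such as `172.217.0.46` starts with 172 but is still public.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.x.x through 172.31.x.x
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)


# Sample addresses picked only to exercise each case.
for sample in ("10.1.2.3", "172.16.5.4", "192.168.1.1", "172.217.0.46"):
    print(sample, "private" if is_rfc1918(sample) else "public")
```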
537 00:24:32,580 --> 00:24:37,730 And I only know from convention, coregw means core gateway. 538 00:24:37,730 --> 00:24:39,610 Gateway as a synonym for router. 539 00:24:39,610 --> 00:24:42,160 So this is like a core router at Harvard. 540 00:24:42,160 --> 00:24:45,380 So it's a really important router at Harvard is as much as I can glean. 541 00:24:45,380 --> 00:24:48,880 And what's interesting is that it took three milliseconds for my data 542 00:24:48,880 --> 00:24:50,240 to reach that router. 543 00:24:50,240 --> 00:24:53,200 It took 1.75 for it to reach the second router. 544 00:24:53,200 --> 00:24:55,300 And two milliseconds to reach the first router. 545 00:24:55,300 --> 00:24:59,651 And what strikes you about these three values? 546 00:24:59,651 --> 00:25:01,567 Especially since I read them in reverse order. 547 00:25:01,567 --> 00:25:05,046 548 00:25:05,046 --> 00:25:06,040 AUDIENCE: [INAUDIBLE] 549 00:25:06,040 --> 00:25:06,910 DAVID J. MALAN: What's that? 550 00:25:06,910 --> 00:25:07,951 They're not all the same. 551 00:25:07,951 --> 00:25:10,504 So there's a lot of variability or non-determinism 552 00:25:10,504 --> 00:25:12,420 when it comes to sending data on the internet. 553 00:25:12,420 --> 00:25:15,280 The routers might be slightly more busy at some times or other. 554 00:25:15,280 --> 00:25:18,020 And by busy I mean maybe more people are sending data. 555 00:25:18,020 --> 00:25:20,390 And so it takes a moment for the router to figure out 556 00:25:20,390 --> 00:25:22,306 all of the different decisions it has to make. 557 00:25:22,306 --> 00:25:24,736 And that slows down my data or someone else's data. 558 00:25:24,736 --> 00:25:25,860 These are all pretty close. 559 00:25:25,860 --> 00:25:27,997 They're all essentially two, three milliseconds. 560 00:25:27,997 --> 00:25:29,080 So it's still pretty fast. 561 00:25:29,080 --> 00:25:31,532 But that variability is to be expected. 562 00:25:31,532 --> 00:25:32,740 And these are not cumulative. 
563 00:25:32,740 --> 00:25:36,510 What this program is doing is it's kind of putting its toe in the water 564 00:25:36,510 --> 00:25:38,270 a little deeper and deeper each time. 565 00:25:38,270 --> 00:25:41,010 How fast can I get to the first hop? 566 00:25:41,010 --> 00:25:42,150 Then to the second hop? 567 00:25:42,150 --> 00:25:43,220 Then to the third hop? 568 00:25:43,220 --> 00:25:44,530 So it's progressively going. 569 00:25:44,530 --> 00:25:47,050 They're not additive times. 570 00:25:47,050 --> 00:25:51,190 Step four is another router that I just know from convention is border gateway. 571 00:25:51,190 --> 00:25:54,880 So this is probably another router that Harvard owns that's on the border. 572 00:25:54,880 --> 00:25:58,490 So figuratively speaking, the edge, or maybe literally speaking, 573 00:25:58,490 --> 00:26:00,680 the edge of campus. 574 00:26:00,680 --> 00:26:05,040 Then it looks like Harvard's internet service provider is Qwest, 575 00:26:05,040 --> 00:26:07,040 which is a really big ISP as well. 576 00:26:07,040 --> 00:26:09,170 Like Level 3 and others, to your question earlier. 577 00:26:09,170 --> 00:26:13,290 So Qwest would be one of our peers to whom we connect. 578 00:26:13,290 --> 00:26:18,680 It looks like they have at least two routers that are curiously 579 00:26:18,680 --> 00:26:21,130 named the same with the same address. 580 00:26:21,130 --> 00:26:22,850 I do not know why that is happening. 581 00:26:22,850 --> 00:26:25,870 That seems to be a quirk or a bug of some sort. 582 00:26:25,870 --> 00:26:29,350 And then curiously, it looks like MIT's website 583 00:26:29,350 --> 00:26:32,800 has been outsourced to a company that you might know of called Akamai. 584 00:26:32,800 --> 00:26:36,470 They, among other things, are a CDN, a content delivery network. 585 00:26:36,470 --> 00:26:40,980 Which just means they run servers for people to store their static files on.
586 00:26:40,980 --> 00:26:45,220 So that MIT doesn't have to run its own web servers, its own physical machines. 587 00:26:45,220 --> 00:26:49,060 They just pay Akamai some number of dollars per month or per year 588 00:26:49,060 --> 00:26:49,935 to store it for them. 589 00:26:49,935 --> 00:26:51,518 I have no idea what this number means. 590 00:26:51,518 --> 00:26:53,580 It's probably just a unique random identifier. 591 00:26:53,580 --> 00:26:56,014 And this is apparently where people deploy stuff too. 592 00:26:56,014 --> 00:26:56,930 So that's all we know. 593 00:26:56,930 --> 00:27:01,470 But it took 7 milliseconds to get to MIT, as opposed to a good 15-20 minutes 594 00:27:01,470 --> 00:27:04,247 by car or by [? T ?] or by bike from here. 595 00:27:04,247 --> 00:27:05,580 All right, so let's try another. 596 00:27:05,580 --> 00:27:09,900 Traceroute to Stanford.edu and let's see what changes. 597 00:27:09,900 --> 00:27:12,610 Same initial path, but now step 5 is a little different. 598 00:27:12,610 --> 00:27:19,380 599 00:27:19,380 --> 00:27:22,600 The stars generally mean that the router, for whatever reason 600 00:27:22,600 --> 00:27:24,600 of misconfiguration or deliberate security, 601 00:27:24,600 --> 00:27:28,240 is just not responding to the queries I'm sending. 602 00:27:28,240 --> 00:27:31,680 So it's sort of anonymous. 603 00:27:31,680 --> 00:27:33,760 Unfortunately, there aren't many named routers, 604 00:27:33,760 --> 00:27:35,825 it seems, between us and Stanford.edu. 605 00:27:35,825 --> 00:27:42,390 606 00:27:42,390 --> 00:27:45,600 And there's a lot of anonymity between us and them. 607 00:27:45,600 --> 00:27:48,010 So it started off somewhat interesting, and now is 608 00:27:48,010 --> 00:27:51,080 devolving into not very interesting. 609 00:27:51,080 --> 00:27:52,750 Let's try someone else. 610 00:27:52,750 --> 00:27:54,302 UCBerkeley.ed-- 611 00:27:54,302 --> 00:27:56,910 AUDIENCE: [INAUDIBLE]. 612 00:27:56,910 --> 00:28:00,010 DAVID J.
MALAN: Oh, we'll get there. 613 00:28:00,010 --> 00:28:01,770 Berkeley.edu. 614 00:28:01,770 --> 00:28:03,050 This one's juicier. 615 00:28:03,050 --> 00:28:04,450 OK. 616 00:28:04,450 --> 00:28:07,110 So what do we see here? 617 00:28:07,110 --> 00:28:09,410 So here's-- this was new. 618 00:28:09,410 --> 00:28:11,226 Nox, this is the northern crossroads. 619 00:28:11,226 --> 00:28:14,350 This is a really big peering point where lots of different internet service 620 00:28:14,350 --> 00:28:17,260 providers interconnect in the Northeast. 621 00:28:17,260 --> 00:28:19,430 Northern crossroads-- it looks like Internet2, 622 00:28:19,430 --> 00:28:22,410 the Internet2 is a network used primarily by universities 623 00:28:22,410 --> 00:28:24,470 to be super fast, to allow them to share data 624 00:28:24,470 --> 00:28:28,090 and information in silly little tests like this more effectively. 625 00:28:28,090 --> 00:28:34,720 It looks like, curiously, that step eight and nine 626 00:28:34,720 --> 00:28:37,270 are kind of showing their hand as to where they are. 627 00:28:37,270 --> 00:28:40,000 Notice what changes between step seven and eight. What's 628 00:28:40,000 --> 00:28:41,595 striking about these two routers? 629 00:28:41,595 --> 00:28:44,860 630 00:28:44,860 --> 00:28:47,680 What's different? 631 00:28:47,680 --> 00:28:50,560 AUDIENCE: [INAUDIBLE] 632 00:28:50,560 --> 00:28:52,380 DAVID J. MALAN: I'm sorry? 633 00:28:52,380 --> 00:28:55,960 The time really jumps, right, from seven milliseconds-- previous to that 634 00:28:55,960 --> 00:28:58,930 was four and two, two, two milliseconds. 635 00:28:58,930 --> 00:29:00,920 Now it's 50 milliseconds. 636 00:29:00,920 --> 00:29:02,930 Where might step eight be? 637 00:29:02,930 --> 00:29:04,340 AUDIENCE: [INAUDIBLE]. 638 00:29:04,340 --> 00:29:04,900 DAVID J. MALAN: What's that? 639 00:29:04,900 --> 00:29:06,274 It's not a private versus public.
640 00:29:06,274 --> 00:29:08,390 In fact, pretty much everything right now 641 00:29:08,390 --> 00:29:10,640 is public, by nature of it being routers. 642 00:29:10,640 --> 00:29:14,410 Right now everything you're seeing is inside this internet. 643 00:29:14,410 --> 00:29:16,019 AUDIENCE: [INAUDIBLE]. 644 00:29:16,019 --> 00:29:17,310 DAVID J. MALAN: Distance, yeah. 645 00:29:17,310 --> 00:29:20,565 And it turns out that you can sometimes infer from the host name, 646 00:29:20,565 --> 00:29:22,690 just because of human conventions for naming things 647 00:29:22,690 --> 00:29:25,830 that step eight, that router, is probably physically in Houston, 648 00:29:25,830 --> 00:29:26,771 I'm guessing? 649 00:29:26,771 --> 00:29:27,270 Texas. 650 00:29:27,270 --> 00:29:30,454 So it's a good distance away from wherever step seven actually was. 651 00:29:30,454 --> 00:29:32,620 I'm not sure where it is but probably the Northeast. 652 00:29:32,620 --> 00:29:36,280 Meanwhile, step 10 is even farther from step nine, 653 00:29:36,280 --> 00:29:39,440 I'm guessing, because of another jump in time. 654 00:29:39,440 --> 00:29:43,010 Where might step 10 be? 655 00:29:43,010 --> 00:29:44,010 LAX? 656 00:29:44,010 --> 00:29:46,930 For whatever reason, system administrators 657 00:29:46,930 --> 00:29:49,570 have historically liked naming routers after airports. 658 00:29:49,570 --> 00:29:53,680 So this is probably in LA. 659 00:29:53,680 --> 00:29:56,930 This one here, svl. 660 00:29:56,930 --> 00:29:57,660 Still LAX. 661 00:29:57,660 --> 00:29:59,340 I don't know what that is, so I'll go with the LAX. 662 00:29:59,340 --> 00:30:01,048 And then finally, we've reached Berkeley. 663 00:30:01,048 --> 00:30:03,850 And then for some reason, it's just not responding further. 664 00:30:03,850 --> 00:30:08,010 So it takes about 90 milliseconds to send data to Berkeley. 665 00:30:08,010 --> 00:30:12,970 Or about 5 or 6 hours to fly to Berkeley to put things into sort of human terms. 
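Spotting the "big jump" between hops, as done by eye above, can also be sketched in code. The round-trip times in the list below are made-up, illustrative numbers loosely shaped like the Berkeley trace (a few milliseconds locally, then a rise to about 50 and eventually about 90); they are not the actual output from the session, and `largest_jump` is a hypothetical helper name.

```python
def largest_jump(rtts_ms):
    """Return (hop_before, hop_after, increase_ms) for the biggest rise
    in round-trip time between two consecutive hops (hops are 1-based).

    Returns None when there are fewer than two hops to compare.
    """
    best = None
    for i in range(1, len(rtts_ms)):
        delta = rtts_ms[i] - rtts_ms[i - 1]
        if best is None or delta > best[2]:
            # index i-1 is hop number i, index i is hop number i+1
            best = (i, i + 1, delta)
    return best


# Illustrative values only: one round-trip time per hop, in milliseconds.
sample_trace = [2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 7.0, 50.0, 52.0, 80.0, 85.0, 90.0]
print(largest_jump(sample_trace))  # (7, 8, 43.0)
```

Because each time is measured from the laptop to that hop and back, a large difference between neighboring hops hints at a long physical link (or a congested router) between them.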
666 00:30:12,970 --> 00:30:15,680 And indeed, I think-- someone suggested something abroad. 667 00:30:15,680 --> 00:30:20,750 If we do like CNN.co.jp, the Japanese version of CNN's website, 668 00:30:20,750 --> 00:30:23,120 let's see if it cooperates here. 669 00:30:23,120 --> 00:30:25,490 So again, we seem to have taken the same direction. 670 00:30:25,490 --> 00:30:28,160 A little anonymity there. 671 00:30:28,160 --> 00:30:32,680 And then some really interesting stuff going on 672 00:30:32,680 --> 00:30:34,610 in step seven, eight, nine, and 10. 673 00:30:34,610 --> 00:30:36,490 Massive jump here. 674 00:30:36,490 --> 00:30:44,060 What might explain the striking increase in time between hops nine and 10? 675 00:30:44,060 --> 00:30:45,830 Pacific what? 676 00:30:45,830 --> 00:30:47,260 Yeah, so the Pacific Ocean. 677 00:30:47,260 --> 00:30:51,120 So indeed, there's huge trans-Atlantic cables. 678 00:30:51,120 --> 00:30:52,600 And trans-Pacific cables. 679 00:30:52,600 --> 00:30:56,630 And just generally, oceanic cables that carry a huge amount of internet traffic 680 00:30:56,630 --> 00:31:01,280 that really big ships just slowly drag and leave at the bottom of the ocean. 681 00:31:01,280 --> 00:31:03,840 And indeed, it might take 100 additional milliseconds 682 00:31:03,840 --> 00:31:07,340 to go from maybe California, can't really tell from the names here 683 00:31:07,340 --> 00:31:11,700 alone, to the coast of Japan so many miles away. 684 00:31:11,700 --> 00:31:14,020 So you can get a sense there of distance as well. 685 00:31:14,020 --> 00:31:17,540 And meanwhile, let's see if we do-- one last one, 686 00:31:17,540 --> 00:31:22,687 let's do like Yale.edu which is still on this coast, and see what we get. 687 00:31:22,687 --> 00:31:24,520 Here too we're going to get similar results. 688 00:31:24,520 --> 00:31:26,780 So nine or so milliseconds alone. 689 00:31:26,780 --> 00:31:29,289 So this puts into more real terms what's actually going on. 
690 00:31:29,289 --> 00:31:31,080 And a system administrator, if he or she is 691 00:31:31,080 --> 00:31:32,980 trying to diagnose some issue with a network, 692 00:31:32,980 --> 00:31:35,480 this is actually a real tool that real people might actually 693 00:31:35,480 --> 00:31:37,390 use to figure out where is the data flowing. 694 00:31:37,390 --> 00:31:40,650 Is one of the millisecond counts way bigger than others, 695 00:31:40,650 --> 00:31:41,960 is it anomalously large? 696 00:31:41,960 --> 00:31:44,414 That might mean that one router is malfunctioning 697 00:31:44,414 --> 00:31:45,830 or it's just completely congested. 698 00:31:45,830 --> 00:31:48,070 And so this is just one such diagnostic tool. 699 00:31:48,070 --> 00:31:50,570 But another one that's useful to play with is this. 700 00:31:50,570 --> 00:31:53,190 I proposed earlier that people like Google.com 701 00:31:53,190 --> 00:31:56,710 not have one IP address publicly identifying them. 702 00:31:56,710 --> 00:31:59,820 And indeed, if we hit Google.com, in this case-- 703 00:31:59,820 --> 00:32:01,940 let's see if I'm about to prove my point or not. 704 00:32:01,940 --> 00:32:02,270 Nope. 705 00:32:02,270 --> 00:32:02,770 OK. 706 00:32:02,770 --> 00:32:04,960 Google has just one IP address. 707 00:32:04,960 --> 00:32:09,886 Which is 172.217.0.46, which is a little misleading. 708 00:32:09,886 --> 00:32:11,010 But let's see what happens. 709 00:32:11,010 --> 00:32:15,320 http:// and that IP address. 710 00:32:15,320 --> 00:32:17,510 And indeed, it brings me to Google's website. 711 00:32:17,510 --> 00:32:21,640 So if Google has this one IP address-- which does not demonstrate the point 712 00:32:21,640 --> 00:32:25,220 I was hoping to make-- why do they not just advertise 713 00:32:25,220 --> 00:32:34,105 their URL as http://172.217.0.46? 714 00:32:34,105 --> 00:32:35,230 Right, it's not meaningful. 715 00:32:35,230 --> 00:32:38,320 I can't even read it, let alone memorize it.
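The name-to-address lookup just demonstrated can be reproduced from a program as well. The sketch below uses Python's standard `socket.getaddrinfo`, which asks the operating system's resolver; `ipv4_addresses` is a hypothetical helper name. To keep the example runnable without a network connection, the live call uses a literal address, and the hostname lookup is shown only in a comment, since its result varies by time and place.

```python
import socket


def ipv4_addresses(host):
    """Return the distinct IPv4 addresses the system resolver reports for host,
    in the order they were returned."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    addresses = []
    for *_, sockaddr in infos:
        ip = sockaddr[0]          # sockaddr is (ip, port) for IPv4
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# A literal address needs no lookup, so this works offline.
print(ipv4_addresses("127.0.0.1"))

# With a network connection, a hostname may yield one or several addresses,
# and the answer can differ from run to run:
#     ipv4_addresses("www.yahoo.com")
```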
716 00:32:38,320 --> 00:32:41,351 But this is kind of an interesting upgrade from yesteryear. 717 00:32:41,351 --> 00:32:41,850 Right? 718 00:32:41,850 --> 00:32:44,224 Back in the day, some of us might remember 1-800-COLLECT. 719 00:32:44,224 --> 00:32:47,410 Which to this day, I don't know what number that is. 720 00:32:47,410 --> 00:32:52,190 But I know the mnemonic allowed me to remember how to dial it, 721 00:32:52,190 --> 00:32:56,070 so long as there's a mapping on the phone pad to letters in this case. 722 00:32:56,070 --> 00:32:59,570 And this is what DNS is for the internet, essentially. 723 00:32:59,570 --> 00:33:00,650 But it's automatic. 724 00:33:00,650 --> 00:33:02,352 DNS is domain name system. 725 00:33:02,352 --> 00:33:05,060 Which is to say that there are special servers in the world whose 726 00:33:05,060 --> 00:33:09,560 purpose in life is to translate numeric addresses to fully qualified domain 727 00:33:09,560 --> 00:33:10,060 names. 728 00:33:10,060 --> 00:33:13,660 Host names like Google.com and vice versa. 729 00:33:13,660 --> 00:33:17,360 So we humans only have to remember, and even our computers initially only 730 00:33:17,360 --> 00:33:19,990 have to write the domain name that the human provided. 731 00:33:19,990 --> 00:33:22,940 And some other server, a DNS server, will actually 732 00:33:22,940 --> 00:33:24,270 do the automatic conversion. 733 00:33:24,270 --> 00:33:26,780 And actually write at the end of the day the numeric address 734 00:33:26,780 --> 00:33:29,410 on the so-called virtual envelope. 735 00:33:29,410 --> 00:33:31,681 But where-- David? 736 00:33:31,681 --> 00:33:32,556 AUDIENCE: [INAUDIBLE] 737 00:33:32,556 --> 00:33:36,564 738 00:33:36,564 --> 00:33:37,980 DAVID J. MALAN: You would hope so. 739 00:33:37,980 --> 00:33:39,150 Not in this case, though. 740 00:33:39,150 --> 00:33:42,280 So DNS also is a distributed system. 741 00:33:42,280 --> 00:33:44,060 And it's a hierarchical system.
742 00:33:44,060 --> 00:33:46,620 Which means there's lots of caching that happens. 743 00:33:46,620 --> 00:33:49,481 So it would be-- in the extreme scenario, 744 00:33:49,481 --> 00:33:51,980 suppose there is just one DNS server in the world that knows 745 00:33:51,980 --> 00:33:53,590 about all IP addresses and all names. 746 00:33:53,590 --> 00:33:57,800 What's the downside of that design intuitively? 747 00:33:57,800 --> 00:34:00,540 If it goes down, gets jammed, can't handle all of the traffic. 748 00:34:00,540 --> 00:34:03,730 So that just feels like bad design, whether or not you're an engineer. 749 00:34:03,730 --> 00:34:07,610 So the DNS system has multiple servers. 750 00:34:07,610 --> 00:34:11,790 But it doesn't just have duplicative servers, it has a hierarchical system. 751 00:34:11,790 --> 00:34:15,779 So that there are some servers, typically by convention, at least 13 752 00:34:15,779 --> 00:34:19,929 root servers, whose purpose in life isn't to know all of the answers, 753 00:34:19,929 --> 00:34:22,190 but to know who has the answers. 754 00:34:22,190 --> 00:34:26,250 And the root servers might know who would know the answers for all 755 00:34:26,250 --> 00:34:30,800 of the .com's, for all of the .gov's, for all of the .jp's, and so forth. 756 00:34:30,800 --> 00:34:34,080 Meanwhile, there is-- if you think of a family tree. 757 00:34:34,080 --> 00:34:36,030 If the root servers are up here, you have 758 00:34:36,030 --> 00:34:39,750 a second tier of servers whose purpose in life 759 00:34:39,750 --> 00:34:41,190 is to know those actual answers. 760 00:34:41,190 --> 00:34:43,870 Or if they don't, to know whom to ask in turn. 761 00:34:43,870 --> 00:34:46,020 And that goes all the way down to my laptop. 762 00:34:46,020 --> 00:34:50,626 For efficiency purposes, when my Mac first requests Google.com, 763 00:34:50,626 --> 00:34:52,928 it obviously does not know the IP address.
764 00:34:52,928 --> 00:34:56,219 Because Apple did not ship this in every Mac with the IP address of Google.com, 765 00:34:56,219 --> 00:35:00,790 especially since it might change day to day or week to week or year to year. 766 00:35:00,790 --> 00:35:03,010 But Harvard does have a DNS server. 767 00:35:03,010 --> 00:35:06,670 But Harvard doesn't know the answers to all IP addresses and host names 768 00:35:06,670 --> 00:35:07,260 in the world. 769 00:35:07,260 --> 00:35:11,087 But Harvard has its own internet service provider that it could ask. 770 00:35:11,087 --> 00:35:14,420 And maybe that internet service provider has a bigger internet service provider. 771 00:35:14,420 --> 00:35:16,830 And if that person doesn't know, then at least 772 00:35:16,830 --> 00:35:20,420 the root servers can help us figure out who would actually know. 773 00:35:20,420 --> 00:35:24,560 But along the way, once my Mac gets that answer back the first time, 774 00:35:24,560 --> 00:35:28,010 by convention, Mac OS and Windows are going to remember or cache 775 00:35:28,010 --> 00:35:29,700 the answer locally. 776 00:35:29,700 --> 00:35:30,972 And why would they do that? 777 00:35:30,972 --> 00:35:33,750 778 00:35:33,750 --> 00:35:35,230 So you don't have to ask it again. 779 00:35:35,230 --> 00:35:36,438 Which is good for efficiency. 780 00:35:36,438 --> 00:35:38,270 And frankly, even browsers do this too. 781 00:35:38,270 --> 00:35:40,550 Like Chrome and Internet Explorer might actually 782 00:35:40,550 --> 00:35:43,740 remember that information locally along with other information as well. 783 00:35:43,740 --> 00:35:46,060 So there's lots of layers of caching for efficiency. 784 00:35:46,060 --> 00:35:49,150 Of course, a side effect of this, if you put back on the engineering hat, 785 00:35:49,150 --> 00:35:51,691 what could go wrong if you're caching information, especially 786 00:35:51,691 --> 00:35:53,088 at multiple layers? 
787 00:35:53,088 --> 00:35:54,451 AUDIENCE: [INAUDIBLE] 788 00:35:54,451 --> 00:35:57,200 DAVID J. MALAN: Yeah, it makes it really genuinely hard for Google 789 00:35:57,200 --> 00:35:58,660 to change its IP. 790 00:35:58,660 --> 00:36:01,220 Because if they change it, well, when do they change it? 791 00:36:01,220 --> 00:36:04,070 Well, if they change it right now, my laptop 792 00:36:04,070 --> 00:36:07,640 might remember the old IP address for multiple minutes, hours, days. 793 00:36:07,640 --> 00:36:10,430 It depends on how it was configured or misconfigured. 794 00:36:10,430 --> 00:36:12,020 So what does Google then do? 795 00:36:12,020 --> 00:36:14,142 Well, maybe they could do it late at night. 796 00:36:14,142 --> 00:36:15,850 Unfortunately, Google is a global company 797 00:36:15,850 --> 00:36:17,390 and there is every possible time zone. 798 00:36:17,390 --> 00:36:19,390 So just doing it at night really has no meaning. 799 00:36:19,390 --> 00:36:20,490 So that's not a solution. 800 00:36:20,490 --> 00:36:22,190 They could run two servers in parallel. 801 00:36:22,190 --> 00:36:24,000 Or at least two servers in parallel. 802 00:36:24,000 --> 00:36:25,820 One with the old IP, one with the new IP. 803 00:36:25,820 --> 00:36:26,730 That gets us by. 804 00:36:26,730 --> 00:36:29,820 But then we're getting new data on both servers which probably isn't good 805 00:36:29,820 --> 00:36:32,160 if we're trying to move to some new server with an IP. 806 00:36:32,160 --> 00:36:35,320 So this is actually a massive, massive challenge these days. 807 00:36:35,320 --> 00:36:39,530 And one of the ways that companies like Google avoid this is one, 808 00:36:39,530 --> 00:36:41,710 they only have one IP address. 809 00:36:41,710 --> 00:36:46,360 And they use it as sort of the entry point to the entire infrastructure. 810 00:36:46,360 --> 00:36:49,910 So that hopefully, Google never has to change this IP address, certainly 811 00:36:49,910 --> 00:36:50,960 not frequently. 
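The caching behavior, and the staleness it causes when an address changes, can be modeled with a toy resolver. Everything here is an assumption made for illustration: the class name `CachingResolver`, the 60-second lifetime, the name `example.com`, and the `192.0.2.x` addresses (a range reserved for documentation). Real resolvers are considerably more involved; this only shows why an old answer can linger until its cached copy expires.

```python
import time


class CachingResolver:
    """Toy cache in front of an 'upstream' lookup function (name -> ip)."""

    def __init__(self, upstream, ttl_seconds, clock=time.monotonic):
        self.upstream = upstream
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # name -> (ip, expires_at)

    def resolve(self, name):
        now = self.clock()
        hit = self.cache.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]                      # still fresh: answer from cache
        ip = self.upstream(name)               # expired or unseen: ask upstream
        self.cache[name] = (ip, now + self.ttl)
        return ip


# Simulated upstream records and a fake clock so the demo runs instantly.
records = {"example.com": "192.0.2.1"}
fake_now = [0.0]
resolver = CachingResolver(records.__getitem__, ttl_seconds=60,
                           clock=lambda: fake_now[0])

print(resolver.resolve("example.com"))  # 192.0.2.1, fetched and cached
records["example.com"] = "192.0.2.2"    # the site changes its address
fake_now[0] = 10.0
print(resolver.resolve("example.com"))  # 192.0.2.1, stale answer from cache
fake_now[0] = 61.0
print(resolver.resolve("example.com"))  # 192.0.2.2, cache expired, re-fetched
```

With several such caches stacked (browser, operating system, campus server, provider), each layer can hold the old answer for its own lifetime, which is why an address change takes a while to be seen everywhere.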
812 00:36:50,960 --> 00:36:54,740 But they can change any number of servers that are behind that IP address 813 00:36:54,740 --> 00:36:55,284 so to speak. 814 00:36:55,284 --> 00:36:57,700 So indeed, it is the case-- and we'll talk more about this 815 00:36:57,700 --> 00:37:02,390 after lunch-- that there's a technology called load balancing, whereby 816 00:37:02,390 --> 00:37:06,060 even though my white lie earlier was that every computer on the internet 817 00:37:06,060 --> 00:37:09,240 has an IP address, that doesn't necessarily 818 00:37:09,240 --> 00:37:13,500 mean that's the IP address to whom we speak when we send data. 819 00:37:13,500 --> 00:37:17,850 There might be many other computers behind a device that has that IP 820 00:37:17,850 --> 00:37:19,580 address for purposes of balancing load. 821 00:37:19,580 --> 00:37:22,750 But we'll come back to that again when talking about cloud computing. 822 00:37:22,750 --> 00:37:24,540 But not all companies do it that way. 823 00:37:24,540 --> 00:37:28,970 If we look at Yahoo.com, Yahoo! it would seem has three IP addresses. 824 00:37:28,970 --> 00:37:30,221 This is a more obvious design. 825 00:37:30,221 --> 00:37:31,595 They have at least three servers. 826 00:37:31,595 --> 00:37:33,810 And frankly, they probably have thousands of servers. 827 00:37:33,810 --> 00:37:37,502 But these are the three ones they expose publicly in terms of IPs. 828 00:37:37,502 --> 00:37:39,290 And I just ran the same command again. 829 00:37:39,290 --> 00:37:42,120 What did you notice is different? 830 00:37:42,120 --> 00:37:43,190 The order changed. 831 00:37:43,190 --> 00:37:46,990 And again, the order is, again, different. 832 00:37:46,990 --> 00:37:50,200 So it seems-- I'm just gleaning this from running the command again 833 00:37:50,200 --> 00:37:50,700 and again. 834 00:37:50,700 --> 00:37:53,140 It's the same three IPs but they're changing order. 
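One simple way a server could produce the changing order seen above is to rotate its list of addresses after every answer, so each caller finds a different address first. The sketch below is a guess at the general idea, not a description of how Yahoo's servers actually work; the class name `RotatingAnswers` and the `192.0.2.x` placeholder addresses are assumptions, and the rotation here is strictly in turn rather than random.

```python
from collections import deque


class RotatingAnswers:
    """Hand out the same set of addresses, rotated by one on each request."""

    def __init__(self, addresses):
        self._ring = deque(addresses)

    def answer(self):
        current = list(self._ring)
        self._ring.rotate(-1)   # the next caller sees the next address first
        return current


# Placeholder addresses from a range reserved for documentation.
server = RotatingAnswers(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(4):
    reply = server.answer()
    print(reply, "-> client connects to", reply[0])
```

If each client simply uses the first address it is given, the traffic ends up split roughly evenly across the three machines.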
835 00:37:53,140 --> 00:37:57,400 It would seem that Yahoo uses round robin load balancing-- more on this 836 00:37:57,400 --> 00:38:01,250 in a bit-- whereby they give all users the same three IP addresses, 837 00:38:01,250 --> 00:38:03,750 but they change the order so that my computer by default 838 00:38:03,750 --> 00:38:05,380 uses just the first. 839 00:38:05,380 --> 00:38:08,139 And this way, they put a third of their users here, 840 00:38:08,139 --> 00:38:10,430 a third of their users here, a third of the users here. 841 00:38:10,430 --> 00:38:13,860 Just probabilistically, based on returning with 33% odds 842 00:38:13,860 --> 00:38:16,870 a different one at the top of the list each time. 843 00:38:16,870 --> 00:38:17,370 All right. 844 00:38:17,370 --> 00:38:18,500 So what actually happens? 845 00:38:18,500 --> 00:38:19,590 Well, let's do this. 846 00:38:19,590 --> 00:38:22,350 Suppose now, to make this more clear as to what's going on here, 847 00:38:22,350 --> 00:38:25,141 the internet as you may have heard is filled with pictures of cats. 848 00:38:25,141 --> 00:38:28,480 So suppose that one of you in back, let's say Sean, 849 00:38:28,480 --> 00:38:30,670 has requested a picture of a cat. 850 00:38:30,670 --> 00:38:34,460 And I am the server, imgur.com or Flickr or whatever. 851 00:38:34,460 --> 00:38:36,610 And I am going to send him this picture of a cat. 852 00:38:36,610 --> 00:38:38,010 Unfortunately, it's a pretty big picture. 853 00:38:38,010 --> 00:38:39,010 It's a couple megabytes. 854 00:38:39,010 --> 00:38:41,280 And that's not great for everyone else in the room who 855 00:38:41,280 --> 00:38:44,330 might want to be sending data to each other or to Sean or from Sean 856 00:38:44,330 --> 00:38:45,360 at the same time. 857 00:38:45,360 --> 00:38:48,720 And so it turns out that what IP does-- which doesn't just 858 00:38:48,720 --> 00:38:50,870 describe IP addresses. 859 00:38:50,870 --> 00:38:54,620 IP stands for itself, internet protocol.
860 00:38:54,620 --> 00:38:57,610 And it works in conjunction with another protocol called 861 00:38:57,610 --> 00:39:00,040 TCP, transmission control protocol. 862 00:39:00,040 --> 00:39:04,400 These are essentially conventions that govern how the data gets from point A 863 00:39:04,400 --> 00:39:05,270 to point B. 864 00:39:05,270 --> 00:39:08,390 And a way to think of a protocol is if I can come over here-- 865 00:39:08,390 --> 00:39:09,560 what's a human protocol? 866 00:39:09,560 --> 00:39:11,260 Hello, my name is David. 867 00:39:11,260 --> 00:39:11,840 [? Shavan ?]? 868 00:39:11,840 --> 00:39:13,430 So this is a protocol, right? 869 00:39:13,430 --> 00:39:16,720 Like I say hello, I extend my hand. [? Shavan ?] knew sort of intuitively 870 00:39:16,720 --> 00:39:20,340 to extend his hand, if awkwardly, to shake my hand in the middle of class. 871 00:39:20,340 --> 00:39:22,720 And then our transaction was complete. 872 00:39:22,720 --> 00:39:26,650 Similarly do IP and TCP govern just how computers speak. 873 00:39:26,650 --> 00:39:33,600 They follow an initial kind of hello, a subsequent kind of goodbye. 874 00:39:33,600 --> 00:39:36,440 And it's just sort of preprogrammed conventions that they adhere to. 875 00:39:36,440 --> 00:39:40,550 And one of the features, meanwhile, of these protocols, IP in particular, 876 00:39:40,550 --> 00:39:42,320 is the ability to fragment things. 877 00:39:42,320 --> 00:39:44,720 Because now, even though in the real world 878 00:39:44,720 --> 00:39:47,460 I've kind of ruined this picture, in the digital world 879 00:39:47,460 --> 00:39:49,332 I've just broken it up into four chunks. 880 00:39:49,332 --> 00:39:51,540 A quarter of the bits are here, a quarter of the bits 881 00:39:51,540 --> 00:39:53,400 are here, quarter here, quarter here. 882 00:39:53,400 --> 00:39:59,180 And now, I can put each of these chunks in its own virtual envelope.
883 00:39:59,180 --> 00:40:04,080 Which depending on the context you might call a packet typically, or datagram, 884 00:40:04,080 --> 00:40:07,730 or segment, which have minor differences semantically. 885 00:40:07,730 --> 00:40:10,200 But it just means that some kind of virtual envelope 886 00:40:10,200 --> 00:40:12,060 that zeros and ones go in. 887 00:40:12,060 --> 00:40:17,150 And then I have to make sure, of course, to address these. 888 00:40:17,150 --> 00:40:23,710 So I'm going to go ahead and say that Sean's IP address will be the number 1. 889 00:40:23,710 --> 00:40:24,210 1. 890 00:40:24,210 --> 00:40:27,030 891 00:40:27,030 --> 00:40:29,300 1. 892 00:40:29,300 --> 00:40:33,640 So I'm just writing To: 1, To: 1. 893 00:40:33,640 --> 00:40:36,640 So every envelope now has this on it, if Sean was the first one 894 00:40:36,640 --> 00:40:38,100 to get an IP address in the room. 895 00:40:38,100 --> 00:40:40,000 But I need a little more information in case 896 00:40:40,000 --> 00:40:44,044 he wants to acknowledge receipt of this cat-- which is what information? 897 00:40:44,044 --> 00:40:44,960 AUDIENCE: [INAUDIBLE]. 898 00:40:44,960 --> 00:40:46,567 DAVID J. MALAN: The from address. 899 00:40:46,567 --> 00:40:48,400 So I'll be the second computer in the world. 900 00:40:48,400 --> 00:40:57,830 So I'm going to put From: 2, From: 2, From: 2, and From: 2. 901 00:40:57,830 --> 00:41:02,830 And then lastly, you might have noticed this, I held this up. 902 00:41:02,830 --> 00:41:08,420 What had I also written on the envelope sort of preemptively. 903 00:41:08,420 --> 00:41:10,330 Yeah, the order, the number. 904 00:41:10,330 --> 00:41:12,300 So 1/4, 2/4, 3/4. 905 00:41:12,300 --> 00:41:14,630 Why did I do that? 906 00:41:14,630 --> 00:41:16,970 AUDIENCE: [INAUDIBLE]. 907 00:41:16,970 --> 00:41:19,580 DAVID J. MALAN: So he knows how to put them back together. 908 00:41:19,580 --> 00:41:20,650 One, two, three, four. 
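The envelope demonstration can be sketched in Python as follows. It mirrors the classroom labels (To, From, and "1 of 4") rather than the real IP header layout, so treat it as an analogy under stated assumptions. The names `Envelope`, `fragment`, and `reassemble` are invented for illustration, and the addresses 1 and 2 are the ones used in the demo.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    to: int          # recipient's address (1 for Sean in the demo)
    sender: int      # "from" address (2 for the server in the demo)
    number: int      # which piece this is, counting from 1
    total: int       # how many pieces there are in all
    payload: bytes   # this piece's share of the zeros and ones


def fragment(data, pieces, to, sender):
    """Split data into at most `pieces` addressed, numbered envelopes."""
    size = max(1, -(-len(data) // pieces))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Envelope(to, sender, n, len(chunks), chunk)
            for n, chunk in enumerate(chunks, start=1)]


def reassemble(envelopes):
    """Put the pieces back in numbered order, whatever order they arrived in."""
    ordered = sorted(envelopes, key=lambda e: e.number)
    return b"".join(e.payload for e in ordered)


cat = b"pretend these bytes are a cat picture"
envelopes = fragment(cat, pieces=4, to=1, sender=2)
arrived = list(reversed(envelopes))  # pieces may arrive out of order
print(reassemble(arrived) == cat)
```

The number on each envelope is what lets the recipient restore the order, and the total tells the recipient how many pieces to expect.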
909 00:41:20,650 --> 00:41:23,410 And also, I had a denominator there for a reason. 910 00:41:23,410 --> 00:41:25,490 So that he knows what? 911 00:41:25,490 --> 00:41:26,300 When to stop. 912 00:41:26,300 --> 00:41:29,585 It's not just that it's an infinite string of pieces of cats like a puzzle 913 00:41:29,585 --> 00:41:31,210 where you don't know where the edge is. 914 00:41:31,210 --> 00:41:32,840 Now he knows where the edges are. 915 00:41:32,840 --> 00:41:36,940 And now much like we saw with some of the data, that the data took 916 00:41:36,940 --> 00:41:39,080 variable length paths, which might have meant 917 00:41:39,080 --> 00:41:40,550 taking different routes sometimes. 918 00:41:40,550 --> 00:41:43,530 And much like there's multiple interconnections, all of this data 919 00:41:43,530 --> 00:41:46,654 is going to leave my hands through the same router. 920 00:41:46,654 --> 00:41:48,070 [? Shavan ?] is my default router. 921 00:41:48,070 --> 00:41:50,050 But now if you could presumptuously pass them, 922 00:41:50,050 --> 00:41:53,030 but not necessarily in the same direction, the effect 923 00:41:53,030 --> 00:41:55,540 here is to have a room full of routers, each of whom 924 00:41:55,540 --> 00:41:57,890 roughly knows where Sean is. 925 00:41:57,890 --> 00:42:01,280 But is making perhaps independent decisions. 926 00:42:01,280 --> 00:42:02,872 Routing around blockages. 927 00:42:02,872 --> 00:42:04,580 If some classmate's not paying attention, 928 00:42:04,580 --> 00:42:06,163 that might be the router is congested. 929 00:42:06,163 --> 00:42:08,450 And so you pass it to someone else instead. 930 00:42:08,450 --> 00:42:11,130 Two of the packets have gotten there quite quickly. 931 00:42:11,130 --> 00:42:15,040 One is kind of stuck all the way over here. 932 00:42:15,040 --> 00:42:18,510 And let me if I can steal this one as though something went wrong.
933 00:42:18,510 --> 00:42:24,720 934 00:42:24,720 --> 00:42:28,675 And if, Sean, you'd like to reassemble the cat and draw some conclusion. 935 00:42:28,675 --> 00:42:41,724 936 00:42:41,724 --> 00:42:44,150 Unfortunately, there's a problem. 937 00:42:44,150 --> 00:42:48,092 I deliberately dropped or let a packet be dropped. 938 00:42:48,092 --> 00:42:48,800 And that happens. 939 00:42:48,800 --> 00:42:51,970 I mean, much like the physical this happening, routers will 940 00:42:51,970 --> 00:42:53,962 by design drop packets. 941 00:42:53,962 --> 00:42:55,420 Especially if they were overloaded. 942 00:42:55,420 --> 00:42:58,410 Now I was sort of maliciously-- I took it away from David here. 943 00:42:58,410 --> 00:43:01,117 But just as soon, I still dropped it on the floor. 944 00:43:01,117 --> 00:43:02,450 So what's your conclusion, Sean? 945 00:43:02,450 --> 00:43:03,702 What are you missing? 946 00:43:03,702 --> 00:43:06,699 AUDIENCE: I'm missing the bottom right half. [INAUDIBLE]. 947 00:43:06,699 --> 00:43:07,490 DAVID J. MALAN: OK. 948 00:43:07,490 --> 00:43:08,574 So it's corrupted or lost. 949 00:43:08,574 --> 00:43:11,698 So it was probably, if I numbered them right, four out of four or something 950 00:43:11,698 --> 00:43:12,230 like that. 951 00:43:12,230 --> 00:43:16,960 And so TCP is this protocol that works in conjunction with IP. 952 00:43:16,960 --> 00:43:21,240 Whereas IP is responsible for just fragmenting things ultimately for size 953 00:43:21,240 --> 00:43:27,862 and also addressing things, TCP, among its features, is to guarantee delivery. 954 00:43:27,862 --> 00:43:30,320 And it does it by way of something called sequence numbers, 955 00:43:30,320 --> 00:43:32,700 like my one, two, three, four, and so forth. 956 00:43:32,700 --> 00:43:34,450 It doesn't quite do it in the same way. 
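A rough sketch of how numbered pieces let a receiver notice a loss and ask only for what is missing. It follows the classroom numbering (pieces 1 through 4), not TCP's real sequence numbers and acknowledgments, which work differently. The helper names `missing_numbers` and `resend` and the piece contents are hypothetical.

```python
def missing_numbers(received, total):
    """Return the piece numbers in 1..total that never arrived."""
    return sorted(set(range(1, total + 1)) - set(received))


def resend(original, wanted):
    """Sender side: copy only the requested pieces, not everything."""
    return {n: original[n] for n in wanted}


# The sender keeps the original, so lost pieces can always be copied again.
original = {1: b"top-left", 2: b"top-right", 3: b"bottom-left", 4: b"bottom-right"}

# Pieces 1 to 3 arrive (in some order); piece 4 is dropped along the way.
arrived = {n: original[n] for n in (2, 1, 3)}

wanted = missing_numbers(arrived.keys(), total=4)
arrived.update(resend(original, wanted))
print(wanted, sorted(arrived))
```

Only the missing piece crosses the network a second time, which is the point of numbering the pieces in the first place.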
957 00:43:34,450 --> 00:43:37,270 But it is a numbering scheme from which Sean and people like 958 00:43:37,270 --> 00:43:40,050 him can infer what packet is missing. 959 00:43:40,050 --> 00:43:43,700 He is going to now send a response back that he was missing four. 960 00:43:43,700 --> 00:43:47,490 And TCP is a protocol that allows me to, uh-oh, let me 961 00:43:47,490 --> 00:43:49,600 go ahead and resend him-- not everything, 962 00:43:49,600 --> 00:43:52,310 because that could just lead to the same cyclical problem-- 963 00:43:52,310 --> 00:43:54,110 but let me just send him the missing piece. 964 00:43:54,110 --> 00:43:57,031 And hopefully, with high probability, it will get through this time. 965 00:43:57,031 --> 00:44:00,280 Now, even though I've lost the piece of the cat, we're talking zeros and ones. 966 00:44:00,280 --> 00:44:02,570 So at the end of the day, it's just duplicating data. 967 00:44:02,570 --> 00:44:05,790 So there's an infinite supply of copies I can make. 968 00:44:05,790 --> 00:44:08,700 And so hopefully this will now get to Sean as well. 969 00:44:08,700 --> 00:44:09,820 But this seems a given. 970 00:44:09,820 --> 00:44:13,260 Like why is that a feature to guarantee delivery on the internet? 971 00:44:13,260 --> 00:44:16,850 Well, it turns out there is another protocol that's sometimes used 972 00:44:16,850 --> 00:44:21,360 called UDP, universal datagram protocol. 973 00:44:21,360 --> 00:44:25,690 Did I remember my acronym right? 974 00:44:25,690 --> 00:44:32,280 User datagram protocol, which does not guarantee delivery. 975 00:44:32,280 --> 00:44:36,490 Why in the world might you want to use IP and UDP 976 00:44:36,490 --> 00:44:41,930 when implementing some application for your business or for fun? 977 00:44:41,930 --> 00:44:47,127 As opposed to TCP/IP, which is the more common pairing. 978 00:44:47,127 --> 00:44:48,260 AUDIENCE: [INAUDIBLE]. 979 00:44:48,260 --> 00:44:49,456 DAVID J. MALAN: Speed, why?
980 00:44:49,456 --> 00:44:50,372 AUDIENCE: [INAUDIBLE]. 981 00:44:50,372 --> 00:44:52,576 982 00:44:52,576 --> 00:44:53,450 DAVID J. MALAN: Good. 983 00:44:53,450 --> 00:44:54,800 It's one fewer decision to make. 984 00:44:54,800 --> 00:44:57,690 So if Sean doesn't even have to think about what's missing, great. 985 00:44:57,690 --> 00:44:58,560 Less work. 986 00:44:58,560 --> 00:44:59,930 Less communicating with me. 987 00:44:59,930 --> 00:45:01,380 Less resending by me. 988 00:45:01,380 --> 00:45:02,760 That surely must take less time. 989 00:45:02,760 --> 00:45:04,410 Good. 990 00:45:04,410 --> 00:45:05,010 That-- 991 00:45:05,010 --> 00:45:05,860 AUDIENCE: Anonymity? 992 00:45:05,860 --> 00:45:06,733 DAVID J. MALAN: What's that? 993 00:45:06,733 --> 00:45:07,520 AUDIENCE: Anonymity? 994 00:45:07,520 --> 00:45:08,603 DAVID J. MALAN: Anonymity. 995 00:45:08,603 --> 00:45:10,740 Oh, so that's an interesting one. 996 00:45:10,740 --> 00:45:17,360 A key ingredient or assumption of Sean's ability to re-request packets 997 00:45:17,360 --> 00:45:20,290 is that he has to know from where it came. 998 00:45:20,290 --> 00:45:22,100 Now in this case, it's not quite applicable 999 00:45:22,100 --> 00:45:24,420 because he presumably asked me for the cat. 1000 00:45:24,420 --> 00:45:26,810 So he already showed his hand as to who he is. 1001 00:45:26,810 --> 00:45:28,600 But that would be true in other scenarios 1002 00:45:28,600 --> 00:45:32,270 where you might want to send information, maybe maliciously, 1003 00:45:32,270 --> 00:45:34,860 like spam or whatever without it being traced back to you. 1004 00:45:34,860 --> 00:45:36,320 You don't want to resend. 1005 00:45:36,320 --> 00:45:39,446 If the recipient doesn't get your spam, eh, so be it. 1006 00:45:39,446 --> 00:45:40,321 AUDIENCE: [INAUDIBLE] 1007 00:45:40,321 --> 00:45:42,774 1008 00:45:42,774 --> 00:45:43,940 DAVID J. MALAN: It could be. 1009 00:45:43,940 --> 00:45:46,100 But any of these could be peer to peer.
1010 00:45:46,100 --> 00:45:49,010 Just depends on how you use them. 1011 00:45:49,010 --> 00:45:49,790 Yeah, Sean? 1012 00:45:49,790 --> 00:45:51,770 AUDIENCE: [INAUDIBLE] 1013 00:45:51,770 --> 00:45:55,180 DAVID J. MALAN: The sender doesn't-- and why might I not want a response? 1014 00:45:55,180 --> 00:45:57,390 AUDIENCE: Because it's [INAUDIBLE] so many devices 1015 00:45:57,390 --> 00:46:00,869 out there, and [INAUDIBLE]. 1016 00:46:00,869 --> 00:46:01,910 DAVID J. MALAN: OK, good. 1017 00:46:01,910 --> 00:46:06,520 I mean if Sean had to acknowledge each of the packets, that seems expensive. 1018 00:46:06,520 --> 00:46:08,700 It's doubling the amount of traffic coming to me, 1019 00:46:08,700 --> 00:46:10,140 so that might be undesirable. 1020 00:46:10,140 --> 00:46:12,520 And what kinds of applications would actually 1021 00:46:12,520 --> 00:46:18,570 be a feature not to waste time retransmitting data if it's lost? 1022 00:46:18,570 --> 00:46:22,352 Instead just blowing ahead and forgetting about it. 1023 00:46:22,352 --> 00:46:23,560 AUDIENCE: Video transmission? 1024 00:46:23,560 --> 00:46:25,052 DAVID J. MALAN: Video, why video. 1025 00:46:25,052 --> 00:46:28,929 AUDIENCE: Like if you're Netflix gives out [INAUDIBLE] 1026 00:46:28,929 --> 00:46:30,470 back and watch what's already passed. 1027 00:46:30,470 --> 00:46:31,450 You just want to pick up and go. 1028 00:46:31,450 --> 00:46:32,440 DAVID J. MALAN: OK, good. 1029 00:46:32,440 --> 00:46:33,939 Although I would push back slightly. 1030 00:46:33,939 --> 00:46:36,490 I feel like Netflix users would be annoyed if they're just 1031 00:46:36,490 --> 00:46:39,150 skipping parts of the show or movie. 1032 00:46:39,150 --> 00:46:42,730 But I can-- I would propose tweaking your answer 1033 00:46:42,730 --> 00:46:45,268 to be a specialized use of video. 
1034 00:46:45,268 --> 00:46:49,172 AUDIENCE: Yeah, [INAUDIBLE]-- I was going to say, if it gets pixel-y, 1035 00:46:49,172 --> 00:46:50,640 I don't want to rewatch them. 1036 00:46:50,640 --> 00:46:52,626 DAVID J. MALAN: OK, that's fair. 1037 00:46:52,626 --> 00:46:54,000 AUDIENCE: I just want to move on. 1038 00:46:54,000 --> 00:46:55,010 DAVID J. MALAN: OK, so that's fair. 1039 00:46:55,010 --> 00:46:57,009 But that would be, I think, a different feature. 1040 00:46:57,009 --> 00:46:58,810 More quality of service. 1041 00:46:58,810 --> 00:47:02,150 Netflix and other companies, like YouTube, have decided that users probably 1042 00:47:02,150 --> 00:47:04,560 would rather the screen suddenly get pixelated, 1043 00:47:04,560 --> 00:47:07,821 but the audio still go through and the video still be discernible. 1044 00:47:07,821 --> 00:47:09,570 Even if it's not as good of an experience. 1045 00:47:09,570 --> 00:47:13,004 As opposed to buffering, buffering, which is annoying. 1046 00:47:13,004 --> 00:47:14,920 But when might you-- but it would be annoying, 1047 00:47:14,920 --> 00:47:16,880 I think, if all of a sudden at the end of the reveal. 1048 00:47:16,880 --> 00:47:18,671 Like Sherlock is about to discover the case 1049 00:47:18,671 --> 00:47:21,590 and you just skip it because you lost those bits. 1050 00:47:21,590 --> 00:47:22,689 Minorly annoying. 1051 00:47:22,689 --> 00:47:23,980 But when is that less annoying? 1052 00:47:23,980 --> 00:47:25,320 What types of video? 1053 00:47:25,320 --> 00:47:26,160 AUDIENCE: Live stream [INAUDIBLE]? 1054 00:47:26,160 --> 00:47:26,890 DAVID J. MALAN: Live. 1055 00:47:26,890 --> 00:47:27,430 Yeah. 1056 00:47:27,430 --> 00:47:30,820 So like baseball games, sporting events, concerts.
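The trade-off for live video can be made concrete with a small simulation. It is a toy model, not real UDP or TCP code: the function name `playback`, the frame numbers, and the 200 ms cost of waiting for a resend are all assumptions chosen for illustration.

```python
def playback(frames, lost, resend_delay_ms, retransmit):
    """Return (frames shown, total lag in ms) for a stream with some lost frames.

    retransmit=True models waiting for each lost frame to be sent again.
    retransmit=False models skipping lost frames and staying in real time.
    """
    shown, lag_ms = [], 0
    for n in frames:
        if n in lost:
            if not retransmit:
                continue               # skip the frame and keep going
            lag_ms += resend_delay_ms  # wait for the frame to be resent
        shown.append(n)
    return shown, lag_ms


frames = range(1, 7)
lost = {3, 5}

print(playback(frames, lost, resend_delay_ms=200, retransmit=True))
print(playback(frames, lost, resend_delay_ms=200, retransmit=False))
```

With resending, every frame is eventually shown but the viewer falls behind the live event. Without it, a couple of frames are gone but playback stays current.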
1057 00:47:30,820 --> 00:47:34,340 Anything where the user, just by nature of the experience, 1058 00:47:34,340 --> 00:47:37,390 would probably prefer to be in real time, 1059 00:47:37,390 --> 00:47:41,891 even if it means an inferior experience, rather than having a great experience that's 1060 00:47:41,891 --> 00:47:42,390 buffered. 1061 00:47:42,390 --> 00:47:45,050 Especially if you're-- it's a little awkward if you and your friends are 1062 00:47:45,050 --> 00:47:47,966 sort of rooting for your team to win when your team won 10 minutes ago 1063 00:47:47,966 --> 00:47:49,310 because of buffering. 1064 00:47:49,310 --> 00:47:51,120 It kind of takes you out of reality. 1065 00:47:51,120 --> 00:47:52,960 So there might be a conscious decision. 1066 00:47:52,960 --> 00:47:56,600 And not even just video-- not video for like sports and events. 1067 00:47:56,600 --> 00:47:58,710 But what about video conferencing? 1068 00:47:58,710 --> 00:47:59,210 Right? 1069 00:47:59,210 --> 00:48:01,870 Then it's really problematic if you're talking to someone, 1070 00:48:01,870 --> 00:48:04,430 but their remarks were uttered seconds or minutes ago. 1071 00:48:04,430 --> 00:48:07,940 Really you want to just kind of blow through it and rely on the humans 1072 00:48:07,940 --> 00:48:11,400 to retransmit their own voice again, as opposed to resending 1073 00:48:11,400 --> 00:48:14,810 the bits that might have gotten lost. 1074 00:48:14,810 --> 00:48:15,310 All right. 1075 00:48:15,310 --> 00:48:17,300 So if we have these ingredients, where the heck 1076 00:48:17,300 --> 00:48:18,940 did all this information come from? 1077 00:48:18,940 --> 00:48:22,090 Like all of you this morning, if you'd never used Harvard's network before, 1078 00:48:22,090 --> 00:48:23,120 opened your laptop. 1079 00:48:23,120 --> 00:48:26,490 Went to Harvard University or Harvard guest or whatnot, typed in a password.
1080 00:48:26,490 --> 00:48:31,140 But none of you typed in a DNS address, none of you typed in a router address, 1081 00:48:31,140 --> 00:48:33,110 none of you certainly typed in a subnet mask 1082 00:48:33,110 --> 00:48:35,151 or any of the things we've been assuming we have. 1083 00:48:35,151 --> 00:48:36,430 So where does that come from? 1084 00:48:36,430 --> 00:48:38,650 Well it turns out there is one other protocol that's 1085 00:48:38,650 --> 00:48:42,250 super popular and helpful these days called DHCP. 1086 00:48:42,250 --> 00:48:44,420 And you might have glimpsed this on my screen. 1087 00:48:44,420 --> 00:48:47,570 How is my Mac configured for IPv4? 1088 00:48:47,570 --> 00:48:49,500 Apparently using DHCP. 1089 00:48:49,500 --> 00:48:53,140 And this just means it's a protocol, dynamic host configuration protocol, 1090 00:48:53,140 --> 00:48:56,060 that all Macs and PCs speak these days. 1091 00:48:56,060 --> 00:48:59,275 And its purpose in life is to automatically configure 1092 00:48:59,275 --> 00:49:00,360 your Mac and PC. 1093 00:49:00,360 --> 00:49:02,860 Some of you might have had internet service years ago 1094 00:49:02,860 --> 00:49:04,957 where the technician would come out. 1095 00:49:04,957 --> 00:49:06,540 He or she would have a sheet of paper. 1096 00:49:06,540 --> 00:49:08,640 And he or she or you would have to manually type 1097 00:49:08,640 --> 00:49:11,194 in your IP address, your router address, your subnet mask. 1098 00:49:11,194 --> 00:49:13,360 Even if you don't remember it all these years later. 1099 00:49:13,360 --> 00:49:16,180 That was before there was DHCP. 1100 00:49:16,180 --> 00:49:19,000 Or at least supported by the ISP and your computer. 1101 00:49:19,000 --> 00:49:23,219 But with DHCP, you open up your laptop, you choose the Wi-Fi network. 
1102 00:49:23,219 --> 00:49:25,260 And your computer essentially says the equivalent 1103 00:49:25,260 --> 00:49:28,670 of hello, what IP address should I use, what DNS server 1104 00:49:28,670 --> 00:49:30,340 should I use, what router should I use? 1105 00:49:30,340 --> 00:49:34,110 And Harvard, somewhere on campus, has a DHCP server that 1106 00:49:34,110 --> 00:49:36,102 just responds with that information. 1107 00:49:36,102 --> 00:49:37,310 How does it know what to use? 1108 00:49:37,310 --> 00:49:42,690 Well, it turns out that Macs and PCs have, confusingly, a MAC address. 1109 00:49:42,690 --> 00:49:46,160 Which does not mean Macintosh, it means media access control. 1110 00:49:46,160 --> 00:49:52,390 Which is this hexadecimal address, where hexadecimal is a 16-digit alphabet. 1111 00:49:52,390 --> 00:49:54,110 Binary is two, 0 and 1. 1112 00:49:54,110 --> 00:49:55,790 Decimal is 10, 0-9. 1113 00:49:55,790 --> 00:50:01,826 Hexadecimal is 0-F, where you count 0, 1, 2, 3, 4, 5, 6, 7, 8, 9-- 1114 00:50:01,826 --> 00:50:03,450 can't say 10 because that's two digits. 1115 00:50:03,450 --> 00:50:09,320 So 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f is the convention. 1116 00:50:09,320 --> 00:50:10,390 So this is hexadecimal. 1117 00:50:10,390 --> 00:50:14,310 But that's just a really big number expressed with some letters and digits. 1118 00:50:14,310 --> 00:50:18,170 That is how Harvard knows that you are you. 1119 00:50:18,170 --> 00:50:20,970 So even if you're cleverly, like while here for the weekend 1120 00:50:20,970 --> 00:50:25,022 or here on campus in general, sort of using incognito mode or private mode 1121 00:50:25,022 --> 00:50:26,730 in your browser, thinking, ooh, I'm being 1122 00:50:26,730 --> 00:50:30,910 all incognito, like Harvard knows who you are the entirety of the time. 
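The request-and-response idea behind DHCP, and the role the MAC address plays in it, can be sketched like this. It is a toy stand-in under stated assumptions: real DHCP involves several messages and leases that expire, all of which is omitted here. The class name `TinyDhcpServer`, the MAC address, and the 192.0.2.0/24 addresses (a range reserved for documentation) are hypothetical.

```python
import ipaddress


class TinyDhcpServer:
    """Toy stand-in for a DHCP server: hands out settings keyed by MAC address."""

    def __init__(self, network, router, dns):
        self.network = ipaddress.ip_network(network)
        self.router = router
        self.dns = dns
        self.leases = {}  # MAC address -> IP address already handed out
        reserved = {router, dns}
        self._free = (str(ip) for ip in self.network.hosts()
                      if str(ip) not in reserved)

    def request(self, mac):
        """Answer 'what settings should I use?' for the given MAC address."""
        mac = mac.lower()
        if mac not in self.leases:  # first time this hardware is seen
            self.leases[mac] = next(self._free)
        return {
            "ip": self.leases[mac],
            "subnet_mask": str(self.network.netmask),
            "router": self.router,
            "dns": self.dns,
        }


server = TinyDhcpServer("192.0.2.0/24", router="192.0.2.1", dns="192.0.2.53")
laptop = server.request("AA:BB:CC:00:11:22")
again = server.request("aa:bb:cc:00:11:22")  # same hardware, same answer

# A MAC address is just a big number written in hexadecimal.
mac_as_number = int("AA:BB:CC:00:11:22".replace(":", ""), 16)
print(laptop, mac_as_number)
```

Because the table is keyed by MAC address, the server recognizes the same hardware every time it asks, which is also why the network operator can tie that address to whoever registered it.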
1123 00:50:30,910 --> 00:50:33,840 Because if you registered your computer on Harvard's network, one 1124 00:50:33,840 --> 00:50:36,840 of the pieces of information they glean, besides your name and email 1125 00:50:36,840 --> 00:50:40,670 or password or whatever, is your computer's MAC address. 1126 00:50:40,670 --> 00:50:44,050 Otherwise known as a hardware address or ethernet address. 1127 00:50:44,050 --> 00:50:46,230 That is ultimately tied to you. 1128 00:50:46,230 --> 00:50:50,700 So there really is no anonymity on a place like campus or in a business. 1129 00:50:50,700 --> 00:50:53,130 Because even that lowest level detail is there. 1130 00:50:53,130 --> 00:50:55,380 And even though we haven't discussed the MAC address, 1131 00:50:55,380 --> 00:50:57,840 it turns out we've been over-simplifying. 1132 00:50:57,840 --> 00:51:01,510 Inside of each of these envelopes, much like those old school Russian dolls, 1133 00:51:01,510 --> 00:51:02,840 there isn't just a cat. 1134 00:51:02,840 --> 00:51:04,680 The cat is actually inside this envelope. 1135 00:51:04,680 --> 00:51:06,900 And this envelope is inside this envelope. 1136 00:51:06,900 --> 00:51:10,260 And each of those envelopes has slightly different information. 1137 00:51:10,260 --> 00:51:14,410 The outermost one might have the IP address, to and from. 1138 00:51:14,410 --> 00:51:16,810 But the innermost one might have a MAC address, 1139 00:51:16,810 --> 00:51:21,290 depending on where it is in the story. 1140 00:51:21,290 --> 00:51:23,090 Yeah, Avi. 1141 00:51:23,090 --> 00:51:25,180 AUDIENCE: [INAUDIBLE]? 1142 00:51:25,180 --> 00:51:27,940 DAVID J. MALAN: An IP address is used to get your data 1143 00:51:27,940 --> 00:51:31,900 from one point to another on a WAN, wide area network, 1144 00:51:31,900 --> 00:51:34,110 or for all intents and purposes, the internet. 1145 00:51:34,110 --> 00:51:37,970 A MAC address is used to route your data on a local network. 
1146 00:51:37,970 --> 00:51:42,470 So in this room or in this building, behind a router. 1147 00:51:42,470 --> 00:51:45,130 As soon as you involve a router, you need IP. 1148 00:51:45,130 --> 00:51:48,290 If you don't have a router or need a router, you can rely on a MAC address. 1149 00:51:48,290 --> 00:51:50,560 And so when I alluded to subnet mask earlier, 1150 00:51:50,560 --> 00:51:55,030 it's the subnet mask that decides should I address this packet using 1151 00:51:55,030 --> 00:51:57,310 the recipient's MAC address? 1152 00:51:57,310 --> 00:51:59,160 Or the router's MAC address? 1153 00:51:59,160 --> 00:52:00,310 Inside or outside? 1154 00:52:00,310 --> 00:52:02,320 Internal or external? 1155 00:52:02,320 --> 00:52:02,904 AUDIENCE: DNS? 1156 00:52:02,904 --> 00:52:03,903 DAVID J. MALAN: Felicia. 1157 00:52:03,903 --> 00:52:04,403 I'm sorry? 1158 00:52:04,403 --> 00:52:04,986 AUDIENCE: DNS? 1159 00:52:04,986 --> 00:52:06,750 DAVID J. MALAN: DNS is domain name system. 1160 00:52:06,750 --> 00:52:12,190 That's the device that translates IP addresses to host names and back. 1161 00:52:12,190 --> 00:52:15,760 So that is why I, the human, can type Google.com and hit Enter. 1162 00:52:15,760 --> 00:52:20,139 My computer is going to quickly and transparently ask the local DNS server, 1163 00:52:20,139 --> 00:52:21,680 what's the IP address for Google.com? 1164 00:52:21,680 --> 00:52:22,400 OK. 1165 00:52:22,400 --> 00:52:24,770 Let me write that on the envelope instead of the words. 1166 00:52:24,770 --> 00:52:27,960 Because routers understand IP addresses, not domain names. 1167 00:52:27,960 --> 00:52:28,870 David? 1168 00:52:28,870 --> 00:52:42,362 AUDIENCE: [INAUDIBLE] to your MAC address? 1169 00:52:42,362 --> 00:52:43,820 DAVID J. MALAN: They would not, no. 1170 00:52:43,820 --> 00:52:47,040 So the outside world does not see-- let me 1171 00:52:47,040 --> 00:52:49,840 make sure I'm not misspeaking-- they do not see MAC addresses. 
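The subnet-mask decision mentioned a moment ago, whether to address a packet to the recipient directly or to the router, can be sketched with Python's standard `ipaddress` module. The function name `next_hop_for` and all of the addresses are made up for illustration, and a real operating system consults a routing table rather than a single rule like this.

```python
import ipaddress


def next_hop_for(my_ip, subnet_mask, destination_ip, router_ip):
    """Return the IP whose MAC address the packet should be handed to next."""
    local = ipaddress.ip_network(f"{my_ip}/{subnet_mask}", strict=False)
    if ipaddress.ip_address(destination_ip) in local:
        return destination_ip  # same local network: deliver directly
    return router_ip           # outside: hand it to the router


# Hypothetical laptop at 192.0.2.10 with router 192.0.2.1.
inside = next_hop_for("192.0.2.10", "255.255.255.0", "192.0.2.77", "192.0.2.1")
outside = next_hop_for("192.0.2.10", "255.255.255.0", "198.51.100.5", "192.0.2.1")
print(inside, outside)
```

In both cases the destination IP address written on the packet stays the same. Only the choice of whose MAC address to use for the next step changes.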
1172 00:52:49,840 --> 00:52:51,790 Because the MAC address gets rewritten. 1173 00:52:51,790 --> 00:52:56,160 With every hop, it would be used from router to router, for instance. 1174 00:52:56,160 --> 00:52:59,280 But the recipient would not see it. 1175 00:52:59,280 --> 00:53:03,640 And so I misspoke when I said-- when I was describing the envelope 1176 00:53:03,640 --> 00:53:06,560 inside of an envelope, it isn't necessarily your MAC address 1177 00:53:06,560 --> 00:53:07,870 that's in there the whole time. 1178 00:53:07,870 --> 00:53:10,300 It's actually the MAC address of the next hop. 1179 00:53:10,300 --> 00:53:12,200 The next router, router, router. 1180 00:53:12,200 --> 00:53:13,700 Yes. 1181 00:53:13,700 --> 00:53:32,390 AUDIENCE: [INAUDIBLE] right? [INAUDIBLE] somewhere [INAUDIBLE] [INAUDIBLE] 1182 00:53:32,390 --> 00:53:35,642 probably they are doing it because [INAUDIBLE] anonymous, right? 1183 00:53:35,642 --> 00:53:36,850 You don't have to [INAUDIBLE] 1184 00:53:36,850 --> 00:53:41,340 1185 00:53:41,340 --> 00:53:45,970 DAVID J. MALAN: It's a good q-- the MAC address typically wouldn't-- you 1186 00:53:45,970 --> 00:53:48,400 wouldn't get caught by way of your MAC address. 1187 00:53:48,400 --> 00:53:53,490 Unless the police or the Harvard people, the security people, 1188 00:53:53,490 --> 00:53:57,900 were monitoring the local network looking for your laptop to appear. 1189 00:53:57,900 --> 00:54:05,246 So for instance, let's see, if the FBI or whoever, NSA, were-- and let's 1190 00:54:05,246 --> 00:54:07,120 just suppose they are these days-- monitoring 1191 00:54:07,120 --> 00:54:10,680 all the traffic in this room, they would see inside 1192 00:54:10,680 --> 00:54:13,680 of my virtual envelopes, my MAC address.
1193 00:54:13,680 --> 00:54:17,710 And so if they know from Harvard that David has this MAC address, 1194 00:54:17,710 --> 00:54:21,195 then the whole world-- then rather-- we-- actually it occurs to me, 1195 00:54:21,195 --> 00:54:24,070 this is David's MAC address apparently I'm showing on the whole world 1196 00:54:24,070 --> 00:54:29,410 here-- though it doesn't matter for the same reasons we just discussed-- 1197 00:54:29,410 --> 00:54:31,950 you could identify a user by their MAC address 1198 00:54:31,950 --> 00:54:33,920 if you knew who owned it in advance. 1199 00:54:33,920 --> 00:54:36,460 Harvard is figuring out your MAC address because when 1200 00:54:36,460 --> 00:54:40,940 you register it, when you first logged in, the local network was detecting it. 1201 00:54:40,940 --> 00:54:43,220 And associating that MAC address with your identity. 1202 00:54:43,220 --> 00:54:52,532 AUDIENCE: [INAUDIBLE] So how do they do it? [INAUDIBLE]. 1203 00:54:52,532 --> 00:54:54,907 DAVID J. MALAN: How do they get caught or not get caught? 1204 00:54:54,907 --> 00:54:58,390 AUDIENCE: How do they figure out who is sending [? email? ?] 1205 00:54:58,390 --> 00:55:03,730 DAVID J. MALAN: You know, so you have to make a mistake along the way. 1206 00:55:03,730 --> 00:55:07,170 Or you have to somehow associate yourself with the addresses 1207 00:55:07,170 --> 00:55:08,710 that the watchers are seeing. 1208 00:55:08,710 --> 00:55:12,510 So your computer has at least two unique addresses. 1209 00:55:12,510 --> 00:55:14,530 Your IP address, which is publicly routable, 1210 00:55:14,530 --> 00:55:17,460 and your MAC address, which is locally routable. 1211 00:55:17,460 --> 00:55:20,660 The moment any of you registered your computer on the Wi-Fi network 1212 00:55:20,660 --> 00:55:23,390 earlier today or earlier this year, you forever 1213 00:55:23,390 --> 00:55:27,940 associated your identity with that MAC address or that IP address. 
1214 00:55:27,940 --> 00:55:28,990 Now, Harvard knows that. 1215 00:55:28,990 --> 00:55:31,750 So Harvard could be subpoenaed for that information. 1216 00:55:31,750 --> 00:55:36,260 Even if you're a bad guy who's just bought a disposable iPad, 1217 00:55:36,260 --> 00:55:38,300 the way to avoid this is you somehow have 1218 00:55:38,300 --> 00:55:40,690 to change your MAC address, which is possible, 1219 00:55:40,690 --> 00:55:43,950 or spoof your IP address to match that of someone else who is already 1220 00:55:43,950 --> 00:55:44,750 registered. 1221 00:55:44,750 --> 00:55:47,370 That is absolutely already possible. 1222 00:55:47,370 --> 00:55:50,440 But barring that, even if you do that, all the bad guy has to do 1223 00:55:50,440 --> 00:55:51,510 is mess up just once. 1224 00:55:51,510 --> 00:55:55,140 If he or she visits some website that is not encrypted-- 1225 00:55:55,140 --> 00:55:56,850 that is, the information's not scrambled-- 1226 00:55:56,850 --> 00:55:59,300 and they just log into their email account or some service 1227 00:55:59,300 --> 00:56:03,120 once, if the NSA has been logging all of that traffic, 1228 00:56:03,120 --> 00:56:05,920 they can just go back through weeks' or years' worth of data. 1229 00:56:05,920 --> 00:56:09,650 And all you have to do is to have screwed up once in order for them 1230 00:56:09,650 --> 00:56:13,696 to then rewind history and say oh, if this was David at this point in time, 1231 00:56:13,696 --> 00:56:15,570 it must have been David with high probability 1232 00:56:15,570 --> 00:56:17,062 in all previous points in time. 1233 00:56:17,062 --> 00:56:20,270 And that's the danger of something like what the NSA was doing for some time. 1234 00:56:20,270 --> 00:56:21,790 They're not just looking at moments in time. 1235 00:56:21,790 --> 00:56:24,580 If they're storing information for a ridiculous amount of time, 1236 00:56:24,580 --> 00:56:27,920 all you have to do is reveal yourself once.
1237 00:56:27,920 --> 00:56:30,130 And you can reconstruct that entire history. 1238 00:56:30,130 --> 00:56:33,430 And that's what's especially frightening about the storage of so much data. 1239 00:56:33,430 --> 00:56:36,060 1240 00:56:36,060 --> 00:56:37,850 So in short, it's really hard. 1241 00:56:37,850 --> 00:56:39,420 You have to be so careful. 1242 00:56:39,420 --> 00:56:42,400 And not just-- and never screw up really. 1243 00:56:42,400 --> 00:56:43,959 Or never screw up and be noticed. 1244 00:56:43,959 --> 00:56:46,000 And in fact, there's-- I forget what the case is. 1245 00:56:46,000 --> 00:56:47,580 I just read the other night and might pull it up 1246 00:56:47,580 --> 00:56:49,164 for tomorrow's discussion of security. 1247 00:56:49,164 --> 00:56:51,913 There was a case where the bad guys thought they were being clever 1248 00:56:51,913 --> 00:56:53,230 by not actually sending emails. 1249 00:56:53,230 --> 00:56:55,710 They were logging into some shared email account. 1250 00:56:55,710 --> 00:56:58,962 They would compose a draft email, not send it. 1251 00:56:58,962 --> 00:57:01,420 But then the other bad guy would log into the same account, 1252 00:57:01,420 --> 00:57:03,019 look at the drafts email. 1253 00:57:03,019 --> 00:57:06,060 Because they were thinking presumably it's not going out on the internet. 1254 00:57:06,060 --> 00:57:10,490 Which did narrow the scope of the threat to their malicious behavior. 1255 00:57:10,490 --> 00:57:13,990 But even then, the server, whoever it was, Yahoo or Facebook or whoever, 1256 00:57:13,990 --> 00:57:17,200 was surely logging who was accessing that shared account. 1257 00:57:17,200 --> 00:57:21,890 So with corporate cooperation can you reveal what's going on as well. 1258 00:57:21,890 --> 00:57:29,103 AUDIENCE: So maybe my next question is so where [INAUDIBLE] 1259 00:57:29,103 --> 00:57:39,470 And how do they control [INAUDIBLE] So where-- how do they control that? 
1260 00:57:39,470 --> 00:57:41,845 And my last question is [INAUDIBLE]. 1261 00:57:41,845 --> 00:57:46,595 But it would be [INAUDIBLE] on the internet. 1262 00:57:46,595 --> 00:57:47,150 Right? 1263 00:57:47,150 --> 00:57:51,378 So if you've changed IP addresses to your numbers, how is that being done? 1264 00:57:51,378 --> 00:57:52,634 Is there-- 1265 00:57:52,634 --> 00:57:55,300 DAVID J. MALAN: There is an Internet Assigned Numbers Authority, 1266 00:57:55,300 --> 00:57:58,740 which is a nonprofit entity that's responsible for allocating 1267 00:57:58,740 --> 00:58:00,460 IP addresses throughout the world. 1268 00:58:00,460 --> 00:58:04,560 They typically sell or rent IP addresses to bigger fish like internet service 1269 00:58:04,560 --> 00:58:05,060 providers. 1270 00:58:05,060 --> 00:58:08,170 Who in turn rent them effectively to little people like us. 1271 00:58:08,170 --> 00:58:11,704 Our home or our smaller business or smaller school or the like. 1272 00:58:11,704 --> 00:58:13,870 So there's a hierarchical system for allocating them 1273 00:58:13,870 --> 00:58:17,590 in a way that ensures that the same IP address is not 1274 00:58:17,590 --> 00:58:19,020 rented to multiple people. 1275 00:58:19,020 --> 00:58:21,320 In terms of where you sniff traffic, you can certainly 1276 00:58:21,320 --> 00:58:22,780 sniff it at any of these points. 1277 00:58:22,780 --> 00:58:24,905 And that's what's so frightening about sending data 1278 00:58:24,905 --> 00:58:27,910 on the internet from point A to point B, where B is who knows where. 1279 00:58:27,910 --> 00:58:31,830 Because anyone with physical access to any of these wires or any of these 1280 00:58:31,830 --> 00:58:34,780 physical machines could absolutely be snooping on all of our data. 1281 00:58:34,780 --> 00:58:38,060 Our best defense is disinterest. 1282 00:58:38,060 --> 00:58:41,252 If no one really cares what we're doing, that's sort of our best protection.
1283 00:58:41,252 --> 00:58:44,210 But if you do have a threat or someone is just fishing for information, 1284 00:58:44,210 --> 00:58:47,570 whether it's a government or a company or a hacker or the like, 1285 00:58:47,570 --> 00:58:50,800 they can look at all of the unencrypted traffic inside of this network. 1286 00:58:50,800 --> 00:58:53,240 Which is a lot of unencrypted traffic today. 1287 00:58:53,240 --> 00:58:56,370 In terms of companies or countries, they would typically, 1288 00:58:56,370 --> 00:58:59,980 especially in certain Asian countries or Middle Eastern countries where 1289 00:58:59,980 --> 00:59:03,350 there are very tight restrictions these days on internet connectivity, 1290 00:59:03,350 --> 00:59:07,240 they will generally have relatively few-- or in the extreme case, only one 1291 00:59:07,240 --> 00:59:12,130 router that routes data from inside the country to outside the country. 1292 00:59:12,130 --> 00:59:14,750 And so the great firewall of China or restrictions 1293 00:59:14,750 --> 00:59:16,520 in Pakistan or other countries might just 1294 00:59:16,520 --> 00:59:21,520 have relatively few devices that are imposing those firewall rules, 1295 00:59:21,520 --> 00:59:24,748 preventing Facebook traffic from coming in or going out, for instance. 1296 00:59:24,748 --> 00:59:31,824 AUDIENCE: [INAUDIBLE] So how do they control them? [INAUDIBLE] 1297 00:59:31,824 --> 00:59:32,990 DAVID J. MALAN: It's harder. 1298 00:59:32,990 --> 00:59:36,570 I mean, if you are running-- well, if you've 1299 00:59:36,570 --> 00:59:40,690 built your own Facebook knockoff, you have one or more IP addresses 1300 00:59:40,690 --> 00:59:41,610 associated with it. 
1301 00:59:41,610 --> 00:59:44,880 So even if you're running those within the boundaries of the country, 1302 00:59:44,880 --> 00:59:47,180 if you only have a finite number of IP addresses, 1303 00:59:47,180 --> 00:59:51,100 the country could, if they control DNS, simply 1304 00:59:51,100 --> 00:59:53,950 prevent resolution for like ourFacebook.com 1305 00:59:53,950 --> 00:59:58,730 from actually being converted to IP addresses. 1306 00:59:58,730 --> 01:00:01,940 So there are sort of choke points that you could exercise control over. 1307 01:00:01,940 --> 01:00:05,322 Whether it's at the packet level or at the system level like this. 1308 01:00:05,322 --> 01:00:08,280 In fact, there was a mistake at one point where some country or company 1309 01:00:08,280 --> 01:00:10,196 accidentally brought down much of the internet 1310 01:00:10,196 --> 01:00:12,210 abroad for a brief amount of time. 1311 01:00:12,210 --> 01:00:14,380 Just because of a misconfiguration of DNS. 1312 01:00:14,380 --> 01:00:16,340 And because so many people are reliant on DNS. 1313 01:00:16,340 --> 01:00:18,470 If this starts returning bogus information or just 1314 01:00:18,470 --> 01:00:20,980 incorrect information, it has this cascading effect 1315 01:00:20,980 --> 01:00:23,274 of breaking most anything. 1316 01:00:23,274 --> 01:00:24,690 So there's lots of different ways. 1317 01:00:24,690 --> 01:00:27,770 I mean, mostly what's happening in recent years and months 1318 01:00:27,770 --> 01:00:30,600 with all these revelations is people are just realizing how 1319 01:00:30,600 --> 01:00:33,507 insecure the network has always been. 1320 01:00:33,507 --> 01:00:34,590 These are not new threats. 1321 01:00:34,590 --> 01:00:37,920 These are just threats that are being more publicized. 1322 01:00:37,920 --> 01:00:40,910 But lots more scary stories tomorrow, as well. 1323 01:00:40,910 --> 01:00:41,510 All right. 1324 01:00:41,510 --> 01:00:43,510 So where did this come from? 
1325 01:00:43,510 --> 01:00:45,490 DHCP. 1326 01:00:45,490 --> 01:00:47,390 So dynamic host configuration protocol. 1327 01:00:47,390 --> 01:00:51,550 So that's just something that our Mac or PC is pre-configured to know about. 1328 01:00:51,550 --> 01:00:56,050 So what does that actually let us do ultimately? 1329 01:00:56,050 --> 01:00:58,650 It lets us actually use the internet without having 1330 01:00:58,650 --> 01:01:01,530 to manually configure our machine. 1331 01:01:01,530 --> 01:01:02,200 All right. 1332 01:01:02,200 --> 01:01:06,600 So let's toss a couple more items into the alphabet soup. 1333 01:01:06,600 --> 01:01:08,680 Which is actually germane to exactly that chat. 1334 01:01:08,680 --> 01:01:11,096 And some of you might use this at work, even if you're not 1335 01:01:11,096 --> 01:01:13,450 quite sure what it's doing for you. 1336 01:01:13,450 --> 01:01:16,010 Who here uses a VPN for work? 1337 01:01:16,010 --> 01:01:16,510 OK. 1338 01:01:16,510 --> 01:01:19,400 So about a quarter of the folks here. 1339 01:01:19,400 --> 01:01:20,440 Why do you use it? 1340 01:01:20,440 --> 01:01:24,310 Or what does it do for you? 1341 01:01:24,310 --> 01:01:25,526 Grace? 1342 01:01:25,526 --> 01:01:28,222 AUDIENCE: Just to access any of our secure data 1343 01:01:28,222 --> 01:01:31,576 or certain websites or tools [INAUDIBLE] VPN. 1344 01:01:31,576 --> 01:01:34,700 DAVID J. MALAN: OK, to access certain websites or tools within the company. 1345 01:01:34,700 --> 01:01:35,315 [? Avi ?]? 1346 01:01:35,315 --> 01:01:36,190 AUDIENCE: [INAUDIBLE] 1347 01:01:36,190 --> 01:01:39,782 1348 01:01:39,782 --> 01:01:41,990 DAVID J. MALAN: To remotely access the local network, 1349 01:01:41,990 --> 01:01:43,460 back home or at your company. 1350 01:01:43,460 --> 01:01:46,347 Anyone else have disparate use cases? 1351 01:01:46,347 --> 01:01:47,680 This is generally the principle. 1352 01:01:47,680 --> 01:01:50,130 VPN is virtual private network.
1353 01:01:50,130 --> 01:01:53,090 And it allows you, by running some software, 1354 01:01:53,090 --> 01:01:56,880 usually logging in with a username and password that maybe got preconfigured, 1355 01:01:56,880 --> 01:02:00,340 to create the illusion that your computer is not on Harvard's network, 1356 01:02:00,340 --> 01:02:02,650 for instance, but on your own company's network. 1357 01:02:02,650 --> 01:02:03,150 And 1358 01:02:03,150 --> 01:02:05,860 so you will suddenly have an IP address that 1359 01:02:05,860 --> 01:02:08,650 appears to be not only-- you have two IP addresses. 1360 01:02:08,650 --> 01:02:12,000 One that's at Harvard, and often one that's at your company 1361 01:02:12,000 --> 01:02:14,990 or at your home, wherever the end point is for that VPN. 1362 01:02:14,990 --> 01:02:17,860 The upside of that is that if your company or your home 1363 01:02:17,860 --> 01:02:21,700 or your campus' system administrators have decided this financial software 1364 01:02:21,700 --> 01:02:24,210 or whatever is just too sensitive to be on the internet, 1365 01:02:24,210 --> 01:02:27,160 we want people to be physically or virtually on our network, 1366 01:02:27,160 --> 01:02:30,590 they can restrict access to that piece of software or website 1367 01:02:30,590 --> 01:02:34,940 or whatever to only those people who are on the network physically, 1368 01:02:34,940 --> 01:02:38,380 or virtually, as via VPN in the latter scenario. 1369 01:02:38,380 --> 01:02:42,560 The upside of this is that you have a secure, encrypted connection. 1370 01:02:42,560 --> 01:02:44,920 All of the traffic between Grace's laptop, wherever 1371 01:02:44,920 --> 01:02:47,260 she is in the world, and her company is encrypted. 1372 01:02:47,260 --> 01:02:50,990 So that even if a bad guy sees zeros and ones flying by, 1373 01:02:50,990 --> 01:02:52,210 they're seemingly random. 1374 01:02:52,210 --> 01:02:56,400 And they're not information that they could glean much detail from.
1375 01:02:56,400 --> 01:02:59,750 The VPN has another application. 1376 01:02:59,750 --> 01:03:04,680 Why do some people abroad in the countries like we've been describing 1377 01:03:04,680 --> 01:03:06,330 or the scenarios use VPN? 1378 01:03:06,330 --> 01:03:08,149 What problem might it also solve? 1379 01:03:08,149 --> 01:03:10,482 AUDIENCE: For instance, China where Facebook is blocked, 1380 01:03:10,482 --> 01:03:12,035 people use VPN to access Facebook. 1381 01:03:12,035 --> 01:03:12,910 DAVID J. MALAN: Yeah. 1382 01:03:12,910 --> 01:03:14,780 It's surprising how many Harvard undergrads 1383 01:03:14,780 --> 01:03:17,500 seem to visit China and posts on Facebook while 1384 01:03:17,500 --> 01:03:19,770 in China, which is blocked. 1385 01:03:19,770 --> 01:03:23,540 But the reality is that if you have enough technical savvy and enough 1386 01:03:23,540 --> 01:03:27,560 access technologically, you could, in theory, in China or any other country 1387 01:03:27,560 --> 01:03:31,980 use your laptop to establish a VPN connection, a virtual private network 1388 01:03:31,980 --> 01:03:34,330 connection, to a place like Harvard or your company. 1389 01:03:34,330 --> 01:03:36,310 Or you can even pay third parties these days 1390 01:03:36,310 --> 01:03:38,560 to have a VPN connection in any number of countries 1391 01:03:38,560 --> 01:03:40,440 to really bounce yourself around the world. 1392 01:03:40,440 --> 01:03:44,530 And what happens then is that China or whatever the country is 1393 01:03:44,530 --> 01:03:47,880 would know that you have an internet connection between you 1394 01:03:47,880 --> 01:03:49,030 and the outside world. 1395 01:03:49,030 --> 01:03:52,800 But by nature of it being encrypted, they can't see inside. 1396 01:03:52,800 --> 01:03:57,510 And so they therefore either by conscious choice or disinterest 1397 01:03:57,510 --> 01:04:01,320 or oversight, allow you to maintain that connection. 
1398 01:04:01,320 --> 01:04:03,600 But the result is that all of your Facebook traffic 1399 01:04:03,600 --> 01:04:06,630 doesn't go directly from you in that country to Facebook.com. 1400 01:04:06,630 --> 01:04:11,430 It first goes through America or wherever your VPN server actually 1401 01:04:11,430 --> 01:04:13,019 is or company actually is. 1402 01:04:13,019 --> 01:04:14,060 Then it goes to Facebook. 1403 01:04:14,060 --> 01:04:15,710 Then it goes back to your company. 1404 01:04:15,710 --> 01:04:17,220 Then it goes to you. 1405 01:04:17,220 --> 01:04:20,620 So just intuitively, what's the downside of this approach 1406 01:04:20,620 --> 01:04:22,790 to circumventing those kinds of protections? 1407 01:04:22,790 --> 01:04:23,964 AUDIENCE: [INAUDIBLE]. 1408 01:04:23,964 --> 01:04:26,630 DAVID J. MALAN: It's got to be slower if you're adding distance. 1409 01:04:26,630 --> 01:04:28,314 And frankly, you're adding cryptography. 1410 01:04:28,314 --> 01:04:29,230 More on that tomorrow. 1411 01:04:29,230 --> 01:04:31,180 You're adding the scrambling of information 1412 01:04:31,180 --> 01:04:32,800 which takes a non-zero amount of time. 1413 01:04:32,800 --> 01:04:33,950 Might slow things down. 1414 01:04:33,950 --> 01:04:36,130 But the upside, of course, is that you can access, 1415 01:04:36,130 --> 01:04:38,040 theoretically, protected resources. 1416 01:04:38,040 --> 01:04:41,020 Now, there is no reason the country or the company 1417 01:04:41,020 --> 01:04:44,940 couldn't prevent VPN connections, or prevent VPN connections 1418 01:04:44,940 --> 01:04:47,920 to known places like Harvard.edu. 1419 01:04:47,920 --> 01:04:51,620 But assuming the cryptography, the scrambling information 1420 01:04:51,620 --> 01:04:54,810 is correct and secure, they at least can't see what's 1421 01:04:54,810 --> 01:04:57,170 inside of it if they do allow it. 1422 01:04:57,170 --> 01:04:59,960 So these are the kinds of trade-offs. 
1423 01:04:59,960 --> 01:05:02,660 But VPN services seem to be very much in vogue these days. 1424 01:05:02,660 --> 01:05:08,590 And people do not so much for issues of-- well, still 1425 01:05:08,590 --> 01:05:09,800 for issues of circumvention. 1426 01:05:09,800 --> 01:05:14,540 VPNs are very popular among Netflix subscribers and Hulu subscribers. 1427 01:05:14,540 --> 01:05:16,445 Why? 1428 01:05:16,445 --> 01:05:18,661 AUDIENCE: Access location specific streams? 1429 01:05:18,661 --> 01:05:20,660 DAVID J. MALAN: Yeah, location specific streams. 1430 01:05:20,660 --> 01:05:23,687 So like certain shows only air in the UK initially. 1431 01:05:23,687 --> 01:05:26,770 And so you might want-- like Downton Abbey, if you were really into that, 1432 01:05:26,770 --> 01:05:30,760 and you want to get access before PBS in the US has it, you can VPN into the UK 1433 01:05:30,760 --> 01:05:36,020 and watch it on whatever the network-- the BBC's channels there, perhaps. 1434 01:05:36,020 --> 01:05:39,020 Netflix was in the press recently because they've been clamping down 1435 01:05:39,020 --> 01:05:41,090 on their own customers' use of VPNs. 1436 01:05:41,090 --> 01:05:44,840 Because if you're traveling abroad-- even if you're a legitimate Netflix 1437 01:05:44,840 --> 01:05:47,741 subscriber but you travel abroad, and you go to Netflix.com, 1438 01:05:47,741 --> 01:05:49,990 log in with your American account, you might still not 1439 01:05:49,990 --> 01:05:51,740 be able to access the resources. 1440 01:05:51,740 --> 01:05:54,680 Because of whatever partner or licensing arrangements 1441 01:05:54,680 --> 01:05:57,416 they've made with the big film studios and TV studios, 1442 01:05:57,416 --> 01:05:58,790 they just won't stream it to you. 1443 01:05:58,790 --> 01:06:03,390 Unless you pretend to be in America, as you could with a VPN service. 1444 01:06:03,390 --> 01:06:06,630 But there are certain VPN services that have gotten super popular, it seems.
1445 01:06:06,630 --> 01:06:09,460 So Netflix simply blacklists their IP addresses. 1446 01:06:09,460 --> 01:06:12,040 So that even that doesn't work. 1447 01:06:12,040 --> 01:06:12,775 Yeah? 1448 01:06:12,775 --> 01:06:13,650 AUDIENCE: [INAUDIBLE] 1449 01:06:13,650 --> 01:06:16,585 1450 01:06:16,585 --> 01:06:17,460 DAVID J. MALAN: Sure. 1451 01:06:17,460 --> 01:06:19,480 So Tor is an interesting thing. 1452 01:06:19,480 --> 01:06:22,380 This is the-- The Onion Router, so to speak, 1453 01:06:22,380 --> 01:06:25,190 which is an anonymizing technology. 1454 01:06:25,190 --> 01:06:30,210 This essentially allows you and your laptop or desktop 1455 01:06:30,210 --> 01:06:34,244 to create essentially a multi-hop VPN connection. 1456 01:06:34,244 --> 01:06:37,160 Where you connect to someone else, he or she connects to someone else, 1457 01:06:37,160 --> 01:06:38,510 he or she connects to someone else. 1458 01:06:38,510 --> 01:06:40,468 So you have these several layers of indirection 1459 01:06:40,468 --> 01:06:41,820 that aren't at the router level. 1460 01:06:41,820 --> 01:06:46,020 They're just at individual laptops or desktops. 1461 01:06:46,020 --> 01:06:49,840 But the result is that you have plausible deniability in some way. 1462 01:06:49,840 --> 01:06:54,860 Whereby if you are requesting some website that's way over here, 1463 01:06:54,860 --> 01:06:57,154 but your traffic appears to, not unlike in the movies, 1464 01:06:57,154 --> 01:07:00,320 have come from here to here to here to here to here to here to here to here, 1465 01:07:00,320 --> 01:07:03,270 the resulting website isn't going to know who you are. 1466 01:07:03,270 --> 01:07:05,880 They're going to know you only as whoever the most 1467 01:07:05,880 --> 01:07:08,750 recent hop in that network was. 1468 01:07:08,750 --> 01:07:12,110 So that might seem to put them at a disadvantage and you at an advantage.
1469 01:07:12,110 --> 01:07:14,870 But insofar as you're participating in this anonymization network, 1470 01:07:14,870 --> 01:07:19,160 the same might be true when that person wants to access something anonymously 1471 01:07:19,160 --> 01:07:22,430 that your request might appear to be coming from your laptop. 1472 01:07:22,430 --> 01:07:26,640 The catch, though, is that certainly if all of the middlemen in the story 1473 01:07:26,640 --> 01:07:28,000 reveal their logs. 1474 01:07:28,000 --> 01:07:30,730 Like everyone in the story knows to whom they've been connecting 1475 01:07:30,730 --> 01:07:32,290 and to whom they're connecting. 1476 01:07:32,290 --> 01:07:35,225 So you're really just making it harder for like subpoenas to chip 1477 01:07:35,225 --> 01:07:36,100 away at this problem. 1478 01:07:36,100 --> 01:07:37,970 You'd have to subpoena this machine, this machine, this machine, 1479 01:07:37,970 --> 01:07:38,650 this machine. 1480 01:07:38,650 --> 01:07:40,608 And if they're well configured technologically, 1481 01:07:40,608 --> 01:07:43,560 they won't remember any of this information by design anyway. 1482 01:07:43,560 --> 01:07:44,590 But it's not foolproof. 1483 01:07:44,590 --> 01:07:47,690 In fact, a few years ago there was a bomb scare 1484 01:07:47,690 --> 01:07:51,660 at Harvard during, coincidentally, exam period, which was actually traced back 1485 01:07:51,660 --> 01:07:53,730 to a student, I believe. 1486 01:07:53,730 --> 01:07:56,810 Even though he-- and this is all publicly documented in 1487 01:07:56,810 --> 01:08:00,200 newspaper articles-- was using, I believe, Tor at the time. 1488 01:08:00,200 --> 01:08:02,760 The problem, though, is that-- and I'm only inferring, 1489 01:08:02,760 --> 01:08:06,950 I think, from the details-- if you're the only Tor user, or one of few Tor 1490 01:08:06,950 --> 01:08:09,680 users on campus when a bomb threat is called in, 1491 01:08:09,680 --> 01:08:11,480 you might appear to be anonymous. 
1492 01:08:11,480 --> 01:08:14,901 But that protocol is still identifiable. 1493 01:08:14,901 --> 01:08:16,609 So you might not know, or the authorities 1494 01:08:16,609 --> 01:08:19,520 might not know what those few Tor users were doing. 1495 01:08:19,520 --> 01:08:23,010 But they certainly knew who presumably to question first. 1496 01:08:23,010 --> 01:08:26,334 And so it only helps in certain scenarios. 1497 01:08:26,334 --> 01:08:27,209 AUDIENCE: [INAUDIBLE] 1498 01:08:27,209 --> 01:08:31,880 1499 01:08:31,880 --> 01:08:32,880 DAVID J. MALAN: Correct. 1500 01:08:32,880 --> 01:08:34,500 The more, the better. 1501 01:08:34,500 --> 01:08:38,240 But even then-- read up on it, security of Tor networks. 1502 01:08:38,240 --> 01:08:41,109 The folks at the edges are sometimes exposed nonetheless 1503 01:08:41,109 --> 01:08:43,100 because of issues like this. 1504 01:08:43,100 --> 01:08:48,410 So it's not something to sort of build a business or super secure communication 1505 01:08:48,410 --> 01:08:49,100 on necessarily. 1506 01:08:49,100 --> 01:08:51,441 It helps, but nothing is foolproof. 1507 01:08:51,441 --> 01:08:52,316 AUDIENCE: [INAUDIBLE] 1508 01:08:52,316 --> 01:08:55,990 1509 01:08:55,990 --> 01:08:57,086 DAVID J. MALAN: To what? 1510 01:08:57,086 --> 01:08:57,529 AUDIENCE: [INAUDIBLE] 1511 01:08:57,529 --> 01:08:58,320 DAVID J. MALAN: Oh. 1512 01:08:58,320 --> 01:09:00,399 So that's very different. 1513 01:09:00,399 --> 01:09:02,470 I know less about that one. 1514 01:09:02,470 --> 01:09:05,510 But let's come back to that one tomorrow morning in more detail. 1515 01:09:05,510 --> 01:09:08,550 Lest we go too far into security today. 1516 01:09:08,550 --> 01:09:10,439 Other questions? 1517 01:09:10,439 --> 01:09:10,950 All right. 1518 01:09:10,950 --> 01:09:13,420 So we have this whole alphabet soup, all of which 1519 01:09:13,420 --> 01:09:15,417 together actually allow us to do something. 1520 01:09:15,417 --> 01:09:16,500 So what is that something?
1521 01:09:16,500 --> 01:09:17,670 What is one of the most common things? 1522 01:09:17,670 --> 01:09:20,794 Let's conclude our look at internet technologies with the one that most of us 1523 01:09:20,794 --> 01:09:23,880 are using so much these days, which is web related stuff. 1524 01:09:23,880 --> 01:09:28,569 And in fact the one acronym we haven't mentioned yet that you sort of see 1525 01:09:28,569 --> 01:09:30,630 or use all the time is http. 1526 01:09:30,630 --> 01:09:33,710 Hyper text transfer protocol. 1527 01:09:33,710 --> 01:09:36,250 Which is yet another sort of handshake kind of protocol 1528 01:09:36,250 --> 01:09:38,840 that dictates how computers interact with servers. 1529 01:09:38,840 --> 01:09:40,490 And what is http used for? 1530 01:09:40,490 --> 01:09:42,770 Well, typically in a browser, back in the day, 1531 01:09:42,770 --> 01:09:47,540 you would type http://www.google.com and hit Enter. 1532 01:09:47,540 --> 01:09:49,720 Nowadays most of us probably don't type http. 1533 01:09:49,720 --> 01:09:52,189 We've sort of fallen out of that habit. 1534 01:09:52,189 --> 01:09:55,800 Most of us probably don't even type www these days. 1535 01:09:55,800 --> 01:10:01,140 Why is it that none of us seem to do that anymore? 1536 01:10:01,140 --> 01:10:02,850 DNS, it's true. 1537 01:10:02,850 --> 01:10:09,250 So long as Google.com has an IP address and not just www.google.com, 1538 01:10:09,250 --> 01:10:10,310 it will just work. 1539 01:10:10,310 --> 01:10:13,960 And in fact, web servers can forcibly redirect you. 1540 01:10:13,960 --> 01:10:16,730 If you visit Google.com, they can send a response 1541 01:10:16,730 --> 01:10:18,200 that is not Google's home page. 1542 01:10:18,200 --> 01:10:21,660 It's an initial response that says no, go here instead. 1543 01:10:21,660 --> 01:10:25,520 Your browser will then behind the scenes request www.google.com, 1544 01:10:25,520 --> 01:10:27,350 thereby filling in the blanks for you.
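The "no, go here instead" response just described can be sketched in Python. This is only an illustration under stated assumptions: the sample response text is hypothetical (a real server's status code and headers may differ), and `parse_redirect` is a made-up helper name, not part of any library.

```python
def parse_redirect(raw_response: str):
    """Return (status_code, location) from raw HTTP response text.

    location is None when the response carries no Location header.
    """
    # Headers end at the first blank line; ignore any body after it.
    head = raw_response.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # The status line looks like: HTTP/1.1 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return status_code, location


# Hypothetical response, written out by hand for illustration only.
SAMPLE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.google.com/\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

status, target = parse_redirect(SAMPLE)
print(status, target)
```

A browser that receives a 3xx status with a Location header would typically go on to request that new address on the user's behalf.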
1545 01:10:27,350 --> 01:10:30,270 Why do we humans rarely if ever type http anymore? 1546 01:10:30,270 --> 01:10:35,614 1547 01:10:35,614 --> 01:10:36,406 AUDIENCE: Protocol? 1548 01:10:36,406 --> 01:10:37,613 DAVID J. MALAN: New protocol? 1549 01:10:37,613 --> 01:10:38,500 Not even. 1550 01:10:38,500 --> 01:10:39,600 This is more of a-- 1551 01:10:39,600 --> 01:10:39,930 AUDIENCE: [INAUDIBLE]. 1552 01:10:39,930 --> 01:10:40,430 DAVID J. MALAN: What's that? 1553 01:10:40,430 --> 01:10:41,430 AUDIENCE: [INAUDIBLE]. 1554 01:10:41,430 --> 01:10:42,720 DAVID J. MALAN: The browser does it for you. 1555 01:10:42,720 --> 01:10:42,880 Right? 1556 01:10:42,880 --> 01:10:45,290 If you're using a browser-- even though browsers actually 1557 01:10:45,290 --> 01:10:48,160 can do a few different things, typically, different protocols FTP, 1558 01:10:48,160 --> 01:10:50,670 http, and a few others sometimes-- the reality 1559 01:10:50,670 --> 01:10:52,930 is 99 point whatever percent of the time you're just 1560 01:10:52,930 --> 01:10:54,800 using them for http traffic. 1561 01:10:54,800 --> 01:10:58,590 So you want your browser to speak this protocol, which we'll see in a moment. 1562 01:10:58,590 --> 01:10:59,820 And so it's just inferred. 1563 01:10:59,820 --> 01:11:04,140 And in fact, browsers even assume that if you type in something dot com 1564 01:11:04,140 --> 01:11:08,070 and something dot com doesn't exist, they will often presumptuously pre-pend 1565 01:11:08,070 --> 01:11:14,180 for you www.-- try that address just in case the website's not configured 1566 01:11:14,180 --> 01:11:14,870 for you. 1567 01:11:14,870 --> 01:11:18,060 In fact, I still remember, it was an amazing example for class discussions, 1568 01:11:18,060 --> 01:11:22,340 years ago http://harvard.edu did not work. 1569 01:11:22,340 --> 01:11:25,950 Which was infuriating, because you'd go to it and you'd hit a dead end. 1570 01:11:25,950 --> 01:11:28,370 Which sort of doesn't reflect well, I think. 
1571 01:11:28,370 --> 01:11:32,030 But it was an amazing opportunity to discuss in class why it was broken. 1572 01:11:32,030 --> 01:11:35,290 And then at some point, I must-- someone new 1573 01:11:35,290 --> 01:11:38,370 was hired at Harvard who similarly thought this was ridiculous. 1574 01:11:38,370 --> 01:11:39,880 And within days it was fixed. 1575 01:11:39,880 --> 01:11:41,970 So unfortunately, it's not a good story anymore. 1576 01:11:41,970 --> 01:11:47,130 But it is purely by human convention that most websites still 1577 01:11:47,130 --> 01:11:48,920 start with www. 1578 01:11:48,920 --> 01:11:52,370 And in fact, it's why you see on advertisements something advertised 1579 01:11:52,370 --> 01:11:56,880 as www.something.com, because for a lot of users out there, 1580 01:11:56,880 --> 01:11:59,170 technophobes some of them or less technical people, 1581 01:11:59,170 --> 01:12:01,790 it's a visual identifier that this is a website. 1582 01:12:01,790 --> 01:12:04,310 Now these days, you might argue that do you really need it? 1583 01:12:04,310 --> 01:12:07,320 Like what's another good visual cue that when you see something 1584 01:12:07,320 --> 01:12:10,160 you should type it into a browser. 1585 01:12:10,160 --> 01:12:12,130 .com alone is pretty good. 1586 01:12:12,130 --> 01:12:17,310 But nowadays, this is a new problem that will start to resurge again. 1587 01:12:17,310 --> 01:12:23,500 Most of us have seen .com and .edu and probably .net and probably .org. 1588 01:12:23,500 --> 01:12:27,133 But there is bunches of others like .travel-- 1589 01:12:27,133 --> 01:12:28,735 AUDIENCE: [INAUDIBLE]. 1590 01:12:28,735 --> 01:12:29,776 DAVID J. MALAN: Dot what? 1591 01:12:29,776 --> 01:12:32,210 AUDIENCE: [INAUDIBLE]. 1592 01:12:32,210 --> 01:12:37,980 .gl, .tv, .io. 
1593 01:12:37,980 --> 01:12:42,990 And to make a bad problem worse, if you go to a website now, Registrar-- 1594 01:12:42,990 --> 01:12:49,290 more on this in a moment-- there are an atrocious number of generic 1595 01:12:49,290 --> 01:12:52,670 TLDs as they're called-- and this is a recent change in the past year 1596 01:12:52,670 --> 01:12:58,990 or so where the world just got really messy-- OK. .guru, .clothing, .singles, 1597 01:12:58,990 --> 01:13:04,407 .holdings, .ventures, .equipment, .plumbing, .cam-- I mean, it's just-- 1598 01:13:04,407 --> 01:13:05,740 I'm embarrassed to say them all. 1599 01:13:05,740 --> 01:13:10,540 They're just so innumerable now. 1600 01:13:10,540 --> 01:13:14,010 So gone are the days-- now you just have to look for a period. 1601 01:13:14,010 --> 01:13:17,840 And when you see a period or a dot, like type it into a browser it would seem. 1602 01:13:17,840 --> 01:13:21,190 So there's kind of this interesting issue 1603 01:13:21,190 --> 01:13:26,890 now, whereby much like we benefited with the phone numbers for years. 1604 01:13:26,890 --> 01:13:29,140 Like if you saw 800 dash something, dash something, 1605 01:13:29,140 --> 01:13:30,690 you just knew intuitively it was a phone number. 1606 01:13:30,690 --> 01:13:32,981 No one had to tell you to call this number necessarily. 1607 01:13:32,981 --> 01:13:35,570 And we've had that same tendency, first with http. 1608 01:13:35,570 --> 01:13:36,860 Then with just www. 1609 01:13:36,860 --> 01:13:37,489 Then with .com. 1610 01:13:37,489 --> 01:13:40,530 And now there's perhaps a regression insofar as I know lots of people-- 1611 01:13:40,530 --> 01:13:42,571 and even I wouldn't necessarily know that if they 1612 01:13:42,571 --> 01:13:44,340 see something dot something that it's not 1613 01:13:44,340 --> 01:13:47,040 just some trendy way of marketing your sort of syntax 1614 01:13:47,040 --> 01:13:49,690 as opposed to it being an actual domain name.
1615 01:13:49,690 --> 01:13:52,300 So it'll be interesting to see what happens now. 1616 01:13:52,300 --> 01:13:58,720 But this is to say that websites ultimately have domain names. 1617 01:13:58,720 --> 01:14:01,416 And they're accessed by way of this protocol, http. 1618 01:14:01,416 --> 01:14:02,790 And let's see what it looks like. 1619 01:14:02,790 --> 01:14:05,270 It turns out that if I go to google.com, I of course 1620 01:14:05,270 --> 01:14:07,420 see a page that looks a little something like this. 1621 01:14:07,420 --> 01:14:10,850 Most browsers have a feature somewhere like under Chrome, 1622 01:14:10,850 --> 01:14:14,640 it's under Developer, View Source, where you can actually see the underlying 1623 01:14:14,640 --> 01:14:17,210 code that composes the web page. 1624 01:14:17,210 --> 01:14:18,760 This isn't programming code per se. 1625 01:14:18,760 --> 01:14:22,800 This is mostly a language called html and css, more on that 1626 01:14:22,800 --> 01:14:24,090 later this afternoon. 1627 01:14:24,090 --> 01:14:26,700 But this is the language in which web pages are written. 1628 01:14:26,700 --> 01:14:31,350 It's scary looking here because Google has compacted it as much as possible. 1629 01:14:31,350 --> 01:14:35,290 And it does a little more than a typical website might in terms of features. 1630 01:14:35,290 --> 01:14:37,920 But this is what was inside the virtual envelope. 1631 01:14:37,920 --> 01:14:39,130 So Sean got a cat. 1632 01:14:39,130 --> 01:14:43,160 But when I go to google.com, I get all of this inside of my virtual envelopes. 1633 01:14:43,160 --> 01:14:49,307 And together, remarkably, all of this implements just this simplicity. 1634 01:14:49,307 --> 01:14:52,390 But there's actually some programming code in a language called JavaScript 1635 01:14:52,390 --> 01:14:53,910 there as well. 1636 01:14:53,910 --> 01:14:56,050 But notice, I can get that same response. 1637 01:14:56,050 --> 01:14:58,014 I'm going to pretend to be a browser here. 
1638 01:14:58,014 --> 01:14:59,930 I'm going to run a program called Telnet which 1639 01:14:59,930 --> 01:15:03,850 is an old school program, good for diagnostics these days, to Google.com. 1640 01:15:03,850 --> 01:15:06,030 That's the server to which I want to connect. 1641 01:15:06,030 --> 01:15:08,800 And specifically, I want to connect to port 80. 1642 01:15:08,800 --> 01:15:12,980 It turns out that TCP which we discussed earlier, which guarantees delivery, 1643 01:15:12,980 --> 01:15:19,340 it also, among its features assigns numbers to different services. 1644 01:15:19,340 --> 01:15:23,120 So if I summarize a couple of these, 80 is 1645 01:15:23,120 --> 01:15:26,790 for-- actually, we'll do it right here. 1646 01:15:26,790 --> 01:15:30,945 HTTP has the TCP port number called 80. 1647 01:15:30,945 --> 01:15:33,070 HTTPS, which most folks probably know means secure, 1648 01:15:33,070 --> 01:15:37,210 it's encrypted somehow-- more on that tomorrow-- is 443. 1649 01:15:37,210 --> 01:15:41,300 Email, otherwise known as simple mail transfer protocol, 1650 01:15:41,300 --> 01:15:50,980 is usually on port 25 or 487 or-- I'm getting this wrong-- 5-- 46-- 1651 01:15:50,980 --> 01:15:52,230 I can't remember offhand. 1652 01:15:52,230 --> 01:15:53,630 Usually on 25. 1653 01:15:53,630 --> 01:15:55,110 And two other numbers. 1654 01:15:55,110 --> 01:15:57,871 400-something and 500-something. 1655 01:15:57,871 --> 01:16:00,120 But in short, I'm going to use this Telnet program now 1656 01:16:00,120 --> 01:16:03,730 to connect to Google's server on that port. 1657 01:16:03,730 --> 01:16:06,950 So I'm not sending an email as it might look like if I did this. 1658 01:16:06,950 --> 01:16:09,090 I'm instead pretending to be a browser. 1659 01:16:09,090 --> 01:16:11,670 And notice that I'm connected to google.com. 1660 01:16:11,670 --> 01:16:14,870 I can now type textually, just for demonstration's sake, the following.
1661 01:16:14,870 --> 01:16:21,466 GET / HTTP/1.1 from the host www.google.com. 1662 01:16:21,466 --> 01:16:23,950 What I have just typed is the digital equivalent 1663 01:16:23,950 --> 01:16:26,790 of my having extended my hand to [? Shavan ?] earlier. 1664 01:16:26,790 --> 01:16:28,660 This is what's inside my virtual envelope 1665 01:16:28,660 --> 01:16:30,400 when I request a page from Google. 1666 01:16:30,400 --> 01:16:33,920 When I hit Enter now, I get back all of that messiness. 1667 01:16:33,920 --> 01:16:35,900 Which previously we saw in Chrome, now I just 1668 01:16:35,900 --> 01:16:37,640 see in my little black and white window. 1669 01:16:37,640 --> 01:16:39,590 But that's all HTTP is. 1670 01:16:39,590 --> 01:16:42,180 It has more commands and more instructions 1671 01:16:42,180 --> 01:16:43,390 that the server understands. 1672 01:16:43,390 --> 01:16:47,020 But at its essence, all you are saying is get me 1673 01:16:47,020 --> 01:16:51,937 a specific URL with version like 1.1 of the particular protocol. 1674 01:16:51,937 --> 01:16:53,520 And that's all that's happening there. 1675 01:16:53,520 --> 01:16:59,020 And if you're using HTTPS, all of that goes and it comes back encrypted, 1676 01:16:59,020 --> 01:17:00,410 scrambled. 1677 01:17:00,410 --> 01:17:03,780 So what does that mean at the end of the day? 1678 01:17:03,780 --> 01:17:05,730 At the end of the day, this web page still 1679 01:17:05,730 --> 01:17:07,620 needs to get written in another language. 1680 01:17:07,620 --> 01:17:09,270 And we'll come back to that later. 1681 01:17:09,270 --> 01:17:11,061 The other language is going to be something 1682 01:17:11,061 --> 01:17:13,730 called HTML, HyperText Markup Language. 1683 01:17:13,730 --> 01:17:16,610 With something called CSS, cascading style sheets. 1684 01:17:16,610 --> 01:17:19,030 Also some JavaScript. 1685 01:17:19,030 --> 01:17:20,180 More on that tomorrow.
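[Editor's sketch, not part of the lecture] The Telnet demonstration above can be approximated in a few lines of Python: open a TCP connection to port 80 and send the same text by hand. In an actual request, "from the host www.google.com" is written as a `Host:` header line. The function names `build_get_request` and `fetch` and the extra `Connection: close` header are assumptions added for illustration; they do not come from the lecture.

```python
import socket

HTTP_PORT = 80  # TCP port conventionally used for unencrypted HTTP


def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the raw text a browser-like client sends for a simple GET."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, path, protocol version
        f"Host: {host}",         # which site we want from this server
        "Connection: close",     # assumption: ask the server to hang up when done
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", timeout: float = 5.0) -> bytes:
    """Send the request over a plain TCP socket and return the raw reply."""
    with socket.create_connection((host, HTTP_PORT), timeout=timeout) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("www.google.com")` on a machine with internet access should return the response headers followed by the same compacted HTML seen in View Source. The exact bytes vary, so no expected output is shown.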
1686 01:17:20,180 --> 01:17:22,715 And if I'm ever rattling off acronyms just super fast, 1687 01:17:22,715 --> 01:17:27,560 it's because usually that doesn't matter what they are, but how they work 1688 01:17:27,560 --> 01:17:28,850 or how they're relevant. 1689 01:17:28,850 --> 01:17:31,150 And HTML and CSS are the actual languages 1690 01:17:31,150 --> 01:17:35,080 that implement underneath the hood the web page. 1691 01:17:35,080 --> 01:17:36,650 So let's answer one final question. 1692 01:17:36,650 --> 01:17:39,250 Where did www.google.com come from? 1693 01:17:39,250 --> 01:17:42,890 Or if you're starting a business or if you have some personal portfolio site 1694 01:17:42,890 --> 01:17:45,870 and you want to put yourself on the internet 1695 01:17:45,870 --> 01:17:49,180 with your own domain name, what do you do? 1696 01:17:49,180 --> 01:17:52,380 Does anyone want to offer if you've done this before? 1697 01:17:52,380 --> 01:17:53,300 AUDIENCE: [INAUDIBLE]. 1698 01:17:53,300 --> 01:17:54,508 DAVID J. MALAN: Register, OK. 1699 01:17:54,508 --> 01:17:56,900 So you go to an internet registrar so to speak. 1700 01:17:56,900 --> 01:17:57,620 Registrar. 1701 01:17:57,620 --> 01:18:00,120 And there's dozens, if not hundreds, of them these days. 1702 01:18:00,120 --> 01:18:03,100 And from them you buy a domain name. 1703 01:18:03,100 --> 01:18:04,800 Or you kind of rent the domain name. 1704 01:18:04,800 --> 01:18:07,110 Because by nature of how the internet was set up, 1705 01:18:07,110 --> 01:18:09,800 we pay annually for these things. 1706 01:18:09,800 --> 01:18:17,640 From like $5 to maybe $200 depending on which top level domain or TLD you want. 1707 01:18:17,640 --> 01:18:21,950 By TLD I mean .com or .org or .guru.
1708 01:18:21,950 --> 01:18:23,700 In fact, if we go back to namecheap, which 1709 01:18:23,700 --> 01:18:26,250 is a popular, fairly inexpensive website, 1710 01:18:26,250 --> 01:18:35,970 if I want to get computerscience.guru, search, let's see if this is available. 1711 01:18:35,970 --> 01:18:39,170 It is not available, someone already owns that. 1712 01:18:39,170 --> 01:18:40,600 Let's see what it is. 1713 01:18:40,600 --> 01:18:43,162 1714 01:18:43,162 --> 01:18:44,870 Nope, they're not doing anything with it. 1715 01:18:44,870 --> 01:18:47,260 Oh and notice, maybe now even Chrome's error messages 1716 01:18:47,260 --> 01:18:48,900 make a little more sense. 1717 01:18:48,900 --> 01:18:53,090 Computer science guru server's DNS address could not be found. 1718 01:18:53,090 --> 01:18:56,250 So that means that my computer tried to figure out 1719 01:18:56,250 --> 01:18:58,862 the IP address of computerscience.guru, and it 1720 01:18:58,862 --> 01:19:01,070 seems the person might have paid for this domain name 1721 01:19:01,070 --> 01:19:02,910 but not actually done step two. 1722 01:19:02,910 --> 01:19:06,470 Which is actually an amazing segue to what step two should be. 1723 01:19:06,470 --> 01:19:10,142 Once you own the domain name, what comes next if you've done it? 1724 01:19:10,142 --> 01:19:11,600 AUDIENCE: Link it to an IP address. 1725 01:19:11,600 --> 01:19:13,110 DAVID J. MALAN: You have to link it to an IP address. 1726 01:19:13,110 --> 01:19:15,460 So you need some kind of web host, which would 1727 01:19:15,460 --> 01:19:17,390 be the generic way of describing it. 1728 01:19:17,390 --> 01:19:19,200 And this is either your own server. 1729 01:19:19,200 --> 01:19:22,130 You could in theory run your own server, get an IP address 1730 01:19:22,130 --> 01:19:23,880 or plug it into the internet and so forth. 1731 01:19:23,880 --> 01:19:25,421 Most people don't do that these days. 1732 01:19:25,421 --> 01:19:26,590 They'll use a cloud service. 
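[Editor's sketch, not part of the lecture] The lookup that failed in Chrome above, turning a domain name into an IP address, can be tried directly with Python's standard `socket` module. The helper name `resolve` is an assumption for illustration. An empty result corresponds roughly to the "DNS address could not be found" situation described, where a name may be registered but not linked to any address.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the IP addresses a hostname maps to, or [] if the lookup fails."""
    try:
        # getaddrinfo asks the system resolver (and, through it, DNS).
        results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved, e.g. no address is linked to it.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address as text. A set removes duplicates.
    return sorted({info[4][0] for info in results})
```

For example, `resolve("www.google.com")` would typically return one or more addresses, which differ by location and over time, so none are listed here.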
1733 01:19:26,590 --> 01:19:28,830 More on that after lunch time. 1734 01:19:28,830 --> 01:19:34,040 But you'll sign up for someone like dreamhost.com or heroku.com 1735 01:19:34,040 --> 01:19:37,890 or Amazon Web Services or Microsoft Azure or Google App Engine. 1736 01:19:37,890 --> 01:19:41,705 Or any number of third parties who run servers and software 1737 01:19:41,705 --> 01:19:44,330 and give you storage space where you can put all of your files, 1738 01:19:44,330 --> 01:19:48,100 all of your web content, and they give you an IP address, or really 1739 01:19:48,100 --> 01:19:50,780 a shared IP address, maybe with other sites using it. 1740 01:19:50,780 --> 01:19:53,199 And you actually put your content here. 1741 01:19:53,199 --> 01:19:54,990 And so there's two steps, and both of these 1742 01:19:54,990 --> 01:19:57,540 involve some kind of financial transaction 1743 01:19:57,540 --> 01:20:01,160 if you're indeed using someone else's servers and software 1744 01:20:01,160 --> 01:20:02,064 to run your website. 1745 01:20:02,064 --> 01:20:03,480 And there's more steps in between. 1746 01:20:03,480 --> 01:20:07,280 Generally, it's documented and it completely varies by whom you partner 1747 01:20:07,280 --> 01:20:08,160 with. 1748 01:20:08,160 --> 01:20:12,170 But we'll see later today what it looks like to actually write code that you 1749 01:20:12,170 --> 01:20:15,220 might put on to step two servers. 1750 01:20:15,220 --> 01:20:17,048