[MUSIC PLAYING]

DAVID MALAN: All right, this is CS50's Introduction to Cybersecurity. My name is David Malan. And this week, let's focus on securing systems, particularly those that are somehow networked or even inter-networked as well. Now, recall from last time that we presented encryption as really the solution to a lot of our problems. And that's going to be a building block that we continue to use to solve a lot of our concerns around the security of not only our accounts and our data, but now also our systems.

For instance, let's consider something that you, yourself, are perhaps using right now, which is Wi-Fi: being somehow connected via wireless technology to the internet and beyond. So probably by now you've realized that when you're on Wi-Fi, you have to first choose your network. And you might choose your network from a dropdown menu on your computer or the like, or it might auto-select it. But there are at least two types of networks: one, those that are unsecured, and then two, those that are secured in some way.
And odds are you recognize, and you've already been taught through practice to recognize, the little padlock icon on your phone, or laptop, or desktop as signifying that your Wi-Fi connection is, indeed, encrypted. Now what does that actually mean? Well, in particular, in this context, it would mean that any of the traffic, any of the internet packets of information, so to speak, like envelopes of information going from your device off onto the internet, are somehow encrypted. At least, they're encrypted until they reach whatever device your device is talking to wirelessly. So you have an encrypted connection between your device and that wireless device, often called an access point.

Now what is the actual technology that's being used to secure Wi-Fi networks nowadays? Well, hopefully you're using one of the latest versions of this: Wi-Fi Protected Access, or WPA. And this has evolved over the years, and there have been a few different versions. And so in general, whenever you configure a phone, or whenever you configure a laptop or desktop, ideally you're connecting to a device nowadays that supports this technology and the latest version thereof.
And in a nutshell, what that ensures is that, indeed, your traffic from your phone, laptop, or desktop is somehow scrambled between you and that other device. And that device, in turn, is probably connected to devices called routers, computers that route information left, right, up, and down on the internet, which might then connect to other routers, until it finally reaches its destination. So for our purposes, though, Wi-Fi Protected Access and using a secure Wi-Fi network only technically encrypts your traffic between you and whatever device is maybe on the wall, on the ceiling, or nearby, the little thing with antennas, perhaps, that you, yourself, are actually talking to.

So why do we care about this? Why do you want even your Wi-Fi connection, for instance, to be encrypted? Well, it turns out a lot of what you and I do on the internet isn't necessarily encrypted already for us. Now fortunately, this is decreasingly the case. The world has gotten better about using more and more encryption in the various products, software, and applications that you and I use, but not always.
Because odds are, even some of you are probably in the habit of typing http://www.example.com, or whatever domain you're actually trying to visit. Now maybe you don't even type that, but you're certainly familiar with this prefix, this acronym, HTTP, which stands for Hypertext Transfer Protocol. That's a fancy way of saying that this is a protocol, a language that computers use when talking on the World Wide Web, or web for short.

So what do we mean by that? Well, if you're sitting down at a browser on your phone, or laptop, or desktop, and you're visiting a URL that starts with HTTP, your device is about to communicate with some remote server using this protocol, this language, really a set of conventions for talking with each other. But the catch is, per last week, when we focused on the security of our data, the information, or the packets of information, that we're sending from browser to server and back are potentially vulnerable to eavesdropping if you're only using HTTP. Well, why is that? Well, HTTP by definition is not encrypted. It's just text messages, often English-like in nature, but they're not at all scrambled.
Which means if Alice is sitting at her desktop, or laptop, or phone, and trying to visit some website, here represented as our friend Bob, there could be some third party, Eve, who's eavesdropping in between: a machine in the middle, so to speak, that could be looking at every request Alice is making and every response that Bob is sending, if, again, Alice is the user in the story and Bob is the web server. So if you're only using HTTP and certain other protocols or technologies, you're vulnerable to these machine-in-the-middle attacks. And the attack in this case might just be someone nosily trying to know what it is you are doing on the internet. But worse, they can even manipulate what you're sending or receiving if these systems are not using some form of encryption.

So let's take a specific example. When you visit a website on the internet, you are effectively downloading a language called HTML, Hypertext Markup Language. If you take a course like CS50, itself, or an introduction to web development, you'll actually learn a language that looks a little something like this. And I've shown only the highlights, using ellipses here (dot, dot, dot) to wave my hands at details that won't matter.
But this is the kind of text, or language, that we get back from a server when you visit something like http://www.example.com. But notice that this doesn't seem to be scrambled, even though it might look cryptic to you if you've never written or seen HTML before. It certainly doesn't look like random zeros and ones. It looks pretty intelligible. And, in fact, it looks somewhat English-like, with words like body here.

Well, the catch is, if this is the response coming back from a web server to a web browser (for instance, Alice's own), what could happen is that some eavesdropper, some machine in the middle, could actually inject additional HTML code into the web pages that you, and I, and Alice are downloading. Now what is this representative of? Well, this is another feature you might learn in a course on website development, but let me highlight just one key phrase here because this is a common use case. It is possible, therefore, for a machine in the middle to inject something like advertisements, or worse, something actually malicious that tries to steal your data in some way.
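To make the injection threat concrete, here is a minimal Python sketch, assuming the eavesdropper can read and rewrite the plaintext HTML of an HTTP response before passing it along. The function name `inject_ad` and the ad markup are hypothetical; a real ad-injecting network would be more elaborate, but the point is that nothing stops this edit when the page isn't encrypted.

```python
def inject_ad(html: str, ad: str) -> str:
    """Insert `ad` right after the opening <body> tag, as a machine in the
    middle could do to an unencrypted page before forwarding it."""
    marker = "<body>"
    index = html.find(marker)
    if index == -1:
        # No <body> tag found: just prepend the ad to the whole document.
        return ad + html
    end = index + len(marker)
    return html[:end] + ad + html[end:]


original = "<!DOCTYPE html><html><head><title>example</title></head><body>Hello</body></html>"
tampered = inject_ad(original, '<div class="ad">Buy now!</div>')
print(tampered)
```

The browser has no way to tell that the `<div>` wasn't part of what the server sent, which is exactly why the question of detection is hard without encryption.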
But a common scenario for this machine-in-the-middle attack is when your internet service provider, or the coffee shop whose Wi-Fi you're using, or the hotel whose Wi-Fi you're using wants to inject advertisements into maybe each and every web page you are visiting, even if those web pages weren't designed to have advertisements. These things might be prepended to the very top of the page, for instance. Now the reason that machines in the middle are able to do this, though, is simply that if you're not using encryption, and Alice and Bob are communicating insecurely in that sense, well, there's no telling what they could add to the responses that are coming back from this web server. Let me pause here before we move on to yet other threats to see if there are now any questions on this particular attack.

AUDIENCE: How do you detect a machine in the middle and get rid of it?

DAVID MALAN: A really good question. So how can you detect a machine in the middle and get rid of it? That certainly should be a worthy goal. Short answer is you cannot necessarily detect it. It is possible that a machine in the middle can be doing all of this without your knowledge. How do you get rid of it?
Well, that too is going to be the focus of today. Namely, encryption is going to help us push back on exactly this threat. But first, let's consider some additional threats or concerns that we might have. And one of them technically is called packet sniffing. So, again, a packet is a virtual envelope of sorts that you might use when sending data on the internet. And so, in fact, here is a pretty standard envelope in which, in the human world, I might put a letter, and then write something on the outside, and send it off to someone through the mail system. Well, you can think of packets in the context of computer systems as being analogous to this, whereby this is an envelope whose purpose in life is to get data from one point, A, to another point, B: so from Alice to Bob. And inside of this packet is the actual message that Alice is sending to Bob, and that Bob is hopefully sending back to Alice. Now, it might not all fit in one packet, so indeed, the internet tends to use multiple packets like these. But that's an appropriate metaphor for what it is we're otherwise doing here digitally.
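The envelope metaphor maps onto a simple data structure. Below is a toy Python model, not a real packet format: the `Packet` fields, the 8-byte chunk size, and the two addresses (taken from ranges reserved for documentation) are illustrative assumptions. It shows one message split across several packets, with addresses on the "outside" and a chunk of the message on the "inside".

```python
from dataclasses import dataclass


@dataclass
class Packet:
    source: str       # written on the outside: the sender's address
    destination: str  # written on the outside: the recipient's address
    sequence: int     # which piece of the message this is
    payload: bytes    # the inside: one chunk of the actual message


def packetize(message: bytes, source: str, destination: str, size: int) -> list[Packet]:
    """Split a message into envelopes holding at most `size` bytes each."""
    return [
        Packet(source, destination, seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Put the message back together, even if packets arrived out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.sequence))


message = b"Hi Bob, it's Alice!"
packets = packetize(message, "192.0.2.1", "198.51.100.7", size=8)
# Nothing here is encrypted: any machine that handles a packet can read its payload.
print(packets[0].payload)
```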
So packet sniffing, then, is kind of like [SNIFFS] trying to get a sense of what's going on inside of these envelopes. And, indeed, if the contents of these envelopes are not encrypted, scrambled securely in some way, well then any machine in the middle can technically take a quick glance inside of these envelopes, so to speak, see what's inside of them, even change what's inside of them, as we've seen, and then pass them along. So what does this make possible, therefore, if you are vulnerable to packet sniffing by, again, not having your systems use encryption when they are talking to one another?

Well, here, for instance, is one example of what could be inside, metaphorically, an envelope like that. This is the kind of message, written mostly in English, that represents a browser requesting a web page from a server. In particular, when you visit a search engine, for instance, and type into the search box what it is you're searching for (maybe you're searching for cats), what happens is your phone, or your laptop, or your desktop creates a virtual envelope like that, opens it up, and puts inside of it a message that looks like this.
It then closes the envelope and sends it off onto the internet to a web server that, conversely, is going to open the envelope, read this message, and hopefully send you back a whole bunch of search results about cats. Now most of this is arcane detail that we don't particularly care about, including where it is we're visiting. For this story, I'm using example.com. It's just something generic, but you could imagine it being Google, or Bing, or the like. Here you have a command, GET, which literally means get me a web page. /search implies that I'm searching for something. But here's the interesting part. What I'm searching for is a query for cats, where a query is a question or a search request of yours, and cats, of course, is what I'm searching for. So this is to say that inside of this envelope, whether you're searching for cats, or dogs, or anything else, there is some mention of that search query inside of the message that's being sent from Alice to Bob, or from you to Google or Bing. Lastly, there's this mention here, which is referring to exactly this same protocol, HTTP, in this case, perhaps, version 3.
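Here is a small Python sketch of the kind of request described above, assuming the `/search?q=...` URL shape that search engines commonly use (the parameter name `q` is an assumption). It uses HTTP/1.1 because that version's messages are literally lines of text, which makes it easy to see that the query travels unscrambled.

```python
from urllib.parse import urlencode


def build_search_request(host: str, query: str) -> bytes:
    """Return the raw bytes a browser might send over plain HTTP for a search."""
    path = "/search?" + urlencode({"q": query})  # e.g. /search?q=cats
    lines = [
        f"GET {path} HTTP/1.1",  # method, path (including the query), version
        f"Host: {host}",         # which website we want
        "",                      # these two empty strings produce the
        "",                      # blank line (CRLF CRLF) that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_search_request("www.example.com", "cats")
print(request.decode("ascii"))
# Anyone sniffing these bytes can find the search term directly.
print(b"cats" in request)
```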
But ultimately, it's the yellow highlighted text here, that mention of cats, that's worrisome if some machine in the middle, some adversary, can sniff this packet and see what's going on inside. Now there are other threats similar in spirit. When you request pages on the internet, you sometimes don't just search for information like cats, which I don't really care if people know about, since everyone else on the internet is searching for cats. But what if I'm trying to check out on some website, like Amazon, and buy something with a credit card? Well, then what is inside of this envelope is some text that looks fairly similar. We still have a host in this story of example.com. I'm still using HTTP. But I'm not getting information, per se. I'm posting information, akin to uploading my credit card to the server. And in particular, down here is a representation of how my credit card might be stored inside of this virtual envelope. And if I highlight that, indeed, you'll see a credit card number that, hopefully, doesn't actually work, but it is the right length.
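A similar sketch for the checkout case, under the same assumptions: the `/checkout` path and the form field names are hypothetical, and the card number is the widely used test number 4111111111111111, not a real account. The `sniff_form` helper plays the eavesdropper, recovering the card details from the unencrypted bytes with nothing but the standard library.

```python
from urllib.parse import parse_qs, urlencode


def build_checkout_request(host: str, form: dict[str, str]) -> bytes:
    """Return the raw bytes of a plain-HTTP POST carrying a submitted form."""
    body = urlencode(form)  # e.g. name=David&number=...&cvv=...
    head = (
        "POST /checkout HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # body is ASCII, so characters == bytes
        "\r\n"
    )
    return (head + body).encode("ascii")


def sniff_form(raw: bytes) -> dict[str, list[str]]:
    """What an eavesdropper can do: split off the body and decode the form."""
    _headers, _blank, body = raw.partition(b"\r\n\r\n")
    return parse_qs(body.decode("ascii"))


raw = build_checkout_request(
    "www.example.com",
    {"name": "David", "number": "4111111111111111", "cvv": "123"},
)
stolen = sniff_form(raw)
print(stolen["number"])
```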
If anyone sniffs this packet, they might actually be able to find my credit card number, and maybe my name, and my address, and the little security code that you need, and more with respect to whatever it is I'm checking out. So it's that easy if the data itself is not encrypted. Let me pause here, too, and see if there are any questions.

AUDIENCE: If someone is performing an attack from a machine in the middle, does the person actually have to be connected to the same Wi-Fi network as you? Or could they be on a whole other network?

DAVID MALAN: A really good question. So in general, they would be connected to the same Wi-Fi network, but even that is not necessary. So long as they are within a reasonable proximity to you, and their laptop or their device has an antenna that can receive all of the wireless packets that are around you, they don't necessarily have to have access to that same network, especially if it's unencrypted. And, in fact, there exists software that can listen to all possible networks that are around you. And so that, too, is a potential threat. Other questions on this here?
AUDIENCE: Do people have to know your IP address in order to be able to see what you're doing or read what websites you're going to?

DAVID MALAN: A really good question. For those unfamiliar, an IP address is a unique identifier that every computer on the internet has, much like you have a postal address to which humans can send mail. Short answer, [? Mahal, ?] is no. Someone does not need to know your IP address in advance, at least for these wireless attacks, because they can simply, as per my other response, listen to all of the wireless traffic nearby. And they can actually see the IP addresses of senders and receivers flying by, so to speak, through the air.

So what's another threat we should be mindful of? Well, it turns out that almost any time you visit a website nowadays, one or more cookies are installed on your computer. Now what do I mean by that? When you visit a website for the first time, particularly one that you need to log into, and that therefore needs to remember you as you click, click, click on different pages, for instance, to access different emails or add different things to your shopping cart, what the server actually does is a little something like this.
The server responds to your request, for instance after you log in, with an HTTP response that first says 200, which is code for OK. It's a so-called status code, similar in spirit to the 404 you might have seen in the real world. But 200 means OK. And then it additionally sends this line of text inside of a virtual envelope that gets sent back to you: Set-Cookie, followed by a colon, and then this key-value pair, so to speak. A word like session, then an equals sign, then a value. Now in practice, the value is actually pretty big and random, with numbers and letters, but I picked something easier to pronounce for today's purposes: 1234abcd. And what this is similar to, actually, is as though when you visit this website, your hand is being stamped. Right after you've logged in, the server is now sending you, the browser, this message to say the equivalent of "your hand has now been stamped." And so the next time you click on a link on that same website, it's as though you present this hand stamp again, and again, and again, instead of having to input your username and password again, and again, and again.
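Python's standard `http.cookies` module can produce and parse these headers, which makes the hand-stamp round trip easy to see. A minimal sketch, using the lecture's deliberately simple value `1234abcd` rather than a realistic long random one:

```python
from http.cookies import SimpleCookie

# Server side: after a successful login, "stamp the hand".
outgoing = SimpleCookie()
outgoing["session"] = "1234abcd"
set_cookie_line = outgoing.output()  # header line sent along with the 200 OK response
print(set_cookie_line)

# Browser side: remember the cookie from the header's value...
incoming = SimpleCookie()
incoming.load(set_cookie_line.split(":", 1)[1].strip())

# ...and present it again on every later request to the same site.
cookie_line = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in incoming.items()
)
print(cookie_line)
```

Note the asymmetry: the server says `Set-Cookie` once, and the browser says `Cookie` on every request after that.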
This is a more seamless way, supported by HTTP, to just remind the server: I'm still David, I'm still David, I'm still David, by having virtually stamped my hand. And that's implemented by way of this cookie, so to speak. So the cookie is exactly what's in yellow here. And a session is just a concept that refers to the ability of a server to remember who you are. It's like your shopping session, in the context of a website like amazon.com, where you might have a shopping cart that you want to keep adding things to. The website wants to remember what's in your cart. Therefore, the website needs to remember who you are. Therefore, the website is going to check your hand stamp, much like an amusement park, a bar, or a club might once you've already shown them your ticket or your ID.

Now, subsequently, when your browser visits the same website again and again, it, of course, doesn't have a hand to show like this. So rather, it sends its own message via HTTP, inside of its own virtual envelopes, back to the server every time you click another link, add something to your cart, or open a new email on the site into which you've logged in. So here I'm just getting the home page of this server.
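On the server, the cookie's value typically acts as a key into some stored state. Here is a hypothetical in-memory session store in Python (`sessions`, `log_in`, and `add_to_cart` are illustrative names, not any framework's API). It generates a long random session ID with the standard `secrets` module, closer to what real sites use than `1234abcd`.

```python
import secrets

# Hypothetical in-memory session store: session ID -> that user's state.
sessions: dict[str, dict] = {}


def log_in(username: str) -> str:
    """After checking a password (omitted here), create a session and return its ID.
    The server would send this ID to the browser in a Set-Cookie header."""
    session_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    sessions[session_id] = {"user": username, "cart": []}
    return session_id


def add_to_cart(session_id: str, item: str) -> list[str]:
    """Handle a later request: the cookie alone tells the server whose cart to use."""
    cart = sessions[session_id]["cart"]
    cart.append(item)
    return cart


sid = log_in("david")
add_to_cart(sid, "book")
print(add_to_cart(sid, "lamp"))
```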
And then I'm sending not Set-Cookie, but Cookie. This is the textual equivalent in HTTP of presenting my hand with that stamp. I'm sending the exact same value as before: session equals 1234abcd. That's how the server knows at this moment in time that this is me, David, and not you, for instance, even if you have logged in separately on your computer, because you on your computer would have a different value for this cookie. Now this cookie might be stored temporarily in memory, or it might actually be installed longer term on your computer, for a day, for a week, or for a year, depending on how the server's been configured.

But the catch with these cookies is that even though they do solve a very useful problem of retaining state, that is, remembering who you are, they potentially make you vulnerable to what's called session hijacking, at least if you're not using encryption, that is, HTTPS. Because if you're using HTTP, all of the contents of those envelopes, including the Set-Cookie line and the Cookie line, are just being sent back and forth in the clear, without any encryption at all. Now what's the implication of that?
Well, if an adversary is somehow listening in on your internet traffic, wirelessly or via wires, and they see your unencrypted HTTP traffic going back and forth, and they see in this traffic, inside this virtual envelope, "oh, David's session cookie happens to be 1234abcd," there's nothing technically stopping that adversary from now sending its own request, like this, to that same server, copying your cookie, and essentially pretending that the cookie is the adversary's and not mine. The implication, therefore, logically, is that when I visit that website, I might still be logged in. But when the adversary visits that website using this technique, they might be logged in too, but as me.

So here, too, the solution really is just to ensure you're using encryption, namely HTTPS in this context, so that not only are all of the contents of these messages that you care about encrypted as they go back and forth, but so are these lower-level details that you might not have even known about, namely these cookies. Now what is HTTPS doing for us? Well, it's ensuring that the connection between Alice and Bob is completely encrypted, that is, scrambled.
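The hijack needs no cryptography at all, only copying. In this toy Python model, the server recognizes users purely by the session cookie, as described above; the `whoami` function and the request dictionaries are hypothetical stand-ins for a real server and real HTTP requests.

```python
# Hypothetical server state: which session ID belongs to which user.
sessions = {"1234abcd": "David"}


def whoami(headers: dict[str, str]) -> str:
    """Identify the requester solely from the Cookie header."""
    for pair in headers.get("Cookie", "").split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "session" and value in sessions:
            return f"Logged in as {sessions[value]}"
    return "Not logged in"


david_request = {"Host": "www.example.com", "Cookie": "session=1234abcd"}

# Over plain HTTP, Eve can read David's Cookie header and simply reuse it.
sniffed = david_request["Cookie"]
eve_request = {"Host": "www.example.com", "Cookie": sniffed}

print(whoami(david_request))
print(whoami(eve_request))
print(whoami({"Host": "www.example.com"}))
```

HTTPS doesn't change this server logic at all; it keeps the Cookie header from being readable in transit, so there is nothing for Eve to copy.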
377 00:18:40,300 --> 00:18:43,810 So that even if there are other machines in between Alice and Bob, 378 00:18:43,810 --> 00:18:45,970 as there would be on the internet, none of them 379 00:18:45,970 --> 00:18:49,510 should be able to see what is inside of those virtual envelopes 380 00:18:49,510 --> 00:18:50,485 going back and forth. 381 00:18:50,485 --> 00:18:52,360 In fact, that's a good way to think about it. 382 00:18:52,360 --> 00:18:55,780 Recall that as Alice sends one of these virtual envelopes to Bob, 383 00:18:55,780 --> 00:18:58,330 and Bob might send a virtual envelope back to Alice, 384 00:18:58,330 --> 00:19:01,840 there may very well be identifiable information on the outside, 385 00:19:01,840 --> 00:19:04,990 like the IP address of Bob or the IP address of Alice. 386 00:19:04,990 --> 00:19:09,880 But HTTPS ensures that what's inside of the envelope is, indeed, encrypted. 387 00:19:09,880 --> 00:19:11,890 So that even if some other machine in the middle 388 00:19:11,890 --> 00:19:13,930 intercepts one of these virtual envelopes, 389 00:19:13,930 --> 00:19:17,430 they can't understand what's inside of it. 390 00:19:17,430 --> 00:19:19,020 Now how is that done? 391 00:19:19,020 --> 00:19:21,840 Well, it turns out that there's another protocol 392 00:19:21,840 --> 00:19:27,180 in the world that handles precisely that process of encrypting HTTP traffic, 393 00:19:27,180 --> 00:19:27,810 a.k.a. 394 00:19:27,810 --> 00:19:28,827 HTTPS. 395 00:19:28,827 --> 00:19:30,660 And the most recent version of this protocol 396 00:19:30,660 --> 00:19:33,630 is called TLS, which funny enough is perhaps 397 00:19:33,630 --> 00:19:36,210 an acronym that many people have still not heard of. 398 00:19:36,210 --> 00:19:39,570 But you might have heard of SSL, which is essentially 399 00:19:39,570 --> 00:19:40,980 an earlier version of it. 
400 00:19:40,980 --> 00:19:44,490 But TLS is essentially the new and improved version of SSL 401 00:19:44,490 --> 00:19:47,310 and what modern browsers should now be using. 402 00:19:47,310 --> 00:19:51,030 What does this do to go about encrypting your traffic? 403 00:19:51,030 --> 00:19:53,400 Well, it turns out it relies on our focus 404 00:19:53,400 --> 00:19:56,770 from last time of public key cryptography. 405 00:19:56,770 --> 00:20:00,120 And this principle that if you give two parties, A and B, 406 00:20:00,120 --> 00:20:03,993 each their own public and private key, using that, 407 00:20:03,993 --> 00:20:05,910 you can solve that chicken and the egg problem 408 00:20:05,910 --> 00:20:09,330 and actually communicate securely, even if in advance you 409 00:20:09,330 --> 00:20:11,250 don't have a shared secret. 410 00:20:11,250 --> 00:20:15,030 So recall that asymmetric cryptography allows 411 00:20:15,030 --> 00:20:18,480 us to establish a secure connection even if you've never visited 412 00:20:18,480 --> 00:20:21,370 some website or some app before now. 413 00:20:21,370 --> 00:20:26,520 So what does it mean for a browser to be using TLS, and thus, 414 00:20:26,520 --> 00:20:30,220 HTTPS to communicate securely with a web server? 415 00:20:30,220 --> 00:20:33,750 Well, the web server in this story now has what we'll call a certificate, 416 00:20:33,750 --> 00:20:35,040 a digital certificate. 417 00:20:35,040 --> 00:20:38,580 And you can think of this, for now, as really a public key 418 00:20:38,580 --> 00:20:41,830 that has been signed by someone else. 419 00:20:41,830 --> 00:20:44,130 So the website has a public key and a private key. 420 00:20:44,130 --> 00:20:46,560 And the private key, as always, stays private. 421 00:20:46,560 --> 00:20:50,010 But in this case, the web server has a public key 422 00:20:50,010 --> 00:20:52,740 that's also been digitally signed by some third party. 
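For readers who write client code, here is a minimal Python sketch of a TLS client configuration in the spirit described above, using the standard-library `ssl` module. It refuses the old SSL protocols and early TLS versions. It also checks the server's certificate against the platform's trusted certificate authorities and checks that the certificate's name matches the host. The choice of TLS 1.2 as the floor is an assumption for illustration, not a recommendation from the lecture.

```python
import ssl


def make_strict_client_context() -> ssl.SSLContext:
    # create_default_context() loads the system's trusted CA certificates and,
    # for SERVER_AUTH (i.e., client-side use), requires a valid certificate
    # whose name matches the host we connect to.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse SSLv3, TLS 1.0, and TLS 1.1; only TLS 1.2 or newer is allowed.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: certificate must verify
print(ctx.check_hostname)                    # True: name must match the host
```

Such a context could then be passed to standard-library clients, for example `http.client.HTTPSConnection("example.com", context=ctx)` or `urllib.request.urlopen("https://example.com/", context=ctx)`. If the certificate does not check out, the connection fails with a certificate-verification error rather than proceeding.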
423 00:20:52,740 --> 00:20:56,040 And for now, let's assume that there are some big third parties out there, 424 00:20:56,040 --> 00:20:59,940 companies really, that we all trust, or at least that the browser manufacturers, 425 00:20:59,940 --> 00:21:04,320 the Googles, the Microsofts, the Apples of the world, trust on our behalf. 426 00:21:04,320 --> 00:21:06,840 And they are the ones signing off on the legitimacy 427 00:21:06,840 --> 00:21:08,590 of these so-called certificates. 428 00:21:08,590 --> 00:21:12,240 Technically, these certificates are of a type called X.509, 429 00:21:12,240 --> 00:21:14,820 if you're curious about the standard being used. 430 00:21:14,820 --> 00:21:18,360 But this is just the standard format in which these certificates live. 431 00:21:18,360 --> 00:21:20,910 But you can think of a certificate really as almost a printed 432 00:21:20,910 --> 00:21:23,580 piece of paper with some interesting information on it, 433 00:21:23,580 --> 00:21:27,930 like the name of the website, and how long the certificate is valid for, 434 00:21:27,930 --> 00:21:30,870 and also the public key, which we said last time 435 00:21:30,870 --> 00:21:34,380 is really just a big number that has a mathematical relationship 436 00:21:34,380 --> 00:21:36,640 with that private key as well. 437 00:21:36,640 --> 00:21:40,110 Now who are these big players, these big companies that 438 00:21:40,110 --> 00:21:43,463 are doing the signing of these websites' certificates? 439 00:21:43,463 --> 00:21:45,630 Well, you might have heard this phrase at some point 440 00:21:45,630 --> 00:21:47,610 if you set up your own website, perhaps. 441 00:21:47,610 --> 00:21:51,300 These are called Certificate Authorities, or CAs. 442 00:21:51,300 --> 00:21:53,940 And these are a-- 443 00:21:53,940 --> 00:21:57,840 these are a collection of companies and entities whose purpose in life 444 00:21:57,840 --> 00:21:59,940 is to digitally sign certificates.
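The signing step, and the check a browser later performs, can be illustrated with textbook RSA. This is a deliberately toy Python sketch that assumes a tiny key (p = 61, q = 53) and a made-up certificate string. The CA applies its private key to a hash of the certificate. Anyone holding the CA's public key can then recover that hash and compare it with one they compute themselves. Real certificates differ in several ways, none of which is modeled here: they use keys thousands of bits long, padded signature schemes (such as RSA-PSS or PKCS #1 v1.5) or elliptic-curve signatures, and X.509 encoding. Reducing the hash modulo a tiny N, as done below, would never happen in practice.

```python
import hashlib

# Toy CA key pair: textbook RSA with tiny primes. NOT secure; illustration only.
P, Q = 61, 53
N = P * Q                          # modulus, shared by both keys (3233)
E = 17                             # CA's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # CA's private exponent (needs Python 3.8+)


def cert_hash(cert: bytes) -> int:
    """SHA-256 of the certificate fields, reduced mod N so the toy key can sign it."""
    return int.from_bytes(hashlib.sha256(cert).digest(), "big") % N


def ca_sign(cert: bytes) -> int:
    """Done once by the CA: apply its PRIVATE key to the certificate's hash."""
    return pow(cert_hash(cert), D, N)


def browser_verify(cert: bytes, signature: int) -> bool:
    """Done by the browser: recover the signed hash with the CA's PUBLIC key
    and compare it with the hash the browser computes on its own."""
    return pow(signature, E, N) == cert_hash(cert)


# Hypothetical certificate contents (name, validity, public key placeholder).
cert = b"subject=example.com; valid_until=2030-01-01; public_key=hypothetical"
sig = ca_sign(cert)
print(browser_verify(cert, sig))            # True
print(browser_verify(cert, (sig + 1) % N))  # False: signature was altered
```

Altering the certificate bytes instead would change the recomputed hash, so the old signature would (almost always, with a modulus this small) stop matching.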
445 00:21:59,940 --> 00:22:02,670 And the various browser manufacturers of the world-- 446 00:22:02,670 --> 00:22:05,760 Apple, Microsoft, Google, Mozilla, and others 447 00:22:05,760 --> 00:22:09,810 have gotten together and included in their browsers, 448 00:22:09,810 --> 00:22:16,800 like Edge, and Firefox, and Safari, and Chrome, a list of certificate 449 00:22:16,800 --> 00:22:19,170 authorities that they trust. 450 00:22:19,170 --> 00:22:24,010 And the idea is that if you trust Apple, and you trust Google, and Microsoft, 451 00:22:24,010 --> 00:22:27,390 and Mozilla, and other browser manufacturers, then by transitivity, 452 00:22:27,390 --> 00:22:32,050 you should trust any of the certificate authorities that they, in turn, trust. 453 00:22:32,050 --> 00:22:34,710 So what actually happens when your browser visits a website? 454 00:22:34,710 --> 00:22:37,770 Well, it first downloads the certificate from that website, 455 00:22:37,770 --> 00:22:39,900 assuming you're using HTTPS. 456 00:22:39,900 --> 00:22:43,380 Your browser then calculates a hash value for that certificate 457 00:22:43,380 --> 00:22:45,810 by looking at certain fields within it, using 458 00:22:45,810 --> 00:22:48,990 a special hash function that produces a fixed-length 459 00:22:48,990 --> 00:22:51,250 representation of the certificate. 460 00:22:51,250 --> 00:22:53,050 Now why does it bother doing that? 461 00:22:53,050 --> 00:22:56,220 Well, the next step your browser takes is this. 462 00:22:56,220 --> 00:22:59,760 It takes a look at the signature on that certificate. 463 00:22:59,760 --> 00:23:05,550 It uses the public key of the certificate authority that signed that certificate. 464 00:23:05,550 --> 00:23:09,480 And then it takes that CA's public key and the signature 465 00:23:09,480 --> 00:23:10,770 from that server certificate.
466 00:23:10,770 --> 00:23:15,150 It runs them through this algorithm here, effectively decrypting the signature 467 00:23:15,150 --> 00:23:16,540 with the public key. 468 00:23:16,540 --> 00:23:21,150 And that should produce the exact same hash value that the browser computed itself. 469 00:23:21,150 --> 00:23:23,940 That is to say, if you visit a server, and it's presenting you 470 00:23:23,940 --> 00:23:27,210 with a certificate, and it says that that certificate has been digitally 471 00:23:27,210 --> 00:23:30,120 signed by a certificate authority, your browser 472 00:23:30,120 --> 00:23:35,620 can use the certificate authority's public key to decrypt that signature. 473 00:23:35,620 --> 00:23:38,940 And by way of these hashes, confirm or deny 474 00:23:38,940 --> 00:23:42,750 that, yes, that server's certificate was indeed 475 00:23:42,750 --> 00:23:44,830 signed by the certificate authority. 476 00:23:44,830 --> 00:23:49,110 And, again, if you trust Google, if you trust Microsoft, Apple, Mozilla-- 477 00:23:49,110 --> 00:23:51,930 and that's another question all to itself, but if you trust them, 478 00:23:51,930 --> 00:23:54,540 and they, in turn, trust these certificate authorities, 479 00:23:54,540 --> 00:23:58,260 the presumption is that you should trust your secure connection 480 00:23:58,260 --> 00:24:00,960 with this particular website. 481 00:24:00,960 --> 00:24:07,890 Now with that said, does HTTPS, and in turn TLS, keep you secure? 482 00:24:07,890 --> 00:24:10,950 Mathematically, yes, but you and I, as the humans, 483 00:24:10,950 --> 00:24:13,260 are, again, the potential weakness here. 484 00:24:13,260 --> 00:24:13,830 Why? 485 00:24:13,830 --> 00:24:18,050 There's another attack called SSL stripping, for historical reasons. 486 00:24:18,050 --> 00:24:20,360 But now it refers also to TLS.
487 00:24:20,360 --> 00:24:24,110 And what this attack involves is tricking the user 488 00:24:24,110 --> 00:24:27,290 into thinking they have a secure connection to a website, 489 00:24:27,290 --> 00:24:31,280 when they might actually not have an encrypted connection to that website. 490 00:24:31,280 --> 00:24:34,130 And worse yet, they might actually have an encrypted connection 491 00:24:34,130 --> 00:24:38,090 to the website of a third party, a machine in the middle. 492 00:24:38,090 --> 00:24:39,480 So how might this work? 493 00:24:39,480 --> 00:24:43,640 Well, suppose you and I are still in the habit of typing URLs 494 00:24:43,640 --> 00:24:50,900 as http://www.example.com, or maybe you and I are in the habit of just 495 00:24:50,900 --> 00:24:56,270 typing www.example.com, Enter, into our browsers, or maybe, more likely, 496 00:24:56,270 --> 00:24:59,750 you and I are in the habit of just typing example.com, Enter, 497 00:24:59,750 --> 00:25:00,800 in our browsers. 498 00:25:00,800 --> 00:25:03,988 Well, if you watch the URL bar, the address bar in your browser, 499 00:25:03,988 --> 00:25:05,780 you've probably noticed over time that even 500 00:25:05,780 --> 00:25:08,390 if you type the most succinct of those inputs, 501 00:25:08,390 --> 00:25:13,040 it eventually gets converted into a longer URL with the HTTP, 502 00:25:13,040 --> 00:25:17,370 maybe with the www, and perhaps even more characters as well. 503 00:25:17,370 --> 00:25:19,940 And that's because your browser is just trying to be helpful. 504 00:25:19,940 --> 00:25:23,210 Technically, to visit a website you need to use a URL. 505 00:25:23,210 --> 00:25:30,050 And a URL, in this case, should start with http:// or maybe https://. 506 00:25:30,050 --> 00:25:31,800 Your browser is just trying to be helpful. 507 00:25:31,800 --> 00:25:35,060 So if you don't even type any of those, it might first try HTTP 508 00:25:35,060 --> 00:25:38,240 and then it might try HTTPS even.
509 00:25:38,240 --> 00:25:43,400 But the catch is that if you start your interaction with a web server using 510 00:25:43,400 --> 00:25:48,260 HTTP, that alone might be enough of a window of opportunity 511 00:25:48,260 --> 00:25:50,900 for an adversary to do something malicious 512 00:25:50,900 --> 00:25:53,990 with the packet of information you're sending to the server. 513 00:25:53,990 --> 00:25:57,410 And then maybe send you back a response that tricks you 514 00:25:57,410 --> 00:26:00,770 into ending up at some other website altogether, or perhaps just 515 00:26:00,770 --> 00:26:02,720 the adversary's own website. 516 00:26:02,720 --> 00:26:05,270 And they might be super clever and make you 517 00:26:05,270 --> 00:26:08,642 think that you have a secure connection to the original website 518 00:26:08,642 --> 00:26:09,350 that you visited. 519 00:26:09,350 --> 00:26:13,040 So what do I mean by that? Suppose inside of your browser's virtual envelope 520 00:26:13,040 --> 00:26:15,560 is an HTTP message like this, which is just saying, 521 00:26:15,560 --> 00:26:18,440 get me the home page of example.com. 522 00:26:18,440 --> 00:26:22,550 But suppose for the sake of discussion that you are not using HTTPS, 523 00:26:22,550 --> 00:26:25,820 you just typed HTTP, or nothing at all, and you're 524 00:26:25,820 --> 00:26:28,070 trusting your browser to fill this in for you, 525 00:26:28,070 --> 00:26:30,320 what might come back from the server? 526 00:26:30,320 --> 00:26:34,100 Well, the server could respond with a message that says this. 527 00:26:34,100 --> 00:26:38,990 HTTP version 3 307, which is another status code, which means 528 00:26:38,990 --> 00:26:41,780 redirect the browser to a different URL. 529 00:26:41,780 --> 00:26:46,200 This is the server's way of telling the browser, detour, go to this other URL instead. 530 00:26:46,200 --> 00:26:48,290 Well, what location is that URL at? 531 00:26:48,290 --> 00:26:50,240 Well, perhaps this one here.
532 00:26:50,240 --> 00:26:53,870 But the catch is that if you're using HTTP, 533 00:26:53,870 --> 00:26:56,360 and therefore, your request is unencrypted, 534 00:26:56,360 --> 00:27:00,050 and suppose, for this story, that there is a machine in the middle, 535 00:27:00,050 --> 00:27:04,100 waiting there, listening, to attack you, it could actually 536 00:27:04,100 --> 00:27:08,450 be that this response is coming from that machine in the middle and not 537 00:27:08,450 --> 00:27:10,770 the actual website that you intended to visit. 538 00:27:10,770 --> 00:27:11,270 Why? 539 00:27:11,270 --> 00:27:15,440 Because if Alice is trying to reach Bob, but Eve, the eavesdropper, 540 00:27:15,440 --> 00:27:18,860 is in the middle, it could actually be Eve in the middle that's 541 00:27:18,860 --> 00:27:20,430 responding to this request. 542 00:27:20,430 --> 00:27:23,460 So Bob doesn't even know what's going on in this story. 543 00:27:23,460 --> 00:27:24,780 But the catch is here. 544 00:27:24,780 --> 00:27:27,920 Notice that this eavesdropper, this machine in the middle, 545 00:27:27,920 --> 00:27:29,480 is particularly clever. 546 00:27:29,480 --> 00:27:34,430 Because they're suggesting that you be redirected, per this status code, 307, 547 00:27:34,430 --> 00:27:35,690 to https://. 548 00:27:35,690 --> 00:27:38,630 And odds are, you and I are probably in the habit 549 00:27:38,630 --> 00:27:41,330 of at least making sure that our browser says secure, 550 00:27:41,330 --> 00:27:44,030 or that at least we're at an HTTPS URL. 551 00:27:44,030 --> 00:27:47,600 So you might think, whoo, good, everything is the way it should be. 552 00:27:47,600 --> 00:27:50,990 Now this is very, very subtle, but what if I 553 00:27:50,990 --> 00:27:55,730 draw your attention to the actual URL I'm being sent to here.
554 00:27:55,730 --> 00:27:59,670 At least on a US English keyboard, using this particular font, 555 00:27:59,670 --> 00:28:02,632 this is no longer example.com-- 556 00:28:02,632 --> 00:28:11,450 E-X-A-M-P-L-E-- dot-- C-O-M. This is now, indeed, E-X-A-M-P-1-E-- 557 00:28:11,450 --> 00:28:12,630 dot-- com. 558 00:28:12,630 --> 00:28:17,870 So this is a particularly subtle attack, whereby the adversary in this story 559 00:28:17,870 --> 00:28:22,040 seems to have bought a domain name that looks very similar to example.com, 560 00:28:22,040 --> 00:28:26,030 with an L, but they instead used the number 1, which in some fonts 561 00:28:26,030 --> 00:28:29,000 and on some screens looks so close to an L that you 562 00:28:29,000 --> 00:28:31,340 or I might never even notice this difference. 563 00:28:31,340 --> 00:28:33,620 The implication, though, of this subtlety 564 00:28:33,620 --> 00:28:38,780 is that you might very well have a perfectly encrypted connection using 565 00:28:38,780 --> 00:28:43,220 HTTPS to a server, but it's not Bob's server in this story, 566 00:28:43,220 --> 00:28:46,010 it's now Eve's server in the middle. 567 00:28:46,010 --> 00:28:49,460 So SSL stripping in this case refers to an attack, 568 00:28:49,460 --> 00:28:53,180 whereby you're sort of stripping out what would be an HTTPS 569 00:28:53,180 --> 00:28:57,530 redirection to the right place, and maybe you never even end up at HTTPS, 570 00:28:57,530 --> 00:29:01,610 and the eavesdropper in the middle always keeps it as HTTP. 571 00:29:01,610 --> 00:29:04,700 But an even more malicious adversary might take it one step further 572 00:29:04,700 --> 00:29:08,390 and actually redirect you to their own HTTPS site. 573 00:29:08,390 --> 00:29:11,060 At which point, you might be vulnerable to the phishing attacks 574 00:29:11,060 --> 00:29:13,727 that we've discussed before, where you might provide information 575 00:29:13,727 --> 00:29:17,740 to a website that is not actually legitimate.
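One way to make the examp1e.com trick concrete is the toy Python check below. It parses a hypothetical 307 response like the one described above and compares the redirect's host name with the one the user meant to visit. It flags a host that only matches after swapping commonly confused characters. The response text, the `LOOKALIKES` table, and the helper names are assumptions for illustration. Real browsers rely on defenses such as HSTS and on their own rules for displaying look-alike domain names, not on a check like this.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical response a machine in the middle could inject over plain HTTP.
response = (
    "HTTP/1.1 307 Temporary Redirect\r\n"
    "Location: https://www.examp1e.com/\r\n"
    "\r\n"
)

# A few characters that are easy to confuse in many fonts (illustrative, incomplete).
LOOKALIKES = str.maketrans({"1": "l", "0": "o", "5": "s"})

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_host(raw_response: str) -> Optional[str]:
    """Return the host name in a redirect's Location header, or None."""
    status_line, *headers = raw_response.split("\r\n\r\n", 1)[0].split("\r\n")
    if int(status_line.split()[1]) not in REDIRECT_CODES:
        return None
    for line in headers:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return urlsplit(value.strip()).hostname
    return None


def is_lookalike(host: str, expected: str) -> bool:
    """True if host differs from expected but matches once look-alikes are swapped."""
    return host != expected and host.translate(LOOKALIKES) == expected


host = redirect_host(response)
print(host)                                   # www.examp1e.com
print(is_lookalike(host, "www.example.com"))  # True, despite the https:// scheme
```

The point the sketch illustrates is that the scheme alone says nothing about who you are talking to: the redirect target is https://, yet the host belongs to someone else.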
576 00:29:17,740 --> 00:29:20,890 So how can you mitigate this kind of attack? 577 00:29:20,890 --> 00:29:23,980 Well, one, if you're a user, a consumer, you 578 00:29:23,980 --> 00:29:30,460 could just get into the habit of always typing out HTTPS and then 579 00:29:30,460 --> 00:29:32,260 the domain name that you want to visit. 580 00:29:32,260 --> 00:29:34,990 I will concede that can get tedious quickly, 581 00:29:34,990 --> 00:29:40,540 but that is, hands down, the most paranoid solution to implement here. 582 00:29:40,540 --> 00:29:43,300 Because you know you will end up at HTTPS. 583 00:29:43,300 --> 00:29:45,670 Hopefully, the website actually supports HTTPS, 584 00:29:45,670 --> 00:29:48,380 but most websites nowadays certainly do. 585 00:29:48,380 --> 00:29:51,280 With that said, if you're on the flipside of the story, 586 00:29:51,280 --> 00:29:55,180 and you're actually the designer of the website, the business owner running 587 00:29:55,180 --> 00:29:59,200 the website, or you have control over not the browser, but the server, 588 00:29:59,200 --> 00:30:01,790 there are a few different things that you can do. 589 00:30:01,790 --> 00:30:04,210 And, in fact, you can use a protocol called 590 00:30:04,210 --> 00:30:10,360 HSTS, which is to say you can actually configure your server to provide hints 591 00:30:10,360 --> 00:30:14,530 to browsers that, you know what, they should always use HTTPS 592 00:30:14,530 --> 00:30:18,460 when talking to the server no matter what the human has decided. 593 00:30:18,460 --> 00:30:23,470 And HSTS here, for HTTP Strict Transport Security, 594 00:30:23,470 --> 00:30:26,410 has you, as the administrator of the server, 595 00:30:26,410 --> 00:30:30,220 just configure your server to send an additional HTTP header.
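Concretely, the header in question is named Strict-Transport-Security. Here is a minimal Python sketch, assuming a one-year max-age, that builds its value. The max-age (in seconds) and includeSubDomains directives are defined by the HSTS specification (RFC 6797), while preload is an extra token that browser preload lists look for. The helper name `hsts_header` is hypothetical.

```python
from typing import Tuple

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a 365-day year


def hsts_header(max_age: int = SECONDS_PER_YEAR,
                include_subdomains: bool = False,
                preload: bool = False) -> Tuple[str, str]:
    """Return (name, value) for a Strict-Transport-Security response header."""
    directives = [f"max-age={max_age}"]
    if include_subdomains:
        directives.append("includeSubDomains")  # also cover www.example.com, etc.
    if preload:
        directives.append("preload")  # opt in to browsers' built-in lists
    return "Strict-Transport-Security", "; ".join(directives)


name, value = hsts_header(include_subdomains=True, preload=True)
print(f"{name}: {value}")
# Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
```

Browsers only honor this header when it arrives over HTTPS. RFC 6797 tells them to ignore it on plain-HTTP responses, where an attacker could otherwise forge or strip it.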
596 00:30:30,220 --> 00:30:33,070 That is to say, an additional line of text inside one 597 00:30:33,070 --> 00:30:37,810 of those virtual envelopes that just informs the browser that we really 598 00:30:37,810 --> 00:30:40,450 want to be strict about our transport security. 599 00:30:40,450 --> 00:30:42,550 That is, we want to be using TLS. 600 00:30:42,550 --> 00:30:45,100 We want the user to be using HTTPS. 601 00:30:45,100 --> 00:30:49,420 And here, the server is telling the browser, assume that this is the case, 602 00:30:49,420 --> 00:30:52,810 that I want you to use strict security for at least one year. 603 00:30:52,810 --> 00:30:57,490 This value, 31,536,000, is the number of seconds in a 365-day year. 604 00:30:57,490 --> 00:30:59,573 That's just a really long time telling the browser 605 00:30:59,573 --> 00:31:02,490 that, yes, I'm going to keep my security on for at least a year, which 606 00:31:02,490 --> 00:31:03,580 should be the case anyway. 607 00:31:03,580 --> 00:31:07,990 But you can further configure your server to not just output this header 608 00:31:07,990 --> 00:31:11,140 in every one of those virtual envelopes, going back to browsers. 609 00:31:11,140 --> 00:31:16,730 You can even more protectively say use strict security for subdomains as well. 610 00:31:16,730 --> 00:31:20,350 So even if my user is at example.com, also 611 00:31:20,350 --> 00:31:23,710 make sure that their browser uses HTTPS for something 612 00:31:23,710 --> 00:31:27,950 like www.example.com, where in that scenario www, 613 00:31:27,950 --> 00:31:30,640 you can think of as a subdomain, because it's 614 00:31:30,640 --> 00:31:33,710 part of the example.com domain itself. 615 00:31:33,710 --> 00:31:36,700 And so what is this telling the browser specifically?
616 00:31:36,700 --> 00:31:40,990 Even though you might accidentally, conveniently visit a website 617 00:31:40,990 --> 00:31:45,040 for the very first time using http:// because you typed it, 618 00:31:45,040 --> 00:31:49,300 or you let your browser automatically fill that for you, 619 00:31:49,300 --> 00:31:51,940 if the server responds with this message, 620 00:31:51,940 --> 00:31:56,570 the whole point of HSTS is that the second time, the third time, 621 00:31:56,570 --> 00:32:01,660 the 300th time, the 3,000th time that your browser visits that exact same 622 00:32:01,660 --> 00:32:05,020 domain in the future, up until at least a year from now, 623 00:32:05,020 --> 00:32:09,580 it will automatically switch you to HTTPS. 624 00:32:09,580 --> 00:32:14,298 And it very protectively won't even let you visit the HTTP version. 625 00:32:14,298 --> 00:32:16,840 Even if that's what's in the URL bar, it's just going to say, 626 00:32:16,840 --> 00:32:20,420 nope, I've been told to use HTTPS instead. 627 00:32:20,420 --> 00:32:23,950 And that, therefore, decreases the window of opportunity for adversaries 628 00:32:23,950 --> 00:32:28,840 to just that very first request that you might make accidentally, conveniently, 629 00:32:28,840 --> 00:32:32,200 that's using HTTP, but every subsequent request 630 00:32:32,200 --> 00:32:37,550 from your browser, according to this model, will now be HTTPS instead. 631 00:32:37,550 --> 00:32:39,400 And you can go one step further too. 632 00:32:39,400 --> 00:32:41,290 If you're the administrator of a server, you 633 00:32:41,290 --> 00:32:45,700 can also include the keyword preload in this message that's inside 634 00:32:45,700 --> 00:32:47,620 of all of these virtual envelopes.
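The browser side of this can be modeled with a small cache. The browser remembers the header, then rewrites later http:// requests for that host (and, with includeSubDomains, its subdomains) to https:// until max-age runs out. The Python sketch below is a hypothetical toy, not how any particular browser implements HSTS. Its `preloaded` argument stands in for the built-in list that the preload keyword lets browsers ship with.

```python
import time
from typing import Iterable, Optional
from urllib.parse import urlsplit


class HstsCache:
    """Hypothetical toy model of a browser's HSTS memory (not a real browser API)."""

    def __init__(self, preloaded: Iterable[str] = ()):
        # host -> (expiry time in seconds, applies to subdomains?)
        # Preloaded hosts stand in for a built-in list; they never expire here.
        self._hosts = {h.lower(): (float("inf"), True) for h in preloaded}

    def record(self, host: str, header_value: str, now: Optional[float] = None) -> None:
        """Remember a Strict-Transport-Security header seen over HTTPS."""
        now = time.time() if now is None else now
        max_age, include_subdomains = 0, False
        for directive in header_value.split(";"):
            directive = directive.strip().lower()
            if directive.startswith("max-age="):
                max_age = int(directive.split("=", 1)[1])
            elif directive == "includesubdomains":
                include_subdomains = True
        # max-age=0 yields an entry that is already expired, i.e. forgotten.
        self._hosts[host.lower()] = (now + max_age, include_subdomains)

    def _covers(self, host: str, now: float) -> bool:
        for known, (expiry, subdomains) in self._hosts.items():
            if expiry > now and (host == known or
                                 (subdomains and host.endswith("." + known))):
                return True
        return False

    def upgrade(self, url: str, now: Optional[float] = None) -> str:
        """Rewrite http:// to https:// for covered hosts (ports are not handled)."""
        now = time.time() if now is None else now
        parts = urlsplit(url)
        if parts.scheme == "http" and parts.hostname and self._covers(parts.hostname, now):
            return parts._replace(scheme="https").geturl()
        return url


cache = HstsCache()
print(cache.upgrade("http://example.com/", now=0))  # first visit: left as http://
cache.record("example.com", "max-age=31536000; includeSubDomains", now=0)
print(cache.upgrade("http://www.example.com/login", now=60))  # now https://
```

The first call shows the remaining window of opportunity on a very first visit; seeding the cache through `preloaded` is how this toy models closing that window.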
635 00:32:47,620 --> 00:32:50,680 And what that will further tell the world is 636 00:32:50,680 --> 00:32:53,590 that if the browser manufacturers would like 637 00:32:53,590 --> 00:32:58,780 to preload this information into Chrome, into other browsers 638 00:32:58,780 --> 00:33:01,540 that humans download, you can even eliminate 639 00:33:01,540 --> 00:33:05,410 that first window of opportunity because you can have your domain name 640 00:33:05,410 --> 00:33:09,220 included, essentially, in the source code for browsers like Chrome. 641 00:33:09,220 --> 00:33:13,570 So that when people download Chrome itself and visit your website, 642 00:33:13,570 --> 00:33:16,300 like example.com, their browser will already 643 00:33:16,300 --> 00:33:22,700 know that it should not use HTTP with this website; it should use HTTPS. 644 00:33:22,700 --> 00:33:28,910 Questions, then, on HTTPS, on TLS, on HSTS? 645 00:33:28,910 --> 00:33:32,690 It's a lot of acronyms, but realize that some of these defenses 646 00:33:32,690 --> 00:33:35,960 are available to you as a user, and others, if you're more technical, 647 00:33:35,960 --> 00:33:39,240 are available to you as a system administrator. 648 00:33:39,240 --> 00:33:44,660 AUDIENCE: So you mentioned that there are such websites as examp1e.com, 649 00:33:44,660 --> 00:33:47,540 which look very similar to example.com. 650 00:33:47,540 --> 00:33:52,070 And I know that registrars fight against phishing websites like this. 651 00:33:52,070 --> 00:33:54,920 And whenever somebody tries to register a domain that's 652 00:33:54,920 --> 00:33:58,760 looking a little bit suspicious, they usually mark it as fraud 653 00:33:58,760 --> 00:34:00,320 and suspend it. 654 00:34:00,320 --> 00:34:04,190 So how do such domains still exist? 655 00:34:04,190 --> 00:34:06,590 And why is it so common? 656 00:34:06,590 --> 00:34:08,550 DAVID MALAN: That's a really good question.
657 00:34:08,550 --> 00:34:12,380 And that's great that registrars, who are the companies in the world that 658 00:34:12,380 --> 00:34:15,080 sell you, or rent you, domain names nowadays, 659 00:34:15,080 --> 00:34:17,900 are being more vigilant when it comes to your buying 660 00:34:17,900 --> 00:34:20,750 a domain that could be maliciously used in this way 661 00:34:20,750 --> 00:34:25,190 because it's so similar to a brand name or an existing website. 662 00:34:25,190 --> 00:34:27,320 However, there are a lot of registrars out there, 663 00:34:27,320 --> 00:34:29,570 and I would conjecture that not all of them 664 00:34:29,570 --> 00:34:32,190 are as good as others at doing that detection. 665 00:34:32,190 --> 00:34:35,090 There are hundreds of top-level domains nowadays, 666 00:34:35,090 --> 00:34:39,920 TLDs, which means that you could even choose example dot something else 667 00:34:39,920 --> 00:34:43,130 potentially, and that too might not be caught. 668 00:34:43,130 --> 00:34:46,489 Now, maybe, especially when you use it maliciously, 669 00:34:46,489 --> 00:34:48,179 you would eventually get shut down. 670 00:34:48,179 --> 00:34:50,719 But maybe it's enough to attack one person, 671 00:34:50,719 --> 00:34:53,760 or two, or 10, or 100 before it's actually shut down. 672 00:34:53,760 --> 00:34:57,080 So these remain theoretical and actual attacks, 673 00:34:57,080 --> 00:35:00,720 but there are certainly ways to push back on this. 674 00:35:00,720 --> 00:35:01,560 A good question. 675 00:35:01,560 --> 00:35:04,500 Others from the group? 676 00:35:04,500 --> 00:35:07,680 AUDIENCE: I would like to know, is HTTPS 677 00:35:07,680 --> 00:35:13,680 the best solution to prevent attacks via cookies and super cookies? 678 00:35:13,680 --> 00:35:18,512 Or should I clear my cookies frequently? 679 00:35:18,512 --> 00:35:19,720 DAVID MALAN: A good question.
680 00:35:19,720 --> 00:35:25,590 Unfortunately, super cookies cannot be stopped at the browser level. 681 00:35:25,590 --> 00:35:28,200 Super cookies refer to a type of cookie that's 682 00:35:28,200 --> 00:35:31,770 embedded by your company, your university, your internet service 683 00:35:31,770 --> 00:35:32,460 provider. 684 00:35:32,460 --> 00:35:35,140 And you would have to opt out at that level. 685 00:35:35,140 --> 00:35:37,770 So for context, for Americans in the group, 686 00:35:37,770 --> 00:35:40,980 AT&T and Verizon started doing this a few years ago, 687 00:35:40,980 --> 00:35:47,790 where they were injecting cookies into cellphone customers' HTTP requests. 688 00:35:47,790 --> 00:35:51,360 You literally, and stupidly, and obnoxiously have to log 689 00:35:51,360 --> 00:35:57,480 into your Verizon.com or your AT&T.com account, via the web or the app, 690 00:35:57,480 --> 00:35:58,930 and opt out of this. 691 00:35:58,930 --> 00:36:00,720 So one thing to your question here. 692 00:36:00,720 --> 00:36:04,740 Super cookies are actually super annoying, super difficult, 693 00:36:04,740 --> 00:36:06,960 super dangerous in that sense. 694 00:36:06,960 --> 00:36:09,210 Because they can happen without your knowing. 695 00:36:09,210 --> 00:36:14,970 However, if you are using HTTPS, that should decrease 696 00:36:14,970 --> 00:36:16,500 the probability of this happening. 697 00:36:16,500 --> 00:36:19,132 Because if your data is encrypted, your company, 698 00:36:19,132 --> 00:36:21,090 your university, your internet service provider 699 00:36:21,090 --> 00:36:25,980 can't insert the cookies into those encrypted messages unless, 700 00:36:25,980 --> 00:36:29,670 and we'll talk a bit about this more later, unless you have given 701 00:36:29,670 --> 00:36:32,340 permission to your company, or university to install 702 00:36:32,340 --> 00:36:36,680 special software on your Mac or PC. 
703 00:36:36,680 --> 00:36:40,470 So in short, the simplest advice is to always use HTTPS. 704 00:36:40,470 --> 00:36:44,030 And if you have a cellphone provider, google around, 705 00:36:44,030 --> 00:36:46,950 find out if they might be doing this to you. 706 00:36:46,950 --> 00:36:49,490 And if so, figure out if you can opt out. 707 00:36:49,490 --> 00:36:51,800 How about one more question here? 708 00:36:51,800 --> 00:36:54,770 AUDIENCE: The question would be, from a macro perspective, 709 00:36:54,770 --> 00:36:56,750 will it be feasible in the same way that there 710 00:36:56,750 --> 00:36:59,540 are machines in the middle of [INAUDIBLE] 711 00:36:59,540 --> 00:37:04,550 in the middle verifications, like a request from, let's say, 712 00:37:04,550 --> 00:37:11,250 my cellphone to the [INAUDIBLE] needs to do so many hops to reach the server, 713 00:37:11,250 --> 00:37:11,750 right? 714 00:37:11,750 --> 00:37:17,000 Will it be [INAUDIBLE] between each hop, every packet 715 00:37:17,000 --> 00:37:22,030 will get a stamp, like a passport, to verify the integrity of the connection? 716 00:37:22,030 --> 00:37:25,280 DAVID MALAN: Potentially, the catch is-- and if I'm understanding the question 717 00:37:25,280 --> 00:37:30,020 correctly, the catch is you, I, we don't control all of these machines 718 00:37:30,020 --> 00:37:31,800 in the middle when it comes to routers. 719 00:37:31,800 --> 00:37:33,560 So it's possible to do what you're describing, 720 00:37:33,560 --> 00:37:35,730 but there just isn't coordination at that level. 721 00:37:35,730 --> 00:37:40,410 And so per our discussion last week of end-to-end encryption, in general, 722 00:37:40,410 --> 00:37:44,330 it's best that the sender and receiver worry about doing the encryption.
723 00:37:44,330 --> 00:37:47,510 Because that way you don't have to trust anyone in between you, 724 00:37:47,510 --> 00:37:53,960 any machines in the middle, so long as you are using a protocol that supports 725 00:37:53,960 --> 00:37:56,310 some form of end-to-end encryption. 726 00:37:56,310 --> 00:37:59,263 So how else can you keep your systems secure, particularly when 727 00:37:59,263 --> 00:38:00,680 they're communicating with others? 728 00:38:00,680 --> 00:38:03,470 Well, another technology with which you might already be familiar 729 00:38:03,470 --> 00:38:06,630 is this, a VPN, or a Virtual Private Network. 730 00:38:06,630 --> 00:38:12,530 Now whereas HTTPS only secures your web traffic between browser and server, 731 00:38:12,530 --> 00:38:15,380 a VPN is, dare I say, a more powerful technology 732 00:38:15,380 --> 00:38:19,190 because it encrypts all of your internet traffic between you 733 00:38:19,190 --> 00:38:22,080 and whatever VPN server to which you're connecting. 734 00:38:22,080 --> 00:38:23,240 So how does this work? 735 00:38:23,240 --> 00:38:28,910 A VPN allows Alice and Bob to establish an encrypted channel, 736 00:38:28,910 --> 00:38:32,960 so that even if there are machines in the middle, routers or otherwise, 737 00:38:32,960 --> 00:38:34,010 they shouldn't matter. 738 00:38:34,010 --> 00:38:36,380 Because Alice and Bob are using cryptography 739 00:38:36,380 --> 00:38:42,020 to encrypt all of the information going in between points A and B. Now 740 00:38:42,020 --> 00:38:43,110 how much do you use this?
741 00:38:43,110 --> 00:38:45,110 Well, it's very common. If you work for a company 742 00:38:45,110 --> 00:38:47,480 that itself has servers that you might need access 743 00:38:47,480 --> 00:38:51,560 to, whether it's email, or files, or anything else, you might, 744 00:38:51,560 --> 00:38:54,980 from your laptop, have to, by policy at that company, 745 00:38:54,980 --> 00:38:59,090 connect to your company servers via VPN, Virtual Private Network. 746 00:38:59,090 --> 00:39:02,120 That is to say your Mac, or PC, or your phone 747 00:39:02,120 --> 00:39:05,420 has special software that you start up, you probably 748 00:39:05,420 --> 00:39:08,360 log into using minimally a username and a password, 749 00:39:08,360 --> 00:39:12,865 maybe using a two-factor code, maybe using a USB device that you 750 00:39:12,865 --> 00:39:14,240 have to connect to your computer. 751 00:39:14,240 --> 00:39:16,820 You have to somehow authenticate to that VPN. 752 00:39:16,820 --> 00:39:19,580 And once that software is up and running and authenticated 753 00:39:19,580 --> 00:39:23,390 against the VPN server, run in this case by your company, 754 00:39:23,390 --> 00:39:25,790 all of your internet traffic, thereafter, 755 00:39:25,790 --> 00:39:30,080 should be encrypted between you, point A, and the company, 756 00:39:30,080 --> 00:39:33,590 point B. The motivation for that is that this way the company 757 00:39:33,590 --> 00:39:37,340 can ensure that no matter what services you are accessing inside 758 00:39:37,340 --> 00:39:42,410 of the corporate network, be it email, or files, or maybe video conferencing, 759 00:39:42,410 --> 00:39:46,700 or something else altogether, no matter what, by nature of that VPN, 760 00:39:46,700 --> 00:39:49,610 all of that traffic is encrypted. 761 00:39:49,610 --> 00:39:53,270 But you should realize too that VPNs have a few side 762 00:39:53,270 --> 00:39:55,100 effects, perhaps good, perhaps bad.
763 00:39:55,100 --> 00:39:58,580 When Alice connects to Bob, if Bob is the VPN server, 764 00:39:58,580 --> 00:40:00,740 she has this encrypted tunnel. 765 00:40:00,740 --> 00:40:03,950 This is what we mean by a private-- a virtual private network. 766 00:40:03,950 --> 00:40:07,580 She has this encrypted tunnel to Bob, which typically 767 00:40:07,580 --> 00:40:13,100 makes it appear as though Alice's IP address, her internet address, 768 00:40:13,100 --> 00:40:16,700 her unique identifier on the internet, is actually that of Bob 769 00:40:16,700 --> 00:40:17,990 and not her own. 770 00:40:17,990 --> 00:40:21,800 That is to say, if Alice connects to her company's VPN server, 771 00:40:21,800 --> 00:40:26,720 she then gets another IP address from her company's own VPN server 772 00:40:26,720 --> 00:40:27,480 effectively. 773 00:40:27,480 --> 00:40:32,900 So if Alice then visits gmail.com, or amazon.com, or any other website, 774 00:40:32,900 --> 00:40:36,110 those websites will actually think that Alice's IP 775 00:40:36,110 --> 00:40:39,440 address is that of Bob and not Alice. 776 00:40:39,440 --> 00:40:42,620 So this is very commonly used if you're in one country, 777 00:40:42,620 --> 00:40:45,380 and you want to masquerade as though you're in another. 778 00:40:45,380 --> 00:40:48,830 And this might be because you need to in order to access company resources. 779 00:40:48,830 --> 00:40:50,622 Perhaps, from a show of smiles, this might 780 00:40:50,622 --> 00:40:52,580 be because you want to access a streaming media 781 00:40:52,580 --> 00:40:56,510 service that you don't have access to when you're in one country or another. 782 00:40:56,510 --> 00:41:00,470 The point, though, is that you have this encrypted connection between points A 783 00:41:00,470 --> 00:41:06,140 and B. And thereafter, you can visit any website, any service, 784 00:41:06,140 --> 00:41:10,190 as though you are physically at location B. 
785 00:41:10,190 --> 00:41:12,650 Beyond that, there are other technologies 786 00:41:12,650 --> 00:41:14,450 that you can use to encrypt communications. 787 00:41:14,450 --> 00:41:19,010 And this one's a little more technical and used by programmers and system 788 00:41:19,010 --> 00:41:20,330 administrators alike. 789 00:41:20,330 --> 00:41:23,690 There's a protocol called SSH, for Secure Shell. 790 00:41:23,690 --> 00:41:26,570 And this is a technology via which you don't necessarily 791 00:41:26,570 --> 00:41:29,210 encrypt all of your traffic between point A and B, 792 00:41:29,210 --> 00:41:32,630 although you can use SSH to create the equivalent 793 00:41:32,630 --> 00:41:35,600 of a virtual private network, or VPN, but SSH 794 00:41:35,600 --> 00:41:39,980 is all about connecting to a remote server and executing commands on it. 795 00:41:39,980 --> 00:41:42,920 So not executing commands on your own machine ultimately, 796 00:41:42,920 --> 00:41:45,290 but on some other machine that's maybe inside 797 00:41:45,290 --> 00:41:48,540 of your company, your university, or somewhere else in the world. 798 00:41:48,540 --> 00:41:51,050 So, for instance, if curious as to how this works, 799 00:41:51,050 --> 00:41:55,640 you might recall that in a previous class I wrote some code on my computer. 800 00:41:55,640 --> 00:41:58,280 And then I opened up a terminal window that started 801 00:41:58,280 --> 00:41:59,790 with this prompt, a dollar sign. 802 00:41:59,790 --> 00:42:01,020 It doesn't mean currency. 803 00:42:01,020 --> 00:42:04,950 It's just a tradition that the dollar sign means type your commands here. 804 00:42:04,950 --> 00:42:08,490 But at the time, I was typing them on my laptop here on a local server, 805 00:42:08,490 --> 00:42:09,470 if you will. 806 00:42:09,470 --> 00:42:13,730 If I, though, want to use a computer to remotely connect to another 807 00:42:13,730 --> 00:42:16,550 and then remotely run a command, it might work like this. 
808 00:42:16,550 --> 00:42:18,920 Here I am, let's pretend on my own computer, 809 00:42:18,920 --> 00:42:22,370 and suppose I type out one command like the date command. 810 00:42:22,370 --> 00:42:25,100 Not surprisingly, this will tell me what the current date is. 811 00:42:25,100 --> 00:42:28,430 So suppose that where I am, on my own computer, it is 812 00:42:28,430 --> 00:42:35,400 Thursday, January 1, at midnight Eastern time, in the year 1970, for instance. 813 00:42:35,400 --> 00:42:39,800 If, though, the next command I run is not date again, but I use SSH, 814 00:42:39,800 --> 00:42:43,310 and I connect, for instance, to, oh, how about our friends 815 00:42:43,310 --> 00:42:44,900 at Stanford University? 816 00:42:44,900 --> 00:42:48,830 So I'm going to SSH into stanford.edu's server. 817 00:42:48,830 --> 00:42:53,790 As soon as I'm at that server now, I get another prompt, in this case. 818 00:42:53,790 --> 00:42:56,660 But if I now run the date command, what you'll see 819 00:42:56,660 --> 00:43:00,748 is that the date is now apparently slightly in the past, at least 820 00:43:00,748 --> 00:43:03,290 if I type this quickly enough so that the seconds weren't off-- 821 00:43:03,290 --> 00:43:09,290 Wednesday, December 31, 9:00 PM Pacific time, in the year 1969. 822 00:43:09,290 --> 00:43:13,550 So this is to say SSH is actually a very common technology used 823 00:43:13,550 --> 00:43:16,730 in the world of software engineering, system administration, when you want 824 00:43:16,730 --> 00:43:18,530 to control one server from another. 825 00:43:18,530 --> 00:43:23,180 And what's powerful about it is that everything I just typed after that SSH 826 00:43:23,180 --> 00:43:27,320 command, even as innocuous as the date command is, on Stanford's server 827 00:43:27,320 --> 00:43:28,520 would be encrypted.
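The two date outputs in this demo are the same instant, just printed in each machine's configured time zone: 9:00 PM Pacific on December 31 is midnight Eastern on January 1. A minimal Python sketch of that, assuming fixed offsets of UTC-5 for Eastern and UTC-8 for Pacific (standard time in January) rather than a full time-zone database:

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed standard-time offsets, which is what applied in January 1970.
EASTERN = timezone(timedelta(hours=-5), "EST")
PACIFIC = timezone(timedelta(hours=-8), "PST")

# What `date` printed on the local machine in the example.
local = datetime(1970, 1, 1, 0, 0, tzinfo=EASTERN)

# The very same moment, as a server configured for Pacific time would show it.
remote = local.astimezone(PACIFIC)

print(local.strftime("%a %b %d %H:%M %Z %Y"))
print(remote.strftime("%a %b %d %H:%M %Z %Y"))
print(local == remote)  # aware datetimes compare as instants, so this is True
```

Nothing about the clock itself changed over SSH; only where the command ran, and therefore which time zone it used, did.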
828 00:43:28,520 --> 00:43:31,520 So no one between these points A and B would 829 00:43:31,520 --> 00:43:36,758 be able to know what I'm controlling or what commands I have typed. 830 00:43:36,758 --> 00:43:39,050 All right, let's go ahead and take a five-minute break. 831 00:43:39,050 --> 00:43:42,710 And when we resume, we'll look at some other building blocks of systems 832 00:43:42,710 --> 00:43:46,640 that both solve problems, but also create vulnerabilities for us as well. 833 00:43:46,640 --> 00:43:48,110 Back in a few. 834 00:43:48,110 --> 00:43:50,660 All right, let's talk about now what's actually 835 00:43:50,660 --> 00:43:53,570 been on the outside of these virtual envelopes that's 836 00:43:53,570 --> 00:43:57,750 helping these envelopes get from their source to their destination. 837 00:43:57,750 --> 00:44:00,110 So it turns out that on the outside of these envelopes, 838 00:44:00,110 --> 00:44:02,480 minimally, is what we'll call a port number, which 839 00:44:02,480 --> 00:44:05,570 is literally just a unique number that the world has decided 840 00:44:05,570 --> 00:44:08,345 on that uniquely represents the type of service 841 00:44:08,345 --> 00:44:10,890 that that envelope is destined for. 842 00:44:10,890 --> 00:44:13,490 So, for instance, if you at your browser were 843 00:44:13,490 --> 00:44:19,750 going to pull up a website like http://www.example.com, 844 00:44:19,750 --> 00:44:22,520 on the outside of that envelope would not only 845 00:44:22,520 --> 00:44:27,260 be some mention of www.example.com, but also a so-called port 846 00:44:27,260 --> 00:44:31,400 number, namely 80 by convention, which means that this envelope should 847 00:44:31,400 --> 00:44:35,120 be opened not by the other server's email server or chat server, 848 00:44:35,120 --> 00:44:37,430 but by its web server specifically.
849 00:44:37,430 --> 00:44:41,300 Now, if the web server were to respond to us by saying, 850 00:44:41,300 --> 00:44:45,830 uh-uh, we want you to use HTTPS instead, essentially 851 00:44:45,830 --> 00:44:49,190 redirecting the browser to a secure version of the website, 852 00:44:49,190 --> 00:44:51,530 my browser would then have to send a second request 853 00:44:51,530 --> 00:44:56,010 to the server, this time still mentioning www.example.com, 854 00:44:56,010 --> 00:44:59,210 but on the outside of that envelope, among other details, 855 00:44:59,210 --> 00:45:02,743 would be a different port number, namely 443. 856 00:45:02,743 --> 00:45:05,160 Now these aren't the kinds of things you have to memorize, 857 00:45:05,160 --> 00:45:07,368 but the computers certainly know what they represent. 858 00:45:07,368 --> 00:45:15,650 And, in fact, common numbers include 80 for HTTP, 443 for HTTPS, 22 for SSH, 859 00:45:15,650 --> 00:45:18,140 and hundreds, if not thousands, of other numbers 860 00:45:18,140 --> 00:45:21,410 as well, that humans decided on, but the computers actually 861 00:45:21,410 --> 00:45:24,950 rely on to know what piece of software on a computer 862 00:45:24,950 --> 00:45:28,520 should actually expect and open up these virtual envelopes, 863 00:45:28,520 --> 00:45:30,560 these things we've called packets. 864 00:45:30,560 --> 00:45:33,660 So what's the problem or danger here? 865 00:45:33,660 --> 00:45:38,300 Well, it turns out that your computer can be listening for internet traffic 866 00:45:38,300 --> 00:45:41,540 on none of these ports, in which case it's completely 867 00:45:41,540 --> 00:45:43,820 disconnected from inbound connections. 868 00:45:43,820 --> 00:45:47,870 But very often, computers, especially servers, are listening, so to speak, 869 00:45:47,870 --> 00:45:52,340 for envelopes destined for maybe port 22, maybe port 80, maybe 870 00:45:52,340 --> 00:45:54,208 port 443, maybe others as well. 
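To make the convention concrete, here is a small Python sketch of how a client might pick the port number it writes on the outside of the envelope: use an explicit port if the URL has one, otherwise fall back to the well-known default. The table covers only the three numbers mentioned above, and `destination_port` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlsplit

# Conventional port numbers mentioned in the lecture (hypothetical lookup table).
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22}

def destination_port(url: str) -> int:
    """Return the port number that would go on the outside of the envelope."""
    parts = urlsplit(url)
    if parts.port is not None:          # an explicit ":port" in the URL wins
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # otherwise use the well-known default

print(destination_port("http://www.example.com"))         # 80
print(destination_port("https://www.example.com"))        # 443
print(destination_port("https://www.example.com:8443/"))  # 8443
```

The redirect from HTTP to HTTPS described above amounts to the browser going from the first call to the second: same domain name, different port.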
871 00:47:54,208 --> 00:47:56,000 So you might think, well, that's not great, 872 00:47:56,000 --> 00:47:58,070 because if these numbers are standardized, then 873 00:47:58,070 --> 00:48:03,410 adversaries could maybe try to access my server via those port numbers. 874 00:48:03,410 --> 00:48:05,420 Because they too know what they are. 875 00:48:05,420 --> 00:48:09,380 So you might think, all right, well, let me run my web server on a number 876 00:48:09,380 --> 00:48:12,290 other than 80, or a number other than 443, 877 00:48:12,290 --> 00:48:16,310 and just choose a random number between 0 and 65,000 or so. 878 00:48:16,310 --> 00:48:19,400 Because the odds that the adversary is going to guess that are much lower. 879 00:48:19,400 --> 00:48:21,410 SSH, you might consider doing the same. 880 00:48:21,410 --> 00:48:24,050 But unfortunately, it's all too easy for adversaries 881 00:48:24,050 --> 00:48:27,230 to wage this kind of attack, known as port scanning. 882 00:48:27,230 --> 00:48:30,560 And recall that we did our own brute force 883 00:48:30,560 --> 00:48:33,800 attack in classes past on our own passwords, 884 00:48:33,800 --> 00:48:36,950 for instance, trying to figure out all possible passwords 885 00:48:36,950 --> 00:48:38,600 that might be locking a phone. 886 00:48:38,600 --> 00:48:40,820 And it's not that hard to use code very much 887 00:48:40,820 --> 00:48:45,740 like that with a loop of some sort that just tries every possible port 888 00:48:45,740 --> 00:48:49,130 number within some range, roughly 0 to 65,000. 889 00:48:49,130 --> 00:48:51,710 So port scanning refers to the equivalent 890 00:48:51,710 --> 00:48:58,100 of accessing-- knocking on the door of every possible port number on a server. 891 00:48:58,100 --> 00:49:01,040 Now most of those doors might be closed and no one might be home. 892 00:49:01,040 --> 00:49:06,090 That is, they might not be expecting any traffic or visitors on those numbers.
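The "loop of some sort" described here can be sketched in a few lines of Python as a simple TCP connect scan: for each port, try to complete a connection and record the ones that answer. This is an illustrative sketch, not a real scanning tool; `scan_ports` is a hypothetical name, and the demo only knocks on ports of this same machine (127.0.0.1). For TCP, "roughly 0 to 65,000" is precisely 0 through 65,535.

```python
import socket

def scan_ports(host: str, ports, timeout: float = 0.2) -> list[int]:
    """Knock on each port in turn and return those where someone is home."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 when the TCP handshake succeeds, i.e. a
            # service is listening; any other value means closed or no answer.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    # Demo on this machine only: open a listener on an OS-chosen port,
    # then scan a small window of ports around it.
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        port = listener.getsockname()[1]
        print(scan_ports("127.0.0.1", range(max(1, port - 5), min(65536, port + 6))))
```

Moving a service to an unusual port only changes which iteration of this loop finds it, which is why the lecture calls that approach security through obscurity.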
893 00:49:06,090 --> 00:49:09,410 But by writing software that tries all of those port numbers, 894 00:49:09,410 --> 00:49:13,580 you can essentially discover services that are running on certain computers. 895 00:49:13,580 --> 00:49:16,100 Now, hopefully, that in and of itself is not a problem. 896 00:49:16,100 --> 00:49:18,710 Because most likely, the purpose of these services 897 00:49:18,710 --> 00:49:20,720 is to be on the internet. 898 00:49:20,720 --> 00:49:24,080 But hopefully, these services too are using encryption. 899 00:49:24,080 --> 00:49:26,000 Hopefully, they're using authentication. 900 00:49:26,000 --> 00:49:27,810 But that's not always the case. 901 00:49:27,810 --> 00:49:31,190 So security through obscurity, so to speak, 902 00:49:31,190 --> 00:49:35,930 running services on random or arbitrary, non-standard port numbers, 903 00:49:35,930 --> 00:49:39,680 is not really a good practice unless you're additionally defending 904 00:49:39,680 --> 00:49:41,720 against all of these common attacks. 905 00:49:41,720 --> 00:49:44,900 Now port scanning, though, is 906 00:49:44,900 --> 00:49:48,950 a specific example of what we might call penetration testing more generally. 907 00:49:48,950 --> 00:49:53,600 Penetration testing, or pen testing, is actually a skill, a technique, 908 00:49:53,600 --> 00:49:57,410 a job even, whereby you are, hopefully, not an adversary, 909 00:49:57,410 --> 00:50:00,380 but hopefully, a well-paid consultant whose purpose 910 00:50:00,380 --> 00:50:05,120 in life, or whose vocation in life, is to actually try to penetrate someone's 911 00:50:05,120 --> 00:50:06,890 network or penetrate someone's system. 912 00:50:06,890 --> 00:50:08,100 And what do I mean by that?
913 00:50:08,100 --> 00:50:10,850 Well, maybe, quite simply, you try scanning all of the ports 914 00:50:10,850 --> 00:50:14,600 on their servers just to see if there are ports that are open, 915 00:50:14,600 --> 00:50:16,490 that is, listening, when they shouldn't be. 916 00:50:16,490 --> 00:50:19,760 Because there's no sense in opening a door if no one's meant to go through it. 917 00:50:19,760 --> 00:50:24,360 Or you might try to penetrate their network or system in some other way. 918 00:50:24,360 --> 00:50:27,600 Maybe you might try to brute force your way through passwords. 919 00:50:27,600 --> 00:50:30,500 Maybe you might try to socially engineer the employees 920 00:50:30,500 --> 00:50:31,800 of that company or the like. 921 00:50:31,800 --> 00:50:38,430 So penetration testing is all about someone who, for good purposes, 922 00:50:38,430 --> 00:50:43,120 is trying to find possible weaknesses in your infrastructure. 923 00:50:43,120 --> 00:50:46,470 So that, hopefully, you can pay them and thank them, but then 924 00:50:46,470 --> 00:50:50,310 fix those problems before actual adversaries try to exploit them 925 00:50:50,310 --> 00:50:52,450 for malicious purposes instead. 926 00:50:52,450 --> 00:50:55,980 So this might also be referred to as ethical hacking, where 927 00:50:55,980 --> 00:51:00,450 you get all of the cachet of being a hacker and really good with computers, 928 00:51:00,450 --> 00:51:02,940 but the upside of doing it ethically, which 929 00:51:02,940 --> 00:51:06,600 is to say that if you are in the business of trying to find faults 930 00:51:06,600 --> 00:51:12,960 with systems, you don't have to do it for illegal financial gain, 931 00:51:12,960 --> 00:51:17,280 but for very much legal financial gain instead, as in someone 932 00:51:17,280 --> 00:51:20,790 will pay you to find faults in their system so long as you tell them 933 00:51:20,790 --> 00:51:24,610 and only them first so that they can actually fix the same.
934 00:51:24,610 --> 00:51:27,390 And, in fact, in this world there's a gamification of it 935 00:51:27,390 --> 00:51:29,400 of sorts in certain companies, where you might 936 00:51:29,400 --> 00:51:32,163 have a red team whose purpose in life in this game 937 00:51:32,163 --> 00:51:34,830 is to actually try to penetrate the network or find some faults. 938 00:51:34,830 --> 00:51:38,140 And then there's the blue team, so to speak, whose purpose in life in this story 939 00:51:38,140 --> 00:51:40,800 is to defend the systems against those attacks. 940 00:51:40,800 --> 00:51:44,190 And so that too has often yielded better results for some folks, 941 00:51:44,190 --> 00:51:48,240 given that it helps them find weaknesses before adversaries 942 00:51:48,240 --> 00:51:50,160 who don't work for them do. 943 00:51:50,160 --> 00:51:52,950 So how might you keep these attacks out? 944 00:51:52,950 --> 00:51:56,940 And how might you keep even penetration testing out in a good way 945 00:51:56,940 --> 00:52:00,057 to demonstrate that, you know what, we are actually pretty secure? 946 00:52:00,057 --> 00:52:01,890 Well, there's this technology with which you 947 00:52:01,890 --> 00:52:03,960 might be generally familiar by name, namely 948 00:52:03,960 --> 00:52:07,050 a firewall, which actually comes from the real world. 949 00:52:07,050 --> 00:52:09,750 Typically, in buildings that have multiple stores, 950 00:52:09,750 --> 00:52:14,040 there might literally be a firewall between two of the stores 951 00:52:14,040 --> 00:52:16,890 so that if there's a fire in one store, it doesn't somehow 952 00:52:16,890 --> 00:52:19,320 propagate next door to the other store.
953 00:52:19,320 --> 00:52:22,080 Now in the virtual world, a firewall is essentially 954 00:52:22,080 --> 00:52:27,630 a piece of software between you and the outside world or between you 955 00:52:27,630 --> 00:52:30,840 and some other network that keeps data in the network 956 00:52:30,840 --> 00:52:34,440 that you don't want to leave it, and it keeps out of the network data 957 00:52:34,440 --> 00:52:36,310 that you don't want coming in. 958 00:52:36,310 --> 00:52:40,200 So, for instance, if you, within your company, or university, or home, 959 00:52:40,200 --> 00:52:44,190 have some sort of local chat service, or intercom system, or the like, 960 00:52:44,190 --> 00:52:47,740 none of that traffic ideally should end up on the public internet. 961 00:52:47,740 --> 00:52:52,920 So you might want your firewall to block any intentional or accidental attempts 962 00:52:52,920 --> 00:52:55,500 to transmit that data outside the network. 963 00:52:55,500 --> 00:52:58,410 Conversely, if you have a private network that you only 964 00:52:58,410 --> 00:53:01,740 use for home computing, and watching streaming media, and the like, 965 00:53:01,740 --> 00:53:04,380 you are not yourself a server, and you certainly don't 966 00:53:04,380 --> 00:53:07,500 want random people trying to connect to your laptops, or desktops, 967 00:53:07,500 --> 00:53:10,530 or servers in your home, you might want your firewall 968 00:53:10,530 --> 00:53:13,320 to keep all internet traffic out. 969 00:53:13,320 --> 00:53:15,830 Now with that said, there are some problems 970 00:53:15,830 --> 00:53:17,580 when you want to use services where you do 971 00:53:17,580 --> 00:53:20,320 need to talk to someone in the outside world, 972 00:53:20,320 --> 00:53:22,590 maybe like a Zoom call or the like. 973 00:53:22,590 --> 00:53:25,090 But there are technologies that help mitigate this. 974 00:53:25,090 --> 00:53:27,570 So these firewalls are not necessarily absolute.
975 00:53:27,570 --> 00:53:30,460 You can open them up or poke holes in them, 976 00:53:30,460 --> 00:53:33,390 so to speak, to allow certain services through. 977 00:53:33,390 --> 00:53:35,580 So how might these firewalls actually work? 978 00:53:35,580 --> 00:53:40,530 Well, they might simply block traffic, that is, internet packets, based 979 00:53:40,530 --> 00:53:41,490 on IP address. 980 00:53:41,490 --> 00:53:43,980 Because recall, on the outside of those virtual envelopes 981 00:53:43,980 --> 00:53:47,070 is not just port numbers, but also-- and not quite 982 00:53:47,070 --> 00:53:51,270 the domain name, like www.example.com, on the outside of those envelopes 983 00:53:51,270 --> 00:53:56,010 is actually the unique address of a server to which you're sending a packet 984 00:53:56,010 --> 00:54:01,710 and the unique address of a client that is expecting some response thereto. 985 00:54:01,710 --> 00:54:05,730 So an IP address is just a numeric unique address for, let's say, 986 00:54:05,730 --> 00:54:07,530 every computer on the internet. 987 00:54:07,530 --> 00:54:10,950 It's a bit of a simplification, but it's very similar to the postal addresses 988 00:54:10,950 --> 00:54:14,850 that you and I use to send mail old-school style or postcards 989 00:54:14,850 --> 00:54:16,990 through the postal system as well. 990 00:54:16,990 --> 00:54:18,348 So you could, quite simply-- 991 00:54:18,348 --> 00:54:20,640 if you, as a parent, for instance, don't want your kids 992 00:54:20,640 --> 00:54:23,220 accessing social media within the home, you 993 00:54:23,220 --> 00:54:25,470 could just configure your home's firewall 994 00:54:25,470 --> 00:54:30,690 to prevent access to the IP addresses of known social media sites.
995 00:54:30,690 --> 00:54:35,640 And so if the kids are trying to use the laptops or desktops in the home 996 00:54:35,640 --> 00:54:38,850 to connect to those IP addresses, it would effectively be blocked 997 00:54:38,850 --> 00:54:42,720 and not allowed through, so long as the software in question, the firewall, 998 00:54:42,720 --> 00:54:45,390 knows or can figure out what those IP addresses are. 999 00:54:45,390 --> 00:54:47,160 Now that said, it's not foolproof. 1000 00:54:47,160 --> 00:54:50,160 All it takes is for someone in the home to have some out-of-band device, 1001 00:54:50,160 --> 00:54:52,710 like a cellphone, that uses the mobile phone network. 1002 00:54:52,710 --> 00:54:54,870 And then, of course, you circumvent the firewall 1003 00:54:54,870 --> 00:54:57,660 that might be based only on your home network. 1004 00:54:57,660 --> 00:55:01,500 So you have to keep in mind exactly what it is you're firewalling 1005 00:55:01,500 --> 00:55:03,000 and which networks they're in. 1006 00:55:03,000 --> 00:55:05,010 Now you might not want to block access to sites 1007 00:55:05,010 --> 00:55:08,440 based solely on their IP address, but perhaps based on those port numbers. 1008 00:55:08,440 --> 00:55:12,690 So, for instance, if you wanted to block all internet traffic in or out 1009 00:55:12,690 --> 00:55:14,910 of a network, you could just use your firewall 1010 00:55:14,910 --> 00:55:16,680 to block all of those port numbers. 1011 00:55:16,680 --> 00:55:18,960 But if, wait a minute, you realize that you still 1012 00:55:18,960 --> 00:55:22,920 want to be able to remotely control a computer or server in that network, 1013 00:55:22,920 --> 00:55:26,700 you could open up just one port number, for instance, 22, 1014 00:55:26,700 --> 00:55:29,730 if you want to allow SSH back and forth, or whatever 1015 00:55:29,730 --> 00:55:32,710 port number your preferred VPN software uses instead.
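Both policies described here, blocking by IP address and blocking every port except 22, can be modeled as an ordered list of rules where the first match wins and anything unmatched gets a default action. The sketch below is a hypothetical Python model, not the syntax of any real firewall; the address range 203.0.113.0/24 is a reserved documentation range standing in for "known social media sites".

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass
class Rule:
    action: str                  # "allow" or "block"
    network: str = "0.0.0.0/0"   # destination addresses covered (default: all IPv4)
    port: Optional[int] = None   # destination port, or None for "any port"

    def matches(self, dst_ip: str, dst_port: int) -> bool:
        in_network = ip_address(dst_ip) in ip_network(self.network)
        return in_network and (self.port is None or self.port == dst_port)

def decide(rules, dst_ip: str, dst_port: int, default: str = "block") -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(dst_ip, dst_port):
            return rule.action
    return default

# Hypothetical policy: block one address range outright, let SSH (port 22)
# through to everything else, and block all remaining traffic by default.
RULES = [
    Rule("block", network="203.0.113.0/24"),
    Rule("allow", port=22),
]

print(decide(RULES, "203.0.113.7", 443))   # block (matched the address rule)
print(decide(RULES, "198.51.100.5", 22))   # allow (matched the SSH rule)
print(decide(RULES, "198.51.100.5", 443))  # block (fell through to the default)
```

Rule order matters: because the address rule comes first, even SSH to the blocked range is refused. Swapping the two rules would let port 22 through to that range as well.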
1016 00:53:32,710 --> 00:53:36,780 So you can use firewalls to block traffic based on IP address, 1017 00:53:36,780 --> 00:53:40,710 based on port number, or even, more sophisticatedly, 1018 00:53:40,710 --> 00:53:43,300 via deep packet inspection. 1019 00:53:43,300 --> 00:53:48,000 Which is a big way of saying that even the most sophisticated of firewalls 1020 00:53:48,000 --> 00:53:52,290 can actually open up, theoretically, these virtual envelopes 1021 00:53:52,290 --> 00:53:54,820 and see what's actually inside them. 1022 00:53:54,820 --> 00:53:59,670 And this way, you can even more reliably block certain sites 1023 00:53:59,670 --> 00:54:02,970 by their domain name-- not just their IP address, but by their name. 1024 00:54:02,970 --> 00:54:05,550 You could, for instance, via deep packet inspection 1025 00:54:05,550 --> 00:54:08,627 keep an eye out as to who is emailing whom. 1026 00:54:08,627 --> 00:54:10,710 For instance, corporations that are very concerned 1027 00:54:10,710 --> 00:54:15,360 about their intellectual property ideas and products that they have internally, 1028 00:54:15,360 --> 00:54:17,700 they might use deep packet inspection to make sure 1029 00:54:17,700 --> 00:54:21,720 that you and I are not emailing the press about some new product 1030 00:54:21,720 --> 00:54:23,760 under development, or emailing the competition, 1031 00:54:23,760 --> 00:54:26,160 or anyone in the outside world about some product. 1032 00:54:26,160 --> 00:54:28,740 Because via deep packet inspection, you can pretty much 1033 00:54:28,740 --> 00:54:31,050 look at everything inside of this envelope, 1034 00:54:31,050 --> 00:54:34,290 be it the sender, the receiver, the contents of the message. 1035 00:54:34,290 --> 00:54:38,110 But you can also use this too not just for confidentiality and the like, 1036 00:54:38,110 --> 00:54:42,810 but also, for instance, to check for malware and malicious software. 
1037 00:54:42,810 --> 00:54:46,710 That is to say, maybe attachments that you don't want to allow through. 1038 00:54:46,710 --> 00:54:49,980 And we'll consider exactly what those threats might be in just a moment 1039 00:54:49,980 --> 00:54:50,710 as well. 1040 00:54:50,710 --> 00:54:54,000 Now how might a company, a university, or even a home 1041 00:54:54,000 --> 00:54:57,030 implement this kind of firewalling, or more deeply, 1042 00:54:57,030 --> 00:54:58,710 this deep packet inspection? 1043 00:54:58,710 --> 00:55:02,730 Well we're essentially describing a technology that you would call a proxy. 1044 00:55:02,730 --> 00:55:05,388 And a proxy is very often a server. 1045 00:55:05,388 --> 00:55:07,680 Though it doesn't have to be an actual physical server. 1046 00:55:07,680 --> 00:55:10,200 It can be a piece of software running in your network. 1047 00:55:10,200 --> 00:55:14,910 A proxy is a device that essentially implements a potential machine 1048 00:55:14,910 --> 00:55:16,590 in the middle attack. 1049 00:55:16,590 --> 00:55:19,410 But it's not quite an attack in this way, it's by design. 1050 00:55:19,410 --> 00:55:23,040 A proxy is a device, a server, or a piece of software 1051 00:55:23,040 --> 00:55:25,720 that sits between two other points. 1052 00:55:25,720 --> 00:55:28,890 So in this case, Alice might be someone on the inside of a network. 1053 00:55:28,890 --> 00:55:31,200 Bob might be someone on the outside of a network. 1054 00:55:31,200 --> 00:55:34,530 And Eve, in this case, is eavesdropping, literally, 1055 00:55:34,530 --> 00:55:37,590 because her role in this story is to be that of a proxy. 1056 00:55:37,590 --> 00:55:40,500 And the proxy takes data from one side, and ideally, maybe 1057 00:55:40,500 --> 00:55:41,610 hands it out to the other. 1058 00:55:41,610 --> 00:55:45,210 But this proxy might, indeed, be eavesdropping, might be a little nosey. 
1059 00:55:45,210 --> 00:55:48,990 And so as a packet comes in this way from Alice, it might look in the packet 1060 00:55:48,990 --> 00:55:51,660 and decide, uh-uh, we're not going to send this to Bob. 1061 00:55:51,660 --> 00:55:55,000 And then it might just be dropped or effectively deleted. 1062 00:55:55,000 --> 00:55:58,200 So companies, universities, even home networks 1063 00:55:58,200 --> 00:56:02,130 can use proxies to decide, yes or no, whether 1064 00:56:02,130 --> 00:56:04,270 or not to allow certain traffic through. 1065 00:56:04,270 --> 00:56:07,650 So in that sense, they're very similar to a firewall. 1066 00:56:07,650 --> 00:56:11,220 But sometimes, proxies are something that has to be configured, 1067 00:56:11,220 --> 00:56:12,660 even for your own devices. 1068 00:56:12,660 --> 00:56:15,810 Because there might be only one path out of a company network, 1069 00:56:15,810 --> 00:56:17,880 only one path out of a university. 1070 00:56:17,880 --> 00:56:22,270 But the catch here is that, literally, all of that traffic by design 1071 00:56:22,270 --> 00:56:24,690 now is going through this middle point. 1072 00:56:24,690 --> 00:56:27,280 And here's where things can get troubling, 1073 00:56:27,280 --> 00:56:31,710 especially if you do attend a university or you do work for a company 1074 00:56:31,710 --> 00:56:37,180 where they maybe issued you your laptop, or desktop, or phone. 1075 00:56:37,180 --> 00:56:39,870 So if your company or your school has given you 1076 00:56:39,870 --> 00:56:42,900 a laptop, or desktop, or phone, that might be nice 1077 00:56:42,900 --> 00:56:45,190 that this is one of the perks, to have this hardware. 1078 00:56:45,190 --> 00:56:47,760 But if they have preconfigured it with software, 1079 00:56:47,760 --> 00:56:50,850 realize that there are implications of that.
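Eve's role as a nosy proxy, opening the envelope, reading the domain name inside, and deciding whether to pass it on to Bob, can be sketched in Python for the simplest case: a plaintext HTTP/1.1 request, whose `Host` header names the site. The blocklist and function names are hypothetical. With HTTPS the request is encrypted, so a proxy can only read it this way if it terminates the TLS connection itself.

```python
from typing import Optional

BLOCKED_HOSTS = {"social.example"}  # hypothetical blocklist kept by the proxy

def host_header(request: bytes) -> Optional[str]:
    """Return the Host header of a plaintext HTTP/1.1 request, if present."""
    head = request.split(b"\r\n\r\n", 1)[0]   # headers end at the first blank line
    for line in head.split(b"\r\n")[1:]:      # skip the request line, e.g. "GET / HTTP/1.1"
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", errors="replace")
            # Drop a ":port" suffix (a simplification: IPv6 literals aren't handled).
            return host.split(":")[0].lower()
    return None

def proxy_forward(request: bytes) -> bool:
    """True means pass the packet on to Bob; False means drop it."""
    host = host_header(request)
    return host is not None and host not in BLOCKED_HOSTS

print(proxy_forward(b"GET / HTTP/1.1\r\nHost: social.example\r\n\r\n"))      # False
print(proxy_forward(b"GET / HTTP/1.1\r\nHost: www.example.com:80\r\n\r\n"))  # True
```

This sketch drops requests that lack a `Host` header entirely, a "deny when unsure" choice; a real proxy's policy could differ.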
1080 00:56:50,850 --> 00:56:54,360 You might have your own username and password on that laptop, or desktop, 1081 00:56:54,360 --> 00:56:58,950 or phone, but they might have installed administratively, with root access, 1082 00:56:58,950 --> 00:57:02,980 so to speak, some kind of software that could actually be monitoring everything 1083 00:57:02,980 --> 00:57:04,230 you're doing on that computer. 1084 00:57:04,230 --> 00:57:06,780 And this is not uncommon in the corporate workplace. 1085 00:57:06,780 --> 00:57:09,060 But they can also do things more technically. 1086 00:57:09,060 --> 00:57:13,500 Like, they could install their own certificate authority 1087 00:57:13,500 --> 00:57:17,400 on your laptop, or desktop, or phone, essentially 1088 00:57:17,400 --> 00:57:20,640 adding to the list of certificate authorities 1089 00:57:20,640 --> 00:57:23,310 that Microsoft, and Google, and Apple, and Mozilla 1090 00:57:23,310 --> 00:57:25,860 have baked into their own browsers. 1091 00:57:25,860 --> 00:57:28,620 The implication of this is that even if you 1092 00:57:28,620 --> 00:57:33,660 think that you, as Alice, can securely communicate with Bob, 1093 00:57:33,660 --> 00:57:36,630 and maybe Bob in this story is gmail.com, 1094 00:57:36,630 --> 00:57:42,450 or amazon.com, or facebook.com, even if you think that Alice and Bob can-- 1095 00:57:42,450 --> 00:57:44,820 you, Alice, can communicate securely with Bob 1096 00:57:44,820 --> 00:57:51,630 and establish a connection to https://gmail.com, 1097 00:57:51,630 --> 00:57:56,010 if your browser, that is, your phone, or laptop, or desktop, 1098 00:57:56,010 --> 00:58:00,300 has a certificate installed by your company or university 1099 00:58:00,300 --> 00:58:03,240 that's acting as a CA, a certificate authority, 1100 00:58:03,240 --> 00:58:06,000 essentially your computer could be tricked 1101 00:58:06,000 --> 00:58:10,200 into thinking that you're connecting to the real gmail.com, 1102 00:58:10,200 --> 00:58:13,740 but
you're actually connecting to the company's proxy server. 1103 00:58:13,740 --> 00:58:18,210 But because you have this additional certificate on your computer, 1104 00:58:18,210 --> 00:58:20,970 even though the company is masquerading-- 1105 00:58:20,970 --> 00:58:24,510 pretending to be gmail.com, you're actually 1106 00:58:24,510 --> 00:58:25,920 connected to their proxy server. 1107 00:58:25,920 --> 00:58:28,670 And they might actually be forwarding your traffic somewhere else, 1108 00:58:28,670 --> 00:58:33,000 but this is the very definition of a machine in the middle attack, 1109 00:58:33,000 --> 00:58:37,410 but it's facilitated by someone else having used our technologies that we 1110 00:58:37,410 --> 00:58:41,640 talked about earlier to trick, really, your local device into thinking 1111 00:58:41,640 --> 00:58:45,180 that this is the real gmail.com, whereas the math might actually 1112 00:58:45,180 --> 00:58:48,840 be based on the company's certificate, not on Gmail's own. 1113 00:58:48,840 --> 00:58:50,070 So be mindful of that. 1114 00:58:50,070 --> 00:58:53,070 Again, whenever you're using a device that has left your control 1115 00:58:53,070 --> 00:58:54,930 or has not always been under your control, 1116 00:58:54,930 --> 00:58:59,320 realize that you do not necessarily know what is installed on it. 1117 00:58:59,320 --> 00:59:02,790 So how else might a company, a university, 1118 00:59:02,790 --> 00:59:07,348 be observing or keeping an eye out, either for good or evil purposes? 1119 00:59:07,348 --> 00:59:09,390 Well, they actually might also be doing something 1120 00:59:09,390 --> 00:59:11,430 that we might call URL rewriting.
1121 00:59:11,430 --> 00:59:14,700 So if you have a company or a school email address, 1122 00:59:14,700 --> 00:59:17,340 via which you receive email from the outside world, 1123 00:59:17,340 --> 00:59:22,650 you might notice that whenever an email contains a link, if you hover over 1124 00:59:22,650 --> 00:59:26,200 the link and look in the bottom corner of your browser, 1125 00:59:26,200 --> 00:59:30,270 it might not actually go to the actual link destination 1126 00:59:30,270 --> 00:59:31,470 that you think it does. 1127 00:59:31,470 --> 00:59:34,005 It might actually go to a URL like this-- maybe 1128 00:59:34,005 --> 00:59:40,860 https://example.com?url=something. 1129 00:59:40,860 --> 00:59:44,940 And the implication of this here is that what companies and schools might 1130 00:59:44,940 --> 00:59:49,890 do to combat malware, maybe malicious software that could be accidentally 1131 00:59:49,890 --> 00:59:53,340 installed or sent your way via URLs, or to prevent you 1132 00:59:53,340 --> 00:59:56,520 from accessing phishing websites that are trying to steal your, 1133 00:59:56,520 --> 00:59:58,380 or the company's, or the school's information, 1134 00:59:58,380 --> 01:00:01,500 they might automatically, through some kind of proxy server, 1135 01:00:01,500 --> 01:00:06,360 acting in this case on email, change every URL in an email 1136 01:00:06,360 --> 01:00:11,700 you receive to be example.com and then embed at the end of that example.com 1137 01:00:11,700 --> 01:00:16,440 URL the actual URL, so that you can still reach your destination. 1138 01:00:16,440 --> 01:00:20,070 But the implication of this is that if you click on this link here, 1139 01:00:20,070 --> 01:00:22,530 you're first going to visit example.com.
1140 01:00:22,530 --> 01:00:27,330 That request is going to include the data that the proxy server added, 1141 01:00:27,330 --> 01:00:31,080 the url= something, where something might be gmail.com, amazon.com, 1142 01:00:31,080 --> 01:00:33,030 whatever the actual URL is. 1143 01:00:33,030 --> 01:00:37,440 But because the company or the school in this story controls example.com, 1144 01:00:37,440 --> 01:00:40,950 they know what URL you're trying to visit. 1145 01:00:40,950 --> 01:00:43,680 Now, one, they could minimally just log that information 1146 01:00:43,680 --> 01:00:46,590 and know that, oh, David seems to be visiting Gmail again. 1147 01:00:46,590 --> 01:00:50,910 Or they could actually check in their database: is gmail.com, 1148 01:00:50,910 --> 01:00:52,980 or whatever this URL is, known to be malicious? 1149 01:00:52,980 --> 01:00:54,522 Is it known to be a phishing website? 1150 01:00:54,522 --> 01:00:57,790 And they can just prevent you from accessing it altogether. 1151 01:00:57,790 --> 01:01:01,290 So this is another form of proxying that's actually very explicit, 1152 01:01:01,290 --> 01:01:04,530 because they're embedding the machine in the middle, 1153 01:01:04,530 --> 01:01:08,070 quite literally, as example.com, or whatever your university 1154 01:01:08,070 --> 01:01:10,630 or your company's domain name is. 1155 01:01:10,630 --> 01:01:15,720 But it's all toward the end of trying to help protect you from potentially 1156 01:01:15,720 --> 01:01:18,210 malicious, phishing-type websites. 1157 01:01:18,210 --> 01:01:21,540 But the implication too is that now the company, the school, 1158 01:01:21,540 --> 01:01:26,830 the machine in the middle knows every link you're clicking on as well. 1159 01:01:26,830 --> 01:01:28,950 Let me pause here and see if there are now 1160 01:01:28,950 --> 01:01:32,670 any questions about these techniques of proxying.
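A minimal Python sketch of the URL rewriting just described, assuming a hypothetical proxy that wraps every link as `https://example.com?url=` followed by the percent-encoded original (example.com standing in, as in the lecture, for a company's or school's own domain). The blocklist, the function names, and the log line are illustrative assumptions, not any real product's behavior; together they show why such a proxy can both block known phishing sites and see every link that gets clicked.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical prefix: example.com stands in for the organization's own domain.
PROXY_PREFIX = "https://example.com?url="

# Hypothetical blocklist of hosts the organization knows to be phishing sites.
KNOWN_BAD_HOSTS = {"phish.example.net"}


def rewrite(url: str) -> str:
    """Wrap an original link so that clicking it visits the proxy first."""
    # safe="" percent-encodes every reserved character, so the original URL
    # (including any query string of its own) becomes one opaque value.
    return PROXY_PREFIX + quote(url, safe="")


def unwrap(rewritten: str) -> str:
    """Recover the original destination from the rewritten link."""
    query = urlsplit(rewritten).query
    return parse_qs(query)["url"][0]


def proxy_decision(rewritten: str) -> str:
    """What the proxy might do on each click: log it, then block or redirect."""
    destination = unwrap(rewritten)
    # This log line is the privacy cost: the proxy sees every link clicked.
    print(f"log: user clicked {destination}")
    host = urlsplit(destination).hostname
    return "block" if host in KNOWN_BAD_HOSTS else "redirect"


print(rewrite("https://gmail.com"))  # prints https://example.com?url=https%3A%2F%2Fgmail.com
```

Because `quote(..., safe="")` encodes `:`, `/`, `?`, `&`, and `=`, a destination that carries its own query string survives the round trip through `unwrap` unchanged.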
1161 01:01:32,670 --> 01:01:36,420 AUDIENCE: I just want to know what's the difference between a VPN 1162 01:01:36,420 --> 01:01:41,040 and the Tor network at a higher level? 1163 01:01:41,040 --> 01:01:43,470 Just the difference, because what I do know 1164 01:01:43,470 --> 01:01:48,270 is that these two networks are used to encrypt or anonymize 1165 01:01:48,270 --> 01:01:52,020 our activity or the data that we are sending. 1166 01:01:52,020 --> 01:01:55,650 The second question is, let's say that I'm using a network 1167 01:01:55,650 --> 01:02:01,860 at a university or a company, and they also have their own CA, 1168 01:02:01,860 --> 01:02:04,290 which means that they've got their own databases, 1169 01:02:04,290 --> 01:02:10,170 like which websites I can get into and which websites I cannot get into. 1170 01:02:10,170 --> 01:02:17,430 If I'm using a VPN, does that mean that the university or the company 1171 01:02:17,430 --> 01:02:19,442 knows what I'm doing? 1172 01:02:19,442 --> 01:02:20,650 DAVID MALAN: A good question. 1173 01:02:20,650 --> 01:02:23,880 So Tor is the anonymization software I was alluding to a moment 1174 01:02:23,880 --> 01:02:25,042 ago in my previous answer. 1175 01:02:25,042 --> 01:02:27,000 And we'll talk about that in our final lecture. 1176 01:02:27,000 --> 01:02:29,650 Tor really is about preserving privacy. 1177 01:02:29,650 --> 01:02:32,040 So it's about covering your tracks so that, 1178 01:02:32,040 --> 01:02:34,530 kind of like in the Hollywood movies, when you're here, 1179 01:02:34,530 --> 01:02:38,130 your data is bouncing across all of these different servers in the world, 1180 01:02:38,130 --> 01:02:42,240 and it's difficult to trace it back to its origins, by design. 1181 01:02:42,240 --> 01:02:44,070 So more on that in a couple of weeks' time. 1182 01:02:44,070 --> 01:02:47,925 A VPN is an encrypted connection just between one point, A, 1183 01:02:47,925 --> 01:02:52,380 and another point, B.
So even though others can't see what you are doing, 1184 01:02:52,380 --> 01:02:55,420 it is very obvious, to whoever runs point B, where your data is coming from. 1185 01:02:55,420 --> 01:02:58,110 So legally, for instance, if the VPN server or company 1186 01:02:58,110 --> 01:03:01,350 were to be subpoenaed, they might have to disclose information about you. 1187 01:03:01,350 --> 01:03:03,660 And so it's not quite as privacy preserving. 1188 01:03:03,660 --> 01:03:07,890 It secures your data, but it doesn't preserve your privacy in the same way. 1189 01:03:07,890 --> 01:03:14,010 On your other question about what you're doing, especially when you're on a VPN, 1190 01:03:14,010 --> 01:03:17,670 if someone has installed software with administrative privileges 1191 01:03:17,670 --> 01:03:19,900 on your computer, all bets are off. 1192 01:03:19,900 --> 01:03:21,910 You cannot trust the device. 1193 01:03:21,910 --> 01:03:25,830 So if they've installed their own certificate on your computer or added a CA 1194 01:03:25,830 --> 01:03:28,560 to the list of trusted things, any time you 1195 01:03:28,560 --> 01:03:33,030 use that browser, for instance, you are vulnerable to a machine 1196 01:03:33,030 --> 01:03:39,560 in the middle attack or at least a proxying-type implication. 1197 01:03:39,560 --> 01:03:41,708 Beyond that-- but honestly, at that point, 1198 01:03:41,708 --> 01:03:44,000 if they've installed special software on your computer, 1199 01:03:44,000 --> 01:03:46,910 they could be monitoring everything you do on the internet anyway. 1200 01:03:46,910 --> 01:03:49,260 So all bets are off. 1201 01:03:49,260 --> 01:03:52,730 So what is it we're trying to keep, ultimately, out of our systems? 1202 01:03:52,730 --> 01:03:55,640 Well, I dare say, malware is perhaps one of the biggest threats. 1203 01:03:55,640 --> 01:03:58,490 So malware or malicious software is just software 1204 01:03:58,490 --> 01:04:02,090 that someone has written that can do malicious things.
1205 01:04:02,090 --> 01:04:03,710 And that's the nature of software. 1206 01:04:03,710 --> 01:04:05,930 If you've never programmed before, it turns out 1207 01:04:05,930 --> 01:04:08,300 that it's not that hard to write a program that, if you 1208 01:04:08,300 --> 01:04:11,300 run it, deletes all of the files on your computer, 1209 01:04:11,300 --> 01:04:14,510 or maybe starts sending out spam, or maybe starts mining bitcoin, 1210 01:04:14,510 --> 01:04:15,800 or anything else. 1211 01:04:15,800 --> 01:04:17,480 Software can do anything. 1212 01:04:17,480 --> 01:04:20,840 And whether it's malicious or not is really up to the human who wrote it 1213 01:04:20,840 --> 01:04:22,550 or the person who's using it. 1214 01:04:22,550 --> 01:04:24,890 Now you might have heard of specific types of malware, 1215 01:04:24,890 --> 01:04:26,210 for instance, a virus. 1216 01:04:26,210 --> 01:04:30,200 A virus is a piece of software that attaches itself to a host, 1217 01:04:30,200 --> 01:04:32,630 just like in the human physiological world. 1218 01:04:32,630 --> 01:04:36,290 The catch with a virus in the digital world is that you, the human, 1219 01:04:36,290 --> 01:04:39,320 usually have to do something to get infected. 1220 01:04:39,320 --> 01:04:42,447 You have to open a file that's infected with the virus, 1221 01:04:42,447 --> 01:04:45,530 and start running it on your computer, and loading it into your computer's 1222 01:04:45,530 --> 01:04:47,990 memory and CPU or brain. 1223 01:04:47,990 --> 01:04:52,650 You have to click on an attachment in an email that's perhaps infected. 1224 01:04:52,650 --> 01:04:55,970 So viruses generally require human intervention 1225 01:04:55,970 --> 01:04:57,560 and really, human mistakes. 1226 01:04:57,560 --> 01:05:00,410 You're exposing yourself unintentionally to a piece 1227 01:05:00,410 --> 01:05:02,210 of software that can now do anything.
1228 01:05:02,210 --> 01:05:05,030 And a virus can literally do anything that a piece of software can. 1229 01:05:05,030 --> 01:05:07,130 So it can delete all the files on your hard drive. 1230 01:05:07,130 --> 01:05:07,923 It can send spam. 1231 01:05:07,923 --> 01:05:09,090 It can start bitcoin mining. 1232 01:05:09,090 --> 01:05:11,752 It can email all of your files to an adversary. 1233 01:05:11,752 --> 01:05:14,210 Once you have a piece of software running on your computer, 1234 01:05:14,210 --> 01:05:15,600 all bets are off. 1235 01:05:15,600 --> 01:05:20,480 So you might wonder then, well, what's the line between a virus and Microsoft 1236 01:05:20,480 --> 01:05:25,640 Word, or Spotify, or some other piece of software you install on your computer? 1237 01:05:25,640 --> 01:05:28,500 Really, it's ethics, at the end of the day. 1238 01:05:28,500 --> 01:05:32,930 We are trusting that Microsoft Word, and Spotify, and other software you 1239 01:05:32,930 --> 01:05:35,540 might install on your Mac, your PC, or your phone, 1240 01:05:35,540 --> 01:05:37,742 just isn't doing bad things. 1241 01:05:37,742 --> 01:05:40,700 Because very often, once you have a piece of software on your computer, 1242 01:05:40,700 --> 01:05:44,930 it technically can do anything that the operating system-- 1243 01:05:44,930 --> 01:05:47,720 Windows, or macOS, or iOS, or Android-- 1244 01:05:47,720 --> 01:05:50,960 makes possible, thanks to those manufacturers. 1245 01:05:50,960 --> 01:05:53,480 And it really is a code of ethics. 1246 01:05:53,480 --> 01:05:56,030 It really is, perhaps, capitalistic pressures 1247 01:05:56,030 --> 01:05:59,780 that ensure that companies aren't necessarily infecting us 1248 01:05:59,780 --> 01:06:01,370 with software that's doing bad things. 1249 01:06:01,370 --> 01:06:01,870 Why? 1250 01:06:01,870 --> 01:06:04,220 It's probably bad for business, if nothing else, 1251 01:06:04,220 --> 01:06:06,260 if they're caught doing something malicious.
1252 01:06:06,260 --> 01:06:10,693 But with that said, historically it's been quite possible for software 1253 01:06:10,693 --> 01:06:13,610 that does bad things to be written, even within the constraints 1254 01:06:13,610 --> 01:06:14,660 of an operating system. 1255 01:06:14,660 --> 01:06:16,670 And heck, maybe it's even accidental. 1256 01:06:16,670 --> 01:06:19,940 It has absolutely been the case that sometimes software deletes things 1257 01:06:19,940 --> 01:06:22,800 that it shouldn't because some human made a mistake. 1258 01:06:22,800 --> 01:06:25,490 Now with that said, gradually the world is getting better 1259 01:06:25,490 --> 01:06:29,930 at designing operating systems that try to sandbox things. 1260 01:06:29,930 --> 01:06:32,960 iOS, which runs on iPhones, iPads, and the like, 1261 01:06:32,960 --> 01:06:35,630 is actually particularly good at this, whereby, 1262 01:06:35,630 --> 01:06:38,780 sometimes annoyingly, apps can't do something 1263 01:06:38,780 --> 01:06:40,730 without your explicit permission. 1264 01:06:40,730 --> 01:06:43,700 Now you and I might not give much thought to just saying OK, OK, OK, 1265 01:06:43,700 --> 01:06:45,690 because we want the software to do its thing. 1266 01:06:45,690 --> 01:06:50,030 But more so than with operating systems past, you and I are increasingly 1267 01:06:50,030 --> 01:06:53,630 being allowed to weigh in on whether some piece of software 1268 01:06:53,630 --> 01:06:57,260 can use the network, can turn on the camera, can turn on the microphone. 1269 01:06:57,260 --> 01:06:59,930 So thankfully, we're getting more and more building blocks 1270 01:06:59,930 --> 01:07:02,010 to mitigate some of these concerns. 1271 01:07:02,010 --> 01:07:03,830 But there are still viruses in the world.
1272 01:07:03,830 --> 01:07:06,290 And you can be infected by them whether it 1273 01:07:06,290 --> 01:07:09,950 is some file you've downloaded and run or some email attachment you've 1274 01:07:09,950 --> 01:07:10,460 clicked. 1275 01:07:10,460 --> 01:07:14,990 But more worrisome is another type of malware that we might call a worm. 1276 01:07:14,990 --> 01:07:17,418 And a worm is very similar in spirit to a virus, 1277 01:07:17,418 --> 01:07:18,960 in that it's just malicious software. 1278 01:07:18,960 --> 01:07:19,910 It can do anything. 1279 01:07:19,910 --> 01:07:23,930 But these things can travel from computer to computer, even 1280 01:07:23,930 --> 01:07:26,690 without interaction by humans. 1281 01:07:26,690 --> 01:07:27,990 Now how does this work? 1282 01:07:27,990 --> 01:07:31,610 Well, a worm, theoretically, once it's installed on one computer, 1283 01:07:31,610 --> 01:07:34,160 having infected it and now running, well, it could 1284 01:07:34,160 --> 01:07:36,770 use that technique called port scanning from earlier. 1285 01:07:36,770 --> 01:07:41,180 And maybe it can use that infected computer's internet connection 1286 01:07:41,180 --> 01:07:44,900 and just start scanning the local network or a broader network 1287 01:07:44,900 --> 01:07:49,290 for IP addresses of other computers and ports of other computers. 1288 01:07:49,290 --> 01:07:53,270 And if one of those computers' ports happens to be listening, 1289 01:07:53,270 --> 01:07:58,760 and for unfortunate reasons, that computer is not using encryption, 1290 01:07:58,760 --> 01:08:02,750 it's not using authentication, and it's vulnerable somehow, theoretically 1291 01:08:02,750 --> 01:08:06,230 that worm can travel from computer, to computer, to computer 1292 01:08:06,230 --> 01:08:11,130 by making these network connections via these ports at these IP addresses. 1293 01:08:11,130 --> 01:08:13,010 And that is quite often how they have spread.
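To make "port scanning" concrete, here is a sketch of its smallest unit, a single TCP connection attempt, using only Python's standard `socket` module. It is the same check an administrator might run against their own machines to find services that are listening and therefore reachable; a worm, as described above, would simply repeat it across many addresses and ports. The function name, the timeout, and the choice of ports are illustrative assumptions.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts a TCP connection at host:port."""
    try:
        # create_connection completes the full TCP handshake or raises OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable.
        return False


# Example: audit a few well-known ports on this machine only.
for port in (22, 80, 443):
    state = "listening" if is_listening("127.0.0.1", port) else "closed"
    print(f"127.0.0.1:{port} {state}")
```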
1294 01:08:13,010 --> 01:08:15,830 It's the result of mistakes in software. 1295 01:08:15,830 --> 01:08:20,390 It's the result of there having been holes in our firewalls, in our systems, 1296 01:08:20,390 --> 01:08:23,600 that let them through via these techniques. 1297 01:08:23,600 --> 01:08:25,790 So what's the downside really? 1298 01:08:25,790 --> 01:08:28,880 Well, beyond just wreaking havoc on your own computer, 1299 01:08:28,880 --> 01:08:32,960 by deleting all of your files, spamming people, bitcoin mining, and the like, 1300 01:08:32,960 --> 01:08:36,819 actually what adversaries have increasingly been doing over the years 1301 01:08:36,819 --> 01:08:38,950 is creating botnets, so to speak. 1302 01:08:38,950 --> 01:08:42,189 That is, it turns out it's more valuable to an adversary 1303 01:08:42,189 --> 01:08:46,600 not to completely disable your system, because that doesn't really serve them 1304 01:08:46,600 --> 01:08:51,010 long term, but maybe to install software on your computer that's 1305 01:08:51,010 --> 01:08:52,390 just constantly running. 1306 01:08:52,390 --> 01:08:54,700 And it's not actually doing anything bad to you. 1307 01:08:54,700 --> 01:08:58,359 None of your files are deleted, no spam is being sent, but you are infected. 1308 01:08:58,359 --> 01:09:01,569 And there's just some piece of software constantly running on your computer. 1309 01:09:01,569 --> 01:09:03,939 But maybe this adversary has somehow figured out 1310 01:09:03,939 --> 01:09:06,460 how to infect not just your computer, but my computer, 1311 01:09:06,460 --> 01:09:09,480 and your computer, and your computer, and your computer, and hundreds, 1312 01:09:09,480 --> 01:09:11,470 thousands of other computers in the world.
1313 01:09:11,470 --> 01:09:17,500 Now, if this software is smart and it's constantly listening for commands, 1314 01:09:17,500 --> 01:09:21,220 an attacker can send some kind of commands, 1315 01:09:21,220 --> 01:09:25,930 not unlike those virtual envelopes, to this entire botnet of computers 1316 01:09:25,930 --> 01:09:30,700 and say, OK computers, now everyone at the same moment 1317 01:09:30,700 --> 01:09:34,090 start attacking some server, or everyone at this moment 1318 01:09:34,090 --> 01:09:37,420 start sending emails, everyone at this moment start mining bitcoin. 1319 01:09:37,420 --> 01:09:41,649 And so you can leverage the network of hundreds or thousands of computers 1320 01:09:41,649 --> 01:09:46,840 all at once and therefore make a much more powerful attack possible. 1321 01:09:46,840 --> 01:09:48,850 And what form do those attacks take? 1322 01:09:48,850 --> 01:09:53,200 Well, quite often what we'd call a denial of service attack, or DoS. 1323 01:09:53,200 --> 01:09:55,150 And this is exactly as the name suggests. 1324 01:09:55,150 --> 01:10:00,080 Sometimes an adversary's goal in life is just to deny service to everyone else, 1325 01:10:00,080 --> 01:10:03,070 whether it's for political reasons, financial reasons, or the like. 1326 01:10:03,070 --> 01:10:07,150 It's one thing for me on my laptop or my phone to maybe visit-- 1327 01:10:07,150 --> 01:10:10,630 maybe I'm a little annoyed at Google today, so I go to google.com, 1328 01:10:10,630 --> 01:10:14,230 and then I keep hitting reload, reload, reload, reload-- or faster-- reload, 1329 01:10:14,230 --> 01:10:15,490 reload, reload, reload.
1330 01:10:15,490 --> 01:10:18,250 I'm trying to deny service to other people, 1331 01:10:18,250 --> 01:10:21,070 but realistically, Google has way more resources than me, 1332 01:10:21,070 --> 01:10:23,830 so a denial of service attack only really 1333 01:10:23,830 --> 01:10:27,370 works if you have a lot of resources yourself, more resources 1334 01:10:27,370 --> 01:10:29,530 than the site you're attacking. 1335 01:10:29,530 --> 01:10:33,100 But with botnets, when you control multiple computers, 1336 01:10:33,100 --> 01:10:37,120 you can actually wage a distributed denial of service attack, 1337 01:10:37,120 --> 01:10:39,880 whereby again, you send a command to this whole network 1338 01:10:39,880 --> 01:10:43,750 of infected computers, and you say, all right, everyone, go visit google.com 1339 01:10:43,750 --> 01:10:45,670 right now, and reload, reload, reload. 1340 01:10:45,670 --> 01:10:49,840 And when it's hundreds or thousands of computers doing it to Google, or maybe 1341 01:10:49,840 --> 01:10:52,480 not someone as big as Google, but maybe a small business 1342 01:10:52,480 --> 01:10:54,910 that they're annoyed at or they're competing with, 1343 01:10:54,910 --> 01:11:00,220 you can via a distributed network try to deny service to actual customers 1344 01:11:00,220 --> 01:11:01,360 or users of that site. 1345 01:11:01,360 --> 01:11:01,930 Why? 1346 01:11:01,930 --> 01:11:05,740 Well, a computer, long story short, only has so much memory, 1347 01:11:05,740 --> 01:11:08,300 only can do so much per unit of time. 1348 01:11:08,300 --> 01:11:11,140 And if you completely distract a computer or server 1349 01:11:11,140 --> 01:11:15,580 with all of these requests that are bogus, then the good requests from the real 1350 01:11:15,580 --> 01:11:19,240 users, the real customers, can't necessarily squeeze in. 1351 01:11:19,240 --> 01:11:21,820 And so you're denying service to other people. 1352 01:11:21,820 --> 01:11:24,880 And the distributed kind is particularly malicious.
1353 01:11:24,880 --> 01:11:25,420 Why? 1354 01:11:25,420 --> 01:11:28,420 Well, once a company like Google figures out what's happening, 1355 01:11:28,420 --> 01:11:30,580 they could just block with their firewall 1356 01:11:30,580 --> 01:11:33,910 my IP address, for instance, coming from my phone, or my laptop, 1357 01:11:33,910 --> 01:11:34,780 or other device. 1358 01:11:34,780 --> 01:11:37,600 Once they know where the attack is coming from, they can just deny 1359 01:11:37,600 --> 01:11:39,160 or they can block service there. 1360 01:11:39,160 --> 01:11:43,780 But if it's distributed, if it's coming from all of us or thousands of people, 1361 01:11:43,780 --> 01:11:47,410 thousands of IP addresses, then it gets a little harder. 1362 01:11:47,410 --> 01:11:51,700 Technically, they could just block all of our IP addresses, thousands of IPs. 1363 01:11:51,700 --> 01:11:56,590 But very often, you and I share IP addresses if we're on the same campus, 1364 01:11:56,590 --> 01:11:57,940 on the same corporate network. 1365 01:11:57,940 --> 01:12:01,900 Even though locally we might have unique addresses, to the outside world 1366 01:12:01,900 --> 01:12:06,790 we might all share one public address from our whole company or school. 1367 01:12:06,790 --> 01:12:11,200 And at some point, it's not going to be in Google's best interest in this story 1368 01:12:11,200 --> 01:12:13,210 to block all of those IP addresses. 1369 01:12:13,210 --> 01:12:17,410 Because then they might be denying service to actual good people on this campus, 1370 01:12:17,410 --> 01:12:22,090 at this company, who just happened to be on the same network as this attacker 1371 01:12:22,090 --> 01:12:24,280 or as this infected computer. 1372 01:12:24,280 --> 01:12:28,660 And so that's really the value of attacking computers nowadays. 1373 01:12:28,660 --> 01:12:31,600 It's not just about getting at your computer individually. 1374 01:12:31,600 --> 01:12:33,460 It's what your computer represents.
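A sketch of why blocking by IP address is a blunt tool, assuming a hypothetical server that rate-limits each source IP with a sliding window. Every user behind one shared public address draws from the same budget, so throttling an attacker there also throttles their innocent neighbors, while a botnet spread across thousands of addresses can keep each one under the limit. The class name, limit, window, and documentation-range IP addresses are illustrative, not any particular firewall's configuration.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most `limit` requests per `window` seconds from each source IP."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # Source IP -> timestamps of that IP's recently accepted requests.
        self.history: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.history[ip]
        # Drop timestamps that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # Over budget: deny, whoever is behind this IP.
        recent.append(now)
        return True


# A campus NAT address (203.0.113.7) shared by an attacker and a legitimate user.
limiter = PerIpRateLimiter(limit=3, window=1.0)
for _ in range(3):
    limiter.allow("203.0.113.7", now=0.0)  # the attacker's requests use up the budget
print(limiter.allow("203.0.113.7", now=0.5))   # legitimate user, same IP: False
print(limiter.allow("198.51.100.2", now=0.5))  # a different IP: True
```

Passing `now` explicitly keeps the example deterministic; in a real server it would default to the monotonic clock.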
1375 01:12:33,460 --> 01:12:38,410 You are a node, a potential ally on a network of systems. 1376 01:12:38,410 --> 01:12:41,920 So it's just as well that we try to keep this kind of software 1377 01:12:41,920 --> 01:12:45,250 out altogether instead. 1378 01:12:45,250 --> 01:12:45,760 So how? 1379 01:12:45,760 --> 01:12:46,810 How do you keep it out? 1380 01:12:46,810 --> 01:12:48,935 Well, for years you've probably heard about-- maybe 1381 01:12:48,935 --> 01:12:50,740 you've been using-- antivirus software. 1382 01:12:50,740 --> 01:12:52,300 And that's exactly what it does. 1383 01:12:52,300 --> 01:12:55,180 Antivirus software is software you either download for free 1384 01:12:55,180 --> 01:12:57,460 or maybe pay for, and install on your computer. 1385 01:12:57,460 --> 01:12:59,560 And it's generally constantly running. 1386 01:12:59,560 --> 01:13:03,250 Or maybe it runs on a schedule to check: are there any known viruses 1387 01:13:03,250 --> 01:13:06,880 or worms on this computer? And if so, let's delete them, or let's somehow 1388 01:13:06,880 --> 01:13:07,828 remove them safely. 1389 01:13:07,828 --> 01:13:10,120 And maybe you have to reboot to get them out of memory, 1390 01:13:10,120 --> 01:13:12,020 but then maybe you're back in business. 1391 01:13:12,020 --> 01:13:13,635 To be fair, that might be too late, 1392 01:13:13,635 --> 01:13:15,760 because the emails-- the spam might have been sent, 1393 01:13:15,760 --> 01:13:17,230 the files might have been deleted. 1394 01:13:17,230 --> 01:13:22,700 But it at least gets the malware out of there for the future. 1395 01:13:22,700 --> 01:13:25,570 But there's a problem with antivirus software 1396 01:13:25,570 --> 01:13:32,050 alone, in that it has to actually be current for the attacks 1397 01:13:32,050 --> 01:13:33,100 that you're facing.
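A minimal sketch of the simplest form of signature matching, assuming a hypothetical database of SHA-256 digests of files already known to be malicious. Real antivirus engines use far richer signatures and heuristics; this shows only why the software "has to be current": a file is flagged only if its signature is already in the database, so a brand-new or slightly altered piece of malware slips through until the definitions are updated. The names and sample bytes are illustrative.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of files already known to be malicious.
KNOWN_MALWARE_SHA256 = {
    hashlib.sha256(b"pretend this is a known virus").hexdigest(),
}


def is_known_malware(contents: bytes) -> bool:
    """Flag a file only if its exact digest is already in the database."""
    return hashlib.sha256(contents).hexdigest() in KNOWN_MALWARE_SHA256


def update_signatures(new_digests: set[str]) -> None:
    """Stand-in for downloading new antivirus definitions."""
    KNOWN_MALWARE_SHA256.update(new_digests)


original = b"pretend this is a known virus"
variant = original + b"!"  # a one-byte change yields a completely different digest
print(is_known_malware(original))  # True
print(is_known_malware(variant))   # False until the definitions are updated
update_signatures({hashlib.sha256(variant).hexdigest()})
print(is_known_malware(variant))   # True
```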
1398 01:13:33,100 --> 01:13:35,800 That is to say, for the antivirus software to work, 1399 01:13:35,800 --> 01:13:38,620 it has to know about the virus, it has to know about the worm. 1400 01:13:38,620 --> 01:13:40,750 Now how can you make sure you're always current? 1401 01:13:40,750 --> 01:13:44,530 Well, you can enable automatic updates with your antivirus software or even 1402 01:13:44,530 --> 01:13:46,450 your operating system more generally. 1403 01:13:46,450 --> 01:13:49,030 And in recent years, the world has realized 1404 01:13:49,030 --> 01:13:52,000 that even though there are downsides to automatic updates, 1405 01:13:52,000 --> 01:13:56,080 they're generally proving to be, it seems, a net positive for society. 1406 01:13:56,080 --> 01:13:56,620 Why? 1407 01:13:56,620 --> 01:13:59,260 Because they ensure that you, and I, and everyone else 1408 01:13:59,260 --> 01:14:03,010 are generally running the latest versions of software, which generally 1409 01:14:03,010 --> 01:14:07,480 means security holes, that is, security-related bugs 1410 01:14:07,480 --> 01:14:08,750 in previous versions, have been fixed. 1411 01:14:08,750 --> 01:14:12,233 So at least we're not vulnerable to yesterday's mistakes. 1412 01:14:12,233 --> 01:14:14,650 We're still vulnerable to today's and tomorrow's mistakes, 1413 01:14:14,650 --> 01:14:16,817 as people continue to write software that's buggy. 1414 01:14:16,817 --> 01:14:18,280 But at least we're staying current. 1415 01:14:18,280 --> 01:14:22,630 And this also ensures that companies can focus their resources generally 1416 01:14:22,630 --> 01:14:24,652 on the newest versions of software, and they 1417 01:14:24,652 --> 01:14:26,860 don't have to worry about being backwards compatible, 1418 01:14:26,860 --> 01:14:30,190 and spending time, and effort, and distraction on software that 1419 01:14:30,190 --> 01:14:32,240 might be older, and older, and older.
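Automatic updates need not reach every device at once. One way to stage them, sketched here as an assumption rather than any vendor's actual mechanism, is to hash each device's identifier into a bucket from 0 to 99 and offer the update only to buckets below a rollout percentage that is raised gradually. Because each device's bucket is stable, raising the percentage only ever adds devices, so a bad update can be caught while it has reached only a few. The device identifiers and function names are hypothetical.

```python
import hashlib


def rollout_bucket(device_id: str) -> int:
    """Map a device to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_update(device_id: str, rollout_percent: int) -> bool:
    """Offer the update only to devices whose bucket is below the current percentage."""
    return rollout_bucket(device_id) < rollout_percent


devices = [f"device-{i}" for i in range(1000)]
for percent in (1, 10, 50, 100):
    offered = sum(should_update(d, percent) for d in devices)
    print(f"{percent:3d}% rollout -> {offered} of {len(devices)} devices")
```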
1420 01:14:32,240 --> 01:14:36,160 So enabling automatic updates is generally proving to be a good thing, 1421 01:14:36,160 --> 01:14:37,090 I daresay. 1422 01:14:37,090 --> 01:14:38,470 But there are downsides. 1423 01:14:38,470 --> 01:14:41,440 Google, Microsoft, Apple, and others, they're not perfect. 1424 01:14:41,440 --> 01:14:44,950 And it has definitely been the case that these companies have released 1425 01:14:44,950 --> 01:14:48,700 an update that actually breaks your computer or mine, in the sense 1426 01:14:48,700 --> 01:14:51,020 that now I can't access it or something went wrong. 1427 01:14:51,020 --> 01:14:53,470 And so generally, automatic updates are not something 1428 01:14:53,470 --> 01:14:57,820 you push to all of your customers all at once, but to maybe a few, then a few more, 1429 01:14:57,820 --> 01:15:01,150 just to make sure that we're not going to break or brick 1430 01:15:01,150 --> 01:15:02,980 a whole lot of our users' computers. 1431 01:15:02,980 --> 01:15:05,590 But generally speaking, turning on automatic updates 1432 01:15:05,590 --> 01:15:09,010 will at least ensure that you're not vulnerable to problems 1433 01:15:09,010 --> 01:15:10,730 the world has already solved. 1434 01:15:10,730 --> 01:15:14,560 And this is a good thing because there's nothing worse than realizing, oh, I've 1435 01:15:14,560 --> 01:15:17,230 been attacked, and there was something I could have done about it. 1436 01:15:17,230 --> 01:15:19,480 I could have updated my software already. 1437 01:15:19,480 --> 01:15:22,540 But the catch is that there are also these attacks known 1438 01:15:22,540 --> 01:15:24,100 as zero-day attacks.
1439 01:15:24,100 --> 01:15:27,640 And the problem with antivirus software in general, and even automatic updates, 1440 01:15:27,640 --> 01:15:31,360 is that there are still humans involved in this process, whereby 1441 01:15:31,360 --> 01:15:34,210 they have to realize, oh, there's a new virus in the world, 1442 01:15:34,210 --> 01:15:37,150 oh, there's a new worm in the world, oh, we made a mistake. 1443 01:15:37,150 --> 01:15:38,200 You have to fix it. 1444 01:15:38,200 --> 01:15:42,280 You have to update the antivirus software to detect those new threats, 1445 01:15:42,280 --> 01:15:45,280 those new viruses, those new worms. 1446 01:15:45,280 --> 01:15:48,610 So a zero-day attack is an example of an attack 1447 01:15:48,610 --> 01:15:52,870 where an adversary maybe writes their own virus, their own worm, 1448 01:15:52,870 --> 01:15:55,930 gets it out into the wild, maybe on enough computers 1449 01:15:55,930 --> 01:15:58,870 that they can do something particularly destructive or valuable 1450 01:15:58,870 --> 01:16:03,320 for them with it, and the world just doesn't have time to catch up. 1451 01:16:03,320 --> 01:16:06,970 So even if you have antivirus software installed and automatic updates enabled, 1452 01:16:06,970 --> 01:16:11,830 it might still take a day or a week for the companies who design that software 1453 01:16:11,830 --> 01:16:14,120 to update those products for you. 1454 01:16:14,120 --> 01:16:16,340 So even then, you're still vulnerable. 1455 01:16:16,340 --> 01:16:18,400 And that's why security really is going to be 1456 01:16:18,400 --> 01:16:21,730 this multipronged approach, especially when it comes to our systems. 1457 01:16:21,730 --> 01:16:24,070 It's not enough to just use antivirus software. 1458 01:16:24,070 --> 01:16:26,110 It's not enough just to have a good password.
1459 01:16:26,110 --> 01:16:30,820 It really is a layered defense, so that you create this gauntlet of defenses 1460 01:16:30,820 --> 01:16:33,250 that adversaries ultimately have to get through. 1461 01:16:33,250 --> 01:16:35,380 And if they get through one, hopefully you're fine. 1462 01:16:35,380 --> 01:16:37,450 If they get through two, hopefully you're still fine. 1463 01:16:37,450 --> 01:16:40,117 If they get through three, maybe then you should start worrying. 1464 01:16:40,117 --> 01:16:42,370 But security really isn't absolute. 1465 01:16:42,370 --> 01:16:45,580 Recall that, from where we began, we really just want to raise the bar, 1466 01:16:45,580 --> 01:16:49,090 raise the cost, raise the risk to the adversary 1467 01:16:49,090 --> 01:16:52,750 so that, again, they hopefully lose interest in little old me. 1468 01:16:52,750 --> 01:16:55,270 So that's it for today's focus on systems. 1469 01:16:55,270 --> 01:16:58,690 Hereafter, we'll focus on software, specifically on what you can do, 1470 01:16:58,690 --> 01:17:00,910 whether you use or write software. 1471 01:17:00,910 --> 01:17:03,940 And thereafter, we'll take a turn to privacy as well. 1472 01:17:03,940 --> 01:17:06,570 All that next time. 1473 01:17:06,570 --> 01:17:08,000
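The layered-defense argument from the end of the lecture can be put in rough numbers. Assuming, optimistically, that each layer is bypassed independently with some probability, the chance an adversary gets through all of them is the product of those probabilities; with three layers that each fail 10% of the time, that is about 0.001, or one in a thousand. Real layers are rarely independent (one stolen administrator password might defeat several at once), so treat this as an idealized sketch; the layer names and numbers are hypothetical.

```python
from math import prod


def breach_probability(bypass_chances: list[float]) -> float:
    """Chance an adversary gets through every layer, assuming layers fail independently."""
    return prod(bypass_chances)


# Hypothetical layers: firewall, up-to-date antivirus, strong password.
layers = [0.1, 0.1, 0.1]
print(f"{breach_probability(layers):.3f}")  # prints 0.001
```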