1 00:00:00,000 --> 00:00:02,928 [MUSIC PLAYING] 2 00:00:02,928 --> 00:00:16,987 3 00:00:16,987 --> 00:00:18,070 DAVID J. MALAN: All right. 4 00:00:18,070 --> 00:00:21,130 This is CS50's Introduction to Cybersecurity. 5 00:00:21,130 --> 00:00:22,360 My name is David Malan. 6 00:00:22,360 --> 00:00:25,480 And this week, let's focus on preserving privacy. 7 00:00:25,480 --> 00:00:27,340 Indeed, over the past several weeks, we've 8 00:00:27,340 --> 00:00:32,350 focused on securing your accounts, your data, your systems, your software. 9 00:00:32,350 --> 00:00:36,640 And all of that is really about keeping communications between points A and B, 10 00:00:36,640 --> 00:00:39,310 for instance, secure, so that no one in between 11 00:00:39,310 --> 00:00:42,070 can actually access the information you're trying to share. 12 00:00:42,070 --> 00:00:47,090 But what if you, A, don't even want B to have some of that information? 13 00:00:47,090 --> 00:00:49,990 So indeed, today, let's focus on some of the technologies 14 00:00:49,990 --> 00:00:52,420 that you and I use every day and some of the technologies 15 00:00:52,420 --> 00:00:55,420 that underlie the software, and applications, 16 00:00:55,420 --> 00:00:58,150 and more that you and I are going to use tomorrow and beyond 17 00:00:58,150 --> 00:01:02,440 and consider exactly what information we're sharing now, 18 00:01:02,440 --> 00:01:04,599 perhaps, even without our knowledge and also 19 00:01:04,599 --> 00:01:07,660 empower you with certain mechanisms via which you can perhaps 20 00:01:07,660 --> 00:01:09,940 restrict all the more of this information 21 00:01:09,940 --> 00:01:13,220 if you, indeed, do not want to share it beyond yourself. 22 00:01:13,220 --> 00:01:16,360 So let's consider first some of the obvious features 23 00:01:16,360 --> 00:01:19,820 that you and I probably use every day, like your web browsing history. 
24 00:01:19,820 --> 00:01:23,150 Whether you're on a laptop, or desktop, or mobile device, 25 00:01:23,150 --> 00:01:25,520 odds are you know by now that your browser tends 26 00:01:25,520 --> 00:01:29,180 to keep track of pretty much everywhere you go on the World Wide Web. 27 00:01:29,180 --> 00:01:31,640 That is to say, if you click on your URL bar, 28 00:01:31,640 --> 00:01:35,450 you can sometimes browse through the past few URLs that you visited. 29 00:01:35,450 --> 00:01:38,338 If you go up to your browser's history via some menu, 30 00:01:38,338 --> 00:01:40,130 you can probably see everything you've done 31 00:01:40,130 --> 00:01:43,130 earlier today, yesterday, last week, last year, 32 00:01:43,130 --> 00:01:45,420 or perhaps, even the entirety of your history, 33 00:01:45,420 --> 00:01:48,810 particularly, if you're logging into your Google account, 34 00:01:48,810 --> 00:01:51,060 Microsoft account, or something else. 35 00:01:51,060 --> 00:01:56,720 So the web browsing history is sort of both a concern when it comes 36 00:01:56,720 --> 00:01:58,755 to your privacy, but also a feature. 37 00:01:58,755 --> 00:02:00,380 Well, let's first consider the feature. 38 00:02:00,380 --> 00:02:01,650 Well, why is that useful? 39 00:02:01,650 --> 00:02:04,640 Well, one, I mean, even I occasionally go back 40 00:02:04,640 --> 00:02:06,440 through my history trying to find some web 41 00:02:06,440 --> 00:02:09,050 page that I know I was looking at earlier in the day, 42 00:02:09,050 --> 00:02:11,750 or yesterday, or some previous time in the past 43 00:02:11,750 --> 00:02:14,650 because it just helps me find information more quickly. 44 00:02:14,650 --> 00:02:16,910 And so in that sense, it might solve a problem for me. 45 00:02:16,910 --> 00:02:20,930 Moreover, you probably have noticed that your web browsing history is often 46 00:02:20,930 --> 00:02:22,700 used for features like autocomplete. 
47 00:02:22,700 --> 00:02:25,580 So when you start typing a URL or maybe even 48 00:02:25,580 --> 00:02:28,310 a keyword that was in the name of a page, 49 00:02:28,310 --> 00:02:31,860 your browser might remember much more quickly what it is you're looking for. 50 00:02:31,860 --> 00:02:33,410 So you can just hit Enter or click. 51 00:02:33,410 --> 00:02:35,660 And voila, you're at that same page. 52 00:02:35,660 --> 00:02:40,070 But of course, this is a concern, potentially, for your privacy, 53 00:02:40,070 --> 00:02:44,360 whereby, you might not want someone else who has physical access to your device 54 00:02:44,360 --> 00:02:46,700 to start poking through where it is you've gone. 55 00:02:46,700 --> 00:02:49,670 You might not want someone else to have access if you just so happen 56 00:02:49,670 --> 00:02:54,200 to visit that website or those websites on maybe a computer in a lab 57 00:02:54,200 --> 00:02:56,790 environment, or an internet cafe, or the like. 58 00:02:56,790 --> 00:02:59,300 So you can imagine quite a few scenarios in which 59 00:02:59,300 --> 00:03:03,350 this is, yes, a feature, but quite a few other scenarios in which this is not 60 00:03:03,350 --> 00:03:06,050 really a desirable feature because it invades 61 00:03:06,050 --> 00:03:08,780 your privacy in some sense, or at least, puts it at risk 62 00:03:08,780 --> 00:03:10,800 for being invaded by someone else. 63 00:03:10,800 --> 00:03:14,630 So we'll consider how we might at least sanitize this history 64 00:03:14,630 --> 00:03:17,420 or remove it altogether in ways that you might already know about. 65 00:03:17,420 --> 00:03:21,350 For instance, you're probably already familiar with some option 66 00:03:21,350 --> 00:03:24,110 in your browser, whereby, you can clear your browser history. 
67 00:03:24,110 --> 00:03:26,750 And that forgets, therefore, all of the places 68 00:03:26,750 --> 00:03:30,560 that you've been, all of the cookies that you might have accumulated, 69 00:03:30,560 --> 00:03:32,450 all of the usernames and passwords that might 70 00:03:32,450 --> 00:03:34,040 have been remembered by your browser. 71 00:03:34,040 --> 00:03:36,650 Although, that tends to be a fairly heavy-handed solution 72 00:03:36,650 --> 00:03:39,230 because when you clear your browser history, assuming you 73 00:03:39,230 --> 00:03:42,110 check all of those boxes, all of it is gone. 74 00:03:42,110 --> 00:03:46,340 And that might mean negatively that you're now logged out of Google, 75 00:03:46,340 --> 00:03:49,012 you're now logged out of Outlook, or some other account 76 00:03:49,012 --> 00:03:50,720 that you actually still want to use, even 77 00:03:50,720 --> 00:03:54,270 if you just wanted to clear your history from something else altogether. 78 00:03:54,270 --> 00:03:57,170 So we'll consider then what else might be 79 00:03:57,170 --> 00:04:00,440 a concern when it comes to your privacy beyond your own browser. 80 00:04:00,440 --> 00:04:04,880 And in fact, it doesn't even matter if you sanitize your own web browsing 81 00:04:04,880 --> 00:04:08,720 history and delete the entirety of it because it turns out 82 00:04:08,720 --> 00:04:12,290 that typically, any website you visit is, itself, 83 00:04:12,290 --> 00:04:16,490 on the server side also keeping track of a lot of that same information. 84 00:04:16,490 --> 00:04:18,959 That is to say that servers typically have logs. 85 00:04:18,959 --> 00:04:21,529 And these are not only for diagnostic purposes. 
86 00:04:21,529 --> 00:04:23,660 In case anything goes wrong, the IT staff 87 00:04:23,660 --> 00:04:26,450 can use those logs to reconstruct history 88 00:04:26,450 --> 00:04:28,640 and figure it out, figure out who was doing what 89 00:04:28,640 --> 00:04:30,920 and when and how that might explain some problem. 90 00:04:30,920 --> 00:04:32,930 It might be used for auditing purposes if they 91 00:04:32,930 --> 00:04:35,690 want to keep track of exactly what was accessed on a system. 92 00:04:35,690 --> 00:04:37,850 It might be used for advertising purposes 93 00:04:37,850 --> 00:04:40,820 or analytical purposes more generally to mine 94 00:04:40,820 --> 00:04:45,020 or analyze that data to figure out how we might monetize it or do something 95 00:04:45,020 --> 00:04:46,730 else with that same information. 96 00:04:46,730 --> 00:04:48,770 But what do we mean concretely when we say 97 00:04:48,770 --> 00:04:51,780 that information is logged on a server? 98 00:04:51,780 --> 00:04:54,350 Well, it's very similar to your own web browsing history, 99 00:04:54,350 --> 00:04:56,060 but it's even more detailed. 100 00:04:56,060 --> 00:05:00,350 So here, for instance, is a representative piece of configuration 101 00:05:00,350 --> 00:05:04,190 that captures what is a very common convention for information 102 00:05:04,190 --> 00:05:06,740 that servers log, web servers, specifically, 103 00:05:06,740 --> 00:05:08,210 when you visit them with a browser. 104 00:05:08,210 --> 00:05:10,290 And I'll highlight just a subset thereof. 105 00:05:10,290 --> 00:05:13,310 This log format, the so-called combined format, 106 00:05:13,310 --> 00:05:16,370 indicates to me that it's very common for a server 107 00:05:16,370 --> 00:05:21,510 when you visit some web page on it that the server will log, that is, remember 108 00:05:21,510 --> 00:05:24,790 your remote address, otherwise known as your IP address. 
109 00:05:24,790 --> 00:05:28,230 It will remember the day and time at which you accessed that page. 110 00:05:28,230 --> 00:05:30,540 It will remember exactly what you requested, 111 00:05:30,540 --> 00:05:33,330 so the name of the file or folder on the server 112 00:05:33,330 --> 00:05:35,920 specifically that you sought to download or look at. 113 00:05:35,920 --> 00:05:39,210 It'll remember the referrer, that is, the URL from which you came. 114 00:05:39,210 --> 00:05:41,640 And it will even remember the user agent that you use, 115 00:05:41,640 --> 00:05:43,710 that is to say your browser. 116 00:05:43,710 --> 00:05:46,560 So perhaps, unbeknownst to you, every time 117 00:05:46,560 --> 00:05:49,740 you use your browser to visit some website, inside 118 00:05:49,740 --> 00:05:54,510 of that virtual envelope is quite a bit more than just the request 119 00:05:54,510 --> 00:05:56,250 that you're making of the browser-- 120 00:05:56,250 --> 00:05:57,510 of the server, rather. 121 00:05:57,510 --> 00:06:00,510 It includes, yes, your IP address on the outside 122 00:06:00,510 --> 00:06:02,640 of the envelope, as we've described it in the past. 123 00:06:02,640 --> 00:06:06,330 It includes some number of HTTP headers, as we've discussed in the past. 124 00:06:06,330 --> 00:06:09,450 But in particular, it includes information 125 00:06:09,450 --> 00:06:13,230 that you might not want being stored on servers in perpetuity 126 00:06:13,230 --> 00:06:16,860 and you have no control over deleting necessarily. 127 00:06:16,860 --> 00:06:20,430 Unless there is some regulatory requirement or law that 128 00:06:20,430 --> 00:06:23,910 requires that the server delete it for you on some schedule, 129 00:06:23,910 --> 00:06:27,220 you have much, much less control over this information.
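To make the logged fields concrete, here is a minimal sketch of how one line in the combined format might be pulled apart on the server side. The sample line, its IP address, and the `parse_line` helper are made up for illustration; a real server's configuration may order or name the fields differently.

```python
import re

# Pattern for an Apache-style "combined" log line.
# The sample values further down are invented for illustration.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return the logged fields as a dict, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

sample = (
    '203.0.113.7 - - [01/Jan/2024:12:00:00 +0000] '
    '"GET /cats.html HTTP/1.1" 200 5120 '
    '"https://www.google.com/search?q=cats" '
    '"Mozilla/5.0 (X11; Linux x86_64)"'
)
entry = parse_line(sample)
print(entry["remote_addr"], entry["time"], entry["referer"])
```

Every field the lecture lists (remote address, time, request, referrer, user agent) ends up as one key in `entry`, which is why a single log line can say so much about a visit.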
130 00:06:27,220 --> 00:06:29,280 So let's consider in a bit more technical detail 131 00:06:29,280 --> 00:06:31,860 what some of this information is and how you 132 00:06:31,860 --> 00:06:35,670 might at least exert some control over just how much of that information 133 00:06:35,670 --> 00:06:36,880 is being shared. 134 00:06:36,880 --> 00:06:40,710 So let's revisit first this building block of HTTP headers 135 00:06:40,710 --> 00:06:44,250 that we keep coming back to if only because in the world of systems 136 00:06:44,250 --> 00:06:47,970 and software nowadays, things on the web are just so common. 137 00:06:47,970 --> 00:06:52,140 Using HTML, CSS, JavaScript, using web browsers and web servers, 138 00:06:52,140 --> 00:06:54,450 that's driving a lot of today's interactions 139 00:06:54,450 --> 00:06:57,090 with technology, whether it's in native applications 140 00:06:57,090 --> 00:07:01,600 or whether it's with mobile websites, or desktop websites, or the like. 141 00:07:01,600 --> 00:07:05,460 So HTTP headers, recall, are just like key value pairs 142 00:07:05,460 --> 00:07:07,950 that are inside of those virtual envelopes that 143 00:07:07,950 --> 00:07:11,850 indicate some kind of setting or some kind of piece of information 144 00:07:11,850 --> 00:07:13,830 that the browser is sending to the server 145 00:07:13,830 --> 00:07:16,210 or that the server is sending to the browser. 146 00:07:16,210 --> 00:07:19,920 So for instance, if I go on google.com and I search, 147 00:07:19,920 --> 00:07:24,300 as I often do, for cats, well, what might be going on underneath the hood? 148 00:07:24,300 --> 00:07:29,530 Well, in that web page that Google gives me with 10, or 20, or 30, 149 00:07:29,530 --> 00:07:34,960 or six billion, 240 million cats, there might be HTML that looks like this. 
150 00:07:34,960 --> 00:07:37,290 And recall that this HTML, which I'm proposing 151 00:07:37,290 --> 00:07:41,460 exists somewhere in Google search results, is an anchor tag for a link. 152 00:07:41,460 --> 00:07:43,320 There's the a tag over there. 153 00:07:43,320 --> 00:07:46,170 The hyper-reference for this link, or href attribute 154 00:07:46,170 --> 00:07:51,720 has a value of https://example.com, for instance. 155 00:07:51,720 --> 00:07:56,190 And the word that the human will see is cats, literally in this case. 156 00:07:56,190 --> 00:07:58,710 Now, I'm assuming for the sake of discussion that today, 157 00:07:58,710 --> 00:08:01,020 example.com is a website full of cats. 158 00:08:01,020 --> 00:08:04,020 And that's why it might be appearing among Google search results 159 00:08:04,020 --> 00:08:06,870 when I search for cats as my keyword. 160 00:08:06,870 --> 00:08:12,930 But when a user like you or me clicks on that link on google.com, 161 00:08:12,930 --> 00:08:15,630 because that is literally where you're looking at the search 162 00:08:15,630 --> 00:08:20,790 results in this story, it turns out that your browser not only goes and requests 163 00:08:20,790 --> 00:08:26,040 that web page, your browser includes an HTTP header 164 00:08:26,040 --> 00:08:29,040 like this in that virtual envelope. 165 00:08:29,040 --> 00:08:33,039 That's specifically called referrer-- that's the key in this case-- 166 00:08:33,039 --> 00:08:37,169 the value of which is the URL from which you came. 167 00:08:37,169 --> 00:08:41,250 So for instance, if I have just gone to google.com, and I've searched for cats, 168 00:08:41,250 --> 00:08:43,830 and I've hit Enter, recall, as in past classes, 169 00:08:43,830 --> 00:08:47,160 I proposed that the shortest version of the URL that you might see 170 00:08:47,160 --> 00:08:50,040 in your browser upon searching for cats is this, 171 00:08:50,040 --> 00:08:57,150 https://www.google.com/search?q=cats.
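As a rough sketch of why that header matters, the snippet below shows what a destination site could read out of it. The `request_headers` dictionary and the `search_terms_from_referer` helper are hypothetical names; the URL is the one from the lecture, and the header name uses the single-r spelling that HTTP actually expects.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referer(referer):
    """Return (host, query terms) that a destination site could read from Referer."""
    parts = urlsplit(referer)
    terms = parse_qs(parts.query).get("q", [])
    return parts.hostname, terms

# Header as the browser might send it when the link on the results page is clicked.
request_headers = {"Referer": "https://www.google.com/search?q=cats"}

host, terms = search_terms_from_referer(request_headers["Referer"])
print(host, terms)
```

With the full URL in the header, the destination learns both the site you came from and the words you searched for.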
172 00:08:57,150 --> 00:08:59,640 Now, that is what you'd see in your URL bar. 173 00:08:59,640 --> 00:09:02,610 Below that, you'd see the 10, or the 20, or the 30, or the six billion, 174 00:09:02,610 --> 00:09:06,510 240 million cats, each of which has a link that when clicked, 175 00:09:06,510 --> 00:09:08,280 leads you to a search result. 176 00:09:08,280 --> 00:09:12,510 But the implication of this HTTP header is that by default, 177 00:09:12,510 --> 00:09:15,300 perhaps, unbeknownst to you, indeed, your browser 178 00:09:15,300 --> 00:09:18,270 is telling the whole world from which web page 179 00:09:18,270 --> 00:09:23,490 you came when you visited some other web page via a link. 180 00:09:23,490 --> 00:09:26,250 Now, why in the world is this compelling? 181 00:09:26,250 --> 00:09:30,510 Well, it's actually useful for the website at which you end because it 182 00:09:30,510 --> 00:09:32,100 might be useful for their analytics. 183 00:09:32,100 --> 00:09:34,980 They might want to know, well, how are people finding my website? 184 00:09:34,980 --> 00:09:37,120 How are people finding my business on the internet? 185 00:09:37,120 --> 00:09:37,620 Oh. 186 00:09:37,620 --> 00:09:39,570 It looks like I'm getting a lot of users, 187 00:09:39,570 --> 00:09:42,300 a lot of customers, perhaps, from google.com, 188 00:09:42,300 --> 00:09:47,140 specifically, when someone searches for cats, not dogs, not something else, 189 00:09:47,140 --> 00:09:47,850 but cats. 190 00:09:47,850 --> 00:09:50,610 So you can imagine, especially in the world of commerce, 191 00:09:50,610 --> 00:09:54,240 that just being useful information to know how people are finding you, 192 00:09:54,240 --> 00:09:57,810 or conversely, how people are not apparently finding you. 
193 00:09:57,810 --> 00:10:01,228 But this is very invasive because now this website, 194 00:10:01,228 --> 00:10:03,270 even though it's arguably none of their business, 195 00:10:03,270 --> 00:10:07,440 they know I use Google instead of Bing or some other search engine, perhaps. 196 00:10:07,440 --> 00:10:11,310 And you can imagine that there could be links on CS50's own website, 197 00:10:11,310 --> 00:10:13,350 on any number of other websites in the world. 198 00:10:13,350 --> 00:10:15,810 And just because you happened to visit them and you clicked a link, 199 00:10:15,810 --> 00:10:18,380 now they're broadcasting your business to whatever website 200 00:10:18,380 --> 00:10:22,820 you're ending up on by revealing where you came from, from where 201 00:10:22,820 --> 00:10:24,810 you were referred, so to speak. 202 00:10:24,810 --> 00:10:28,110 But this has long been a feature of HTTP. 203 00:10:28,110 --> 00:10:32,480 And this has long been a feature that's enabled by default, unless the website, 204 00:10:32,480 --> 00:10:37,670 or perhaps, you, as the user, turn this off or somehow moderate its response. 205 00:10:37,670 --> 00:10:41,120 Now, some of you might be noticing that there's a bit of a typo on the screen. 206 00:10:41,120 --> 00:10:43,128 And I promise, this isn't actually mine. 207 00:10:43,128 --> 00:10:45,920 In English, at least, this is not typically how you spell the word, 208 00:10:45,920 --> 00:10:46,610 referrer. 209 00:10:46,610 --> 00:10:48,320 And this is actually a fun fact. 210 00:10:48,320 --> 00:10:51,620 In referrer, there should be four R's in total. 211 00:10:51,620 --> 00:10:57,200 It should be R-E-F-E-R-R-E-R. However, fun fact, years ago, 212 00:10:57,200 --> 00:11:01,140 when the specification for this standard was being written, 213 00:11:01,140 --> 00:11:05,360 the poor individual who wrote the specification made a typo that has been 214 00:11:05,360 --> 00:11:08,520 immortalized in history for years to come.
215 00:11:08,520 --> 00:11:11,540 And so this is what browsers and servers have been using and expecting 216 00:11:11,540 --> 00:11:12,320 for years. 217 00:11:12,320 --> 00:11:15,560 There are other variants of this header in which this typographical error has 218 00:11:15,560 --> 00:11:16,740 been fixed. 219 00:11:16,740 --> 00:11:19,680 But it's sort of a fun fact from our internet history. 220 00:11:19,680 --> 00:11:23,880 But this is, indeed, what you might see going from your browser to your server. 221 00:11:23,880 --> 00:11:27,900 So ideally, we'd send less information, at least. 222 00:11:27,900 --> 00:11:31,020 I'd be a little more comfortable if example.com, 223 00:11:31,020 --> 00:11:34,780 which is this website for cats, were told, OK, fine. 224 00:11:34,780 --> 00:11:35,580 I came from Google. 225 00:11:35,580 --> 00:11:36,540 That's not a big deal. 226 00:11:36,540 --> 00:11:39,150 But I'd rather you not know what I was looking for 227 00:11:39,150 --> 00:11:41,400 if only because that seems unnecessary. 228 00:11:41,400 --> 00:11:42,300 It seems invasive. 229 00:11:42,300 --> 00:11:44,820 And who knows what kinds of cats I was looking for? 230 00:11:44,820 --> 00:11:48,510 Maybe I don't want you to know exactly what my preferences are in cats, 231 00:11:48,510 --> 00:11:51,180 or dogs, or whatever types of breeds there might be 232 00:11:51,180 --> 00:11:53,020 in this case of searching for animals. 233 00:11:53,020 --> 00:11:55,770 So it just feels like it's unnecessary information to share. 234 00:11:55,770 --> 00:11:58,020 But better still, I dare say, would not be 235 00:11:58,020 --> 00:12:01,350 to even tell example.com where I'm coming from 236 00:12:01,350 --> 00:12:04,510 and essentially just get rid of this altogether. 237 00:12:04,510 --> 00:12:08,250 So how might a website go about moderating 238 00:12:08,250 --> 00:12:11,460 just how much output comes from the browsers at the server's request?
239 00:12:11,460 --> 00:12:13,830 Or maybe, how might you with special software 240 00:12:13,830 --> 00:12:17,160 suppress some of this information to preserve all the more of your privacy 241 00:12:17,160 --> 00:12:18,810 and what it is you're doing online? 242 00:12:18,810 --> 00:12:23,160 Well, for instance, this is a common tag that web pages 243 00:12:23,160 --> 00:12:27,690 can put in their own HTML code that indicates to the browser 244 00:12:27,690 --> 00:12:32,310 that, yes, you may send the referring address, but only send the origin, 245 00:12:32,310 --> 00:12:36,540 that is, https://www.google.com/. 246 00:12:36,540 --> 00:12:43,230 And that's it, no /search path, no ?q=cats. 247 00:12:43,230 --> 00:12:46,740 Tell them the website you came from, but not the specific page or not 248 00:12:46,740 --> 00:12:49,600 the specific search query or search results. 249 00:12:49,600 --> 00:12:53,040 Notice here in the world of HTML, the typographical error has been fixed. 250 00:12:53,040 --> 00:12:54,670 There's two R's in the middle there. 251 00:12:54,670 --> 00:12:58,710 But otherwise, this is an HTML solution to the problem. 252 00:12:58,710 --> 00:13:01,860 The browser, assuming it respects this HTML, 253 00:13:01,860 --> 00:13:05,280 will, therefore, send a Referer HTTP header, 254 00:13:05,280 --> 00:13:10,020 but with less information, not the whole URL, but just the origin, so really, 255 00:13:10,020 --> 00:13:13,980 the domain name, itself, and a bit more, the protocol. 256 00:13:13,980 --> 00:13:17,670 If you don't want any of that to be sent for your users, for your customers 257 00:13:17,670 --> 00:13:18,802 you could do this instead. 258 00:13:18,802 --> 00:13:20,010 Now, Google does not do this. 259 00:13:20,010 --> 00:13:24,030 Google currently actually sends origin, so part of the URL.
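A small sketch of what "only send the origin" amounts to, under the assumption that the origin is just the scheme plus the host (and port, if any). The `origin_only` helper is a hypothetical name, and real browsers apply additional rules not modeled here.

```python
from urllib.parse import urlsplit

def origin_only(url):
    """Reduce a URL to scheme://host[:port]/, dropping the path and query string."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

full = "https://www.google.com/search?q=cats"
trimmed = origin_only(full)
print(trimmed)
```

The destination still learns that the visitor arrived from Google, but the `/search` path and the `?q=cats` query never leave the browser.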
260 00:13:24,030 --> 00:13:26,190 But if you want to be an even better citizen 261 00:13:26,190 --> 00:13:29,820 and not make it easy for browsers to send more information than they need 262 00:13:29,820 --> 00:13:32,760 to, you can include this HTML in your page 263 00:13:32,760 --> 00:13:35,850 instead, informing the browser that you can send-- 264 00:13:35,850 --> 00:13:40,470 don't send a referrer at all because the value of this meta tag, so to speak, 265 00:13:40,470 --> 00:13:43,410 is actually none instead of origin. 266 00:13:43,410 --> 00:13:46,140 And there are other values as well that allow you 267 00:13:46,140 --> 00:13:48,870 a bit of range of opportunities when it comes to these settings. 268 00:13:48,870 --> 00:13:52,980 But these are, perhaps, the most common or ones to consider. 269 00:13:52,980 --> 00:13:54,240 There's an alternative too. 270 00:13:54,240 --> 00:13:56,032 If you happen to be a little more technical 271 00:13:56,032 --> 00:14:00,360 and you have control over the web server and not just the HTML on the server, 272 00:14:00,360 --> 00:14:05,430 you can actually configure a Referrer-Policy HTTP header, 273 00:14:05,430 --> 00:14:09,640 that goes from the server to the browser. 274 00:14:09,640 --> 00:14:12,510 So in this case, the referrer policy can indicate 275 00:14:12,510 --> 00:14:16,260 that you only want the origin to be sent, for instance, the shorter 276 00:14:16,260 --> 00:14:17,400 form of the URL. 277 00:14:17,400 --> 00:14:20,910 Or you can actually indicate that no referrer should actually 278 00:14:20,910 --> 00:14:24,270 be sent in this particular case, so a second mechanism 279 00:14:24,270 --> 00:14:26,400 for actually controlling the same.
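For the server-side mechanism, here is a minimal sketch of a response that carries the policy as a header. It is written as a bare WSGI application purely for illustration; the `app` name and the page body are assumptions, and `no-referrer` and `origin` are the standardized policy values corresponding to the two options described.

```python
def app(environ, start_response):
    """Tiny WSGI app whose response asks the browser not to send a referrer."""
    body = b"<a href='https://example.com'>cats</a>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Using "origin" here instead would allow only scheme and host to be sent.
        ("Referrer-Policy", "no-referrer"),
    ]
    start_response("200 OK", headers)
    return [body]

# To try it locally, one option is the standard library's reference server:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```

Setting the header once on the server covers every page it serves, whereas the meta tag has to appear in each page's HTML.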
280 00:14:26,400 --> 00:14:31,190 Let me pause here and see if there's not only some concerns, perhaps, 281 00:14:31,190 --> 00:14:32,940 now that you understand better, hopefully, 282 00:14:32,940 --> 00:14:37,800 how the web works, at least, by default or how we might mitigate 283 00:14:37,800 --> 00:14:40,080 this concern with your privacy. 284 00:14:40,080 --> 00:14:43,350 AUDIENCE: Is there a way that is easy enough 285 00:14:43,350 --> 00:14:49,110 for us to delete those traces as a client in case 286 00:14:49,110 --> 00:14:51,595 that we don't want to be tracked or something like that? 287 00:14:51,595 --> 00:14:53,220 DAVID J. MALAN: A really good question. 288 00:14:53,220 --> 00:14:57,870 We'll refer you to some URLs outside of the context of class, itself. 289 00:14:57,870 --> 00:15:00,240 But yes, there is actually client-side software 290 00:15:00,240 --> 00:15:02,520 that you can install on your own Mac or PC, 291 00:15:02,520 --> 00:15:05,340 typically, that will scrub some of this information, 292 00:15:05,340 --> 00:15:09,210 so that when your HTTP requests go from your browser to servers, 293 00:15:09,210 --> 00:15:12,330 you can ensure that this third-party software removes 294 00:15:12,330 --> 00:15:15,750 a lot of that information automatically for you because in that way, 295 00:15:15,750 --> 00:15:19,130 you don't have to trust that the website, like the Googles of the world 296 00:15:19,130 --> 00:15:22,500 will actually reduce the amount of information for you. 297 00:15:22,500 --> 00:15:25,250 You can instead do that for yourself through client-side software. 298 00:15:25,250 --> 00:15:28,100 And we'll provide a few links online. 299 00:15:28,100 --> 00:15:29,990 Other questions on the same?
300 00:15:29,990 --> 00:15:33,160 AUDIENCE: By using a private browser such as Tor, 301 00:15:33,160 --> 00:15:36,980 for example, or using a temporary operating system like Tails, 302 00:15:36,980 --> 00:15:40,040 does this remove all of our traces on the internet? 303 00:15:40,040 --> 00:15:43,157 Or does it leave some on the client side or the server side? 304 00:15:43,157 --> 00:15:44,490 DAVID J. MALAN: A good question. 305 00:15:44,490 --> 00:15:47,330 Short answer is that it does leave some evidence on both the server 306 00:15:47,330 --> 00:15:48,680 side and the client side. 307 00:15:48,680 --> 00:15:52,590 But we'll come back to Tor in just a little bit as well. 308 00:15:52,590 --> 00:15:53,090 All right. 309 00:15:53,090 --> 00:15:54,320 How about one final question? 310 00:15:54,320 --> 00:15:58,430 AUDIENCE: You said previously about the third-party software that's 311 00:15:58,430 --> 00:16:02,210 supposed to be used in order to scrub the information from being submitted 312 00:16:02,210 --> 00:16:04,190 to the server side. 313 00:16:04,190 --> 00:16:07,970 What if that program, itself, is used to eavesdrop 314 00:16:07,970 --> 00:16:10,895 on what we do on the computer? 315 00:16:10,895 --> 00:16:12,770 DAVID J. MALAN: That is a very valid concern. 316 00:16:12,770 --> 00:16:14,690 It is absolutely possible. 317 00:16:14,690 --> 00:16:17,540 In general, what is working in your favor 318 00:16:17,540 --> 00:16:20,840 is either open-source software, where if you're using software 319 00:16:20,840 --> 00:16:23,840 that other people can see the source code of, presumably, 320 00:16:23,840 --> 00:16:26,210 it's less likely that it's doing something malicious. 
321 00:16:26,210 --> 00:16:29,090 Capitalism often helps you here too, whereby, 322 00:16:29,090 --> 00:16:32,000 it is often not in a company's own interest 323 00:16:32,000 --> 00:16:35,320 to be violating the privacy of their users because presumably, 324 00:16:35,320 --> 00:16:38,570 that would create some form of backlash, which would not be good for business. 325 00:16:38,570 --> 00:16:41,240 But beyond that, there is a lot of trust on your part 326 00:16:41,240 --> 00:16:44,240 and my part whenever it comes to installing software. 327 00:16:44,240 --> 00:16:46,850 So that is, indeed, very much a risk. 328 00:16:46,850 --> 00:16:48,980 Now, it turns out there's other information 329 00:16:48,980 --> 00:16:52,100 that your browser might be sharing without your realizing 330 00:16:52,100 --> 00:16:53,390 that it's making it available. 331 00:16:53,390 --> 00:16:57,260 And that information is enough via which servers can even 332 00:16:57,260 --> 00:16:58,820 fingerprint you, so to speak. 333 00:16:58,820 --> 00:17:01,070 That is to say there's this technique generally called 334 00:17:01,070 --> 00:17:03,230 fingerprinting that in the context of the web 335 00:17:03,230 --> 00:17:07,310 means to take as input a whole bunch of characteristics 336 00:17:07,310 --> 00:17:09,859 of the request from the internet that's coming in 337 00:17:09,859 --> 00:17:14,060 and see if you can use those characteristics to create 338 00:17:14,060 --> 00:17:17,869 a profile of sorts for the user via which you can uniquely 339 00:17:17,869 --> 00:17:19,440 identify that user. 340 00:17:19,440 --> 00:17:22,579 Now, that doesn't mean you'll know specifically that user is David Malan. 
341 00:17:22,579 --> 00:17:25,550 But you will know, according to this system, 342 00:17:25,550 --> 00:17:28,790 if it's the same user today, as you see tomorrow, 343 00:17:28,790 --> 00:17:32,690 as you see the next day because you can use this information 344 00:17:32,690 --> 00:17:36,770 to infer with high probability that, OK, we saw that exact same browser 345 00:17:36,770 --> 00:17:39,050 configuration again, and again, and again. 346 00:17:39,050 --> 00:17:42,170 Odds are it's the same person and not some twin 347 00:17:42,170 --> 00:17:45,150 on the internet who just happens to have precisely those settings. 348 00:17:45,150 --> 00:17:48,260 Now, how might this be implemented or achieved technologically? 349 00:17:48,260 --> 00:17:50,360 Well, the simplest mechanism, perhaps, is just 350 00:17:50,360 --> 00:17:52,190 to rely on something like your IP address. 351 00:17:52,190 --> 00:17:54,830 Recall that any time you're doing something on the internet, 352 00:17:54,830 --> 00:17:57,110 those virtual envelopes we keep talking about 353 00:17:57,110 --> 00:18:00,480 have your IP address on the outside, so to speak, 354 00:18:00,480 --> 00:18:03,470 as well as the IP address of the destination to which you're 355 00:18:03,470 --> 00:18:04,820 trying to send information. 356 00:18:04,820 --> 00:18:07,080 Your IP, in that case, is the return address, 357 00:18:07,080 --> 00:18:10,010 which means you're literally telling the remote server when 358 00:18:10,010 --> 00:18:13,190 using certain protocols where you are in the world, 359 00:18:13,190 --> 00:18:15,050 or at least, what your IP address is. 
360 00:18:15,050 --> 00:18:18,320 Now, that IP address might not alone uniquely identify you 361 00:18:18,320 --> 00:18:22,700 because it turns out on campuses, in homes, in corporate networks, 362 00:18:22,700 --> 00:18:26,390 you might actually share one IP address with many other people, 363 00:18:26,390 --> 00:18:30,090 but at least narrows the scope of whose IP it might be, 364 00:18:30,090 --> 00:18:31,970 even if it's shared among a few people. 365 00:18:31,970 --> 00:18:34,940 But your browser inside of that virtual envelope 366 00:18:34,940 --> 00:18:37,200 is sharing other information as well. 367 00:18:37,200 --> 00:18:42,350 Another HTTP header that is typically sent by browsers to servers 368 00:18:42,350 --> 00:18:43,940 is called user agent. 369 00:18:43,940 --> 00:18:48,380 And this is just a unique string of text that uniquely identifies typically 370 00:18:48,380 --> 00:18:52,310 the browser that you're using and the version thereof and the operating 371 00:18:52,310 --> 00:18:54,530 system that you're using and the version thereof. 372 00:18:54,530 --> 00:18:58,052 So for instance, a standard format might look a little something like this. 373 00:18:58,052 --> 00:18:59,510 And it's deliberately overwhelming. 374 00:18:59,510 --> 00:19:01,880 And it's just meant to capture how much detail 375 00:19:01,880 --> 00:19:04,040 might be leaked in this header's value. 376 00:19:04,040 --> 00:19:06,740 But within this big string of text that doesn't even 377 00:19:06,740 --> 00:19:09,740 fit onto one line-- it's wrapping here under three lines-- is 378 00:19:09,740 --> 00:19:13,400 some indication of what browser you're using, be it, Chrome or something else 379 00:19:13,400 --> 00:19:15,620 and what operating system you're using, be it, 380 00:19:15,620 --> 00:19:19,520 Android or something else on a phone, a laptop, or a desktop. 
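To see how much a server can read from that one string, here is a deliberately rough sketch. The sample User-Agent value is illustrative rather than taken from the slide, the `describe_user_agent` helper is hypothetical, and real-world parsing is considerably messier than these two patterns suggest.

```python
import re

def describe_user_agent(ua):
    """Very rough guess at browser and operating system from a User-Agent string."""
    browser = re.search(r"(Chrome|Firefox)/(\d+)", ua)
    system = re.search(r"(Android|Windows NT|Mac OS X) ?([\d._]+)?", ua)
    return {
        "browser": browser.group(1) if browser else None,
        "browser_version": browser.group(2) if browser else None,
        "os": system.group(1) if system else None,
        "os_version": system.group(2) if system else None,
    }

# Illustrative value in the style of what Chrome on Android sends.
sample = ("Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/116.0.0.0 Mobile Safari/537.36")
info = describe_user_agent(sample)
print(info)
```

Even this crude approach recovers the browser, its major version, the operating system, and its version from a single header.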
381 00:19:19,520 --> 00:19:21,710 Now, of course, a lot of people in the world 382 00:19:21,710 --> 00:19:23,870 presumably have the same browser installed. 383 00:19:23,870 --> 00:19:26,600 So that, too, even with IP address, might not 384 00:19:26,600 --> 00:19:30,140 be enough information to uniquely identify you, 385 00:19:30,140 --> 00:19:31,530 at least, with high probability. 386 00:19:31,530 --> 00:19:33,720 So what else can servers do? 387 00:19:33,720 --> 00:19:37,970 Well, if the server has the ability to send some code to your computer, 388 00:19:37,970 --> 00:19:41,330 for instance, some HTML, some CSS, and some JavaScript, 389 00:19:41,330 --> 00:19:45,710 servers can effectively interrogate the browser and ask it certain questions. 390 00:19:45,710 --> 00:19:50,768 For instance, a server could figure out what the resolution is of your screen. 391 00:19:50,768 --> 00:19:53,060 Now, this might be practically useful, so they know how 392 00:19:53,060 --> 00:19:54,900 to render information on the screen. 393 00:19:54,900 --> 00:19:56,610 But that alone might be enough. 394 00:19:56,610 --> 00:20:00,410 Especially if you're in the habit of full screening your browser 395 00:20:00,410 --> 00:20:03,260 and you always use the same resolution on your monitor, 396 00:20:03,260 --> 00:20:07,070 that might be another ingredient with which to identify or fingerprint you. 397 00:20:07,070 --> 00:20:09,890 The server might be able to figure out what fonts 398 00:20:09,890 --> 00:20:12,170 you have installed on your system. 399 00:20:12,170 --> 00:20:14,750 The server might be able to figure out what time 400 00:20:14,750 --> 00:20:18,950 zone you are in because that's also a value available within the context 401 00:20:18,950 --> 00:20:19,760 of a browser. 402 00:20:19,760 --> 00:20:24,470 And there's yet other values still that collectively with high probability 403 00:20:24,470 --> 00:20:26,760 can be used to fingerprint you and me. 
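The idea of combining several individually weak signals into one identifier can be sketched as follows. This is an assumption about how a fingerprint might be computed, not a description of any particular tracker; the attribute names and values are hypothetical.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sorting the keys makes the result independent of insertion order.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values a page's JavaScript might report back to the server.
visit_monday = {
    "user_agent": "ExampleBrowser/1.0 (ExampleOS 10)",
    "screen": "1920x1080",
    "timezone": "America/New_York",
    "fonts": ["Arial", "Helvetica", "Times New Roman"],
}
visit_tuesday = dict(visit_monday)  # same browser, same settings
other_visitor = dict(visit_monday, screen="1366x768")  # one attribute differs

print(fingerprint(visit_monday) == fingerprint(visit_tuesday))
print(fingerprint(visit_monday) == fingerprint(other_visitor))
```

The same configuration yields the same hash on every visit, while changing a single attribute yields a different one, which is why each extra attribute narrows down who the visitor might be.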
404 00:20:26,760 --> 00:20:29,390 So even if you're not even logged in, even if you're 405 00:20:29,390 --> 00:20:32,990 using various privacy enhancing software products to try 406 00:20:32,990 --> 00:20:35,330 to remove some of these HTTP headers and the like, 407 00:20:35,330 --> 00:20:39,680 you're still leaking other information, including the extensions or plug-ins, 408 00:20:39,680 --> 00:20:42,690 sometimes, that your browser might have installed. 409 00:20:42,690 --> 00:20:45,983 So if you're in the habit of using the same computer again and again 410 00:20:45,983 --> 00:20:48,650 and you're in the habit of not changing a lot of these settings, 411 00:20:48,650 --> 00:20:52,070 that alone might be enough for a website to effectively track you. 412 00:20:52,070 --> 00:20:53,420 Now, it might be innocuous. 413 00:20:53,420 --> 00:20:55,520 They might just use this for statistical purposes 414 00:20:55,520 --> 00:20:58,010 to get a sense of how many users or customers they have. 415 00:20:58,010 --> 00:21:00,050 But it could be for more invasive purposes, 416 00:21:00,050 --> 00:21:02,330 like serving you targeted advertising, based 417 00:21:02,330 --> 00:21:05,450 on your behavior of these websites, or really, just tracking you, 418 00:21:05,450 --> 00:21:06,260 specifically. 419 00:21:06,260 --> 00:21:09,740 And the catch is that if you ever log in to this server 420 00:21:09,740 --> 00:21:13,820 just once, if the server has been logging all of your traffic based 421 00:21:13,820 --> 00:21:17,510 on that fingerprint for days, for months, for years, at that point, 422 00:21:17,510 --> 00:21:20,870 retroactively, with high probability, they can infer, oh, wait a minute. 423 00:21:20,870 --> 00:21:23,570 If the user on this day was David and we think 424 00:21:23,570 --> 00:21:27,080 it was the same user on all of these previous days, now by transitivity, 425 00:21:27,080 --> 00:21:30,810 they know a lot more about your browser history as well. 
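The retroactive inference described here can be illustrated with a small sketch. The log format, fingerprint labels, and page paths are all hypothetical; the point is only that one login is enough to attach a name to every earlier entry with the same fingerprint.

```python
from collections import defaultdict

# Hypothetical server log: (fingerprint, page, logged-in user or None).
log = [
    ("fp-a1", "/shoes", None),
    ("fp-b2", "/hats", None),
    ("fp-a1", "/checkout", None),
    ("fp-a1", "/account", "david"),  # the single login
]


def attribute_history(entries):
    """Assign every page seen under a fingerprint to the user who logged in with it."""
    # Any fingerprint that was ever seen with a login gets an owner.
    owner = {fp: user for fp, _, user in entries if user is not None}
    history = defaultdict(list)
    for fp, page, _ in entries:
        history[owner.get(fp, "unknown")].append(page)
    return dict(history)


print(attribute_history(log))
```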
426 00:21:30,810 --> 00:21:34,460 So even unbeknownst to you, and even without explicit header values 427 00:21:34,460 --> 00:21:37,940 being sent that identify you, the collection 428 00:21:37,940 --> 00:21:41,360 of attributes or characteristics that our browsers have 429 00:21:41,360 --> 00:21:45,530 and our browsing behavior has can still be enough to uniquely identify 430 00:21:45,530 --> 00:21:47,570 most of us quite a bit of the time. 431 00:21:47,570 --> 00:21:51,080 Let me pause here and see if there's any questions on fingerprinting 432 00:21:51,080 --> 00:21:54,560 or these implications for privacy. 433 00:21:54,560 --> 00:21:58,897 AUDIENCE: Will using a VPN prevent browser fingerprinting? 434 00:21:58,897 --> 00:22:00,230 DAVID J. MALAN: A good question. 435 00:22:00,230 --> 00:22:02,570 And we'll talk about VPNs a bit more soon. 436 00:22:02,570 --> 00:22:03,650 Short answer, no. 437 00:22:03,650 --> 00:22:07,490 So VPNs will typically mask your IP address, but that's about it. 438 00:22:07,490 --> 00:22:12,115 If you still use your browser as usual with your user account as usual, 439 00:22:12,115 --> 00:22:14,240 all of that same information is going to be leaked. 440 00:22:14,240 --> 00:22:16,850 It's just going to change one piece of it. 441 00:22:16,850 --> 00:22:18,090 A good question. 442 00:22:18,090 --> 00:22:20,645 Other questions on fingerprinting and privacy? 443 00:22:20,645 --> 00:22:24,980 AUDIENCE: Is it possible that a hacker can steal a fingerprint 444 00:22:24,980 --> 00:22:28,670 and use it for their own purposes and everything 445 00:22:28,670 --> 00:22:33,100 will look like it was my computer that performed certain actions? 446 00:22:33,100 --> 00:22:36,695 so it's like stealing an identity. 447 00:22:36,695 --> 00:22:38,810 DAVID J. MALAN: A short answer, yes, if the hacker 448 00:22:38,810 --> 00:22:40,760 has access to the same information. 
449 00:22:40,760 --> 00:22:43,580 If, though, we rewind to our focus on encryption 450 00:22:43,580 --> 00:22:46,910 a couple of classes ago, if you are accessing websites only 451 00:22:46,910 --> 00:22:50,870 via HTTPS and nothing is unencrypted, then it's 452 00:22:50,870 --> 00:22:54,320 going to be a lot harder for a hacker in between you and that server 453 00:22:54,320 --> 00:22:58,130 to glean any of the same information because almost all of it is encrypted. 454 00:22:58,130 --> 00:22:59,360 IP address is not. 455 00:22:59,360 --> 00:23:04,140 But anything inside of the envelope is, including these headers, the HTML, 456 00:23:04,140 --> 00:23:05,900 the JavaScript, and the CSS. 457 00:23:05,900 --> 00:23:08,240 If, though, the hacker has somehow infiltrated 458 00:23:08,240 --> 00:23:13,070 your own laptop, or desktop, or phone, or the server, then all bets are off. 459 00:23:13,070 --> 00:23:15,590 And they could absolutely identify you, according 460 00:23:15,590 --> 00:23:18,730 to these same pieces of information. 461 00:23:18,730 --> 00:23:19,796 Other questions? 462 00:23:19,796 --> 00:23:22,780 AUDIENCE: I was just curious to understand the difference, perhaps, 463 00:23:22,780 --> 00:23:24,310 when you are on mobile. 464 00:23:24,310 --> 00:23:28,150 My understanding is that they can even get much more information 465 00:23:28,150 --> 00:23:29,915 when you are on mobile. 466 00:23:29,915 --> 00:23:31,540 DAVID J. MALAN: That's a fair question. 467 00:23:31,540 --> 00:23:34,030 I don't think I would answer yes to that.
468 00:23:34,030 --> 00:23:38,680 I'm hard pressed to imagine what more your phone is doing than the browser is 469 00:23:38,680 --> 00:23:41,800 doing, except that there are-- 470 00:23:41,800 --> 00:23:44,500 I suppose I could argue that your phone tends 471 00:23:44,500 --> 00:23:49,990 to have additional features nowadays, like GPS, like accelerometers, 472 00:23:49,990 --> 00:23:53,740 gyroscope, perhaps, so other hardware features that theoretically 473 00:23:53,740 --> 00:23:57,530 can be interrogated by JavaScript code, typically, on an opt-in basis. 474 00:23:57,530 --> 00:24:00,790 So you, the user, could deny access to these pieces of information. 475 00:24:00,790 --> 00:24:03,820 But those characteristics, I suspect could 476 00:24:03,820 --> 00:24:08,260 be used to identify you a bit more uniquely because laptops, at least, 477 00:24:08,260 --> 00:24:11,538 today, have less of that functionality. 478 00:24:11,538 --> 00:24:12,205 Other questions? 479 00:24:12,205 --> 00:24:17,350 AUDIENCE: When storing and retrieving data on the front end, 480 00:24:17,350 --> 00:24:24,125 is it more secure to use cookies, local storage, or another alternative? 481 00:24:24,125 --> 00:24:25,750 DAVID J. MALAN: A really good question. 482 00:24:25,750 --> 00:24:29,500 And we will come to this subject literally in one slide, cookies. 483 00:24:29,500 --> 00:24:32,200 In general, local storage because cookies, 484 00:24:32,200 --> 00:24:34,960 by design, are meant to be sent back and forth, back and forth 485 00:24:34,960 --> 00:24:36,700 between browser and server. 486 00:24:36,700 --> 00:24:39,940 Theoretically, that should not be a concern if everything is encrypted. 487 00:24:39,940 --> 00:24:43,060 But we've talked in the past already how mistakes can be made. 488 00:24:43,060 --> 00:24:46,360 You might start on HTTP, be redirected to HTTPS. 
489 00:24:46,360 --> 00:24:49,360 So in general, storing things in local storage, at least, 490 00:24:49,360 --> 00:24:52,840 prevents things from accidentally leaking out over the browser connection. 491 00:24:52,840 --> 00:24:55,900 That said, if you're storing things in local storage, 492 00:24:55,900 --> 00:24:57,770 they are literally available locally. 493 00:24:57,770 --> 00:25:00,160 So if you have a colleague, a friend, a sibling 494 00:25:00,160 --> 00:25:04,000 who gains physical access to that device, let alone, an adversary, 495 00:25:04,000 --> 00:25:07,330 then they could see all of the information and not only your cookies, 496 00:25:07,330 --> 00:25:09,100 but also local storage. 497 00:25:09,100 --> 00:25:11,890 So at that point, physical access, generally, 498 00:25:11,890 --> 00:25:15,170 all bets are off when it comes to your privacy. 499 00:25:15,170 --> 00:25:15,670 All right. 500 00:25:15,670 --> 00:25:17,670 How about one other question? 501 00:25:17,670 --> 00:25:22,300 AUDIENCE: There were calls being made from people's local phone numbers 502 00:25:22,300 --> 00:25:25,540 on cell phones to other local numbers. 503 00:25:25,540 --> 00:25:27,710 Obviously, the people weren't making the calls. 504 00:25:27,710 --> 00:25:28,960 And it had happened to me too. 505 00:25:28,960 --> 00:25:31,540 And I was wondering how that kind of works 506 00:25:31,540 --> 00:25:34,000 or if it's related to this at all. 507 00:25:34,000 --> 00:25:35,298 DAVID J. MALAN: It is. 508 00:25:35,298 --> 00:25:37,090 We weren't planning to talk about it today. 509 00:25:37,090 --> 00:25:41,380 But in a nutshell, it is very easy to spoof telephone numbers. 510 00:25:41,380 --> 00:25:44,440 And this is how a lot of spam calls are sent, particularly, 511 00:25:44,440 --> 00:25:46,540 internationally or abroad, where they might not 512 00:25:46,540 --> 00:25:49,510 be regulated in the same way as someone's home country.
513 00:25:49,510 --> 00:25:53,590 It's very common, too, for if your number starts-- 514 00:25:53,590 --> 00:25:57,820 your own phone number starts with 555, for instance, very often, you'll 515 00:25:57,820 --> 00:26:01,120 get fake calls from other numbers that also 516 00:26:01,120 --> 00:26:03,850 start with 555 because the presumption by the adversary 517 00:26:03,850 --> 00:26:06,790 is that, oh, Sabrina's probably more likely to pick this up 518 00:26:06,790 --> 00:26:09,910 if she thinks it's a neighbor with a similar looking phone number. 519 00:26:09,910 --> 00:26:11,860 But unfortunately, with the phone system, 520 00:26:11,860 --> 00:26:13,960 it's all too easy to fake phone numbers. 521 00:26:13,960 --> 00:26:17,480 And this is yet another reason why using phones, using SMS, 522 00:26:17,480 --> 00:26:20,030 is not a recommended approach for our earlier topic 523 00:26:20,030 --> 00:26:21,890 about multi-factor authentication. 524 00:26:21,890 --> 00:26:23,390 It's just not a secure network. 525 00:26:23,390 --> 00:26:27,230 That's not how Edison and others designed it 100-plus years ago. 526 00:26:27,230 --> 00:26:30,890 This is why systems that use cryptography in some form 527 00:26:30,890 --> 00:26:35,020 are much safer when it comes to that information. 528 00:26:35,020 --> 00:26:35,650 All right. 529 00:26:35,650 --> 00:26:38,020 So beyond this user agent header, there's 530 00:26:38,020 --> 00:26:42,032 other headers that your browser is often sending back and forth with the server. 531 00:26:42,032 --> 00:26:44,240 And one of these we've talked about, and one of these 532 00:26:44,240 --> 00:26:47,020 you probably came into the course knowing about, namely, cookies. 533 00:26:47,020 --> 00:26:48,728 But there are different types of cookies. 534 00:26:48,728 --> 00:26:51,700 But recall that in general, a cookie is a piece of information 535 00:26:51,700 --> 00:26:56,360 that a server puts on your computer to help remember who you are. 
536 00:26:56,360 --> 00:26:59,020 So in the absence of these fingerprints and the absence 537 00:26:59,020 --> 00:27:01,060 of specific headers like these, it can just 538 00:27:01,060 --> 00:27:04,685 put a small random value with numbers and letters 539 00:27:04,685 --> 00:27:07,060 or the like on your computer or maybe even a bigger value 540 00:27:07,060 --> 00:27:08,140 if it has lots of users. 541 00:27:08,140 --> 00:27:11,290 And it uses that value to uniquely identify you 542 00:27:11,290 --> 00:27:14,350 if you return again and again to the website. 543 00:27:14,350 --> 00:27:16,720 It doesn't necessarily know that I am David, 544 00:27:16,720 --> 00:27:19,265 unless I log in at some point, at which point, 545 00:27:19,265 --> 00:27:20,890 then it can realize, oh, wait a minute. 546 00:27:20,890 --> 00:27:22,670 David's cookie is this value. 547 00:27:22,670 --> 00:27:24,942 Now I know who this user is. 548 00:27:24,942 --> 00:27:26,650 But in general, there are different types 549 00:27:26,650 --> 00:27:28,660 of cookies and different settings for cookies that are 550 00:27:28,660 --> 00:27:30,285 worth knowing a little something about. 551 00:27:30,285 --> 00:27:34,000 So we talked previously about what we'd more properly call session cookies. 552 00:27:34,000 --> 00:27:39,000 So session cookies are used by servers to maintain state, 553 00:27:39,000 --> 00:27:41,600 so to speak, between the server and the browser. 554 00:27:41,600 --> 00:27:46,280 That is to say, without getting too technical, HTTP is typically stateless, 555 00:27:46,280 --> 00:27:49,550 whereby, when you visit a page, the browser icon might spin for a bit. 556 00:27:49,550 --> 00:27:52,310 And then it stops because the transaction between the browser 557 00:27:52,310 --> 00:27:54,420 and the server is complete. 
558 00:27:54,420 --> 00:27:58,100 But if you want to remember who the user is, therefore, 559 00:27:58,100 --> 00:28:01,970 the second, the third, the fourth time, the browser contacts the server. 560 00:28:01,970 --> 00:28:05,000 The browser had better remind the server who it is. 561 00:28:05,000 --> 00:28:08,710 And this is why we use the metaphor of the virtual handstamp, whereby, 562 00:28:08,710 --> 00:28:11,210 that handstamp is the browser's way of reminding the server, 563 00:28:11,210 --> 00:28:12,200 you've seen me before. 564 00:28:12,200 --> 00:28:13,710 Don't make me log in again. 565 00:28:13,710 --> 00:28:14,810 I am David. 566 00:28:14,810 --> 00:28:15,770 I am David. 567 00:28:15,770 --> 00:28:18,950 --even though it's just relying on this virtual handstamp or really 568 00:28:18,950 --> 00:28:23,720 some unique identifier that's going in the cookie header from browser 569 00:28:23,720 --> 00:28:24,510 to server. 570 00:28:24,510 --> 00:28:26,870 So a session cookie allows browsers and servers 571 00:28:26,870 --> 00:28:29,390 to maintain sessions, this kind of state. 572 00:28:29,390 --> 00:28:33,230 A little more concretely, it allows them to maintain things like shopping carts. 573 00:28:33,230 --> 00:28:35,907 So if you're shopping on an amazon.com or the like, 574 00:28:35,907 --> 00:28:39,230 the session cookie is what remembers who you are, 575 00:28:39,230 --> 00:28:41,900 or at least, that you're the same person, so that every time you 576 00:28:41,900 --> 00:28:45,680 poke around on the website, Amazon shows you the same contents of your shopping 577 00:28:45,680 --> 00:28:47,870 cart again and again, so that they don't lose 578 00:28:47,870 --> 00:28:51,630 your business by accidentally deleting it when you simply change the page. 579 00:28:51,630 --> 00:28:53,510 So how do session cookies work? 
580 00:28:53,510 --> 00:28:55,670 Well, when you first visit a website that 581 00:28:55,670 --> 00:28:58,640 wants to plant a cookie on your computer, 582 00:28:58,640 --> 00:29:01,100 the response might look a little something like this. 583 00:29:01,100 --> 00:29:01,970 HTTP. 584 00:29:01,970 --> 00:29:04,820 200 is the status code, which, recall, means OK. 585 00:29:04,820 --> 00:29:05,630 All is well. 586 00:29:05,630 --> 00:29:08,820 It's not something like 404, which would mean file not found. 587 00:29:08,820 --> 00:29:10,190 So 200 is OK. 588 00:29:10,190 --> 00:29:14,660 But the server might also respond with this key value pair, this HTTP header, 589 00:29:14,660 --> 00:29:16,890 Set-Cookie:. 590 00:29:16,890 --> 00:29:17,970 So that's the key. 591 00:29:17,970 --> 00:29:22,220 The value of which is session=1234abcd. 592 00:29:22,220 --> 00:29:24,260 And that's the same value we used previously 593 00:29:24,260 --> 00:29:26,390 when we talked about cookies in this context. 594 00:29:26,390 --> 00:29:30,140 And the point here is that the name of this cookie is Session. 595 00:29:30,140 --> 00:29:34,790 And its value equals, in this case, 1234abcd. 596 00:29:34,790 --> 00:29:37,970 Now, if you visit the same website and you, and you, and you, 597 00:29:37,970 --> 00:29:42,110 we would all have different seemingly random values for those cookies. 598 00:29:42,110 --> 00:29:45,283 And so this number, this sequence of letters and numbers, 599 00:29:45,283 --> 00:29:46,700 would be different for each of us. 600 00:29:46,700 --> 00:29:49,070 That is to say we have different handstamps 601 00:29:49,070 --> 00:29:50,940 that we're presenting each time. 602 00:29:50,940 --> 00:29:52,830 Now, this is a session cookie. 
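As a rough illustration of that response, the sketch below reads the Set-Cookie header out of a raw HTTP response using Python's standard `http.cookies` module. The exact status line is an assumption (the lecture only mentions the 200 status code); the cookie name and value are the ones used in the lecture.

```python
from http.cookies import SimpleCookie

# Assumed raw response: a status line followed by one Set-Cookie header.
raw_response = "HTTP/1.1 200 OK\r\nSet-Cookie: session=1234abcd\r\n\r\n"


def cookies_from_response(raw: str) -> dict:
    """Collect name/value pairs from every Set-Cookie header in a raw response."""
    jar = {}
    header_block = raw.split("\r\n\r\n", 1)[0]
    for line in header_block.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "set-cookie":
            parsed = SimpleCookie()
            parsed.load(value.strip())
            for key, morsel in parsed.items():
                jar[key] = morsel.value
    return jar


print(cookies_from_response(raw_response))
```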
603 00:29:52,830 --> 00:29:55,550 And it's a session cookie in the sense that it 604 00:29:55,550 --> 00:29:59,300 is supposed to expire when you close the browser, when 605 00:29:59,300 --> 00:30:02,912 you quit for the night, when you reboot or anything else. 606 00:30:02,912 --> 00:30:05,120 Now, with that said, that's a bit of an overstatement 607 00:30:05,120 --> 00:30:09,652 because browsers nowadays will frequently preserve your tabs for you. 608 00:30:09,652 --> 00:30:10,610 They might go to sleep. 609 00:30:10,610 --> 00:30:12,110 You might have to wake them back up. 610 00:30:12,110 --> 00:30:15,110 But increasingly, sessions are living longer than they once did. 611 00:30:15,110 --> 00:30:20,000 But the idea is that this is not meant to last for a year or forever. 612 00:30:20,000 --> 00:30:24,230 It has a much shorter lifetime by design. 613 00:30:24,230 --> 00:30:28,490 When your browser has received that cookie and you click on some other 614 00:30:28,490 --> 00:30:31,280 page, you visit some other product on amazon.com, 615 00:30:31,280 --> 00:30:35,630 your browser might say something like this, GET / and then Cookie:, 616 00:30:35,630 --> 00:30:37,080 that exact same value. 617 00:30:37,080 --> 00:30:38,930 So recall from our previous class, this is 618 00:30:38,930 --> 00:30:43,340 how the browser just reminds the server what its handstamp is 619 00:30:43,340 --> 00:30:44,900 or what its cookie value is. 620 00:30:44,900 --> 00:30:48,350 But again, the idea is that when the browser is closed, 621 00:30:48,350 --> 00:30:51,920 you reboot for the night, then you should not 622 00:30:51,920 --> 00:30:56,490 have the same session cookie tomorrow, at least, in this model. 623 00:30:56,490 --> 00:30:59,000 That's not true for all websites, but according to cookies 624 00:30:59,000 --> 00:31:00,680 as we are currently using them.
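One way a server might use that handstamp is sketched below: it issues a random session ID on the first visit and then looks up the same shopping cart whenever the browser sends the ID back in its Cookie header. The `carts` dictionary and the `handle_request` function are hypothetical names for illustration, and a real site would keep this state in a database rather than in memory.

```python
import secrets
from http.cookies import SimpleCookie

carts = {}  # session ID -> list of items; an in-memory stand-in for a database


def handle_request(cookie_header, add_item=None):
    """Return (session_id, cart, set_cookie_header_or_None) for one request."""
    session_id = None
    if cookie_header:
        parsed = SimpleCookie()
        parsed.load(cookie_header)
        if "session" in parsed:
            session_id = parsed["session"].value
    set_cookie = None
    if session_id not in carts:
        # First visit (or an unrecognized stamp): issue a fresh random identifier.
        session_id = secrets.token_hex(8)
        carts[session_id] = []
        set_cookie = f"session={session_id}"
    if add_item:
        carts[session_id].append(add_item)
    return session_id, carts[session_id], set_cookie


# First request carries no cookie, so the server plants one.
first_id, first_cart, first_header = handle_request(None, "book")
# Second request presents the same cookie, so the cart is remembered.
second_id, second_cart, second_header = handle_request(f"session={first_id}", "lamp")
print(second_cart)
```

If the browser discards the cookie when it closes, the next visit looks like a first visit and starts with an empty cart.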
625 00:31:00,680 --> 00:31:04,160 Now, that's pretty good for your privacy because if the cookie is 626 00:31:04,160 --> 00:31:08,390 by design meant to be a session cookie and it expires pretty soon when you're 627 00:31:08,390 --> 00:31:11,510 done with that browser tab or done using the browser for the day, 628 00:31:11,510 --> 00:31:14,635 then that's pretty good because it means if you go back to the same website 629 00:31:14,635 --> 00:31:16,830 tomorrow, that cookie might not exist anymore, 630 00:31:16,830 --> 00:31:19,890 so you might as well look like or be a brand new user. 631 00:31:19,890 --> 00:31:23,880 So they can't correlate, perhaps, by default as much information about you. 632 00:31:23,880 --> 00:31:28,070 But these are the cookies that you read about being bad for you 633 00:31:28,070 --> 00:31:30,740 and bad for your privacy, tracking cookies, which 634 00:31:30,740 --> 00:31:33,830 are the exact same idea, key value pairs that 635 00:31:33,830 --> 00:31:37,100 are sent from server to browser to remember who you are, or at least, 636 00:31:37,100 --> 00:31:39,830 that you're the same person, even if we don't know that you're 637 00:31:39,830 --> 00:31:42,060 David Malan specifically just yet. 638 00:31:42,060 --> 00:31:44,690 But as per the name, tracking cookies are really 639 00:31:44,690 --> 00:31:46,830 designed to track you and me. 640 00:31:46,830 --> 00:31:47,330 Why? 641 00:31:47,330 --> 00:31:50,500 Well, maybe analytical purposes, maybe debugging purposes, 642 00:31:50,500 --> 00:31:53,000 so that they know where users were in case something breaks, 643 00:31:53,000 --> 00:31:55,580 maybe advertising purposes, so that you get 644 00:31:55,580 --> 00:31:58,760 served different ads from me, so that they can maximize their revenue 645 00:31:58,760 --> 00:32:01,250 by serving up ads that you and I are each 646 00:32:01,250 --> 00:32:03,120 more individually likely to click on.
647 00:32:03,120 --> 00:32:06,740 So tracking cookies are the ones that get a bad rep and rightfully so. 648 00:32:06,740 --> 00:32:08,720 So let's consider an example of a cookie that's 649 00:32:08,720 --> 00:32:11,900 designed to track your behavior on a particular website. 650 00:32:11,900 --> 00:32:14,540 Here, for instance, is a set-cookie header 651 00:32:14,540 --> 00:32:18,210 that Google, specifically, might send to your browser. 652 00:32:18,210 --> 00:32:23,270 In fact, they use a cookie that by convention is called _ga for Google 653 00:32:23,270 --> 00:32:25,640 Analytics, which they use for analytical purposes. 654 00:32:25,640 --> 00:32:28,470 And its value looks a little something like this. 655 00:32:28,470 --> 00:32:31,520 And the point of this value is that it's generated 656 00:32:31,520 --> 00:32:35,570 on a per website basis if that website is using Google Analytics. 657 00:32:35,570 --> 00:32:40,220 And Google Analytics is a tool that allows website designers to track 658 00:32:40,220 --> 00:32:44,240 who is clicking on what, what browsers they're using, what operating systems 659 00:32:44,240 --> 00:32:47,570 they're using, and generally giving them a sense of the demographics 660 00:32:47,570 --> 00:32:48,770 of their user base. 
661 00:32:48,770 --> 00:32:53,210 But unlike session cookies, which are meant to expire after a day, 662 00:32:53,210 --> 00:32:57,800 after the browser closes or the like, Google's analytical cookie here 663 00:32:57,800 --> 00:33:01,310 has a maximum age of this many seconds, which if you do out 664 00:33:01,310 --> 00:33:05,070 the math is by default two years, which is to say, 665 00:33:05,070 --> 00:33:09,440 if you visit some website that is using Google Analytics by embedding 666 00:33:09,440 --> 00:33:13,670 a bit of Google's JavaScript code in their website, whenever that Google 667 00:33:13,670 --> 00:33:16,070 code is pulled from Google's website, Google 668 00:33:16,070 --> 00:33:20,090 has an opportunity to plant this cookie on your computer. 669 00:33:20,090 --> 00:33:24,650 And you'll get a unique ID based on you visiting for the first time, 670 00:33:24,650 --> 00:33:28,580 based on the specific website that is embedding Google Analytics. 671 00:33:28,580 --> 00:33:31,340 And that cookie is going to live in your computer, 672 00:33:31,340 --> 00:33:35,150 according to this HTTP header, for as long as two years. 673 00:33:35,150 --> 00:33:36,518 Now, that's useful for Google. 674 00:33:36,518 --> 00:33:38,060 It's perhaps, useful for the website. 675 00:33:38,060 --> 00:33:41,240 It's perhaps, a little more invasive for me and you. 676 00:33:41,240 --> 00:33:43,550 Now, Google has many other cookies that they use too. 677 00:33:43,550 --> 00:33:46,312 But this is, perhaps, one that you should keep an eye out for. 678 00:33:46,312 --> 00:33:48,020 And indeed, in the coming weeks or months 679 00:33:48,020 --> 00:33:50,228 if you poke around some of your own browser settings, 680 00:33:50,228 --> 00:33:53,930 you might very well see values like this. 
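To check the arithmetic behind that lifetime, here is a small sketch that converts a cookie's Max-Age attribute from seconds into days. The header below is an assumption for illustration: the cookie value is a placeholder rather than a real Google identifier, and the two-year figure is computed with 365-day years.

```python
from http.cookies import SimpleCookie

TWO_YEARS_IN_SECONDS = 2 * 365 * 24 * 60 * 60  # 63,072,000 seconds

# Hypothetical Set-Cookie value; "example-client-id" is a placeholder.
header_value = f"_ga=example-client-id; Max-Age={TWO_YEARS_IN_SECONDS}"


def lifetime_in_days(set_cookie_value: str, name: str):
    """Return the cookie's Max-Age in days, or None if no Max-Age is set."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    max_age = parsed[name]["max-age"]  # empty string when the attribute is absent
    return int(max_age) / 86400 if max_age else None


print(lifetime_in_days(header_value, "_ga"))
print(lifetime_in_days("session=1234abcd", "session"))
```

A cookie with no Max-Age (and no Expires) attribute is the session kind discussed earlier, which is why the second call reports no fixed lifetime.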
681 00:33:53,930 --> 00:33:57,780 But what else might servers use to keep track of us, 682 00:33:57,780 --> 00:34:01,040 especially if you and I are in the habit of deleting our cookies 683 00:34:01,040 --> 00:34:05,552 or clearing your history, which would be counterproductive for Google 684 00:34:05,552 --> 00:34:07,760 or websites that are trying to track you in this way, 685 00:34:07,760 --> 00:34:12,358 but a plus for you and for my privacy if you're behaving in this way? 686 00:34:12,358 --> 00:34:14,150 But it turns out there's other ways servers 687 00:34:14,150 --> 00:34:18,560 can track us, including through HTTP parameters, tracking parameters. 688 00:34:18,560 --> 00:34:20,659 So parameters are the key value pairs that 689 00:34:20,659 --> 00:34:24,560 often appear in URLs that are sent via GET requests typically. 690 00:34:24,560 --> 00:34:25,699 So we've seen one of these. 691 00:34:25,699 --> 00:34:28,760 If you recall when we searched for cats on Google before, 692 00:34:28,760 --> 00:34:30,770 you might recall that the URL was something 693 00:34:30,770 --> 00:34:39,230 like https://www.google.com/search?q=cats. 694 00:34:39,230 --> 00:34:44,780 Anything after a question mark in a URL is, indeed, an HTTP parameter. 695 00:34:44,780 --> 00:34:48,380 But it could be used not for innocuous helpful purposes, like searching 696 00:34:48,380 --> 00:34:50,690 for cats, but also, to track you. 697 00:34:50,690 --> 00:34:52,880 And in fact, if you see ampersands in URLs, 698 00:34:52,880 --> 00:34:57,190 that might mean that you have a second, or a third, or more parameter up there. 699 00:34:57,190 --> 00:35:00,120 And sometimes the purpose of these parameters 700 00:35:00,120 --> 00:35:03,340 is simply to track you as some person. 701 00:35:03,340 --> 00:35:06,330 So for instance, here is a representative URL. 702 00:35:06,330 --> 00:35:07,320 It's a long one. 
703 00:35:07,320 --> 00:35:14,500 And this is taken from example.com having a path of ad_engagement?. 704 00:35:14,500 --> 00:35:19,800 And then I'll highlight here click_id= and then this long seemingly random 705 00:35:19,800 --> 00:35:20,310 string. 706 00:35:20,310 --> 00:35:24,740 But there's a second HTTP parameter in this particular URL. 707 00:35:24,740 --> 00:35:27,670 &campaign_id=23. 708 00:35:27,670 --> 00:35:30,660 So the campaign ID, certainly with such a small number, 709 00:35:30,660 --> 00:35:32,070 is not meant to track you. 710 00:35:32,070 --> 00:35:36,470 That's meant to be sufficient input to the website to know what types of ads 711 00:35:36,470 --> 00:35:37,470 should be served to you. 712 00:35:37,470 --> 00:35:40,110 What campaign should be served up? 713 00:35:40,110 --> 00:35:44,760 But this click_id, which is sort of a euphemism for tracking cookie 714 00:35:44,760 --> 00:35:50,040 or tracking parameter in this case, is what's actually keeping track of you, 715 00:35:50,040 --> 00:35:51,990 specifically, because different users are 716 00:35:51,990 --> 00:35:54,390 going to find that whatever link they click on 717 00:35:54,390 --> 00:35:57,570 has a slightly different value for click_id. 718 00:35:57,570 --> 00:35:59,820 So recall that a tracking cookie is something 719 00:35:59,820 --> 00:36:01,410 that's sent via an HTTP header. 720 00:36:01,410 --> 00:36:03,243 And so it's harder for you and me to see it, 721 00:36:03,243 --> 00:36:05,202 unless we're more comfortable with our browsers 722 00:36:05,202 --> 00:36:07,110 and can poke around some underlying settings. 723 00:36:07,110 --> 00:36:11,310 But these tracking parameters are right there in front of you, at least, 724 00:36:11,310 --> 00:36:15,780 if you click on the URL in your browser and take a look at its entirety.
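For anyone who would like to inspect such a URL programmatically rather than by eye, Python's standard `urllib.parse` module can split out the parameters. The URL below follows the shape described in the lecture, but the click_id value is made up, since the real one is only described as a long, seemingly random string.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL; the click_id value is a made-up placeholder.
url = "https://example.com/ad_engagement?click_id=a1b2c3d4e5f6&campaign_id=23"


def query_parameters(link: str) -> dict:
    """Return the key/value pairs that follow the question mark in a URL."""
    parts = urlsplit(link)
    # parse_qs returns a list per key; keep the first value for simplicity.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}


print(query_parameters(url))
```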
725 00:36:15,780 --> 00:36:20,310 Now, wonderfully, at least for us end users who are concerned about privacy, 726 00:36:20,310 --> 00:36:23,700 browsers and even third-party software are increasingly 727 00:36:23,700 --> 00:36:26,190 removing values like this for us. 728 00:36:26,190 --> 00:36:28,500 As soon as the browser manufacturer or as soon 729 00:36:28,500 --> 00:36:31,560 as the third-party software developer knows that, wait a minute, 730 00:36:31,560 --> 00:36:36,930 click ID has no good purpose other than tracking our users, 731 00:36:36,930 --> 00:36:39,897 they can simply automatically remove it for you. 732 00:36:39,897 --> 00:36:41,730 After all, when you visit a web page and you 733 00:36:41,730 --> 00:36:45,030 get the HTML that represents that web page, 734 00:36:45,030 --> 00:36:47,190 the browser could certainly poke around there 735 00:36:47,190 --> 00:36:49,380 before you even have a chance to click on anything. 736 00:36:49,380 --> 00:36:53,530 And it could scrub or sanitize these kinds of tracking parameters. 737 00:36:53,530 --> 00:36:57,240 Now, to be fair, if the browser manufacturer doesn't necessarily 738 00:36:57,240 --> 00:37:00,090 know what the tracking parameter is called 739 00:37:00,090 --> 00:37:03,030 or if maybe the website is constantly changing the name 740 00:37:03,030 --> 00:37:07,390 or trying to mix things up, this might not work so well. 741 00:37:07,390 --> 00:37:09,360 But it's at least an attempt to try to put 742 00:37:09,360 --> 00:37:13,410 downward pressure on this very commonplace technique of keeping 743 00:37:13,410 --> 00:37:14,820 track of you and me. 744 00:37:14,820 --> 00:37:17,610 Now, why is this parameter able to track us? 
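The scrubbing idea can be sketched in a few lines. This is an assumption about how such a filter might work, not how any particular browser implements it; the blocklist contains only the click_id name from the lecture's example, whereas real browsers and extensions maintain their own lists.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed blocklist of parameter names known to exist only for tracking.
KNOWN_TRACKING_PARAMETERS = {"click_id"}


def strip_tracking(link: str, blocked=KNOWN_TRACKING_PARAMETERS) -> str:
    """Rebuild a URL without any query parameters whose names are blocked."""
    parts = urlsplit(link)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in blocked
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/ad_engagement?click_id=a1b2c3&campaign_id=23"))
# A renamed parameter slips through, which is the weakness of a name-based list.
print(strip_tracking("https://example.com/ad_engagement?cid=a1b2c3&campaign_id=23"))
```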
745 00:37:17,610 --> 00:37:21,550 Well, this, too, can end up in those server logs because this would be, 746 00:37:21,550 --> 00:37:24,045 for instance, the web page that I am requesting, 747 00:37:24,045 --> 00:37:29,700 /ad_engagement?click_id= dot, dot, dot, that could very well be logged 748 00:37:29,700 --> 00:37:31,830 by the server, stored in a database, even. 749 00:37:31,830 --> 00:37:33,750 And they could use that information to know 750 00:37:33,750 --> 00:37:37,290 exactly which pages I have clicked on, because I visited those links, 751 00:37:37,290 --> 00:37:38,990 and even what ads I have seen. 752 00:37:38,990 --> 00:37:40,740 And maybe that's a good thing commercially 753 00:37:40,740 --> 00:37:43,590 because now they know what types of ads I'm clicking on. 754 00:37:43,590 --> 00:37:45,810 Now they can serve even more of them to me. 755 00:37:45,810 --> 00:37:48,490 And that might be great for them, but probably not so great, 756 00:37:48,490 --> 00:37:51,990 if not, annoying or invasive for me and you. 757 00:37:51,990 --> 00:37:54,810 So something else to keep an eye out for and something else that 758 00:37:54,810 --> 00:37:57,540 might guide your decision making in the days and the years 759 00:37:57,540 --> 00:38:00,060 to come when it comes to picking your browser. 760 00:38:00,060 --> 00:38:03,930 You don't necessarily have to nowadays use the one that comes with your phone, 761 00:38:03,930 --> 00:38:05,670 comes with your laptop or desktop. 762 00:38:05,670 --> 00:38:08,190 You can, if more comfortable, install something else. 763 00:38:08,190 --> 00:38:12,930 And increasingly, you and I are having more and more options. 764 00:38:12,930 --> 00:38:17,550 Questions now on these tracking parameters or anything 765 00:38:17,550 --> 00:38:20,010 prior with respect to our privacy? 766 00:38:20,010 --> 00:38:23,190 AUDIENCE: Are the cookies the ones that track or are 767 00:38:23,190 --> 00:38:25,020 the ones that are being tracked? 
768 00:38:25,020 --> 00:38:29,160 DAVID J. MALAN: The cookies are values that are being used to track you. 769 00:38:29,160 --> 00:38:33,930 So recall that-- a metaphor for the cookies is like that virtual handstamp. 770 00:38:33,930 --> 00:38:38,820 And so if all of these web servers are putting ink on your hand and on my hand 771 00:38:38,820 --> 00:38:41,640 and because of HTTP, you and I, our browsers 772 00:38:41,640 --> 00:38:44,010 are in the habit of presenting these cookies, 773 00:38:44,010 --> 00:38:46,410 these handstamps to every website we visit, 774 00:38:46,410 --> 00:38:49,680 that value is being used to track us. 775 00:38:49,680 --> 00:38:52,990 So cookies in and of themselves are just a technology. 776 00:38:52,990 --> 00:38:57,130 It's a very simple idea storing a big random value on your computer and mine 777 00:38:57,130 --> 00:38:58,750 just to uniquely identify us. 778 00:38:58,750 --> 00:39:02,590 They are necessary to give us features like logging into websites, 779 00:39:02,590 --> 00:39:04,060 maintaining shopping carts. 780 00:39:04,060 --> 00:39:06,910 But very quickly, especially since the internet from the get go 781 00:39:06,910 --> 00:39:09,430 has been largely free to use-- 782 00:39:09,430 --> 00:39:11,710 or rather, a lot of the internet has been 783 00:39:11,710 --> 00:39:14,620 free to use once you have a connection, at least-- 784 00:39:14,620 --> 00:39:17,950 they've been used, or in some views, abused 785 00:39:17,950 --> 00:39:22,070 by the advertisers, the Facebooks, and the others of the world. 786 00:39:22,070 --> 00:39:24,490 So another way to think about tracking cookies 787 00:39:24,490 --> 00:39:27,430 is to consider them to be third-party cookies because, indeed, 788 00:39:27,430 --> 00:39:29,920 even in the Google example, that's how they're being used. 
789 00:39:29,920 --> 00:39:34,750 If a website like example.com is embedding Google Analytics, 790 00:39:34,750 --> 00:39:39,340 and therefore, some kind of HTML tag that mentions google.com, well then, 791 00:39:39,340 --> 00:39:42,940 example.com is the first-party in that story, so to speak. 792 00:39:42,940 --> 00:39:46,270 And google.com is the third party in that story. 793 00:39:46,270 --> 00:39:48,700 What that means is that your browser might get cookies 794 00:39:48,700 --> 00:39:51,400 from both example.com and google.com. 795 00:39:51,400 --> 00:39:54,890 But the most important ones, presumably, are the first-party ones 796 00:39:54,890 --> 00:39:58,430 from example.com because that is the website you chose to go to 797 00:39:58,430 --> 00:40:00,650 and whose functionality you want to use. 798 00:40:00,650 --> 00:40:03,950 The third-party functionality, like tracking your clicks 799 00:40:03,950 --> 00:40:07,940 and your internet behavior on that site via Google, that's third party. 800 00:40:07,940 --> 00:40:10,340 And so very commonly do browsers nowadays 801 00:40:10,340 --> 00:40:15,230 certainly offer options via which you can disable third-party cookies. 802 00:40:15,230 --> 00:40:17,240 And that tends to be good for privacy sake 803 00:40:17,240 --> 00:40:20,480 because it means you're blocking third parties like Google 804 00:40:20,480 --> 00:40:22,700 from keeping track of you via cookies. 805 00:40:22,700 --> 00:40:26,660 But, but, but that doesn't necessarily mean the website isn't still 806 00:40:26,660 --> 00:40:29,210 using tracking parameters in some way. 807 00:40:29,210 --> 00:40:32,600 And you would only know that by actually looking more closely at the URLs 808 00:40:32,600 --> 00:40:35,520 you're clicking on or that are embedded in the web page itself. 
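As a rough illustration of the scrubbing idea, here is a minimal Python sketch of removing known tracking parameters from a URL. Only `click_id` comes from the lecture; the other names in the blocklist and the function name `strip_tracking` are assumptions made for the example, and real browsers maintain their own, much longer lists.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical blocklist: click_id is the lecture's example; the utm_* names
# are illustrative additions, not a list taken from any particular browser.
TRACKING_PARAMS = {"click_id", "utm_source", "utm_medium", "utm_campaign"}


def strip_tracking(url: str) -> str:
    """Return the URL with any query parameter on the blocklist removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Rebuild the URL with only the parameters that survived the filter.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/ad_engagement?click_id=1234abcd&page=2"))
```

As the lecture notes, a fixed list like this only works when the parameter's name is known in advance; a site that keeps renaming its parameter would slip past it.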
809 00:40:35,520 --> 00:40:38,360 And that's where now browsers and third-party software 810 00:40:38,360 --> 00:40:42,860 are additionally helping by helping us remove not only those cookies, but even 811 00:40:42,860 --> 00:40:43,850 those parameters. 812 00:40:43,850 --> 00:40:45,920 But let's consider a more concrete scenario 813 00:40:45,920 --> 00:40:50,540 of what third-party cookies are and why they allow companies 814 00:40:50,540 --> 00:40:53,510 not only like Google to track your behavior on one website, 815 00:40:53,510 --> 00:40:56,930 but even companies like Google or other advertisers 816 00:40:56,930 --> 00:40:59,840 to track your behavior on multiple websites. 817 00:40:59,840 --> 00:41:03,440 And in this sense, third parties have increasingly 818 00:41:03,440 --> 00:41:08,840 been more powerful, more omniscient, for instance, than the first-party websites 819 00:41:08,840 --> 00:41:10,700 that you and I are actually visiting. 820 00:41:10,700 --> 00:41:11,270 Why? 821 00:41:11,270 --> 00:41:13,520 Well, if there's a lot of popular third parties 822 00:41:13,520 --> 00:41:17,390 out there, Google being one of them for advertisements and for analytics, 823 00:41:17,390 --> 00:41:20,390 well, if lots of different websites are using them-- 824 00:41:20,390 --> 00:41:21,590 maybe Harvard's using them. 825 00:41:21,590 --> 00:41:22,400 Yale's using them. 826 00:41:22,400 --> 00:41:25,880 Stanford's using them-- then that third party very quickly 827 00:41:25,880 --> 00:41:30,530 becomes more powerful than even any of those individual parties alone. 828 00:41:30,530 --> 00:41:31,100 Why? 829 00:41:31,100 --> 00:41:36,830 Because that third party, if it is being embedded at Harvard, Yale, 830 00:41:36,830 --> 00:41:39,560 and Stanford, that third party Google, for instance, 831 00:41:39,560 --> 00:41:42,600 kind of has eyes into all three websites. 
832 00:41:42,600 --> 00:41:46,340 And if it sends the same cookie to you on all three websites, 833 00:41:46,340 --> 00:41:49,670 Google might actually know that you're poking around Harvard's, and Yale's 834 00:41:49,670 --> 00:41:52,940 and Stanford's website when Harvard might have no idea you're 835 00:41:52,940 --> 00:41:54,380 checking out Yale and Stanford. 836 00:41:54,380 --> 00:41:57,890 And Stanford might have no idea you're checking out Yale and Harvard. 837 00:41:57,890 --> 00:41:59,660 So what does this mean concretely? 838 00:41:59,660 --> 00:42:03,837 Well, consider some HTML here, such as we've seen before. 839 00:42:03,837 --> 00:42:06,170 And I've highlighted a couple of salient characteristics 840 00:42:06,170 --> 00:42:07,610 in this particular example. 841 00:42:07,610 --> 00:42:11,030 Notice that I've given in this web page not only a body, which contains 842 00:42:11,030 --> 00:42:13,140 the body, the bulk of the web page. 843 00:42:13,140 --> 00:42:16,190 I've also included a head for the web page, inside of which 844 00:42:16,190 --> 00:42:17,810 is another tag called Title. 845 00:42:17,810 --> 00:42:20,300 And I'm doing this just to, one, demonstrate 846 00:42:20,300 --> 00:42:23,000 there are more tags than we have seen in this language thus far. 847 00:42:23,000 --> 00:42:29,300 And specifically, this I claim is meant to represent harvard.edu's own website, 848 00:42:29,300 --> 00:42:31,790 the title of which would be Harvard, like in the tab 849 00:42:31,790 --> 00:42:33,050 along the top of the screen. 850 00:42:33,050 --> 00:42:37,140 And inside of the body of this page for simplicity, 851 00:42:37,140 --> 00:42:40,010 let's assume that for now, there's just one big advertisement. 852 00:42:40,010 --> 00:42:42,350 There's no content for the sake of discussion. 853 00:42:42,350 --> 00:42:44,180 There's just one advertisement. 854 00:42:44,180 --> 00:42:46,620 Well, where is that advertisement coming from? 
855 00:42:46,620 --> 00:42:50,570 It's coming from, in this case, example.com, or our friends at Google, 856 00:42:50,570 --> 00:42:53,600 specifically, a file called ad.gif. 857 00:42:53,600 --> 00:42:58,700 And this particular URL is being used as the value of the source 858 00:42:58,700 --> 00:43:00,800 attribute of an image tag. 859 00:43:00,800 --> 00:43:02,100 So what do I mean by this? 860 00:43:02,100 --> 00:43:05,420 Well, if you visit harvard.edu in the story, what you are seeing 861 00:43:05,420 --> 00:43:07,940 is a big advertisement, a big GIF, a graphic 862 00:43:07,940 --> 00:43:11,840 that is coming from example.com. 863 00:43:11,840 --> 00:43:14,430 Now, what is the implication of that? 864 00:43:14,430 --> 00:43:17,130 Well, suppose that Yale is doing the same thing. 865 00:43:17,130 --> 00:43:20,270 So here now, for the sake of discussion, is the exact same HTML, 866 00:43:20,270 --> 00:43:22,100 except it lives at yale.edu. 867 00:43:22,100 --> 00:43:24,842 So the title of the page has now changed to Yale. 868 00:43:24,842 --> 00:43:27,050 And moreover, just to make things really interesting, 869 00:43:27,050 --> 00:43:28,610 let's add Stanford to the mix. 870 00:43:28,610 --> 00:43:30,240 Same exact page. 871 00:43:30,240 --> 00:43:35,630 So the point of this story is that Harvard, and Yale, and Stanford are all 872 00:43:35,630 --> 00:43:39,590 using the same third party, example.com in this case 873 00:43:39,590 --> 00:43:42,500 or maybe someone like Google in the real world. 874 00:43:42,500 --> 00:43:45,050 And they're requesting moreover the same GIF. 875 00:43:45,050 --> 00:43:47,030 And so the same file is being accessed. 876 00:43:47,030 --> 00:43:49,370 But that even alone isn't a strict requirement. 877 00:43:49,370 --> 00:43:53,620 The same website is being accessed by all three of these first parties. 878 00:43:53,620 --> 00:43:54,810 So what does that mean? 
879 00:43:54,810 --> 00:43:59,010 Suppose that you open up your browser and you first visit harvard.edu, 880 00:43:59,010 --> 00:44:02,940 your browser is going to download the HTML for Harvard's website. 881 00:44:02,940 --> 00:44:05,580 It's going to see that, oh, there's an image tag in there. 882 00:44:05,580 --> 00:44:09,900 And that image tag wants to show this ad.gif from example.com. 883 00:44:09,900 --> 00:44:12,210 So your browser is automatically, by nature 884 00:44:12,210 --> 00:44:14,520 of how browsers work, going to send a second HTTP 885 00:44:14,520 --> 00:44:20,280 request, this time requesting ad.gif from the host, example.com. 886 00:44:20,280 --> 00:44:22,710 And just to tie today's stories together, 887 00:44:22,710 --> 00:44:29,010 it's going to include, probably, a referrer, HTTP header that 888 00:44:29,010 --> 00:44:30,855 specifies where I'm coming from. 889 00:44:30,855 --> 00:44:32,730 And that's useful for our purposes because it 890 00:44:32,730 --> 00:44:34,770 puts these requests into context. 891 00:44:34,770 --> 00:44:40,110 Now that server, example.com, or Google, in the case of the real world, 892 00:44:40,110 --> 00:44:45,090 is going to probably respond with 200 OK like, OK, here is the advertisement. 893 00:44:45,090 --> 00:44:48,450 And it's going to include not only the image, but also an HTTP 894 00:44:48,450 --> 00:44:49,450 header of its own. 895 00:44:49,450 --> 00:44:51,810 And this is our old friend set-cookie, where 896 00:44:51,810 --> 00:44:53,740 in this case, for the sake of discussion, 897 00:44:53,740 --> 00:44:57,460 I'm going to propose that it's setting a cookie on my computer called ID 898 00:44:57,460 --> 00:45:00,700 because this is going to be my unique identifier for example.com. 899 00:45:00,700 --> 00:45:04,930 Its value is going to be the one I keep using for discussion's sake, 1234abcd. 900 00:45:04,930 --> 00:45:07,450 But that would be some big random value for each of us. 
901 00:45:07,450 --> 00:45:08,260 And my gosh. 902 00:45:08,260 --> 00:45:09,790 This thing is going to last a year. 903 00:45:09,790 --> 00:45:13,340 That's the number of seconds in 365 days. 904 00:45:13,340 --> 00:45:16,180 So this cookie is being planted on my computer 905 00:45:16,180 --> 00:45:21,130 by example.com because I visited harvard.edu. 906 00:45:21,130 --> 00:45:22,930 So Harvard is the first party. 907 00:45:22,930 --> 00:45:26,480 Example.com is the third party in this case. 908 00:45:26,480 --> 00:45:28,570 But here now is the concern. 909 00:45:28,570 --> 00:45:32,590 When I visit yale.edu with that same browser, 910 00:45:32,590 --> 00:45:36,100 my hand has been stamped by example.com already. 911 00:45:36,100 --> 00:45:40,090 And so what happens is that my browser now presents 912 00:45:40,090 --> 00:45:44,920 that handstamp to example.com, sending the same ID and the same value, 913 00:45:44,920 --> 00:45:46,360 that is, the same handstamp. 914 00:45:46,360 --> 00:45:48,850 The host is as before example.com. 915 00:45:48,850 --> 00:45:51,740 But this time, the referrer happens to be Yale. 916 00:45:51,740 --> 00:45:55,120 So in other words, after I visited Harvard and my hand 917 00:45:55,120 --> 00:45:58,960 has been stamped with this tracking cookie, this third-party cookie 918 00:45:58,960 --> 00:46:02,290 from example.com, my browser, when I visit yale.edu, 919 00:46:02,290 --> 00:46:05,830 is going to present that same handstamp again, this time, 920 00:46:05,830 --> 00:46:08,320 to example.com with this referrer. 921 00:46:08,320 --> 00:46:11,170 The next time I use my browser to visit stanford.edu, 922 00:46:11,170 --> 00:46:13,630 the same message is going to be sent from my browser 923 00:46:13,630 --> 00:46:18,190 to example.com to request that same ad, this time now 924 00:46:18,190 --> 00:46:20,740 from stanford.edu's website. 925 00:46:20,740 --> 00:46:22,360 Now, what's the implication? 
926 00:46:22,360 --> 00:46:26,710 Via these three HTTP requests, example.com 927 00:46:26,710 --> 00:46:30,160 knows that I'm visiting Stanford, and before that, Yale 928 00:46:30,160 --> 00:46:32,110 and before that, Harvard. 929 00:46:32,110 --> 00:46:34,990 And none of Harvard, or Yale, or Stanford 930 00:46:34,990 --> 00:46:37,630 necessarily know that I'm visiting any of those other websites. 931 00:46:37,630 --> 00:46:39,950 The third party is the more powerful. 932 00:46:39,950 --> 00:46:44,140 It's the more all seeing, simply because example.com, or in the real world, 933 00:46:44,140 --> 00:46:46,570 Google, is just so darn popular, that it's 934 00:46:46,570 --> 00:46:51,520 embedded in so many darn websites. Google and others know almost everything, 935 00:46:51,520 --> 00:46:54,040 I dare say, about what you and I are doing on the web 936 00:46:54,040 --> 00:46:57,710 because these ads are all over the place in this way. 937 00:46:57,710 --> 00:46:59,770 So we've seen a very simple example. 938 00:46:59,770 --> 00:47:04,720 But it's simple because cookies and HTTP really are relatively simple. 939 00:47:04,720 --> 00:47:07,030 It's once you realize how they work, that you 940 00:47:07,030 --> 00:47:09,460 can use them not only to solve compelling problems for all 941 00:47:09,460 --> 00:47:13,250 of us, sessions, and shopping carts, and the like, 942 00:47:13,250 --> 00:47:16,090 but they also can be used to monetize the internet 943 00:47:16,090 --> 00:47:19,750 and have been used historically to monetize the internet, or even worse, 944 00:47:19,750 --> 00:47:24,710 perhaps, for us, to track our individual clicks and behavior.
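The three requests described above can be sketched as a toy simulation in Python. Nothing here touches the network: the `Tracker` and `Browser` classes and their method names are hypothetical, invented only to mirror the story of one third party seeing the same cookie arrive with three different referrers.

```python
import secrets
from collections import defaultdict

ONE_YEAR = 365 * 24 * 60 * 60  # seconds in 365 days, as in the lecture's cookie


class Tracker:
    """Toy stand-in for the third party (example.com in the lecture)."""

    def __init__(self):
        self.visits = defaultdict(list)  # cookie value -> referrers seen

    def serve_ad(self, referer, cookie=None):
        headers = {}
        if cookie is None:
            # First contact: stamp the visitor's hand with a big random value.
            cookie = secrets.token_hex(16)
            headers["Set-Cookie"] = f"id={cookie}; Max-Age={ONE_YEAR}"
        # Log which first-party page this visitor was on.
        self.visits[cookie].append(referer)
        return headers


class Browser:
    """Toy browser that remembers only the tracker's cookie."""

    def __init__(self):
        self.tracker_cookie = None

    def visit(self, site, tracker):
        # Loading the page triggers a second request for the embedded ad,
        # carrying the referrer and any cookie the tracker set earlier.
        headers = tracker.serve_ad(referer=site, cookie=self.tracker_cookie)
        if "Set-Cookie" in headers:
            pair = headers["Set-Cookie"].split(";")[0]
            self.tracker_cookie = pair.split("=", 1)[1]


tracker = Tracker()
browser = Browser()
for site in ["harvard.edu", "yale.edu", "stanford.edu"]:
    browser.visit(site, tracker)
print(dict(tracker.visits))
```

In this sketch the tracker ends up with one cookie value mapped to all three sites, while none of the three sites learns anything about the others.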
945 00:47:24,710 --> 00:47:27,610 So let me pause here and see if there's any questions now 946 00:47:27,610 --> 00:47:31,060 on third-party cookies and why, therefore, it's 947 00:47:31,060 --> 00:47:35,140 perhaps so compelling for you or me to opt in to disabling them, 948 00:47:35,140 --> 00:47:40,030 or better yet, to use browsers that are starting to block them for us. 949 00:47:40,030 --> 00:47:43,120 AUDIENCE: What browsers are more secure among others 950 00:47:43,120 --> 00:47:45,035 considering tracking parameters? 951 00:47:45,035 --> 00:47:45,910 DAVID J. MALAN: Sure. 952 00:47:45,910 --> 00:47:46,510 A quick tweak. 953 00:47:46,510 --> 00:47:48,260 I wouldn't say that some browsers are more 954 00:47:48,260 --> 00:47:50,350 secure than others in this context. 955 00:47:50,350 --> 00:47:53,620 I would say that you want browsers that are more privacy conscious 956 00:47:53,620 --> 00:47:56,830 or privacy preserving because that's what we're talking about today. 957 00:47:56,830 --> 00:47:59,080 Hopefully, all of them are just as secure when 958 00:47:59,080 --> 00:48:03,820 it comes to HTTPS and the encryption that's just keeping our data protected 959 00:48:03,820 --> 00:48:05,830 between points A and B. 960 00:48:05,830 --> 00:48:10,930 So generally, Safari has been pretty good when it comes to privacy. 961 00:48:10,930 --> 00:48:15,035 And they are the ones that very recently that you're using now 962 00:48:15,035 --> 00:48:18,160 announced that they're going to start giving people the feature of removing 963 00:48:18,160 --> 00:48:20,380 tracking parameters from URLs. 964 00:48:20,380 --> 00:48:22,510 In fact, the sample URL I gave was actually 965 00:48:22,510 --> 00:48:25,990 from Apple's recent announcement about exactly that.
966 00:48:25,990 --> 00:48:29,920 DuckDuckGo is probably the most popular third-party browser 967 00:48:29,920 --> 00:48:32,440 that is very privacy conscious and tries to disable 968 00:48:32,440 --> 00:48:34,490 a lot of these tracking behaviors. 969 00:48:34,490 --> 00:48:36,580 Another one is Brave. 970 00:48:36,580 --> 00:48:39,820 Perhaps, the worst offender is probably Chrome, 971 00:48:39,820 --> 00:48:43,120 even though I, myself, am guilty of using it myself because it's 972 00:48:43,120 --> 00:48:45,190 so integrated into Google's ecosystem. 973 00:48:45,190 --> 00:48:47,140 But Google, of course, has made their business 974 00:48:47,140 --> 00:48:50,030 on monetizing your behavior and mine. 975 00:48:50,030 --> 00:48:54,000 So that is, perhaps, one to put toward the bottom of the list 976 00:48:54,000 --> 00:48:55,630 if you're concerned about this. 977 00:48:55,630 --> 00:48:57,380 So that's kind of how I would rank things. 978 00:48:57,380 --> 00:48:58,280 And there's yet others. 979 00:48:58,280 --> 00:48:59,870 But I think those are some of the most popular. 980 00:48:59,870 --> 00:49:01,970 And then, of course, in the Microsoft ecosystem, 981 00:49:01,970 --> 00:49:04,780 there is Edge and Firefox too. 982 00:49:04,780 --> 00:49:06,530 I should have put them higher on the list. 983 00:49:06,530 --> 00:49:09,650 They are more privacy conscious, I do believe, than Google. 984 00:49:09,650 --> 00:49:12,690 So with all of these mechanisms for tracking in mind, 985 00:49:12,690 --> 00:49:15,350 what can we do to protect all the more of our privacy? 986 00:49:15,350 --> 00:49:18,350 Well, you might already know of this feature, private browsing. 987 00:49:18,350 --> 00:49:22,220 So you don't necessarily have to delete all of your browser history 988 00:49:22,220 --> 00:49:23,850 and delete all of your cookies. 
989 00:49:23,850 --> 00:49:26,122 You can instead, on occasion, open up a special type 990 00:49:26,122 --> 00:49:27,830 of window, which most of today's browsers 991 00:49:27,830 --> 00:49:31,040 support that puts you into private mode or incognito mode. 992 00:49:31,040 --> 00:49:34,460 And you can think of this as giving you just a different chunk of memory 993 00:49:34,460 --> 00:49:38,360 in the computer that doesn't know any of your past browser history, that doesn't 994 00:49:38,360 --> 00:49:41,210 have any of your past cookies, that doesn't remember any 995 00:49:41,210 --> 00:49:42,920 of your past usernames and passwords. 996 00:49:42,920 --> 00:49:47,150 You're sort of starting fresh, so that everything you do in that window 997 00:49:47,150 --> 00:49:48,110 is brand new. 998 00:49:48,110 --> 00:49:50,180 The catch, though, is that everything you do in 999 00:49:50,180 --> 00:49:56,270 that window still works exactly as the web works as we have been describing. 1000 00:49:56,270 --> 00:49:58,670 So you still might have tracking parameters. 1001 00:49:58,670 --> 00:50:00,800 You still might have tracking cookies. 1002 00:50:00,800 --> 00:50:02,840 You still might have server logs. 1003 00:50:02,840 --> 00:50:07,740 But when you close that private window or you close that incognito mode, 1004 00:50:07,740 --> 00:50:11,580 at least, the information is discarded from your computer, 1005 00:50:11,580 --> 00:50:13,790 so that if tomorrow, you do the exact same thing 1006 00:50:13,790 --> 00:50:16,790 and open up an incognito window again, then 1007 00:50:16,790 --> 00:50:19,670 it's as though you're starting fresh with that server, 1008 00:50:19,670 --> 00:50:23,030 except for the reality, as per our past discussion, 1009 00:50:23,030 --> 00:50:26,150 that fingerprinting is still a possibility.
1010 00:50:26,150 --> 00:50:29,840 Your IP address can still be factored in as can be other information 1011 00:50:29,840 --> 00:50:31,610 that your browser might still be leaking. 1012 00:50:31,610 --> 00:50:34,610 But what you're not doing is contaminating, so to speak, 1013 00:50:34,610 --> 00:50:37,310 your general browsing history with specifically 1014 00:50:37,310 --> 00:50:39,650 what you're using that window for. 1015 00:50:39,650 --> 00:50:43,040 What you should realize, too, is that private browsing or incognito mode 1016 00:50:43,040 --> 00:50:44,850 is entirely client side. 1017 00:50:44,850 --> 00:50:47,300 So particularly, those logs that we have mentioned 1018 00:50:47,300 --> 00:50:49,460 are still being stored by the server. 1019 00:50:49,460 --> 00:50:52,910 They might be storing, perhaps, a different tracking cookie or parameter 1020 00:50:52,910 --> 00:50:56,390 for you because it doesn't necessarily recognize you when 1021 00:50:56,390 --> 00:50:58,340 you're in private or incognito mode. 1022 00:50:58,340 --> 00:51:02,240 But it doesn't mean that your tracks are completely absent from the internet. 1023 00:51:02,240 --> 00:51:05,420 Rather, it's really just scrubbing them from your local computer 1024 00:51:05,420 --> 00:51:09,440 and decreasing the probability, but not eliminating the probability 1025 00:51:09,440 --> 00:51:12,125 that a server still knows that it's you. 1026 00:51:12,125 --> 00:51:14,160 So I would use it with care.
1027 00:51:14,160 --> 00:51:17,240 But with that said, if you take a course in web development 1028 00:51:17,240 --> 00:51:21,320 or you already design your own websites, using private browsing or incognito 1029 00:51:21,320 --> 00:51:23,732 mode can also be useful for development purposes 1030 00:51:23,732 --> 00:51:25,940 because it's a way of opening a brand new window that 1031 00:51:25,940 --> 00:51:28,790 has no recollection of maybe past bugs that you had 1032 00:51:28,790 --> 00:51:30,740 or past web pages that you clicked on. 1033 00:51:30,740 --> 00:51:33,860 And it's very commonly used as part of development tools 1034 00:51:33,860 --> 00:51:40,140 to actually facilitate and mimic the idea of starting fresh with some site. 1035 00:51:40,140 --> 00:51:43,050 Super cookies, though, these sound delicious, 1036 00:51:43,050 --> 00:51:46,870 but these, too, are kind of the worst of cookies that we've discussed already. 1037 00:51:46,870 --> 00:51:49,110 We saw session cookies for maintaining state. 1038 00:51:49,110 --> 00:51:51,900 We saw tracking cookies for tracking you. 1039 00:51:51,900 --> 00:51:54,780 Super cookies are not so super, really. 1040 00:51:54,780 --> 00:51:57,660 These are cookies that are typically injected 1041 00:51:57,660 --> 00:52:01,440 by a third party, like your company, your university, 1042 00:52:01,440 --> 00:52:06,570 or your internet service provider into your HTTP request, which 1043 00:52:06,570 --> 00:52:10,110 is to say, if you, from your browser, visit some website, 1044 00:52:10,110 --> 00:52:13,620 that traffic, of course, goes from your laptop or phone 1045 00:52:13,620 --> 00:52:17,670 through some internet service provider, whether it's on campus, or home, 1046 00:52:17,670 --> 00:52:19,660 or wirelessly in the real world.
1047 00:52:19,660 --> 00:52:22,830 And if whoever is providing you with that internet service 1048 00:52:22,830 --> 00:52:26,430 can see the contents of that virtual envelope, 1049 00:52:26,430 --> 00:52:28,290 there's technically nothing stopping them 1050 00:52:28,290 --> 00:52:30,540 from opening up the envelope, so to speak, 1051 00:52:30,540 --> 00:52:34,060 and adding one or more HTTP headers of their own. 1052 00:52:34,060 --> 00:52:36,780 And so mobile phone carriers, for instance, in the past 1053 00:52:36,780 --> 00:52:39,300 have been known to do this, whereby, if you are just 1054 00:52:39,300 --> 00:52:43,390 requesting a website, like example.com from your phone, 1055 00:52:43,390 --> 00:52:47,800 they might-- halfway between you and that server, 1056 00:52:47,800 --> 00:52:50,410 they might inject a cookie of their own. 1057 00:52:50,410 --> 00:52:53,800 For the sake of discussion, I'm going to use the same name and value as before. 1058 00:52:53,800 --> 00:52:56,710 id=1234abcd. 1059 00:52:56,710 --> 00:53:00,760 But what's noteworthy here is that that value is not coming from your phone. 1060 00:53:00,760 --> 00:53:02,740 It is not coming from your browser. 1061 00:53:02,740 --> 00:53:04,473 You can clear all of your cookies. 1062 00:53:04,473 --> 00:53:05,890 You can clear all of your history. 1063 00:53:05,890 --> 00:53:08,600 You can use incognito or private mode on your phone. 1064 00:53:08,600 --> 00:53:11,290 You're not going to see any trace of that client side 1065 00:53:11,290 --> 00:53:16,750 because the darn thing is being injected into your traffic between you, 1066 00:53:16,750 --> 00:53:19,180 point A, and the server, point B. 1067 00:53:19,180 --> 00:53:22,900 So this is sort of a canonical example of a machine in the middle attack. 1068 00:53:22,900 --> 00:53:26,080 But your internet service provider in this telling of the story 1069 00:53:26,080 --> 00:53:28,900 is doing it because they want to track you. 
1070 00:53:28,900 --> 00:53:31,810 Or they want-- because of advertising relationships 1071 00:53:31,810 --> 00:53:33,790 they might have with some websites, they want 1072 00:53:33,790 --> 00:53:37,070 to make sure that you can be tracked by that website, 1073 00:53:37,070 --> 00:53:40,390 even if you have opted out or have been clearing proactively 1074 00:53:40,390 --> 00:53:41,770 your very own cookies. 1075 00:53:41,770 --> 00:53:45,040 So suffice it to say, these have been particularly controversial. 1076 00:53:45,040 --> 00:53:48,310 And thankfully, you and I do have a pretty good defense here. 1077 00:53:48,310 --> 00:53:51,790 Just never use HTTP without encryption. 1078 00:53:51,790 --> 00:53:58,450 If URLs are always https:// and then something, theoretically, 1079 00:53:58,450 --> 00:54:02,620 this attack or this "feature" of your mobile phone carrier should not be 1080 00:54:02,620 --> 00:54:03,140 possible. 1081 00:54:03,140 --> 00:54:03,640 Why? 1082 00:54:03,640 --> 00:54:06,130 Because if the contents of the envelope are encrypted, 1083 00:54:06,130 --> 00:54:08,560 not only can't they see what's actually inside, 1084 00:54:08,560 --> 00:54:12,040 they can't add anything to the mix because they don't have the key that's 1085 00:54:12,040 --> 00:54:14,530 being used to encrypt that information. 1086 00:54:14,530 --> 00:54:19,243 So simply using always HTTPS is one solution to this problem. 1087 00:54:19,243 --> 00:54:21,910 And also, at least, in the US, some of the mobile phone carriers 1088 00:54:21,910 --> 00:54:23,500 got a lot of backlash for this. 1089 00:54:23,500 --> 00:54:28,870 But so, you can occasionally log into your cell phone provider's website, 1090 00:54:28,870 --> 00:54:33,410 go through a bunch of menus, find an option to opt out of this feature. 1091 00:54:33,410 --> 00:54:37,240 But I will say from experience, that they typically bury these options too. 
1092 00:54:37,240 --> 00:54:40,120 And so it's not necessarily even the easiest thing to find. 1093 00:54:40,120 --> 00:54:45,070 But again, this is just a natural result of the underlying technology 1094 00:54:45,070 --> 00:54:51,000 that's being used, or if you prefer, abused, for alternative purposes. 1095 00:54:51,000 --> 00:54:51,500 All right. 1096 00:54:51,500 --> 00:54:54,310 Let me pause here and see if there's any questions now 1097 00:54:54,310 --> 00:54:59,380 on these super cookies, which indeed, are not so super or anything prior. 1098 00:54:59,380 --> 00:55:02,560 AUDIENCE: Given that cookies store passwords and emails, 1099 00:55:02,560 --> 00:55:07,540 can the adversary impersonate another person by copying that cookie 1100 00:55:07,540 --> 00:55:13,237 and pasting it into his own computer and visiting that website? 1101 00:55:13,237 --> 00:55:14,570 DAVID J. MALAN: A good question. 1102 00:55:14,570 --> 00:55:19,990 So cookies can be used to store user names, email addresses, even passwords, 1103 00:55:19,990 --> 00:55:22,270 though, I would generally not recommend doing this. 1104 00:55:22,270 --> 00:55:24,882 But they theoretically should be secure, even 1105 00:55:24,882 --> 00:55:26,590 if you're storing those values in cookies 1106 00:55:26,590 --> 00:55:30,370 because they're going back and forth between the browser and the server 1107 00:55:30,370 --> 00:55:34,120 using encryption if HTTPS is, indeed, in use. 1108 00:55:34,120 --> 00:55:37,480 A danger, though, is that if someone has physical access to your computer, 1109 00:55:37,480 --> 00:55:40,660 it's very easy to poke around your own browser's cookies, at which point, 1110 00:55:40,660 --> 00:55:44,180 they're going to see your password, which is probably not a good thing.
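One way to keep a password out of the cookie entirely is for the server to hand out only a big random value and keep the account details on its own side. The Python sketch below illustrates that pattern under stated assumptions: the in-memory `sessions` dictionary, the cookie name `session`, and the function names `log_in` and `who_is` are all hypothetical, and a real site would also verify the password and expire old sessions.

```python
import secrets

# Server-side store (an in-memory stand-in for a database):
# random session ID -> details the server remembers about that visitor.
sessions = {}


def log_in(username: str) -> str:
    """Create a session and return the value for a Set-Cookie header."""
    session_id = secrets.token_hex(32)  # big random value, no personal data
    sessions[session_id] = {"username": username}
    # Secure: only send over HTTPS. HttpOnly: hide the cookie from page scripts.
    return f"session={session_id}; Secure; HttpOnly"


def who_is(cookie_header: str):
    """Look up the visitor from a Cookie header such as 'session=abc123'."""
    name, _, value = cookie_header.partition("=")
    record = sessions.get(value) if name == "session" else None
    return record["username"] if record else None


set_cookie = log_in("alice")
cookie_sent_back = set_cookie.split(";")[0]  # what the browser would return
print(who_is(cookie_sent_back))
```

Someone who pokes around the browser's stored cookies would find only the random value here. Copying that value could still let them impersonate the session, which is part of why the `Secure` and `HttpOnly` attributes and session expiry matter.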
1111 00:55:44,180 --> 00:55:46,750 So an alternative would be, for instance, 1112 00:55:46,750 --> 00:55:50,350 for a browser to encrypt the cookie or minimally digitally sign it, 1113 00:55:50,350 --> 00:55:54,310 so that it can be identified as belonging to that same server. 1114 00:55:54,310 --> 00:55:58,360 But even better, I dare say, would be for servers 1115 00:55:58,360 --> 00:56:03,160 to only plant big random values as cookies on your computer, 1116 00:56:03,160 --> 00:56:06,760 like this virtual handstamp, and then store 1117 00:56:06,760 --> 00:56:11,020 recollection of your username, email, and/or password on the server. 1118 00:56:11,020 --> 00:56:15,190 So stamp my hand to remember who I am and that I'm logged in, 1119 00:56:15,190 --> 00:56:19,910 but don't bother expecting my browser to send my username, my email address, 1120 00:56:19,910 --> 00:56:21,610 my password again and again. 1121 00:56:21,610 --> 00:56:24,280 It should suffice to send that just once. 1122 00:56:24,280 --> 00:56:25,555 Other questions here? 1123 00:56:25,555 --> 00:56:28,040 AUDIENCE: I've heard that it's possible-- 1124 00:56:28,040 --> 00:56:30,400 for example, if I'm writing a text to someone, 1125 00:56:30,400 --> 00:56:35,270 it's possible to intercept, to alter my text and send it on my behalf. 1126 00:56:35,270 --> 00:56:40,020 So it's going to be a different message, so it's possible to ask, maybe, 1127 00:56:40,020 --> 00:56:41,710 for sensitive information. 1128 00:56:41,710 --> 00:56:45,840 So I was wondering, don't those messengers use something like cookies? 1129 00:56:45,840 --> 00:56:46,917 How can this be possible? 1130 00:56:46,917 --> 00:56:48,250 DAVID J. MALAN: A good question. 1131 00:56:48,250 --> 00:56:52,410 So SMS, or traditional texting, is generally insecure. 1132 00:56:52,410 --> 00:56:56,080 It is very easy for someone to forge your phone number.
1133 00:56:56,080 --> 00:56:58,710 And in fact, if you've gotten a lot of spam via text, 1134 00:56:58,710 --> 00:57:01,020 that might be exactly what is happening. 1135 00:57:01,020 --> 00:57:07,050 Or worse, it's also possible, recall, to steal your SIM card essentially or port 1136 00:57:07,050 --> 00:57:11,110 it to another carrier, so that someone can intercept all of your actual texts. 1137 00:57:11,110 --> 00:57:14,340 So in general, nowadays, you should be reducing, if not, 1138 00:57:14,340 --> 00:57:18,570 eliminating your usage of SMS, at least, for anything important or anything 1139 00:57:18,570 --> 00:57:19,740 you want to keep private. 1140 00:57:19,740 --> 00:57:26,730 When it comes to other messaging tools, like iMessage, like WhatsApp, Signal, 1141 00:57:26,730 --> 00:57:31,020 Telegram, there's a lot of products nowadays, third-party or otherwise, 1142 00:57:31,020 --> 00:57:33,580 that use end-to-end encryption, which recall, 1143 00:57:33,580 --> 00:57:35,200 we discussed a couple of classes ago. 1144 00:57:35,200 --> 00:57:39,060 And in that case, even though the data is going through a company like 1145 00:57:39,060 --> 00:57:42,940 Facebook, theoretically, assuming they're behaving honorably and have 1146 00:57:42,940 --> 00:57:45,520 implemented end-to-end encryption properly, 1147 00:57:45,520 --> 00:57:49,570 then even they cannot see the message going between their servers. 1148 00:57:49,570 --> 00:57:51,190 And that is independent of cookies. 1149 00:57:51,190 --> 00:57:54,160 Cookies have no part of that solution. 1150 00:57:54,160 --> 00:57:58,210 That solution is entirely thanks to cryptography and encryption 1151 00:57:58,210 --> 00:58:01,230 with digital signatures. 1152 00:58:01,230 --> 00:58:01,840 All right. 
1153 00:58:01,840 --> 00:58:04,958 So let's consider one other threat to your privacy 1154 00:58:04,958 --> 00:58:06,750 that you might not necessarily have thought 1155 00:58:06,750 --> 00:58:10,740 about that isn't related just to the web, but really, your use of the internet 1156 00:58:10,740 --> 00:58:14,490 more generally, namely, DNS, the Domain Name System. 1157 00:58:14,490 --> 00:58:18,510 Thankfully, even though computers on the internet all have IP addresses, 1158 00:58:18,510 --> 00:58:21,510 these unique numeric addresses that we've discussed, 1159 00:58:21,510 --> 00:58:25,350 you and I don't have to remember what servers' IP addresses are 1160 00:58:25,350 --> 00:58:28,830 because servers typically have domain names, something 1161 00:58:28,830 --> 00:58:34,020 like harvard.edu, yale.edu, stanford.edu, google.com, amazon.com, 1162 00:58:34,020 --> 00:58:35,160 and others. 1163 00:58:35,160 --> 00:58:39,600 But how then-- when you type in any of those domain names into your browser 1164 00:58:39,600 --> 00:58:41,880 or into any piece of software on the internet, 1165 00:58:41,880 --> 00:58:47,580 how does your browser or your computer know what IP address to contact? 1166 00:58:47,580 --> 00:58:50,850 Well, it turns out that there's a domain name system in the world. 1167 00:58:50,850 --> 00:58:53,220 And this is a system deployed throughout the world 1168 00:58:53,220 --> 00:58:57,870 on the internet whose purpose in life is to translate domain names to IP 1169 00:58:57,870 --> 00:59:00,640 addresses, so that on the outside of those envelopes 1170 00:59:00,640 --> 00:59:04,760 can, indeed, go the IP addresses of source and destination. 1171 00:59:04,760 --> 00:59:07,840 But you and I, as humans, don't need to know or remember 1172 00:59:07,840 --> 00:59:09,890 exactly what those IP addresses are.
1173 00:59:09,890 --> 00:59:11,890 You can think about this back in the day of when 1174 00:59:11,890 --> 00:59:13,848 we were in the habit of typing in phone numbers 1175 00:59:13,848 --> 00:59:16,520 to actual analog landline telephones. 1176 00:59:16,520 --> 00:59:19,368 It was actually pretty hard to remember lots of people's numbers. 1177 00:59:19,368 --> 00:59:21,160 And you might even have had an address book 1178 00:59:21,160 --> 00:59:23,110 that you looked up people's numbers in. 1179 00:59:23,110 --> 00:59:24,640 Or there were certain mnemonics. 1180 00:59:24,640 --> 00:59:27,880 For instance, in the United States, there was a number, 1-800-COLLECT, 1181 00:59:27,880 --> 00:59:33,370 C-O-L-L-E-C-T, which was just much easier to remember than the actual 1182 00:59:33,370 --> 00:59:35,770 numbers for making a collect call. 1183 00:59:35,770 --> 00:59:38,560 The equivalent on the internet is DNS, which 1184 00:59:38,560 --> 00:59:42,610 just automates this process for us, so that every website, every service 1185 00:59:42,610 --> 00:59:46,480 can have its own unique name, but it's translated automatically 1186 00:59:46,480 --> 00:59:52,030 for us via DNS servers throughout the world to the corresponding IP address. 1187 00:59:52,030 --> 00:59:54,410 But why is this problematic? 1188 00:59:54,410 --> 00:59:58,970 Well, it turns out that DNS servers are typically in a few different places. 1189 00:59:58,970 --> 01:00:03,520 One, you probably have one in your home, or your company, or your university. 1190 01:00:03,520 --> 01:00:07,690 And it probably is built into, if in your home, the router, the device 1191 01:00:07,690 --> 01:00:10,120 that you're using just to connect to the internet. 1192 01:00:10,120 --> 01:00:14,290 But your internet service provider also tends to have a DNS server. 
1193 01:00:14,290 --> 01:00:16,750 And that DNS server probably knows about way 1194 01:00:16,750 --> 01:00:20,950 more IP addresses than your own home does because why would 1195 01:00:20,950 --> 01:00:24,030 your own home network know about all of the IP addresses in the world? 1196 01:00:24,030 --> 01:00:26,530 But with that said, why would your internet service provider 1197 01:00:26,530 --> 01:00:30,400 know about all of the possible IP addresses and domain names 1198 01:00:30,400 --> 01:00:31,030 in the world? 1199 01:00:31,030 --> 01:00:32,830 Well, suffice it to say for our purposes, 1200 01:00:32,830 --> 01:00:34,580 there's a hierarchical system. 1201 01:00:34,580 --> 01:00:36,910 So even if your home router doesn't know, 1202 01:00:36,910 --> 01:00:39,370 even if your internet service provider doesn't know, 1203 01:00:39,370 --> 01:00:42,580 there's some other server on the internet that can eventually 1204 01:00:42,580 --> 01:00:46,870 give you the answer to a question like, what is harvard.edu's IP address? 1205 01:00:46,870 --> 01:00:51,170 What is yale.edu's IP address and so forth? 1206 01:00:51,170 --> 01:00:55,120 And for efficiency, once that answer has been figured out somewhere, 1207 01:00:55,120 --> 01:00:59,620 then your internet service provider might remember, or cache, the answer. 1208 01:00:59,620 --> 01:01:04,060 And even your home router, and heck, even your device or your browser 1209 01:01:04,060 --> 01:01:06,580 might remember the same answer for efficiency, 1210 01:01:06,580 --> 01:01:09,650 so we don't have to keep asking the same question. 1211 01:01:09,650 --> 01:01:14,950 And it turns out by convention, DNS uses port 53, if you recall our discussion, 1212 01:01:14,950 --> 01:01:21,610 of also using unique numbers to identify things like HTTP, or 80, HTTPS, or 443, 1213 01:01:21,610 --> 01:01:23,830 or 22 for SSH. 1214 01:01:23,830 --> 01:01:26,170 DNS tends to use 53. 
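The chain of lookups and caches just described (device, home router, internet service provider, and eventually some other server that knows the answer) can be sketched as a toy model. Everything here is an assumption for illustration: `CachingResolver` is a hypothetical class, the IP addresses are made-up documentation addresses, and real DNS caching honors a time-to-live supplied with each answer rather than one fixed value.

```python
import time


class CachingResolver:
    """Toy resolver: answer from the cache if still fresh, else ask upstream."""

    def __init__(self, upstream, ttl_seconds=300, clock=time.monotonic):
        self.upstream = upstream  # callable: domain name -> IP address (assumed)
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # domain name -> (IP address, expiry time)

    def resolve(self, name):
        now = self.clock()
        entry = self.cache.get(name)
        if entry and entry[1] > now:
            return entry[0]  # remembered answer, no need to ask again
        ip = self.upstream(name)
        self.cache[name] = (ip, now + self.ttl)
        return ip


# Made-up addresses from a range reserved for documentation.
AUTHORITATIVE = {"harvard.edu": "192.0.2.10", "yale.edu": "192.0.2.20"}
asked = []


def authoritative(name):
    """Stands in for "some other server on the internet" that knows the answer."""
    asked.append(name)
    return AUTHORITATIVE[name]


isp = CachingResolver(authoritative)
router = CachingResolver(isp.resolve)
device = CachingResolver(router.resolve)

print(device.resolve("harvard.edu"))
print(device.resolve("harvard.edu"))
print(asked)  # the far-away server was asked only once
```

The second lookup is answered from the device's own cache, which is the efficiency being described: the same question does not have to travel up the hierarchy again until the cached answer expires.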
1215 01:01:26,170 --> 01:01:29,830 But the catch is that the traffic used for DNS 1216 01:01:29,830 --> 01:01:35,080 is typically unencrypted, which means that when your phone, or your laptop, 1217 01:01:35,080 --> 01:01:38,830 or your desktop is asking your home device, or maybe your internet service 1218 01:01:38,830 --> 01:01:42,940 provider, or someone else, what is the IP address for harvard.edu, 1219 01:01:42,940 --> 01:01:47,260 or yale.edu, or the like, you're actually announcing to the world what 1220 01:01:47,260 --> 01:01:49,210 website you are about to visit. 1221 01:01:49,210 --> 01:01:49,750 Why? 1222 01:01:49,750 --> 01:01:52,150 Because you're waiting for a response from the DNS 1223 01:01:52,150 --> 01:01:55,850 server to actually tell you the corresponding IP address. 1224 01:01:55,850 --> 01:01:56,950 So this isn't great. 1225 01:01:56,950 --> 01:02:00,730 And moreover, your internet service provider, therefore, 1226 01:02:00,730 --> 01:02:04,360 knows all of this information about you because every time 1227 01:02:04,360 --> 01:02:07,570 you ask for a new website that you've never been to before, your home 1228 01:02:07,570 --> 01:02:09,740 network probably doesn't know the IP address, 1229 01:02:09,740 --> 01:02:12,275 so you have to ask your internet service provider. 1230 01:02:12,275 --> 01:02:13,900 And again, they might ask someone else. 1231 01:02:13,900 --> 01:02:17,240 But the internet service provider is going to know now that you asked. 
1232 01:02:17,240 --> 01:02:20,470 So your internet service provider, be it for your home network 1233 01:02:20,470 --> 01:02:23,320 or for your cellular phone, pretty much knows 1234 01:02:23,320 --> 01:02:26,110 every website you've ever been to, assuming 1235 01:02:26,110 --> 01:02:28,600 they're logging this information, which they probably are, 1236 01:02:28,600 --> 01:02:32,320 unless there are regulatory or legal requirements that say they can't or 1237 01:02:32,320 --> 01:02:34,400 they can't for very long. 1238 01:02:34,400 --> 01:02:37,340 Now, why is this the case? 1239 01:02:37,340 --> 01:02:40,690 Well, the domain name system essentially requires 1240 01:02:40,690 --> 01:02:42,190 that we ask these very questions. 1241 01:02:42,190 --> 01:02:45,070 And if the internet service providers remember these answers, 1242 01:02:45,070 --> 01:02:48,640 well, they can keep track of everywhere we've been, at least, at a high level. 1243 01:02:48,640 --> 01:02:53,020 DNS only gives them back a translation from the domain name to the IP address. 1244 01:02:53,020 --> 01:02:57,340 What it does not include is the specific page that you're looking at, 1245 01:02:57,340 --> 01:03:01,257 the specific URL, the folder, the file that you're looking at. 1246 01:03:01,257 --> 01:03:03,090 So your internet service provider might know 1247 01:03:03,090 --> 01:03:05,880 you're visiting somewhere on harvard.edu because you 1248 01:03:05,880 --> 01:03:07,440 asked, of course, for its IP address. 1249 01:03:07,440 --> 01:03:10,500 But they don't know what department you were looking for 1250 01:03:10,500 --> 01:03:13,260 or what course you were looking at or the like. 1251 01:03:13,260 --> 01:03:16,410 But there's still a decent amount of invasion, therefore, of your privacy 1252 01:03:16,410 --> 01:03:21,120 if you'd rather that ISP or someone else just not know that information. 
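To make the point about unencrypted DNS concrete, here is a rough sketch that builds the bytes of a minimal DNS query by hand, without sending anything. The URL and the `build_dns_query` function are hypothetical, and real resolvers add options this sketch omits. It suggests two things from the discussion above: the domain name sits in the packet as readable text, and the path of the page is never part of the question.

```python
import struct
from urllib.parse import urlsplit


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query (type A, class IN) as raw bytes."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


url = "https://www.harvard.edu/some/department/page"  # hypothetical URL
parts = urlsplit(url)
packet = build_dns_query(parts.hostname)

print(packet)
print(b"harvard" in packet)            # True: the domain is readable as-is
print(parts.path.encode() in packet)   # False: the path is not in a DNS query
```

Anyone who can observe this packet on port 53 can read which site is being looked up, but not which department or course page will be requested afterward.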
1253 01:03:21,120 --> 01:03:25,170 So increasingly, there are alternatives to the standard DNS 1254 01:03:25,170 --> 01:03:31,210 functionality, one of which is called DNS over HTTPS, or DoH for short. 1255 01:03:31,210 --> 01:03:32,850 This means exactly that. 1256 01:03:32,850 --> 01:03:37,230 Instead of just sending out DNS requests unencrypted on port 53 1257 01:03:37,230 --> 01:03:41,520 to the local DNS server, now they're sent, potentially if you enable this, 1258 01:03:41,520 --> 01:03:43,470 over HTTPS. 1259 01:03:43,470 --> 01:03:48,780 And what this means is that they will be sent using the HTTP protocol, which 1260 01:03:48,780 --> 01:03:51,270 we've talked about endlessly in these virtual envelopes, 1261 01:03:51,270 --> 01:03:56,190 but securely using TLS, which is the encryption protocol that ensures 1262 01:03:56,190 --> 01:03:58,920 that no one else can see what's going on inside of that envelope, 1263 01:03:58,920 --> 01:04:01,700 including your internet service provider. 1264 01:04:01,700 --> 01:04:05,590 Now, someone is going to still know what domain name you're 1265 01:04:05,590 --> 01:04:09,040 looking up because after all, to whom are you sending this request? 1266 01:04:09,040 --> 01:04:10,990 Maybe you're sending it to Google. 1267 01:04:10,990 --> 01:04:12,910 Maybe you're sending it to some third party. 1268 01:04:12,910 --> 01:04:14,830 But you are sending it to someone. 1269 01:04:14,830 --> 01:04:18,580 But at least, goes the thinking, it's not your internet service provider, 1270 01:04:18,580 --> 01:04:21,095 who really doesn't need to know this information. 1271 01:04:21,095 --> 01:04:22,720 So that's one way of thinking about it. 1272 01:04:22,720 --> 01:04:24,095 And there's alternatives to this. 1273 01:04:24,095 --> 01:04:27,760 There's actually something called DNS over TLS, DoT, which 1274 01:04:27,760 --> 01:04:30,940 is very similar in spirit, but it doesn't even bother using HTTP. 
1275 01:04:30,940 --> 01:04:33,590 But it is still using encryption. 1276 01:04:33,590 --> 01:04:35,740 So this is something that's increasingly common. 1277 01:04:35,740 --> 01:04:38,380 It's not necessarily the default on a lot of systems. 1278 01:04:38,380 --> 01:04:41,200 But it's yet another feature of today's technology 1279 01:04:41,200 --> 01:04:45,040 that you can increasingly look for, seek out, enable proactively 1280 01:04:45,040 --> 01:04:49,000 if this, too, is a concern that you don't necessarily want a third party, 1281 01:04:49,000 --> 01:04:52,277 like your ISP to know what it is you're accessing. 1282 01:04:52,277 --> 01:04:53,860 And it might not even be your own ISP. 1283 01:04:53,860 --> 01:04:58,090 If you're on the road, in a coffee shop that gives Wi-Fi, 1284 01:04:58,090 --> 01:05:00,760 or an airport that gives Wi-Fi, at that point, 1285 01:05:00,760 --> 01:05:02,860 your internet service provider is effectively 1286 01:05:02,860 --> 01:05:04,810 that coffee shop or that airport. 1287 01:05:04,810 --> 01:05:07,570 And do you really want them knowing everywhere you're going? 1288 01:05:07,570 --> 01:05:10,870 You might be, depending on your comfort level, prefer-- 1289 01:05:10,870 --> 01:05:15,550 you might be preferring that, at least, all of your DNS requests 1290 01:05:15,550 --> 01:05:19,940 go to some other central party that you do trust for whatever reason, 1291 01:05:19,940 --> 01:05:23,140 so you're not just informing every different Wi-Fi hotspot that you 1292 01:05:23,140 --> 01:05:25,630 might be using around the world. 1293 01:05:25,630 --> 01:05:27,880 Let me pause here and see if there's any questions now 1294 01:05:27,880 --> 01:05:33,910 about DNS and this concern with respect to your privacy or these solutions 1295 01:05:33,910 --> 01:05:34,960 thereto. 1296 01:05:34,960 --> 01:05:41,410 AUDIENCE: Can DNS [INAUDIBLE] used to deceive users and steal 1297 01:05:41,410 --> 01:05:42,865 information, which is sensitive?
1298 01:05:42,865 --> 01:05:43,990 DAVID J. MALAN: Absolutely. 1299 01:05:43,990 --> 01:05:47,200 So DNS, itself, can also be used for evil purposes. 1300 01:05:47,200 --> 01:05:51,265 If you control the DNS server, you don't have to give an honest answer. 1301 01:05:51,265 --> 01:05:54,460 If someone asks you for the IP address of harvard.edu, 1302 01:05:54,460 --> 01:05:58,600 you could give them the IP address of some completely malicious server 1303 01:05:58,600 --> 01:05:59,800 that you control. 1304 01:05:59,800 --> 01:06:05,350 However, if the user, like Ryan, in this case, is using HTTPS, 1305 01:06:05,350 --> 01:06:10,150 the whole point of HTTPS is to encrypt the data between browser and server. 1306 01:06:10,150 --> 01:06:11,980 And presumably, the browser is going to try 1307 01:06:11,980 --> 01:06:18,320 to request the TLS certificate of harvard.edu in this case. 1308 01:06:18,320 --> 01:06:22,360 But if the IP address returns the wrong certificate 1309 01:06:22,360 --> 01:06:26,470 that wasn't signed by the right website, then the connection might fail. 1310 01:06:26,470 --> 01:06:29,800 And you'll be given a warning that you can typically ignore in your browser. 1311 01:06:29,800 --> 01:06:33,010 But this should be preventable because you should at least be warned 1312 01:06:33,010 --> 01:06:34,870 that that is not working correctly. 1313 01:06:34,870 --> 01:06:37,180 And ISPs actually do this quite often. 1314 01:06:37,180 --> 01:06:41,260 If you make a typographical error sometimes on home networks, or coffee 1315 01:06:41,260 --> 01:06:45,490 shops, or airports, you might actually still see a website of search results, 1316 01:06:45,490 --> 01:06:46,990 or worse, advertisements. 
1317 01:06:46,990 --> 01:06:50,590 And that's because even if you made a typo in the domain name, the coffee 1318 01:06:50,590 --> 01:06:52,840 shop's or the airport's DNS server is still 1319 01:06:52,840 --> 01:06:55,360 going to return to you an IP address of their server, 1320 01:06:55,360 --> 01:06:59,210 so they can at least push some content at you. 1321 01:06:59,210 --> 01:07:02,710 So let's consider some of the mechanisms via which 1322 01:07:02,710 --> 01:07:06,478 we can push back on some of these more invasive privacy practices. 1323 01:07:06,478 --> 01:07:08,770 And one is something we've talked about before, namely, 1324 01:07:08,770 --> 01:07:13,930 a virtual private network, or VPN, which is an increasingly familiar technology. 1325 01:07:13,930 --> 01:07:16,450 But it's worth knowing exactly what problems it is 1326 01:07:16,450 --> 01:07:18,730 solving for you and exactly which problems 1327 01:07:18,730 --> 01:07:21,820 it is not, particularly, if you're using such a service 1328 01:07:21,820 --> 01:07:23,890 to protect your own privacy. 1329 01:07:23,890 --> 01:07:25,220 Well, what is a VPN? 1330 01:07:25,220 --> 01:07:28,030 It allows us, recall, to connect from point A 1331 01:07:28,030 --> 01:07:32,830 to another point B using a completely encrypted tunnel. 1332 01:07:32,830 --> 01:07:35,740 So it doesn't matter if there are machines in the middle, 1333 01:07:35,740 --> 01:07:37,450 as indeed, there will be on the internet. 1334 01:07:37,450 --> 01:07:42,520 All of the traffic between A and B on a VPN is encrypted or scrambled. 1335 01:07:42,520 --> 01:07:43,850 So what does this do? 1336 01:07:43,850 --> 01:07:47,320 This allows you to access sometimes a corporate network or a university 1337 01:07:47,320 --> 01:07:49,750 network that might have servers or services that 1338 01:07:49,750 --> 01:07:53,080 are only accessible if you are on physically 1339 01:07:53,080 --> 01:07:56,560 or if you are on virtually that particular network.
1340 01:07:56,560 --> 01:08:01,010 This ensures that even if you're at home, or in a cafe, or an airport, 1341 01:08:01,010 --> 01:08:03,770 at least, you have an encrypted, more secure connection 1342 01:08:03,770 --> 01:08:07,640 to the campus or the corporate network, at which point, the campus or company 1343 01:08:07,640 --> 01:08:11,262 might be more comfortable with you accessing those services. 1344 01:08:11,262 --> 01:08:13,220 Now, this does not prevent you still from being 1345 01:08:13,220 --> 01:08:17,526 hacked because if you're running malware on your own computer accidentally, 1346 01:08:17,526 --> 01:08:20,359 it doesn't matter if you have an encrypted connection to the company 1347 01:08:20,359 --> 01:08:21,170 or campus. 1348 01:08:21,170 --> 01:08:25,250 You might very well have an infected connection now to the company or campus 1349 01:08:25,250 --> 01:08:27,200 if you, yourselves, are infected. 1350 01:08:27,200 --> 01:08:30,680 VPNs can also be used to create the illusion that you're actually 1351 01:08:30,680 --> 01:08:32,779 in one country and not another. 1352 01:08:32,779 --> 01:08:33,319 Why? 1353 01:08:33,319 --> 01:08:38,000 Well, if point A is where you are and point B is somewhere abroad, well, 1354 01:08:38,000 --> 01:08:40,340 to the rest of the world, if you start using 1355 01:08:40,340 --> 01:08:44,240 this VPN, this virtual private network, you 1356 01:08:44,240 --> 01:08:48,380 will appear to have an IP address that is in that foreign country 1357 01:08:48,380 --> 01:08:52,460 because all of your internet traffic for chatting, video conferencing, 1358 01:08:52,460 --> 01:08:55,880 the web will be sent through that VPN by design. 1359 01:08:55,880 --> 01:08:57,290 That's what a VPN is for. 
1360 01:08:57,290 --> 01:09:00,149 And it will come out the other end in that foreign country 1361 01:09:00,149 --> 01:09:06,210 and then continue on its way to the chat service, the email service, the web 1362 01:09:06,210 --> 01:09:07,990 service, or the like. 1363 01:09:07,990 --> 01:09:10,529 So each of those services will think that you 1364 01:09:10,529 --> 01:09:15,149 live or are physically in that foreign country, even if you are not actually. 1365 01:09:15,149 --> 01:09:17,040 So what's the implication of this? 1366 01:09:17,040 --> 01:09:19,950 A virtual private network only guarantees 1367 01:09:19,950 --> 01:09:23,430 that the connection between you and that point B is encrypted. 1368 01:09:23,430 --> 01:09:26,130 It doesn't necessarily mean that once you're out of that VPN, 1369 01:09:26,130 --> 01:09:30,149 it's going to stay encrypted, especially if you're using still HTTP 1370 01:09:30,149 --> 01:09:31,529 and not HTTPS. 1371 01:09:31,529 --> 01:09:35,040 But it does, at least, encrypt everything between points A and B. 1372 01:09:35,040 --> 01:09:39,158 It also does change what your IP address appears to be, 1373 01:09:39,158 --> 01:09:41,700 so that you will, indeed, appear to have an IP address that's 1374 01:09:41,700 --> 01:09:45,189 from that foreign country and not your domestic IP address, 1375 01:09:45,189 --> 01:09:47,670 which might have some value in covering your tracks 1376 01:09:47,670 --> 01:09:50,340 or decreasing the probability that you'll be identified. 
1377 01:09:50,340 --> 01:09:53,279 But again, we've seen so many other mechanisms today, 1378 01:09:53,279 --> 01:09:55,620 whereby, your browser can be fingerprinted 1379 01:09:55,620 --> 01:09:58,020 in the context of the web, that someone might still 1380 01:09:58,020 --> 01:10:01,440 be able to realize that, OK, your IP is different today, 1381 01:10:01,440 --> 01:10:04,560 but this still looks like you, even if they don't necessarily 1382 01:10:04,560 --> 01:10:07,990 know that you are David Malan or you, yourself. 1383 01:10:07,990 --> 01:10:11,940 But it, at least, does solve at least one problem, which is encrypting end 1384 01:10:11,940 --> 01:10:14,430 to end all of your traffic. 1385 01:10:14,430 --> 01:10:16,380 Well, there's another piece of software that's 1386 01:10:16,380 --> 01:10:19,900 been popular for some time called Tor, The Onion Router. 1387 01:10:19,900 --> 01:10:24,030 So this is a piece of software that you can install on your own Mac, or PC, 1388 01:10:24,030 --> 01:10:25,230 or other device. 1389 01:10:25,230 --> 01:10:30,210 And this uses encryption to solve the problem a different way using 1390 01:10:30,210 --> 01:10:35,100 additional encryption to try to give you a higher probability of privacy. 1391 01:10:35,100 --> 01:10:38,070 And here's a picture that Tor, themselves, puts on their website. 1392 01:10:38,070 --> 01:10:41,310 And it's depicting here you on a very old school 1393 01:10:41,310 --> 01:10:44,910 PC connecting to a whole bunch of nodes inside 1394 01:10:44,910 --> 01:10:47,520 of this Tor network connected to ultimately maybe 1395 01:10:47,520 --> 01:10:49,230 the websites that you're visiting. 1396 01:10:49,230 --> 01:10:54,900 And what happens here is that when your computer is running the Tor software, 1397 01:10:54,900 --> 01:10:57,990 the Tor software first figures out, OK, who else in the world 1398 01:10:57,990 --> 01:10:59,730 is using the Tor software?
1399 01:10:59,730 --> 01:11:03,510 Because it's going to use those other computers to route your traffic, 1400 01:11:03,510 --> 01:11:07,050 up, down, left, and right and kind of like the movies or TV, 1401 01:11:07,050 --> 01:11:11,508 where you see a map of the world and the traffic is bouncing back and forth 1402 01:11:11,508 --> 01:11:12,300 and back and forth. 1403 01:11:12,300 --> 01:11:14,400 That's kind of the spirit of Tor. 1404 01:11:14,400 --> 01:11:17,250 And what happens is if your computer here 1405 01:11:17,250 --> 01:11:23,010 wants to send a request to a website that's maybe over here 1406 01:11:23,010 --> 01:11:25,530 and it decides, for instance, to route it through one, 1407 01:11:25,530 --> 01:11:29,610 two, three computers, what the Tor software will do 1408 01:11:29,610 --> 01:11:33,240 is encrypt the request at least three different times. 1409 01:11:33,240 --> 01:11:36,600 Whatever web request you are sending, whatever email you 1410 01:11:36,600 --> 01:11:39,630 are sending, whatever chat message you're sending, whatever service you 1411 01:11:39,630 --> 01:11:43,110 are using between point A on the left and point B on the right 1412 01:11:43,110 --> 01:11:47,310 is going to be encrypted with this node's public key, 1413 01:11:47,310 --> 01:11:51,600 with this node's public key, with this node's public key. 1414 01:11:51,600 --> 01:11:54,930 And so here's the onion in Tor, The Onion Router. 
1415 01:11:54,930 --> 01:11:59,140 You are encrypting layer, upon layer, upon layer of data, 1416 01:11:59,140 --> 01:12:03,750 so that mathematically, recall per our discussion of public key cryptography, 1417 01:12:03,750 --> 01:12:07,230 only this node can peel off one layer, only 1418 01:12:07,230 --> 01:12:11,220 this node can peel off one layer, only this node can peel off one layer using 1419 01:12:11,220 --> 01:12:15,810 their own respective private keys, which undoes the effect of your 1420 01:12:15,810 --> 01:12:17,710 having encrypted your traffic. 1421 01:12:17,710 --> 01:12:19,920 So what you're really doing here by choosing, 1422 01:12:19,920 --> 01:12:23,730 perhaps, a different path, every request, a different path every day 1423 01:12:23,730 --> 01:12:27,570 is you are with Tor effectively covering your tracks in some sense. 1424 01:12:27,570 --> 01:12:32,290 And by design, the Tor software doesn't remember much information at all, 1425 01:12:32,290 --> 01:12:35,580 so it doesn't have the sorts of logs that I propose can be worrisome, 1426 01:12:35,580 --> 01:12:37,260 at least, in the context of web servers. 1427 01:12:37,260 --> 01:12:41,710 By design, Tor is meant to preserve your privacy with higher probability. 1428 01:12:41,710 --> 01:12:45,060 And so by design, it just doesn't keep nearly as much information around. 1429 01:12:45,060 --> 01:12:48,850 Now, this isn't to say that if you're doing this for malicious purposes, 1430 01:12:48,850 --> 01:12:50,910 trying to evade the authorities, this isn't 1431 01:12:50,910 --> 01:12:53,700 to say that this computer, this computer, this computer 1432 01:12:53,700 --> 01:12:57,000 couldn't be subpoenaed, so to speak, by some government entity 1433 01:12:57,000 --> 01:13:00,910 and they could reconstruct the path that your data took. 1434 01:13:00,910 --> 01:13:03,790 But the point is that it's generally quite laborious. 
1435 01:13:03,790 --> 01:13:07,720 By this time, all of that data has disappeared from those interior nodes. 1436 01:13:07,720 --> 01:13:09,820 And so they don't have much information to share. 1437 01:13:09,820 --> 01:13:14,740 And so increasingly, it does provide you with some higher probability of privacy 1438 01:13:14,740 --> 01:13:18,520 by layering your requests with encryption, encryption, encryption 1439 01:13:18,520 --> 01:13:21,220 and sort of trusting that these interior nodes are 1440 01:13:21,220 --> 01:13:24,895 going to relay it to the final endpoint, so something 1441 01:13:24,895 --> 01:13:26,020 to consider if of interest. 1442 01:13:26,020 --> 01:13:30,400 But realize, too, that because of how the internet works with IP addresses, 1443 01:13:30,400 --> 01:13:32,800 because of how the internet works with port numbers, 1444 01:13:32,800 --> 01:13:38,360 it's still possible on a network to know who is using Tor, for instance. 1445 01:13:38,360 --> 01:13:43,360 So if you happen to be the only person at home, the only person on a company 1446 01:13:43,360 --> 01:13:46,990 or on a university network who's using Tor at the moment, 1447 01:13:46,990 --> 01:13:50,230 it's being used for malicious purposes, odds are, 1448 01:13:50,230 --> 01:13:53,840 you could be targeted as the source of that attack. 1449 01:13:53,840 --> 01:13:58,150 And so realize, in particular, that this just raises the bar to detection. 1450 01:13:58,150 --> 01:14:01,320 It raises the bar to your privacy being invaded. 1451 01:14:01,320 --> 01:14:05,870 But it does not, as do none of the technologies we've discussed, 1452 01:14:05,870 --> 01:14:10,120 give you an absolute protection of these same properties. 
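The layering idea behind the onion can be sketched with a toy example. This is an illustration under loud assumptions, not Tor's actual protocol: the `keystream_xor` function is a deliberately simple and insecure stand-in for real encryption, the node names and keys are made up, and a shared secret per node stands in for the public and private keys described above. What it aims to show is only the shape of the scheme: the sender wraps the request once per node, and each node can peel exactly one layer, learning just where to forward the rest.

```python
import hashlib


def keystream_xor(key, data):
    """Toy cipher: XOR data with a SHA-256-derived keystream. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(request, path, destination):
    """Add one layer per node, innermost layer first (for the last node)."""
    payload, next_hop = request, destination
    for name, key in reversed(path):
        # Each layer hides both the next hop and everything inside it.
        payload = keystream_xor(key, next_hop.encode() + b"\n" + payload)
        next_hop = name
    return payload


def peel(key, blob):
    """Remove one layer: reveal the next hop and the still-wrapped remainder."""
    plain = keystream_xor(key, blob)  # XOR with the same keystream undoes it
    next_hop, _, inner = plain.partition(b"\n")
    return next_hop.decode(), inner


# Hypothetical three-node path with made-up keys.
path = [("node1", b"key-for-node-1"),
        ("node2", b"key-for-node-2"),
        ("node3", b"key-for-node-3")]

onion = wrap(b"GET / HTTP/1.1", path, "example.org")
for name, key in path:
    next_hop, onion = peel(key, onion)
    print(name, "forwards to", next_hop)
print(onion)  # only after the last layer is removed is the request readable
```

In this sketch the first node sees only that the next stop is the second node, and only the last node sees the destination, which mirrors why reconstructing the whole path requires information from every node along it.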
1453 01:14:10,120 --> 01:14:13,740 So there's one final mechanism when it comes to preserving one's privacy 1454 01:14:13,740 --> 01:14:16,230 that's thankfully increasingly available to us 1455 01:14:16,230 --> 01:14:19,800 on devices, on desktops and laptops, and especially, on phones. 1456 01:14:19,800 --> 01:14:23,190 And that's this notion of permissions, which isn't anything new. 1457 01:14:23,190 --> 01:14:26,910 But as iOS, and Android, and other operating systems 1458 01:14:26,910 --> 01:14:31,440 have evolved, increasingly, you and I are being asked by our operating 1459 01:14:31,440 --> 01:14:34,110 systems, do you want to allow this? 1460 01:14:34,110 --> 01:14:36,840 Not only do you want to allow this program to run, 1461 01:14:36,840 --> 01:14:40,980 but do you want to allow this program to access your camera, for instance? 1462 01:14:40,980 --> 01:14:44,040 Do you want this program to access your microphone, for instance? 1463 01:14:44,040 --> 01:14:48,940 Do you want this application to access your contacts, for instance? 1464 01:14:48,940 --> 01:14:53,310 So on the one hand, we're being given much more fine-grained control, 1465 01:14:53,310 --> 01:14:55,830 which is a good thing, presumably. 1466 01:14:55,830 --> 01:15:00,240 At the same time, it's also just pushing the decision onto you and me. 1467 01:15:00,240 --> 01:15:03,450 And very often, with these applications, as you've probably found, well, 1468 01:15:03,450 --> 01:15:07,140 if you don't enable the camera and give access to the app, 1469 01:15:07,140 --> 01:15:10,840 it just might not work because they have some code in their application 1470 01:15:10,840 --> 01:15:14,870 that says if camera's not on, then do not do anything useful. 1471 01:15:14,870 --> 01:15:19,030 So there's this tension between usability and privacy in this case. 1472 01:15:19,030 --> 01:15:21,460 But thankfully, there's finer-grained controls too.
1473 01:15:21,460 --> 01:15:23,570 On iOS, for instance, you might be prompted, 1474 01:15:23,570 --> 01:15:26,290 do you want to give this app access to this feature 1475 01:15:26,290 --> 01:15:30,722 always, or only while using the application, or never? 1476 01:15:30,722 --> 01:15:32,680 And that's certainly a good thing for something 1477 01:15:32,680 --> 01:15:34,840 like the camera or the microphone, where it 1478 01:15:34,840 --> 01:15:37,090 would be nice to trust that when you close the app 1479 01:15:37,090 --> 01:15:39,730 and put your phone in your pocket, that it's not still 1480 01:15:39,730 --> 01:15:43,440 listening to or trying to watch you from this built-in hardware. 1481 01:15:43,440 --> 01:15:45,190 Now, there is some feature that might need 1482 01:15:45,190 --> 01:15:48,940 to run all of the time, which includes location-based services, which 1483 01:15:48,940 --> 01:15:51,340 is to say that our phones, especially nowadays, 1484 01:15:51,340 --> 01:15:54,880 can pretty effectively track our location using GPS, 1485 01:15:54,880 --> 01:15:57,100 or Wi-Fi, or some other technology. 1486 01:15:57,100 --> 01:16:00,910 Now, that's of course, useful, if not, necessary for using mapping 1487 01:16:00,910 --> 01:16:03,850 applications, like Maps, or Google Maps, or the like that 1488 01:16:03,850 --> 01:16:08,110 help us get physically from point A to point B. But very commonly, 1489 01:16:08,110 --> 01:16:10,360 these applications, at least, by default, 1490 01:16:10,360 --> 01:16:13,960 ask for access to your geographic location 1491 01:16:13,960 --> 01:16:17,260 always, which means just by walking down the street, 1492 01:16:17,260 --> 01:16:19,790 even if you're not following a map on your phone, 1493 01:16:19,790 --> 01:16:22,870 means that the app can still be tracking where you're going. 
1494 01:16:22,870 --> 01:16:25,360 And certainly, among the Googles and the Apples 1495 01:16:25,360 --> 01:16:27,700 of the world nowadays or other manufacturers, 1496 01:16:27,700 --> 01:16:30,640 they certainly know pretty much everywhere you 1497 01:16:30,640 --> 01:16:35,600 and I are going if we leave these location-based services on by default. 1498 01:16:35,600 --> 01:16:40,900 So this is an example of something of which you should be mindful if only 1499 01:16:40,900 --> 01:16:43,900 because here is yet another example of information 1500 01:16:43,900 --> 01:16:48,130 that logically, when you think about it, OK, obviously, that makes sense. 1501 01:16:48,130 --> 01:16:51,280 They must be keeping track of my location, otherwise, 1502 01:16:51,280 --> 01:16:54,250 how could they provide me with mapping services? 1503 01:16:54,250 --> 01:16:58,030 But pause and think now, perhaps, exactly what the implications 1504 01:16:58,030 --> 01:17:02,260 are for you, for your privacy, and just walking around 24/7 1505 01:17:02,260 --> 01:17:05,830 with these radios now in our pockets. 1506 01:17:05,830 --> 01:17:08,950 So even though there are quite a few threats to our privacy, 1507 01:17:08,950 --> 01:17:11,860 online especially, at least, there are these mechanisms 1508 01:17:11,860 --> 01:17:16,360 that you and I can enable to at least preserve some of the same. 1509 01:17:16,360 --> 01:17:18,670 Well, what have we done over the past few weeks? 1510 01:17:18,670 --> 01:17:21,580 We began with a look at how we can secure our accounts, then 1511 01:17:21,580 --> 01:17:24,520 our data, then our systems, then our software, and today, of course, 1512 01:17:24,520 --> 01:17:26,930 focusing on preserving our privacy. 
1513 01:17:26,930 --> 01:17:29,770 And by way of the various technologies we've looked at, 1514 01:17:29,770 --> 01:17:33,700 the stories we've told, the principles that we've introduced, 1515 01:17:33,700 --> 01:17:36,610 we hope that in the days, the weeks, and the years 1516 01:17:36,610 --> 01:17:39,700 to come, you can use all of these first principles, 1517 01:17:39,700 --> 01:17:41,470 and these ideas, and these building blocks 1518 01:17:41,470 --> 01:17:44,140 to extrapolate to how new technologies work, 1519 01:17:44,140 --> 01:17:48,580 to how new threats might affect you, and to what questions you should be asking 1520 01:17:48,580 --> 01:17:53,110 of either the software you use or the software you develop to ensure 1521 01:17:53,110 --> 01:17:56,290 that not only your communications are secure, but also, 1522 01:17:56,290 --> 01:18:00,850 that it has these privacy-preserving properties that you, and your users, 1523 01:18:00,850 --> 01:18:02,560 and your customers might want. 1524 01:18:02,560 --> 01:18:05,860 This then was CS50's Introduction to Cybersecurity. 1525 01:18:05,860 --> 01:18:08,640 And this was CS50. 1526 01:18:08,640 --> 01:18:12,000