WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:02.928 [MUSIC PLAYING] 00:00:16.987 --> 00:00:18.070 DAVID J. MALAN: All right. 00:00:18.070 --> 00:00:21.130 This is CS50's Introduction to Cybersecurity. 00:00:21.130 --> 00:00:22.360 My name is David Malan. 00:00:22.360 --> 00:00:25.480 And this week, let's focus on preserving privacy. 00:00:25.480 --> 00:00:27.340 Indeed, over the past several weeks, we've 00:00:27.340 --> 00:00:32.350 focused on securing your accounts, your data, your systems, your software. 00:00:32.350 --> 00:00:36.640 And all of that is really about keeping communications between points A and B, 00:00:36.640 --> 00:00:39.310 for instance, secure, so that no one in between 00:00:39.310 --> 00:00:42.070 can actually access the information you're trying to share. 00:00:42.070 --> 00:00:47.090 But what if you, A, don't even want B to have some of that information? 00:00:47.090 --> 00:00:49.990 So indeed, today, let's focus on some of the technologies 00:00:49.990 --> 00:00:52.420 that you and I use every day and some of the technologies 00:00:52.420 --> 00:00:55.420 that underlie the software, and applications, 00:00:55.420 --> 00:00:58.150 and more that you and I are going to use tomorrow and beyond 00:00:58.150 --> 00:01:02.440 and consider exactly what information we're sharing now, 00:01:02.440 --> 00:01:04.599 perhaps, even without our knowledge and also 00:01:04.599 --> 00:01:07.660 empower you with certain mechanisms via which you can perhaps 00:01:07.660 --> 00:01:09.940 restrict all the more of this information 00:01:09.940 --> 00:01:13.220 if you, indeed, do not want to share it beyond yourself. 00:01:13.220 --> 00:01:16.360 So let's consider first some of the obvious features 00:01:16.360 --> 00:01:19.820 that you and I probably use every day, like your web browsing history. 
00:01:19.820 --> 00:01:23.150 Whether you're on a laptop, or desktop, or mobile device, 00:01:23.150 --> 00:01:25.520 odds are you know by now that your browser tends 00:01:25.520 --> 00:01:29.180 to keep track of pretty much everywhere you go on the World Wide Web. 00:01:29.180 --> 00:01:31.640 That is to say, if you click on your URL bar, 00:01:31.640 --> 00:01:35.450 you can sometimes browse through the past few URLs that you visited. 00:01:35.450 --> 00:01:38.338 If you go up to your browser's history via some menu, 00:01:38.338 --> 00:01:40.130 you can probably see everything you've done 00:01:40.130 --> 00:01:43.130 earlier today, yesterday, last week, last year, 00:01:43.130 --> 00:01:45.420 or perhaps, even the entirety of your history, 00:01:45.420 --> 00:01:48.810 particularly, if you're logging into your Google account, 00:01:48.810 --> 00:01:51.060 Microsoft account, or something else. 00:01:51.060 --> 00:01:56.720 So the web browsing history is sort of both a concern when it comes 00:01:56.720 --> 00:01:58.755 to your privacy, but also a feature. 00:01:58.755 --> 00:02:00.380 Well, let's first consider the feature. 00:02:00.380 --> 00:02:01.650 Well, why is that useful? 00:02:01.650 --> 00:02:04.640 Well, one, I mean, even I occasionally go back 00:02:04.640 --> 00:02:06.440 through my history trying to find some web 00:02:06.440 --> 00:02:09.050 page that I know I was looking at earlier in the day, 00:02:09.050 --> 00:02:11.750 or yesterday, or some previous time in the past 00:02:11.750 --> 00:02:14.650 because it just helps me find information more quickly. 00:02:14.650 --> 00:02:16.910 And so in that sense, it might solve a problem for me. 00:02:16.910 --> 00:02:20.930 Moreover, you probably have noticed that your web browsing history is often 00:02:20.930 --> 00:02:22.700 used for features like autocomplete. 
00:02:22.700 --> 00:02:25.580 So when you start typing a URL or maybe even 00:02:25.580 --> 00:02:28.310 a keyword that was in the name of a page, 00:02:28.310 --> 00:02:31.860 your browser might remember much more quickly what it is you're looking for. 00:02:31.860 --> 00:02:33.410 So you can just hit Enter or click. 00:02:33.410 --> 00:02:35.660 And voila, you're at that same page. 00:02:35.660 --> 00:02:40.070 But of course, this is a concern, potentially, for your privacy, 00:02:40.070 --> 00:02:44.360 whereby, you might not want someone else who has physical access to your device 00:02:44.360 --> 00:02:46.700 to start poking through where it is you've gone. 00:02:46.700 --> 00:02:49.670 You might not want someone else to have access if you just so happen 00:02:49.670 --> 00:02:54.200 to visit that website or those websites on maybe a computer in a lab 00:02:54.200 --> 00:02:56.790 environment, or an internet cafe, or the like. 00:02:56.790 --> 00:02:59.300 So you can imagine quite a few scenarios in which 00:02:59.300 --> 00:03:03.350 this is, yes, a feature, but quite a few other scenarios in which this is not 00:03:03.350 --> 00:03:06.050 really a desirable feature because it invades 00:03:06.050 --> 00:03:08.780 your privacy in some sense, or at least, puts it at risk 00:03:08.780 --> 00:03:10.800 for being invaded by someone else. 00:03:10.800 --> 00:03:14.630 So we'll consider how we might at least sanitize this history 00:03:14.630 --> 00:03:17.420 or remove it altogether in ways that you might already know about. 00:03:17.420 --> 00:03:21.350 For instance, you're probably already familiar with some option 00:03:21.350 --> 00:03:24.110 in your browser, whereby, you can clear your browser history. 
00:03:24.110 --> 00:03:26.750 And that forgets, therefore, all of the places 00:03:26.750 --> 00:03:30.560 that you've been, all of the cookies that you might have accumulated, 00:03:30.560 --> 00:03:32.450 all of the usernames and passwords that might 00:03:32.450 --> 00:03:34.040 have been remembered by your browser. 00:03:34.040 --> 00:03:36.650 Although, that tends to be a fairly heavy-handed solution 00:03:36.650 --> 00:03:39.230 because when you clear your browser history, assuming you 00:03:39.230 --> 00:03:42.110 check all of those boxes, all of it is gone. 00:03:42.110 --> 00:03:46.340 And that might mean negatively that you're now logged out of Google, 00:03:46.340 --> 00:03:49.012 you're now logged out of Outlook, or some other account 00:03:49.012 --> 00:03:50.720 that you actually still want to use, even 00:03:50.720 --> 00:03:54.270 if you just wanted to clear your history from something else altogether. 00:03:54.270 --> 00:03:57.170 So we'll consider then what else might be 00:03:57.170 --> 00:04:00.440 a concern when it comes to your privacy beyond your own browser. 00:04:00.440 --> 00:04:04.880 And in fact, it doesn't even matter if you sanitize your own web browsing 00:04:04.880 --> 00:04:08.720 history and delete the entirety of it because it turns out 00:04:08.720 --> 00:04:12.290 that typically, any website you visit is, itself, 00:04:12.290 --> 00:04:16.490 on the server side also keeping track of a lot of that same information. 00:04:16.490 --> 00:04:18.959 That is to say that servers typically have logs. 00:04:18.959 --> 00:04:21.529 And these are not only for diagnostic purposes. 00:04:21.529 --> 00:04:23.660 In case anything goes wrong, the IT staff 00:04:23.660 --> 00:04:26.450 can use those logs to reconstruct history 00:04:26.450 --> 00:04:28.640 and figure it out, figure out who was doing what 00:04:28.640 --> 00:04:30.920 and when and how that might explain some problem. 
00:04:30.920 --> 00:04:32.930 It might be used for auditing purposes if they 00:04:32.930 --> 00:04:35.690 want to keep track of exactly what was accessed on a system. 00:04:35.690 --> 00:04:37.850 It might be used for advertising purposes 00:04:37.850 --> 00:04:40.820 or analytical purposes more generally to mine 00:04:40.820 --> 00:04:45.020 or analyze that data to figure out how we might monetize it or do something 00:04:45.020 --> 00:04:46.730 else with that same information. 00:04:46.730 --> 00:04:48.770 But what do we mean concretely when we say 00:04:48.770 --> 00:04:51.780 that information is logged on a server? 00:04:51.780 --> 00:04:54.350 Well, it's very similar to your own web browsing history, 00:04:54.350 --> 00:04:56.060 but it's even more detailed. 00:04:56.060 --> 00:05:00.350 So here, for instance, is a representative piece of configuration 00:05:00.350 --> 00:05:04.190 that captures what is a very common convention for information 00:05:04.190 --> 00:05:06.740 that servers log, web servers, specifically, 00:05:06.740 --> 00:05:08.210 when you visit them with a browser. 00:05:08.210 --> 00:05:10.290 And I'll highlight just a subset thereof. 00:05:10.290 --> 00:05:13.310 This log format, the so-called combined format, 00:05:13.310 --> 00:05:16.370 indicates to me that it's very common for a server 00:05:16.370 --> 00:05:21.510 when you visit some web page on it that the server will log, that is, remember 00:05:21.510 --> 00:05:24.790 your remote address, otherwise known as your IP address. 00:05:24.790 --> 00:05:28.230 It will remember the day and time at which you accessed that page. 00:05:28.230 --> 00:05:30.540 It will remember exactly what you requested, 00:05:30.540 --> 00:05:33.330 so the name of the file or folder on the server 00:05:33.330 --> 00:05:35.920 specifically that you sought to download or look at. 00:05:35.920 --> 00:05:39.210 It'll remember the referrer, that is, the URL from which you came. 
00:05:39.210 --> 00:05:41.640 And it will even remember the user agent that you use, 00:05:41.640 --> 00:05:43.710 that is to say your browser. 00:05:43.710 --> 00:05:46.560 So perhaps, unbeknownst to you, every time 00:05:46.560 --> 00:05:49.740 you use your browser to visit some website, inside 00:05:49.740 --> 00:05:54.510 of that virtual envelope is quite a bit more than just the request 00:05:54.510 --> 00:05:56.250 that you're making of the browser-- 00:05:56.250 --> 00:05:57.510 of the server, rather. 00:05:57.510 --> 00:06:00.510 It includes, yes, your IP address and on the outside 00:06:00.510 --> 00:06:02.640 of the envelope, as we've described it in the past. 00:06:02.640 --> 00:06:06.330 It includes some number of HTTP headers, as we've discussed in the past. 00:06:06.330 --> 00:06:09.450 But in particular, it includes information 00:06:09.450 --> 00:06:13.230 that you might not want being stored on servers in perpetuity 00:06:13.230 --> 00:06:16.860 and you have no control over deleting necessarily. 00:06:16.860 --> 00:06:20.430 Unless there is some regulatory requirement or law that 00:06:20.430 --> 00:06:23.910 requires that the server delete it for you on some schedule, 00:06:23.910 --> 00:06:27.220 you have much, much less control over this information. 00:06:27.220 --> 00:06:29.280 So let's consider in a bit more technical detail 00:06:29.280 --> 00:06:31.860 what some of this information is and how you 00:06:31.860 --> 00:06:35.670 might at least exert some control over just how much of that information 00:06:35.670 --> 00:06:36.880 is being shared. 00:06:36.880 --> 00:06:40.710 So let's revisit first this building block of HTTP headers 00:06:40.710 --> 00:06:44.250 that we keep coming back to if only because in the world of systems 00:06:44.250 --> 00:06:47.970 and software nowadays, things on the web are just so common. 
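To make that server-side log a bit more concrete, here is a minimal sketch in Python of how one line in the combined format might be pulled apart into the fields just described. The regular expression assumes the common layout of that format, and the sample entry (IP address, timestamp, file name, and user agent) is made up for illustration rather than taken from any real server.

```python
import re

# Assumed layout of one "combined" log line:
# remote address, identity, user, [timestamp], "request", status, size,
# "referrer", "user agent".
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return the logged fields as a dict, or None if the line doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# A made-up log entry, not taken from any real server.
SAMPLE = (
    '203.0.113.7 - - [01/Jan/2024:12:00:00 +0000] '
    '"GET /cats.html HTTP/1.1" 200 5120 '
    '"https://www.google.com/search?q=cats" '
    '"Mozilla/5.0 (X11; Linux x86_64)"'
)

if __name__ == "__main__":
    # Print each remembered field on its own line.
    for field, value in parse_line(SAMPLE).items():
        print(f"{field}: {value}")
```

Every field printed by this sketch is something the server can keep long after you have cleared your own browser's history.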
00:06:47.970 --> 00:06:52.140 Using HTML, CSS, JavaScript, using web browsers and web servers, 00:06:52.140 --> 00:06:54.450 that's driving a lot of today's interactions 00:06:54.450 --> 00:06:57.090 with technology, whether it's in native applications 00:06:57.090 --> 00:07:01.600 or whether it's with mobile websites, or desktop websites, or the like. 00:07:01.600 --> 00:07:05.460 So HTTP headers, recall, are just like key value pairs 00:07:05.460 --> 00:07:07.950 that are inside of those virtual envelopes that 00:07:07.950 --> 00:07:11.850 indicate some kind of setting or some kind of piece of information 00:07:11.850 --> 00:07:13.830 that the browser is sending to the server 00:07:13.830 --> 00:07:16.210 or that the server is sending to the browser. 00:07:16.210 --> 00:07:19.920 So for instance, if I go on google.com and I search, 00:07:19.920 --> 00:07:24.300 as I often do, for cats, well, what might be going on underneath the hood? 00:07:24.300 --> 00:07:29.530 Well, in that web page that Google gives me with 10, or 20, or 30, 00:07:29.530 --> 00:07:34.960 or six billion, 240 million cats, there might be HTML that looks like this. 00:07:34.960 --> 00:07:37.290 And recall that this HTML, which I'm proposing 00:07:37.290 --> 00:07:41.460 exists somewhere in Google search results, is an anchor tag for a link. 00:07:41.460 --> 00:07:43.320 There's the a tag over there. 00:07:43.320 --> 00:07:46.170 The hyper-reference for this link, or href attribute, 00:07:46.170 --> 00:07:51.720 has a value of https://example.com, for instance. 00:07:51.720 --> 00:07:56.190 And the word that the human will see is cats, literally in this case. 00:07:56.190 --> 00:07:58.710 Now, I'm assuming for the sake of discussion that today, 00:07:58.710 --> 00:08:01.020 example.com is a website full of cats. 00:08:01.020 --> 00:08:04.020 And that's why it might be appearing among Google search results 00:08:04.020 --> 00:08:06.870 when I search for cats as my keyword. 
00:08:06.870 --> 00:08:12.930 But when a user like you or me clicks on that link on google.com, 00:08:12.930 --> 00:08:15.630 because that is literally where you're looking at the search 00:08:15.630 --> 00:08:20.790 results in this story, it turns out that your browser not only goes and requests 00:08:20.790 --> 00:08:26.040 that web page, your browser includes an HTTP header 00:08:26.040 --> 00:08:29.040 like this in that virtual envelope. 00:08:29.040 --> 00:08:33.039 That's specifically called referrer-- that's the key in this case-- 00:08:33.039 --> 00:08:37.169 the value of which is the URL from which you came. 00:08:37.169 --> 00:08:41.250 So for instance, if I have just gone to google.com, and I've searched for cats, 00:08:41.250 --> 00:08:43.830 and I've hit Enter, recall, as in past classes, 00:08:43.830 --> 00:08:47.160 I proposed that the shortest version of the URL that you might see 00:08:47.160 --> 00:08:50.040 in your browser upon searching for cats is this, 00:08:50.040 --> 00:08:57.150 https://www.google.com/search?q=cats. 00:08:57.150 --> 00:08:59.640 Now, that is what you'd see in your URL bar. 00:08:59.640 --> 00:09:02.610 Below that, you'd see the 10, or the 20, or the 30, or the six billion, 00:09:02.610 --> 00:09:06.510 240 million cats, each of which has a link that when clicked, 00:09:06.510 --> 00:09:08.280 leads you to a search result. 00:09:08.280 --> 00:09:12.510 But the implication of this HTTP header is that by default, 00:09:12.510 --> 00:09:15.300 perhaps, unbeknownst to you, indeed, your browser 00:09:15.300 --> 00:09:18.270 is telling the whole world from which web page 00:09:18.270 --> 00:09:23.490 you came when you visited some other web page via a link. 00:09:23.490 --> 00:09:26.250 Now, why in the world is this compelling? 00:09:26.250 --> 00:09:30.510 Well, it's actually useful for the website at which you end because it 00:09:30.510 --> 00:09:32.100 might be useful for their analytics. 
00:09:32.100 --> 00:09:34.980 They might want to know, well, how are people finding my website? 00:09:34.980 --> 00:09:37.120 How are people finding my business on the internet? 00:09:37.120 --> 00:09:37.620 Oh. 00:09:37.620 --> 00:09:39.570 It looks like I'm getting a lot of users, 00:09:39.570 --> 00:09:42.300 a lot of customers, perhaps, from google.com, 00:09:42.300 --> 00:09:47.140 specifically, when someone searches for cats, not dogs, not something else, 00:09:47.140 --> 00:09:47.850 but cats. 00:09:47.850 --> 00:09:50.610 So you can imagine, especially in the world of commerce, 00:09:50.610 --> 00:09:54.240 that just being useful information to know how people are finding you, 00:09:54.240 --> 00:09:57.810 or conversely, how people are not apparently finding you. 00:09:57.810 --> 00:10:01.228 But this is very invasive because now this website, 00:10:01.228 --> 00:10:03.270 even though it's arguably none of their business, 00:10:03.270 --> 00:10:07.440 they know I use Google instead of Bing or some other search engine, perhaps. 00:10:07.440 --> 00:10:11.310 And you can imagine that there could be links on CS50's own website, 00:10:11.310 --> 00:10:13.350 on any number of other websites in the world. 00:10:13.350 --> 00:10:15.810 And just because you happened to visit them and you clicked a link, 00:10:15.810 --> 00:10:18.380 now they're broadcasting your business to whatever website 00:10:18.380 --> 00:10:22.820 you're ending up on by revealing where you came from, from where 00:10:22.820 --> 00:10:24.810 you were referred, so to speak. 00:10:24.810 --> 00:10:28.110 But this has long been a feature of HTTP. 00:10:28.110 --> 00:10:32.480 And this has long been a feature that's enabled by default, unless the website, 00:10:32.480 --> 00:10:37.670 or perhaps, you, as the user, turn this off or somehow moderate its response. 00:10:37.670 --> 00:10:41.120 Now, some of you might be noticing that there's a bit of a typo on the screen. 
00:10:41.120 --> 00:10:43.128 And I promise, this isn't actually mine. 00:10:43.128 --> 00:10:45.920 In English, at least, this is not typically how you spell the word, 00:10:45.920 --> 00:10:46.610 referrer. 00:10:46.610 --> 00:10:48.320 And this is actually a fun fact. 00:10:48.320 --> 00:10:51.620 In referrer, there should be four R's in total. 00:10:51.620 --> 00:10:57.200 It should be R-E-F-E-R-R-E-R. However, fun fact, years ago, 00:10:57.200 --> 00:11:01.140 when the specification for this standard was being written, 00:11:01.140 --> 00:11:05.360 the poor individual who wrote the specification made a typo that has been 00:11:05.360 --> 00:11:08.520 immortalized in history for years to come. 00:11:08.520 --> 00:11:11.540 And so this is what browsers and servers have been using and expecting 00:11:11.540 --> 00:11:12.320 for years. 00:11:12.320 --> 00:11:15.560 There are other variants of this header in which this typographical error has 00:11:15.560 --> 00:11:16.740 been fixed. 00:11:16.740 --> 00:11:19.680 But it's sort of a fun fact from our internet history. 00:11:19.680 --> 00:11:23.880 But this is, indeed, what you might see going from your browser to your server. 00:11:23.880 --> 00:11:27.900 So ideally, we'd send less information, at least. 00:11:27.900 --> 00:11:31.020 I'd be a little more comfortable if example.com, 00:11:31.020 --> 00:11:34.780 which is this website for cats, were told, OK, fine. 00:11:34.780 --> 00:11:35.580 I came from Google. 00:11:35.580 --> 00:11:36.540 That's not a big deal. 00:11:36.540 --> 00:11:39.150 But I'd rather you not know what I was looking for 00:11:39.150 --> 00:11:41.400 if only because that seems unnecessary. 00:11:41.400 --> 00:11:42.300 It seems invasive. 00:11:42.300 --> 00:11:44.820 And who knows what kinds of cats I was looking for? 
00:11:44.820 --> 00:11:48.510 Maybe I don't want you to know exactly what my preferences are in cats, 00:11:48.510 --> 00:11:51.180 or dogs, or whatever types of breeds there might be 00:11:51.180 --> 00:11:53.020 in this case of searching for animals. 00:11:53.020 --> 00:11:55.770 So it just feels like it's unnecessary information to share. 00:11:55.770 --> 00:11:58.020 But better still, I dare say, would not be 00:11:58.020 --> 00:12:01.350 to even tell example.com where I'm coming from 00:12:01.350 --> 00:12:04.510 and essentially just get rid of this altogether. 00:12:04.510 --> 00:12:08.250 So how might a website go about moderating 00:12:08.250 --> 00:12:11.460 just how much output comes from the browsers at the server's request? 00:12:11.460 --> 00:12:13.830 Or maybe, how might you with special software 00:12:13.830 --> 00:12:17.160 suppress some of this information to preserve all the more of your privacy 00:12:17.160 --> 00:12:18.810 and what it is you're doing online? 00:12:18.810 --> 00:12:23.160 Well, for instance, this is a common tag that web pages 00:12:23.160 --> 00:12:27.690 can put in their own HTML code that indicates to the browser 00:12:27.690 --> 00:12:32.310 that, yes, you may send the referring address, but only send the origin, 00:12:32.310 --> 00:12:36.540 that is, https://www.google.com/. 00:12:36.540 --> 00:12:43.230 And that's it, no search, path, no ?q=cats. 00:12:43.230 --> 00:12:46.740 Tell them the website you came from, but not the specific page or not 00:12:46.740 --> 00:12:49.600 the specific search query or search results. 00:12:49.600 --> 00:12:53.040 Notice here in the world of HTML, the typographical error has been fixed. 00:12:53.040 --> 00:12:54.670 There's two R's in the middle there. 00:12:54.670 --> 00:12:58.710 But otherwise, this is an HTML solution to the problem. 
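The difference between sending the whole URL, only the origin, or nothing at all can be sketched in a few lines of Python. This is only an illustration of the idea, not how any browser actually implements it, and the function name and the three policy labels ("full", "origin", "none") are hypothetical choices made for this sketch.

```python
from urllib.parse import urlsplit


def referrer_for(url, policy):
    """Return what might be sent as the referring URL under a given policy.

    Only three illustrative policies are modeled here; real browsers
    support more values and apply additional rules.
    """
    if policy == "none":
        return None  # send no referring URL at all
    if policy == "origin":
        parts = urlsplit(url)
        # Keep only the protocol and domain name; drop the path and query.
        return f"{parts.scheme}://{parts.netloc}/"
    return url  # "full": the whole URL, search query included


if __name__ == "__main__":
    page = "https://www.google.com/search?q=cats"
    for policy in ("full", "origin", "none"):
        print(policy, "->", referrer_for(page, policy))
```

With the origin setting, example.com would still learn that you came from Google, but not that you were searching for cats.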
00:12:58.710 --> 00:13:01.860 The browser, assuming it respects this HTML, 00:13:01.860 --> 00:13:05.280 will therefore send a Referer HTTP header, 00:13:05.280 --> 00:13:10.020 but with less information, not the whole URL, but just the origin, so really, 00:13:10.020 --> 00:13:13.980 the domain name, itself, and, a bit more, the protocol. 00:13:13.980 --> 00:13:17.670 If you don't want any of that to be sent for your users, for your customers, 00:13:17.670 --> 00:13:18.802 you could do this instead. 00:13:18.802 --> 00:13:20.010 Now, Google does not do this. 00:13:20.010 --> 00:13:24.030 Google currently actually sends origin, so part of the URL. 00:13:24.030 --> 00:13:26.190 But if you want to be an even better citizen 00:13:26.190 --> 00:13:29.820 and not make it easy for browsers to send more information than they need 00:13:29.820 --> 00:13:32.760 to, you can include this HTML in your page 00:13:32.760 --> 00:13:35.850 instead, informing the browser that you can send-- 00:13:35.850 --> 00:13:40.470 don't send a referrer at all because the value of this meta tag, so to speak, 00:13:40.470 --> 00:13:43.410 is actually none instead of origin. 00:13:43.410 --> 00:13:46.140 And there are other values as well that allow you 00:13:46.140 --> 00:13:48.870 a bit of range of opportunities when it comes to these settings. 00:13:48.870 --> 00:13:52.980 But these are, perhaps, the most common or ones to consider. 00:13:52.980 --> 00:13:54.240 There's an alternative too. 00:13:54.240 --> 00:13:56.032 If you happen to be a little more technical 00:13:56.032 --> 00:14:00.360 and you have control over the web server and not just the HTML on the server, 00:14:00.360 --> 00:14:05.430 you can actually configure a Referrer-Policy HTTP header 00:14:05.430 --> 00:14:09.640 that goes from the server to the browser. 
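As a rough sketch of that server-side approach, here is a tiny web application in Python, using the standard library's wsgiref module, that attaches a Referrer-Policy header to every response. The page content and port 8000 are arbitrary assumptions for local experimentation. A production web server would typically set this header in its own configuration instead.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Minimal WSGI app that attaches a Referrer-Policy header to its response."""
    body = b'<a href="https://example.com">cats</a>'
    headers = [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
        # Ask the browser to send only the origin, not the full URL,
        # when a user follows a link from this page.
        ("Referrer-Policy", "origin"),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local experimentation.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Changing the value from origin to no-referrer would ask the browser not to send a referring URL at all.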
00:14:09.640 --> 00:14:12.510 So in this case, the referrer policy can indicate 00:14:12.510 --> 00:14:16.260 that you only want the origin to be sent, for instance, the shorter 00:14:16.260 --> 00:14:17.400 form of the URL. 00:14:17.400 --> 00:14:20.910 Or you can actually indicate that no referrer should actually 00:14:20.910 --> 00:14:24.270 be sent in this particular case, so a second mechanism 00:14:24.270 --> 00:14:26.400 for actually controlling the same. 00:14:26.400 --> 00:14:31.190 Let me pause here and see if there's not only some concerns, perhaps, 00:14:31.190 --> 00:14:32.940 now that you understand better, hopefully, 00:14:32.940 --> 00:14:37.800 how the web works, at least, by default or how we might mitigate 00:14:37.800 --> 00:14:40.080 this concern with your privacy. 00:14:40.080 --> 00:14:43.350 AUDIENCE: Is there a way that is easy enough 00:14:43.350 --> 00:14:49.110 for us to delete those traces as a client in case 00:14:49.110 --> 00:14:51.595 that we don't want to be tracked or something like that? 00:14:51.595 --> 00:14:53.220 DAVID J. MALAN: A really good question. 00:14:53.220 --> 00:14:57.870 We'll refer you to some URLs outside of the context of class, itself. 00:14:57.870 --> 00:15:00.240 But yes, there is actually client-side software 00:15:00.240 --> 00:15:02.520 that you can install on your own Mac or PC, 00:15:02.520 --> 00:15:05.340 typically, that will scrub some of this information, 00:15:05.340 --> 00:15:09.210 so that when your HTTP requests go from your browser to servers, 00:15:09.210 --> 00:15:12.330 you can ensure that this third-party software removes 00:15:12.330 --> 00:15:15.750 a lot of that information automatically for you because in that way, 00:15:15.750 --> 00:15:19.130 you don't have to trust that the website, like the Googles of the world, 00:15:19.130 --> 00:15:22.500 will actually reduce the amount of information for you. 
00:15:22.500 --> 00:15:25.250 You can instead do that for yourself through client-side software. 00:15:25.250 --> 00:15:28.100 And we'll provide a few links online. 00:15:28.100 --> 00:15:29.990 Other questions on the same? 00:15:29.990 --> 00:15:33.160 AUDIENCE: By using a private browser such as Tor, 00:15:33.160 --> 00:15:36.980 for example, or using a temporary operating system like Tails, 00:15:36.980 --> 00:15:40.040 does this remove all of our traces on the internet? 00:15:40.040 --> 00:15:43.157 Or does it leave some on the client side or the server side? 00:15:43.157 --> 00:15:44.490 DAVID J. MALAN: A good question. 00:15:44.490 --> 00:15:47.330 Short answer is that it does leave some evidence on both the server 00:15:47.330 --> 00:15:48.680 side and the client side. 00:15:48.680 --> 00:15:52.590 But we'll come back to Tor in just a little bit as well. 00:15:52.590 --> 00:15:53.090 All right. 00:15:53.090 --> 00:15:54.320 How about one final question? 00:15:54.320 --> 00:15:58.430 AUDIENCE: You said previously about the third-party software that's 00:15:58.430 --> 00:16:02.210 supposed to be used in order to scrub the information from being submitted 00:16:02.210 --> 00:16:04.190 to the server side. 00:16:04.190 --> 00:16:07.970 What if that program, itself, is used to eavesdrop 00:16:07.970 --> 00:16:10.895 on what we do on the computer? 00:16:10.895 --> 00:16:12.770 DAVID J. MALAN: That is a very valid concern. 00:16:12.770 --> 00:16:14.690 It is absolutely possible. 00:16:14.690 --> 00:16:17.540 In general, what is working in your favor 00:16:17.540 --> 00:16:20.840 is either open-source software, where if you're using software 00:16:20.840 --> 00:16:23.840 that other people can see the source code of, presumably, 00:16:23.840 --> 00:16:26.210 it's less likely that it's doing something malicious. 
00:16:26.210 --> 00:16:29.090 Capitalism often helps you here too, whereby, 00:16:29.090 --> 00:16:32.000 it is often not in a company's own interest 00:16:32.000 --> 00:16:35.320 to be violating the privacy of their users because presumably, 00:16:35.320 --> 00:16:38.570 that would create some form of backlash, which would not be good for business. 00:16:38.570 --> 00:16:41.240 But beyond that, there is a lot of trust on your part 00:16:41.240 --> 00:16:44.240 and my part whenever it comes to installing software. 00:16:44.240 --> 00:16:46.850 So that is, indeed, very much a risk. 00:16:46.850 --> 00:16:48.980 Now, it turns out there's other information 00:16:48.980 --> 00:16:52.100 that your browser might be sharing without your realizing 00:16:52.100 --> 00:16:53.390 that it's making it available. 00:16:53.390 --> 00:16:57.260 And that information is enough via which servers can even 00:16:57.260 --> 00:16:58.820 fingerprint you, so to speak. 00:16:58.820 --> 00:17:01.070 That is to say there's this technique generally called 00:17:01.070 --> 00:17:03.230 fingerprinting that in the context of the web 00:17:03.230 --> 00:17:07.310 means to take as input a whole bunch of characteristics 00:17:07.310 --> 00:17:09.859 of the request from the internet that's coming in 00:17:09.859 --> 00:17:14.060 and see if you can use those characteristics to create 00:17:14.060 --> 00:17:17.869 a profile of sorts for the user via which you can uniquely 00:17:17.869 --> 00:17:19.440 identify that user. 00:17:19.440 --> 00:17:22.579 Now, that doesn't mean you'll know specifically that user is David Malan. 
00:17:22.579 --> 00:17:25.550 But you will know, according to this system, 00:17:25.550 --> 00:17:28.790 if it's the same user today, as you see tomorrow, 00:17:28.790 --> 00:17:32.690 as you see the next day because you can use this information 00:17:32.690 --> 00:17:36.770 to infer with high probability that, OK, we saw that exact same browser 00:17:36.770 --> 00:17:39.050 configuration again, and again, and again. 00:17:39.050 --> 00:17:42.170 Odds are it's the same person and not some twin 00:17:42.170 --> 00:17:45.150 on the internet who just happens to have precisely those settings. 00:17:45.150 --> 00:17:48.260 Now, how might this be implemented or achieved technologically? 00:17:48.260 --> 00:17:50.360 Well, the simplest mechanism, perhaps, is just 00:17:50.360 --> 00:17:52.190 to rely on something like your IP address. 00:17:52.190 --> 00:17:54.830 Recall that any time you're doing something on the internet, 00:17:54.830 --> 00:17:57.110 those virtual envelopes we keep talking about 00:17:57.110 --> 00:18:00.480 have your IP address on the outside, so to speak, 00:18:00.480 --> 00:18:03.470 as well as the IP address of the destination to which you're 00:18:03.470 --> 00:18:04.820 trying to send information. 00:18:04.820 --> 00:18:07.080 Your IP, in that case, is the return address, 00:18:07.080 --> 00:18:10.010 which means you're literally telling the remote server when 00:18:10.010 --> 00:18:13.190 using certain protocols where you are in the world, 00:18:13.190 --> 00:18:15.050 or at least, what your IP address is. 00:18:15.050 --> 00:18:18.320 Now, that IP address might not alone uniquely identify you 00:18:18.320 --> 00:18:22.700 because it turns out on campuses, in homes, in corporate networks, 00:18:22.700 --> 00:18:26.390 you might actually share one IP address with many other people, 00:18:26.390 --> 00:18:30.090 but at least narrows the scope of whose IP it might be, 00:18:30.090 --> 00:18:31.970 even if it's shared among a few people. 
00:18:31.970 --> 00:18:34.940 But your browser inside of that virtual envelope 00:18:34.940 --> 00:18:37.200 is sharing other information as well. 00:18:37.200 --> 00:18:42.350 Another HTTP header that is typically sent by browsers to servers 00:18:42.350 --> 00:18:43.940 is called user agent. 00:18:43.940 --> 00:18:48.380 And this is just a unique string of text that uniquely identifies typically 00:18:48.380 --> 00:18:52.310 the browser that you're using and the version thereof and the operating 00:18:52.310 --> 00:18:54.530 system that you're using and the version thereof. 00:18:54.530 --> 00:18:58.052 So for instance, a standard format might look a little something like this. 00:18:58.052 --> 00:18:59.510 And it's deliberately overwhelming. 00:18:59.510 --> 00:19:01.880 And it's just meant to capture how much detail 00:19:01.880 --> 00:19:04.040 might be leaked in this header's value. 00:19:04.040 --> 00:19:06.740 But within this big string of text that doesn't even 00:19:06.740 --> 00:19:09.740 fit onto one line-- it's wrapping here under three lines-- is 00:19:09.740 --> 00:19:13.400 some indication of what browser you're using, be it, Chrome or something else 00:19:13.400 --> 00:19:15.620 and what operating system you're using, be it, 00:19:15.620 --> 00:19:19.520 Android or something else on a phone, a laptop, or a desktop. 00:19:19.520 --> 00:19:21.710 Now, of course, a lot of people in the world 00:19:21.710 --> 00:19:23.870 presumably have the same browser installed. 00:19:23.870 --> 00:19:26.600 So that, too, even with IP address, might not 00:19:26.600 --> 00:19:30.140 be enough information to uniquely identify you, 00:19:30.140 --> 00:19:31.530 at least, with high probability. 00:19:31.530 --> 00:19:33.720 So what else can servers do? 
00:19:33.720 --> 00:19:37.970 Well, if the server has the ability to send some code to your computer, 00:19:37.970 --> 00:19:41.330 for instance, some HTML, some CSS, and some JavaScript, 00:19:41.330 --> 00:19:45.710 servers can effectively interrogate the browser and ask it certain questions. 00:19:45.710 --> 00:19:50.768 For instance, a server could figure out what the resolution is of your screen. 00:19:50.768 --> 00:19:53.060 Now, this might be practically useful, so they know how 00:19:53.060 --> 00:19:54.900 to render information on the screen. 00:19:54.900 --> 00:19:56.610 But that alone might be enough. 00:19:56.610 --> 00:20:00.410 Especially if you're in the habit of full screening your browser 00:20:00.410 --> 00:20:03.260 and you always use the same resolution on your monitor, 00:20:03.260 --> 00:20:07.070 that might be another ingredient with which to identify or fingerprint you. 00:20:07.070 --> 00:20:09.890 The server might be able to figure out what fonts 00:20:09.890 --> 00:20:12.170 you have installed on your system. 00:20:12.170 --> 00:20:14.750 The server might be able to figure out what time 00:20:14.750 --> 00:20:18.950 zone you are in because that's also a value available within the context 00:20:18.950 --> 00:20:19.760 of a browser. 00:20:19.760 --> 00:20:24.470 And there's yet other values still that collectively with high probability 00:20:24.470 --> 00:20:26.760 can be used to fingerprint you and me. 00:20:26.760 --> 00:20:29.390 So even if you're not even logged in, even if you're 00:20:29.390 --> 00:20:32.990 using various privacy enhancing software products to try 00:20:32.990 --> 00:20:35.330 to remove some of these HTTP headers and the like, 00:20:35.330 --> 00:20:39.680 you're still leaking other information, including the extensions or plug-ins, 00:20:39.680 --> 00:20:42.690 sometimes, that your browser might have installed. 
00:20:42.690 --> 00:20:45.983 So if you're in the habit of using the same computer again and again 00:20:45.983 --> 00:20:48.650 and you're in the habit of not changing a lot of these settings, 00:20:48.650 --> 00:20:52.070 that alone might be enough for a website to effectively track you. 00:20:52.070 --> 00:20:53.420 Now, it might be innocuous. 00:20:53.420 --> 00:20:55.520 They might just use this for statistical purposes 00:20:55.520 --> 00:20:58.010 to get a sense of how many users or customers they have. 00:20:58.010 --> 00:21:00.050 But it could be for more invasive purposes, 00:21:00.050 --> 00:21:02.330 like serving you targeted advertising, based 00:21:02.330 --> 00:21:05.450 on your behavior of these websites, or really, just tracking you, 00:21:05.450 --> 00:21:06.260 specifically. 00:21:06.260 --> 00:21:09.740 And the catch is that if you ever log in to this server 00:21:09.740 --> 00:21:13.820 just once, if the server has been logging all of your traffic based 00:21:13.820 --> 00:21:17.510 on that fingerprint for days, for months, for years, at that point, 00:21:17.510 --> 00:21:20.870 retroactively, with high probability, they can infer, oh, wait a minute. 00:21:20.870 --> 00:21:23.570 If the user on this day was David and we think 00:21:23.570 --> 00:21:27.080 it was the same user on all of these previous days, now by transitivity, 00:21:27.080 --> 00:21:30.810 they know a lot more about your browser history as well. 00:21:30.810 --> 00:21:34.460 So even unbeknownst to you, and even without explicit header values 00:21:34.460 --> 00:21:37.940 being sent that identify you, the collection 00:21:37.940 --> 00:21:41.360 of attributes or characteristics that our browsers have 00:21:41.360 --> 00:21:45.530 and our browsing behavior has can still be enough to uniquely identify 00:21:45.530 --> 00:21:47.570 most of us quite a bit of the time. 
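To make the idea of combining attributes into a fingerprint concrete, here is a minimal Python sketch. It is an illustration only, not how any particular website does it: the attribute names, the sample values, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def fingerprint(attributes: dict[str, str]) -> str:
    """Combine browser attributes into one stable identifier (hypothetical scheme)."""
    # Sort the keys so the same attributes always produce the same digest.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# All values below are made up for illustration.
visit_monday = {
    "user_agent": "ExampleBrowser/1.0 (ExampleOS 2.0)",
    "screen": "1920x1080",
    "timezone": "America/New_York",
    "fonts": "Arial,Helvetica,Times",
}
visit_tuesday = dict(visit_monday)  # same machine, same settings
other_visitor = dict(visit_monday, timezone="Europe/London")

# Unchanged settings yield the same fingerprint on both days.
same_visitor = fingerprint(visit_monday) == fingerprint(visit_tuesday)
# A single differing attribute yields a different fingerprint.
different_visitor = fingerprint(visit_monday) != fingerprint(other_visitor)
```

No single attribute here identifies anyone. It is the combination, repeated across visits, that lets a server guess it is seeing the same browser again.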
00:21:47.570 --> 00:21:51.080 Let me pause here and see if there's any questions on fingerprinting 00:21:51.080 --> 00:21:54.560 or these implications for privacy. 00:21:54.560 --> 00:21:58.897 AUDIENCE: Will using a VPN prevent browser fingerprinting? 00:21:58.897 --> 00:22:00.230 DAVID J. MALAN: A good question. 00:22:00.230 --> 00:22:02.570 And we'll talk about VPNs a bit more soon. 00:22:02.570 --> 00:22:03.650 Short answer, no. 00:22:03.650 --> 00:22:07.490 So VPNs will typically mask your IP address, but that's about it. 00:22:07.490 --> 00:22:12.115 If you still use your browser as usual with your user account as usual, 00:22:12.115 --> 00:22:14.240 all of that same information is going to be leaked. 00:22:14.240 --> 00:22:16.850 It's just going to change one piece of it. 00:22:16.850 --> 00:22:18.090 A good question. 00:22:18.090 --> 00:22:20.645 Other questions on fingerprinting and privacy? 00:22:20.645 --> 00:22:24.980 AUDIENCE: Is it possible that a hacker can steal a fingerprint 00:22:24.980 --> 00:22:28.670 and use it for their own purposes and everything 00:22:28.670 --> 00:22:33.100 will look like it was my computer that performed certain actions? 00:22:33.100 --> 00:22:36.695 So it's like stealing an identity. 00:22:36.695 --> 00:22:38.810 DAVID J. MALAN: A short answer, yes, if the hacker 00:22:38.810 --> 00:22:40.760 has access to the same information. 00:22:40.760 --> 00:22:43.580 If, though, we rewind to our focus on encryption 00:22:43.580 --> 00:22:46.910 a couple of classes ago, if you are accessing websites only 00:22:46.910 --> 00:22:50.870 via HTTPS and nothing is unencrypted, then it's 00:22:50.870 --> 00:22:54.320 going to be a lot harder for a hacker in between you and that server 00:22:54.320 --> 00:22:58.130 to glean any of the same information because almost all of it is encrypted. 00:22:58.130 --> 00:22:59.360 The IP address is not.
00:22:59.360 --> 00:23:04.140 But anything inside of the envelope is, including these headers, the HTML, 00:23:04.140 --> 00:23:05.900 the JavaScript, and the CSS. 00:23:05.900 --> 00:23:08.240 If, though, the hacker has somehow infiltrated 00:23:08.240 --> 00:23:13.070 your own laptop, or desktop, or phone, or the server, then all bets are off. 00:23:13.070 --> 00:23:15.590 And they could absolutely identify you, according 00:23:15.590 --> 00:23:18.730 to these same pieces of information. 00:23:18.730 --> 00:23:19.796 Other questions? 00:23:19.796 --> 00:23:22.780 AUDIENCE: I was just curious to understand the difference, perhaps, 00:23:22.780 --> 00:23:24.310 when you are on mobile. 00:23:24.310 --> 00:23:28.150 My understanding is that they can even get much more information 00:23:28.150 --> 00:23:29.915 when you are on mobile. 00:23:29.915 --> 00:23:31.540 DAVID J. MALAN: That's a fair question. 00:23:31.540 --> 00:23:34.030 I don't think I would answer yes to that. 00:23:34.030 --> 00:23:38.680 I'm hard pressed to imagine what more your phone is doing than the browser is 00:23:38.680 --> 00:23:41.800 doing, except that there are-- 00:23:41.800 --> 00:23:44.500 I suppose I could argue that your phone tends 00:23:44.500 --> 00:23:49.990 to have additional features nowadays, like GPS, like accelerometers, 00:23:49.990 --> 00:23:53.740 gyroscope, perhaps, so other hardware features that theoretically 00:23:53.740 --> 00:23:57.530 can be interrogated by JavaScript code, typically, on an opt-in basis. 00:23:57.530 --> 00:24:00.790 So you, the user, could deny access to these pieces of information. 00:24:00.790 --> 00:24:03.820 But those characteristics, I suspect could 00:24:03.820 --> 00:24:08.260 be used to identify you a bit more uniquely because laptops, at least, 00:24:08.260 --> 00:24:11.538 today, have less of that functionality. 00:24:11.538 --> 00:24:12.205 Other questions? 
00:24:12.205 --> 00:24:17.350 AUDIENCE: When storing and retrieving data on the front end, 00:24:17.350 --> 00:24:24.125 is it more secure to use cookies, local storage, or another alternative? 00:24:24.125 --> 00:24:25.750 DAVID J. MALAN: A really good question. 00:24:25.750 --> 00:24:29.500 And we will come to this subject literally in one slide, cookies. 00:24:29.500 --> 00:24:32.200 In general, local storage because cookies, 00:24:32.200 --> 00:24:34.960 by design, are meant to be sent back and forth, back and forth 00:24:34.960 --> 00:24:36.700 between browser and server. 00:24:36.700 --> 00:24:39.940 Theoretically, that should not be a concern if everything is encrypted. 00:24:39.940 --> 00:24:43.060 But we've talked in the past already how mistakes can be made. 00:24:43.060 --> 00:24:46.360 You might start on HTTP, be redirected to HTTPS. 00:24:46.360 --> 00:24:49.360 So in general, storing things in local storage, at least, 00:24:49.360 --> 00:24:52.840 prevent things from accidentally leaking out over the browser connection. 00:24:52.840 --> 00:24:55.900 That said, if you're storing things in local storage, 00:24:55.900 --> 00:24:57.770 they are literally available locally. 00:24:57.770 --> 00:25:00.160 So if you have a colleague, a friend, a sibling 00:25:00.160 --> 00:25:04.000 who gains physical access to that device, let alone, an adversary, 00:25:04.000 --> 00:25:07.330 then they could see all of the information and not only your cookies, 00:25:07.330 --> 00:25:09.100 but also local storage. 00:25:09.100 --> 00:25:11.890 So at that point, physical access, generally, 00:25:11.890 --> 00:25:15.170 all bets are off when it comes to your privacy. 00:25:15.170 --> 00:25:15.670 All right. 00:25:15.670 --> 00:25:17.670 How about one other question? 00:25:17.670 --> 00:25:22.300 AUDIENCE: There were calls being made from people's local phone numbers 00:25:22.300 --> 00:25:25.540 on cell phones to other local numbers. 
00:25:25.540 --> 00:25:27.710 Obviously, the people weren't making the calls. 00:25:27.710 --> 00:25:28.960 And it had happened to me too. 00:25:28.960 --> 00:25:31.540 And I was wondering how that kind of works 00:25:31.540 --> 00:25:34.000 or if it's related to this at all. 00:25:34.000 --> 00:25:35.298 DAVID J. MALAN: It is. 00:25:35.298 --> 00:25:37.090 We weren't planning to talk about it today. 00:25:37.090 --> 00:25:41.380 But in a nutshell, it is very easy to spoof telephone numbers. 00:25:41.380 --> 00:25:44.440 And this is how a lot of spam calls are sent, particularly, 00:25:44.440 --> 00:25:46.540 internationally or abroad, where they might not 00:25:46.540 --> 00:25:49.510 be regulated in the same way as someone's home country. 00:25:49.510 --> 00:25:53.590 It's very common, too, for if your number starts-- 00:25:53.590 --> 00:25:57.820 your own phone number starts with 555, for instance, very often, you'll 00:25:57.820 --> 00:26:01.120 get fake calls from other numbers that also 00:26:01.120 --> 00:26:03.850 start with 555 because the presumption by the adversary 00:26:03.850 --> 00:26:06.790 is that, oh, Sabrina's probably more likely to pick this up 00:26:06.790 --> 00:26:09.910 if she thinks it's a neighbor with a similar looking phone number. 00:26:09.910 --> 00:26:11.860 But unfortunately, with the phone system, 00:26:11.860 --> 00:26:13.960 it's all too easy to fake phone numbers. 00:26:13.960 --> 00:26:17.480 And this is yet another reason why using phones, using SMS, 00:26:17.480 --> 00:26:20.030 is not a recommended approach for our earlier topic 00:26:20.030 --> 00:26:21.890 about multi-factor authentication. 00:26:21.890 --> 00:26:23.390 It's just not a secure network. 00:26:23.390 --> 00:26:27.230 That's not how Edison and others designed it 100-plus years ago. 00:26:27.230 --> 00:26:30.890 This is why systems that use cryptography in some form 00:26:30.890 --> 00:26:35.020 are much safer when it comes to that information. 
00:26:35.020 --> 00:26:35.650 All right. 00:26:35.650 --> 00:26:38.020 So beyond this user agent header, there's 00:26:38.020 --> 00:26:42.032 other headers that your browser is often sending back and forth with the server. 00:26:42.032 --> 00:26:44.240 And one of these we've talked about, and one of these 00:26:44.240 --> 00:26:47.020 you probably came into the course knowing about, namely, cookies. 00:26:47.020 --> 00:26:48.728 But there are different types of cookies. 00:26:48.728 --> 00:26:51.700 But recall that in general, a cookie is a piece of information 00:26:51.700 --> 00:26:56.360 that a server puts on your computer to help remember who you are. 00:26:56.360 --> 00:26:59.020 So in the absence of these fingerprints and the absence 00:26:59.020 --> 00:27:01.060 of specific headers like these, it can just 00:27:01.060 --> 00:27:04.685 put a small random value with numbers and letters 00:27:04.685 --> 00:27:07.060 or the like on your computer or maybe even a bigger value 00:27:07.060 --> 00:27:08.140 if it has lots of users. 00:27:08.140 --> 00:27:11.290 And it uses that value to uniquely identify you 00:27:11.290 --> 00:27:14.350 if you return again and again to the website. 00:27:14.350 --> 00:27:16.720 It doesn't necessarily know that I am David, 00:27:16.720 --> 00:27:19.265 unless I log in at some point, at which point, 00:27:19.265 --> 00:27:20.890 then it can realize, oh, wait a minute. 00:27:20.890 --> 00:27:22.670 David's cookie is this value. 00:27:22.670 --> 00:27:24.942 Now I know who this user is. 00:27:24.942 --> 00:27:26.650 But in general, there are different types 00:27:26.650 --> 00:27:28.660 of cookies and different settings for cookies that are 00:27:28.660 --> 00:27:30.285 worth knowing a little something about. 00:27:30.285 --> 00:27:34.000 So we talked previously about what we'd more properly call session cookies. 
00:27:34.000 --> 00:27:39.000 So session cookies are used by servers to maintain state, 00:27:39.000 --> 00:27:41.600 so to speak, between the server and the browser. 00:27:41.600 --> 00:27:46.280 That is to say, without getting too technical, HTTP is typically stateless, 00:27:46.280 --> 00:27:49.550 whereby, when you visit a page, the browser icon might spin for a bit. 00:27:49.550 --> 00:27:52.310 And then it stops because the transaction between the browser 00:27:52.310 --> 00:27:54.420 and the server is complete. 00:27:54.420 --> 00:27:58.100 But if you want to remember who the user is, therefore, 00:27:58.100 --> 00:28:01.970 the second, the third, the fourth time the browser contacts the server, 00:28:01.970 --> 00:28:05.000 the browser had better remind the server who it is. 00:28:05.000 --> 00:28:08.710 And this is why we use the metaphor of the virtual handstamp, whereby, 00:28:08.710 --> 00:28:11.210 that handstamp is the browser's way of reminding the server, 00:28:11.210 --> 00:28:12.200 you've seen me before. 00:28:12.200 --> 00:28:13.710 Don't make me log in again. 00:28:13.710 --> 00:28:14.810 I am David. 00:28:14.810 --> 00:28:15.770 I am David. 00:28:15.770 --> 00:28:18.950 --even though it's just relying on this virtual handstamp or really 00:28:18.950 --> 00:28:23.720 some unique identifier that's going in the Cookie header from browser 00:28:23.720 --> 00:28:24.510 to server. 00:28:24.510 --> 00:28:26.870 So a session cookie allows browsers and servers 00:28:26.870 --> 00:28:29.390 to maintain sessions, this kind of state. 00:28:29.390 --> 00:28:33.230 A little more concretely, it allows them to maintain things like shopping carts.
00:28:33.230 --> 00:28:35.907 So if you're shopping on an amazon.com or the like, 00:28:35.907 --> 00:28:39.230 the session cookie is what remembers who you are, 00:28:39.230 --> 00:28:41.900 or at least, that you're the same person, so that every time you 00:28:41.900 --> 00:28:45.680 poke around on the website, Amazon shows you the same contents of your shopping 00:28:45.680 --> 00:28:47.870 cart again and again, so that they don't lose 00:28:47.870 --> 00:28:51.630 your business by accidentally deleting it when you simply change the page. 00:28:51.630 --> 00:28:53.510 So how do session cookies work? 00:28:53.510 --> 00:28:55.670 Well, when you first visit a website that 00:28:55.670 --> 00:28:58.640 wants to plant a cookie on your computer, 00:28:58.640 --> 00:29:01.100 the response might look a little something like this. 00:29:01.100 --> 00:29:01.970 HTTP. 00:29:01.970 --> 00:29:04.820 200 is the status code, which, recall, means OK. 00:29:04.820 --> 00:29:05.630 All is well. 00:29:05.630 --> 00:29:08.820 It's not something like 404, which would mean file not found. 00:29:08.820 --> 00:29:10.190 So 200 is OK. 00:29:10.190 --> 00:29:14.660 But the server might also respond with this key value pair, this HTTP header, 00:29:14.660 --> 00:29:16.890 Set-Cookie:. 00:29:16.890 --> 00:29:17.970 So that's the key. 00:29:17.970 --> 00:29:22.220 The value of which is session=1234abcd. 00:29:22.220 --> 00:29:24.260 And that's the same value we used previously 00:29:24.260 --> 00:29:26.390 when we talked about cookies in this context. 00:29:26.390 --> 00:29:30.140 And the point here is that the name of this cookie is Session. 00:29:30.140 --> 00:29:34.790 And its value equals, in this case, 1234abcd. 00:29:34.790 --> 00:29:37.970 Now, if you visit the same website and you, and you, and you, 00:29:37.970 --> 00:29:42.110 we would all have different seemingly random values for those cookies. 
00:29:42.110 --> 00:29:45.283 And so this number, this sequence of letters and numbers, 00:29:45.283 --> 00:29:46.700 would be different for each of us. 00:29:46.700 --> 00:29:49.070 That is to say we have different handstamps 00:29:49.070 --> 00:29:50.940 that we're presenting each time. 00:29:50.940 --> 00:29:52.830 Now, this is a session cookie. 00:29:52.830 --> 00:29:55.550 And it's a session cookie in the sense that it 00:29:55.550 --> 00:29:59.300 is supposed to expire when you close the browser, when 00:29:59.300 --> 00:30:02.912 you quit for the night, when you reboot or anything else. 00:30:02.912 --> 00:30:05.120 Now, with that said, that's a bit of an overstatement 00:30:05.120 --> 00:30:09.652 because browsers nowadays will frequently preserve your tabs for you. 00:30:09.652 --> 00:30:10.610 They might go to sleep. 00:30:10.610 --> 00:30:12.110 You might have to wake them back up. 00:30:12.110 --> 00:30:15.110 But increasingly, sessions are living longer than they once did. 00:30:15.110 --> 00:30:20.000 But the idea is that this is not meant to last for a year or forever. 00:30:20.000 --> 00:30:24.230 It has a much shorter lifetime by design. 00:30:24.230 --> 00:30:28.490 When your browser has received that cookie and you click on some other 00:30:28.490 --> 00:30:31.280 page, you visit some other product on amazon.com, 00:30:31.280 --> 00:30:35.630 your browser might say something like this, GET / and then Cookie:, 00:30:35.630 --> 00:30:37.080 that exact same value. 00:30:37.080 --> 00:30:38.930 So recall from our previous class, this is 00:30:38.930 --> 00:30:43.340 how the browser just reminds the server what its handstamp is 00:30:43.340 --> 00:30:44.900 or what its cookie value is. 00:30:44.900 --> 00:30:48.350 But again, the idea is that when the browser is closed, 00:30:48.350 --> 00:30:51.920 you reboot for the night, then you should not 00:30:51.920 --> 00:30:56.490 have the same session cookie tomorrow, at least, in this model.
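The handstamp exchange described above can be sketched in a few lines of Python using the standard library's `http.cookies` module. This is a simplified illustration, not a real server or browser: the helper names `new_session_header` and `cookie_header_for` are hypothetical, and the eight-character identifier is only chosen to resemble the `1234abcd` example from the lecture.

```python
import secrets
from http.cookies import SimpleCookie


def new_session_header() -> str:
    """Server side: build a Set-Cookie header with a random session value."""
    cookie = SimpleCookie()
    cookie["session"] = secrets.token_hex(4)  # 8 random hex characters
    # No Max-Age or Expires attribute is set, so the browser is expected
    # to treat this as a session cookie.
    return cookie.output()  # e.g. "Set-Cookie: session=..."


def cookie_header_for(set_cookie_line: str) -> str:
    """Browser side: turn a received Set-Cookie line into a Cookie header."""
    _, _, value = set_cookie_line.partition(": ")
    jar = SimpleCookie()
    jar.load(value)
    pairs = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
    return "Cookie: " + pairs


# The value from the lecture's example response.
response_header = "Set-Cookie: session=1234abcd"
# What the browser would send back on its next request.
request_header = cookie_header_for(response_header)
```

Each visitor would receive a different random value from `new_session_header`, which is why each of us ends up presenting a different handstamp.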
00:30:56.490 --> 00:30:59.000 That's not true for all websites, but according to cookies 00:30:59.000 --> 00:31:00.680 as we are currently using them. 00:31:00.680 --> 00:31:04.160 Now, that's pretty good for your privacy because if the cookie is 00:31:04.160 --> 00:31:08.390 by design meant to be a session cookie and it expires pretty soon when you're 00:31:08.390 --> 00:31:11.510 done with that browser tab or done using the browser for the day, 00:31:11.510 --> 00:31:14.635 then that's pretty good because it means if you go back to the same website 00:31:14.635 --> 00:31:16.830 tomorrow, that cookie might not exist anymore, 00:31:16.830 --> 00:31:19.890 so you might as well look like or be a brand new user. 00:31:19.890 --> 00:31:23.880 So they can't correlate, perhaps, by default as much information about you. 00:31:23.880 --> 00:31:28.070 But these are the cookies that you read about being bad for you 00:31:28.070 --> 00:31:30.740 and bad for your privacy, tracking cookies, which 00:31:30.740 --> 00:31:33.830 are the exact same idea, key value pairs that 00:31:33.830 --> 00:31:37.100 are sent from server to browser to remember who you are, or at least, 00:31:37.100 --> 00:31:39.830 that you're the same person, even if we don't know that you're 00:31:39.830 --> 00:31:42.060 David Malan specifically just yet. 00:31:42.060 --> 00:31:44.690 But as per the name, tracking cookies are really 00:31:44.690 --> 00:31:46.830 designed to track you and me. 00:31:46.830 --> 00:31:47.330 Why? 
00:31:47.330 --> 00:31:50.500 Well, maybe analytical purposes, maybe debugging purposes, 00:31:50.500 --> 00:31:53.000 so that they know where users were in case something breaks, 00:31:53.000 --> 00:31:55.580 maybe advertising purposes, so that you get 00:31:55.580 --> 00:31:58.760 served different ads from me, so that they can maximize their revenue 00:31:58.760 --> 00:32:01.250 by serving up ads that you and I are each 00:32:01.250 --> 00:32:03.120 more individually likely to click on. 00:32:03.120 --> 00:32:06.740 So tracking cookies are the ones that get a bad rap and rightfully so. 00:32:06.740 --> 00:32:08.720 So let's consider an example of a cookie that's 00:32:08.720 --> 00:32:11.900 designed to track your behavior on a particular website. 00:32:11.900 --> 00:32:14.540 Here, for instance, is a Set-Cookie header 00:32:14.540 --> 00:32:18.210 that Google, specifically, might send to your browser. 00:32:18.210 --> 00:32:23.270 In fact, they use a cookie that by convention is called _ga for Google 00:32:23.270 --> 00:32:25.640 Analytics, which they use for analytical purposes. 00:32:25.640 --> 00:32:28.470 And its value looks a little something like this. 00:32:28.470 --> 00:32:31.520 And the point of this value is that it's generated 00:32:31.520 --> 00:32:35.570 on a per website basis if that website is using Google Analytics. 00:32:35.570 --> 00:32:40.220 And Google Analytics is a tool that allows website designers to track 00:32:40.220 --> 00:32:44.240 who is clicking on what, what browsers they're using, what operating systems 00:32:44.240 --> 00:32:47.570 they're using, and generally giving them a sense of the demographics 00:32:47.570 --> 00:32:48.770 of their user base.
00:32:48.770 --> 00:32:53.210 But unlike session cookies, which are meant to expire after a day, 00:32:53.210 --> 00:32:57.800 after the browser closes or the like, Google's analytical cookie here 00:32:57.800 --> 00:33:01.310 has a maximum age of this many seconds, which if you do out 00:33:01.310 --> 00:33:05.070 the math is by default two years, which is to say, 00:33:05.070 --> 00:33:09.440 if you visit some website that is using Google Analytics by embedding 00:33:09.440 --> 00:33:13.670 a bit of Google's JavaScript code in their website, whenever that Google 00:33:13.670 --> 00:33:16.070 code is pulled from Google's website, Google 00:33:16.070 --> 00:33:20.090 has an opportunity to plant this cookie on your computer. 00:33:20.090 --> 00:33:24.650 And you'll get a unique ID based on you visiting for the first time, 00:33:24.650 --> 00:33:28.580 based on the specific website that is embedding Google Analytics. 00:33:28.580 --> 00:33:31.340 And that cookie is going to live in your computer, 00:33:31.340 --> 00:33:35.150 according to this HTTP header, for as long as two years. 00:33:35.150 --> 00:33:36.518 Now, that's useful for Google. 00:33:36.518 --> 00:33:38.060 It's perhaps, useful for the website. 00:33:38.060 --> 00:33:41.240 It's perhaps, a little more invasive for me and you. 00:33:41.240 --> 00:33:43.550 Now, Google has many other cookies that they use too. 00:33:43.550 --> 00:33:46.312 But this is, perhaps, one that you should keep an eye out for. 00:33:46.312 --> 00:33:48.020 And indeed, in the coming weeks or months 00:33:48.020 --> 00:33:50.228 if you poke around some of your own browser settings, 00:33:50.228 --> 00:33:53.930 you might very well see values like this. 
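For anyone who wants to do out that math, here is a small Python sketch. The exact number of seconds shown on the slide is not captured in this transcript, so the figure below is simply what two 365-day years work out to, and the helper name `max_age_in_days` is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day

# Two 365-day years expressed in seconds, the unit a Max-Age attribute uses.
two_years_in_seconds = 2 * 365 * SECONDS_PER_DAY


def max_age_in_days(max_age_seconds: int) -> float:
    """Convert a cookie's Max-Age value from seconds into days."""
    return max_age_seconds / SECONDS_PER_DAY


lifetime_days = max_age_in_days(two_years_in_seconds)
```

So a Max-Age of 63,072,000 seconds corresponds to 730 days, in contrast to a session cookie, which carries no such attribute at all.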
00:33:53.930 --> 00:33:57.780 But what else might servers use to keep track of us, 00:33:57.780 --> 00:34:01.040 especially if you and I are in the habit of deleting our cookies 00:34:01.040 --> 00:34:05.552 or clearing your history, which would be counterproductive for Google 00:34:05.552 --> 00:34:07.760 or websites that are trying to track you in this way, 00:34:07.760 --> 00:34:12.358 but a plus for you and for my privacy if you're behaving in this way? 00:34:12.358 --> 00:34:14.150 But it turns out there's other ways servers 00:34:14.150 --> 00:34:18.560 can track us, including through HTTP parameters, tracking parameters. 00:34:18.560 --> 00:34:20.659 So parameters are the key value pairs that 00:34:20.659 --> 00:34:24.560 often appear in URLs that are sent via GET requests typically. 00:34:24.560 --> 00:34:25.699 So we've seen one of these. 00:34:25.699 --> 00:34:28.760 If you recall when we searched for cats on Google before, 00:34:28.760 --> 00:34:30.770 you might recall that the URL was something 00:34:30.770 --> 00:34:39.230 like https://www.google.com/search?q=cats. 00:34:39.230 --> 00:34:44.780 Anything after a question mark in a URL is, indeed, an HTTP parameter. 00:34:44.780 --> 00:34:48.380 But it could be used not for innocuous helpful purposes, like searching 00:34:48.380 --> 00:34:50.690 for cats, but also, to track you. 00:34:50.690 --> 00:34:52.880 And in fact, if you see ampersands in URLs, 00:34:52.880 --> 00:34:57.190 that might mean that you have a second, or a third, or more parameter up there. 00:34:57.190 --> 00:35:00.120 And sometimes the purpose of these parameters 00:35:00.120 --> 00:35:03.340 is simply to track you as some person. 00:35:03.340 --> 00:35:06.330 So for instance, here is a representative URL. 00:35:06.330 --> 00:35:07.320 It's a long one. 00:35:07.320 --> 00:35:14.500 And this is taken from example.com having a path of ad_engagement?.
00:35:14.500 --> 00:35:19.800 And then I'll highlight here click_id= and then this long seemingly random 00:35:19.800 --> 00:35:20.310 string. 00:35:20.310 --> 00:35:24.740 But there's a second HTTP parameter in this particular URL. 00:35:24.740 --> 00:35:27.670 &campaign_id=23. 00:35:27.670 --> 00:35:30.660 So the campaign ID, certainly with such a small number, 00:35:30.660 --> 00:35:32.070 is not meant to track you. 00:35:32.070 --> 00:35:36.470 That's meant to be sufficient input to the website to know what types of ads 00:35:36.470 --> 00:35:37.470 should be served to you. 00:35:37.470 --> 00:35:40.110 What campaign should be served up? 00:35:40.110 --> 00:35:44.760 But this click_id, which is sort of a euphemism for tracking cookie 00:35:44.760 --> 00:35:50.040 or tracking parameter in this case, is what's actually keeping track of you, 00:35:50.040 --> 00:35:51.990 specifically, because different users are 00:35:51.990 --> 00:35:54.390 going to find that whatever link they click on 00:35:54.390 --> 00:35:57.570 has a slightly different value for click_id. 00:35:57.570 --> 00:35:59.820 So recall that a tracking cookie is something 00:35:59.820 --> 00:36:01.410 that's sent via an HTTP header. 00:36:01.410 --> 00:36:03.243 And so it's harder for you and me to see it, 00:36:03.243 --> 00:36:05.202 unless we're more comfortable with our browsers 00:36:05.202 --> 00:36:07.110 and can poke around some underlying settings. 00:36:07.110 --> 00:36:11.310 But these tracking parameters are right there in front of you, at least, 00:36:11.310 --> 00:36:15.780 if you click on the URL in your browser and take a look at its entirety. 00:36:15.780 --> 00:36:20.310 Now, wonderfully, at least for us end users who are concerned about privacy, 00:36:20.310 --> 00:36:23.700 browsers and even third-party software are increasingly 00:36:23.700 --> 00:36:26.190 removing values like this for us. 
00:36:26.190 --> 00:36:28.500 As soon as the browser manufacturer or as soon 00:36:28.500 --> 00:36:31.560 as the third-party software developer knows that, wait a minute, 00:36:31.560 --> 00:36:36.930 click_id has no good purpose other than tracking our users, 00:36:36.930 --> 00:36:39.897 they can simply automatically remove it for you. 00:36:39.897 --> 00:36:41.730 After all, when you visit a web page and you 00:36:41.730 --> 00:36:45.030 get the HTML that represents that web page, 00:36:45.030 --> 00:36:47.190 the browser could certainly poke around there 00:36:47.190 --> 00:36:49.380 before you even have a chance to click on anything. 00:36:49.380 --> 00:36:53.530 And it could scrub or sanitize these kinds of tracking parameters. 00:36:53.530 --> 00:36:57.240 Now, to be fair, if the browser manufacturer doesn't necessarily 00:36:57.240 --> 00:37:00.090 know what the tracking parameter is called 00:37:00.090 --> 00:37:03.030 or if maybe the website is constantly changing the name 00:37:03.030 --> 00:37:07.390 or trying to mix things up, this might not work so well. 00:37:07.390 --> 00:37:09.360 But it's at least an attempt to try to put 00:37:09.360 --> 00:37:13.410 downward pressure on this very commonplace technique of keeping 00:37:13.410 --> 00:37:14.820 track of you and me. 00:37:14.820 --> 00:37:17.610 Now, why is this parameter able to track us? 00:37:17.610 --> 00:37:21.550 Well, this, too, can end up in those server logs because this would be, 00:37:21.550 --> 00:37:24.045 for instance, the web page that I am requesting, 00:37:24.045 --> 00:37:29.700 /ad_engagement?click_id= dot, dot, dot, that could very well be logged 00:37:29.700 --> 00:37:31.830 by the server, stored in a database, even. 00:37:31.830 --> 00:37:33.750 And they could use that information to know 00:37:33.750 --> 00:37:37.290 exactly which pages I have clicked on, because I visited those links, 00:37:37.290 --> 00:37:38.990 and even what ads I have seen.
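The scrubbing idea described above, where a browser or extension removes a parameter it knows is only used for tracking, might look roughly like this in Python. It is a sketch under stated assumptions: the block list contains only `click_id` from the lecture's example, the sample `click_id` value is made up, and real browsers maintain much longer lists with their own matching rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: a block list of parameter names known to exist only for tracking.
TRACKING_PARAMETERS = {"click_id"}


def strip_tracking(url: str) -> str:
    """Return the URL with any known tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMETERS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# The click_id value here is a placeholder, not the one from the slide.
cleaned = strip_tracking(
    "https://example.com/ad_engagement?click_id=abc123&campaign_id=23"
)
# A URL with only a functional parameter passes through unchanged.
untouched = strip_tracking("https://www.google.com/search?q=cats")
```

As noted in the lecture, this only works for names the software already knows about. If a site renames the parameter, a fixed list like `TRACKING_PARAMETERS` would miss it.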
00:37:38.990 --> 00:37:40.740 And maybe that's a good thing commercially 00:37:40.740 --> 00:37:43.590 because now they know what types of ads I'm clicking on. 00:37:43.590 --> 00:37:45.810 Now they can serve even more of them to me. 00:37:45.810 --> 00:37:48.490 And that might be great for them, but probably not so great, 00:37:48.490 --> 00:37:51.990 if not, annoying or invasive for me and you. 00:37:51.990 --> 00:37:54.810 So something else to keep an eye out for and something else that 00:37:54.810 --> 00:37:57.540 might guide your decision making in the days and the years 00:37:57.540 --> 00:38:00.060 to come when it comes to picking your browser. 00:38:00.060 --> 00:38:03.930 You don't necessarily have to nowadays use the one that comes with your phone, 00:38:03.930 --> 00:38:05.670 comes with your laptop or desktop. 00:38:05.670 --> 00:38:08.190 You can, if more comfortable, install something else. 00:38:08.190 --> 00:38:12.930 And increasingly, you and I are having more and more options. 00:38:12.930 --> 00:38:17.550 Questions now on these tracking parameters or anything 00:38:17.550 --> 00:38:20.010 prior with respect to our privacy? 00:38:20.010 --> 00:38:23.190 AUDIENCE: Are the cookies the ones that track or are 00:38:23.190 --> 00:38:25.020 the ones that are being tracked? 00:38:25.020 --> 00:38:29.160 DAVID J. MALAN: The cookies are values that are being used to track you. 00:38:29.160 --> 00:38:33.930 So recall that-- a metaphor for the cookies is like that virtual handstamp. 00:38:33.930 --> 00:38:38.820 And so if all of these web servers are putting ink on your hand and on my hand 00:38:38.820 --> 00:38:41.640 and because of HTTP, you and I, our browsers 00:38:41.640 --> 00:38:44.010 are in the habit of presenting these cookies, 00:38:44.010 --> 00:38:46.410 these handstamps to every website we visit, 00:38:46.410 --> 00:38:49.680 that value is being used to track us. 
00:38:49.680 --> 00:38:52.990 So cookies in and of themselves are just a technology. 00:38:52.990 --> 00:38:57.130 It's a very simple idea storing a big random value on your computer and mine 00:38:57.130 --> 00:38:58.750 just to uniquely identify us. 00:38:58.750 --> 00:39:02.590 They are necessary to give us features like logging into websites, 00:39:02.590 --> 00:39:04.060 maintaining shopping carts. 00:39:04.060 --> 00:39:06.910 But very quickly, especially since the internet from the get go 00:39:06.910 --> 00:39:09.430 has been largely free to use-- 00:39:09.430 --> 00:39:11.710 or rather, a lot of the internet has been 00:39:11.710 --> 00:39:14.620 free to use once you have a connection, at least-- 00:39:14.620 --> 00:39:17.950 they've been used, or in some views, abused 00:39:17.950 --> 00:39:22.070 by the advertisers, the Facebooks, and the others of the world. 00:39:22.070 --> 00:39:24.490 So another way to think about tracking cookies 00:39:24.490 --> 00:39:27.430 is to consider them to be third-party cookies because, indeed, 00:39:27.430 --> 00:39:29.920 even in the Google example, that's how they're being used. 00:39:29.920 --> 00:39:34.750 If a website like example.com is embedding Google Analytics, 00:39:34.750 --> 00:39:39.340 and therefore, some kind of HTML tag that mentions google.com, well then, 00:39:39.340 --> 00:39:42.940 example.com is the first-party in that story, so to speak. 00:39:42.940 --> 00:39:46.270 And google.com is the third party in that story. 00:39:46.270 --> 00:39:48.700 What that means is that your browser might get cookies 00:39:48.700 --> 00:39:51.400 from both example.com and google.com. 00:39:51.400 --> 00:39:54.890 But the most important ones, presumably, are the first-party ones 00:39:54.890 --> 00:39:58.430 from example.com because that is the website you chose to go to 00:39:58.430 --> 00:40:00.650 and whose functionality you want to use. 
00:40:00.650 --> 00:40:03.950 The third-party functionality, like tracking your clicks 00:40:03.950 --> 00:40:07.940 and your internet behavior on that site via Google, that's third party. 00:40:07.940 --> 00:40:10.340 And so very commonly do browsers nowadays 00:40:10.340 --> 00:40:15.230 certainly offer options via which you can disable third-party cookies. 00:40:15.230 --> 00:40:17.240 And that tends to be good for privacy's sake 00:40:17.240 --> 00:40:20.480 because it means you're blocking third parties like Google 00:40:20.480 --> 00:40:22.700 from keeping track of you via cookies. 00:40:22.700 --> 00:40:26.660 But, but, but that doesn't necessarily mean the website isn't still 00:40:26.660 --> 00:40:29.210 using tracking parameters in some way. 00:40:29.210 --> 00:40:32.600 And you would only know that by actually looking more closely at the URLs 00:40:32.600 --> 00:40:35.520 you're clicking on or that are embedded in the web page itself. 00:40:35.520 --> 00:40:38.360 And that's where now browsers and third-party software 00:40:38.360 --> 00:40:42.860 are additionally helping by helping us remove not only those cookies, but even 00:40:42.860 --> 00:40:43.850 those parameters. 00:40:43.850 --> 00:40:45.920 But let's consider a more concrete scenario 00:40:45.920 --> 00:40:50.540 of what third-party cookies are and why they allow companies 00:40:50.540 --> 00:40:53.510 not only like Google to track your behavior on one website, 00:40:53.510 --> 00:40:56.930 but even companies like Google or other advertisers 00:40:56.930 --> 00:40:59.840 to track your behavior on multiple websites. 00:40:59.840 --> 00:41:03.440 And in this sense, third parties have increasingly 00:41:03.440 --> 00:41:08.840 been more powerful, more omniscient, for instance, than the first-party websites 00:41:08.840 --> 00:41:10.700 that you and I are actually visiting. 00:41:10.700 --> 00:41:11.270 Why?
00:41:11.270 --> 00:41:13.520 Well, if there's a lot of popular third parties 00:41:13.520 --> 00:41:17.390 out there, Google being one of them for advertisements and for analytics, 00:41:17.390 --> 00:41:20.390 well, if lots of different websites are using them-- 00:41:20.390 --> 00:41:21.590 maybe Harvard's using them. 00:41:21.590 --> 00:41:22.400 Yale's using them. 00:41:22.400 --> 00:41:25.880 Stanford's using them-- then that third party very quickly 00:41:25.880 --> 00:41:30.530 becomes more powerful than even any of those individual parties alone. 00:41:30.530 --> 00:41:31.100 Why? 00:41:31.100 --> 00:41:36.830 Because that third party, if it is being embedded at Harvard, Yale, 00:41:36.830 --> 00:41:39.560 and Stanford, that third party Google, for instance, 00:41:39.560 --> 00:41:42.600 kind of has eyes into all three websites. 00:41:42.600 --> 00:41:46.340 And if it sends the same cookie to you on all three websites, 00:41:46.340 --> 00:41:49.670 Google might actually know that you're poking around Harvard's, and Yale's 00:41:49.670 --> 00:41:52.940 and Stanford's website when Harvard might have no idea you're 00:41:52.940 --> 00:41:54.380 checking out Yale and Stanford. 00:41:54.380 --> 00:41:57.890 And Stanford might have no idea you're checking out Yale and Harvard. 00:41:57.890 --> 00:41:59.660 So what does this mean concretely? 00:41:59.660 --> 00:42:03.837 Well, consider some HTML here, such as we've seen before. 00:42:03.837 --> 00:42:06.170 And I've highlighted a couple of salient characteristics 00:42:06.170 --> 00:42:07.610 in this particular example. 00:42:07.610 --> 00:42:11.030 Notice that I've given in this web page not only a body, which contains 00:42:11.030 --> 00:42:13.140 the body, the bulk of the web page. 00:42:13.140 --> 00:42:16.190 I've also included a head for the web page, inside of which 00:42:16.190 --> 00:42:17.810 is another tag called Title. 
00:42:17.810 --> 00:42:20.300 And I'm doing this just to, one, demonstrate 00:42:20.300 --> 00:42:23.000 there are more tags than we have seen in this language thus far. 00:42:23.000 --> 00:42:29.300 And specifically, this I claim is meant to represent harvard.edu's own website, 00:42:29.300 --> 00:42:31.790 the title of which would be Harvard, like in the tab 00:42:31.790 --> 00:42:33.050 along the top of the screen. 00:42:33.050 --> 00:42:37.140 And inside of the body of this page for simplicity, 00:42:37.140 --> 00:42:40.010 let's assume that for now, there's just one big advertisement. 00:42:40.010 --> 00:42:42.350 There's no content for the sake of discussion. 00:42:42.350 --> 00:42:44.180 There's just one advertisement. 00:42:44.180 --> 00:42:46.620 Well, where is that advertisement coming from? 00:42:46.620 --> 00:42:50.570 It's coming from, in this case, example.com, or our friends at Google, 00:42:50.570 --> 00:42:53.600 specifically, a file called ad.gif. 00:42:53.600 --> 00:42:58.700 And this particular URL is being used as the value of the source 00:42:58.700 --> 00:43:00.800 attribute of an image tag. 00:43:00.800 --> 00:43:02.100 So what do I mean by this? 00:43:02.100 --> 00:43:05.420 Well, if you visit harvard.edu in the story, what you are seeing 00:43:05.420 --> 00:43:07.940 is a big advertisement, a big GIF, a graphic 00:43:07.940 --> 00:43:11.840 that is coming from example.com. 00:43:11.840 --> 00:43:14.430 Now, what is the implication of that? 00:43:14.430 --> 00:43:17.130 Well, suppose that Yale is doing the same thing. 00:43:17.130 --> 00:43:20.270 So here now, for the sake of discussion, is the exact same HTML, 00:43:20.270 --> 00:43:22.100 except it lives at yale.edu. 00:43:22.100 --> 00:43:24.842 So the title of the page has now changed to Yale. 00:43:24.842 --> 00:43:27.050 And moreover, just to make things really interesting, 00:43:27.050 --> 00:43:28.610 let's add Stanford to the mix. 00:43:28.610 --> 00:43:30.240 Same exact page. 
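To make the page being described concrete, here is a minimal sketch in Python. The HTML string is a hypothetical reconstruction of the slide (a title in the head, one image in the body), and the names PAGE, ThirdPartyFinder, and third_party_hosts are invented for illustration. It simply lists any image whose host differs from the site you chose to visit.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical reconstruction of the page described in the lecture:
# harvard.edu's HTML, whose body contains one ad served by example.com.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>Harvard</title>
  </head>
  <body>
    <img src="https://example.com/ad.gif">
  </body>
</html>"""


class ThirdPartyFinder(HTMLParser):
    """Collects hosts of embedded images that differ from the first party."""

    def __init__(self, first_party):
        super().__init__()
        self.first_party = first_party
        self.third_parties = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        host = urlparse(src).hostname if src else None
        # Simplification: an exact host comparison, so subdomains of the
        # first party would also be reported here.
        if host and host != self.first_party:
            self.third_parties.append(host)


def third_party_hosts(html, first_party):
    """Return the hosts of images that the first party embeds from elsewhere."""
    finder = ThirdPartyFinder(first_party)
    finder.feed(html)
    return finder.third_parties


print(third_party_hosts(PAGE, "harvard.edu"))
```

Swapping the title for Yale or Stanford changes nothing about the result: the same third-party host appears on every one of those pages.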
00:43:30.240 --> 00:43:35.630 So the point of this story is that Harvard, and Yale, and Stanford are all 00:43:35.630 --> 00:43:39.590 using the same third party, example.com in this case 00:43:39.590 --> 00:43:42.500 or maybe someone like Google in the real world. 00:43:42.500 --> 00:43:45.050 And they're requesting moreover the same GIF. 00:43:45.050 --> 00:43:47.030 And so the same file is being accessed. 00:43:47.030 --> 00:43:49.370 But that even alone isn't a strict requirement. 00:43:49.370 --> 00:43:53.620 The same website is being accessed by all three of these first parties. 00:43:53.620 --> 00:43:54.810 So what does that mean? 00:43:54.810 --> 00:43:59.010 Suppose that you open up your browser and you first visit harvard.edu, 00:43:59.010 --> 00:44:02.940 your browser is going to download the HTML for Harvard's website. 00:44:02.940 --> 00:44:05.580 It's going to see that, oh, there's an image tag in there. 00:44:05.580 --> 00:44:09.900 And that image tag wants to show this ad.gif from example.com. 00:44:09.900 --> 00:44:12.210 So your browser is automatically, by nature 00:44:12.210 --> 00:44:14.520 of how browsers work, going to send a second HTTP 00:44:14.520 --> 00:44:20.280 request, this time requesting ad.gif from the host, example.com. 00:44:20.280 --> 00:44:22.710 And just to tie today's stories together, 00:44:22.710 --> 00:44:29.010 it's going to include, probably, a referrer, HTTP header that 00:44:29.010 --> 00:44:30.855 specifies where I'm coming from. 00:44:30.855 --> 00:44:32.730 And that's useful for our purposes because it 00:44:32.730 --> 00:44:34.770 puts these requests into context. 00:44:34.770 --> 00:44:40.110 Now that server, example.com, or Google, in the case of the real world, 00:44:40.110 --> 00:44:45.090 is going to probably respond with 200 OK like, OK, here is the advertisement. 00:44:45.090 --> 00:44:48.450 And it's going to include not only the image, but also an HTTP 00:44:48.450 --> 00:44:49.450 header of its own. 
00:44:49.450 --> 00:44:51.810 And this is our old friend set-cookie, where 00:44:51.810 --> 00:44:53.740 in this case, for the sake of discussion, 00:44:53.740 --> 00:44:57.460 I'm going to propose that it's setting a cookie on my computer called ID 00:44:57.460 --> 00:45:00.700 because this is going to be my unique identifier for example.com. 00:45:00.700 --> 00:45:04.930 Its value is going to be the one I keep using for discussion's sake, 1234abcd. 00:45:04.930 --> 00:45:07.450 But that would be some big random value for each of us. 00:45:07.450 --> 00:45:08.260 And my gosh. 00:45:08.260 --> 00:45:09.790 This thing is going to last a year. 00:45:09.790 --> 00:45:13.340 That's the number of seconds in 365 days. 00:45:13.340 --> 00:45:16.180 So this cookie is being planted on my computer 00:45:16.180 --> 00:45:21.130 by example.com because I visited harvard.edu. 00:45:21.130 --> 00:45:22.930 So Harvard is the first party. 00:45:22.930 --> 00:45:26.480 Example.com is the third party in this case. 00:45:26.480 --> 00:45:28.570 But here now is the concern. 00:45:28.570 --> 00:45:32.590 When I visit yale.edu with that same browser, 00:45:32.590 --> 00:45:36.100 my hand has been stamped by example.com already. 00:45:36.100 --> 00:45:40.090 And so what happens is that my browser now presents 00:45:40.090 --> 00:45:44.920 that handstamp to example.com, sending the same ID and the same value, 00:45:44.920 --> 00:45:46.360 that is, the same handstamp. 00:45:46.360 --> 00:45:48.850 The host is as before example.com. 00:45:48.850 --> 00:45:51.740 But this time, the referrer happens to be Yale. 
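The exchange just described can be sketched as a toy simulation in Python. Nothing here talks to a real network: Tracker and Browser are hypothetical classes, headers are plain dictionaries, and the random ID stands in for the 1234abcd value used in the lecture. The only real-world details assumed are the header names (Set-Cookie, Cookie, and Referer, which HTTP spells with a single "r") and the Max-Age attribute.

```python
import secrets

ONE_YEAR = 365 * 24 * 60 * 60  # seconds in 365 days: 31536000


class Tracker:
    """Toy stand-in for the third party (example.com in the story)."""

    def __init__(self):
        self.visits = {}  # cookie ID -> list of pages that embedded the ad

    def handle(self, request_headers):
        cookie = request_headers.get("Cookie", "")
        response_headers = {}
        if cookie.startswith("id="):
            # Returning visitor: the browser presented its handstamp.
            visitor = cookie[len("id="):]
        else:
            # New visitor: stamp their hand with a random ID for a year.
            visitor = secrets.token_hex(4)
            response_headers["Set-Cookie"] = f"id={visitor}; Max-Age={ONE_YEAR}"
        self.visits.setdefault(visitor, []).append(request_headers.get("Referer"))
        return response_headers


class Browser:
    """Toy browser that remembers one cookie per host."""

    def __init__(self):
        self.cookies = {}  # host -> "name=value"

    def load_ad(self, tracker, page_url):
        # The second request the browser makes after downloading page_url.
        headers = {"Host": "example.com", "Referer": page_url}
        if "example.com" in self.cookies:
            headers["Cookie"] = self.cookies["example.com"]
        response = tracker.handle(headers)
        if "Set-Cookie" in response:
            self.cookies["example.com"] = response["Set-Cookie"].split(";")[0]


tracker = Tracker()
browser = Browser()
for site in ["https://harvard.edu/", "https://yale.edu/", "https://stanford.edu/"]:
    browser.load_ad(tracker, site)

print(tracker.visits)
```

After the loop, the tracker holds a single ID whose list contains all three referring sites, even though none of those sites ever exchanged anything with one another.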
00:45:51.740 --> 00:45:55.120 So in other words, after I visited Harvard and my hand 00:45:55.120 --> 00:45:58.960 has been stamped with this tracking cookie, this third-party cookie 00:45:58.960 --> 00:46:02.290 from example.com, my browser, when I visit yale.edu, 00:46:02.290 --> 00:46:05.830 is going to present that same handstamp again, this time, 00:46:05.830 --> 00:46:08.320 to example.com with this referrer. 00:46:08.320 --> 00:46:11.170 The next time I use my browser to visit stanford.edu, 00:46:11.170 --> 00:46:13.630 the same message is going to be sent from my browser 00:46:13.630 --> 00:46:18.190 to example.com to request that same ad, this time now 00:46:18.190 --> 00:46:20.740 from stanford.edu's website. 00:46:20.740 --> 00:46:22.360 Now, what's the implication? 00:46:22.360 --> 00:46:26.710 Via these three HTTP requests, example.com 00:46:26.710 --> 00:46:30.160 knows that I'm visiting Stanford, and before that, Yale 00:46:30.160 --> 00:46:32.110 and before that, Harvard. 00:46:32.110 --> 00:46:34.990 And none of Harvard, or Yale, or Stanford 00:46:34.990 --> 00:46:37.630 necessarily know that I'm visiting any of those other websites. 00:46:37.630 --> 00:46:39.950 The third party is the more powerful. 00:46:39.950 --> 00:46:44.140 It's the more all seeing, simply because example.com, or in the real world, 00:46:44.140 --> 00:46:46.570 Google, is just so darn popular, that it's 00:46:46.570 --> 00:46:51.520 embedded in so many darn websites. Google and others know almost everything, 00:46:51.520 --> 00:46:54.040 I dare say, about what you and I are doing on the web 00:46:54.040 --> 00:46:57.710 because these ads are all over the place in this way. 00:46:57.710 --> 00:46:59.770 So we've seen a very simple example. 00:46:59.770 --> 00:47:04.720 But it's simple because cookies and HTTP really are relatively simple.
00:47:04.720 --> 00:47:07.030 It's once you realize how they work, that you 00:47:07.030 --> 00:47:09.460 can use them not only to solve compelling problems for all 00:47:09.460 --> 00:47:13.250 of us, sessions, and shopping carts, and the like, 00:47:13.250 --> 00:47:16.090 but they can also be used to monetize the internet 00:47:16.090 --> 00:47:19.750 and have been used historically to monetize the internet, or even worse, 00:47:19.750 --> 00:47:24.710 perhaps, for us, to track our individual clicks and behavior. 00:47:24.710 --> 00:47:27.610 So let me pause here and see if there's any questions now 00:47:27.610 --> 00:47:31.060 on third-party cookies and why, therefore, it's 00:47:31.060 --> 00:47:35.140 perhaps so compelling for you or me to opt in to disabling them, 00:47:35.140 --> 00:47:40.030 or better yet, to use browsers that are starting to block them for us. 00:47:40.030 --> 00:47:43.120 AUDIENCE: What browsers are more secure among others 00:47:43.120 --> 00:47:45.035 considering tracking parameters? 00:47:45.035 --> 00:47:45.910 DAVID J. MALAN: Sure. 00:47:45.910 --> 00:47:46.510 A quick tweak. 00:47:46.510 --> 00:47:48.260 I wouldn't say that some browsers are more 00:47:48.260 --> 00:47:50.350 secure than others in this context. 00:47:50.350 --> 00:47:53.620 I would say that you want browsers that are more privacy conscious 00:47:53.620 --> 00:47:56.830 or privacy preserving because that's what we're talking about today. 00:47:56.830 --> 00:47:59.080 Hopefully, all of them are just as secure when 00:47:59.080 --> 00:48:03.820 it comes to HTTPS and the encryption that's just keeping our data protected 00:48:03.820 --> 00:48:05.830 between points A and B. 00:48:05.830 --> 00:48:10.930 So generally, Safari has been pretty good when it comes to privacy.
00:48:10.930 --> 00:48:15.035 And they, whose browser you're using now, are the ones that very recently 00:48:15.035 --> 00:48:18.160 announced that they're going to start giving people the feature of removing 00:48:18.160 --> 00:48:20.380 tracking parameters from URLs. 00:48:20.380 --> 00:48:22.510 In fact, the sample URL I gave was actually 00:48:22.510 --> 00:48:25.990 from Apple's recent announcement about exactly that. 00:48:25.990 --> 00:48:29.920 DuckDuckGo is probably the most popular third-party browser 00:48:29.920 --> 00:48:32.440 that is very privacy conscious and tries to disable 00:48:32.440 --> 00:48:34.490 a lot of these tracking behaviors. 00:48:34.490 --> 00:48:36.580 Another one is Brave. 00:48:36.580 --> 00:48:39.820 The worst offender is probably Chrome, 00:48:39.820 --> 00:48:43.120 even though I, myself, am guilty of using it because it's 00:48:43.120 --> 00:48:45.190 so integrated into Google's ecosystem. 00:48:45.190 --> 00:48:47.140 But Google, of course, has made their business 00:48:47.140 --> 00:48:50.030 on monetizing your behavior and mine. 00:48:50.030 --> 00:48:54.000 So that is, perhaps, one to put toward the bottom of the list 00:48:54.000 --> 00:48:55.630 if you're concerned about this. 00:48:55.630 --> 00:48:57.380 So that's kind of how I would rank things. 00:48:57.380 --> 00:48:58.280 And there's yet others. 00:48:58.280 --> 00:48:59.870 But I think those are some of the most popular. 00:48:59.870 --> 00:49:01.970 And then, of course, in the Microsoft ecosystem, 00:49:01.970 --> 00:49:04.780 there is Edge, and Firefox too. 00:49:04.780 --> 00:49:06.530 I should have put them higher on the list. 00:49:06.530 --> 00:49:09.650 They are more privacy conscious, I do believe, than Google. 00:49:09.650 --> 00:49:12.690 So with all of these mechanisms for tracking in mind, 00:49:12.690 --> 00:49:15.350 what can we do to protect all the more of our privacy?
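One of the protections mentioned above, removing tracking parameters from URLs, can be sketched in a few lines of Python. This is only an illustration, not how any particular browser implements the feature. The parameter names in TRACKING_PREFIXES and TRACKING_NAMES are assumptions chosen as commonly seen examples, and the sample URL is made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking parameters; real blocklists are much longer.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url):
    """Return url with known tracking parameters removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/page?id=42&utm_source=newsletter&fbclid=abc"))
```

Parameters the page actually needs, like id in this made-up URL, are kept, which is why this kind of filtering relies on a list of known tracking names rather than deleting the whole query string.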
00:49:15.350 --> 00:49:18.350 Well, you might already know of this feature, private browsing. 00:49:18.350 --> 00:49:22.220 So you don't necessarily have to delete all of your browser history 00:49:22.220 --> 00:49:23.850 and delete all of your cookies. 00:49:23.850 --> 00:49:26.122 You can instead, on occasion, open up a special type 00:49:26.122 --> 00:49:27.830 of window, which most of today's browsers 00:49:27.830 --> 00:49:31.040 support that puts you into private mode or incognito mode. 00:49:31.040 --> 00:49:34.460 And you can think of this as giving you just a different chunk of memory 00:49:34.460 --> 00:49:38.360 in the computer that doesn't know any of your past browser history, that doesn't 00:49:38.360 --> 00:49:41.210 have any of your past cookies, that doesn't remember any 00:49:41.210 --> 00:49:42.920 of your past usernames and passwords. 00:49:42.920 --> 00:49:47.150 You're sort of starting fresh, so that everything you do in that window 00:49:47.150 --> 00:49:48.110 is brand new. 00:49:48.110 --> 00:49:50.180 The catch, though, is that everything you do in 00:49:50.180 --> 00:49:56.270 that window still works exactly as the web works as we have been describing. 00:49:56.270 --> 00:49:58.670 So you still might have tracking parameters. 00:49:58.670 --> 00:50:00.800 You still might have tracking cookies. 00:50:00.800 --> 00:50:02.840 You still might have server logs. 00:50:02.840 --> 00:50:07.740 But when you close that private window or you close that incognito mode, 00:50:07.740 --> 00:50:11.580 at least, the information is discarded from your computer, 00:50:11.580 --> 00:50:13.790 so that if tomorrow, you do the exact same thing 00:50:13.790 --> 00:50:16.790 and open up an incognito window again, then 00:50:16.790 --> 00:50:19.670 it's as though you're starting fresh with that server, 00:50:19.670 --> 00:50:23.030 except for the reality, as per our past discussion, 00:50:23.030 --> 00:50:26.150 that fingerprinting is still a possibility.
00:50:26.150 --> 00:50:29.840 Your IP address can still be factored in as can be other information 00:50:29.840 --> 00:50:31.610 that your browser might still be leaking. 00:50:31.610 --> 00:50:34.610 But what you're not doing is contaminating, so to speak, 00:50:34.610 --> 00:50:37.310 your general browsing history with specifically 00:50:37.310 --> 00:50:39.650 what you're using that window for. 00:50:39.650 --> 00:50:43.040 You should realize, too, that private browsing or incognito mode 00:50:43.040 --> 00:50:44.850 is entirely client side. 00:50:44.850 --> 00:50:47.300 So particularly, those logs that we have mentioned 00:50:47.300 --> 00:50:49.460 are still being stored by the server. 00:50:49.460 --> 00:50:52.910 They might be storing, perhaps, a different tracking cookie or parameter 00:50:52.910 --> 00:50:56.390 for you because it doesn't necessarily recognize you when 00:50:56.390 --> 00:50:58.340 you're in private or incognito mode. 00:50:58.340 --> 00:51:02.240 But it doesn't mean that your tracks are completely absent from the internet. 00:51:02.240 --> 00:51:05.420 Rather, it's really just scrubbing them from your local computer 00:51:05.420 --> 00:51:09.440 and decreasing the probability, but not eliminating the probability 00:51:09.440 --> 00:51:12.125 that a server still knows that it's you. 00:51:12.125 --> 00:51:14.160 So I would use it with care. 00:51:14.160 --> 00:51:17.240 But with that said, if you take a course in web development 00:51:17.240 --> 00:51:21.320 or you already design your own websites, using private browsing or incognito 00:51:21.320 --> 00:51:23.732 mode can also be useful for development purposes 00:51:23.732 --> 00:51:25.940 because it's a way of opening a brand new window that 00:51:25.940 --> 00:51:28.790 has no recollection of maybe past bugs that you had 00:51:28.790 --> 00:51:30.740 or past web pages that you clicked on.
00:51:30.740 --> 00:51:33.860 And it's very commonly used as part of development tools 00:51:33.860 --> 00:51:40.140 to actually facilitate and mimic the idea of starting fresh with some site. 00:51:40.140 --> 00:51:43.050 Super cookies, though, these sound delicious, 00:51:43.050 --> 00:51:46.870 but these, too, are kind of the worst of the cookies that we've discussed already. 00:51:46.870 --> 00:51:49.110 We saw session cookies for maintaining state. 00:51:49.110 --> 00:51:51.900 We saw tracking cookies for tracking you. 00:51:51.900 --> 00:51:54.780 Super cookies are not so super, really. 00:51:54.780 --> 00:51:57.660 These are cookies that are typically injected 00:51:57.660 --> 00:52:01.440 by a third party, like your company, your university, 00:52:01.440 --> 00:52:06.570 or your internet service provider into your HTTP request, which 00:52:06.570 --> 00:52:10.110 is to say, if you, from your browser, visit some website, 00:52:10.110 --> 00:52:13.620 that traffic, of course, goes from your laptop or phone 00:52:13.620 --> 00:52:17.670 through some internet service provider, whether it's on campus, or home, 00:52:17.670 --> 00:52:19.660 or wirelessly in the real world. 00:52:19.660 --> 00:52:22.830 And if whoever is providing you with that internet service 00:52:22.830 --> 00:52:26.430 can see the contents of that virtual envelope, 00:52:26.430 --> 00:52:28.290 there's technically nothing stopping them 00:52:28.290 --> 00:52:30.540 from opening up the envelope, so to speak, 00:52:30.540 --> 00:52:34.060 and adding one or more HTTP headers of their own. 00:52:34.060 --> 00:52:36.780 And so mobile phone carriers, for instance, in the past 00:52:36.780 --> 00:52:39.300 have been known to do this, whereby, if you are just 00:52:39.300 --> 00:52:43.390 requesting a website, like example.com from your phone, 00:52:43.390 --> 00:52:47.800 they might-- halfway between you and that server, 00:52:47.800 --> 00:52:50.410 they might inject a cookie of their own.
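As a rough sketch of that idea in Python, here is what adding a header to a request in transit could look like. The function inject_header and the header name X-Carrier-ID are hypothetical, chosen only for illustration, and the value reuses the placeholder 1234abcd from earlier in the lecture. The sketch works only because the request is readable, unencrypted text, which is exactly the condition described above.

```python
def inject_header(raw_request: bytes, name: str, value: str) -> bytes:
    """Add one header to a plaintext HTTP/1.1 request, as a middlebox could."""
    # Headers end at the first blank line; anything after it is the body.
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    injected = head + b"\r\n" + f"{name}: {value}".encode("ascii")
    return injected + sep + body


# What the phone's browser actually sent (no identifier of any kind).
request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"\r\n"
)

# What the server receives after the hypothetical carrier has touched it.
modified = inject_header(request, "X-Carrier-ID", "1234abcd")
print(modified.decode("ascii"))
```

Notice that the original request on the phone is unchanged, so nothing stored in the browser would reveal that the identifier was ever added.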
00:52:50.410 --> 00:52:53.800 For the sake of discussion, I'm going to use the same name and value as before. 00:52:53.800 --> 00:52:56.710 id=1234abcd. 00:52:56.710 --> 00:53:00.760 But what's noteworthy here is that that value is not coming from your phone. 00:53:00.760 --> 00:53:02.740 It is not coming from your browser. 00:53:02.740 --> 00:53:04.473 You can clear all of your cookies. 00:53:04.473 --> 00:53:05.890 You can clear all of your history. 00:53:05.890 --> 00:53:08.600 You can use incognito or private mode on your phone. 00:53:08.600 --> 00:53:11.290 You're not going to see any trace of that client side 00:53:11.290 --> 00:53:16.750 because the darn thing is being injected into your traffic between you, 00:53:16.750 --> 00:53:19.180 point A, and the server, point B. 00:53:19.180 --> 00:53:22.900 So this is sort of a canonical example of a machine in the middle attack. 00:53:22.900 --> 00:53:26.080 But your internet service provider in this telling of the story 00:53:26.080 --> 00:53:28.900 is doing it because they want to track you. 00:53:28.900 --> 00:53:31.810 Or they want-- because of advertising relationships 00:53:31.810 --> 00:53:33.790 they might have with some websites, they want 00:53:33.790 --> 00:53:37.070 to make sure that you can be tracked by that website, 00:53:37.070 --> 00:53:40.390 even if you have opted out or have been clearing proactively 00:53:40.390 --> 00:53:41.770 your very own cookies. 00:53:41.770 --> 00:53:45.040 So suffice it to say, these have been particularly controversial. 00:53:45.040 --> 00:53:48.310 And thankfully, you and I do have a pretty good defense here. 00:53:48.310 --> 00:53:51.790 Just never use HTTP without encryption. 00:53:51.790 --> 00:53:58.450 If URLs are always https:// and then something, theoretically, 00:53:58.450 --> 00:54:02.620 this attack or this "feature" of your mobile phone carrier should not be 00:54:02.620 --> 00:54:03.140 possible. 00:54:03.140 --> 00:54:03.640 Why? 
00:54:03.640 --> 00:54:06.130 Because if the contents of the envelope are encrypted, 00:54:06.130 --> 00:54:08.560 not only can't they see what's actually inside, 00:54:08.560 --> 00:54:12.040 they can't add anything to the mix because they don't have the key that's 00:54:12.040 --> 00:54:14.530 being used to encrypt that information. 00:54:14.530 --> 00:54:19.243 So simply using always HTTPS is one solution to this problem. 00:54:19.243 --> 00:54:21.910 And also, at least, in the US, some of the mobile phone carriers 00:54:21.910 --> 00:54:23.500 got a lot of backlash for this. 00:54:23.500 --> 00:54:28.870 But so, you can occasionally log into your cell phone provider's website, 00:54:28.870 --> 00:54:33.410 go through a bunch of menus, find an option to opt out of this feature. 00:54:33.410 --> 00:54:37.240 But I will say from experience, that they typically bury these options too. 00:54:37.240 --> 00:54:40.120 And so it's not necessarily even the easiest thing to find. 00:54:40.120 --> 00:54:45.070 But again, this is just a natural result of the underlying technology 00:54:45.070 --> 00:54:51.000 being used, or if you prefer, abused, for alternative purposes. 00:54:51.000 --> 00:54:51.500 All right. 00:54:51.500 --> 00:54:54.310 Let me pause here and see if there's any questions now 00:54:54.310 --> 00:54:59.380 on these super cookies, which indeed, are not so super or anything prior. 00:54:59.380 --> 00:55:02.560 AUDIENCE: Given that cookies store passwords and emails, 00:55:02.560 --> 00:55:07.540 can the adversary impersonate another person by copying that cookie 00:55:07.540 --> 00:55:13.237 and pasting it into his own computer and visiting that website? 00:55:13.237 --> 00:55:14.570 DAVID J. MALAN: A good question. 00:55:14.570 --> 00:55:19.990 So cookies can be used to store user names, email addresses, even passwords, 00:55:19.990 --> 00:55:22.270 though, I would generally not recommend doing this.
00:55:22.270 --> 00:55:24.882 But they theoretically should be secure, even 00:55:24.882 --> 00:55:26.590 if you're storing those values in cookies 00:55:26.590 --> 00:55:30.370 because they're going back and forth between the browser and the server 00:55:30.370 --> 00:55:34.120 using encryption if HTTPS is, indeed, in use. 00:55:34.120 --> 00:55:37.480 A danger, though, is that if someone has physical access to your computer, 00:55:37.480 --> 00:55:40.660 it's very easy to poke around your own browser's cookies, at which point, 00:55:40.660 --> 00:55:44.180 they're going to see your password, which is probably not a good thing. 00:55:44.180 --> 00:55:46.750 So an alternative would be, for instance, 00:55:46.750 --> 00:55:50.350 for a browser to encrypt the cookie or minimally digitally sign it, 00:55:50.350 --> 00:55:54.310 so that it can be identified as belonging to that same server. 00:55:54.310 --> 00:55:58.360 But even better, I dare say, would be for servers 00:55:58.360 --> 00:56:03.160 to only plant big random values as cookies on your computer, 00:56:03.160 --> 00:56:06.760 like this virtual handstamp, and then store 00:56:06.760 --> 00:56:11.020 recollection of your username, email, and/or password on the server. 00:56:11.020 --> 00:56:15.190 So stamp my hand to remember who I am and that I'm logged in, 00:56:15.190 --> 00:56:19.910 but don't bother expecting my browser to send my username, my email address, 00:56:19.910 --> 00:56:21.610 my password again and again. 00:56:21.610 --> 00:56:24.280 It should suffice to send that just once. 00:56:24.280 --> 00:56:25.555 Other questions here? 00:56:25.555 --> 00:56:28.040 AUDIENCE: I've heard that it's possible-- 00:56:28.040 --> 00:56:30.400 for example, if I'm writing a text to someone, 00:56:30.400 --> 00:56:35.270 it's possible to intercept, to alter my text and send it on my behalf.
00:56:35.270 --> 00:56:40.020 So it's going to be a different message, so it's possible to ask, maybe, 00:56:40.020 --> 00:56:41.710 for sensitive information. 00:56:41.710 --> 00:56:45.840 So I was wondering, don't those messengers use something like cookies? 00:56:45.840 --> 00:56:46.917 How can this be possible? 00:56:46.917 --> 00:56:48.250 DAVID J. MALAN: A good question. 00:56:48.250 --> 00:56:52.410 So SMS, or traditional texting, is generally insecure. 00:56:52.410 --> 00:56:56.080 It is very easy for someone to forge your phone number. 00:56:56.080 --> 00:56:58.710 And in fact, if you've gotten a lot of spam via text, 00:56:58.710 --> 00:57:01.020 that might be exactly what is happening. 00:57:01.020 --> 00:57:07.050 Or worse, it's also possible, recall, to steal your SIM card essentially or port 00:57:07.050 --> 00:57:11.110 it to another carrier, so that someone can intercept all of your actual texts. 00:57:11.110 --> 00:57:14.340 So in general, nowadays, you should be reducing, if not, 00:57:14.340 --> 00:57:18.570 eliminating your usage of SMS, at least, for anything important or anything 00:57:18.570 --> 00:57:19.740 you want to keep private. 00:57:19.740 --> 00:57:26.730 When it comes to other messaging tools, like iMessage, like WhatsApp, Signal, 00:57:26.730 --> 00:57:31.020 Telegram, there's a lot of products nowadays, third-party or otherwise, 00:57:31.020 --> 00:57:33.580 that use end-to-end encryption, which recall, 00:57:33.580 --> 00:57:35.200 we discussed a couple of classes ago. 00:57:35.200 --> 00:57:39.060 And in that case, even though the data is going through a company like 00:57:39.060 --> 00:57:42.940 Facebook, theoretically, assuming they're behaving honorably and have 00:57:42.940 --> 00:57:45.520 implemented end-to-end encryption properly, 00:57:45.520 --> 00:57:49.570 then even they cannot see the message going between their servers. 00:57:49.570 --> 00:57:51.190 And that is independent of cookies. 
00:57:51.190 --> 00:57:54.160 Cookies have no part of that solution. 00:57:54.160 --> 00:57:58.210 That solution is entirely thanks to cryptography and encryption 00:57:58.210 --> 00:58:01.230 with digital signatures. 00:58:01.230 --> 00:58:01.840 All right. 00:58:01.840 --> 00:58:04.958 So let's consider one other threat to your privacy 00:58:04.958 --> 00:58:06.750 that you might not necessarily have thought 00:58:06.750 --> 00:58:10.740 about that isn't related just to the web, but really, your use of the internet 00:58:10.740 --> 00:58:14.490 more generally, namely, DNS, the Domain Name System. 00:58:14.490 --> 00:58:18.510 Thankfully, even though computers on the internet all have IP addresses, 00:58:18.510 --> 00:58:21.510 these unique numeric addresses that we've discussed, 00:58:21.510 --> 00:58:25.350 you and I don't have to remember what server's IP addresses are 00:58:25.350 --> 00:58:28.830 because servers typically have domain names, something 00:58:28.830 --> 00:58:34.020 like harvard.edu, yale.edu, stanford.edu, google.com, amazon.com, 00:58:34.020 --> 00:58:35.160 and others. 00:58:35.160 --> 00:58:39.600 But how then-- when you type in any of those domain names into your browser 00:58:39.600 --> 00:58:41.880 or into any piece of software on the internet, 00:58:41.880 --> 00:58:47.580 how does your browser or your computer know what IP address to contact? 00:58:47.580 --> 00:58:50.850 Well, it turns out that there's a domain name system in the world. 00:58:50.850 --> 00:58:53.220 And this is a system deployed throughout the world 00:58:53.220 --> 00:58:57.870 on the internet whose purpose in life is to translate domain names to IP 00:58:57.870 --> 00:59:00.640 addresses, so that on the outside of those envelopes 00:59:00.640 --> 00:59:04.760 can, indeed, go the IP addresses of source and destination.
00:59:04.760 --> 00:59:07.840 But you and I, as humans, don't need to know or remember 00:59:07.840 --> 00:59:09.890 exactly what those IP addresses are. 00:59:09.890 --> 00:59:11.890 You can think about this back in the day of when 00:59:11.890 --> 00:59:13.848 we were in the habit of typing in phone numbers 00:59:13.848 --> 00:59:16.520 to actual analog landline telephones. 00:59:16.520 --> 00:59:19.368 It was actually pretty hard to remember lots of people's numbers. 00:59:19.368 --> 00:59:21.160 And you might even have had an address book 00:59:21.160 --> 00:59:23.110 that you looked up people's numbers in. 00:59:23.110 --> 00:59:24.640 Or there were certain mnemonics. 00:59:24.640 --> 00:59:27.880 For instance, in the United States, there was a number, 1-800-COLLECT, 00:59:27.880 --> 00:59:33.370 C-O-L-L-E-C-T, which was just much easier to remember than the actual 00:59:33.370 --> 00:59:35.770 numbers for making a collect call. 00:59:35.770 --> 00:59:38.560 The equivalent on the internet is DNS, which 00:59:38.560 --> 00:59:42.610 just automates this process for us, so that every website, every service 00:59:42.610 --> 00:59:46.480 can have its own unique name, but it's translated automatically 00:59:46.480 --> 00:59:52.030 for us via DNS servers throughout the world to the corresponding IP address. 00:59:52.030 --> 00:59:54.410 But why is this problematic? 00:59:54.410 --> 00:59:58.970 Well, it turns out that DNS servers are typically in a few different places. 00:59:58.970 --> 01:00:03.520 One, you probably have one in your home, or your company, or your university. 01:00:03.520 --> 01:00:07.690 And it probably is built into, if in your home, the router, the device 01:00:07.690 --> 01:00:10.120 that you're using just to connect to the internet. 01:00:10.120 --> 01:00:14.290 But your internet service provider also tends to have a DNS server. 
01:00:14.290 --> 01:00:16.750 And that DNS server probably knows about way 01:00:16.750 --> 01:00:20.950 more IP addresses than your own home does because why would 01:00:20.950 --> 01:00:24.030 your own home network know about all of the IP addresses in the world? 01:00:24.030 --> 01:00:26.530 But with that said, why would your internet service provider 01:00:26.530 --> 01:00:30.400 know about all of the possible IP addresses and domain names 01:00:30.400 --> 01:00:31.030 in the world? 01:00:31.030 --> 01:00:32.830 Well, suffice it to say for our purposes, 01:00:32.830 --> 01:00:34.580 there's a hierarchical system. 01:00:34.580 --> 01:00:36.910 So even if your home router doesn't know, 01:00:36.910 --> 01:00:39.370 even if your internet service provider doesn't know, 01:00:39.370 --> 01:00:42.580 there's some other server on the internet that can eventually 01:00:42.580 --> 01:00:46.870 give you the answer to a question like, what is harvard.edu's IP address? 01:00:46.870 --> 01:00:51.170 What is yale.edu's IP address and so forth? 01:00:51.170 --> 01:00:55.120 And for efficiency, once that answer has been figured out somewhere, 01:00:55.120 --> 01:00:59.620 then your internet service provider might remember, or cache, the answer. 01:00:59.620 --> 01:01:04.060 And even your home router, and heck, even your device or your browser 01:01:04.060 --> 01:01:06.580 might remember the same answer for efficiency, 01:01:06.580 --> 01:01:09.650 so we don't have to keep asking the same question. 01:01:09.650 --> 01:01:14.950 And it turns out by convention, DNS uses port 53, if you recall our discussion, 01:01:14.950 --> 01:01:21.610 of also using unique numbers to identify things like HTTP, or 80, HTTPS, or 443, 01:01:21.610 --> 01:01:23.830 or 22 for SSH. 01:01:23.830 --> 01:01:26.170 DNS tends to use 53. 
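The hierarchy and caching just described can be sketched as a small Python simulation. This is a simplified model, not real DNS: there are no network calls, no expiry times on cached answers, and only three layers. The Resolver class is hypothetical, and the IP addresses are placeholders from a range reserved for documentation, not the real addresses of these sites.

```python
class Resolver:
    """Toy DNS server that answers from its cache or asks the next server up."""

    def __init__(self, name, upstream=None, records=None):
        self.name = name
        self.upstream = upstream
        self.cache = dict(records or {})
        self.log = []  # every domain name this server was asked about

    def resolve(self, domain):
        self.log.append(domain)
        if domain not in self.cache:
            if self.upstream is None:
                raise LookupError(domain)
            # Ask the next server up, then remember the answer for next time.
            self.cache[domain] = self.upstream.resolve(domain)
        return self.cache[domain]


# Placeholder records; a server that already knows the answers sits at the top.
authoritative = Resolver(
    "authoritative",
    records={"harvard.edu": "192.0.2.10", "yale.edu": "192.0.2.20"},
)
isp = Resolver("isp", upstream=authoritative)
router = Resolver("home router", upstream=isp)

router.resolve("harvard.edu")  # router asks the ISP, which asks further up
router.resolve("harvard.edu")  # answered from the router's own cache
router.resolve("yale.edu")     # a new name, so the ISP is asked again

print(router.log)
print(isp.log)
```

The second lookup of harvard.edu never leaves the home router, which is the efficiency that caching buys, while each new name still travels up the chain.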
01:01:26.170 --> 01:01:29.830 But the catch is that the traffic used for DNS 01:01:29.830 --> 01:01:35.080 is typically unencrypted, which means that when your phone, or your laptop, 01:01:35.080 --> 01:01:38.830 or your desktop is asking your home device, or maybe your internet service 01:01:38.830 --> 01:01:42.940 provider, or someone else, what is the IP address for harvard.edu, 01:01:42.940 --> 01:01:47.260 or yale.edu, or the like, you're actually announcing to the world what 01:01:47.260 --> 01:01:49.210 website you are about to visit. 01:01:49.210 --> 01:01:49.750 Why? 01:01:49.750 --> 01:01:52.150 Because you're waiting for a response from the DNS 01:01:52.150 --> 01:01:55.850 server to actually tell you the corresponding IP address. 01:01:55.850 --> 01:01:56.950 So this isn't great. 01:01:56.950 --> 01:02:00.730 And moreover, your internet service provider, therefore, 01:02:00.730 --> 01:02:04.360 knows all of this information about you because every time 01:02:04.360 --> 01:02:07.570 you ask for a new website that you've never been to before, your home 01:02:07.570 --> 01:02:09.740 network probably doesn't know the IP address, 01:02:09.740 --> 01:02:12.275 so you have to ask your internet service provider. 01:02:12.275 --> 01:02:13.900 And again, they might ask someone else. 01:02:13.900 --> 01:02:17.240 But the internet service provider is going to know now that you asked. 01:02:17.240 --> 01:02:20.470 So your internet service provider, be it for your home network 01:02:20.470 --> 01:02:23.320 or for your cellular phone, pretty much knows 01:02:23.320 --> 01:02:26.110 every website you've ever been to, assuming 01:02:26.110 --> 01:02:28.600 they're logging this information, which they probably are, 01:02:28.600 --> 01:02:32.320 unless there are regulatory or legal requirements that say they can't or 01:02:32.320 --> 01:02:34.400 they can't for very long. 01:02:34.400 --> 01:02:37.340 Now, why is this the case? 
01:02:37.340 --> 01:02:40.690 Well, the domain name system essentially requires 01:02:40.690 --> 01:02:42.190 that we ask these very questions. 01:02:42.190 --> 01:02:45.070 And if the internet service providers remember these answers, 01:02:45.070 --> 01:02:48.640 well, they can keep track of everywhere we've been, at least, at a high level. 01:02:48.640 --> 01:02:53.020 DNS only gives them back a translation from the domain name to the IP address. 01:02:53.020 --> 01:02:57.340 What it does not include is the specific page that you're looking at, 01:02:57.340 --> 01:03:01.257 the specific URL, the folder, the file that you're looking at. 01:03:01.257 --> 01:03:03.090 So your internet service provider might know 01:03:03.090 --> 01:03:05.880 you're visiting somewhere on harvard.edu because you 01:03:05.880 --> 01:03:07.440 asked, of course, for its IP address. 01:03:07.440 --> 01:03:10.500 But they don't know what department you were looking for 01:03:10.500 --> 01:03:13.260 or what course you were looking at or the like. 01:03:13.260 --> 01:03:16.410 But there's still a decent amount of invasion, therefore, of your privacy 01:03:16.410 --> 01:03:21.120 if you'd rather that ISP or someone else just not know that information. 01:03:21.120 --> 01:03:25.170 So increasingly, there are alternatives to the standard DNS 01:03:25.170 --> 01:03:31.210 functionality, one of which is called DNS over HTTPS, or DoH for short. 01:03:31.210 --> 01:03:32.850 This means exactly that. 01:03:32.850 --> 01:03:37.230 Instead of just sending out DNS requests unencrypted on port 53 01:03:37.230 --> 01:03:41.520 to the local DNS server, now they're sent, potentially if you enable this, 01:03:41.520 --> 01:03:43.470 over HTTPS. 
01:03:43.470 --> 01:03:48.780 And what this means is that they will be sent using the HTTP protocol, which 01:03:48.780 --> 01:03:51.270 we've talked about endlessly in these virtual envelopes, 01:03:51.270 --> 01:03:56.190 but securely using TLS, which is the encryption protocol that ensures 01:03:56.190 --> 01:03:58.920 that no one else can see what's going on inside of that envelope, 01:03:58.920 --> 01:04:01.700 including your internet service provider. 01:04:01.700 --> 01:04:05.590 Now, someone is going to still know what domain name you're 01:04:05.590 --> 01:04:09.040 looking up because after all, to whom are you sending this request? 01:04:09.040 --> 01:04:10.990 Maybe you're sending it to Google. 01:04:10.990 --> 01:04:12.910 Maybe you're sending it to some third party. 01:04:12.910 --> 01:04:14.830 But you are sending it to someone. 01:04:14.830 --> 01:04:18.580 But at least, goes the thinking, it's not your internet service provider, 01:04:18.580 --> 01:04:21.095 who really doesn't need to know this information. 01:04:21.095 --> 01:04:22.720 So that's one way of thinking about it. 01:04:22.720 --> 01:04:24.095 And there's alternatives to this. 01:04:24.095 --> 01:04:27.760 There's actually something called DNS over TLS, DoT, which 01:04:27.760 --> 01:04:30.940 is very similar in spirit, but it doesn't even bother using HTTP. 01:04:30.940 --> 01:04:33.590 But it is still using encryption. 01:04:33.590 --> 01:04:35.740 So this is something that's increasingly common. 01:04:35.740 --> 01:04:38.380 It's not necessarily the default on a lot of systems. 01:04:38.380 --> 01:04:41.200 But it's yet another feature of today's technology 01:04:41.200 --> 01:04:45.040 that you can increasingly look for, seek out, enable proactively 01:04:45.040 --> 01:04:49.000 if this, too, is a concern that you don't necessarily want a third party, 01:04:49.000 --> 01:04:52.277 like your ISP to know what it is you're accessing. 
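To make DNS over HTTPS a bit more concrete, here is a minimal sketch of how a lookup could be packaged as an HTTPS request. It assumes the GET form described in RFC 8484, where the DNS question in its usual binary format is base64url-encoded into a `dns` parameter. The resolver address `https://doh.example/dns-query` is a placeholder, not a real service, and the helper names are invented for this example. Nothing is actually sent here; a real client would issue the request over TLS.

```python
import base64
import struct


def build_dns_query(domain):
    """Build a minimal DNS question (type A, class IN) in wire format."""
    # Header: ID 0, flags 0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def doh_get_url(resolver_url, domain):
    """Encode the question as base64url without padding and attach it to the URL."""
    encoded = base64.urlsafe_b64encode(build_dns_query(domain)).rstrip(b"=")
    return f"{resolver_url}?dns={encoded.decode('ascii')}"


url = doh_get_url("https://doh.example/dns-query", "harvard.edu")
print(url)
```

Because the whole URL travels inside the encrypted HTTPS connection, a network in between would see a connection to the resolver but not the name being asked about. The resolver itself, of course, still sees it.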
01:04:52.277 --> 01:04:53.860 And it might not even be your own ISP. 01:04:53.860 --> 01:04:58.090 If you're on the road, in a coffee shop that gives Wi-Fi, 01:04:58.090 --> 01:05:00.760 or an airport that gives Wi-Fi, at that point, 01:05:00.760 --> 01:05:02.860 your internet service provider is effectively 01:05:02.860 --> 01:05:04.810 that coffee shop or that airport. 01:05:04.810 --> 01:05:07.570 And do you really want them knowing everywhere you're going? 01:05:07.570 --> 01:05:10.870 You might be, depending on your comfort level, prefer-- 01:05:10.870 --> 01:05:15.550 you might be preferring that, at least, all of your DNS requests 01:05:15.550 --> 01:05:19.940 go to some other central party that you do trust for whatever reason, 01:05:19.940 --> 01:05:23.140 so you're not just informing every different Wi-Fi hotspot that you 01:05:23.140 --> 01:05:25.630 might be using around the world. 01:05:25.630 --> 01:05:27.880 Let me pause here and see if there's any questions now 01:05:27.880 --> 01:05:33.910 about DNS and this concern with respect to your privacy or these solutions 01:05:33.910 --> 01:05:34.960 thereto. 01:05:34.960 --> 01:05:41.410 AUDIENCE: Can DNS [INAUDIBLE] used to deceive users and steal 01:05:41.410 --> 01:05:42.865 information, which is sensitive? 01:05:42.865 --> 01:05:43.990 DAVID J. MALAN: Absolutely. 01:05:43.990 --> 01:05:47.200 So DNS, itself, can also be used for evil purposes. 01:05:47.200 --> 01:05:51.265 If you control the DNS server, you don't have to give an honest answer. 01:05:51.265 --> 01:05:54.460 If someone asks you for the IP address of harvard.edu, 01:05:54.460 --> 01:05:58.600 you could give them the IP address of some completely malicious server 01:05:58.600 --> 01:05:59.800 that you control. 
01:05:59.800 --> 01:06:05.350 However, if the user, like Ryan, in this case, is using HTTPS, 01:06:05.350 --> 01:06:10.150 the whole point of HTTPS is to encrypt the data between browser and server.
01:06:10.150 --> 01:06:11.980 And presumably, the browser is going to try 01:06:11.980 --> 01:06:18.320 to request the TLS certificate of harvard.edu in this case. 01:06:18.320 --> 01:06:22.360 But if the IP address returns the wrong certificate 01:06:22.360 --> 01:06:26.470 that wasn't signed by the right website, then the connection might fail. 01:06:26.470 --> 01:06:29.800 And you'll be given a warning that you can typically ignore in your browser. 01:06:29.800 --> 01:06:33.010 But this should be preventable because you should at least be warned 01:06:33.010 --> 01:06:34.870 that that is not working correctly. 01:06:34.870 --> 01:06:37.180 And ISPs actually do this quite often. 01:06:37.180 --> 01:06:41.260 If you make a typographical error sometimes on home networks, or coffee 01:06:41.260 --> 01:06:45.490 shops, or airports, you might actually still see a website of search results, 01:06:45.490 --> 01:06:46.990 or worse, advertisements. 01:06:46.990 --> 01:06:50.590 And that's because even if you made a typo in the domain name, the coffee 01:06:50.590 --> 01:06:52.840 shop's or the airport's DNS server is still 01:06:52.840 --> 01:06:55.360 going to return to you an IP address of their server, 01:06:55.360 --> 01:06:59.210 so they can at least push some content at you. 01:06:59.210 --> 01:07:02.710 So let's consider some of the mechanisms via which 01:07:02.710 --> 01:07:06.478 we can push back on some of these more invasive privacy practices. 01:07:06.478 --> 01:07:08.770 And one is something we've talked about before, namely, 01:07:08.770 --> 01:07:13.930 a virtual private network, or VPN, which is a increasingly familiar technology. 01:07:13.930 --> 01:07:16.450 But it's worth knowing exactly what problems it is 01:07:16.450 --> 01:07:18.730 solving for you and exactly which problems 01:07:18.730 --> 01:07:21.820 it is not, particularly, if you're using such a service 01:07:21.820 --> 01:07:23.890 to protect your own privacy. 
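Returning briefly to the dishonest DNS server from the question above, here is a minimal sketch of why HTTPS helps in that scenario. It is a deliberately simplified model: the server table, the function names, and the IP addresses are all invented for illustration, and a real browser checks much more than the name on the certificate, including whether the certificate was signed by an authority it trusts.

```python
# Toy model: what a browser-like client does when DNS lies to it.
# Addresses are from documentation ranges; nothing here uses the network.
SERVERS = {
    "192.0.2.10": {"cert_name": "harvard.edu"},     # the genuine server
    "203.0.113.66": {"cert_name": "evil.example"},  # attacker-controlled server
}


def honest_dns(domain):
    return {"harvard.edu": "192.0.2.10"}[domain]


def dishonest_dns(domain):
    # Whoever controls the DNS server can answer with any address they like.
    return "203.0.113.66"


def connect_https(domain, dns):
    ip = dns(domain)
    cert_name = SERVERS[ip]["cert_name"]
    if cert_name != domain:
        # The attacker cannot present a valid certificate for harvard.edu.
        return f"WARNING: certificate is for {cert_name}, not {domain}"
    return f"OK: connected to {domain} at {ip}"


print(connect_https("harvard.edu", honest_dns))
print(connect_https("harvard.edu", dishonest_dns))
```

The second call ends in a warning rather than a silent connection, which is the protection described above, so long as the user does not click through it.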
01:07:23.890 --> 01:07:25.220 Well, what is a VPN? 01:07:25.220 --> 01:07:28.030 It allows us, recall, to connect from point A 01:07:28.030 --> 01:07:32.830 to another point B using a completely encrypted tunnel. 01:07:32.830 --> 01:07:35.740 So it doesn't matter if there are machines in the middle, 01:07:35.740 --> 01:07:37.450 as indeed, there will be on the internet. 01:07:37.450 --> 01:07:42.520 All of the traffic between A and B on a VPN is encrypted or scrambled. 01:07:42.520 --> 01:07:43.850 So what does this do? 01:07:43.850 --> 01:07:47.320 This allows you to access sometimes a corporate network or a university 01:07:47.320 --> 01:07:49.750 network that might have servers or services that 01:07:49.750 --> 01:07:53.080 are only accessible if you are on physically 01:07:53.080 --> 01:07:56.560 or if you are on virtually that particular network. 01:07:56.560 --> 01:08:01.010 This ensures that even if you're at home, or in a cafe, or an airport, 01:08:01.010 --> 01:08:03.770 at least, you have an encrypted, more secure connection 01:08:03.770 --> 01:08:07.640 to the campus or the corporate network, at which point, the campus or company 01:08:07.640 --> 01:08:11.262 might be more comfortable with you accessing those services. 01:08:11.262 --> 01:08:13.220 Now, this does not prevent you still from being 01:08:13.220 --> 01:08:17.526 hacked because if you're running malware on your own computer accidentally, 01:08:17.526 --> 01:08:20.359 it doesn't matter if you have an encrypted connection to the company 01:08:20.359 --> 01:08:21.170 or campus. 01:08:21.170 --> 01:08:25.250 You might very well have an infected connection now to the company or campus 01:08:25.250 --> 01:08:27.200 if you, yourselves, are infected. 01:08:27.200 --> 01:08:30.680 VPNs can also be used to create the illusion that you're actually 01:08:30.680 --> 01:08:32.779 in one country and not another. 01:08:32.779 --> 01:08:33.319 Why? 
01:08:33.319 --> 01:08:38.000 Well, if point A is where you are and point B is somewhere abroad, well, 01:08:38.000 --> 01:08:40.340 to the rest of the world, if you start using 01:08:40.340 --> 01:08:44.240 this VPN, this virtual private network, you 01:08:44.240 --> 01:08:48.380 will appear to have an IP address that is in that foreign country 01:08:48.380 --> 01:08:52.460 because all of your internet traffic for chatting, video conferencing, 01:08:52.460 --> 01:08:55.880 the web will be sent through that VPN by design. 01:08:55.880 --> 01:08:57.290 That's what a VPN is for. 01:08:57.290 --> 01:09:00.149 And it will come out the other end in that foreign country 01:09:00.149 --> 01:09:06.210 and then continue on its way to the chat service, the email service, the web 01:09:06.210 --> 01:09:07.990 service, or the like. 01:09:07.990 --> 01:09:10.529 So each of those services will think that you 01:09:10.529 --> 01:09:15.149 live or are physically in that foreign country, even if you are not actually. 01:09:15.149 --> 01:09:17.040 So what's the implication of this? 01:09:17.040 --> 01:09:19.950 A virtual private network only guarantees 01:09:19.950 --> 01:09:23.430 that the connection between you and that point B is encrypted. 01:09:23.430 --> 01:09:26.130 It doesn't necessarily mean that once you're out of that VPN, 01:09:26.130 --> 01:09:30.149 it's going to stay encrypted, especially if you're using still HTTP 01:09:30.149 --> 01:09:31.529 and not HTTPS. 01:09:31.529 --> 01:09:35.040 But it does, at least, encrypt everything between points A and B. 
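One way to summarize what a VPN does and does not change is a small table of who sees what. The sketch below is only a toy model under simplifying assumptions: the function and key names are made up, the addresses are placeholders from documentation ranges, and "local network" stands for whoever provides your connection, such as a home ISP or a coffee shop.

```python
def what_each_party_sees(client_ip, site, scheme, vpn_exit_ip=None):
    """Toy summary of visibility with and without a VPN (hypothetical model)."""
    using_vpn = vpn_exit_ip is not None
    return {
        # With a VPN, the local network only sees traffic to the tunnel's far end.
        "local network sees you talking to": vpn_exit_ip if using_vpn else site,
        # The website sees whichever address the request finally arrives from.
        "website sees requests coming from": vpn_exit_ip if using_vpn else client_ip,
        # Past the tunnel's exit, only HTTPS keeps the contents encrypted.
        "encrypted all the way to the website": scheme == "https",
    }


without_vpn = what_each_party_sees("198.51.100.7", "harvard.edu", "https")
with_vpn = what_each_party_sees("198.51.100.7", "harvard.edu", "http",
                                vpn_exit_ip="203.0.113.5")
print(without_vpn)
print(with_vpn)
```

In the second case the website sees the VPN's address instead of yours, but because the request uses plain HTTP, it is no longer encrypted once it leaves the tunnel.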
01:09:35.040 --> 01:09:39.158 It also does change what your IP address appears to be, 01:09:39.158 --> 01:09:41.700 so that you will, indeed, appear to have an IP address that's 01:09:41.700 --> 01:09:45.189 from that foreign country and not your domestic IP address, 01:09:45.189 --> 01:09:47.670 which might have some value in covering your tracks 01:09:47.670 --> 01:09:50.340 or decreasing the probability that you'll be identified. 01:09:50.340 --> 01:09:53.279 But again, we've seen so many other mechanisms today, 01:09:53.279 --> 01:09:55.620 whereby, your browser can be fingerprinted 01:09:55.620 --> 01:09:58.020 in the context of the web, that someone might still 01:09:58.020 --> 01:10:01.440 be able to realize that, OK, your IP is different today, 01:10:01.440 --> 01:10:04.560 but this still looks like you, even if they don't necessarily 01:10:04.560 --> 01:10:07.990 know that you are David Malan or you, yourself. 01:10:07.990 --> 01:10:11.940 But it, at least, does solve at least one problem, which is encrypting end 01:10:11.940 --> 01:10:14.430 to end all of your traffic. 01:10:14.430 --> 01:10:16.380 Well, there's another piece of software that's 01:10:16.380 --> 01:10:19.900 been popular for some time called Tor, The Onion Router. 01:10:19.900 --> 01:10:24.030 So this is a piece of software that you can install on your own Mac, or PC, 01:10:24.030 --> 01:10:25.230 or other device. 01:10:25.230 --> 01:10:30.210 And this uses encryption to solve the problem a different way using 01:10:30.210 --> 01:10:35.100 additional encryption to try to give you a higher probability of privacy. 01:10:35.100 --> 01:10:38.070 And here's a picture that Tor, themselves, puts on their website. 
01:10:38.070 --> 01:10:41.310 And it has depicting here you on a very old school 01:10:41.310 --> 01:10:44.910 PC connecting to a whole bunch of nodes inside 01:10:44.910 --> 01:10:47.520 of this Tor network connected to ultimately maybe 01:10:47.520 --> 01:10:49.230 the websites that you're visiting. 01:10:49.230 --> 01:10:54.900 And what happens here is that when your computer is running the Tor software, 01:10:54.900 --> 01:10:57.990 the Tor software first figures out, OK, who else in the world 01:10:57.990 --> 01:10:59.730 is using the Tor software? 01:10:59.730 --> 01:11:03.510 Because it's going to use those other computers to route your traffic, 01:11:03.510 --> 01:11:07.050 up, down, left, and right and kind of like the movies or TV, 01:11:07.050 --> 01:11:11.508 where you see a map of the world and the traffic is bouncing back and forth 01:11:11.508 --> 01:11:12.300 and back and forth. 01:11:12.300 --> 01:11:14.400 That's kind of the spirit of Tor. 01:11:14.400 --> 01:11:17.250 And what happens is if your computer here 01:11:17.250 --> 01:11:23.010 wants to send a request to a website that's maybe over here 01:11:23.010 --> 01:11:25.530 and it decides, for instance, to route it through one, 01:11:25.530 --> 01:11:29.610 two, three computers, what the Tor software will do 01:11:29.610 --> 01:11:33.240 is encrypt the request at least three different times. 01:11:33.240 --> 01:11:36.600 Whatever web request you are sending, whatever email you 01:11:36.600 --> 01:11:39.630 are sending, whatever chat message you're sending, whatever service you 01:11:39.630 --> 01:11:43.110 are using between point A on the left and point B on the right 01:11:43.110 --> 01:11:47.310 is going to be encrypted with this node's public key, 01:11:47.310 --> 01:11:51.600 with this node's public key, with this node's public key. 01:11:51.600 --> 01:11:54.930 And so here's the onion in Tor, The Onion Router. 
01:11:54.930 --> 01:11:59.140 You are encrypting layer, upon layer, upon layer of data, 01:11:59.140 --> 01:12:03.750 so that mathematically, recall per our discussion of public key cryptography, 01:12:03.750 --> 01:12:07.230 only this node can peel off one layer, only 01:12:07.230 --> 01:12:11.220 this node can peel off one layer, only this node can peel off one layer using 01:12:11.220 --> 01:12:15.810 their own respective private keys, which undoes the effect of your 01:12:15.810 --> 01:12:17.710 having encrypted your traffic. 01:12:17.710 --> 01:12:19.920 So what you're really doing here by choosing, 01:12:19.920 --> 01:12:23.730 perhaps, a different path, every request, a different path every day 01:12:23.730 --> 01:12:27.570 is you are with Tor effectively covering your tracks in some sense. 01:12:27.570 --> 01:12:32.290 And by design, the Tor software doesn't remember much information at all, 01:12:32.290 --> 01:12:35.580 so it doesn't have the sorts of logs that I propose can be worrisome, 01:12:35.580 --> 01:12:37.260 at least, in the context of web servers. 01:12:37.260 --> 01:12:41.710 By design, Tor is meant to preserve your privacy with higher probability. 01:12:41.710 --> 01:12:45.060 And so by design, it just doesn't keep nearly as much information around. 01:12:45.060 --> 01:12:48.850 Now, this isn't to say that if you're doing this for malicious purposes, 01:12:48.850 --> 01:12:50.910 trying to evade the authorities, this isn't 01:12:50.910 --> 01:12:53.700 to say that this computer, this computer, this computer 01:12:53.700 --> 01:12:57.000 couldn't be subpoenaed, so to speak, by some government entity 01:12:57.000 --> 01:13:00.910 and they could reconstruct the path that your data took. 01:13:00.910 --> 01:13:03.790 But the point is that it's generally quite laborious. 01:13:03.790 --> 01:13:07.720 By this time, all of that data has disappeared from those interior nodes. 
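The layering just described can be sketched in a few lines of Python. This is a toy under loud assumptions: instead of public key cryptography, it uses a simple, insecure symmetric cipher built from SHA-256 as a stand-in, and the node names, keys, and function names are all invented for the example. It is not how the real Tor software is implemented. It only shows the shape of the idea, where each node can remove exactly one layer and learns only where to forward the rest.

```python
import hashlib


def keystream_xor(key, data):
    """Toy cipher (NOT secure): XOR data with a keystream derived from the key.
    Applying it twice with the same key restores the original bytes."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def wrap(message, path):
    """Wrap the message in one layer per node, innermost layer for the last node."""
    next_hops = [name for name, _ in path[1:]] + ["destination"]
    for (_, key), next_hop in zip(reversed(path), reversed(next_hops)):
        message = keystream_xor(key, next_hop.encode("ascii") + b"|" + message)
    return message


def peel(key, packet):
    """Remove one layer: returns where to forward next and the remaining packet."""
    plain = keystream_xor(key, packet)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode("ascii"), inner


path = [("node1", b"key-1"), ("node2", b"key-2"), ("node3", b"key-3")]
request = b"GET / HTTP/1.1"
onion = wrap(request, path)

packet = onion
hops = []
for _, key in path:
    next_hop, packet = peel(key, packet)
    hops.append(next_hop)

print(hops)    # the route, learned one hop at a time
print(packet)  # the original request, visible only after the last layer
```

Each node in this sketch sees only the name of the next hop and an opaque blob, which is the sense in which the request is layered like an onion.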
01:13:07.720 --> 01:13:09.820 And so they don't have much information to share. 01:13:09.820 --> 01:13:14.740 And so increasingly, it does provide you with some higher probability of privacy 01:13:14.740 --> 01:13:18.520 by layering your requests with encryption, encryption, encryption 01:13:18.520 --> 01:13:21.220 and sort of trusting that these interior nodes are 01:13:21.220 --> 01:13:24.895 going to relay it to the final endpoint, so something 01:13:24.895 --> 01:13:26.020 to consider if of interest. 01:13:26.020 --> 01:13:30.400 But realize, too, that because of how the internet works with IP addresses, 01:13:30.400 --> 01:13:32.800 because of how the internet works with port numbers, 01:13:32.800 --> 01:13:38.360 it's still possible on a network to know who is using Tor, for instance. 01:13:38.360 --> 01:13:43.360 So if you happen to be the only person at home, the only person on a company 01:13:43.360 --> 01:13:46.990 or on a university network who's using Tor at the moment, 01:13:46.990 --> 01:13:50.230 it's being used for malicious purposes, odds are, 01:13:50.230 --> 01:13:53.840 you could be targeted as the source of that attack. 01:13:53.840 --> 01:13:58.150 And so realize, in particular, that this just raises the bar to detection. 01:13:58.150 --> 01:14:01.320 It raises the bar to your privacy being invaded. 01:14:01.320 --> 01:14:05.870 But it does not, as do none of the technologies we've discussed, 01:14:05.870 --> 01:14:10.120 give you an absolute protection of these same properties. 01:14:10.120 --> 01:14:13.740 So there's one final mechanism when it comes to preserving one's privacy 01:14:13.740 --> 01:14:16.230 that's thankfully increasingly available to us 01:14:16.230 --> 01:14:19.800 on devices, on desktops and laptops, and especially, on phones. 01:14:19.800 --> 01:14:23.190 And that's this notion of permissions, which isn't anything new. 
01:14:23.190 --> 01:14:26.910 But as iOS, and Android, and other operating systems 01:14:26.910 --> 01:14:31.440 have evolved, increasingly, you and I are being asked by our operating 01:14:31.440 --> 01:14:34.110 systems, do you want to allow this? 01:14:34.110 --> 01:14:36.840 Not only do you want to allow this program to run, 01:14:36.840 --> 01:14:40.980 but do you want to allow this program to access your camera, for instance? 01:14:40.980 --> 01:14:44.040 Do you want this program to access your microphone, for instance? 01:14:44.040 --> 01:14:48.940 Do you want this application to access your contacts, for instance? 01:14:48.940 --> 01:14:53.310 So on the one hand, we're being given much more fine-grained control, 01:14:53.310 --> 01:14:55.830 which is a good thing, presumably. 01:14:55.830 --> 01:15:00.240 At the same time, it's also just pushing the decision onto you and me. 01:15:00.240 --> 01:15:03.450 And very often, with these applications, as you've probably found, well, 01:15:03.450 --> 01:15:07.140 if you don't enable the camera and give access to the app, 01:15:07.140 --> 01:15:10.840 it just might not work because they have some code in their application 01:15:10.840 --> 01:15:14.870 that says if camera's not on, then do not do anything useful. 01:15:14.870 --> 01:15:19.030 So there's this tension between usability and privacy in this case. 01:15:19.030 --> 01:15:21.460 But thankfully, there's finer-grained controls too. 01:15:21.460 --> 01:15:23.570 On iOS, for instance, you might be prompted, 01:15:23.570 --> 01:15:26.290 do you want to give this app access to this feature 01:15:26.290 --> 01:15:30.722 always, or only while using the application, or never?
01:15:30.722 --> 01:15:32.680 And that's certainly a good thing for something 01:15:32.680 --> 01:15:34.840 like the camera or the microphone, where it 01:15:34.840 --> 01:15:37.090 would be nice to trust that when you close the app 01:15:37.090 --> 01:15:39.730 and put your phone in your pocket, that it's not still 01:15:39.730 --> 01:15:43.440 listening to or trying to watch you from this built-in hardware. 01:15:43.440 --> 01:15:45.190 Now, there is some feature that might need 01:15:45.190 --> 01:15:48.940 to run all of the time, which includes location-based services, which 01:15:48.940 --> 01:15:51.340 is to say that our phones, especially nowadays, 01:15:51.340 --> 01:15:54.880 can pretty effectively track our location using GPS, 01:15:54.880 --> 01:15:57.100 or Wi-Fi, or some other technology. 01:15:57.100 --> 01:16:00.910 Now, that's of course, useful, if not, necessary for using mapping 01:16:00.910 --> 01:16:03.850 applications, like Maps, or Google Maps, or the like that 01:16:03.850 --> 01:16:08.110 help us get physically from point A to point B. But very commonly, 01:16:08.110 --> 01:16:10.360 these applications, at least, by default, 01:16:10.360 --> 01:16:13.960 ask for access to your geographic location 01:16:13.960 --> 01:16:17.260 always, which means just by walking down the street, 01:16:17.260 --> 01:16:19.790 even if you're not following a map on your phone, 01:16:19.790 --> 01:16:22.870 means that the app can still be tracking where you're going. 01:16:22.870 --> 01:16:25.360 And certainly, among the Googles and the Apples 01:16:25.360 --> 01:16:27.700 of the world nowadays or other manufacturers, 01:16:27.700 --> 01:16:30.640 they certainly know pretty much everywhere you 01:16:30.640 --> 01:16:35.600 and I are going if we leave these location-based services on by default. 
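The difference between "always", "only while using the application", and "never" can be sketched as a simple check. This is a hypothetical model, not the actual iOS or Android API: the `Permission` enum and the `may_access` function are invented for illustration.

```python
from enum import Enum


class Permission(Enum):
    ALWAYS = "always"
    WHILE_USING = "only while using the app"
    NEVER = "never"


def may_access(permission, app_in_foreground):
    """Decide whether a feature such as location or the camera may be used right now."""
    if permission is Permission.ALWAYS:
        return True
    if permission is Permission.WHILE_USING:
        return app_in_foreground  # allowed only while the app is open
    return False


# Phone in your pocket, mapping app closed:
for permission in Permission:
    print(permission.value, "->", may_access(permission, app_in_foreground=False))
```

Under this model, only "always" lets an app keep reading your location once it is closed, which is why that choice deserves a moment's thought.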
01:16:35.600 --> 01:16:40.900 So this is an example of something of which you should be mindful if only 01:16:40.900 --> 01:16:43.900 because here is yet another example of information 01:16:43.900 --> 01:16:48.130 that logically, when you think about it, OK, obviously, that makes sense. 01:16:48.130 --> 01:16:51.280 They must be keeping track of my location, otherwise, 01:16:51.280 --> 01:16:54.250 how could they provide me with mapping services? 01:16:54.250 --> 01:16:58.030 But pause and think now, perhaps, exactly what the implications 01:16:58.030 --> 01:17:02.260 are for you, for your privacy, and just walking around 24/7 01:17:02.260 --> 01:17:05.830 with these radios now in our pockets. 01:17:05.830 --> 01:17:08.950 So even though there are quite a few threats to our privacy, 01:17:08.950 --> 01:17:11.860 online especially, at least, there are these mechanisms 01:17:11.860 --> 01:17:16.360 that you and I can enable to at least preserve some of the same. 01:17:16.360 --> 01:17:18.670 Well, what have we done over the past few weeks? 01:17:18.670 --> 01:17:21.580 We began with a look at how we can secure our accounts, then 01:17:21.580 --> 01:17:24.520 our data, then our systems, then our software, and today, of course, 01:17:24.520 --> 01:17:26.930 focusing on preserving our privacy. 
01:17:26.930 --> 01:17:29.770 And by way of the various technologies we've looked at, 01:17:29.770 --> 01:17:33.700 the stories we've told, the principles that we've introduced, 01:17:33.700 --> 01:17:36.610 we hope that in the days, the weeks, and the years 01:17:36.610 --> 01:17:39.700 to come, you can use all of these first principles, 01:17:39.700 --> 01:17:41.470 and these ideas, and these building blocks 01:17:41.470 --> 01:17:44.140 to extrapolate to how new technologies work, 01:17:44.140 --> 01:17:48.580 to how new threats might affect you, and to what questions you should be asking 01:17:48.580 --> 01:17:53.110 of either the software you use or the software you develop to ensure 01:17:53.110 --> 01:17:56.290 that not only your communications are secure, but also, 01:17:56.290 --> 01:18:00.850 that it has these privacy-preserving properties that you, and your users, 01:18:00.850 --> 01:18:02.560 and your customers might want. 01:18:02.560 --> 01:18:05.860 This then was CS50's Introduction to Cybersecurity. 01:18:05.860 --> 01:18:08.640 And this was CS50.