00:00:00,030 --> 00:00:01,696 DAVID J. MALAN: All right, welcome back. This is day two of two, Computer Science for Business Leaders. So a little pop quiz perhaps. What did we do yesterday? So the answer is on the board. The goal is to explain. So what was computational thinking all about? Whoever makes eye contact first. AUDIENCE: Input, output. DAVID J. MALAN: OK, good. So inputs into algorithms give us outputs, and it's a way of framing your thought processes and problem-solving techniques more methodically and generally bringing to bear on problems ideas that have been inspired by and embraced by computer science. And we looked at some mechanical examples like drawing pictures to reinforce the precision with which instructions are necessary, as well as to actually solve problems like a phone book more generally. Internet technology. So this was a loaded discussion with a whole alphabet soup of topics. But any questions or confusions that remain from those various acronyms and technologies? 00:00:59,832 --> 00:01:04,845 If not, it sounds like I could ask a question like, what does DHCP do, and everyone should have an answer. AUDIENCE: It configures Macs and PCs. DAVID J. MALAN: It configures Macs and PCs to do what? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. So to get the DNS server IP addresses as well as their own IP address, their so-called subnet mask-- which we didn't talk too much about-- as well as their own default router. And then that was intermingled with any number of other technologies, but there too there was this theme of layering. So HTTP and HTTPS we claimed were built, like, on top of the Internet. These are applications or services, so to speak, that we know more casually as the web or, more formally, the World Wide Web, though no one really says that anymore. And so what do HTTP and HTTPS actually do? If you were to define them, what is it? What is HTTP, the protocol? Katy? 00:01:59,820 --> 00:02:03,935 AUDIENCE: It's a protocol that knows to go to that IP address. DAVID J. MALAN: OK, the protocol that knows to go to that IP address. Technically not quite, insofar as HTTP actually doesn't know anything about the destination IP address. That's the lower level. So the operating system and the browser know, but not necessarily HTTP itself. What was the extent of the message that we had to send in accordance with HTTP? Yeah? AUDIENCE: [INAUDIBLE] 00:02:29,350 --> 00:02:30,350 DAVID J. MALAN: Exactly. We sent a message. And it was really just a one-, maybe two-line message that just said GET slash, where slash means give me the default page from some website. And then HTTP slash 1.1, which is just the version number. And then I also typed in the host to remind the server what host name we wanted, not IP per se, and that was enough to get the server to send me a cat or to send me Google search results or the like. All right, we then transitioned to cloud computing. So now you all had probably heard about cloud computing before yesterday, but now going back home or to work you can perhaps explain it a little better to someone. So if someone asks you tomorrow back at home or work, oh, what is cloud computing by the way, what is your well-formulated answer to that now? 00:03:14,439 --> 00:03:15,605 Everyone's very preoccupied. 00:03:18,150 --> 00:03:19,600 What is cloud computing? What is the cloud? 00:03:23,975 --> 00:03:24,850 AUDIENCE: [INAUDIBLE] 00:03:36,959 --> 00:03:37,750 DAVID J. MALAN: OK. AUDIENCE: [INAUDIBLE] 00:03:41,710 --> 00:03:42,511 DAVID J. MALAN: OK.
AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. AUDIENCE: [INAUDIBLE] 00:03:49,150 --> 00:03:50,290 DAVID J. MALAN: OK. Cons might be complexity, reliance on third parties to keep your own website or workflows alive. And you described it as a framework, and that's-- I think that works. I would be careful not to define it too precisely as a framework insofar as it's really just a collection of technologies, a collection of Internet-based technologies that people are leveraging to store their data, to do their computation, and any number of other processes. And if you had a moment last night to read the New York Times article, for instance, that was a wonderful use-- years ago now-- of how you might spin up, so to speak, turn on a whole bunch of servers or virtual machines all at once, do some interesting computation like generating PDFs, and just turn them off. And so just a few dollars or a few hundred dollars later, you've converted millions of articles, in that case, to PDF without having to invest a single dollar in your own physical hardware, let alone the configuration and installation of all of that. We talked more generally about some silly acronyms like IaaS and PaaS, the first of which refers to infrastructure as a service. Something like Amazon, which we described as pretty low level. You have to know a little something about your load balancers and where you're storing the data and where your servers are living, and that's all fine and good but it's fairly low level. And we then transitioned more to platform as a service as a topic, which was something like Heroku. Amazon has something called Elastic Beanstalk, Google's long had Google App Engine and similar, which are higher-level services built on top of these lower-level details. And then there are still things like software as a service, which is an even fancier term for really just describing a web application that does useful stuff. And you really don't care about the language it's written in and in turn you really don't care about the infrastructure it's running on. So, again, this theme of layering or abstraction has kind of hit us in any number of ways. Then lastly we looked at web development. CSS and HTML, a little bit of time on the mechanics of those. HTML's not a programming language. It's a markup language, and more on the distinction there later today. And then we looked at Cloud9 as just an example of a software-as-a-service, web-based programming environment that's somewhere in the cloud, but we don't really know or need to care where that is at the end of the day. And talked ultimately about what you can actually do and how a web server and browser intercommunicate. So just so that we cross off everyone's to-do lists while we're still here in Cambridge today, is there anything you'd like to add to today's list, which includes the following? So we'll start today with a focus on privacy and security motivated by a couple of specific examples, followed by any number of directions in which folks might want to go related to those two topics. Two, looking at programming and some basic constructs, and certain algorithms and data structures. We'll tease apart what those are, and these are sort of ingredients that you might bring when designing a piece of software at a whiteboard or in a computer ultimately. Then technology stacks, which is a general term describing any number of these kinds of things here.
Sort of additional ingredients, higher-level ingredients like actual tools and software and technologies that you might use to solve problems in software. And then finally a look at web programming in particular and some of the technologies related there too, some of which we scratched the surface of yesterday, like databases, which we'll look at in a bit more detail. But is there anything else? Yeah? JP? AUDIENCE: Blockchain. DAVID J. MALAN: Blockchain. OK. AUDIENCE: [INAUDIBLE] 00:07:06,269 --> 00:07:07,060 DAVID J. MALAN: OK. Blockchain in the context of Bitcoin and such or a different context? AUDIENCE: The logic of it. DAVID J. MALAN: The logic of it. OK. All right. AUDIENCE: What we're trying to do is to [INAUDIBLE]. DAVID J. MALAN: Yep. OK. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. Other topics to add to today's list? Yeah? [INAUDIBLE] AUDIENCE: Microservices. DAVID J. MALAN: What's that? AUDIENCE: Microservices. DAVID J. MALAN: Microservices. OK. So microservices. AUDIENCE: Just, like, the notion of splitting up monolithic apps into independent microservices. DAVID J. MALAN: Sure. And this relates nicely to our brief discussion yesterday of containerization, which is the sort of infrastructure that tends to support stuff like this. Other topics? Victor? AUDIENCE: You touched upon content delivery networks but I'd like to know how it works in the context of the different [INAUDIBLE]. DAVID J. MALAN: OK, sounds good. Content delivery networks, or CDNs, like Akamai was one such example. Sean? AUDIENCE: There was an article last night too on Hadoop. DAVID J. MALAN: Oh, Hadoop? Oh, yes. The tech-- AUDIENCE: [INAUDIBLE] 00:08:06,950 --> 00:08:08,560 DAVID J. MALAN: OK, sure. Yes, [INAUDIBLE]? AUDIENCE: Maybe [INAUDIBLE] 00:08:12,750 --> 00:08:13,807 DAVID J. MALAN: OK. Well, maybe we'll end with that cliffhanger. OK. AUDIENCE: Take some inspiration there. DAVID J. MALAN: Stay till the end today and you'll find out about the future. Alicia? AUDIENCE: You had talked about programming-- I see these articles about who we need to program, [INAUDIBLE] to know how to program, but what's behind that [INAUDIBLE]? DAVID J. MALAN: Oh, OK. So CS for All, for instance, which is one of the hashtags going around. OK, so-- OK. Other topics? Yeah? AUDIENCE: [INAUDIBLE] 00:08:47,927 --> 00:08:49,260 DAVID J. MALAN: Oh, interesting. I'm sure we're not going to do it well is the short answer. But-- AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Sure. We can weave that into this morning's discussion in particular. Other topics? Yeah? Alicia? AUDIENCE: More a business topic but [INAUDIBLE] Google Docs or Microsoft [INAUDIBLE] talk about the risk of [INAUDIBLE] 00:09:16,790 --> 00:09:18,700 DAVID J. MALAN: OK, so-- yeah, so-- AUDIENCE: [INAUDIBLE] in the cloud tool. DAVID J. MALAN: Cloud tool. Sure, let's weave that in, perhaps into the privacy and security discussion as well. Other topics? Anything at all? Yeah? AUDIENCE: [INAUDIBLE] about machine learning maybe. DAVID J. MALAN: Machine learning. OK. I'll put an abbreviated ML. Sure. Other topics? Sort of creating a problem for myself with this very long list today. Let me propose this. I'll do my best to weave these into today's framework, but let me also encourage you to remind me or come up during any of the breaks or lunchtime or after today too if we don't quite hit on everything or don't touch on something of interest. And worst case I can follow up via email with some references if we don't get through everything today.
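As a brief aside on the HTTP exchange recapped at the top of today's session: the one- or two-line message described there can be sent by hand over a raw connection. A minimal sketch in Python, with example.com standing in as a placeholder host:

```python
# Send the kind of bare-bones HTTP request described in the recap:
# GET / HTTP/1.1 plus a Host header, and read back whatever the server returns.
import socket

request = "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"

with socket.create_connection(("example.com", 80)) as s:
    s.sendall(request.encode("ascii"))
    response = b""
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        response += chunk

print(response.decode("ascii", errors="replace")[:300])   # status line and headers
```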
All right, so you might have enjoyed if you hadn't in previous months the John Oliver segment on Last Week Tonight, which is a fun excuse for homework to watch an 18-minute segment about encryption, but he actually does a wonderful job in general at peeling back the layers of some interesting societal trends, some of them technological. Among them, for instance, has been encryption. So some months ago the FBI really wanted to get into someone's iPhone and Apple said no. They weren't willing to help with this request. But does anyone want to explain a bit more of the context and perhaps some of the technicalities? Like, what was it the FBI wanted Apple to do for them? Either high-level answer or low level is fine. Yeah? Griff? AUDIENCE: Wanted Apple to write a small program to bypass the security feature of the phone. DAVID J. MALAN: OK, they-- AUDIENCE: [INAUDIBLE] the password reset. DAVID J. MALAN: OK. So they wanted Apple to write some software to bypass a security mechanism of the phone related to password reset. And can someone dive in deeper there? What exactly is the security feature in question that some of us are perhaps using? Sean? AUDIENCE: An iPhone [INAUDIBLE] so many times, you get one more time or it'll erase the phone. DAVID J. MALAN: Exactly. Yeah. So this is optional. So by default your phone should not do this, since I'm sure there are people who will enter their own password wrong 10 or 11 times maybe late at night sometimes too or after a night on the town and that's bad if it just deletes itself automatically. But this is also a good feature. Why? Why might you want your phone to sort of figuratively self-destruct? AUDIENCE: If someone's tampering with it and it automatically wipes the information [INAUDIBLE]. DAVID J. MALAN: Exactly. If it's someone malicious or your phone's been lost or stolen or it's a nosy roommate or such you might want the data to self-destruct, provided it's hopefully backed up somewhere either on iTunes or in iCloud, in the cloud where it's completely safe, I'm sure. So you at least assume that it's backed up or that you don't care about the data at that point. And the worry in this particular case was that the phone in question might have had this feature enabled because there was apparently some evidence in like the iCloud server logs that in the past he had enabled this particular feature. But it was unclear and the FBI, as I understand it, didn't really want to take the risk of trying to guess the person's password and have the phone self-destruct, so to speak, whereby it deletes all of the data on the phone. Now let's pause for just a moment. What would it mean to delete data on the phone, especially in the light of yesterday's chat? Grace? AUDIENCE: It's just deleting the directory? [INAUDIBLE] DAVID J. MALAN: It's a good question. So hopefully Apple's not so unsophisticated as to just forget where the data is and market this as a self-destruct feature, though it wasn't all that long ago that even industry miscommunicated these kinds of things. If I can find a screenshot-- DOS delete all data erased. I'm trying to remember an error mess-- or a warning message from yesteryear. Delete secure erase-- don't run DOS anymore so I can't just run the command for you. Let's see. DOS format command. So DOS, Disk Operating System, is what we all had before Windows and newer versions of Mac OS came about. 
And this, for instance, was the message you would get years ago if you tried to format your hard drive, where formatting is in most people's minds equivalent to erasing it. And it surely seems to be the case that all data on non-removable disk drive C, which is your default hard drive, will be lost. So that sounds really, really bad, and it is. You're certainly creating a problem for yourself if you regret this decision. But what do they really mean by lost? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. It's just-- and it's not that you can't find it. It's just harder to find it. Your data gets lost in the sense that it's misplaced; it's still there. The zeros and ones, the little magnetic particles back in the day, were still oriented north-south, south-north, representing ones and zeros as you had left them, but the directory entries were simply forgotten. So the data's still there but the recollection thereof is no longer. And as an aside, if you've ever wondered why in the Windows world or Mac world some people say folders and directories interchangeably-- like, a directory really is just that table. It's like a directory in a building, looking up someone's name and office number. They're equivalent; we just represent them iconographically with folder icons. This just forgets where the data is. All right, so hopefully Apple's not just doing that and forgetting the data. And they're not. What are they doing instead? Or what should you do instead if you really want to delete someone's data? 00:14:48,330 --> 00:14:49,740 AUDIENCE: Write it over. DAVID J. MALAN: Write it over? In what sense? AUDIENCE: Gibberish. DAVID J. MALAN: Gibberish. So you only have zeros and ones at your disposal, so what does gibberish mean in that context? AUDIENCE: Random. DAVID J. MALAN: So random. So if you have the ability to generate random numbers, which to some extent computers indeed do, you can just change all of the zeros and ones to just random zeros and ones so that it looks like random noise. It is noise, and only with very low probability can you actually resurrect that data. Now, as a technical aside, back in the day, especially in the days of magnetic storage, it's long been-- well, I think it's been the case in super old hardware that when you reorient like a 1 to a 0 it doesn't necessarily go, like, 180 degrees. There might be cases where it goes, like, 179 degrees, for instance, thereby leaving a little bit of a hint that that bit wasn't necessarily what it appears to be right now. But modern hard drives, there's just so much data, everything is so packed, this really isn't a concern these days. And so simply randomizing the data in this way would work well. Overwriting everything with zeros, which is very common, would work well. Or ones, equivalently, would work well. But that's not actually what Apple does, because that would actually take a decent number of seconds or minutes, especially now that you have gigabytes and gigabytes of space in your phone. They instead do something that's actually pretty fast. If you've ever securely erased your iPhone, it's mostly the boot-up process that's the slow part, not so much the wiping. AUDIENCE: [INAUDIBLE] they just reinstall the iOS [INAUDIBLE] DAVID J. MALAN: Yeah, OK. So maybe they're reinstalling the software. And to be honest, they're probably not reinstalling all the apps because those are probably stored in a different portion of memory. AUDIENCE: It's the same when you reset your iPhone. DAVID J. MALAN: OK.
AUDIENCE: You can do that wherever you are [INAUDIBLE] DAVID J. MALAN: Oh. So, yes, it does redownload applications that you had previously installed. Absolutely. But that's separate from the data. So what is it actually doing with respect to your user data-- your emails, your text messages, and other things you had on there? When Apple shows that progress bar, or the Android equivalent, what's it doing? It's not overwriting the zeros and ones. That-- AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Not even. It actually doesn't go and touch all of the bits. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. It's more sophisticated than that. So we'll talk about this in a little more detail in just a bit. But encryption, again, is the art of scrambling information, and most encryption assumes that you have some secret key. In particular, suppose I were to write, let's say, this. That's an encrypted message for you all this morning. What does it say? AUDIENCE: Hi? DAVID J. MALAN: Hi. OK, who said hi? OK, why do you say hi? AUDIENCE: It's the alphabet plus 1. DAVID J. MALAN: It's the alphabet plus 1. And fortunately I chose a short word so we could sort of brute force this by just figuring out what's a two-letter word we can think of, which is not an unreasonable strategy either. This indeed means hi. But what was the cipher, the algorithm with which this is encrypted? Well, as you gleaned, Dan, it looks like I took the original word and just added one to it. So H plus one letter of the alphabet gives me I, and I plus 1 gives me J, and thus is born my ciphertext. So this might be called plaintext by convention and this would be called ciphertext, which is the resulting encrypted text. Now, of course, this is not a very secure cipher because my choice of key was not very good. So what if I choose a different key? How about if I write this message to you, which is not meant to be an acronym or text speak. What's this say? 00:18:21,709 --> 00:18:22,640 AUDIENCE: Same thing. DAVID J. MALAN: Same thing, but what's different? AUDIENCE: Algorithm. DAVID J. MALAN: Yeah, well, the algorithm is the same, I'd argue. The input is slightly different. There's two inputs, indeed, in this case, not only the plaintext but also the number. So if this previously was 1, now I've gone and done hi plus 2, which is what's ultimately given me J and K. So how many possible keys are there for this algorithm, for this cipher? 00:18:56,500 --> 00:18:58,280 The key, again, is the number I added. What would you argue is the total number of possible keys? AUDIENCE: Infinite. DAVID J. MALAN: Infinite's true, but practically speaking not really, because many of them are equivalent to each other. AUDIENCE: 25, 26. DAVID J. MALAN: Yeah, 25 or 26. Why do you say that, Katy? AUDIENCE: You start adding-- one is the same if I have 26 or 27, right? DAVID J. MALAN: There's 27. Yep, exactly. Yeah, so if you're assuming we're confining this cipher to only operate on letters, to only take letters as input and only produce letters as output, there's going to be a corner case, so to speak, a sort of extreme scenario that you'd better anticipate lest your algorithm be buggy. And the scenario I'm thinking of is, what if the letter is Z, like you're encrypting the word "zoo," for instance? Well, suppose the key is 1. Well, the o's are easy. They just become "PP." But what's Z become?
Going back to A feels like a nice, clean solution, where you just wrap back around; otherwise it's going to be like some funky punctuation symbol or something like that, which is fine, but you have to decide in advance what you're going to allow as your inputs and outputs. So in this case, if we're just adding numbers and we're comfortable with the idea of wrapping around from Z to A and so forth, well then we only have 25, maybe 26 possible keys. But 26 is kind of silly because by the time you add 26 letters it just wraps back around to A. So there's this old joke on the Internet which geeky people find funny. So there's this algorithm called rot13 which back in the day on message boards, bulletin board systems, used to be this sort of low-impact way of encrypting information like movie spoilers and the like. So it would look like nonsense, but if you really wanted to see the movie spoiler you could rotate everything by 13 places. So the joke on the Internet is, well, you can-- if you want your data to be twice as secure, you should use rot26. OK, one person finds this funny. rot26 of course would be stupid because you're just rotating from A to A, B to B, and so forth. And to your point earlier, Avi, about infinite, true, but infinitely many of those numbers are equivalent to others, so it all boils down to still just 25 useful values. But even then, these values aren't all that useful because Dan cracked this cipher in just a couple of seconds. So what might be a better way of encrypting information? And, again, the goal here is to come back to Siobhan's conjecture that maybe Apple is just forgetting an encryption key, whatever that means. But we'll come back to that. What might be more secure than this rotational cipher, otherwise known as a Caesar cipher, where you just add some fixed number to each letter? 00:21:38,290 --> 00:21:39,616 What else might you do? AUDIENCE: [INAUDIBLE] have each letter could correspond to a different letter but it's not the same [INAUDIBLE]. DAVID J. MALAN: Yeah. AUDIENCE: You create a key where each letter and other letter can randomize [INAUDIBLE]. DAVID J. MALAN: OK, good. Let's take those both in turn. So instead of using just one key, let's say-- let me try to simplify the definition-- we'll use one key per letter. So, for instance, just so we have a longer word, if we had the word "hello" and we were just using a key of 1, this would become I, this would become F, this would be M, M, and P. But of course this could be very quickly cracked in a couple of ways. One, you could just kind of noodle on it for a moment and see, oh, this has just been rotated one place. There's another piece of information that's leaked by nature of this cipher. There's a hint, a pattern. What kind of pattern, Grace? AUDIENCE: It's a double letter. DAVID J. MALAN: Yeah. And double letters are kind of interesting because it's probably not going to be, like, two Q's in a row. It's probably not going to be, you know, "cc" isn't that common. "ll." I feel like I see "ll" a lot. And that kind of intuition or those kinds of statistics ultimately can help you crack whatever is going on. So that is a bad property of this rotational cipher. But what if we refine it somehow so that instead of adding 1 to each of the letters, why don't we choose maybe a number-- and this is overly simplistic, but maybe we should add a different value to each, thereby obfuscating the fact that there is indeed a repetition of letters.
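As an aside, here is a minimal sketch in Python of the rotational cipher being discussed, with the wrap-around from Z back to A handled by modular arithmetic, plus the one-key-per-letter variant just proposed. The words and keys are the ones from the discussion:

```python
# The rotational (Caesar) cipher: shift each letter by a fixed key, wrapping
# around from z back to a. rot13 is just key=13; rot26 rotates back to itself.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def caesar(plaintext, key):
    out = ""
    for ch in plaintext.lower():
        if ch in ALPHABET:
            out += ALPHABET[(ALPHABET.index(ch) + key) % 26]
        else:
            out += ch                       # leave spaces and punctuation alone
    return out

# One key per letter instead of one key overall, reusing the keys as needed,
# so that repeated letters no longer encrypt to the same ciphertext letter.
def per_letter(plaintext, keys):
    out = ""
    for i, ch in enumerate(plaintext.lower()):
        out += ALPHABET[(ALPHABET.index(ch) + keys[i % len(keys)]) % 26]
    return out

print(caesar("hi", 1))              # ij
print(caesar("hi", 2))              # jk
print(caesar("zoo", 1))             # app   (z wraps around to a)
print(caesar("hello", 1))           # ifmmp (the repeated m leaks a pattern)
print(per_letter("hello", [3, 4]))  # the two l's now become different letters
```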
Because if you add 3 to one L and 4 to the other, it's going to be two different letters as a result. And so this is more generally known as the Vigenère cipher, named for a French gentleman years ago, which was an improvement upon the Caesar cipher, or the rotational cipher we just discussed. Instead of using one key, you use one key per letter, although technically you would still use a finite number of keys. And if you need to reuse them-- for instance, if the message you're trying to encrypt is "hello world," you would just reuse those same keys again. So it just helps you remember, so you don't need a super, super long key. But in reality this is not what most modern computers do. What most modern computers use are different algorithms like DES or RSA or AES. These are a number of the acronyms that you might see if you poke around security software even on your own computer. But at the end of the day, all of these algorithms have some notion of secrecy involved. And we'll tease this apart in more detail for this one and come back to yesterday's topic of browser-based encryption. But for now, all of them have some secret, like the number 1 or the number 2 or the number 12345 or such, and the whole security of your system is predicated on that secret remaining a secret. As soon as one of you knows that my key is 1, you can now crack any messages I send throughout the room here, for instance, on a piece of paper. So that would be bad. So Siobhan proposed earlier that maybe Apple is deleting your data by forgetting an encryption key. And that's actually true, but can we infer now what he might mean from that, given this definition? Anyone other than Siobhan? 00:25:01,140 --> 00:25:06,980 How might forgetting an encryption key allow you to delete data? How could you leverage that primitive, so to speak? 00:25:12,968 --> 00:25:16,134 AUDIENCE: [INAUDIBLE] treat the ciphertext like plaintext? DAVID J. MALAN: Treat the ciphertext like plaintext. In what sense? AUDIENCE: Like plaintext [INAUDIBLE] become encrypted and if you forget it [INAUDIBLE] 00:25:29,679 --> 00:25:30,470 DAVID J. MALAN: Oh. Or maybe not what you input but what you intended to be random, I think, is where you're going with this? So, yeah. So, like, if we-- let's choose a little example. So suppose that I have encrypted some sentence or word and the result is GZYEAZYFQ-- I'm just making these up in succession. So suppose that this is what some secret key on my phone has encrypted. So this is what is currently on my phone. Now, that might be a text message I sent to someone, but it looks like nonsense if you're just looking at the letters or looking at the pattern of the bits, because Apple's iOS, their software, has used an encryption key to turn it into something that looks like that. Now, when I, the human, am using my phone, I of course don't see that. I see the actual English text message because I have logged into my phone, enabled the encryption key. This is all sort of built in automatically for us, and it is automatically for me decrypting anything before I see it on the screen. So if it looks like this when it's actually stored in my phone's memory but it can be turned back into plaintext by using that supposedly secret key, and iOS knows how to do this, it would seem that simply forgetting the key, whatever it is-- it's probably something fancier than the number 1 or the number 13. But whatever that key is, if you forget it, the side effect of forgetting your key is-- what is left in your phone's memory?
00:27:06,900 --> 00:27:11,380 Just nonsense like this, which, yes, represents encrypted data, but the definition of a good cipher is that the resulting ciphertext appears to be random and is indiscernible from truly random data. And so, as such, even though, yeah, all of your data is still on there, it's encrypted. And because you threw away the encryption key, the FBI and no one else is going to be able to decrypt it, at least not without a huge amount of luck. And so for all intents and purposes it is in fact deleted. It's equivalent to Avi's suggestion earlier of just overwriting your data with random bits. These are random if you just have no way of recovering what they once were. Now, what if Apple or a nosy roommate just tries to guess your password or, in turn, your key? And to be clear, we humans just have to remember a four-digit passcode or a six-digit passcode or a longer pass phrase and that is used to unlock, essentially, the encryption key, which is actually bigger inside of the phone. So how secure is this whole process? Yeah? AUDIENCE: So does each phone have its own encrypted key? DAVID J. MALAN: Yes. And every time you reset your phone it generates a new one. Exactly. So how secure is your own phone? So most of you probably-- well, some of you probably don't have any passcodes whatsoever, which means this whole conversation is moot. But if you have a four-digit passcode-- something, something, something, something-- how secure is your phone? And on iOS and I think Android by default those are typically numeric. So if you have a four decimal digit passcode how secure is your phone? How do you answer a question like that? AUDIENCE: Not. DAVID J. MALAN: Not. OK, let's probe a little deeper here. How not? AUDIENCE: [INAUDIBLE] 00:28:49,449 --> 00:28:50,240 DAVID J. MALAN: OK. AUDIENCE: [INAUDIBLE] 00:28:53,610 --> 00:28:54,610 DAVID J. MALAN: Exactly. And that's the way-- AUDIENCE: [INAUDIBLE] 00:28:57,120 --> 00:28:58,120 DAVID J. MALAN: Perfect. And that's a good way of quantifying the security of a system is, well, how many possible keys are there? Because if it takes you some number of seconds or milliseconds or nanoseconds or whatever to try inputting one of those numbers, well then the security of your system is defined by the total amount of time it would take to input one of those times the total number of them. And then on average you'll get lucky and it'll be, like, you'll be halfway through the list when you guess someone's code and so it's going to be roughly equivalent to how many codes there were in the first place. So if 1, 2, 3, 4 is the length of your code and you have 10 possibilities-- 10, 10-- because you have 0's through 9, that of course is going to give you how many possibilities? 10,000 of them, from 0 on up to 9,999. All right, so not very secure, but how insecure? Well, let's just do some quick math. And this would be what a computer scientist or engineer might just generally call a back-of-the-envelope calculation. Let's suppose for simplicity that just typing in a passcode that's four digits takes one second. So that means we have 10,000 seconds if I want to try all possible codes. And there are 60 seconds in a minute and 60 minutes in an hour, which means in 2.7 hours I can figure out your passcode. It's a little tedious and it's a boring three hours, but your phone is not at all secure unless what feature could we enable, just to be clear, as Sean proposed? David? Oh, I was thinking even simpler. 
The feature with which we started today's chat, the one the FBI was worried was enabled. AUDIENCE: The resets. DAVID J. MALAN: The resets, right. So the self-destruct after 10 attempts. Now, this doesn't make your phone fundamentally secure because the FBI or your nosy roommate could just get lucky and out of the 10,000 possibilities your password was 0, 0, 0, 0. Or if he or she is just typing in random numbers, they get lucky one out of 10,000 times, or really 10 out of 10,000. So one out of 1,000 chances will they just guess it right and crack into your phone. So that helps us, but it really just narrows the scope of the threat by having 10 attempts. All right, so we could push back on that. Why don't we give the user just one attempt? That makes it even less likely, 10 times less likely, that you're going to be compromised by someone trying to guess your password. But-- AUDIENCE: Not user friendly. DAVID J. MALAN: What's that? AUDIENCE: Not user friendly. DAVID J. MALAN: Not user friendly. And, Alicia? AUDIENCE: You could make a mistake. DAVID J. MALAN: If you make the mistake, which most of us surely do, now you're screwed because you just deleted your data with even higher probability because you occasionally make those mistakes. So, again, this theme of tradeoffs here. All right, now realistically it's not going to take a second if you can automate this process, but about a second feels right if it's going to be a physical process like this. But suppose that you have instead a six-digit passcode. How many possibilities are there then? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. So now you're up to a million because you add two more zeros to this. So if we have a million and now that's one per second, for instance, and there's 60 seconds in a minute, 60 minutes in an hour, and 24 hours in a day, now it's going to take you 11.57 days of nonstop inputting in order to crack this password. So doable still, especially if you automate it or share the load with a friend, but also doesn't feel particularly secure. Feels like within 12 days your data could be compromised. So what's better? Yeah, Danny? AUDIENCE: I learned this the hard way, but with the iPad if you enter it in incorrectly a certain number of times it makes you wait five minutes. DAVID J. MALAN: Nice. AUDIENCE: So you can reenter it. And then if you screw it up again it'll say you have to wait a half an hour or an hour. And it keeps adding how long-- DAVID J. MALAN: How many times did you screw up your password? 00:32:34,496 --> 00:32:35,620 AUDIENCE: I figured it out. DAVID J. MALAN: OK. So that's clever and that is true of a lot of systems, but why? That seems incredibly annoying of Apple to do that. AUDIENCE: Yeah. DAVID J. MALAN: But why? Why is that actually a brilliant idea, I would argue? Security-- 00:32:53,290 --> 00:32:56,670 AUDIENCE: It increases the amount of time it requires [INAUDIBLE] 00:33:01,901 --> 00:33:03,150 DAVID J. MALAN: Yeah, exactly. If this is like the best system you have for protecting a device, and a passcode isn't fundamentally horrible. It's just horrible if you can input them really fast or there's relatively few of them. So what if you do insert artificial delays such that the software after each attempt or each several attempts says, wait a minute. You seem like an adversary, not just someone who forgot his password. Let me let you keep doing this but only after a minute break or a five minute break. 
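As an aside, a quick sketch of the back-of-the-envelope arithmetic above, assuming, as in the discussion, that each guess takes about one second:

```python
# Back-of-the-envelope estimate: how long to try every numeric passcode,
# assuming roughly one guess per second.
SECONDS_PER_GUESS = 1

for digits in (4, 6):
    codes = 10 ** digits                        # 0000..9999, or 000000..999999
    seconds = codes * SECONDS_PER_GUESS
    print(f"{digits} digits: {codes:,} codes, "
          f"{seconds / 3600:.1f} hours, {seconds / 86400:.2f} days")

# 4 digits: 10,000 codes, 2.8 hours, 0.12 days
# 6 digits: 1,000,000 codes, 277.8 hours, 11.57 days
```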
And so you generally just increase the cost to an adversary by delaying or slowing down his or her attempts to crack into your phone. Of course the price you pay, as you discovered, is that this too is a tradeoff in terms of user experience, in terms of the user having a scenario like you just forgot your passcode, presumably. But this is a wonderful defense mechanism, this backoff. What else could we do to increase the security of these phones? Yeah? Dan again. AUDIENCE: So you could allow both letters and numbers. DAVID J. MALAN: Yeah. So-- AUDIENCE: And that increases the number of possibilities for each. DAVID J. MALAN: All right, so how many do we have here now? AUDIENCE: Well, at 10 you'd have 37. DAVID J. MALAN: OK, so 30-- AUDIENCE: If you use just letters. DAVID J. MALAN: OK. So if we-- AUDIENCE: 36? DAVID J. MALAN: Yeah. So if we have letters and numbers, and we can actually ratchet this up even further-- AUDIENCE: Include, like, exclamation points. DAVID J. MALAN: OK, can definitely do that, but even simpler. You're forgetting sort of half-- AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. If you take into account case sensitivity, now we go from 26 plus 26 plus 10, so that's 62 possibilities for each. So that's more. So 62, 62, 62, 62, 62, 62 possibilities for each of these characters. So now this is 62 to the sixth power-- 62 times 62, duh duh duh. And that gives us this many possible codes, which is definitely a bigger number. So let's try this. So 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day. And now let's go further. There are 365 days in a year. So 1,800 years now we're up to, assuming it takes just one attempt per second. So now we're probably safe because we're going to be long dead by the time someone gets into our phone. But we've raised the bar. And what's nice is that we haven't really changed the algorithm per se, or the process for getting into the phone; we've just increased the search space, so to speak, by increasing the total number of possibilities here. All right, so then let's come back to what I think Griff proposed. What was it then, to recap, the FBI wanted Apple to do? AUDIENCE: So the FBI wanted Apple to write them a software that's going to bypass [INAUDIBLE] DAVID J. MALAN: OK, good. So they wanted to bypass that self-destruct feature, and they could only do this by changing the underlying operating system. Unfortunately-- OK, so they were hoping to do that. What else were they hoping for? 00:35:59,970 --> 00:36:01,940 AUDIENCE: Unlimited assistance. DAVID J. MALAN: What's that? AUDIENCE: Unlimited assistance. DAVID J. MALAN: Unlimited assistance. So at the end of the day they wanted that access. They also wanted the ability to electronically input the code so that they don't have to have an FBI agent, like, manually typing in all of these possible codes well past the previous limit potentially of 10. And they also wanted the ability for-- and this is more technical-- for Apple to sideload the software into the phone so that it's only in RAM-- recall from yesterday-- not on the solid-state drive inside the phone. Because they didn't want to forensically alter any of the zeros and ones that were on the phone, so that presumably in court no claim could be made that, well, the FBI injected this software or this evidence into the phone, by putting it into RAM and thereby not touching the persistent data that was apparently forensically of interest to them. So I forget-- let me see if I have this here.
So just to be clear, this is more tongue-in-cheek than anything. But this is a little video on YouTube such that technically the FBI could also do something like this, as opposed to inputting codes electronically via a Thunderbolt cable. Whoops. 00:37:07,960 --> 00:37:09,510 This apparently is a thing. 00:37:12,182 --> 00:37:14,390 And you can actually see in the top right-hand corner what looks like an Android phone. So that's how you might brute-force an Android phone if you have at least a short code and 12 days to spare with a device like that, but they wanted to avoid that kind of silliness, for instance. So what did they-- OK, so all of this, at the end of the day Apple ended up not obliging, and supposedly the FBI was able to have someone else, a security firm supposedly, get into the phone without going through this whole process. So in the end it was sort of an anticlimactic ending, and it was about to be an incredibly frightening legal case potentially as to the degree to which companies could be coerced into actually helping the FBI in cases like this. But this whole fiasco was also in part the result of a missing feature in the iPhone, which is that in theory they could have installed the software in the first place. This was arguably a bit of an Achilles heel in the design of iOS version 9.2 or whatever they were up to at that point, whereby it seems to be completely beside the point if your adversary-- in this case, the FBI in some sense-- is able to just install new software onto your phone, thereby circumventing the very protections built into the phone. So this was a flaw, arguably, from a security perspective, and I believe it's already been fixed-- "fixed"-- such that now to upgrade the software on your phone, on an iPhone, you have to-- guess what-- input your passcode. And that was the catch before. Supposedly the upgrade process for updating the iOS software on a phone up until recently did not require you to input your password necessarily to update the software. And I do believe, as of the last time I updated my own phone, I noticed, oh, I don't remember being asked for my passcode before, and that seems to be a side effect of this. So it's a fascinating sort of cat-and-mouse game where in this case, you know, the FBI were trying to do a good thing, but from a security perspective they were in this scenario the adversary in some sense, insofar as they were trying to get into a phone and Apple either hadn't anticipated or hadn't worried about or just hadn't bothered to implement this particular additional defense mechanism. So questions about the scenario or what happened or how it ultimately played out? Yeah. AUDIENCE: Was there any way to see where people typically touch on their phone to see what their passcode is? [INAUDIBLE] DAVID J. MALAN: Oh, you know, I'm sure there is. I feel like I've seen movies certainly where this is-- you sort of go-- AUDIENCE: Yeah. DAVID J. MALAN: And then you see their passcode. I do think there's something to that. Certainly on physical devices you see wear on, like, old-school alarm panels where you-- AUDIENCE: Even on laptops, like, they're touching the same [INAUDIBLE] some wear [INAUDIBLE]. DAVID J. MALAN: Yeah. I'd have to Google around to see the sort of tales that I've read. But I'm sure this is possible. I mean, you can even glean which keys are used the most, which is information because it narrows your search space.
And I do think I've even seen some article where-- it was something more clever than just filming someone's hands. I'd have to remember. But, I mean, yes, I'm sure this is possible. And even on glass you're going to leave some kind of oily residue presumably if you're touching the same place repeatedly. So these, too, are fairly sophisticated attacks that most typical adversaries aren't going to wage, but absolutely. Things like that are, I would imagine, possible. All right, so let's transition for a moment to a slightly deeper dive into encryption itself and how it might otherwise be implemented. Because this whole Caesar cipher thing, or the rotational cipher, and this whole idea of a secret key seems fundamentally flawed. For instance, Felicia, if I wanted to send you a secret message in this room by passing you a note, sort of grade-school style, that only you and I can read and write and understand, how could we do it? How can I write on a piece of paper a note and somehow encrypt it and send it to Felicia so that if any of you intercept it-- like I did for David's packet yesterday-- it's not at risk of being figured out? Avi. AUDIENCE: [INAUDIBLE] 00:41:18,825 --> 00:41:19,700 DAVID J. MALAN: Yeah. Which is kind of a catch-22, right? Because in this context I could say, you know, walk over here, password is 13, right? And assuming only she could hear that, that would work. But this is also kind of stupid because if I have the ability to secretly communicate to her the password, I might as well just hand her the note physically at that point. So there's this chicken-and-egg problem whereby we can't communicate securely unless we can first communicate securely in order to exchange that one piece of secret information, whether it's the number 1, 13, 12345, or something even more sophisticated still. So how do we address this? Well, it turns out that this one in the middle here, RSA, is an example of what's called a public-key cryptosystem. The first two, and the Caesar and Vigenère ciphers we were just discussing, are generally called secret-key cryptosystems, whereby there truly is one and only one secret, and both parties, A and B, have to know it in advance in order for the whole system to work. But this of course is problematic, especially in the world of the web, because, like, I don't know anyone personally at Amazon.com with whom I can exchange a secret so that I can buy something on their website and check out. I don't know anyone at Facebook necessarily that I can establish a secret with so that when I log into Facebook my password is encrypted between my laptop and their servers, right? The world just wouldn't work if this were a prerequisite. So it turns out that in public-key cryptosystems you have two keys, not a secret key per se, but a public and a private key, as it's called. And they're sort of semantically the same idea, but they're mathematically a little different. Turns out that with algorithms like RSA and Diffie-Hellman and bunches of others, you have two numbers that have some kind of mathematical relationship between them such that you can give out your public key. And Felicia should give me her public key. I could then use her public key to encrypt a message using some kind of system similar in spirit to what we've discussed, then send her that message.
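As an aside, here is a toy sketch of that key-establishment idea, using Diffie-Hellman-style arithmetic with deliberately tiny numbers; real systems use enormous, carefully chosen parameters, and the names below are just the people from the discussion:

```python
# Two parties derive the same shared secret while only ever exchanging values
# that an eavesdropper is allowed to see. Toy numbers only; real parameters
# are hundreds of digits long.
p, g = 23, 5                                   # public parameters anyone may know

felicia_private = 6                            # Felicia's private key (never shared)
david_private = 15                             # David's private key (never shared)

felicia_public = pow(g, felicia_private, p)    # safe to announce to the whole room
david_public = pow(g, david_private, p)        # likewise

# Each side combines the other's public value with its own private value...
secret_felicia = pow(david_public, felicia_private, p)
secret_david = pow(felicia_public, david_private, p)

# ...and both arrive at the same number, which can then key a simpler cipher.
assert secret_felicia == secret_david
print(secret_felicia)                          # 2
```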
Even if David or someone else in the room intercepts it, presumably he's not going to be able to understand it, because what he doesn't have is Felicia's private key, presumably, with which there's this reversible relationship to the public key. So only Felicia's private key can undo the effects of her public key, and so long as she keeps her private key private and tells no one, including me, the whole system works rather well. And so technically it's not a public key and private key that she and I would usually use. Those tend to be fairly computationally expensive relative to the secret-key encryption techniques we discussed earlier. So instead what we would do is use our public keys to establish a shared secret key. So I would use her public key and I would encrypt the message 13, for instance, if that's the secret key we want to use. She uses her private key to decrypt that message. She says, oh, David wants to talk to me using the secret key 13, and then we can use something simpler like the Caesar cipher or something else, but hopefully with a bigger key or fancier algorithm than that. So this allows us to address the chicken-and-egg problem, but you want to know whom you're talking to. So I want to be confident that I'm talking to, like, Felicia.com, so to speak, and she wants to be confident she's talking to David.com or whatnot. And so to do this, this is where those whole SSL certificates come into play that we discussed yesterday. She can pay someone else, you know, $50 a year or $300 a year-- give that certificate authority, so to speak, her name and mailing address and all of this. And they will then give her a stamp of approval, a digital signature, so to speak, on her public key essentially, that allows me to trust that if I trust this third party called Verisign or Namecheap or GoDaddy or any number of third-party companies, I, by transitivity, should be able to trust Felicia as well, because they vouched for folks. And actually we ran into an interesting corner case with David's laptop yesterday where, as best we could tell, he wasn't able to use Cloud9, which uses HTTPS, and CS50 IDE, because presumably this government laptop didn't necessarily-- as best I can tell-- trust the certificate authority who signed the keys that are being used by Cloud9 and also by Harvard's site, I think, and probably several others. And that's probably because the sysadmins, the system administrators, remove certain keys to limit access. Because, indeed, there have been cases out there where there are sort of fly-by-night operations that are verified certificate authorities but they're really not doing any due diligence whatsoever, and so illicit or, rather, untrustworthy websites have also been able to enable this kind of feature. Felicia? AUDIENCE: [INAUDIBLE] RSA SecurID [INAUDIBLE] DAVID J. MALAN: No, that's a little different. So RSA is also the name of the company and it's also the initials of the founders of the company. So it has a lot of different meanings in different contexts. RSA SecurID, which is increasingly common, is an example of something called two-factor authentication, which is actually a good transition to a more general topic of security of your own data, from which we can now transition into Dropbox as well. So most of us for most services use one-factor authentication. When you log into a website with a username and password, the username is uninteresting. It might as well be public by definition.
But the password is meant to be secret, and that is your one factor that only you, hopefully, know. Unfortunately, passwords are not all that great of a defense mechanism. Why? They're not all that secure. AUDIENCE: Get stolen? DAVID J. MALAN: Get stolen how? AUDIENCE: Phishing. DAVID J. MALAN: Phishing attacks like we discussed yesterday. Copying Bank of America's site and duping me into typing it into some bad guy's website. Sure. How else? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Looking at someone's keyboard, I would argue, or whatever sort of sticky residue they've left on their-- the oils from their fingers. We could figure out with some probability what their password is. Not secure for that reason. AUDIENCE: [INAUDIBLE] hard to remember a long password. DAVID J. MALAN: Yeah. Most of us probably have pretty bad passwords, right, because they're hard to remember, so we choose easier passwords. And even if-- this is often kind of a scary revelation. Even if you think you're being clever by, let's say you're-- the first "i" word that came to mind was "igloo." Suppose you're being clever by using a 1 for the i and maybe you're being triply clever by using zeros for the o's. Well, guess who else knows those heuristics? Like all of the bad guys, right? So even if you're being clever like this, you are increasing the cost to that bad guy of figuring out your password, because now instead of checking all the letters of the alphabet-- all 26 or 52 of them, uppercase and lowercase-- now he or she has to try numbers as well, increasing by 10 the total number of possibilities here for each of these characters. But even still, with enough savvy and computational cycles, computing cycles, he or she could certainly figure this out too. So not clever. Better would be to use a password that doesn't look like this but is gol17$_'a-- that's not an a-- gAB. And I'm just truly making this up randomly somehow, but I will never remember what this password is. Or if I do and then I go on, like, holiday for a week or two and come home, perhaps as in Danny's scenario with the iPad, like, I'm not going to remember within some amount of time what that very clever password was. So there's this tension then between keeping your systems secure but keeping them accessible, because the more secure you make it, the higher the probability that you're going to lose access to the very data or services that you're trying to protect. So in this case, passwords aren't all that great, and they can also be stolen, which is sort of like salt in the wound. So this is if you have one factor, a password that only you know. And, Felicia, can you then hypothesize what the formal definition of two-factor authentication must be and why it's better? AUDIENCE: Does it verify on the other side who that person is? DAVID J. MALAN: Does it verify on the other side who that person is? In some sense. AUDIENCE: [INAUDIBLE] that information. DAVID J. MALAN: In some sense. Can you elaborate for those unfamiliar, what is this RSA SecurID? AUDIENCE: So I log into my [INAUDIBLE] DAVID J. MALAN: Yep. AUDIENCE: [INAUDIBLE] gives me a key. So when I log in it not only verifies my username, password [INAUDIBLE] but it's another security. DAVID J. MALAN: Exactly. So you're asked for both a password and then some number. And the earlier incarnation of this used to be the RSA key fob-- some people might still carry these-- that looked like this. And it's a clever idea, and it kind of predates everyone having phones in their pockets.
This is a number that changes every second, every minute or so, and the idea is that Felicia, in addition to typing her username and password, also has to type in whatever the current number is on this key fob that's on her key chain in her pocket. So this is a second factor, but it's fundamentally different from her password in what sense? It's not just a second password, which would still be one factor. AUDIENCE: It changes. DAVID J. MALAN: It changes, which is compelling, but that, too, isn't what fundamentally distinguishes it, I would argue. AUDIENCE: It's physical. DAVID J. MALAN: It's physical. It's physical. Well, it's actually not random, otherwise it would be problematic to have it on her key chain and then somehow synchronize with the server. So there is a pattern that it follows in some sense, but it's physical. So whereas the first factor is generally something you know, and have sort of up here, but that can be extracted or somehow stolen if you type it in or have it logged somehow, the second factor needs to be fundamentally different, which needs to be, in this case, something you have. So now, given that Felicia is protecting her account with these two factors, not only-- I could probably pretty easily figure out her password looking over her shoulder or tricking her into a phishing attack or the like, but I have to have physical access to her now to threaten her security further, because now I need that key fob. And unless I have physical access to that, I'm not going to know what that six-digit code is at any given time. And what happens is, when these devices first shipped from the factory, essentially they're synchronized with some piece of software or some kind of initialization routine on a server so that Felicia's company knows that this is 159759 at this moment in time. But then a minute later both the server's understanding of that number and her own key fob device change, and sometimes they can drift out of sync because clocks will be ever so slightly askew. So every time you log in, which is probably once a day or once a week or whatnot, the clocks will resync as a result. But if you've ever carried this thing and you haven't used it for weeks on end or maybe even a year, it might very well not work after some time, because it will drift or it will be forcibly expired for lack of use. AUDIENCE: [INAUDIBLE] the Internet? DAVID J. MALAN: These are-- no. No. These are just-- AUDIENCE: [INAUDIBLE] DAVID J. MALAN: The server syncs with this device. So if it starts to drift out of date slightly, Felicia's server knows, oh, wait a minute, I just saw that number. I was expecting to see it a few seconds ago. Let me just update my own clock relative to that device. So nowadays most people don't bother carrying these because we all carry these. And so now you have software versions of this, and Google offers this, lots of banks offer this. And even if you don't have a special application you can certainly just receive a text message, which is super compelling. And so, personally speaking, after this week if you go back home and take nothing else away security-wise besides choosing better passwords, for instance, also start to look for companies, especially banks, that offer two-factor authentication. Most of us probably don't care all that much if our Peapod.com account is compromised, or some silly website, for instance, that you occasionally log into, but bank accounts are more important, maybe your email is more important, or any number of other such accounts.
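As an aside, here is a rough sketch of how such a rolling code can be derived from a secret shared between the fob (or a phone app) and the server. This is the general flavor of time-based codes like Google Authenticator's, not RSA SecurID's actual, proprietary algorithm; the secret and the 60-second window are purely illustrative:

```python
# Both the device and the server know the same secret and the current time, so
# both can compute the same six-digit code for the current time window.
import hmac, hashlib, struct, time

SHARED_SECRET = b"provisioned-at-the-factory"   # hypothetical shared secret

def rolling_code(secret, at=None, window=60):
    counter = int((at if at is not None else time.time()) // window)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    return int.from_bytes(digest[:4], "big") % 1_000_000

print(f"{rolling_code(SHARED_SECRET):06d}")     # what the fob would display now
# The server runs the same computation; as long as the two clocks agree on
# which minute it is, the codes match.
```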
And unfortunately relatively few vendors offer these services, but Gmail does, and certain banks do now, certain brokerage houses. And so that's when you want to really raise the bar to the adversary. Of course, there's a price paid, so to speak. I mean, sometimes they might charge you for this, but generally not. What, Felicia, what other price do you pay, so to speak? AUDIENCE: Time. DAVID J. MALAN: Time, right? You have to, like, fish around for the damn thing. And there are corner cases like, you know, what can go wrong with this? Battery dies, or you're in the basement somewhere on campus or at home and you just don't have reception so you can't get the text message, so now to log in you have to walk outside, walk upstairs, get the message, now go back downstairs, and you only have a few seconds to actually type it in. So there are some annoying corner cases, but at least you don't have to figure out how to buy or carry around something like this. AUDIENCE: I thought, like, a few years ago this got hacked and it was a big scandal because it was supposed to be unhackable. DAVID J. MALAN: Yeah, they took a hit. I forget the particulars of this. But somehow I think the keys on their server, so to speak, that they were using to keep things synchronized were compromised, and via that information you could figure out what someone's code was. I'm guessing they had to reissue or change some aspect of it. But, yeah, that was a scandal because this was the product. Like, that is not supposed to happen. But you've got to trust someone. And so in fact behind most of these topics today is the fact that you have to trust someone at the end of the day. You just want to ideally minimize the number of people with whom you actually have that trust. All right, so speaking of trust-- oh, yes. Sal? AUDIENCE: What do you think about cloud-based password managers? DAVID J. MALAN: Ah. Good question. What do I think about cloud-based password managers? They are probably better than not using any password manager, because if the result is that, understandably, you have very easy passwords like 12345 or welcome or 1gloo with a 1, then you're probably gaining on net from using a cloud-based service. The problem with cloud-based password managers-- these are pieces of software for which you generally have one hard-to-guess password. So you have to sort of bite your tongue and just choose a really hard password for just one of your accounts. And then use that one super hard password to protect all of your other accounts by encrypting them, essentially. The problem with cloud-based services is, to get those passwords or access to them, you're probably going to, like, LastPass.com or whatever the service is, and you are typing in your super secret password, hitting Enter, and then copying and pasting whatever your other passwords are. There are some threats here, though. What's one such threat? 00:55:40,487 --> 00:55:42,340 AUDIENCE: Somebody could hack your computer. DAVID J. MALAN: Someone could hack your computer. Like, I personally would never do that on, like, a friend's computer or a lab computer or a computer I don't physically own and keep in my possession, because it's just too possible for there to be what's called a keystroke logger on the computer where it's just logging all of the keystrokes.
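As an aside, here is a minimal sketch of the "one hard password protects the rest" idea just described: password managers stretch a single master password into an encryption key for the vault. The salt and iteration count below are illustrative, not any particular product's actual settings:

```python
# Derive an encryption key from one master password. Only the derived key,
# never the master password itself, is used to encrypt the stored passwords.
import hashlib, os

master_password = "correct horse battery staple"   # hypothetical master password
salt = os.urandom(16)                              # stored alongside the vault

vault_key = hashlib.pbkdf2_hmac("sha256", master_password.encode(), salt, 200_000)
print(vault_key.hex())                             # 32 bytes that key the vault's cipher
```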
Now, if LastPass or the equivalent has two-factor, I'd be more comfortable because that way, sure, my friend or some adversary in a computer lab might steal my password, but he or she also needs to physically take this from me or some equivalent device. So that would be my biggest concern there. So I use a password manager, but it's not cloud-based. It all runs on my phone, runs on my Mac and my desktop and my laptop computers. With that said, it also supports Dropbox, which is a beautiful segue, but it encrypts the file before putting it on Dropbox. AUDIENCE: What was the name? DAVID J. MALAN: 1Password is an alternative that does the same thing as LastPass. So, 1Password and LastPass, if you're unfamiliar. So a perfect segue to services like this. So you don't have to offer up which one you use, but how many people in this room use either Dropbox or OneDrive or Google Drive or Box? Like, pretty much almost everyone uses one of these things. So let's consider how these services might work. And they do have different security policies if not practices, but let's see if we can't figure out just to what extent we are secure with something like Dropbox, for instance. So for those unfamiliar, Dropbox is a file synchronization service whereby on your Mac or your PC or your phone you install their software and then just automatically behind the scenes it constantly ensures that a folder called Dropbox on your computer has the exact same files on this Mac, on this phone, on this PC, on this desktop-- any computer that I install the software on and log into. And this is beautiful because if you have a work computer and a home computer, maybe a phone, you can create the illusion for yourself that all of your files are constantly accessible to you. Now, in reality it's doing a lot of copying and updating as the files move around. It's not the same file. It's not just one cloud service. It's maintaining local copies on all of your computers. So how does Dropbox achieve this? So if this is the cloud and this is Dropbox here and this is one of my laptops and here's another laptop, maybe work versus home, each of them has a Dropbox folder. Suppose I change a file on this laptop here. How does it get over here according to this kind of model? AUDIENCE: Synchronize [INAUDIBLE]. DAVID J. MALAN: Yeah. So the data, if I update it here, first goes through the cloud and gets stored on Dropbox's servers. They used to use Amazon S3, but I believe they use their own infrastructure now. But that's sort of an uninteresting implementation detail for now. Then it syncs the file to here. And then if you change it here, it goes back up here and saves it down here. So where might you be vulnerable in this scenario in terms of your privacy or security? Sorry? AUDIENCE: In the Dropbox server? DAVID J. MALAN: Yeah, the Dropbox server, right? At the end of the day, we have to trust someone. And it would seem that if I'm uploading, like, my important documents, they've got to go to the cloud, so to speak, which is an actual, physical place somewhere. So this is Dropbox's servers and they've got to get synchronized back down to my laptop. So Dropbox would seem to have access. OK, but wait a minute. If you read the fine print and if you read the marketing speak on Dropbox's website, they encrypt your data. So what might that mean actually, because you shouldn't just take these marketing statements at face value? 00:59:09,470 --> 00:59:12,895 They encrypt your data, so what are they doing to it?
AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Good. So they're scrambling it to create the illusion of just random zeros and ones on their servers. That seems like a good thing. They're using some key, presumably, for that. OK, so what key? AUDIENCE: [INAUDIBLE] 00:59:26,370 --> 00:59:28,490 DAVID J. MALAN: OK, well good, right? Although that seems to be a thing. So you don't know, but is it yours? So it turns out it could be, maybe, your own password. When you sign up for Dropbox, you have your own username and password. So maybe they're using your password as the secret with which to encrypt the data. But this would kind of break the file-sharing features of Dropbox and other tools insofar as I can share my files or folder with Katy as well and with Christina and with any number of other people, and this would be problematic if my files are now encrypted with my key but Katy has installed the same thing on her computer, needs access to my files, and to her they look like random zeros and ones. So it can't be my key, it would seem, logically, in this scenario. It can't be Katy's key because, conversely, I couldn't then see the files. So it must be some central key, some main key that Dropbox generated years ago, for instance, has locked away in a vault, and uses to encrypt all of their data. Yes? Turns out that's true. So what are the threats? At least, that was true the last time I inquired. AUDIENCE: Someone could get that key. DAVID J. MALAN: Someone could get that key, right? And so really all that's protecting you at that point is the physical security of that key, which is probably on some piece of paper or a USB stick or something like that, hopefully in a vault, with very few people in the company having access to what is effectively a really big, seemingly random-looking number that's used on their servers. But it could certainly be compromised somehow. But this is actually a feature of sorts, insofar as it makes something else possible for Dropbox. So it turns out Dropbox supports a technique, which is in their own interests, like other companies, called deduplication. Does anyone know what it means to deduplicate your data? Or for Dropbox to deduplicate our collective data? 01:01:24,070 --> 01:01:24,680 So-- AUDIENCE: No. DAVID J. MALAN: No. So deduplicate-- suppose that all of us in this room had, let's say, all downloaded the same video file, which tends to be big. So it's like some movie that we've all rented or bought or whatnot and it's two gigabytes or four gigabytes. So it's a pretty big file. And there are 20-plus of us in this room, and it would be pretty wasteful for Dropbox to store two or four gigabytes times 20 people, because at the end of the day how many copies does Dropbox technically need in order to provide us all with access to that same movie? I mean, in theory one copy. Now, for backup purposes they should have a few spares anyway on different servers, but that's beside the point for now. They certainly don't need to keep a separate copy for each of us of a file that we all share. One, in theory, suffices. So to deduplicate your data means to do exactly that. Instead of remembering a copy of the file for every user, what should suffice instead, whereby Dropbox could use 1/20th of the space? AUDIENCE: Directory. DAVID J. MALAN: The directory entry. So just remember that I have this file and Siobhan has this file and Dario has this file and everyone in the room has this file. And just like yesterday's discussion of file systems, just remember where that file is and then any other backups as well.
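To make deduplication concrete, here is a minimal sketch of content-addressed storage, the general technique behind it: each unique file is stored once under the hash of its contents, and each user's "copy" is just a directory entry pointing at that hash. The dictionaries below are a toy model, not Dropbox's actual design.

    # A minimal sketch of deduplication via content-addressed storage; a toy model only.
    import hashlib

    blobs = {}        # hash of contents -> the single stored copy of those contents
    directories = {}  # (user, path) -> hash of contents, i.e., a lightweight directory entry

    def store(user, path, contents):
        digest = hashlib.sha256(contents).hexdigest()  # identical bytes always hash to the same digest
        blobs.setdefault(digest, contents)             # keep the bytes only if we have never seen them
        directories[(user, path)] = digest             # every user just points at the one stored copy

    movie = b"...pretend these bytes are a two-gigabyte movie..."
    for user in ["malan", "siobhan", "dario"]:
        store(user, "/Dropbox/movie.mp4", movie)

    print(len(directories), "directory entries, but only", len(blobs), "stored copy of the movie")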
So this too is only possible for Dropbox if what is the case? If they're not encrypting our data individually. Because if all 20 of us downloaded this movie and dragged it into our Dropbox folder, suppose that it were encrypted with our own encryption keys. So far as Dropbox is concerned, it would look like they have 20 different patterns of bits that are thus not the same by definition, so they can't throw away 19 of those copies. They have to keep all 20. So in order to save on cost, 1/20th of the cost in our particular example, Dropbox logically has to be using the same key to encrypt all of our data in order to leverage that feature. So here there's this trade-off between space and security, between financial cost and security. If they want that feature, they can't, logically, go about encrypting all of our data individually. 01:03:34,420 --> 01:03:35,710 Question? Yeah? Inessa? AUDIENCE: So usually the data in Dropbox is able-- you are able to manipulate the particular piece of data so long as an individual user manipulates it and it creates a separate copy that becomes a different source of code, right, that's separate from that. So if 20 of us downloaded a movie-- DAVID J. MALAN: Uh huh. AUDIENCE: Let's say if that one source [INAUDIBLE] I manipulate the movie, say, for the [INAUDIBLE] just mine? DAVID J. MALAN: That is correct. If you make your own edit of it, absolutely. Then you become decoupled from everyone else. So now there are two copies-- the original and the altered version of it. Absolutely. So that is the case, for sure. Yeah? AUDIENCE: Also I noticed if you delete something on Dropbox it goes to the Trash and then it stays there for a certain amount of time. DAVID J. MALAN: Yeah. So that's a feature from their perspective, right, because-- or a feature for us, the end users, because if you accidentally delete something or if, like, Katy deletes some file that I did not want Katy to delete, it's a nice feature that I can retrieve it from the trash. And that amount of time is probably ill-defined just because they don't feel like committing to it. It's sort of like a nice feature. Nice to have but not a guaranteed feature. And that's very similar in spirit to what we discussed yesterday whereby eventually they'll just forget where the file is, but it's surely somewhere on their servers because it's not necessarily worth overwriting. Yeah, Felicia? AUDIENCE: Dropbox, Box, iCloud [INAUDIBLE] DAVID J. MALAN: Unclear. I know less about the other services just because there's less information kind of floating around. But you can start to infer these kinds of details. And technically I don't know-- Dropbox might have different keys for different subsets of its users. Now they have business features, for instance, where you can actually put users in an organization, and within those narrowly defined organizations they might be using separate keys. So there are ways to re-scope the threat in question but at the-- AUDIENCE: [INAUDIBLE] DAVID J. MALAN: I'm sorry. AUDIENCE: The synchronization piece? DAVID J. MALAN: The synchronization piece is pretty much equivalent, yeah. They all offer roughly the same feature set. Exactly. Sean? AUDIENCE: [INAUDIBLE] if you share your whole folder and someone makes a change to those files in the folder, you get the same effect [INAUDIBLE]. They change it for you. But if you just give them a link, would [INAUDIBLE] the file themselves, just kind of read and not write? DAVID J. MALAN: That is correct.
And they will get a copy of whatever the file looks like when they initiate the download process, if you haven't changed it since. AUDIENCE: [INAUDIBLE] 01:06:04,987 --> 01:06:06,320 DAVID J. MALAN: That's fine too. And there's actually another feature built into the client now where you can make the folder read-only, where I presume, if someone does change it, Dropbox essentially undoes the change by deleting it and re-downloading the original version, or some such trick as well. All right, other questions as it relates to security? Yeah, David? AUDIENCE: It's kind of related to the Dropbox and the synchronization. A lot of the browsers also have a synchronization mechanism, like Firefox-- DAVID J. MALAN: Yep. AUDIENCE: --where you could just change your browser history and your bookmarks [INAUDIBLE] and does that store all the cookies or all your certificates as well? Does that get synchronized and unencrypted? DAVID J. MALAN: That's a good question. It definitely gets synchronized to some server. What they do is unclear. Certainly in this case, in the abstract, I'm not sure. And frankly, almost always if you go to some website, they will use some fluffy marketing speak like, "encrypted with industry-grade encryption." That could mean so many things. That could just mean they're using HTTPS, so that when your data goes from your browser to their servers, yes, it's encrypted and it's pretty safe mathematically, but what happens once it gets to their servers is not very well defined. In fact, this is one of the hard things, especially since most consumers aren't savvy enough yet in society to demand better clarity around these issues. The companies are really under no obligation, other than an ethical one, to disclose this information, and even then they might not want you to know what they really mean when they say, "this is secure." So in cases like that, you could try reading the fine print, you could try googling around for a white paper, as it would typically be called, on the security of that particular system. And, like, Dropbox has such documents available, for instance. But even then, the secret sauce is often not disclosed, either because it's particularly secret or they just don't want you to know exactly what the attack scenarios are. Not good for business. But even here, for the password manager I mentioned earlier, 1Password-- so personally, using Dropbox, like, I don't put personal stuff in it for this reason. I'll put encrypted stuff in it, where I will encrypt it, or I'll use a tool like 1Password, which will encrypt its data file, put a seemingly random sequence of zeros and ones in the Dropbox folder that gets synced, so that even if something does get compromised, all they have is seemingly random zeros and ones. Unfortunately, the world is not user-friendly enough yet to do all of this for us automatically. And so this is kind of-- we're kind of here. And I forget who asked this. Was there something more precise we can address? But risks of cloud tools, certainly Dropbox and password management and any number of other variants, all kind of relate to the security of cloud services. Per our discussion yesterday, I would say that you have physical threats, certainly. Like, Amazon is a physical place.
They have data centers in various parts of the world, and so there are certainly engineers at the end of the day that could get physical access to the hard drives and get that data off, but that's hopefully a fairly narrow threat, and most likely some random person working at Amazon doesn't care about your specific data. But there's only so much you can do at the end of the day if you don't physically control the space. In fact, there's a common kind of saying that the only secure computer is one in a locked closet with no power and no cords. I mean, that is truly secure. Everything after that starts to poke holes in the system. AUDIENCE: When I was asking the question [INAUDIBLE] Google Docs [INAUDIBLE] because of the security [INAUDIBLE]. So I'm just curious how secure was it because clearly there's a cost element. You don't have to pay for Microsoft Office [INAUDIBLE]. DAVID J. MALAN: Oh, those kinds of tools too. AUDIENCE: [INAUDIBLE] Google Docs. You could use [INAUDIBLE] you could use all those tools for free as opposed to paying for [INAUDIBLE] licenses. DAVID J. MALAN: Yeah. I mean, I would probably-- the free services are typically freemium services whereby there are paid versions. And so you're probably getting essentially the exact same tools and in turn the exact same security for something like Google Docs as you would if you paid for Google Apps. The difference being that, for certain governmental requirements, sometimes cloud-based companies will run separate servers that adhere to certain security protocols that are too expensive for consumers to care about or to commit to, but the US government might require that, hey, if we're going to use Amazon, you have to put us in our own locked concrete room with our own servers, on no one else's servers, no other country's servers, and so forth. But that would be more a matter of policy or practice. Box typically was much better at that in terms of focusing on the enterprise. Dropbox started more as a consumer company. And there are different levels of offerings along those lines. But the short answer is that a cloud-based service is not fundamentally secure, right? Unless you run it yourself, unless you're doing the encryption yourself, it's not fundamentally secure. And even then, the great tragedy in computing is, suppose I do download some program that encrypts information. PGP-- it's called Pretty Good Privacy. It's sort of a tongue-in-cheek kind of-- it's actually good privacy, but they call it Pretty Good Privacy. But if I use this kind of tool or any number of these algorithms I enumerated earlier, who's to say that the software you're using to encrypt your data-- which you might not even understand mathematically, and so you're just trusting that someone else understands this stuff better than you and thus wrote the software for you-- how do you know that that person didn't slip in their own backdoor? This is what the government, for instance, is so clamoring for-- give us backdoors into this. Years ago it was like the V-chip in TVs and so forth giving them this backdoor. Well, how do you know there aren't backdoors in our phones right now? Like, how do I know my phone is not listening to this entire class and transmitting it somewhere? I'd have to intercept it with some physical device. But, you know, unless the green light turns on on my laptop, I'm going to assume the camera's not on. And yet if you watch, like, Mr. Robot or other shows, I mean, this is possible.
If you have malware, malicious software installed on your computer, who's to say it didn't turn on my camera and just didn't turn on the green light and is recording everything I do and say? So this is kind of-- and you can go to the lowest level. Like, we will talk a little bit about this later today, but when you write software you typically use a program called an interpreter or a compiler that turns the words that you write into the zeros and ones, essentially, that the computer understands. Who's to say that the compiler or the interpreter you're using isn't adding to your own software malicious zeros and ones that are injecting backdoors into every piece of software we write? So unless you wrote the sort of software that understands the zeros and ones at the lowest level, like, you can't trust anything we use. You really are just computing while crossing your fingers. 01:12:36,780 --> 01:12:37,334 Felicia? AUDIENCE: I have a question on security. So I have deleted some text messages I didn't mean to delete and I wanted to retrieve them. So technically I guess they were lost somewhere. DAVID J. MALAN: OK. AUDIENCE: So I paid for some software that could retrieve them. DAVID J. MALAN: OK. AUDIENCE: So this tells me, OK, they're not really deleted off my phone, right? DAVID J. MALAN: OK. AUDIENCE: So again that's another backdoor [INAUDIBLE]. DAVID J. MALAN: I wouldn't call that a backdoor. Backdoor generally means there is a way of logging into or accessing a system outside of the normal means. That would just be how deletion is implemented. That's a bug or a feature. The phone was just forgetting where the files were, a la yesterday's chat about directories. Yeah. I mean-- AUDIENCE: So it's just forgetting? DAVID J. MALAN: Yeah. And, I mean, these days too we have so much performance capability in these phones, it's lazy and it's bad design, I would argue, if phones and computers these days are not actually deleting your files when you request as much. But we consumers, especially given messages like the DOS window from earlier, have never really been educated, fundamentally, to expect as much, and so people don't do it. And in fact you can't-- I mean, it's Apple's and Microsoft's own faults. Like, Apple at least, to their credit, for some time-- it's no longer in the most recent version-- had a Secure Empty Trash option, since removed, that would actually overwrite the data on your hard drive with zeros and ones. Nowadays I think this ends up impacting the drive too much because SSDs don't necessarily last as long as mechanical devices in terms of how many times you can write to them. But it's Microsoft's and Windows' own fault that they've not been doing this for us. So hopefully this tendency will change over time. And in the case of Apple, if you instead turn on something called FileVault, which is the equivalent of what iOS automatically does, you can make sure your hard drive is constantly encrypted so that if it is stolen or someone physically tries to access it, they only see random zeros and ones unless they know your password. Of course, if your password is super easy to guess, it's still insecure. So bad design, I'm guessing. Like, there's no way Apple would let you run software like that, so I'm guessing it's an Android phone? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: They were able to? What version of-- AUDIENCE: One or two versions of [INAUDIBLE]. DAVID J. MALAN: Really? Oh, OK. So it might be something that's since been-- AUDIENCE: [INAUDIBLE] text messages purely.
It was kind of gobbledygook, but it did retrieve some of the text messages. DAVID J. MALAN: Really? Interesting. Poorly implemented software that you were-- good for you, but bad that you were able to get them back nonetheless. AUDIENCE: They told me, yes, that [INAUDIBLE]. DAVID J. MALAN: Gotcha. And did they have to connect a cable to do this? AUDIENCE: No. You just download the software online [INAUDIBLE] and link that to an iPhone. DAVID J. MALAN: Oh, really. AUDIENCE: Up to the cloud, I mean. DAVID J. MALAN: Interesting. OK. Can't-- AUDIENCE: [INAUDIBLE] backup copy of her [INAUDIBLE]. DAVID J. MALAN: Oh, that's what it is. So it was-- AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK, good, good. AUDIENCE: I did it before for my old business [INAUDIBLE], my old phone-- DAVID J. MALAN: OK. AUDIENCE: --or whatever, and I pulled the backup [INAUDIBLE]. DAVID J. MALAN: That makes sense. Yeah, you can't just toss words like "cloud" around now. That has no meaning anymore after today and yesterday. OK, other questions? All right, well, let's go ahead and pause here. Let's take a 15-or-so-minute break. And when we come back, we will focus on what it means to program.