[MUSIC PLAYING] DAVID MALAN: All right, one last time. This is CS50, and we realize this has been a bit of a fire hose over the past-- thank you. [APPLAUSE] Thank you. We realize this has been a bit of a fire hose. Indeed, recall that we began the class in week 0, months ago with this here MIT hack, wherein a fire hose was connected to a fire hydrant, in turn connected to a water fountain. And it really spoke to just how much information we predicted would be sort of flowing at you over the past few months. If you are feeling all these weeks later that it never actually got easy, and with pset 1 to pset 2, pset 3 on to pset 9, you never quite felt like you got your footing, realize that it's kind of by design because every time you did get your-- every time you did get your footing, our goal was to ratchet things up a little bit more so that you feel like you're still getting something out of that final week. And indeed, that final week is now behind us. All that remains ahead of us is the final project. And what we thought we'd do today is recap a little bit of where we began and where you hopefully now are. Take a look at the world of cybersecurity, because it's a scary place out there, but hopefully you're all the more equipped now with a mental model and vocabulary to evaluate threats in the real world, and as educated people, make decisions, be it in industry, be it in government, be it in your own personal or professional lives. And we hope ultimately, too, that you've walked away with a very practical skill, including how to program in C, how to program in Python, how to program in SQL, how to program in JavaScript in the context, for instance, of even more HTML, CSS, and the like. But most importantly, we hope that you've really walked away with an understanding of how to program. Like, you're not going to have CS50 by your side or even the duck by your side forever. 
What you're going to have, really, is that foundation that hopefully you'll walk out of here today having accumulated over the past few months. And even though the world's languages are going to change, new technologies are going to exist tomorrow, hopefully, you'll find that a lot of the foundations over the past several months really do stay with you and allow you to bootstrap to a new understanding, even if you never take another CS course again. Ultimately, we claim that this was all about solving problems. And hopefully, we've kind of cleaned up your thinking a little bit, given you more tools in your toolkit to think and evaluate and solve problems more methodically, not only in code, but just algorithmically as well. And keep this in mind too. If you're still feeling like, oh, I never really quite got my footing, think back to how hard Mario might have felt some three months ago. But what ultimately matters in this course is indeed, not so much where you end up relative to your classmates, but where you end up relative to yourself when you began. So here we are, and consider that there delta. And if you don't believe me, like, literally go back this weekend or sometime soon, try implementing Mario in C. And I do dare say it's going to come a little more readily to you. Even if you need to Google something, ask the duck something, ask ChatGPT something just to remember some stupid syntactic detail, the ideas hopefully are with you now for some time. So that there hack is actually fully documented here at MIT. Our friends down the road have a tradition of doing such things every year. One year, one of my favorites was they turned the dome of MIT into a recreation of R2-D2. So there's a rich history of going to great lengths to prank each other, or even us here Harvard folks, akin to the Harvard-Yale video we took a look at last time. 
And this duck has really become a defining characteristic of late of CS50, so much so that last year, at the CS50 Hackathon, we invited the duck along. It posed, as it is here, for photographs with your classmates past. And then around like, 4:00 AM, it disappeared, and the duck went missing. And we were about to head off to IHOP. Our friends from Yale, your former classmates, had just kind of packed up and started driving back to New Haven. And I'm ashamed to say our first thought was that Yale took it. And we texted our TA friends on the shuttle buses, 4:30 AM asking, hey, did you take our duck because we kind of need it next week for the CS50 fair? And I'm ashamed to say that we thought so, but it was not in fact, them. It was this guy instead, down the road. Because a few hours later after, I think, no sleep on much of our part, we got the equivalent of a ransom email. "Hi, David, it's your friend, bbd. I hope you're well and not too worried after I left so abruptly yesterday night after such a successful Hackathon and semester so far. I just needed to unwind a bit and take a trip to new places and fresh air. Don't worry though, I will return safe, sound, healthy, home once I am more relaxed. As of right now, I'm just spending some few days with our tech friends up Massachusetts Avenue. They gave me a hand on moving tonight. For some reason, I could never find my feet, and they've been amazing hosts. I will see you soon and I will miss you and Harvard specially our students. Sincerely yours, CS50 bbd." So almost a perfect hack. They didn't get the DDB detail quite right. But after this, they proceeded to make a scavenger hunt of sorts of clues here. This here is Hundredville. And so in Hundredville, they handed out flyers to students at MIT, inviting folks to write a Python program to solve a mystery. "The CS50 duck has been stolen. 
The town of Hundredville has been called on you to solve the mystery of the-- authorities believe that the thief stole the duck and then shortly thereafter took a walk out of town. Your goal is to identify who the thief is, what school the thief escaped to, and who the thief's accomplice is who helped them escape. This took place on December 2, 2022, and took place at the CS50 Hackathon." In the days to come, we proceeded to receive a series of ransom postcards as the duck traveled, not only to MIT, to Professor John Guttag's 6.100B class, which is a rough equivalent of CS50 down the road. Pictured there is our CS50 duck with some tape on its torso. But then the duck took, apparently, a ride, either in actuality or with Photoshop, not only there, but took a tour of the Charles River in front of Harvard, the Charles in front of Boston. It went all the way over to Yale. We then received this postcard from Princeton, all the way over from Stanford. Duck took a flight according to this photo here, and then saw a bit of the world as well. So eventually, we received a follow-up email saying, "Hi, David. I intend to arrive for the fair between 8:37 AM and 9:47 AM. It would be easier for my MIT hacker friends to bring me to the right location if there's someone waiting there with a sign that says 'Duck'." I'm not sure if we actually stood there holding a sign that said duck, but it turns out they came actually earlier in the morning to escape detection altogether. The duck found its home and everyone lived happily ever after. And here the duck is again today. But props to our friends down the road at MIT for returning the duck safely and for going to such crazy lengths to put us in the annals of MIT's Hacks Gallery. In fact, in exchange for this, we sent them a little package. And without telling you what it is, you can read more about this here hack that's now been immortalized on hacks.mit.edu at this URL here. 
So maybe round of applause for our friends down the road for having pulled that off a year ago. [APPLAUSE] So before we dive into some of today's material, I wanted to give you a sense of what lies ahead as well. So this year's CS50 Hackathon is an annual tradition, whereby students here at Harvard and our friends from Yale who will take buses in the other direction to join us in about a week's time for an epic all-nighter, starting roughly at 7:00 PM ending roughly at 7:00 AM will be punctuated by multiple meals, first meal-- first dinner around 9:00 PM, second dinner around 1:00 AM. And those of you who still have the energy and are still awake around 5:00 AM, we'll hop in a shuttle bus and head down to IHOP, the larger one down the road, not the one in the square, and have a little bit of breakfast together. The evening typically begins a little bit like this with a lot of energy, the focus of which is entirely on final projects. The staff will be present, but the intent is not to be 12 hours of office hours. Indeed, the staff will be working on their own projects or psets, final projects, and the like, but to guide you toward and point you in the direction of solutions to new problems you have. And we do think that the duck, and in turn, AI, CS50.ai and other tools you'll now be able to use, including the actual ChatGPT, the actual GitHub Copilot, or other AI tools which are now reasonable to use at this point in the semester as you off board from CS50 and enter the real world. Should be an opportunity for you to take your newfound knowledge of software out for a spin and build something of your very own, something that even maybe the TFs and myself have never dabbled in before, but with all of this now software support by your side. This here is our very own CS50 shuttles that will take us then to IHOP. 
And then a week after that is the epic CS50 fair, which will be an opportunity to showcase what it is you'll pull off over the next few weeks to students, faculty, and staff across campus. More details to come, but you'll bring over your laptop or phone to a large space on campus. We'll invite all of your friends, even family if they're around. And the goal will be simply to have chats like this and present your final project to passersby. There'll be a bit of an incentive model, whereby anyone who chats you up about your project, you can give a little sticker to. And that will enter them into a raffle for fabulous prizes to grease the wheels of conversations as well. And you'll see faculty from across campus join us as well. But ultimately, you walk out of that event with this here CS50 shirt, or one like it, so you too can proudly proclaim that you indeed took CS50. So all that and more to come, resting on, finally, those final projects. But how to get there? So here is some general advice that's not necessarily going to be applicable to all final projects. But as we exit CS50 and enter the real world, here are some tips on what you might read, what you might download, sort of starting points in answer to the FAQ, what now? So for instance, if you would like to begin to experience on your own Mac or PC more of the programming environment that we provided to you, sort of turnkey style in the cloud using cs50.dev, you can actually install command line tools on your own laptop, desktop, or the like. For instance, Apple has their own. Windows has their own. So you can open a terminal window on your own computer and execute many of the same commands that you've been running in Linux this whole term. Next, learning Git. Git is version control software. And it's very, very popular in industry. And it's a mechanism for saving multiple versions of your files. 
Now, this is something you might be familiar with if you're still using file names for this in the real world, like on your Mac or PC-- maybe this is resume version 1, resume version 2, resume Monday night version, resume Tuesday, or whatever the case may be. If you're using Google Docs, this happens automatically nowadays. But with code, it can happen automatically, but also more methodically using this here tool. And Git is a very popular tool for collaborating with others as well. And you've actually been secretly using it underneath the hood for a lot of CS50's tools. But we've abstracted away some of the details. But Brian, via this video and any number of other references, can peel back that abstraction and show you how to use it more manually. You don't need to use cs50.dev anymore but you are welcome to. You can instead install VS Code onto your own Mac or PC. If you go to this first URL here, it's a free download. It's actually open source. So you can even poke around and see how it itself is built. And in CS50's own documentation, we have some tips for making it look like CS50's environment even if longer term, you want to cut the cord entirely. What can you now do? Well, many of you for your final projects will typically tackle websites, sort of building on the ideas of problem set 9, CS50 Finance and the like, or just generally something dynamic. But if you instead want to host a portfolio, like just your resume, just projects you've worked on and the like, static websites can be hosted for free via various services. A popular one is at this URL here, called GitHub Pages. There's another service that offers a free tier called Netlify that can allow you to host your own projects statically for free. But when it comes to more dynamic hosting, you have many more options. And these are just some of the most popular. The first three are some of the biggest cloud providers nowadays, whether it's Amazon or Microsoft Azure or Google services. 
If you go to this fourth URL here, this is GitHub's education pack. They essentially broker with lots of different companies to give students, specifically, discounts on or free access to a lot of tools. So you might want to sign up for that while you're eligible. And then lastly, here are two other popular third-party services, not free, but very commonly used when you want to host actual web applications. So maybe it's Flask, maybe it's something else, but something that involves some input and output. Questions meanwhile-- so there's just lots of communities. If you want to keep an eye on what's happening in tech, these are just some of the popular options. And undoubtedly, if you have some techie friends, they'll have suggestions as well. But you might find some of these destinations of interest. Of course, increasingly, you will just ask questions of software itself, AI, whether it's ChatGPT, GitHub Copilot, or the like. And then classes, we're clearly a little biased here with what's on the screen. So these aren't college classes per se, but freely available OpenCourseWare courses that CS50's team has put together over time. And in a nutshell, as you can infer from the suffix of each of these URLs, if you want to learn more about Python, CS50 has got a free, open online class for that, or SQL, thanks to Carter, web and AI stuff, thanks to Brian, a games class, thanks to Colton, cybersecurity, which will extend where we leave off today. And then if you're more interested, not so much in coding and going more deeply into software, but want to take a step higher level and focus more on intersections of computer science with business or law or technology, those two are freely available, if you're looking for something to do over January, the summer, or just to dabble over time. And there are innumerable other free resources from other folks on the internet as well, certainly. All right, so a few invitations and thank yous. 
So one, after today, after we dive into and out of cybersecurity, please do stay in touch via any of CS50's online communities. As we start to recruit next year's team for teaching fellows, teaching assistants, course assistants, we'll be in touch via email for those opportunities as well. And now some thanks for the group before we then dive into today's topic. So one, allow me to thank our hosts here for giving us access to such a wonderful, privileged space to just hold classes in, the whole team for Memorial Hall. Our thanks too, to ESS, which is the team that makes everything sound so good in spaces like this with music, mics, and the like, and, of course, our friend Wesley down the road at Changsho, where we went most every other Friday this semester. If you've never actually been, or if you're hearing this online, please join our friends at Changsho on Mass Ave down the road any time you might like. And then especially, CS50's team-- there's quite a few humans operating cameras in the room, both here and way in back, as well as online. My thanks. [APPLAUSE] Thank you to them for making this look and sound so good. And what you don't see is when I do actually screw up, even if we don't fix it in real time, they very kindly help us go back in time, fix things, so that your successors have hopefully, an even improved version as well. And then as well, CS50's own Sophie Anderson, who is the daughter of one of CS50's teaching fellows who lives all the way over in New Zealand, who has wonderfully brought the CS50 duck to life in this animated form. Thanks to Sophie, this duck is now everywhere, including most recently, on some T-shirts too. But of course, we have this massive support structure in the form of the team. These are some of our past team members, who wonderfully via Zoom, you'll recall in week seven, showed us how TCP/IP works by passing those envelopes up, down, left, and right. 
I commented at the time, as a disclaimer, that it actually took us quite a bit of effort to do that. And so I thought I would share as a representative thanks of our whole teaching team, whether it's Carter and Julia and Ozan and Cody and all of CS50's team members in Cambridge and in New Haven, I thought I'd give you a look behind the scenes at how things go indeed, behind the scenes that you don't necessarily see. So let me switch over here and hit play. [VIDEO PLAYBACK] [INAUDIBLE] [INAUDIBLE] Buffering. OK. Josh? Nice. Helen? Oh. [CHUCKLING] [INAUDIBLE] Moni-- no, oh, wait. That was amazing, Josh. Sophie. Amazing. That was perfect. Moni. [LAUGHTER] I think I-- [INTERPOSING VOICES] - Over to you, [INAUDIBLE]. Guy. That was amazing. Thank you all. - So good. [END PLAYBACK] DAVID MALAN: All right, these outtakes aside, my thanks to the whole teaching team for making this whole class possible. [APPLAUSE] So cybersecurity, this refers to the process of keeping secure our systems, our data, our accounts, and more. And it's something that's going to be increasingly important, as it already is, just because of the sheer omnipresence of technology on our desks, on our laps, in our pockets, and beyond. So exactly what is it? And how can we, as students of computer science over the past many weeks, think about things a little more methodically, a little more carefully, and maybe even put some numbers to the intuition that I think a lot of you probably have when it comes to deciding, is something secure or is it not? So first of all, what does it mean for something to be secure? How might you as citizens of the world now answer that question? What does it mean to be secure? AUDIENCE: Resistant to attack. DAVID MALAN: OK, so resistant to attack, I like that formulation. Other thoughts on what it means to be secure? What does it mean? Yeah. AUDIENCE: You control who has access to it. DAVID MALAN: Yeah, so you control who has access to something. 
And there's these techniques known as authentication, like logging in, authorization, deciding whether or not that person, once authenticated, should have access to things. And, of course, you and I are very commonly in the habit of using fairly primitive mechanisms still. Although, we'll touch today on some technologies that we'll see all the more of in the weeks and months and years to come. But you and I are pretty much in the habit of relying on passwords for most everything still today. And so we thought we'd begin with exactly this topic to consider just how secure or insecure is this mechanism and why and see if we can't evaluate it a little more methodically so that we can make more than intuitive arguments, but quantitative compelling arguments as well. So unfortunately we humans are not so good at choosing passwords. And every year, accounts are hacked into. Maybe yours, maybe your friends, maybe your family members have experienced this already. And this unfortunately happens to so many people online. But, fortunately, there are security researchers in the world that take a look at attacks once they have happened, particularly when data from attacks, databases, are posted online or on the so-called dark web or the like and downloaded by others for malicious purposes, they can also conversely provide us with some insights as to the behavior of us humans that might give us some insights as to when and why things are getting attacked successfully. So as of last year, here, for instance, according to one measure are the top 10 most popular, a.k.a. worst passwords-- at least according to the data that security researchers have been able to glean-- by attacks that have already happened. So the number one password as of last year, according to systems compromised, was 123456. The second most, admin. The third most, 12345678. And thereafter, 123456789, 1234, 12345, password, 123, Aa123456, and then 1234567890. 
So you can actually infer-- sort of goofy as some of these are-- you can actually infer certain policies from these, right? The fact that we're taking so little effort to choose our passwords seems to correlate really with probably, what's the minimum length of a password required for systems? And you can see that at worst, some systems require only three-digit passwords. And maybe they might require six or eight or nine or even 10. But you can kind of infer corporate policies from these passwords alone. If you keep going through the list, there's some funnier ones even down the list that are nonetheless enlightening. So, for instance, lower on the list is Iloveyou, no spaces. Sort of adorable, maybe it's meaningful to you. But if you can think of it, so can an adversary, so can some hacker, so much so that it's this popular on these lists. Qwertyuiop, it's not quite English, but it's derivative of English keyboards. Anyone? Yeah, so this is, if you look at a US English keyboard, it's just the top row of keys if you just hit them all together left to right to choose, therefore, your password. And then this one, "password," which has an at sign for the A and a zero for the O, and I'm guessing some of you do similar tricks. But this is the thing too, if you think like you're being clever, well, there's a lot of adversaries out there who are just as good at being clever. So even heuristics like this that in the past, to be fair, you might have been taught to do because it confuses adversaries' or hackers' attempts, unfortunately, if you know to do it, so does the adversary. And so your accounts aren't necessarily any more secure as a result. So what are some of our takeaways from this? Well, one, if you have these lists of passwords, all too possible are, for instance, dictionary attacks. 
Like we literally have published on the internet-- and there's a citation in the slides if you're curious-- of these most popular passwords in the world. So what's a smart adversary going to do when trying to get into your account? They're not necessarily going to try all possible passwords or try your birthday or things like that. They're just going to start with this top 10 list, this top 100 list. And odds are, statistically, in a room this big, they're probably going to get into at least one person's account. But let's consider maybe a little more academically what we can do about this. And let's start with something simple like the simplest, the most omnipresent device we might all have now is some kind of mobile device like a phone. Generally speaking, Apple and Google and others are requiring of us that we at least have a passcode or at least you're prompted to set it up even if you therefore opt out of it. But most of us probably have a passcode, be it numeric or alphabetic or something else. So what might we take away from that? Well, suppose that you do the bare minimum. And the default for years has generally been having at least four digits in your passcode. Well, what does that mean? Well, how secure is that? How quickly might it be hacked? And, in fact, Carter, would you mind joining me up here? Perhaps we can actually decide together how best to proceed here. If you want to flip over to your other screen there, we're going to ask everyone to go to-- I'll pull it up here-- this URL here if you haven't already. And this is going to pull up a polling website that's going to allow you in a moment to answer some multiple choice questions. This is the same URL as earlier if you already logged in. And in just a moment, we're going to ask you a question. And I think, can we show the question before we do this? Here's the first question from Carter here. 
How long might it take to crack-- that is, figure out-- a four-digit passcode on someone's phone, for instance? How long might it take to crack a four-digit passcode? Why don't we go ahead and flip over to see who is typing in what. And we'll see what the scores are already. All right, and it looks like most of you think a few seconds. Some of you think a few minutes, a few hours, a few days. So I'd say most of you are about to be very unpleasantly surprised. In fact, the winner here is indeed going to be a few seconds, but perhaps even faster than that. So, in fact, let me go ahead and do this. Thank you to Carter. Let me flip over and let me introduce you to, unfortunately, what's a very real world problem known as a brute force attack. As the word kind of conjures, if you think to-- back to yesteryear when there was some kind of battering ram trying to brute force their way into a castle door, it just meant trying to hammer the heck out of a system. A castle, in that case, to get into the destination. Digitally though, this might mean being a little more clever. We all know how to write code in a bunch of different languages now. You could maybe open up a text editor, write a Python program to try all possible four-digit codes from 0000 to 9999 in order to figure out exactly, how long does it actually take? So let's first consider this. Let me ask the next question. How many four-digit passcodes are there? Carter, if you wouldn't mind joining me and maybe just staying up with me here to run our second question at this same URL. How many four-digit passcodes are there in the world? On your phone or laptop, you should now see the second question. And the answers include 4, 40, 9,999, 10,000, or it's OK to be unsure. Let's go ahead and flip over to the results. And it looks like most of you think 10,000. And, indeed, that is the case. Because if I kind of led you with 0000 to 9999, that's 10,000 possibilities. So that is, in fact, a lot. 
But most of you thought it'd take maybe a few seconds to actually brute force your way into that. Let's consider how we might measure how long that actually takes. So thank you. So in the world of a four-digit passcode-- and they are, indeed, digits, decimal digits from 0 to 9-- another way to think about it is there's 10 possibilities for the first digit, 10 for the next, 10 for the next, and 10 for the last. So that really gives us 10 times itself four times or 10,000 in total. But how long does that actually take? Well, let me go ahead and do this. I'm going to go ahead and open up on my Mac here, not even Codespaces or cs50.dev today. I'm going to open up VS Code itself. So before class, I went ahead and installed VS Code on my own Mac here. It looks almost the same as Codespaces, though the windows might look a little different and the menus as well. And I've gone ahead here and begun a file called crack.py. To crack something means to break into it, to figure out in this case what the passcode actually is. Well, how might I write some code to try all 10,000 possible passcodes? And, heck, even though this isn't quite going to be like hacking into my actual phone, I bet I could find a USB or a lightning cable, connect the two devices, and maybe send all of these passcodes to my device trying to brute force my way in. And that's indeed how a hacker might go about doing this if the manufacturer doesn't protect against that. So here's some code. Let me go ahead and do this. from string import digits. This isn't strictly necessary. But in Python, there is a string library from which you can get all of the decimal digits just so I don't have to manually type out 0 through 9. But that's just a minor optimization. 
But there's another library called itertools, tools related to iteration, doing things in like a looping fashion, where I can import a cross product function, a function that's going to allow me to combine like all numbers with all numbers again and again and again for the length of the passcode. Now I can do a simple Python for loop like this. For each passcode in the cross product of those 10 digits repeated four times. In other words, this is just a programmatic Pythonic way to implement the idea of combining all 10 digits with itself four times in a loop in this fashion. And just so we can visualize this, let's just go ahead and print out the passcode. But if I did have a lightning cable or a USB cable, I wouldn't print it. I would maybe send it through the cable to the device to try to get through the passcode screen. So we can revisit now the question of how long might it take to get into this device. Well, let's just try this. Python of crack.py. And assume, again, it's connected via cable. So we'll see how long this program takes to run and break into this here phone. Done. So that's all it took for 10,000 iterations. And this is on a Mac that's not even the fastest one out there. You could imagine doing this even faster. So that's actually not necessarily all the best for our security. So what could we do instead of 10 digits? Well, most of you have probably upgraded a lot of your passwords to maybe being alphabetical instead. So what if I instead were to ask the question-- and Carter, if you want to rejoin me here in a second-- what if I instead were to consider maybe four-letter passcodes? So now we have A through Z four times. And maybe we'll throw into the mix uppercase and-- well, let's just keep it four letters. Let's just go ahead and do maybe uppercase and lowercase, so 52 possibilities. This is going to give us 52 times 52 times 52 times 52. And anyone want to ballpark the math here, how many possible four-letter passcodes are there, roughly? 
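As a rough sketch of the crack.py program described above: the exact file from lecture is not reproduced in this transcript, so treat the following as an approximation. The "cross product function" is assumed to be itertools.product, the helper name all_passcodes is a hypothetical addition for illustration, and joining each tuple into a string is likewise an assumption, since product yields tuples of characters.

```python
from itertools import product
from string import digits


def all_passcodes(alphabet, length):
    """Yield every passcode of the given length over alphabet, in order."""
    for combo in product(alphabet, repeat=length):
        # product yields tuples like ('0', '0', '0', '0'); join them into a string
        yield "".join(combo)


if __name__ == "__main__":
    for passcode in all_passcodes(digits, 4):
        # A real attack would send each guess to the device instead of printing it
        print(passcode)
```

Run as a script, this prints every code from 0000 through 9999, which matches the 10,000 possibilities counted earlier.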
7 million, yeah, so roughly 7 million, which is way bigger than 10,000. So, oh, I spoiled this, didn't I? Can you flip over? So how many four-letter passcodes are there? It seems that most of you, 93% of you, in fact, got the answer right. Those of you who are changing your answer-- there we go, no, definitely not that. So, anyhow, I screwed up. Order of operations matters in computing and, indeed, in lectures. So 7 million, so the segue I wanted to make is, OK, how long does that actually take to implement in code? Well, let me just tweak our code here a little bit. Let me go ahead and go back into VS Code on my Mac in which I had the same code as before. So let me shrink my terminal window, go back to the code from which I began. And let's just actually make a simple change. Let me go ahead and simply change digits to something called ascii_letters. And this too is just a time saving technique. So I don't have to type out A through Z in uppercase and lowercase, like 52 total characters. And so I'm going to change digits to ascii_letters. And we'll get a quantitative sense of how long this takes. So Python of crack.py, here's how long it takes to go through 7 million possibilities. All right, clearly slower because we haven't seen the end of the list yet. And you can see we're going through all of the lowercase letters here. We're about to hit Z. But now we're going through the uppercase letters. So it looks like the answer this time is going to be a few seconds, indeed. But definitely less than a minute, it would seem, at least on this particular computer. So odds are if I'm the adversary and I've plugged a cable into someone's phone-- maybe I'm not here in a lecture, but in Starbucks or an airport or anywhere where I have physical opportunity to grab that device and plug a cable in-- it's not going to take long to hack into that device either. So what might be better than just digits and letters from the real world? 
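To put a number on the four-letter case above, here is a minimal sketch that counts the combinations and times the enumeration. The function name count_passcodes is hypothetical, and the timing will vary by machine; printing each guess, as the lecture demo does, would make it slower still.

```python
import time
from itertools import product
from string import ascii_letters


def count_passcodes(alphabet, length):
    """Count passcodes by enumerating them, without printing each one."""
    return sum(1 for _ in product(alphabet, repeat=length))


if __name__ == "__main__":
    start = time.perf_counter()
    total = count_passcodes(ascii_letters, 4)
    elapsed = time.perf_counter() - start
    # total should equal 52 ** 4, i.e. 7,311,616
    print(f"{total:,} four-letter passcodes enumerated in {elapsed:.2f} seconds")
```

The count agrees with the "roughly 7 million" figure: 52 times itself four times is 7,311,616.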
So add in some punctuation, which like almost every website requires that we do. Well, if we want to add punctuation into the mix, if I can get this segue correct so that we can now ask Carter one last time, how many four-character passcodes are possible where a character is an uppercase or lowercase letter or a decimal digit or a punctuation symbol? If you go to your device now, you'll see-- if we want to flip over to the screen-- these possibilities. There's a million, maybe, a billion, a trillion, a quadrillion, or a quintillion when it comes to a-- oh, wrong question. Wow, we're new here, OK. OK, we're going to escalate things here. How many eight-character passcodes are possible? We're going to make things more secure, even though I said four. We're now making it more secure to eight. All right, you want to flip over to the chart? All right, so it looks like most of you are now erring on the side of quintillion or quadrillion. 1% of you still said million, even though there's definitely more than there were a moment ago. But that's OK. So quadrillion-- quintillion is still winning. And I think if we go and reveal this, with the math, you should be doing is 94 to the 4th power. Because there's 26 plus 26 plus 10 plus some more digits, some punctuation digits in there as well. So it's actually, oh, this is the other example, isn't it? This is embarrassing. All right, we had a good run in the past nine weeks instead. All right, so if you were curious as to how many four-character passwords are possible, it's 78 million. But that's not the question at hand. The question at hand was, how many eight character passcodes are there? And in this case, the math you would be doing is 94 to the 8th power, which is a really big number. And, in fact, it's this number here, which is roughly 6 quadrillion possibilities. Now, I could go about actually doing this in code here. So let me actually, for a final flourish, let me open up VS Code one last time here. 
And in VS Code, I'm going to go ahead and shrink my terminal window, go back into the code, and I'm going to import not just ASCII letters, not just digits, but punctuation as well, which is going to give me like 32 punctuation symbols from a typical US English keyboard. And I'm going to go ahead and just concatenate them all together in one big list by using the plus operator in Python to plus in both digits and punctuation. And I'm going to change the 4 to an 8. So this now, it's what four actual lines of code is, all it takes for an adversary to whip up some code, find a cable as step two, and hack into a phone that even has eight-character passcodes. Let me enlarge in my terminal window here, run for a final time Python of crack.py. And this I'll actually leave running for some time. Because you can get already sort of a palpable feel of how much slower it is-- because these characters clearly haven't moved-- how long it's going to take. We might actually do-- need to do a bit more math. Because doing just four-digit passcodes was super fast. Doing four-letter passcodes was slower, but still under a minute. We'll see maybe in time how long this actually runs for. But this clearly seems to be better, at least for some definition of better. But it should hopefully not be that easy to hack into a system. What does your own device probably do to defend against that brute force attack? Yeah. AUDIENCE: Gives you a limited number of tries. DAVID MALAN: Yeah, so it gives you a limited number of tries. So odds are, at least once in your life, you've somehow locked yourself out of a device, typically after typing your passcode more than 10 times or 10 attempts or maybe it's your siblings or your roommate's phone that you realize this is a feature of iPhones and Android devices as well. But here's a screenshot of what an iPhone might do if you do try to input the wrong passcode maybe 10 or so times. Notice that it's really telling you to try again in one minute. 
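Rather than wait for that program to finish, the size of the search space can be estimated directly. The sketch below builds the same 94-character alphabet with the plus operator; the guessing rate is an assumed, purely illustrative figure, not a measurement of any real device.

```python
from string import ascii_letters, digits, punctuation

# 52 letters + 10 digits + 32 punctuation symbols = 94 characters
alphabet = ascii_letters + digits + punctuation
total = len(alphabet) ** 8

GUESSES_PER_SECOND = 1_000_000  # assumed rate, for illustration only
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years = total / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"{total:,} passcodes, roughly {years:,.0f} years at the assumed rate")
```

Even at a million guesses per second, which is likely generous for a phone on a cable, the worst case runs to centuries rather than seconds.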
So this isn't fundamentally changing what the adversary can do. The adversary can absolutely use those same four lines of code with a cable and try to hack into your device. But what has this just done? It's significantly increased the cost to the adversary, where the cost might be measured in the sheer amount of time-- like seconds, minutes, hours, days, or beyond. Maybe it's increased the cost in the sense of risk. Why? Because if this were like a movie incarnation of this and the adversary has just plugged into the phone and is kind of creepily looking around until you come back, it's going to take way too long for them to safely get away with that, assuming your passcode is not 123456 but somewhere in the middle of that massive search space. So this just kind of fundamentally raises the bar to the adversary. And that's one of the biggest takeaways of cybersecurity in general. It's completely naive to think in terms of absolute security or to even say a sentence like "my website is secure" or even "my home is physically secure." Why? Well, for a couple of reasons, like, one, an adversary with enough time, energy, motivation, or resources can surely get into most any system and can surely get into most any home. But the other thing to consider, unfortunately, is that if we're the good people in this story and the adversaries are the bad people, you and I rather have to be perfect. In the physical world, we have to lock every door, every window. Because if we mess up just one spot, the adversary can get in. And so there's sort of this imbalance. The adversary just has to find the window that's ajar to get into your physical home. The adversary just needs to find one user who's got a really bad password to somehow get into that system. And so cybersecurity is hard. And so what we'll see today really are techniques that can let you create a gauntlet of defenses-- so not just one, but maybe two, maybe three.
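To put a number on that increased cost, here is a rough sketch that assumes a one-minute lockout after every 10 wrong guesses, as in the screenshot described above. The function name and the fixed, non-escalating delay are simplifying assumptions, not any vendor's actual policy, and the time spent on each guess itself is ignored.

```python
def worst_case_seconds(total_passcodes, tries_per_lockout=10, lockout_seconds=60):
    """Total time spent waiting out lockouts if the last passcode tried is correct."""
    # Every full batch of wrong guesses before the final guess triggers one lockout
    lockouts = (total_passcodes - 1) // tries_per_lockout
    return lockouts * lockout_seconds


seconds = worst_case_seconds(10_000)
print(seconds / 3600, "hours for four-digit passcodes")
```

Under these assumptions the four-digit search goes from effectively instant to most of a day, which is the sense in which the bar has been raised.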
And even if the adversary gets in, another tenet of cybersecurity is, at least, let's have mechanisms in place that detect the adversary, some kind of monitoring, automatic emails. You can increasingly see this already in the real world. If you log into your Instagram account from a different city or state suddenly because maybe you're traveling, you will-- if you've opted into settings like these-- often get a notification or an email saying, hey, you seem to have logged in from Palo Alto rather than Cambridge. Is this, in fact, you? So even though we might not be able to keep the adversary out, let's at least minimize the window of opportunity or damage by letting humans like us know that something's been compromised. Of course, there is a downside here. And this is another theme of cybersecurity. Every time you improve something, you've got to pay a price. There's going to be a tradeoff. And we've seen this with time and space and money and other such resources when it comes to designing systems already. What's the downside of this mechanism? Why is this perhaps a bad thing or what's the downside to you, the good person in the story? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, if you've just forgotten your passcode, it's going to be more difficult for you to log in. Or maybe you just really need to get into your phone now and you don't really want to wait a minute. And, worse, if you keep trying, sometimes it'll change to two minutes, five minutes, one hour. It'll increase exponentially. Why? Because Apple and Google figure that, even though they don't necessarily know what the right cutoff is-- maybe it's 10, maybe it's fewer, maybe it's more-- at some point, it is much more likely that this is a hacker trying to get in than it is you forgetting your passcode. But in the corporate world, it can be even worse. There's a feature that lets phones essentially self-destruct whereby rather than just making you wait a minute, it will wipe the device, more dramatically.
The presumption being that, no, no, no, no, no, if this is a corporate phone, let's lock it down further so that if it is an adversary, the data is gone after 10 failed attempts. But there's other mechanisms as well. In addition to logging into phones via passcodes, there's also websites like Gmail, for instance. And it's very common, therefore, to log in to websites like these. And odds are, statistically, a lot of you are in the habit of reusing passwords. Like, no, don't nod if you are. We have cameras everywhere. But maybe you're in the habit of reusing it. Why? Because it's hard to remember really big long cryptic passwords. So mathematically, there's surely an advantage there. Why? Because it just makes it so much harder, more time-consuming, more risky for an adversary to get in. But the other tradeoff is like, my God, I just can't even remember most of my passwords as a result unless I reuse the one good password I thought of and memorized already or maybe I write it down on a post-it note on my monitor, as all too often happens in corporate workplaces. Or maybe you're being clever and in your top right drawer, you've got a printout of all of your accounts. Well, if you do, like ha-ha, so do a lot of other people. Or maybe it's a little more secure than that, but there are sociological side effects of these technological policies that really until recent years were maybe underappreciated. The academics, the IT administrators were mandating policies that you and I as human users were not necessarily behaving properly in the face of. So nowadays, there are things called password managers. And a password manager is just a piece of software on Macs, on PCs, on phones that manages your passwords for you. What this means specifically is when you go to a website for the very first time, you, the human, don't need to choose your password anymore. You instead click a button or use some keyboard shortcut.
And the software generates a really long cryptic password for you that's not even eight characters. It might be 16 or 32 characters, can be even bigger than that, but with lots of randomness. Definitely not going to be on that top 10 or that top 100 list. The software thereafter remembers that password for you and even your username, whether it's your email address or something else. And it saves it onto your Mac or your phone or your PC's disk or hard drive. The next time you visit that same website, what you can do is via menu or, better yet, a keyboard shortcut, log into the website without even remembering or even knowing your password. I mean, to this day, I'll tell you, I don't even know anymore 99% of my own passwords. Rather, I rely on software like this to do the heavy lifting for me. But there's an obvious downside here, which might be what if you're doing this? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Right, so what if they find out the one password that's protecting this software? Because unstated by me up until now is that this password manager itself has a primary password that protects all of those other eggs in the one basket, so to speak. And my one primary password for my own password manager, it is really long and hard to guess. And the odds that anyone's going to guess are just so low that I'm comfortable with that being the one really difficult thing that I've committed to my memory. But the problem is if someone does figure it out nonetheless somehow or, worse, I forget what it is. Now, I've not lost access to one account, but all of my accounts. Now, that might be too high of a price to pay. But, again, if you're in the habit of choosing easy passwords like being on that top 10 list, reusing passwords, it's probably a net positive to incur this single risk versus the many risks you're incurring across the board with all of these other sites. 
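The password-generation step a password manager performs can be sketched with Python's standard secrets module, which is designed for security-sensitive randomness. The function name and the default length of 32 are assumptions for illustration; real password managers have their own generators and also handle storage and encryption, which this does not.

```python
import secrets
from string import ascii_letters, digits, punctuation

ALPHABET = ascii_letters + digits + punctuation


def generate_password(length=32):
    """Return a random password drawn uniformly from the 94-character alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```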
As for what you can use, increasingly our operating systems come with support for this, be it in the Apple world, Google, Microsoft world, or the like. There's third party software you can pay for and download. But even then, I would beware. And I would ask friends whose opinion you trust or do some googling for reviews and the like. All too often in the software world have password managers been determined to be buggy themselves. I mean, you've seen in weeks of CS50 how easy it is to introduce bugs. And even the best of programmers still introduce bugs to software. So you're also trusting that the companies making this password management software is really good at it. And that's not always the case. So beware there too. But we'll also focus today on some of the fundamentals that these companies can be using to better protect your data as well. But there's another mechanism, which odds are you're in the habit of using. Two-factor authentication, like most of us probably have to use this for some of your accounts-- your Harvard account, your Yale account, maybe your bank accounts, or the like. So what is two-factor authentication in a nutshell? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, you get a second factor that you have to provide to the website or application to prove that it's you like a text to your phone or maybe it's an actual application that gets push notifications or the like. Maybe in the corporate world, it's actually a tiny little device with a screen on it that's on your keychain or the like. Maybe it's actually a USB dongle that you have to plug into your work laptop. In short, it's some second factor. And by factor, I mean something technical. It's not just a second password, which would be one factor. It's a second fundamentally different factor. 
So generally speaking, in the world of two-factor authentication, or 2FA (MFA, multi-factor authentication, is the generalization), you have not just a password, which is something you know; the second factor is usually something you have-- whether it's your phone or that application or the keychain. It might also be biometrics like your fingerprints, your retinas, or something else physically about you. But it's something that significantly decreases the probability that some adversary is going to get into that account. Why? Because right now, if you've only got a username and password, your adversaries are literally every human in the world with an internet connection, arguably. But as soon as you introduce 2FA, now it's only people on campus or, more narrowly, only the people in Starbucks at that moment who might physically have access to your person and your second factor, in this case. More technically, what those technologies do is they send you a one-time passcode, which is further secure because once it's used, there's hopefully some database that remembers that it has been used and cannot be used again. So an adversary can't like sniff the airwaves and replay that passcode the next time. These passcodes also, indeed, expire, which adds some additional defense. And you might type it into a phone or maybe a web app that looks a little something like this. So passwords thus far, some defenses, therefore, any questions on this here mechanism? No? All right, well, let's consider this. Odds are, with some frequency, you forget these passwords, especially if you're not using a password manager. And so you go to Gmail and you actually have to click a link like this, Forgot Password. And then it typically emails you to initiate a process of resetting that password. But if you can recall, has anyone ever clicked a link like that and then gotten an email with your password in the email?
If you ever see this in the wild, that is to say in the real world, that is horrible, horrible design. Why? Because well-designed websites, not unlike CS50 Finance, which had a users table, should not be storing username-- rather, should not be storing passwords in the clear, that is, exactly as they are. They should somehow be obfuscated so that even if your database from CS50 Finance or Google's database is hacked and compromised and sold on the web, it should not be as simple as doing like select star from Account semicolon to see what your actual passwords are. And the mechanism that well-designed websites use is actually a primitive back from like week 5 when we talked about hashing and hash tables. This time, we're using it for slightly different purposes. So in the world of passwords, on the server side, there's often a database or maybe, more simply, a text file somewhere on the server that just associates usernames with passwords. So to keep things simple, if there's at least two users like Alice and Bob, Alice's password is maybe apple. Bob's password is maybe banana, just to keep the mnemonics kind of simple. If though that were the case on the server and that server is compromised, whoever the hacker is now has access to every username and every password, which in and of itself might not be a huge deal because maybe the server administrators can just disable all of the accounts, make everyone change their password, and move on. But there's also this attack known as password stuffing (more commonly called credential stuffing), which is a weirdly technical term, which means when you compromise one database, you know what? Take advantage of the naivety of a lot of us users. Try the compromised apple password, the banana password not on the compromised website, but other websites that you and I might have access to, the presumption being that some of us in this room are using the same passwords in multiple places.
So it's bad if your password is compromised on one server because, by transitivity, so can all of your other accounts be compromised. So in the world of hashing, this was the picture we drew some time ago, we can apply this same logic whereby, mathematically, a hash function is like some function F and the input is X and the output or the range is F of X. That was sort of the fancy way of describing mathematically hashing as a process weeks ago. But here, at a simpler level, the input to this process is going to be your actual password. The output is going to be a hash value, which in week 5 was something simple generally like a number-- 1 or 2 or 3 based on the first letter. That's not going to be quite as naive an approach as we take in the password world. It's going to look a little more cryptic. So apple weeks ago might have just been 1, banana might have been 2. But now let me propose that in the world of real world system design, what the database people should actually store is not apple, but rather this cryptic value. And you can think of this as sort of random, but it's not random. Because it is the result of an algorithm, some mathematical function that someone implemented and smart people evaluated and said, yes, this seems to be secure, secure in the sense that this hash function is meant to be one way. So this is not encryption, a la Caesar Cipher from weeks ago whereby you could just add 1 to encrypt and subtract 1 to decrypt. This is one way in the sense that given this value, it should be pretty much impossible mathematically to reverse the process and figure out that the user's password was originally apple. Meanwhile banana, back in week 5 for simplicity, for hashing into a table, we might have had a simple output of 2, since B is the second letter of the English alphabet. But now the hash value of banana, thanks to a fancier mathematical function, is actually going to be something more cryptic like this.
And so what the server really does is store not apple and banana, but rather those two seemingly cryptic values. And then when the human, be it Alice or Bob, logs in to a web form with their actual username and password, like Alice, apple, Bob, banana, the website no longer even knows that Alice's password is apple and that Bob's is banana. But that's OK. Because so long as the server uses the same code as it was using when these folks registered for accounts, Alice can type in apple, hit Enter, send it via HTTP to the server. The server can run that same hash function on A-P-P-L-E. And if the value matches, it can conclude with high probability, yes, this is in fact, the original Alice or this, in fact, is the original Bob. So the server never saves the password, but it does use the same hash function to compare those same hash values again and again whenever these folks log in again and again. So, in reality, here's a simple one-way hash for both Alice's and Bob's passwords in the real world. It's even longer, this is to say, than what I used as shorter examples a moment ago. But there is a corner case here. Suppose that an adversary is smart and has some free time and isn't necessarily interested in getting into someone's account right now, but wants to do a bit of prework to decrease the future cost of getting into someone's account. There is a technical term known as a rainbow table, which is essentially like a dictionary in the Python sense or the SQL sense, whereby in advance an adversary could just try hashing all of the fruits of the world or, really, all of the English words of the world or, rather, all possible four-digit, four-character, eight-character passcodes in advance and just store them in two columns-- the password, like 0000 or apple or banana, and then just store in advance the hash values. 
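The register-then-log-in flow just described, where the server stores only hash values and re-hashes whatever is typed at login, might be sketched as follows. SHA-256 from Python's hashlib is used here only as a stand-in one-way function; the lecture does not name the hash it uses, and a plain fast hash with no salt is not what a production system should rely on. The users dictionary and function names are hypothetical.

```python
import hashlib
import hmac


def hash_password(password):
    """One-way hash of a password (SHA-256 as an illustrative stand-in)."""
    return hashlib.sha256(password.encode()).hexdigest()


# What the server stores at registration: usernames and hashes, never passwords
users = {
    "alice": hash_password("apple"),
    "bob": hash_password("banana"),
}


def check_login(username, password):
    """Re-hash the submitted password and compare it with the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(stored, hash_password(password))


print(check_login("alice", "apple"))   # True
print(check_login("alice", "banana"))  # False
```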
So the adversary could effectively reverse engineer the hash by just looking at a hash, comparing it against its massive database of hashes, and figuring out what password originally corresponded to that. Why then is this still relatively safe? Rainbow tables are concerning. But they don't defeat passwords altogether. Why might that be? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: OK, so the adversary might not know exactly what hash function the company is using. Generally speaking, you would not want to necessarily keep that private. That would be considered security through obscurity. And all it takes is like one bad actor to tell the adversary what hash function is being used. And then that would put your security more at risk. So generally in the security world, openness when it comes to the algorithms and processes is generally considered best practice. And the reality is, there's a few popular hash functions out there that any company should be using. And so it's not really keeping a secret anyway. But other thoughts? Why is this rainbow table not such a concern? AUDIENCE: It takes a lot longer for the [INAUDIBLE]. DAVID MALAN: It takes a lot longer for the adversary to access that information because this table could get long. And even more along those lines-- anyone want to push a little harder? This doesn't necessarily put all of our passwords at risk. It easily puts our four-digit passcodes at risk. Why? Because this table, this dictionary would have, what, 10,000 rows? And we've seen that you can search that kind of like that or even regenerate all of the possible values. But once you get to eight-character passcodes, I said it was 6 quadrillion possibilities. That's a crazy big dictionary in Python or crazy big list of some sort in Python. That's just way more RAM or memory than a typical adversary is going to have.
Now, maybe if it's a particularly resourced adversary like a government, a state more generally, maybe they do have supercomputers that can fit that much information. But, fine, then use a 16-character passcode and make it an unpronounceably large search space that's way bigger than 6 quadrillion. So it's a threat, but only if you're on that horrible top 10 list or top 100 or short passcode list that we've discussed thus far. So here's though a related threat that's just worth knowing about. What's problematic here? If we introduce two more users, Carol and Charlie, whose passwords, just for the semantics of it, happen to be cherry. What if they both happened to have the same password and this database is compromised? Some hacker gets in. And just to be clear, we wouldn't be storing apple, banana, cherry, cherry. We'd still be storing, according to this story, these hashes. But why is this still concerning? AUDIENCE: [INAUDIBLE] DAVID MALAN: Exactly. If you figure out just one of them, now you've got the other. And this is, in some sense, just leaking information, right? I don't know, maybe at a glance, what I could do with this information. But if Carol and Charlie have the same password, you know what? I bet they have the same password on other systems as well. You're leaking information that just does no good for anyone. So how can we avoid that? Well, we probably don't want to force Carol or Charlie to change their password, especially when they're registering. You definitely don't want to say, sorry, someone's already using that password, you can't use it as well. Because that too would leak information. But there's this technique in computing known as salting whereby we can do this instead. If cherry in this scheme hashes to a value like this, you know what? Let's go ahead and sprinkle a little bit of salt into the process.
And it's sort of a metaphorical salt whereby this hash function now takes two inputs, not just the password, but some other value known as a salt. And the salt can be generally something super short like two characters even, or something longer. And the idea is that this salt, much like in a recipe, should sort of perturb the output a little bit, make it taste a little bit different, if you will. And so concretely, if we take the word cherry and then when Carol registers, for instance, we randomly choose a salt of 50, 5-0, so two characters, the hash value now-- because there's two inputs-- might now be this value. But if for Charlie, we still have cherry, but we change the 50, we might see this instead. Notice that for this first example, Carol, 50, the salt is preserved in the hash value, just so you know what it was and you can sprinkle the same amount of salt, so to speak, next time. But that's the whole hash value for Carol in this case. But if Charlie also has a password of cherry, but we change the salt to, say, 49 arbitrarily, that whole hash value changed. And so now in my hash database, I'm going to see different salts there, different values, which is going to effectively cover up the fact that Carol and Charlie have the same password. Now, if we have so many users that we run out of salts, that still might leak some information. But that's kind of a can we can kick down the road and probabilistically not going to happen if you require passwords of sufficiently long length, most likely. So any questions on salting, which to be clear, is just a mechanism for decreasing the probability that an adversary is going to glean information that you might not want them to have? So what does this mean concretely? When you get an email from a website saying "click this link to reset your password," it's not that the website, if well designed, is being difficult or shy in not telling you your password; the web administrators just do not know, ideally, your password.
So what are they doing? They're probably sending you a link that, similar in spirit to a one-time password, has some random unique string in it that's unique to you. They've stored that in their database. So as soon as you click on that link, they check their database and be like, oh, wait a minute, I know I sent this link a minute ago to David. Let me just trust now-- because probabilistically there's no way someone guessed this URL within 60 seconds-- let's trust that whatever he wants to type in as his new password should be associated with that Malan account in the database. But if, conversely, you ever get an email saying your password is 123456 or whatever it is, it is clearly not being hashed, let alone salted, on the server. And that is not a website to do anything particularly sensitive with. All right, so what more can we do? Well, let's pick up where we left off in week two on the art of cryptography, this art, this science of scrambling information, but in a reversible way. So whereas hashing, as we've described it here, really tends to be one-way, whereby you should not be able to reverse the process unless you cheat and make a massive table of all of the inputs and all of the outputs, which isn't really so much reversing as it is just looking it up, cryptography, like in week 2, can actually be a solution to a lot of problems, not just sending messages across a crowded room. We, weeks ago, really focused on this type of cryptography whereby you've got some plain text message. You've got a key, like a secret number 1 or 13 or something else. The cipher, which might be a rotational cipher or a substitution cipher, some algorithm, and then ciphertext was the term of art for describing the scrambled version. That should look like random zeros and ones or letters of the alphabet or the like. This though was reversible, whereby you could just input the ciphertext with the key and get back out the plain text.
Maybe you have to change a positive number to a negative number. But the key is really the same. Be it plus 1 minus 1 or plus 13 minus 13, the process was symmetric. And, indeed, what we talked about in week two was an example of something called secret key cryptography, where there's, indeed, one secret between two parties, a.k.a. symmetric cryptography. Because encryption is pretty much the same as decryption, but maybe you change the sign on the key itself. But this is not necessarily all we want. Because here's that general process. Here's the letter A. Here's the key of 1. We outputted in week 2 a value of B. That's not necessarily the solution to all of our problems. Why? Well, if two people want to communicate securely, they need some shared secret. So, for instance, if I wanted to send a secret message to Rongxin in the back of the room here, he and I had better have agreed upon a secret in advance. Otherwise, how can I possibly send a message, encrypt it in a way that he can reverse? I mean, I could be like, (WHISPERING) let's use a key of 1. (SPEAKING NORMALLY) But obviously, anyone in the middle has just now heard that. So we might as well not communicate securely at all. So there's this kind of chicken-and-the-egg problem, not just contrived here in lecture. But the first time I want to buy something on amazon.com with my credit card, I would like my credit card to be encrypted, scrambled somehow. But I don't know anyone personally at amazon.com, let alone someone with whom I've prearranged some secret for my Mac and their servers. So it seems that we fundamentally can't use symmetric cryptography all of the time, unless we have some other mechanism for securely generating that key, which we don't have as the common case in the world today. Thankfully, mathematicians years ago came up with something known as asymmetric cryptography, which does not require that you use the same secret in both directions. This is otherwise known as public key cryptography.
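The two-column table described above is small enough to build in full for four-digit passcodes. The sketch below is a plain precomputed lookup table in a Python dict, which matches the lecture's description; rainbow tables in the strict sense use chains of hashes to save space, which this does not attempt. SHA-256 is again an assumed stand-in for whatever hash the server uses.

```python
import hashlib
from itertools import product
from string import digits


def sha256_hex(text):
    """Hex digest of a string under SHA-256 (illustrative stand-in hash)."""
    return hashlib.sha256(text.encode()).hexdigest()


# Precompute hash -> passcode for all 10,000 four-digit passcodes
table = {}
for combo in product(digits, repeat=4):
    passcode = "".join(combo)
    table[sha256_hex(passcode)] = passcode

# Later, a leaked hash can be "reversed" by a single dictionary lookup
leaked_hash = sha256_hex("1234")
print(table[leaked_hash])  # 1234
```

The same approach for eight characters over 94 symbols would need about 6 quadrillion rows, which is why longer passcodes resist it.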
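Returning to salting for a moment, the idea of a hash function with two inputs, where the salt is stored alongside the result, can be sketched with the standard library's hashlib.pbkdf2_hmac. The 16-byte salt, the iteration count, and the salt$hash storage format are assumptions chosen for illustration; they are not the scheme shown on the lecture's slides, which used a two-character salt.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor, for illustration


def hash_with_salt(password, salt=None):
    """Return 'salt$hash' so the same salt can be reused at login time."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt per registration
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify(password, stored):
    """Re-hash the password with the stored salt and compare."""
    salt_hex, _ = stored.split("$")
    candidate = hash_with_salt(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(stored, candidate)


carol = hash_with_salt("cherry")
charlie = hash_with_salt("cherry")
print(carol != charlie)         # True: same password, different stored values
print(verify("cherry", carol))  # True
```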
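As a reminder of how reversible that week 2 style of cipher is, here is a minimal rotational cipher sketch. The function name rotate is an assumption; the point is only that decrypting is the same operation with the key's sign flipped.

```python
def rotate(text, key):
    """Shift each ASCII letter by key positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            result.append(ch)  # leave spaces, digits, and punctuation alone
    return "".join(result)


ciphertext = rotate("HELLO", 13)
print(ciphertext)               # URYYB
print(rotate(ciphertext, -13))  # HELLO
```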
And it works essentially as follows. When you want to take some plaintext message and encrypt it, you use the recipient's public key. So if Rongxin is my colleague in back and he has a public key, it is public by definition. He can literally shout for the whole room to hear what his public key is, which effectively is just some big, seemingly random number. But there's some mathematical significance of it. And I can write that down. Heck, you can all write it down if you too want to send him secure messages. And out of those two inputs, we get one output, the ciphertext, that I can then hand off to people in the room in those virtual envelopes. And it doesn't matter if all of you have heard his public key. Because you can perhaps guess where this is going. How would Rongxin reverse this process? He's not going to use one public key. He's going to use, not surprisingly, a corresponding private key. And so in asymmetric cryptography or public key cryptography, you really have a key pair, a public key and a private key. And for our mathematical purposes today, let me just stipulate that there's some fancy math involved, such that when you choose that key or, really, those keys, there's a mathematical relationship between them. And knowing one does not really give you any information about the other. Why? Because these numbers are so darn big it would take adversaries more time than we all have on Earth to figure out via brute force what the corresponding private key is. The math is that good. And even as computers get faster, we just keep using bigger and bigger keys, more and more bits to make the math even harder for adversaries. So when Rongxin receives that message, he uses his private key, takes the ciphertext I sent him through the room, and gets back out the plaintext. So this is exactly how HTTPS works effectively to securely establish a channel between me and Amazon.com, gmail.com. 
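To make the key pair idea concrete, here is a toy sketch using textbook RSA with deliberately tiny primes. The lecture only stipulates that "fancy math" relates the two keys; choosing RSA, and these particular numbers, is an assumption for illustration. Real keys are thousands of bits long and real systems add padding, so this is in no way secure. It requires Python 3.8 or later for the modular inverse via pow.

```python
# Toy key generation with tiny primes (insecure, illustration only)
p, q = 61, 53
n = p * q                  # part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi

public_key = (n, e)        # safe to shout across the room
private_key = (n, d)       # kept secret by the recipient


def encrypt(message, key):
    """Encrypt a small integer with the recipient's public key."""
    modulus, exponent = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Decrypt with the corresponding private key."""
    modulus, exponent = key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, public_key)
print(decrypt(ciphertext, private_key))  # 65
```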
Any website starting with https:// uses public key cryptography to come up with, initially, a secret. And in practice, it turns out, mathematically, it's faster to use secret key crypto. So very often, people will use asymmetric crypto to generate a big shared key and then use the faster algorithms thereafter. But asymmetric cryptography does solve that chicken-and-the-egg problem by giving us all public keys and private keys. If you've heard of RSA, Diffie-Hellman, elliptic curve cryptography, there's different algorithms for this that you can actually study in higher level, more theoretical classes. But there's a bunch of different ways mathematically to solve this problem. But those are the primitives involved. And how many of you have now heard of passkeys, which are kind of only just catching on in recent months, literally? If I had to make any prediction this semester, odds are, you're going to see these in more and more places. And in fact, the next time you register for a website or log into a website, look for a link, a button that maybe doesn't say passkeys, per se. It's often called passwordless login. But it's really referring to the same thing. Passkeys are essentially a newish feature of operating systems, be it macOS or Windows or Linux or the OS running on your phone, that doesn't require that you choose a username and password anymore. Rather, when you visit a website for the very first time, your device will generate a public and private key pair. Your device will then send your public key to the website for which you're registering so that it has one of the values, but you keep your private key, indeed, private. And using the same mathematical process that I alluded to earlier, you can therefore log into that website in the future by proving mathematically that you are, in fact, the owner of the corresponding private key. 
So, in essence, if we use a picture like this, when you proceed to log in to that website again-- and, again, that website has stored your public key-- it essentially uses something known as digital signatures-- you're familiar with this term, you've heard it in the wild-- whereby the website will send you a challenge message, like some random number or string of text. It's just some random value. If you then effectively encrypt it with your private key or run both of those through a particular algorithm, you'll get back a signature. And that signature can be verified by the website by using your public key. So digital signatures are kind of an application of cryptography but in the reverse direction. In the world of encryption, you use someone's public key to send a message encrypted. And they use their private key to decrypt it. In the world of signatures, or really passkeys, you reverse the process, whereby you use your private key to effectively encrypt some random challenge you've been sent. And the website, the third party, can use your public key to verify, OK, mathematically, that response came from David. Because I have his public key on file. So what's the upside of this? We just get out of the business of passwords and password managers more generally. You do have to trust and protect your devices, be it your phone or your laptop or desktop all the more. And that's going to open another possible threat. But this is a way to chip away at what is becoming the reality that you and I probably have dozens, hundreds of usernames and passwords, which is probably not sustainable long-term. And, indeed, we read too often about hacks in the wild as a result. Questions then on cryptography or passkeys? All right, just a few more building blocks to equip you for the real world before we sort of maybe do a final check for understanding of sorts. So when it comes to encryption, we can solve other problems as well. 
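The challenge-and-signature exchange described above can be sketched with toy numbers. Everything here is an assumption for illustration: the tiny RSA-style key pair, and the function names `sign` and `verify`. Real passkeys follow the WebAuthn standard, sign a hash of structured data rather than a bare number, and use far larger keys.

```python
import secrets

# Hypothetical toy key pair for the user's device (tiny RSA; not secure).
p, q = 61, 53
n = p * q
e = 17                                   # public: stored by the website at registration
d = pow(e, -1, (p - 1) * (q - 1))        # private: never leaves the device


def sign(challenge: int, n: int, d: int) -> int:
    """Device side: 'encrypt' the challenge with the private key."""
    return pow(challenge, d, n)


def verify(challenge: int, signature: int, n: int, e: int) -> bool:
    """Website side: undo the signature with the public key and compare."""
    return pow(signature, e, n) == challenge


# 1. The website sends a fresh random challenge.
challenge = secrets.randbelow(n - 2) + 2

# 2. The device signs it with the private key.
signature = sign(challenge, n, d)

# 3. The website checks the signature using only the public key on file.
accepted = verify(challenge, signature, n, e)

# A tampered or guessed signature does not verify.
forged_accepted = verify(challenge, (signature + 1) % n, n, e)
```

Because the challenge is random each time, a signature captured during one login is of no use for the next one.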
And this too is a feature you should increasingly be seeking out. So end-to-end encryption refers to a stronger use of encryption than most websites are actually in the habit of using. Case in point, if you're using HTTPS to send an email to Gmail, that's good because no one between you and Gmail servers presumably can see the message because it's encrypted. It just looks like random zeros and ones. So it's effectively secure from people on the internet. The emails are not secure from like nosy employees at Google who do have access to those servers. Now, maybe through corporate policy, they shouldn't or physically don't. But, theoretically, there's someone at Google who could look at all of your email if they were so inclined. Hopefully it's just not a long list of people. But end-to-end encryption ensures that if you're sending a message from A to B, even if it's going through C in the middle-- be it Google or Microsoft or someone else-- end-to-end encryption means that you're encrypting it between A and B. And so even C in the middle has no idea what's going on. This is not true of services like Gmail or Outlook. This is true of services like iMessage or WhatsApp or Signal or Telegram or other services where if you poke around, you'll also literally see mention of end-to-end encryption. It's a feature that's becoming a little more commonplace, but something you should seek out when you don't necessarily trust or want to trust the machine in the middle, the point C between A and B. So, indeed, when sending messages on phones and even video conferencing nowadays too. And here's something where sometimes you kind of have to dig. Most of us are familiar with Zoom certainly by now. And if we go into Zoom settings, which I did this morning to take this screenshot, this is what it looks like as of now. Here's the menu of options for creating a new meeting. And toward the bottom here-- it's a little small-- you'll notice that you have two options for encryption. 
And, funny enough, the one that's typically selected by default, unless you opt in to the other one, is enhanced encryption. Brilliant marketing, right? Who doesn't want enhanced encryption? It is weaker than this encryption though, which is end-to-end encryption. End-to-end encryption means that when you're having a video conference with one or more people, not even Zoom can see or hear what you're talking about. Enhanced encryption means no one between you and Zoom can hear or see what you're talking about. So end-to-end ensures that it's A to B, and if Zoom is C in the story, even Zoom can't see what you're doing. Now, there are some downsides. And there's some little fine print here. When you enable end-to-end encryption on a cloud-based service like Zoom, you can't use cloud recordings anymore. Why? Well, if Zoom by definition mathematically can't see or hear your meeting, how are they going to record it for you? It's just random zeros and ones. You can still record it locally on your Mac or PC, but end-to-end encryption ensures that you don't have to worry about prying eyes-- be it a company, be it a government, a state more generally. And so societally, you'll start to see this discussed probably even more than it already is when it comes to personal liberties and freedom among citizens of countries and states because of the implications for actual privacy of these primitives that we've been discussing and that you even explored in week 2, albeit weakly with those ciphers, when they're used in the real world. But encryption has one other use that's worth knowing about too and yet another feature to turn on. So when it comes to deleting files, odds are, most everyone in the room knows on a Mac or PC that when you drag a file to the trashcan or the recycle bin, it doesn't actually go away unless you right click or Control click or go to the appropriate menu and empty the trash. 
But did anyone know that even when you empty the trash or recycle bin, the file also doesn't really go away? Your operating system typically just forgets where it is. But the zeros and ones that compose the file or files you tried to delete are still there for the pickings, especially if someone gets physical or virtual access to your system. So, for instance, here is a whole bunch of ones and zeros. Maybe it's representing something on my hard drive. And suppose that I want to go ahead and delete a file that comprises these zeros and ones, these bits here. Well, when your operating system deletes the file, even if you click on Empty Trash or Empty Recycle Bin, it essentially just forgets about those bits, but doesn't actually change them. Only once you create a new file or download something else do some of those zeros and ones end up getting overwritten. And per the yellow remnants here, the implication of this contrived example is that even at this point in time you can still recover like half of the file, it would seem. So maybe the juicy part with a credit card number or a message that you really wanted to delete or the like, there's still remnants on the computer's hard drive here. So what's the alternative? Well, if you really want to be thorough, you could delete files and then download the biggest possible movies you can to really fill up your hard drive. Because, probabilistically, you would end up overwriting all of those zeros and ones eventually. But that's not really a tenable solution. It would just take too much time and it's fraught with possible simple mistakes. So what should we do instead? Well, maybe we should securely delete information. And securely delete would mean when you actually empty the recycle bin or the trash can, what happens to the original zeros and ones is that you take them and you change all of them to zeros or all of them to ones or all of them to random zeros and ones. Why? 
So that you can still reuse those bits now, but there's no remnants even on the computer's hard drive that they were once there. But even now, this is not fully robust. Why? It turns out that because of today's electronics and solid state devices, there might still be remnants of files on them because these hard drives, these storage devices nowadays are smart enough that if they realize that parts of them are failing, they might prevent you from changing data in certain corners. So if you think of your memory as like a big rectangle, some of the bits might get blocked off to you just over time. So there might still be remnants there. So if you really are worried about a sibling, an employer, or a government like finding data on that system, there might actually still be remnants. Now, you can go extreme and just physically destroy the device, which should be pretty effective. But that's going to get pretty expensive over time when you want to delete data. Or, again, we can use encryption as the solution to this problem. So, again, encryption is increasingly in the real world an amazing tool for your toolkit because it can be deployed in different ways. So, in this case, full disk encryption is something you can enable in Windows or Mac OS. Nowadays, it's typically enabled by default on iOS and you can opt in as well on other platforms. In the world of full disk encryption, instead of storing any of your files as a plain text, like in their original raw format, you essentially randomize everything on the disk instead. You rely on the user's password or some unique string that they know when you log into your Mac or PC to essentially scramble the entire contents of the hard drive. And it's not quite as simple as that. Typically, there's a much larger key that's used that in turn is protected by your actual password. 
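The two-layer arrangement mentioned at the end of that explanation, a large random key that is itself protected by the user's password, might look roughly like the following. This is a sketch under loud assumptions: the names `derive_key_from_password` and `xor_bytes` are invented here, and XOR stands in for a real key-wrapping cipher purely to keep the example within Python's standard library. It is not how FileVault or BitLocker are actually implemented.

```python
import hashlib
import secrets


def derive_key_from_password(password: str, salt: bytes) -> bytes:
    """Stretch a human password into 32 bytes using PBKDF2 (deliberately slow)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000, dklen=32)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Toy stand-in for a real key-wrapping cipher; illustration only."""
    return bytes(x ^ y for x, y in zip(a, b))


# The "much larger key" that would actually scramble the disk: purely random.
disk_key = secrets.token_bytes(32)

# What gets stored on the disk: a salt and the disk key wrapped by the password.
salt = secrets.token_bytes(16)
wrapped_key = xor_bytes(disk_key, derive_key_from_password("correct horse battery staple", salt))

# Logging in with the right password recovers the disk key.
unlocked = xor_bytes(wrapped_key, derive_key_from_password("correct horse battery staple", salt))

# A wrong guess yields 32 bytes of garbage instead.
wrong = xor_bytes(wrapped_key, derive_key_from_password("password123", salt))
```

One consequence of this design is that changing a password only requires re-wrapping the 32-byte disk key, not re-encrypting the whole drive. Another is the downside raised later in the lecture: without the password, nothing on the disk reveals `disk_key`.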
But, in this case, this means that if someone steals your laptop while you're not paying attention in Starbucks or the airport or even your dorm room, even if they open the lid and don't have your password, they're not going to be able to access any of the data because it's just going to look like zeros and ones. Even if they remove the hard drive from your device, plug it into another device, they're only going to see zeros and ones. Now, if you walk away from your laptop at Starbucks with the lid open and you're logged in, there is a window of opportunity. Because the data has got to be decrypted when you care about it and when you're using it. So here too is another example of best practice. You should minimally be closing the lid of your laptop, making sure it's logging you out or at least locking the screen, so that someone can't just walk off with your device and have access to your logged in account. But full disk encryption essentially decreases the probability that an adversary is going to be successful. In the world of Macs, it's called FileVault. It's in your System Preferences. Windows, it's called BitLocker. There's third party solutions too. Here too, we have to trust that Microsoft and Apple don't screw up and write buggy code. But generally speaking, turning on features like these things are good for you. Except what's maybe an obvious downside of doing this? What's that? AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, if you forget your password. There's no mathematician in the world who is probably going to be able to recover your data for you. So there too, it's maybe a hefty tradeoff. But hopefully you have enough defenses in place, be it your-- a good password, a password manager, maybe even printing out your primary password on a sheet of paper, but locking it in a box or bringing it home so that no one near you actually has physical access, you can at least mitigate some of these risks. 
You'll read about, though, in the real world even this, which is like an adversarial use of full disk encryption. Sometimes when hackers get into systems, this has happened literally with hospital systems, municipal government systems, and the like. If they hack into them, they don't just delete the data or just create havoc, they will proactively encrypt the server's hard drive with some random key that only the hacker knows. They will then demand that the hospital or the town pay them, often in Bitcoin or some cryptocurrency to decrease the probability of being caught, and they'll only turn over that key to decrypt the data if someone actually pays up. So here too, there's sort of a dark side of these mathematical principles. So there too, it's always a tradeoff between good people and perhaps bad. Well, maybe before we wrap and before we serve some cake in the transept, Carter, can you join me one last time? But, first, before I turn things over to me and Carter, here's your problem set 10, a sort of unofficial homework. One, among your takeaways for today, you should start using a password manager or even these fancier passkeys, at least for your most sensitive accounts. So anything medical, financial, particularly personal, like this is a very concrete takeaway and action item. I wouldn't sit down and try to change all of your accounts over. Because knowing humans, you're not going to get through the whole to-do list. So maybe do it the next time you log into that account, turn on some of these features or add it to a password manager or at least start with the most important. Two, turning on two-factor authentication beyond where you have to at places like Harvard and Yale, but certainly bank accounts, privates, anything medical, personal, or the like. And then lastly, where you can, turning on end-to-end encryption. Being careful with it, you don't want to go and during lecture, hopefully no one clicked the turn on FileVault button while we're in class. 
Because closing your laptop lid while things are being encrypted is generally bad practice. See us after though if you did do that a moment ago. So here's just then three actionable takeaways. But we thought we'd conclude by taking a few final minutes for a CS50 quiz show of sorts, a final check for understanding using some questions we come up with ourselves, but also some of the review questions that you all kindly contributed as part of the most recent problem set. So some of these questions come from you yourselves. And let me go ahead and turn things over to Carter here to help run the show. We will invite you at this point to take out that same device as you had earlier. This is the same URL as before. But if you closed the tab, you can reopen it here. To make things a little fun-- because we still have some cookies left-- could we get three final CS50 volunteers? OK, one hand is already up. How about two hands there? And how about three hands? Over here. All right, yes, sure, a round of applause for our final volunteers. Come on up. [APPLAUSE] On the line are some delicious Oreo cookies. If the three of you would like to come over and take any of these seats in the middle, you will be our human players, but we'll invite everyone in the group to play too. Do you want to take a mic and introduce yourself to the world? AUDIENCE: Sure. Hi, I'm Dani. I'm a first year in WIG C. And I'm planning on studying economics. DAVID MALAN: Nice, welcome. AUDIENCE: Hi, I'm Rochelle. I'm from the best state, Ohio. DAVID MALAN: [INAUDIBLE] AUDIENCE: And I'm a freshman in Greeno. I'm planning on concentrating in CS. DAVID MALAN: Nice, welcome. And? AUDIENCE: My name is Jackson. I'm from Indiana. I live in Thayer. I'm a first year. And I'm studying linguistics and Germanic languages and literatures. DAVID MALAN: Welcome as well. So, if our volunteers could have a seat, you're going to want to be able to see this screen or that one. So you can move your chairs if you would like. 
Carter is going to kindly cue up the software, which hopefully everyone has on their phones as well. And I should have mentioned, do you have your phone with you? AUDIENCE: [INAUDIBLE] DAVID MALAN: Do you have your phone with you? AUDIENCE: [INAUDIBLE] DAVID MALAN: OK, do you have your phone over there? OK, what's your name again? AUDIENCE: Rochelle. DAVID MALAN: OK, Rochelle will be right back, if you want to go grab your phones. And in the meantime, we're going to go ahead and-- thank you so much-- we're going to go ahead and cue up the screens here for the CS50 quiz show. It's about 20 questions in total, the first few of which are going to focus on cybersecurity to see how well we can check our current understanding. The rest will be questions written by you in the days leading up to today. All right, Carter, let's go ahead and reveal the first question. And note that you can win up to 1,000 points this time per question. It's not just about being right or wrong. And you get more points the faster you buzz in as well. So we'll see who's on the top based on all of the guest user names. All right, here we go, Carter, question one, what is the best way to create a password? Substitute letters with numbers or punctuation signs, ensure it's at least eight characters long, have a password manager generate it for you, or include both lowercase and uppercase letters? All right, let's see what the results are. Almost everyone said have a password manager generate it for you. 90% of you said that's the case. And, indeed, that one is correct. Nicely done. Let's go ahead and see the random usernames you've chosen. So this looks like it's web_hexadecimalidentifier to keep things anonymous. So if you are OAF9E, nicely done, but there's a whole lot of ties up at the top. All right, and I see-- well, just to keep things interesting, you had 792 points. You had-- AUDIENCE: 917. DAVID MALAN: 917 points, 917 points. So it's a close race here. 
Number two, what is a downside of two-factor authentication? You might lose access to the second factor. Your account becomes too secure. You can be notified someone else is trying to access your account. You can pick any authentication you like. Hopefully, you can reload. You might have missed that one. And the number one answer was might lose access to the second factor. Indeed, 93% of you got that. And we're up to 1,375 points, 792 points, and-- AUDIENCE: [INAUDIBLE] DAVID MALAN: OK, and forced reload. So, yes, you tried reloading the page and hopefully it'll click back in. All right, Carter, number 3. We have, what would you see if you tried to read an encrypted disk? You would see a random sequence of zeros and ones, scrambled words from the user's documents, all of the user's information, or all ones? About 10 seconds remain. Is it working for you now? OK. All right, three seconds. And the ranked answers are a random sequence of zeros and ones. 91% of you indeed got that right. Let's see who's winning on the guest screen. Web user a28c3, nicely done. But it's still a close tie among three of you anonymous participants. Number four, which type of encryption is most secure-- enhanced encryption, end-to-end encryption, full scale encryption, advanced encryption? About five seconds. And most popular response is the correct one, end-to-end encryption with 92% of you. Nice. We're up to 2,375, 3,792, and 2,917. And good job to these three folks in the front of our list. All right, Carter, number 5, the last on cybersecurity. When would it make sense to store your password on a sticky note by your computer? When it's too complicated to remember, when you need to access your account quickly, when you share your account with family members, never. Oh. And the most popular response was never, which is indeed correct. And only 79% of you think that right now. It is never OK to store it on a post-it note on your computer. 
You should minimally be using today's password manager for that same process. All right, two of you, a28c3 and c9a23 are still atop the list. We have 3,000-plus points, 3,000-plus points, and probably about the same as well. All right, now we move on to the user-generated content that you all from Harvard and Yale generated for us. Number 6, what is the variable type that stores true/false values? Boolean, string, integer, or double? About 10 seconds to come up with this. We saw these in different languages, these types. But the idea was the same. And in two seconds, we'll see that the answer is Boolean with 96% response rate. All right, what else do we have here? It's still a two-way tie at the top. All right, next question, Carter, is number 7. What placeholder would you use when trying to print a float in C, a float in C? Seven seconds. I'll defer to the visual syntax on the screen for this one. And the most popular and correct answer is, indeed, %f. We never saw %fl and we definitely didn't see %float. Two of you, though, are still in the lead. Nicely done, whoever you are. All right, next question, what does I++ do in C++ where I is an integer value? Note, for the record, we did not teach C++ in this course, but this question is from you. I will admit it's the same as in C, which we did teach. Decrements the integer, deletes the integer, increments the integer by one, or reassigns the integer to zero? The most popular answer and correct answer is increments the integer by one. It definitely doesn't decrement, so. All right, two responses still atop the list. And here we have 6,000-plus, 6,000, and 6,000. So it's getting closer. Using a hash table to retrieve data is useful because it theoretically achieves a search time of O of n, O of n log n, O of log n, or O of 1? Five seconds to make your decision. Getting a little harder. And let's see the results. O of 1, only 30% of you got the correct answer from a very core week 5 topic. 
That is the theoretical hope of a hash table. In practice, though, to be fair, it can devolve, as we saw, into O of n. We didn't really see those other two answers in the context of hash tables specifically. All right, wow, a28c3 is in the lead now. Let's take a look at number 10, halfway there. What is the first program we made in CS50? This should be fast. All right, Greet, Meow, DNA, Hello, world? One second. And it was, indeed, Hello, world, Hello, world. All right, still in the lead with 10,000 points. And now let's move on to the second half. Question 11, when malloc is used to allocate memory in a C program, that memory is allocated in the pile, heap, bin, or stack? Very creative set of answers. Five seconds. All right, and the results have heap at 43%. Malloc was from the heap at the top. The stack is where function calls go. It's getting a little more worrisome here. But that's OK. Still in the lead with perfect score, it seems, 11,000 points. Next up is number 12. Which data structure allows you to change its size dynamically and store values in different areas of the memory-- an array, a queue, a linked list, or a stack? Change its size dynamically and store different values in different areas of the memory. And the answer from the group is a linked list at 62%, which is correct. An array, as we defined it, cannot be resized. You can create a new array, copy everything over. I'm starting to think maybe we shouldn't end the class on this note. But that's OK. We'll move on. 12,000 points for the lead. And number 13, what does CSS stand for in web development-- computer style sheets, cascading style sheets, creative style systems, colorful sheets styles? And most popular answer is correct with 81%, cascading style sheets. On the top 10 list here at 1,300 points, still a perfect score, and our three human volunteers are doing well here too. 14, how to represent a decimal number 5 in binary. All right, here we go. I'll let you read these. 
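The hash table answer, O(1) in theory but O(n) when it devolves, can be seen in a small sketch. This is an illustrative implementation with separate chaining; the class name `HashTable`, the method `longest_chain`, and the deliberately bad hash function are assumptions made up for this example rather than anything from the course's problem sets.

```python
class HashTable:
    """Minimal hash table using separate chaining (a list per bucket)."""

    def __init__(self, buckets=64, hash_fn=hash):
        self.buckets = [[] for _ in range(buckets)]
        self.hash_fn = hash_fn

    def _bucket(self, key):
        # Jump straight to one bucket: this step is what makes lookups O(1).
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        # Within the bucket we still scan the chain, so cost grows with its length.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def longest_chain(self):
        return max(len(bucket) for bucket in self.buckets)


good = HashTable()                               # keys spread across buckets
bad = HashTable(hash_fn=lambda key: 0)           # every key collides in bucket 0

for i in range(50):
    good.put(i, i * i)
    bad.put(i, i * i)
```

Both tables return the same answers, but in `bad` every lookup walks one chain of 50 entries, which is the O(n) case. In `good`, each chain holds a single entry, so a lookup costs the same no matter how many keys are stored.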
All right, fingers crossed, decimal number 5 in binary is, indeed, 101. Because that's a 4 plus 0 plus 1, which gives us a decimal 5. All right, next question, and amazing a28c3, whoever you are out there, nicely done. Who is the CS50 mascot-- cat, duck, robot dog Spot, Oscar the Grouch? All of whom have appeared in some form. This one will be a little looser with answers, but looks like duck and cat were both the most popular. Duck has kind of become the mascot, suffice it to say. Cat is kind of everywhere on CS50 social media. So we'll accept cat as well. We love Spot, but Spot has only made that one appearance. 15,000. Final few questions, what is the output of printf quote, unquote, "1" plus quote, unquote, "2?" It will return an error, twelve, 3, or 12? English and digits respectively there. Six seconds. All right, one second. And 12 with 74% is correct. Because it's not quite 12, it is rather 1, 2, because those are two strings that got concatenated, which would not actually be an error in that case. It's just not what you expect. All right, it's getting a little harder, but still someone's got a perfect score. What does LIFO stand for? Lost In First Order, Last In First Out, Let Inside Fall Outside, Long Indentation For Organization? Good one. Last In First Out, and we discussed this in the context of a stack. Because as you pile things on top of the stack, the last one in is the first one out. All right, nicely done, this player here. Three questions to go. On average, how early did you submit the weekly pset? A couple of days early, no rush, the morning of, a couple of hours early, but was not too nervous, 11:59:59, I live on the edge. Again, user-generated content. And the most popular answer-- [LAUGHTER] Carter and I conferred before class and we autocratically decreed that this is the only right answer and the only one we will accept here, though we appreciate the others as well. Wow, all right, did you take this class for the CS50 shirt? 
Yes, no, maybe, I'm not telling you? So that is this here shirt, which you'll get at the CS50 fair. One second. And, yes, no, maybe, I'm not telling you, this time, we'll accept all four of those, which brings us to our final question, at which point we'll reveal the scores of all of our participants and see if we can get the number one score online. What is the phrase that David says at the end of each lecture? [INTERPOSING VOICES] DAVID MALAN: All right, before we actually say what the right answer is, though we can show it, Carter, we'll see that there is 98%-- I've never said this at the end here, but 98% answers there. Let's go ahead and look at the top chart. Do we know who web_a28c3 is? Oh my goodness, come on down. And among our friends here, can you pull up each of your scores if you're able to see? And among our human volunteers, 16,792, 17,292, 16,958. So we have our human winner as well. So without further ado, allow me to thank our volunteers. Thanks so much to CS50 staff. We're about to give out some cookies and, if you want, some stress balls here. Cake is now served. And this was CS50. [CHEERING] [INTERPOSING VOICES] [MUSIC PLAYING]