[Week 10] [David J. Malan] [Harvard University] [This is CS50.] [CS50.TV]

All right! This is CS50, but not for much longer. This is the start of week 10. On Wednesday we have our quiz, and then next Monday we have some celebratory cake as we come full circle all the way back to week zero. Today, we talk about one of my favorite topics, truth be told--that of security and privacy, and the implications of all of the hardware and software that we all use these days. To be honest, there are a lot of threats out there that, if you haven't really paused to think about them, are actually pretty daunting.

Case in point--if any of you have ever downloaded a piece of software off of the Internet and installed it on your computer, you've engaged in a significant degree of trust, right? There is nothing preventing Skype, or Chrome, or any piece of software you've installed onto your computer from simply deleting all of the files on your hard drive; from uploading all of the files on your hard drive to some bad guy's server; from reading all of your emails; from intercepting all of your instant messages. Because the reality is that with most modern operating systems today, there really isn't much of a wall between the software programs that we install, and you and I are pretty much just crossing our fingers and taking it on faith that that app we downloaded for free, or that thing that's 99 cents, is actually completely benign. But as we've seen through C, and now PHP and JavaScript, with this ability to express ourselves programmatically, you can make a program do most anything that a user himself or herself could do.

So, today we focus on that topic--not only some threats but also defenses. Indeed, in the world of security in general, there's kind of this cat-and-mouse game, and I daresay the bad guys almost always have a leg up. When it comes to taking advantage of hardware and software on our own personal computers, we have to realize that a bad guy simply needs to find one simple mistake--one exploit, one bug--in a piece of software we've written or are running in order for him or her to take over our whole system. By contrast, we--the good guys--need to patch and fix all of those bugs and avoid all of those weaknesses. And so, I daresay, on the whole the bad guys have the advantage. What classes like this and subsequent classes are really about is not teaching you how to wage the battles that these bad guys do, but how to protect yourself, or at least how to make a calculated decision: yes, I know this piece of software could indeed read every one of my emails, but I'm okay with that because of the value it brings me on the other hand.

I'm very pleased to be joined by 2 of the smartest people I know--Rob Bowden and Nate Hardison. Rob is about to take us on a tour through the lowest level of the software stack--that of the compiler which, up until now, we've all come to love and trust. Rob Bowden.

[applause]

[Rob] All right. David has pretty much taken the whole spiel that I was going to introduce with, but--several weeks ago, you saw the example of a buffer-overflow attack, which is an example of a hacker breaking into some piece of software that they are not supposed to be breaking into. The other side of this is that sometimes the software itself is malicious. It doesn't even need to be hacked. The person who wrote the software wants to hack you. Let's just jump right into code, taking a look at "login.c".
Here's a silly program that validates a username and password combination. Here you should definitely be getting comfortable with C again for the quiz. First, we're using GetString to grab the username, then we're using GetString to grab the password, and then we have some trivial checks: is the username "rob" and the password "thisiscs50"? Or, is the username "tommy" and the password "i<3javascript"? If either of those is the case, then we're just going to print "Success", and then we have access. Otherwise, we're going to print "Invalid login", and then, of course, since GetString malloc's memory, we free username and password.

This is a trivial login program, and if you think about when you log into the appliance, it's pretty similar--or even logging in to your computer--there's just some login program which is giving you access. Here, we happen to have hard-coded 'rob', 'thisiscs50', 'tommy', 'i<3javascript', but probably there is some file somewhere on your operating system which has a list of usernames that can log in to the system and a list of passwords associated with those usernames. Usually the passwords are not just stored in plaintext like this--there is some sort of encryption--but this will do for our example.

Coming over to our compiler--it's going to be very straightforward. We need to specify at least some file that we want to compile, and then these lines here are just reading in a file. They read the entire file into one big buffer, then we null-terminate our buffer as always, and finally we just compile the file. We're not going to look at how compile is actually implemented, but as a hint, it just calls Clang. We're going to use this program to compile things instead of Clang.

One problem we start with is that we want to compile our compiler, but if we're not going to use Clang, I don't know what I'm going to compile it with. This is a general issue known as bootstrapping. So, just this once, I'm going to use Clang to compile our compiler. If you think of GCC and Clang--those compilers are constantly being updated, and those compilers are compiled using GCC and Clang. Clang is just one big C or C++ program, so the compiler used to compile Clang is Clang itself. Here, now, we are just going to be using our compiler to compile our compiler, and we can even say './compiler compiler.c compile.c -o compiler'. Notice this is the exact command I ran before--just replace Clang with './compiler'. And now we have another compiler, but it's exactly the same. It just calls Clang.

We're going to use our compiler to compile our login program. Okay--"./compiler login.c -o login". Undefined reference to "GetString"--forgot the "-lcs50". Okay. So now I have our login program. Running it--"Please enter your username". One example was rob. Please enter your password--thisiscs50. And success! I have access. Running it again and entering some invalid username and password--invalid login. Okay. Nothing interesting about this so far.

But let's take a look at login again--and this is going to be a somewhat trivial example--and add an else in here: else if (strcmp(username, "hacker") == 0 && strcmp(password, "LOLihackyou") == 0), and now, printf("Hacked!! You now have access.\n"); Okay. Compiling this--compiler login.c -o login -lcs50--now running login--and if I use my username hacker and password LOLihackedyou--did I type it wrong in there before?
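[For reference, a minimal sketch of login.c as just described--assuming CS50's GetString from cs50.h, which malloc's the string it returns; the file shown in lecture may differ in its exact wording:]

    // login.c -- validates a hard-coded username/password combination
    #include <cs50.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        printf("Please enter your username: ");
        string username = GetString();
        printf("Please enter your password: ");
        string password = GetString();

        if (strcmp(username, "rob") == 0 && strcmp(password, "thisiscs50") == 0)
            printf("Success! You now have access.\n");
        else if (strcmp(username, "tommy") == 0 && strcmp(password, "i<3javascript") == 0)
            printf("Success! You now have access.\n");
        // deny them access!
        else
            printf("Invalid login.\n");

        // GetString malloc'd these buffers, so free them
        free(username);
        free(password);
        return 0;
    }

[Note the innocuous-looking comment near the bottom; it matters shortly.]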
In login.c--'ihack'--I'll make it 'ihacked', because I think I do that later. Okay. Recompiling. Re-running--hacker--LOLihackedyou--Hacked!! You now have access.

There doesn't seem to be that much of a difference, because it's the same exact check I was doing for the other usernames and passwords. Plus, the big thing is that if other people look at this login.c--say, if I pass this off to my partner, and they open up this file and read it--they'll see, okay, why do you have these lines of code here? That's obviously not something that should be in your program. In some programs--like any proprietary software that is not open-source--you may never see these lines of code. Something like Skype--for all you know, Skype is on your computer, and there is just some specific username-password combination which will log in to Skype in some special way. We don't know about it, and people don't know about it, because they don't get to read the source code to see that there's this hole.

What we call this--although this isn't a very clever example--is a back door. Think of the back door of your house. Here, if I authenticate with the usernames 'rob' or 'tommy', that would be like using the "front door"--the way I'm supposed to securely log in. But if I enter with this hacker username and password, then that's using the "back door." It wasn't the intentional way to get into the program, but it still works. And people in general aren't supposed to know about these back doors.

We're going to improve this. Let's revert to our original login.c, and let's take a look at our new compiler. Okay. Everything up here is exactly the same: we're reading the entire file into a buffer. Everything down here is the same: we just compile the file. But now I have this big if in here that says, if the file that I happen to be compiling is login.c, then I do something special. What is that something special? I see here some string called 'hack', and looking at these lines of code--I guess I did use 'ihack' and not 'ihacked' before--they're the same exact lines of code that I had in login.c before. But now, instead of having them in login.c, I'm going to put them into my compiler. These are the lines of code I'm going to want to insert into login.c.

The buffer that originally held my login.c is no longer going to be big enough, because now I also want to insert this hack into the middle of my program. So all this is doing is creating a new buffer that's large enough for both the original file and the extra lines of code--the hack--that I want to insert. Here is something to notice: char* pattern = "// deny them access!" If we look back at login.c, we see down here this comment--deny them access! In login.c, this comment looks completely innocuous, so you wouldn't suspect any malicious intent from just this comment. But in our compiler, we specifically look for this line of code, and when we find it, these lines of code insert our hack into that position. So, we iterate over the entire login.c, we insert that hack exactly where it says 'deny them access', and then we compile--not the original login.c, but the new login.c with the hack inserted right at 'deny them access.' Now I want to compile my new compiler with that hack involved.
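[A minimal sketch of that splicing step--the names here are illustrative, not the lecture's actual compiler.c, and it assumes the whole source file has already been read into one null-terminated string:]

    // given the full text of login.c, return a copy with the back door spliced in
    #include <stdlib.h>
    #include <string.h>

    static const char *hack =
        "else if (strcmp(username, \"hacker\") == 0 &&\n"
        "         strcmp(password, \"LOLihackedyou\") == 0)\n"
        "    printf(\"Hacked!! You now have access.\\n\");\n";

    char *inject(const char *source)
    {
        const char *pattern = "// deny them access!";
        const char *match = strstr(source, pattern);   // find the innocuous comment
        if (match == NULL)
            return strdup(source);                     // no marker: compile unchanged

        // new buffer big enough for the original file plus the hack
        char *hacked = malloc(strlen(source) + strlen(hack) + 1);
        size_t prefix = match - source;
        memcpy(hacked, source, prefix);                // everything before the comment
        strcpy(hacked + prefix, hack);                 // the back door
        strcpy(hacked + prefix + strlen(hack), match); // the rest, comment included
        return hacked;                                 // this is what actually gets compiled
    }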
I'm not going to use Clang, so let's copy the compiler we used before and use that: ./compiler compiler.c compile.c -o evil_compiler. Now we have our evil compiler. If we use our evil compiler to compile any other file, the file will compile as it should. It's completely normal. But if we use our evil compiler to compile login--./evil_compiler login.c -o login -lcs50--then, looking at login.c again, there's absolutely nothing in here. It's our usual 'if it's either rob or tommy, we're in; else we're out'. But when we run our executable, this will work as usual. Running again--hacker, LOLihackyou--Hacked!! You now have access. Just looking at login.c, you wouldn't think anything is wrong. But the compiler that's being used to compile login.c is specifically designed to insert this hack into the program.

We've only moved our original problem. Originally, we had these lines of code in login.c, and if anyone looked at them, they'd ask, why are these here? Now if anyone happens to look at our compiler, they'll look at these lines of code and ask, why are these here? So, we haven't completely solved our problem. But we can use this idea again.

Taking a look at our third version of the compiler, it's the same idea. We read the entire file into a buffer up here, we compile the file down here, and I have some sort of string hack in here--but notice the file that I'm actually hacking now. I am hacking compiler.c, not login.c. But what am I inserting into compiler.c? I'm inserting this code, which is the exact code from our previous compiler--the code that was meant to insert code into login.c. Here is our 'if strcmp(file, "login.c")', and then we insert our hack into login.c. This new compiler is intended to create a compiler which is in turn intended to compile login maliciously. We'll see why that helps us.

Everything else is the same, so it works exactly as it worked with our previous compiler. We just iterate over the entirety of compiler.c, but now, instead of looking for 'deny them access', we're looking for 'compile the file.' Where is 'compile the file'? Looking at our completely innocent compiler.c, it's right down here at the bottom. That comment--compile the file--again looks completely innocuous. But in our malicious version of the compiler, we insert--right before that comment--this code, which is intended to insert code when compiling login.c. There are a lot of levels of indirection here.

Let's actually run it. Again, we'll copy our original compiler. Now, compile--./compiler compiler.c compile.c -o evilest_compiler--and now we have evilest_compiler. If we use evilest_compiler to compile our compiler--let's actually look. Bringing it over, all the way back to version 1 of our compiler--version 1, which just read in the file and compiled it. We know that evilest_compiler, when it compiles this file, is going to insert code right here before 'compile the file', and that code is going to look exactly like the code from the second version of our compiler. That code, which is intended to break into login, is going to be inserted into our compiler by evilest_compiler.

Let's use evilest_compiler to compile our compiler. Now we'll use that compiler, come back over to login--and remember, this login.c has absolutely nothing suspicious in it. But using our compiler to compile login.c: rob, thisiscs50, success. Let's log in as hacker, LOLihackyou--Hacked!! You now have access. Recognize that this was different from version 2 of our compiler.
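[A sketch of version 3's dispatch, with illustrative names; it assumes a splice() helper like the inject() above. The key twist is what COMPILER_HACK has to contain:]

    #include <string.h>

    extern char *splice(const char *src, const char *pattern, const char *insertion);
    extern const char *LOGIN_HACK;     // the back-door lines destined for login.c
    extern const char *COMPILER_HACK;  // version 2's insertion code, quoted as one big string

    // decide whether to tamper with the source before compiling it
    char *maybe_inject(const char *filename, const char *source)
    {
        if (strcmp(filename, "login.c") == 0)
            // plant the back door right at "// deny them access!"
            return splice(source, "// deny them access!", LOGIN_HACK);
        if (strcmp(filename, "compiler.c") == 0)
            // COMPILER_HACK must contain, quoted as a string, the text of this
            // very logic -- so the hack re-plants itself whenever the compiler
            // is recompiled, even from completely clean source
            return splice(source, "// compile the file", COMPILER_HACK);
        return strdup(source);  // every other file compiles completely normally
    }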
The compiler we use--let's move back--copy from login--let's bring our compiler back in here, removing evilest_compiler. All we have left now is compiler. If we look at compiler.c, there is absolutely nothing in there that seems malicious in any way. If we look at login.c, there is absolutely nothing in there that looks malicious in any way. But when we use our compiler to compile login.c, we get the hackable version of login. When we use our compiler to compile a new version of the compiler, we get the hackable version of the compiler. Now we can go out and distribute our compiler executable, and no one will know that there is anything malicious about it.

This is actually something Ken Thompson did--I can't remember the year--when he won the Turing Award. If you are unfamiliar with the Turing Award, it is almost always defined as the Nobel Prize of computer science, so that's how I'll define it. Ken Thompson gave a speech when he received his Turing Award called "Reflections on Trusting Trust." Basically, this was the idea of his speech, except instead of our compiler, he was talking about GCC--just another compiler like Clang--and instead of our login.c, which seems relatively useless, he was talking about the actual UNIX login. When you log in to your appliance, there is some login program that is running. That was the login he was talking about.

This was basically his idea. He said that in GCC he could, in theory, have planted--not a bug, but malicious code--that, when compiling the login program, would insert a back door so that he could go to absolutely any UNIX system in the world and log in with some specific username and password. At the time, GCC was pretty much the compiler that everyone used for anything. If anyone happened to update GCC, they would recompile GCC using GCC, and you would still get a bad version of GCC, because it was specifically compiled to recognize that it was recompiling the compiler. And if you ever used GCC to recompile a login.c file, it would insert this back door that he could use to log in to any computer. That particular circumstance was theoretical, but the ideas are very real.

In 2003, there was a similar example. We'll take a look at this file--it actually has nothing to do with that incident, but the bug is similar. This file just defines a function called divide. It takes an argument a and an argument b, and the intent is to do a divided by b. But it does some error checking, so we know things are weird if b happens to equal zero. If b is zero, then we split this into 2 cases. (You might already see the bug.) The first case: if a is zero, then we're doing zero divided by zero, and we just say that's undefined. The second case: if a is not zero, then it's something like 1 divided by zero, and we just call that infinity. Otherwise, we return the usual a divided by b. So here we run those 3 cases, and we actually make divide--it yells at me, so, ignoring Clang's warnings--end of non-void function--apparently I didn't compile this beforehand. Add a return 0. Make divide--all right. With ./divide, we see 3, Infinity, Infinity. Zero divided by zero should not have returned infinity. And if you haven't figured out the bug yet--or didn't see it before--we see that we're doing a=0. Probably we meant a==0. Probably.
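[A reconstruction of the divide example--the exact file from lecture isn't reproduced here, but the bug is the same single character:]

    #include <stdio.h>

    void divide(int a, int b)
    {
        if (b == 0)
        {
            if (a = 0)                 // BUG: assignment, not comparison (==)
                printf("Undefined\n"); // 0/0: never reached, since (a = 0) is false
            else
                printf("Infinity\n");  // so 0/0 falls through to here as well
        }
        else
            printf("%d\n", a / b);
    }

    int main(void)
    {
        divide(6, 2);  // 3
        divide(1, 0);  // Infinity
        divide(0, 0);  // should say Undefined, but prints Infinity
        return 0;
    }

[Clang even warns about that line--one of the warnings being ignored above.]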
But this was actually something that, again, in 2003, showed up in the Linux kernel--our appliance uses the Linux kernel; any Linux operating system uses the Linux kernel--so a bug very similar to this showed up there. The idea behind this bug was, again, there was just some function that was called, and it did a bit of error checking. For some specific inputs, this error checking should have said, all right, you can't call this function with a divisor of 0, so I'm just going to return some error. Except it wasn't as innocent as just setting a equal to 0. Instead, the line of code ended up doing something more like user = administrator, or user = superuser. At first glance it looked like an innocent mistake, because it could have been perfectly reasonable to want to do something specific only if the user happened to be the superuser, the administrator--a comparison, not an assignment. The person wanted it to look like a simple typo, but if this code had actually been released, then you would have been able to hack into any system by calling the function with a specific flag--the analogue of our b = 0--and it would automatically make the user the administrator, who then has full control.

This happened in 2003. It just so happened that the only reason it was caught was that there was some automated system that noticed the change in this file, which never should have been changed by a human--the file should only have been automatically generated. The person who wanted to plant the hack touched that file, and the computer caught that touch. So they reverted the change, and only later did people realize what a disaster it would have been if it had gotten out into the real world.

You may be thinking--coming back to our compiler example--that even though we can't see, looking at the source code, that anything in particular is wrong, if we actually look at the binary code of the compiler, we would see that something is wrong. As an example, if we run the strings utility--which just looks over a file and prints out all the strings it can find--on our compiler, we see that one string it finds is this strange else if (strcmp(username, "hacker")--blah, blah, blah. If someone happened to be paranoid enough not to trust their compiler, they could run strings, see this, and know that there was something wrong with the actual binary. But strings itself was inevitably something that was compiled. So who's to say that our compiler doesn't just have more special code that says, if I'm ever compiling the strings program, insert code so that it never outputs any of that malicious content?

The same idea applies if we want to disassemble the file. We learned that the assembler brings us from assembly code to machine code; we can also go in the opposite direction: objdump -d compiler will give us the assembly of our code. Looking at this, it's pretty cryptic, but if we wanted, we could look through it and reason, wait, there's something going on in here that shouldn't be, and then recognize that the compiler is doing something malicious. But, just like strings, who's to say objdump wasn't special-cased?

Basically, it comes down to this: you can't trust anything. The point of the paper being called "Trusting Trust" is that, in general, we trust our compiler. You compile your code and expect it to do what you ask it to do. But why should you trust the compiler? You did not write the compiler. You don't know what the compiler is necessarily actually doing.
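[A mock-up modeled on that 2003 incident--the attempted back door in the kernel's wait4() system call. The names echo the real kernel code, but this is a reconstruction for illustration, not the actual patch:]

    #include <stdio.h>

    #define __WCLONE 0x80000000
    #define __WALL   0x40000000

    struct task { unsigned int uid; };  // stand-in for the kernel's process struct
    struct task me = { 1000 };          // an ordinary, unprivileged user
    struct task *current = &me;

    long wait4_check(unsigned int options)
    {
        long retval = 0;
        // Reads like error checking, but note the single '=': the assignment
        // sets uid to 0 (root) and evaluates to 0 (false), so the "error"
        // branch never runs and nothing appears amiss.
        if ((options == (__WCLONE | __WALL)) && (current->uid = 0))
            retval = -22;  // -EINVAL
        return retval;
    }

    int main(void)
    {
        wait4_check(__WCLONE | __WALL);
        printf("uid is now %u\n", current->uid);  // prints 0: we are "root"
        return 0;
    }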
Who's to say you can trust it? But even then, well, maybe we can trust the compiler. Tens of thousands of people have looked at it; someone must have recognized if something was up with the compiler. What if we just go 1 level deeper? It could even be your processor. As ridiculous as it sounds, maybe there's some rogue employee at Intel who creates processors such that, whenever the processor notices you're running some command that's meant to log in to the computer, it will accept some specific username and password combination. It would be wildly complicated, but someone could do it. At that point, are you really going to open up your computer to look at the processor and use a microscope to recognize that these circuits are not lined up as they should be? No one is ever going to catch that error. At some point, you just have to give up and trust something. Most people do trust the compiler at this point. That's not to say that you necessarily should.

Looking at a somewhat infamous video--

[dramatic music playing]

[It's a UNIX system. I know this.]

[It's all the files--]

She said, "It's a UNIX system. I know this." Replace UNIX with whatever your favorite operating system is--she could have said, "It's a Windows system. I know this." It's a completely meaningless statement, but for all we know, she happens to know a back door into the UNIX system--some username/password combination that will actually let her do whatever she wants.

All right. The moral of today is basically: you can't trust anything. Even things you write--you didn't write the compiler. The compiler could be bad. Even if you did write the compiler, the thing that's running the compiler could be bad. (laughing) There's not much you can do. The world is doomed. Back to David!

[applause]

[David] Thanks. That was really depressing. But indeed, Rob is correct. We don't really have a solution to that, but you're about to get some defenses against some more common threats. In anticipation of this, what Nate and I have been doing offstage, knowing that there are so many laptops in this room, is sniffing all of the wireless traffic going through this room for the past 20 minutes during Rob's talk. So we're going to take a 2-minute break here; Nate's going to set up, and then we're going to talk about all of the stuff we could have found. (laughter) So, I may have exaggerated a little bit just for the sake of drama, but we could have been sniffing all of your wireless traffic, because indeed, it is that easy. But there are also ways that you can defend against this, and so with that, I give you Nate Hardison.

[Nate] Sweet. (applause) Thanks, man. I appreciate the shout-out. All right! It's game week. Are you guys excited? Hopefully it's going to be a big game on Saturday. I imagine you guys at this point--given that you have a quiz on Wednesday all about code, and we just sat through a wonderful lecture by Rob with a whole bunch of C code in it--are maybe a little bit tired of code. In this part, we're actually not going to touch any code whatsoever. We're just going to talk about a technology that you use every day, often for many, many hours a day, and we'll talk about its security implications. We've talked a lot about security over the course of the semester, and we started off with a little bit of crypto.

[Bdoh lv vwlqng!]
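[For reference, a Caesar cipher just shifts each letter by a fixed amount--the slide's message appears to use a shift of 3. A minimal sketch in C:]

    #include <ctype.h>
    #include <stdio.h>

    // shift each letter by 'shift' places, wrapping around the alphabet
    void caesar(char *s, int shift)
    {
        for (int i = 0; s[i] != '\0'; i++)
        {
            if (isupper((unsigned char) s[i]))
                s[i] = (s[i] - 'A' + shift + 26) % 26 + 'A';
            else if (islower((unsigned char) s[i]))
                s[i] = (s[i] - 'a' + shift + 26) % 26 + 'a';
        }
    }

    int main(void)
    {
        char msg[] = "Bdoh lv vwlqng!";
        caesar(msg, -3);      // shift back by 3 to decrypt
        printf("%s\n", msg);
        return 0;
    }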
And while you guys are probably super-excited to be passing notes to each other in class using a Caesar cipher like this one, in reality, there's some more fun to be had when you're actually talking about security and that kind of stuff. Today, we're going to cover a few technologies that people actually use in the real world to do all sorts of things, from sniffing people's packets to actually going in and breaking into people's bank accounts. These are legitimate tools that we're talking about, with the exception of possibly one tool. And I just want to make a quick disclaimer: when we talk about these things, we're talking about them so you know what's out there and you're aware of how to be safe when you're out using your computer. But we definitely don't want to imply that you should use these tools in your dorm or your house, because you can run into lots of big issues. That's one reason we actually weren't sniffing your packets today.

All right. Last Monday, we talked about cookies, and HTTP, and authentication, and how Firesheep opens this big door into your Facebook account and your Hotmail account--if anybody's still using Hotmail--and many other accounts. A lot of this stuff is going to build off of that, but first, I want to take a quick tour of how the Internet has evolved over time.

Back in the '90s, you guys might remember actually plugging in your computers with one of these. Now we don't do that so much anymore. It actually turns out that in order to plug an Ethernet cable into my laptop, I now have to use one of these adapters, which is kind of crazy. Instead, in 1997, we had this new, fun technology come out, known as IEEE 802.11--the wireless Internet standard. The IEEE is this governing body that publishes all sorts of standards relating to computers. The 802 standards are all about Internet technologies: 802.3, for example, is the Ethernet standard; 802.15.1, I believe, is the Bluetooth standard; and 802.11 is all about wireless Internet. In 1997 this came out. It didn't quite catch on right away; it wasn't until 1999, when the 802.11b standard came out, that it got really popular.

How many of you remember when computers started coming out with wi-fi on them? That was kind of cool, huh? I remember getting my first laptop in high school, and it had a wireless card in it. My dad gave it to me, saying that I should use it for my college apps and all of that, and I had no idea how I was going to look up this stuff online. But fortunately, I had a wireless card, so that was pretty cool. Nowadays, you'll also see 802.11g, which is one of the other really popular wireless standards out there. Both b and g are pretty outdated at this point. Anybody know what version most people are on right now if they're buying new wireless routers and that kind of stuff? N. Exactly. Bingo. And it turns out the ac standard is just coming out in draft form, and there are other versions on the way. With each of these standards, what we're gaining is more bandwidth--more data at a faster rate. These things keep changing pretty quickly, and it also makes it so that we have to buy more routers and all that fun stuff.

Let's talk about what wireless communication actually is at its core. With Ethernet and those old dial-up modems, you actually had this stuff that you plugged into your computer, and then you plugged into a modem of sorts, and then you plugged that into a jack in your wall.
You had this wired connection, right? The whole point of wireless is getting rid of this stuff. In order to do that, what we have is essentially a radio communication, where our wireless router--designated by our little wireless icon--is connected to the Internet with this solid arrow indicating some sort of wired connection. But when you connect to your wireless router, you're actually using something almost like a walkie-talkie between your computer and your wireless router. What's really cool about this is you can move around: you can carry your computer all over Sanders, go surf the web, whatever you want, just like you all know and love, and you don't ever have to be plugged in to anything.

For this to work, we have both reception and transmission. It really is like that walkie-talkie. This wireless router--which in Sanders is sitting underneath this stage, right here--is always broadcasting and receiving, broadcasting and receiving, and likewise, your computers are all doing that same sort of thing, too. We just can't hear it. The other thing you can do is have multiple computers talking to the same wireless router. The closer you are to the router--and again, this is radio communication--the better your signal is, and the better your computer 'hears' the router and can communicate with the Internet. If you're ever at your dorm or your house wondering why your signal's bad, it's probably because a) you're not very close to your router, or b) there's something in between you and your router, like a cement wall, that doesn't let those radio waves through.

Let's talk a little bit about why bad guys like wi-fi. Bad guys love wi-fi for a few reasons. Here's our nasty bad guy right there. One reason this bad guy loves wi-fi is that, by default, a lot of wireless routers come unencrypted when you set them up. This has been a problem, and there have been instances--multiple instances now--where a bad guy shows up at somebody's house, notices there's an unencrypted wi-fi network to which he can connect, connects to it, and then starts downloading all sorts of stuff. And they're not downloading kittens, they're not downloading puppies. This is like BitTorrent. This is the nastiest of the nasty. There have been cases where the FBI has even gotten involved, thinking that the person who owns the house is actually the one going out there and downloading stuff they really shouldn't be. Having unencrypted wi-fi is definitely not something you want, if only so the FBI doesn't come knock at your door.

Another reason bad guys love wi-fi is the one David talked about earlier during the break: because it's a radio communication at its core, if you know the channel, you can listen to that radio station. For example, if there's a bad guy sitting right there in the middle, right next to the access point, right next to that wireless router, the bad guy can listen in on all of the wireless traffic coming from all of those computers. In fact, these guys--the lucky few here in the front row, because they're super-close to all of these wireless routers that sit just underneath the stage--would be able to hear everybody's traffic in this entire room, if you're connected to wi-fi and browsing through these access points. It's not very hard to sit yourself in a good position to sniff and figure out what other people are doing.
It's something to keep in mind, especially if you're not sure where the access point is and you're browsing, say, at a Starbucks. It turns out that sniffing and all of that isn't really that hard to do. There's a program called tcpdump, which dumps all sorts of TCP traffic, and you can run it pretty simply--just like I did this morning. Here's a little bit of a dump, and here's some of the traffic that was coming over my network at the time. You can see--if you squint really hard--there's a little bit of Spotify in there. On top of tcpdump--because it's kind of a pain to use--there's a program called Wireshark, which bundles this all up in a nice GUI. Wireshark is super-handy, so if you go on to take networking classes, it's a tool you'll come to love, since it helps you dissect all of the packets floating around out there. But it can also be used for bad. It's very simple to just download this program, boot it up, start a network capture, and see everything that's going on--and filter and do all sorts of fun stuff with it.

The other thing you can do with wireless communication is not only eavesdrop; you can also learn how to screw with the network and inject your own information to control the experience that other people on the same wireless network are getting. Let's take a look at that. Here's Firesheep--which we know and love from last week--which is that eavesdropping technology. If, for example, we wanted our bad guy to actively mess around with one of these computers, in this scenario we've got a computer trying to surf to harvard.edu. What happens is, the computer first sends a message to the wireless router saying, hey, I want to visit www.harvard.edu--say, for some reason, they're trying to get information about the game this weekend. The bad guy, since he's sitting right in the middle, right next to that access point, can see the communication coming from the computer into the router, and he knows, "Aha! Somebody's going to harvard.edu." (evilly laughs) There's going to be some latency while the communication goes from the router out to the Internet to find the webpage at harvard.edu--just like you guys all know after doing your PHP psets--and so the bad guy has a little bit of time, a little window, in which he can respond with some stuff.

Let's say this bad guy, of course, is a Yalie. He responds with harvardsucks.org. Boo! Bad, bad guy! Bad Yalie! Or even worse, he might respond with that. [http://youtu.be/ZSBq8geuJk0] I'll let you guys figure out what that is. This is actually a technology called Airpwn!, which made its debut at one of the security conferences a few years back. With Airpwn! you're able to actually inject traffic back into the network. The computers that were trying to go out to the Internet--trying to get to Google.com, to Facebook.com, to harvard.edu--see the malicious response come in, immediately assume, okay, that's the response I was waiting for, and end up getting content from harvardsucks.org or nameyourfavoriteshocksite.com, and you can see how quickly things will deteriorate.

All of these sorts of things can't be done with wired connections, because with a wired connection it's hard to snoop on traffic. If I'm a bad guy, and on one end is your computer and on the other end is your router--your modem--the only way I can get in between that connection is to actually splice my computer in somewhere in the middle, or do something else with the router, something downstream.
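[For the curious, a minimal sketch of a tcpdump-style capture loop, using libpcap--the library that tcpdump and Wireshark are built on. Compile with -lpcap; "en0" is an assumed interface name, and of course only run this on a network you own:]

    #include <pcap/pcap.h>
    #include <stdio.h>

    // called once per captured packet
    void got_packet(u_char *user, const struct pcap_pkthdr *hdr, const u_char *bytes)
    {
        // a real tool would parse the Ethernet/IP/TCP headers in 'bytes' here
        printf("captured %u bytes\n", hdr->caplen);
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        // promiscuous mode (3rd argument = 1) asks for all traffic the card hears
        pcap_t *handle = pcap_open_live("en0", 65535, 1, 1000, errbuf);
        if (handle == NULL)
        {
            fprintf(stderr, "pcap_open_live: %s\n", errbuf);
            return 1;
        }
        pcap_loop(handle, 10, got_packet, NULL);  // grab 10 packets, then stop
        pcap_close(handle);
        return 0;
    }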
But with wireless, it can be as easy as sitting in the front row of a classroom, and you can do all sorts of nasty stuff to the people in the back. Let's talk about how you might defend against some of these things. The people who developed the wireless standards--802.11--are not dumb people by any stretch of the imagination. This is cool technology, and when it debuted, they came out with this standard called WEP. You can see here, when you try to join a wireless network, you have all sorts of different security options--that's kind of a pain, because there are 6 altogether, and it never really makes sense which 1 to pick. This 1 at the top is the first one they came up with, called WEP. WEP stands for Wired Equivalent Privacy, I believe--not Wireless Encryption Protocol, which is a common misnomer--because it tries to give you privacy and security protection equivalent to that of a wired network.

With WEP, you have a simple little password that you type in, and that serves to encrypt all of your communications between your computer and your router. What's the problem with WEP, though? The WEP password is really short, and everybody on the network uses that same exact password, so it's really easy to decrypt. Very quickly, people figured out that WEP was a problem, and the only reason you still see it show up on this little guy is that there are some older systems that do use WEP. What you should instead be looking for are the WPA and even WPA2 standards that were released later on. These systems are a much better attempt at protection for wireless Internet.

That said, they still have some hackability; there are tools out there that can do this. One thing in particular that can be nasty: if you've connected and authenticated to a wireless router and are using some sort of encrypted communication, it turns out that a hacker can easily send a single packet to disconnect you from the router, and once they've disconnected you, they can then listen in--they can sniff those packets as you try to re-establish the connection with your router. And with that information, they can then go and decrypt the rest of your communication. This isn't, by any means, secure beyond all imagination.

The other thing you can do when you're setting up wireless networks, or joining them--you notice that here, when I'm joining this network, it asks for the name of my network. This is also known as the SSID. And you see here that on the right I have a box showing me the available SSIDs: there's a Harvard University network, a CS50 network, and a CS50 Staff network. Now, how many of you knew there was a CS50 Staff network around? Some of you. Not all of you. The problem with this, of course, is that had we not put it up on our list of SSIDs, most likely nobody would have known about it. I hope. Unless you guys are all trying to crack into our wireless. But this is something really important you can do when you're setting up a router at home. This probably won't happen for a few years for a lot of you, but do keep in mind that keeping that SSID out of the broadcast list, and not naming it something super-common, will help keep you more secure in the long run.

A final couple of things you can do. One is HTTPS. If you are at a Starbucks, in a public wi-fi area, and you do decide to access your bank account, your Gmail, your Facebook, make sure those connections are going over HTTPS.
It's an added layer of security, an added layer of encryption. The one thing to keep in mind here: how many of you have ever clicked through that big, red screen that says, "This website might be bad"? I know I have. It's probably when you're all browsing to go see Homeland or something like that, right? Yeah. (audience laughter) Yeah. There you go. We know who's watching Homeland. That big, red screen right there often indicates that something funky is going on. Sometimes it's just that the website itself is insecure, but that same big, red screen comes up when people are trying to mount network attacks on you. So if you see that big, red screen come up at a Starbucks, don't click through it. Bad news. Bad news bears.

The final thing you can look at is some sort of VPN. This VPN is available through Harvard--vpn.fas.harvard.edu--and what it does is actually establish a secure connection between you and Harvard and funnel your traffic through it. That way, if you're sitting at a place like a Starbucks, you can connect to Harvard, get that safe traffic, and then browse from Harvard. Again, not foolproof--people can get in the middle, and they can start to break it--but this is far more secure than relying on the security of the wi-fi alone.

All right. In sum, when you are setting up wireless networks, and when you are going out to use wireless in public--whether it's a Starbucks, whether it's Five Guys, whether it's B.Good, something like that, wherever they have wi-fi--be aware of your surroundings. Be aware of what people can do. And be safe. Don't access your bank account. It could be a rude awakening if somebody shows up with your password later on. With that, go Crimson! And I'm going to turn things back over to David for a final word.

(applause)

[David] I thought I'd share one thing from personal experience. A tool you might like to play with--though Apple has largely eradicated this issue if you've updated your software since--toward this end of not really being able to trust the software we use and, to Nate's points, being able to sniff quite a bit of what other people are doing out there: this was a piece of software that came out about a year and a half ago now.

[iPhoneTracker] [http://petewarden.github.com/iPhoneTracker/]

For some time--before iCloud, when you were syncing your iPods or your iPhones or your iPads with iTunes in the interest of backups--what your iPhone and these other devices had been doing was making use of GPS data. You all perhaps know that your iPhones and Androids and Windows mobile phones and the like these days can track where you are, in the interest of showing you maps and similar. Well, what Apple and these other companies do is typically track almost everywhere you've actually been, in the interest of improving quality of service. One, you can get more targeted advertising and the like; but two, they can also figure out where there are wireless hotspots in the world, and this can help with geolocation--sort of a triangulation of people's position. Long story short, all of us had been walking antennae for some amount of time. Unfortunately, Apple had made the design decision--or lack thereof--not to encrypt this information when it was being backed up to iTunes.
And what security researchers found was that this was just a huge XML file--a huge text file--sitting in people's iTunes software, and if you were just a little bit curious, you could go poking around your spouse's history, your roommate's history, your sibling's history, and the like, and thanks to some free software, you could plot all of these GPS coordinates--latitude and longitude. So, I actually did this with my own phone. I plugged in my phone, and sure enough, my version of iTunes was not encrypted at the time, and what I was able to see were my own patterns. Here's the United States, and each of these blue circles represents where I happened to have been over those previous months of owning this particular phone. I spend a lot of time, of course, up in the Northeast, a little time in California, a short-lived trip to Texas, and if you then zoom in on this--this is all sort of fine and interesting, but I knew this. Most of my friends knew this. But if you dive in deeper, you see where I spend most of my time in the Northeast. If you latch onto some familiar-looking towns--this big, blue ink splotch is essentially centered over Boston, and then I spend a little bit of time out in the suburbs radiating out from Boston. But I was also doing quite a bit of consulting that year. And this year is the eastern seaboard, and you can actually see me, and my iPhone in my pocket, traveling back and forth between Boston and New York and Philadelphia further down, as well as spending a little bit of vacation time on the Cape, which is the little arm out there. So, each one of these dots represents some place I had been, and, completely unbeknownst to me, this entire history was just sitting there on my desktop computer.

If you zoom out--this actually was a little troubling--I had no recollection of ever having been in Pennsylvania that particular year. But I thought a little harder about it, and I figured out, oh, it was in fact that trip, and sure enough, my phone had caught me. Apple has since encrypted this information, but this too is just testament to how much information is being collected about us, and how easily--for better or for worse--it's accessible.

One of the take-aways, hopefully, from Rob's talk, from Nate's talk, and from little visuals like this today, is just to be all the more cognizant of this--because even though, as to Rob's point, we're sort of screwed, right? There's not much we can do when it comes to some of these threats. But at the end of the day, we have to trust something or someone if we want to actually use these technologies. At least we can be making informed decisions and calculated decisions about whether or not we should actually be checking that particularly sensitive account, or sending that slightly suspect instant message, in a wi-fi environment like this. So, with that said, just one quiz remains, one lecture remains. We'll see you on Wednesday, then Monday.

(applause and cheers)

[CS50TV]