[Silence] >> This is CS50! So this is already week 11, which means week 12 is almost here and thus is the end of CS50, but a few things remain on the horizon. So one, very excitingly, quiz 1 is this Wednesday, not in here but in the same locations as last time. See the PDF that's linked on the course's home page. Last night's review session is already online, but genuinely more excitingly on the horizon is the CS50 Hackathon. So as promised, this is going to be an opportunity on Friday, December 2, which is going to be the first evening of reading period, to dive into your final project with a couple of hundred of your classmates alongside you. We're going to charter some Harvard shuttles, going to drive down to Tech Square and Kendall Square, which is where the Microsoft NERD Center is, which is their Research and Development Center, which is a very sexy place. And awaiting you will be scenes like this as well as pizza around 8:00 p.m., Chinese food from the Kong at 1:00 a.m., and then for those of you still awake at 5:00 a.m. we will get back in the shuttle and take you to IHOP. So we will start the RSVP process for this over the next week or two. Space is limited, so we may end up lotterying, but more details on that on the home page and via e-mail before long. And then this is around 5:30 a.m. last year, here at IHOP. We got the Early Bird Special there -- still dark outside, you can see. So also on the horizon and the culmination of all of CS50 is the CS50 Fair, so this is on Friday, December 9, to which the entire campus is invited, where a spectacle like this will await you, including stress balls and popcorn and candy and final projects galore. Think Middle School Science Fair, but much, much cooler with computers. And so that will be, again, the climax of your experience here. 
So over the course of last week, beyond hitting some of those bugs in the Google Earth plugin, there were some other fun bugs. This screenshot was taken by one of our students whereby he somehow accidentally replicated just one staff member on his Google Earth; in fact, many, many, many times as you can see in the 2D Earth there. So that was in fact a bug. I would be remiss if I didn't acknowledge my own bug, so you'll recall that just as our Facebook guests came in, I really screwed up a final demonstration of this library called jQuery, which is meant to simplify your use of JavaScript, in particular technologies like Ajax, which again allow you to get more data from server to client, even after the initial web page has loaded. So this demo does actually work. Realize that foolishly I had commented out one line of code about five minutes prior to that demonstration blowing up, and commenting out an important line of code, it turns out, is bad for a demonstration. So realize the code that you do have from last week is good. But then I proceeded to one-up myself further last week, as you may recall, with a little demo involving sending text messages. So let's just fess up. I'm very content putting this admission out on the Internet, so this was the while loop, which we've been using since week one, ironically, and in this while loop we used a library called PHPMailer, which allows you to create a mail object, so to speak, and then add an address to it, add a body to it, and then call a send function, which, if you use the right types of e-mail addresses, can generate text messages, and not e-mails per se. Unfortunately failure to RTFM means that I did not realize, despite its very clear name, that adding an address to the mail literally adds an address; it does not subtract the prior address. 
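The bug just described can be sketched in a few lines. This is a minimal Python simulation, not the PHP from the demo: the `Mail` class and its `add_address` and `send` methods are hypothetical stand-ins that only mimic the accumulate-on-add behavior described above, and the helper names are illustrative.

```python
class Mail:
    """Hypothetical stand-in for a mailer object; it only counts deliveries."""

    def __init__(self):
        self.recipients = []

    def add_address(self, address):
        # Appends to the recipient list; earlier addresses are never cleared.
        self.recipients.append(address)

    def send(self):
        # One message goes out to every address accumulated so far.
        return len(self.recipients)


def buggy_total(addresses):
    """Reuse one Mail object across the whole loop, as in the flawed demo."""
    mail = Mail()
    total = 0
    for address in addresses:
        mail.add_address(address)
        total += mail.send()
    return total


def fixed_total(addresses):
    """Use a fresh Mail object per recipient so each person gets one message."""
    total = 0
    for address in addresses:
        mail = Mail()
        mail.add_address(address)
        total += mail.send()
    return total


def triangular(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2
```

With n recipients the reused object sends 1 + 2 + ... + n = n(n+1)/2 messages in total, which grows quadratically, while the per-recipient version sends n. Creating a new object on each iteration is only one possible fix; clearing the recipient list after each send would be another.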
So as a result of this gaffe and specifically this line, some of you might have been all too aware that Aaron first got a text message, but then on the second iteration of this while loop Aaron and Abel got a text message, and then on the third iteration of this loop, Aaron and Abel and Abigail got a text message. On the fourth iteration of this loop, Aaron, Abel, Abigail and Adriana got a text message, each individually. So this kind of scales up fairly quickly. In fact, if you do the math, 1 plus 2 plus dot-dot-dot, all the way up, actually gives us the same running time as selection sort, which if you recall, did a whole lot of operations. Indeed, if you add all of these operations up you get a little something in big O of n squared. Big O of n squared in a loop with 600 students at 10 cents per text message tends to add up, so in the 5 or 8 seconds of cluelessness that I had here on stage and on camera, we sent 10,000 text messages to your classmates, and Aaron has my particular apology since he was on the receiving end of most of those. So be very careful when using loops, apparently. So without harping too much on that, where are we headed today? Oh, and incidentally, if you take nothing else away from this, too, thank God I inserted this artificial delay into this loop here. The goal originally was just to avoid getting blacklisted by a mail server, because if you send too many mails at once, a lot of servers will pull back on the reins and say, no, you can't keep sending all these messages. Now, in my case this actually inserted enough of a delay that we only got to student number 138, but again, if you add up 1 plus 2 plus 3, dot dot dot all the way up to 138, you get about 10,000 text messages. So in any case, after CS50, which is the point we're almost at, you can head off in all sorts of different directions. This is actually the same chart that's on the back of today's handout. 
You got this way back in week zero, which is CS50's own unofficial guide to computer science at Harvard. So statistically about half of you will not go on to future courses in computer science, and that's perfectly fine. Hopefully 50 in particular will nonetheless serve you well as you return to your own field, but for those of you curious about pursuing either a concentration or a secondary or just more coursework so that you return to your own field all the more empowered with tools and ideas with which you can solve problems specific to your own domain, realize that from 50 in the middle here, you can go off in all of the directions that these arrows point in here. And I'll defer to our booklet here and also to CS50's own course shopping tool if you'd like to read up more on the descriptions, but just to help you simplify, realize that after taking CS50 you can go off in several very specific directions as soon as this spring. So the courses you see listed here are all of the courses for which 50 is the only prerequisite, so you have a whole menu of options ahead of you, and it really does run the gamut of topics in computer science. In general, and this is perhaps a gross oversimplification, there's two main branches in computer science. There's systems and there's theory, and theory tends to be more, and this is a disservice to the entire field, paper-and-pencil, whereas systems is more empirical, writing code, running experiments, networking and software and the like. And in these several courses you can get a taste of all of these various directions. To ward off an FAQ -- rather, these here are the Fall 2012 classes, and these are all summarized again in this booklet, and we'll put this PDF online. 
But to ward off an FAQ, between me and all of the TFs and CAs we've probably taken all possible CS courses here as well as some even down the road -- bottom right-hand corner -- at MIT, so this is quite the laundry list of courses, which is to say most of us can advise you on directions in which you might want to go. A more compelling FAQ -- now, notice the contrast here -- look at all of the courses I took, and now look at the courses Mark took, if you would like to just take Mark's courses, and Bill only took these courses here. So I'll leave it to you to infer exactly where computer science can take you here. So these guys have done quite well with their background alone. Oh, and finally -- and this is actually a more sincere note, because what we are about to do is introduce you to a number of the computer science faculty here to give you a sense not just of future courses that you can take here specifically, but really to give you a sense of computer science itself as a field, and what it means to be in theory or systems or all of the various subfields of those two big buckets. But here, I was going through my transcripts from college and grad school, and these were four courses that I took over the course of mostly just senior year and grad school, and if I'm allowed to hop up very briefly on my little soapbox, I really regret in retrospect not having explored more when I was in college. 
Thankfully I did take 50, for instance, sophomore year, and that certainly took me off in a different direction, but some of my favorite courses, honestly, in senior year and in grad school were Dramatic Arts I, which is a survey class of the theater, GOV 1540, the American Presidency, which is like story time all week long with Professor Roger Porter, Latin A, which was me, the senior, with a whole bunch of freshmen fulfilling a language requirement, but it was still interesting, and then Anthro 1010, which is Introduction to Archeology, where I had a little admitted grad school life crisis, thinking, wow, this will be a lot more fun digging in the dirt than it would be in the computer lab, but I stayed true to CS there in the end. And this is only to say that I really do regret not having explored other courses, whether in CS or in other fields altogether. And the fact that so many of you took 50 this year is really quite inspiring, because so many of you self-describe as not really being bound for computer science. So if you take nothing else away this entire semester, do at least at one point -- and here's the soapbox -- veer off of whatever path you're on far sooner than senior year, because learning a little bit about a new field that's unfamiliar can definitely change your life. And with that said, allow me to introduce three of the computer science faculty in turn: Professor Krzysztof Gajos, Professor Harry Lewis and Professor Hanspeter Pfister, to give you a sense of what might lie ahead for you. Professor Gajos? [Applause] >> Well, too bad that David stopped my presentation with a punch line slide. Anyway, my name's Krzysztof. I teach CS 179, the Design of Usable and Useful Interactive Systems. I feel like very soon I'm going to change the official title of the course because I'm frankly getting a little tired of seeing yet another slide tweaking an application that helps college students have better parties. 
These are useful things, but there's more to the world than just these two things. So who should be taking this class? This class is designed for people who want to have a say in what they code. If all you want to do in life is geek out and write fantastic code and solve fantastic engineering problems, this is not the class for you. If you want to have a say in what it is that your group, your company, your team does, I think this is a very useful class. In this class you will be learning how to look at the world through somebody else's eyes, how to decide what a good problem is, and how to evaluate success -- oh, and by the way, you will also build. So let me say it again in a little bit more detail. In this class we will spend a lot of time observing people and understanding how they function, and looking for problems that they have in their lives that they do not even know that they have -- problems that are so entrenched, so pervasive that we don't even realize that they're problems, that they could be fixed and that they would be valuable to fix. These problems are hard to find, because you need to learn how to look at the world from somebody else's vantage point. You will also learn how to invent really creative solutions. You will learn how to explore a very broad space of possible solutions, go beyond the obvious, and then identify the directions that are most promising to pursue. Again, you will learn how to actually create designs, and designs that are supposed to serve somebody other than yourself, and in the process you will actually learn what's wrong with you. There's something fundamentally wrong with everyone who is taking a computer science class. You are disqualified from designing products for other people, because you are weird, you are unusual, and it is very important to understand what makes you different from other people, and you will learn it in this class. 
You will learn how to evaluate the products that you build in such a way that you know that others can use it, and you will know whether you've succeeded or not. Finally, as I said, you will build. You will be building mobile applications, and you will be working in teams, and this will be one of the hardest parts of the course. So I'm going to give you a few examples of the types of activities you will be doing in the class, and also a few of the intellectual questions that we'll answer in the class. So as I said, you'll do a lot of observing. You will be looking for ways in which people misuse their tools to find what it is that they really need. You will learn techniques for analyzing qualitative data so that you can actually make sense of your observations, find patterns, and find things that were initially hidden from you. You will be trying out your ideas a lot, and you will be trying them very quickly and you'll be trying them in a variety of different ways. You will be trying to convince yourself and others that you have a story for your product, for your idea, that addresses a real problem and solves it in a compelling way. You'll try out many different solutions very cheaply before you actually invest valuable resources into your final prototype. >> Vanilla Ice concert? Too bad it'll take eight hours of e-mail chain to get the block mates together. >> Oh, I remember when I used to use e-mail chains. >> Oh, hey, Caleb. What do you use instead of e-mail chains? >> Check this out -- OrGroup, the organic event planner. >> So that's a concert. >> It starts tomorrow night at 8 p.m. >> Invite our block mates? >> Yeah, and let's call it Vanilla Ice at Boston. >> Okay. We're done. >> That's it? >> That's right. Now we just wait to see what our block mates think. >> It's somewhat whimsical. It's a video prototype. It's an example of experience prototyping. 
These guys, once they had a pretty good idea of what it is they wanted to build, generated a movie that tried to tell, in really fine detail, a story of how their product would be used. After watching this movie, they and their colleagues were able to tell whether the idea makes sense, does it feel right, does it really -- is it really going to make a difference in somebody's life? So that's yet another activity that we will be doing in this class. And you were laughing at the fidelity of the interactive prototype. It was great, wasn't it? It took them an hour to implement, but it was good enough to let you know how this idea would actually work in practice. And then at each stage when you create these prototypes, we'll be using slightly different tools and methods to actually find out, to estimate whether these prototypes represent something that will be successful in practice. One way in which we'll be trying to predict the success will be through peer critiques. Ours is the only class in the School of Engineering that has studio sessions. You will be meeting once a week on Fridays, presenting your work and offering and receiving critiques. You will have to learn how to be insightful, offer critiques that are to the point, that actually uncover something worth uncovering, but are yet constructive and can help people design better things. You will be generating many ideas, so that the one idea that you have at the very end is really great. And as I said, you will build great things, things that some students often choose to pursue even beyond the class. So a few examples of the mysteries that we'll reveal in 179. We'll explore the mysteries of creativity. What is creativity? What is creative? How can you be creative? How can you make your team creative? And is lobster strudel creative? You will learn how people form mental models of complex systems, why they fail in operating complex systems and why you will fail to predict how they will fail. 
You will learn whether the mouse is the ultimate pointing device or whether you can do better. This is the first mouse ever. We'll review the mysteries of whether Mac is better than PC. We will look at how people's abilities, people's motor abilities, perceptual abilities and cognitive abilities change the very moment we stand up and start walking. Are we the same person? I'll try to convince you that we're actually very different people and that the way we design mobile user interfaces is actually entirely inadequate and we need a fundamental revolution in mobile interactive design. We'll also look at the issues of esthetics: how do you design to elicit various responses in your users, how do you convey different moods, feelings, and other meanings that you want to attach to your design. We will talk about how you can manipulate people's perception of time, how you can make them think that things happen quickly when in fact they took quite a while. We will talk about how you can embed crowds in your programs, how you can make hundreds of people work quietly in the background, contributing pieces to your program's execution in such a way that you end up creating interactive systems that appear incredibly smart, smarter than what any current artificial intelligence can accomplish. We'll also look at how companies and politicians run large-scale automated studies every time you interact with their products and how they use the results of these studies to redesign their products and sites and tools in such a way that you end up doing what they want you to do. And finally, you will learn why hippos are the most dangerous mammals other than humans on the planet. So to take this class, you need to have taken CS50. The timing has just changed. We have just asked the registrar to move the lectures to Tuesdays and Thursdays at 10:00 in the morning. No longer Monday-Wednesday; we're moving to Tuesdays and Thursdays. Apologies for this late change. 
This class has mandatory studios on Fridays. We may have a late Thursday option, but probably just Fridays, and there will be weekly assignments mostly done in teams. In this class, I try to subvert your understanding of what computer science is. One view of computer science is that it's all about the technology, it's about coding, it's about computers, it's about the green tongue. I believe computer science is -- whoa, I forgot I had this animation; where is it? Okay. I believe that computer science is a fantastic intellectual problem-solving toolkit, a problem-solving toolkit that you can take outside into the real world to solve problems [inaudible], and this is the message of CS 179. Harry? [Applause] >> So as Harry comes up, I just wanted to give a more personal introduction. So Professor Lewis was my own professor back in the day. I took CS 121, which is Theory of Computation, which to be honest, for me was the hardest class I ever took, at least up until that point. But this was this amazing class which in retrospect I actually grew really to love, enough so that I ended up TFing for Harry some four or five times. His theory of computation, as you'll perhaps soon see, is really about understanding the fundamentals of computers, what you can and what you cannot do with them, and it turns out there are problems out there that simply cannot be solved, or at least cannot be solved in our lifetimes computationally, even as much as you might actually want to. And so what Harry's focus today in particular will be on is a new course, CS20, which is truly one of those courses that I wish existed back in my day, since computer science theory is quite often about proving things. And my only proof background, frankly, was Geometry and like the angle/side/angle theorems and those sorts of things. And so for me, not being really the mathematically-minded kid, and definitely not among those more comfortable, I would have loved this basic framework with which you get equipped in CS20. 
Professor Lewis? >> Thank you so much. It's nice to be back in CS50. CS50 started under a slightly different name when I taught it I think in '82, so I haven't gotten back in front of the CS50 class all that often but it's nice to be here. And I hope you all become computer science concentrators. If by the way, you actually do want to become a CS concentrator and you're a sophomore and you have to declare your concentration this week, I have office hours all afternoon and you can visit my homepage to see when else you can get your form signed. So as David said, I'm here to talk about CS20, which is called Discrete Mathematics for Computer Science and is subtitled "All the math you should know to do computer science that they won't teach you in Math 1 and Math 21 and the Calculus and Linear Algebra sequence." And it's a course we never had, because it was always assumed back in the days when most of our concentrators came out of Applied Mathematics and Mathematics and Engineering, that people could pick it up as they go. But we decided to pull it all together in one place and teach it as a new course starting in the Spring. Now, there's a couple of different ways to talk about what this course is about. This is one way, which is to throw up some topics. So these are terms that may mean something to you or may not mean something to you -- graphs, counting, proofs, logic, probability, number theory, and some of them may be familiar even though they're not familiar to you by that name. But you can go to the CS20 website and look at the list of topics if that's really what you want to know. I suspect that's not what you really want to know, and in some sense that's not even really what the course is about. 
What the course is really about is teaching you how to think, teaching you how to solve problems of the kinds that computer scientists come across -- not how to write the code for the problems but how to figure out how fast something will run or whether an algorithm is a good one or not, teaching you how to prove things formally, which is a really useful skill in life but is a critical skill in computer science, since very often the difference between code that runs and code that doesn't run is whether someone has applied good logical reasoning to how it works and more generally, as I say how to think. Now, so there's a whole grab bag of tools and processes that we're going to try to teach in CS20, which is going to get us to some interesting pedagogy, which I'll talk about at the end of my few minutes here. But by way of background, this is on the critical path, on the prerequisite path to courses like CS 121 and 124 but we don't expect that many of you, or even most of you who will be going on in computer science will necessarily be taking CS 20. So in particular, if you were on the math team, certainly if you were the captain of the math team in high school, don't take CS 20. You'll pick up what you need to know that you don't get from the regular math sequence easily enough. If you've taken Math 23 or Math 25 or God forbid, Math 55, don't take CS 20. You'll only intimidate the rest of the class. There is a placement test, a sort of a placement test, up on the CS20 website, and there are no answers given, but just see if you can recognize what's going on in a number of the problems, and if they mostly look familiar to you. Even if you couldn't solve them without doing some checking, you're probably beyond CS20. 
And by all means, stay away if you can't handle a very well-intentioned course that is having startup problems, because anytime a new course is offered there are going to be startup problems and there'll be glitches and it will not be the well-oiled machine that you're used to from CS50. Okay? So with that, let me give you three examples of the kinds of problems -- I say we're going to do problem-solving -- problems that come up in computer science courses. These are all problems that I've taught in other computer science courses that are the kinds of things that we'll work through in CS20. And each of these problems has an interesting bit of Harvard history associated with it. So the first one is this. There is -- this is not Annenberg at breakfast time. This is a stack of pancakes, and it is -- I want you to imagine that it is your job as the waiter carrying the stack of pancakes to the table to get the stack in order so the biggest pancake is at the bottom, the smallest pancake is on the top, and the rest of the pancakes are, as we would say in computer science, sorted from largest to smallest. And the question is, how can you efficiently do that? And the answer is kind of an interesting problem, because if you think about it for a little while, there's a fairly simple 2n algorithm where n is the number of pancakes, namely -- well, let's see, there are the five pancakes. The fifth one is already on the bottom, so we don't need to do anything. We can just sort the top four. And one thing you can do is you can flip the top two. Did I say that the only thing you're allowed to do is to grab a wad of pancakes off the top? That's an important constraint, so you grab a wad of pancakes off the top, putting your thumb squarely on the middle of somebody's pancake, and your fingers on the bottom of somebody else's pancakes, and flip the wad over. 
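The flip operation and the simple strategy of roughly 2n flips can be sketched as follows. This is a minimal Python sketch of the straightforward approach only, under the assumption that the stack is a list whose index 0 is the top pancake and whose values are pancake sizes; the function names are illustrative.

```python
def flip(stack, k):
    """Reverse the top k pancakes in place; index 0 is the top of the stack."""
    stack[:k] = stack[:k][::-1]


def pancake_sort(stack):
    """Sort so the smallest pancake is on top and the largest on the bottom.

    Returns the list of flip sizes used (at most two flips per pancake).
    """
    flips = []
    for size in range(len(stack), 1, -1):
        # Find the largest pancake among the top `size` pancakes.
        biggest = max(range(size), key=lambda i: stack[i])
        if biggest == size - 1:
            continue  # already where it belongs
        if biggest != 0:
            flip(stack, biggest + 1)  # bring it to the top
            flips.append(biggest + 1)
        flip(stack, size)  # send it down to the bottom of the unsorted part
        flips.append(size)
    return flips
```

Each pancake needs at most two flips to reach its place, one to bring it to the top and one to send it down, so n pancakes need at most about 2n flips in total.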
So if you flip the top two pancakes over here so that four is on the top, then you can flip the whole top four over, and four will be on the bottom, right above five where it belongs, and then you can repeat this process with the remaining three pancakes. So you get two flips to get one pancake where it belongs, and it's going to take about 2n flips, maybe 2n-1, something like that, to get all the pancakes in order. So I posed this problem in the course that's now called Applied Math 107, and one bright young man went off and worked on it and came up with a better algorithm, and let's see if anybody recognizes him. This is a photo from about the time when he did this. Yes. Are you laughing because it's a mug shot, or are you laughing because you recognize who it is? Who? >> It's Bill Gates. >> Yeah, it's Bill Gates. That's Bill Gates, looking like he's going to own the world someday, doesn't he? Anyway, there he was. So he's actually got a paper on sorting by prefix reversal, which is the pancake problem. So the pancakes are just of course some kind of metaphor for some kind of computer operations that you would carry out on data. That's problem number 1, so that's sort of a counting problem. Here's the second problem. So this is the Monty Hall problem. How many people have ever heard of the Monty Hall problem? About half of you -- that's good. But about half of you haven't, right? No, those who haven't? A few, only a few. Okay, well, do this quickly. Monty Hall is a quiz show host. He's got three doors. There's a million dollars behind one door. You're the contestant. He asks you to pick a door, you pick a door. He opens one of the other doors and a goat is behind there, and then he asks you, okay, now that I've shown you where the goat is, do you want to stick with door 3, which is the one you chose, or do you want to switch to door 2. And what's the answer? >> You should switch. 
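The claim that switching is better can be checked empirically. Below is a minimal Python sketch of a Monte Carlo simulation, assuming the standard rules (the host knows where the prize is and always opens a losing door that the contestant did not pick); the function names, trial count and seed are illustrative.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning over many simulated rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials
```

Under these assumptions the exact probabilities are 1/3 for staying and 2/3 for switching: the first pick is right one time in three, and switching wins exactly when the first pick was wrong.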
>> The answer is you should switch, which a lot of very smart people refuse to believe. And not only should you switch, but you can calculate how much you actually should be willing to pay Monty for the privilege of switching. This is a problem in probability theory. The thing that makes this problem particularly interesting to me was -- I taught this in a course and then I wrote a textbook that came out of the course, and because we were using Monty Hall's name, decided to write to Monty Hall and ask him if he minded if we used his name, and Monty Hall wrote back the following letter. "Dear Larry" -- Larry was my co-author. It said da-da-da, fine, go ahead and do it, but I'd like to ask you a question. You say the player should switch doors. Now, I don't know much about algorithms, but I see it wouldn't make any difference after the player having been shown, why should he attempt to switch to door B? So it turns out Monty Hall actually didn't understand the Monty Hall problem, which is kind of an amazing, amazing thing to think about. Okay, so that's problem number 2. Problem number 3 is not really a problem. I just want to talk about the abstraction of graphs, which we'll spend some time on in CS20. There's a graph. Anybody know what that is a graph of? That is the Facebook social network. It's a graph with about a billion nodes, 800 million, something like that. I forget what the exact number is supposed to be these days, and countless -- well, not countless, but a very large number of edges connecting the nodes. The cities and countries are just shown there for graphic effect. This is really just an abstraction. And the interesting thing about this, and why you might want to think that graphs, too, are something worth understanding a little bit about, is another little piece of theoretical computer science at Harvard history, which is this e-mail which I received in January of 2004. 
Professor, I have been interested in graph theory and its implications to social networks for a while now, so I did some research that has to do with linking people through articles they appear in from the Crimson. I thought people would find this interesting, so I've set up a preliminary site that allows people to find the connection through the people and articles from any person to the most frequently mentioned person in the time frame I looked at. This person is you. The reason this person was me, of course, was because I had been Dean of the College, and so people were always throwing things at me and it came out in the Crimson pretty much every day as bad old Dean Lewis. That e-mail was, of course, from Mark Elliott Zuckerberg, who had taken my Theory of Computation course. I had a very interesting reaction. I said, can I see it before I say yes. It's all public information, but there's somehow a point at which aggregation of public information feels like an invasion of privacy. Imagine that that was my first reaction when I saw this idea with me as the central node in the social network graph. So but then he showed it to me and I said, aw, sure, what the hell, seems harmless, and let him do it, which is my way with students. I'm always quite indulgent about these things. All right, so those are three little examples of quote unquote, "theoretical computer science," and they explain just by way of stories why theoretical computer science is important to know something about. Now, a little bit about the course. This is a brand new course, and it's going to be taught in a new way, never been done before. We are renovating a classroom, which Professor Gajos is also going to get to use, I believe. Maybe not -- I'm not sure about that actually, but I will certainly be using it. I mean, I don't have anything to say about whether he gets to use it. I think he might have changed his mind about whether he wants to use it. It didn't come out right. 
So because this is a course that's going to teach you problem-solving, we're going to spend class time, Monday, Wednesday, Friday at 10:00, mostly solving problems in small groups. So we're going to have little rearrangeable tables with chairs around them. Think kindergarten, okay, but the idea is that we want people to learn from each other as well as from us, the activity of solving problems. And the role of the staff is going to be to go around and teach and coach people on how to make progress and how to be persuasive that their solutions are actually correct. So that's my two-bit introduction to CS20. Go check out the Web site. As I say, it's good background for all kinds of computer science courses. We don't have it officially in the prerequisite chain to any, because for generations people have taken those more advanced courses without having this in one place, and if you can't figure out from the course site whether you belong in the course or not, by all means consult me. Thank you. [Applause] >> Lastly let me introduce Professor Hanspeter Pfister, who teaches 171, a course on visualization, and it's to Hanspeter that even I often turn: anytime CS50's about to release some handout or some chart or some new dataset, we first seek to get his blessing, lest we do the field a disservice. So Hanspeter Pfister. >> Thank you, David, and thanks a lot for inviting me, giving me the opportunity to talk to you all. It's great to see you all -- wow, it's a big class! So I'm teaching CS 171, as David said. I've been teaching it for about four years now, so this is going to be my fifth year, and I'm very excited to teach this class because it gives me an opportunity to talk about my own research, which is visualization and computer graphics. You probably know what visualization is. It's about conveying information through visual representations -- so graphs, charts, maps, social graphs, tree maps, bubble charts, map visualizations, such as election maps, et cetera. 
And we're going to look at a lot of examples in the course, and one of the goals of the class is to teach you a form of visual literacy that you may not have yet, which is to recognize when data is being mistreated with a bad visualization. So usually at the beginning of the class, we'll do a quick visualization critique, which will tell you basically what to look for in a visualization, and how you can distinguish good from bad visualizations. And that of course has the goal of preparing you to create your own great visualizations. Now, why is visualization important? Well, first of all it has to do with the data that we're generating today. And I don't need to explain this to you. You're living it. You're actually probably a prime user of data online. Data has become so predominant nowadays that we simply can't make sense of it anymore without computational means. And computation is great, but computation and visualization is even better. So we can look at the data in a way that is not normally possible with just computation. Now, this explosion of information doesn't just happen online; it also happens in the sciences, and as a matter of fact it happens in our physical spaces. So Joe Hellerstein at UC Berkeley calls this the Industrial Revolution of Data. We have machines that automatically generate data. We have RFID tags. We have sensors in our roads and our bridges. We have new scientific instruments that automatically scan the sky or that look for subatomic particles, et cetera, and of course we have all the mobile devices. And everything around us is collecting data, either with our knowledge or without, so the amount of data is growing exponentially and there is no end in sight. So having a way to make sense of the data is crucial, I believe, in basically every job that you could imagine today. Now, looking at data this way isn't very useful. It's probably a good first step, but it doesn't tell you much, and all you see is a bunch of numbers. 
So visualization allows you to look at your data in a form that is more digestible for humans, and that is through the visual system. Now, one of the main goals of visualization is to analyze data, so it allows us to dig into the data in a way that is not possible without interactivity and without a graphical output. So here is an example of the baby name wizard. Let me just ask, how many of you have looked at the baby name wizard? Nobody. How many of you have babies? Okay, that explains it. Let me tell you, once you have a baby on the way, this is going to be a favorite activity of yours and your partner's, which is to bicker about baby names. And Martin Wattenberg, who is now at Google and who, in between, did fabulous work at IBM with the Many Eyes system, created this visualization actually in the service of a book that his wife wrote. So the book is basically statistical information about different baby names and the frequency at which they are used in the United States. And what you're seeing in this visualization is, I believe, the top 1,000 baby names in the U.S., going back to the 1880s. So this is going back a long time. Now, this graph doesn't look very informative. It's basically a stacked area chart, and you can mouse over it to see the different names, but it doesn't tell you much until you click on one of the names. And now you're seeing basically how popular this name has been over time. It's important to look at the axes here, so on the right we see the number of babies per 1 million that have this name. And as you see, Anthony is still pretty popular, maybe a little bit on the decline. Now, this is an interactive visualization, so we can actually start typing here. So here are all the names that start with "A," and we can look for example for Ana. And as we see, Ana used to be very popular and kind of lost popularity and might now be a little bit more on the upswing. So let me tell you a little personal story. 
When we named our first daughter, my wife's name is Jennifer and she had about six Jennifers in her class. It was a very popular name. We can actually look at that -- Jennifer. So that peak, that's when my wife went to school. So we wanted to give our daughter a name that wasn't so popular, and we thought, well, let's go to some older names, and so we figured, well, Lilly, Lilly's a great name. So we named our first daughter Lilly. And sure enough, other people had the same idea. The point here is not to talk about baby names, of course; the point here is to talk about interactive visualization and how it's important and useful if you want to analyze data. It's a lot more fun to play with this application than to read the book. Actually, the best thing is to get the application and the book and do both. Another example is from the "New York Times" where you can use visualization to reveal patterns. So here is one of their many great examples of visualizations. This shows you how different groups spend their day, and they collected information that is publicly available about how different groups of people spent their day. So you can recognize some activity. So we're sleeping, we're watching some TV, we might be traveling or we might do household activities. Of course, every now and then we do some work or we eat, right. And now we can drill into this data. We can look at how employed people spend their day compared to let's say, unemployed people. What do you expect will happen? Less work, right, but more TV. We can look for example, at the mix of high school graduates versus bachelor's versus people with advanced degrees, so here are high school graduates, here are people with a bachelor's degree. They tend to work a little bit more, and people with advanced degrees tend to work even more -- or maybe they eat more, I don't know. If you don't have children, versus if you have one child, you can see that you tend to watch less TV. 
If you have two or more children, God bless you, and you can't watch TV at all. Anyways, so this is just a really nice way to look at this data and to reveal patterns, and of course, I can also click on this and get the actual distribution, and again, I can still drill into it and see how that distribution changes. So revealing patterns through interactive visualizations, again, something that is not possible if you look at this data just in a spreadsheet. And finally, visualizations are great for presentations, right, so when you present something at your future workplace, or even now, when you're in school, it's important to have good visualizations of your data. But it's also important to tell a story with our data. And again, the "New York Times" is doing a fabulous job at this. And I'd like to play you a movie where they talk about what I would consider an arcane baseball fact, which probably some of you recognize as a really important thing. Again, I didn't grow up with baseball so I may not recognize it as that, but the visualization that you'll see and the story that you'll hear will convince you that wow, this is really cool, and just looking at the raw data wouldn't actually have that effect. So let me play this. >> Mariano Rivera is one of the most dominant closers in history. But what may be most remarkable is that he's done it by confounding hitters with mostly one pitch, his signature cutter. John Flaherty of the YES Network faced Rivera as a hitter and also caught him when he played for the Yankees. >> From a hitter's standpoint, he's out on the mound. It feels like he's not even putting any effort into it and the ball explodes on you, and from a catching standpoint, he's the easiest guy ever to catch because he throws the ball right where you want it. >> Rivera uses a seemingly effortless delivery, which he can flawlessly repeat, pitch after pitch. 
His cutter is thrown very much like a fast ball, but the pitch has significant lateral movement. He creates and adjusts this movement with the different pressure he puts on the ball with his fingers. The pitch's lateral movement keeps it off the bat's sweet spot, moving in on the hands of a left-handed batter and toward the end of the bat of a righty. To a hitter, Rivera's cutter first appears like a straight fast ball, making it hard to distinguish the two pitches during the first fractions of a second when the hitter must decide if, when, and where to swing. Hitters often rely on reading a pitch's spin to determine what pitch is coming, but Rivera's fast ball and cutter have what appear to the hitter as the same spin. Many pitchers throw their cutters more like sliders, with their fingers pulling down on the side of the ball. This can create more downward and lateral movement than a cutter, but it also creates the signature spin of a slider, a spinning red dot that the hitter can recognize and adjust to. With identical deliveries and spins on Rivera's pitches, hitters are at a loss to identify and then attack a pitch until it is too late, and the balls end up in very different locations. Here are the nearly 1,300 pitches that Rivera threw in 2009, each frozen at the point when the batter must make his swing decisions, but with few clues to determine the pitch's ultimate location, the batter can be faced with guessing at these outcomes. Here are the cutters to left-handers. Here are the cutters to right-handers and fast balls to right-handers. He throws almost no fast balls to lefties. As this map of his 2009 pitches shows, Rivera is remarkably adept at hitting the corners, keeping the ball away from the middle of the plate, the easiest spot for a batter to make good contact. >> Looking from this perspective it's not surprising that the real hot spot is that side on a leftie. I think he can hit that spot with his eyes closed. 
>> Rivera's simple but effective formula has made him baseball's most dominant closer. >> How many of you knew what a cutter is before this? Okay, a good ten percent -- that's good. So consider this. You're reading this story in the paper with text only, right. I think you'd have a very different impression of what's going on, and I probably wouldn't read it because I'd be bored after the first paragraph. And then consider the other alternative, which is that you look at the raw data. The raw data, of course, is trajectories of his pitches over a season or more, and just looking at that data in itself wouldn't be that interesting. What's really making this come alive is the story that they're telling around it, and it's this combination of storytelling and visualization that I think makes the point very effectively and that will be memorable. So visualizations help you analyze, they help you reveal patterns in your data and they help you tell stories or present your data. And as Donald Norman, a famous user interaction person, said, it is things that make us smart. It goes back to the beginning of humanity. We invented writing to make us smarter, we invented graphics to make us smarter, and visualization is on the same line of evolution. And why's that? Well, it's because we have limited cognitive abilities. We can only store so much information, and we'll discuss that in the course and we'll show you many examples of how the perceptual system is very good at revealing patterns, and also how sometimes it really fails us. So visualization helps us think. Many of you might use doodling or sketching or mind mapping even, to help you think about certain problems. It reduces the load on the working memory, which is very limited, as I mentioned, and it offloads cognition. It's sort of an external device that helps us think about the data and about problems. 
And of course, it uses the power of human perception, so we have about 50 percent of our brain dedicated to visual processing, and it's our most important organ, and so it's just fitting that we use it to analyze data. So the goals in the course are first of all, to teach you principles of effective visualizations, and as I mentioned to give you a visual vocabulary or a way to think about good and bad visualizations. Then we'll learn about gathering data. We'll use Python as our tool, so you'll learn how to scrape Web sites for your pleasure and hopefully not illegally. We'll also learn how to implement interactive visualizations using a language called Processing, which is a Java framework that is relatively easy to pick up and that allows you to create interactive visualizations such as you've seen in the baby name wizard. And throughout the course, there'll be a mix of programming homeworks and there'll also be a final group project where you get to create your own visualizations. The group project is really sort of the culmination of the class, and we make it as open-ended as possible, so the description is, you find a problem that's of interest to you, you have to collect data that answers some of the questions you might have for this particular problem, and then you create an interactive visualization to dig into the data and to hopefully answer your questions. So let me show you a couple of examples of what previous students have done in the class. First, a few screenshots. So this is a project that was done by Samir Paul and Jesse Rader about three or four years ago, and it's been very popular because it's a visual interface for the Q Guide, and it allows you to filter the data. You can adjust workload rating, you can adjust the difficulty, and you can find of course, the scores of all the different courses, and the bar charts give you the distribution for each course. And it's great to find the easiest courses that give the highest grades. 
Jason Gao looked at the distribution of energy consumption on the Harvard campus. Again, this is data that is available but that nobody really looks at because it's just in a spreadsheet, so by putting the data onto a map and by making the map interactive, we could dig into the data and find at what time of year which kind of facilities and houses used what kind of energy. Naveen Sinha is a foodie. He's now a grad student at SEAS, and he created a visualization to look at different restaurants in the Boston neighborhood, and he compared the restaurant ratings of four different publications, the Boston Magazine, the Globe, the Herald and the Phoenix. And as you can see, these ratings are quite different and so he normalized them and he looked at the distribution, and you can actually look at that and you can select restaurants based on the ratings or based on what you might perceive as the best rating. Karen Hansen, she is a distance student. She actually lives in California, so CS 171 is also taught through the Extension school and we have about 20 to 30 distance students each year. And she was in the Admissions Office at UC Berkeley, and she looked at the distribution of students that applied to UC Berkeley and the distribution of how many of those got accepted and where they came from, et cetera. Actually this would be fun to do for Harvard, I believe. Xiao He looked at publications and how they change over time, so he looked at publications in the Systems Biology department, and was particularly interested to see, did the publication pattern change before or after tenure. And I'll leave you free to guess what the pattern was he found. And it'd be actually fun to do this on a larger scale for different departments, such as Computer Science. And lastly let me show you David Jacopille's project, which he produced last year. In each of the projects we ask the students to create a little screen cast, a two-minute screen cast that we show at the end of the course. 
And so I'll show you David's screen cast, just to give you an idea of the interactive visualization that he created. >> Hi, I'm Dave Jacopille. I love the New York Times, but I couldn't tell you quantitatively why, so I decided to get some data and visualize it. My dataset includes 24 years of New York Times articles, from '87 to 2010, 2.2 million articles total. Those articles are positioned here on the timeline. By hovering over the timeline you can see a one-day bin of all the articles that day, typically around 200 to 300. The World Map highlights all of the countries mentioned that day and helps give an impression of the global emphasis of the Times. Panning across the timeline shows Africa being the most lightly covered. By hovering over the countries on the map, you have a different view of the data. Here you are seeing all of the days that a country was mentioned in the Times. China is clearly on the Times' radar. Nvidia was covered about every ten days. Going back to the timeline, you can see there's something plotted that's kind of spiky here. This is currently article count, and these spikes are the Sunday Edition. Apparently it isn't just advertisements; you get twice the articles on Sunday. If we zoom out, take a look at all 24 years, you can see larger trends. Article count, for instance, is dropping a little bit in the past couple of years, but to compensate for that, if you switch to words per article you can see there's a bit of an upward trend in the words per article. Articles are getting a little longer. Another interesting discovery was positive words. I made a positive word list, and all of a sudden in '97, there was a huge uptick in positive words. Maybe somebody sent out a memo that asked everyone to be more upbeat in their reporting. Finally, whatever view you're in, there's a sidebar over here that provides a lot more details. >> So I think this is a good example of first of all, collecting data. 
He mentioned in the beginning he collected 2 million articles, which wasn't a small feat, actually. He had to do a lot of web scraping to get that data. Then he had to process the data. Usually when you collect the data it's not in a usable form, so he wrote some Python scripts to clean up the data to get rid of filler words, of articles, et cetera, and finally he created this visualization. And I think it's a nice mix of interactivity and sort of insights in filtering. So CS 171 is online. You can go to CS171 dot org to see more, and you'll find a syllabus there and last year's schedule of lectures. In the spring it will be very similar to what you're finding there now. Just to give you a list of the topics here, so we're starting out with some fundamentals, talking about the perceptual system and what makes good and bad visual encodings. Then we talk a little bit about interaction techniques. And then we start to talk about different methods to visualize statistical graphs, maps, trees and networks, and of course, higher-dimensional data. I should actually mention to Harry, the Facebook data you showed, which is a wonderful example of network visualization, didn't actually have any plot of the countries on it. This was just the raw Facebook data, and the amazing thing is that the countries pop out visually because of course, the data is distributed on the land masses. So we end up with a few guest lectures. I usually have a great line-up of guest lectures. Ben Fry, the creator of Processing, Martin Wattenberg, whose baby name wizard you just saw, and his partner, Fernanda Viegas, they're now at Google doing great visualization work there, Bang Wong from the Broad Institute and Janet Iwasa, talking about biological visualization, et cetera. So we end up with a really nice overview of what visualization practitioners do currently in their daily work. And I believe that's it. 
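[An aside on the clean-up step mentioned above, where Python scripts get rid of filler words and articles. The student's actual scripts are not shown in the lecture, so what follows is only a minimal sketch of that kind of step: lowercase the text, split it into words, and drop anything on a stop-word list before counting. The tiny STOP_WORDS list and the function names clean and top_words are hypothetical.]

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "on"}


def clean(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def top_words(articles, n):
    """Count the cleaned words across all articles; return the n most common."""
    counts = Counter()
    for text in articles:
        counts.update(clean(text))
    return counts.most_common(n)


if __name__ == "__main__":
    print(clean("The Times is on the radar of China."))
    print(top_words(["The cat and the hat", "A cat in a hat, a cat"], 1))
```

[Cleaning before counting matters because otherwise words like "the" and "of" would dominate every word count and hide the words that actually distinguish one day's articles from another.]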
Oh, I should mention the course takes place Tuesday and Thursday, 2:30 to 4:00 p.m., and we have sections on Fridays that you are highly encouraged to attend; however, they're not mandatory, and during sections on Fridays we usually talk about some of the implementation details. For example, if you don't know Python, we try to give introductions to Python or Processing or other tools that you might use through the course. Thank you very much. [Applause] >> So it really is quite remarkable, to be honest, how much the CS Department has evolved, even since when I was here. Back in my day you took CS50 then 51 then 121 then 124 then 141 -- there was very much a linear path through the department, but as you saw at the beginning today there are so many directions in which you can go, even if you're not looking to major or minor in computer science. As our three presentations here today suggest, it really is applicable to all sorts of domains. And for all of those other courses, realize that we will post later today at CS50 dot net slash lectures, five- or ten-minute clips from a whole bunch of other CS courses, if you would like to shop those virtually. In the meantime, tonight we have office hours; tomorrow night as well. Wednesday is quiz 1 and next Monday is your very last CS50 lecture ever. We will see you then. Ah, this is so sweet! See you then!