[ Silence ] >> Welcome to almost the end of CS50. This is week 11. Week 12, as you know, is the end game. So last lecture this coming Monday. Quiz 1 is this coming Wednesday, so a few announcements about that. So quiz same place as last time. See the hand up that's on the website if you haven't already. I'll be holding those last minute late night panick office hours Tuesday night in [inaudible name] Cafe on the ground floor at 8 p.m. till whenever. If you have questions that you would like to address at that point in time, and the TF's and CA's have some as well this week. Coming up is P set 5's deadline. So this little guy has yet to have anyone submit photographs of him, and my God, I walked all over campus for like 2 hours with that friend taking these photographs, and I know some of you guys have found some of these photos because Ken and I and Alex and Jansu have all been photographed; but no one's actually submitted their little map of photos. You have till Friday at midnight of this week for that amazing prize coming up next Monday. So also this week we have this little blurb, the Harvard extension school has put together this program here, very fancy invitation here. There is a panel discussion with a number of alumni and fairly famous people led by our own Harry Lewis, a professor in the computer science department. It's entitled "No More Teachers No More Books" higher ed in the networked age. This is this Wednesday at 4 o'clock if this interests you. And then they've also invited some of us, 10 or fewer, to the faculty club afterwards. So we thought we'd turn this into a faculty club reception with David and team if you would like to join us. Go ahead and go to CS50.net/rsvp. Among other things you'll have chances to chat with faculty, myself, some of the TF's and CA's, also Craig Silverstein who is Google employee number 1, and who's actually on this panel that Harry is moderating. So take a peek at that if you're interested. This, I thought, was kind of timely. So I was literally riding the T this morning, and the Metro, this free paper in Boston had this ad here, or this article. New Best Friend for MBTA Busriders, and it goes on to discuss an API that the MBTA has just released in hopes, frankly, that other people will now implement the rest of this tool for them. So they have opened up a shuttle and API by which you can find out the next times of buses actually arriving in real time. So not printed schedules, but actual real time data for a number of the routes. So frankly, even if you submitted your proposal already, do feel free to bite off something that's perhaps a little different if that appeals. Also today at 3 p.m., so half an hour after lecture, Apple Computer itself is going to be offering a seminar on iPhone development. So whether you're tackling an iPhone app project, or even are just curious, or like pizza, come by at 3 p.m. Maxwell Dworkin 119. That's 1 floor up on the computer science building. And what else? We got anything else here? OK. So I'm told UC elections are in progress. And I'm a bit of a packrat when it comes to maintaining archives of things, so look at what I found. [ Laughter and applause ] So I'd like perhaps to take a chance to re-declare my candidacy some 10 years later. This is perhaps an example of how not to do web design, or this is what happens when you try to design a website around an animated gif that you just really want to use for some purpose. If you actually click through this you'll see my very verbose platform that I had at the time, including a letter to freshmen, some campaign posters. And just to give you a little retrospect from 10 years ago, the undergraduate counsel, so this is 1996 or 1997. The undergraduate counsel today suffers from 2 problems. Not only does the counsel lack the attention of the student body, it lacks respect as well. As recent voting turnout suggests, most students neither know who their representatives are, nor do they care. Lowell House, our favorite house given the number of you in this class, for instance is home to some 450 students yet only 25 of them voted in 1997. So I do hope that things have come a long way since then. This platform didn't take me very far, clearly, but good luck to those in this year's elections, which I gather are now in progress. So please, if nothing else, no animated gif's in your own projects. Finally, I've been told to wear this today. Our own Yuki Machita has been designing our CS50 store. It's store.CS50.net. And I'm told those of you who celebrate, they make wonderful Christmas gifts. So keep that in mind perhaps. But today is ultimately about life after 50. This handout that you have here is an excerpt from the course catalogue, and all of the courses that you are qualified to take after CS50; some of them this spring, some of them next fall, and 1 or 2 of them only being offered the spring thereafter. So today realize it's not just about advertising some courses that we think you might want to take, but it's also meant to be a little more academic than that, and to give you guys a sense of what more there is in computer science. So many of you statistically will consider CS50 to be a terminal course. You got out of it what you want, and that's sort of enough for you. But realize that there's a lot of holes that you can begin to fill in. Realize that this course surveys a lot of fields, a lot of subfields that you can really only begin to appreciate and master by taking full fledged courses in these fields. So what we have today is a team of computer science faculty who will be giving you a taste, yes, of these 5 courses that you might take this spring or coming fall; but also a taste of computer science itself. So realize that there's so much more out there. You've only begun to scratch the surface, and it really takes a good amount of time and attention to really feel like you've honed this stuff. So with that said, allow me to introduce Professor Greg Morrisett who teaches computer science 51, which is historically, sure OK, which is historically the follow up... [ Applause ] ... to computer science 50, and this is the course that I myself took as a second course in computer science. So it's a little close to my heart as well. >> Professor Greg Morrisett: Thank you David. Good morning, afternoon... sorry. What is CS51? CS50 is entitled something like programming. What's it called? What's the official name? Programming 1 or something, introduction to computer science 1. 51 has the wonderful title of introduction to computer science 2. So what's the 2 all about? Well I hope most of you realize at this point that programming is not that hard. I teach 5th graders sometimes how to program. Scratch is kind of fun for doing that. What I have learned is that you can teach just about anybody how to program. But it's very, very hard to program well and that's what 51 is all about. It's not just programming, now it's taking it to the next level and seeing how you can program well. What do I mean by well? You need to be able to write code that's reliable, efficient, that it obeys the contracts that it has with the outside world in terms of correctness, in terms of efficiency. It should be testable. The ideal code you should be able to prove correct. It should be easy to maintain. It should be beautiful, something you want to take home and show to your mom. Really good code is elegant. And that's what 51 is about. The other part of 51 is that it's supposed to expand your problem solving skills by making it possible to quickly and easily recognize the right tool for the job. So the right tool might be a data structure, algorithm. It may be a conceptual way to think or frame the problem, or to break it up into more easily solved subproblems. It may even be a programming language. And the prime directive of CS51 is that at all costs to be as lazy as possible. A really good programmer is so lazy they will go out of their way to write code to make it possible to not write code in the future. So they pull out the right things into libraries, into reusable data structures and algorithms. They keep interfaces small and simple, and they reuse somebody else's code whenever they can. But 1 other thing that they do, and the thing that I'm going to highlight just today, is they pick the right programming tool for the job. So they may pick the right programming language for the job, and to give you an idea of this I want you to think just for a second as if the Arabs had never invented the numeric notation that we use today. And instead we were still stuck with Roman numerals. Imagine trying to write an algorithm just to multiply 2 Roman numerals. It's actually very hard to do. About the only way I know how to do it, that's even reasonable, is to translate them first into Arabic numerals, some form of numbers and do the algorithm at that level, and then convert them back. So the way, the language that you have for conceptualizing your problem can have a strong influence on whether you can find a solution and how efficient or how elegant the solution is. Now it's true that often times we don't have the luxury of picking the language or the tools that we're using for a particular programming task, but if you can think in a different language, at the right level of abstraction about a problem, and if you understand how those mechanisms are realized, how they're implemented, then you could take those ideas and translate them back into the programming environment that you are forced to use. And that happens day to day in real programming tasks. It is not thinking necessarily at the level of C code, but thinking at a much higher level of abstraction and then be able to decompose the problem and translate it back down. So here's a simple example. This is 1 of the data structures that we'll look at. It's the red-black tree. It's a data structure that goes back to Sedgewick and... Michael's laughing. You cover this in 124? Nope. That's why we do it in 51. So it's a balanced binary tree. There are many flavors of these like B trees, 23 trees, AVL trees. Very useful for representing sets or finite maps, or tables when you need to have efficient support for lookup, insertion, deletion, and so forth. The key thing about red-black trees is that in order to maintain good asymptotic balance for lookup or insertion, you have to keep the tree balanced. And these 2 invariants up here, these 2 properties, are properties that ensure that your tree is always bushy enough that you can guarantee logarithmic access time for insertion or deletion. So these 2 invariants say that every node, red node, has a red child. No red node has a red child, and every path from the root has the same number of black nodes. So it's just a way of laying out the tree in a balanced fashion so that no chain is too deep. And if you insert a new node and end up breaking these invariants, so here we have... this new number, 25, inserted into the tree. We've broken that invariant. Then you have to reorganize the tree on the fly to restore the invariants. OK? Now the algorithms for doing this, if you formulate them in 1 language they can be very tricky to get right. And if you formulate them in another they can actually be quite simple. So here is the code for doing insertion and re-balancing in a high level language called ML. It's a functional programming language, and it's 1 of the languages that we'll be looking at, and provides certain features such as pattern matching on algebraic data types that makes writing this code essentially a 1 pager, or a 1 slide. If you look at your favorite algorithms textbook for example, Sedgewick has an algorithms textbook, you look at the code for doing this and see it's 4 times the size of that code up there. Here is the code for doing insertion, except I ran out of room and actually have to double the amount of code down here and repeat it except swap red and black. And also I didn't show you the subroutines, tree insert, left rotate and right rotate. This is what left rotate looks like. And right rotate is just like it except you sort of flip things around and make... so the point is that actually staring at this code and deciding whether you've got it right, and whether you're respecting those 2 properties can be very hard to do. And if you test the code, you may find out that in your particular test cases it seems to pass all the tests. But if you've got a really tricky problem, you'd like to have more than just testing as your form of assurance. You might like to go back and formally prove that your code respects these properties that I mentioned. Proving that your algorithm is correct is pretty easy on 1 page of code. It's certainly a lot easier than doing it on 4 pages of low level code. This code also has other advantages. It's safe with respect to exceptions, and multi threading, whereas the C code is not. And how do you get those properties? Part of that is because the language is forcing you into a certain discipline that gives you those properties for free. On the other hand this code is doing functional updates to a data structure, whereas the C code is doing imperative updates. And how do you efficiently translate a high level language like this down into something like C code? That's something that we're going to be looking at in 51. The goal in this class is to give you an understanding, not just of 1 programming language, but of a bunch of linguistic abstractions that you're going to run into over and over again; although usually they'll have different names like Microsoft will give it some name, and Google will give it a different name. And you want to be able to recognize these abstractions and apply them wherever you are. You may be operating in an OO language and you need the concept of a closure. How can you represent closures as objects? How can you represent objects as closures? These are some of the ideas that we'll be talking about in some of the linguistic mechanisms like laziness, that we'll be using in the class. We're also going to be seeing some exposure to software engineering techniques. I call these software engineering in the small. So instead of training you how to code 50 million lines of code, which is basically something nobody knows how to do well, witness Microsoft. We're going to be talking about at least little things that you can do to make your code more readable, more testable, more maintainable. So simple things like modular design and how do you use language mechanisms to get modularity? How do you do integration and unit tests? And how do you read somebody's code in a critical way? How do you do code reviews? We're also going to be looking at abstract models of computation beyond simple programming languages. So we'll be looking at for example, models for space and time, and this is a good prelude into 124 which Michael will tell you about. We'll be talking about models for reasoning about code at a high level abstraction, like a substitution model of evaluation. We'll also be talking about how to prove properties of code, like how do you prove the correctness? Or how do you prove the asymptotic efficiency of your code in a very rigorous, informal, mathematical way. OK, so that's really all I have to say about 51. I'll be around the whole time though, in case you have any questions. Let me now hand the mic over to Matt Welch, who's another professor in computer science, and he'll be telling you about CS61. If there are any questions while he's coming up here, I'll be happy to yield. No? Good. [ Applause ] >> Matt Welch: I'm assuming there's... Hi everybody. I'm going to do the rest of this in Latin because we are standing with these esteemed gentlemen on the left and on the right. So what I want to do is talk to you a little bit about CS61. CS61 and CS51 are like 2 sides of the same coin. They're both intended to be classes that are taken right after taking CS50. In fact a lot of freshmen are also taking CS61 this semester. CS61 is an introduction to computer systems and how computers work internally. So the basics of the class, we meet on Tuesdays and Thursdays at 2:30. The pre-requisite is this class, or just any C programming experience. You can in fact use CS61 for the CS concentration breadth requirement that is to satisfy the need to have several classes with different middle digits. You can also use it for the CS secondary. We generally say that most students who have an interest in CS, whether you're going to be a concentrator or just do the secondary, should take both CS51 and CS61 because they really compliment each other nicely. So what's CS61 all about? Really this class is about understanding the guts of how computers work, getting under the hood and really driving down on the things that matter to you as a programmer in terms of the performance of the code that you write. So what we try to do is understand how a processor works internally, and then how do you write good C code that can map well onto what the processor knows how to do. This is extremely important in real world because in the real world you're going to be expected to write programs that perform well. And that are, of course, also correct. We'll talk a lot about caching and memory management. That is, how to lay out the data structures in your program so that they give the best performance on the processor and the memory architecture of the machine that you're using. We'll talk a lot about concurrency. So every computer that you can buy today basically has multiple processors, multiple cores on the same processor chip. And in order to make use of those you generally need to write programs that can do multiple things simultaneously. Threading and processes are 2 straightforward ways of accomplishing that, but writing this kind of code is very complicated because the threads have to interact with each other. So this is 1 of these things that you need to understand how to do. And just generally how to write rock solid and fast systems codes. CS61 is extremely practical if you want to improve your programming skills. The reason that I introduced the class is because we saw that there is a huge gap between many of the concepts presented in computer science classes in the reality of how real systems work. And so as you learn how to program in higher level languages like Java or ML, that you do need to have an understanding of how those map down onto what's happening at the physical electronics inside the computer. That's really critical for performance. So you need to understand operating system... you need to know these details to understand things like operating systems, databases, compilers, and so forth. Let me tell you a story that I think motivates the need for a class like CS61. How many of you have heard of this case of Ken Thompson who is 1 of the coinventors of Unix and C, hacking the original Unix system? Has anybody heard this story before? This is a fascinating lesson in why not to trust the code that you're running. So Ken Thompson, he coinvented the Unix operating system. He won the Turing Award, which is the computer science community's equivalent of the Nobel Prize. The Turing Award is awarded once a year to an emminent computer scientist, and it's a really, really big deal to win the Turing Award. And as part of winning the Turing Award, the winner is asked to give a lecture on their work. And during his Turing Award lecture Thompson made the submission. He basically reminded people back in the early days of the Unix system he wanted to be able to log in into any Unix computer that was installed, without having to have an account. So what he did was he hacked the log in program, and added a backdoor. Yeah? So he added a special password that only he knew, that if he tried to log in with a password then it would let him in. Yeah, OK? And that was really great for debugging. He wasn't trying to do anything malicious with it, but he wanted to understand what was going on inside. The problem was that everybody had the source code, back in those days everything was open source so if somebody happened to look at the code for log in they would notice that there's this backdoor in the code. He didn't want people to know about this. He wanted to cover his tracks. So what he did was he hacked the C compiler, and the C compiler then understood when it was compiling log in.C to add the code for the backdoor to it before compiling. Anybody see what's going on here? Yeah? So if you looked at log in.C itself, you would not see the backdoor code. Alright, so what's the problem? This works right? So if you're looking for login.C you won't see the backdoor, but there's somewhere else you could look that you'd notice where the backdoor is. Where is it? In the C compiler code itself. Remember the source code for everything, including the C compiler, was just sitting right there on every Unix system. So you could just look at the source of the C compiler. So here's what he did. He hacked the C compiler to recognize when it was compiling itself... to add the code to add the code to login.C for the backdoor. This is a true story. I'm not making this up. And so then he deleted that version of the C compiler, so now we have a C compiler in binary form only that would always, everytime it compiled itself, would inject this additional backdoor code into the new version of the C compiler, which would then inject the backdoor code into log in. OK, so this is amazing. This is just so cool right? It's just extremely nefarious. The only way that you would have been able to understand that this was happening is if you were able to look at the machine code for either the C compiler or for the log in and notice that there's this backdoor there. It was never present in source code form. So there's no source code that represents the backdoor anymore. This is an interesting problem. In CS61 one of the key skills you're going to gain is being able to read machine code and understand what the machine code is doing. So what we're going to do in CS61, learn how machines really work. You're going to really be an expert at using things like GDB. We're going to debug the hardest and the most interesting bugs. We're going to drive down to the machine code level in order to do that. Hacking binaries for fun and profit, 1 of the labs in the this class, I'm going to give you a broken binary and you're going to exploit a buffer overrun bug in that binary and cause that binary to do things it was not designed to do. This is basically the jist of how most viruses and worms propagate. You're going to measure and improve the performance of your programs. We're going to be very focused on getting good performance, and you're going to write multi threaded concurrent programs like a professional. So you're really going to understand how to exploit all those extra cycles and cores on your machines. So this sounds like it's a lot, it's hard, but I want to emphasize this is an introductory class. All of you can take CS61 and be successful in it. So it's challenging but it's a lot of fun. This is not intended just for hardcore CS concentrators. We have 5 lab assignments. The first one I give you a binary and it asks you for a series of 7 passwords. You have to get all 7 passwords right. I don't give you the source code. So how do you figure out what the passwords are? You have to disassemble the binary, read the machine instructions, and understand what it's trying to ask you for. This is a lot of fun and it's quite addictive. You'll do this buffer overrun bug, you'll implement your own version of malloc. I promise after doing this you will understand how pointers work. Write your own Unix shell and you're going to build your own concurrent multi threaded internet service. So there will be 1 midterm exam and 1 final. So that's basically what the whole course is. I'm going to skip over this. You can go look on the website to look at the syllabus. And I just want to leave you with this. So this is my favorite way of representing what CS61 is. How many of you know what this is from? This is from the Matrix. So this is how I think of it, which is CS61 is the red pill. If you take CS61 you're going to understand what's going on at all these lower levels of what's really happening inside the system. So I encourage you to take CS61 and I will show you how deep the rabbit hole goes. Thanks. [ Applause ] >> Well we'll switch the movie real quick. >> This is my former professor, Professor Michael Mitzenmacher, who teaches computer science 124. >> Professor Michael Mitzenmacher: So I'm really into algorithms and how they can be used in actual reality. So to start with I thought I'd, before explaining the course, explain some of the research I do by way of a movie; try and give evidence that it's not just that we come out here and say yeah, algorithms are important and useful and allow you to do cool things. But actually like to show you that they allow you to do cool things. So hopefully this will go. [ Silence ] >> In a perfect hash table we can look up any rhythm in constant time. We construct perfect hash tables on the GPU using a hybrid of 2 randomized constructions. A thread is a sign to each item. First it leads the randomly chosen top level hash function, and uses it to choose a bucket to hash to. The item then commits a counter for it's bucket using an atomic operation. We prefix on the array of final counts, at which point we know a base and an offset for each item. We allocate memory and then each item writes itself to the location defined by it's base plus offset. Within a bucket, we organize the items using Cuckoo hashing. The table is broken up into 3 subtables, each with it's own hash function. Items that cannot fit into the first table proceed to the 2nd one, and items that cannot fit into the 2nd table proceed to the 3rd. An item that fails to fit into the 3rd table, like H, goes back to the 1st table and evicts the item currently in the location that it wants. The eviction process continues until all items have found a spot. Each bucket is processed within a block. We write out the Cuckoo hash tables with all the 1st tables together, then all the 2nd, and then all the 3rd tables. This allows more coalesced access for parallel lookups. Hash tables are useful for representing voxel live surface data. Only the surface voxels are stored in the hash table, greatly reducing the storage requirements. This is called spacial hashing. Here we are using spacial hashing to find the interception of 2 moving surfaces. At every frame we read a point cloud for each surface, and construct a hash table for the voxels containing the points. We find the green intersections by looking up the voxels of 1 surface and the table of the other. We use the hash tables to flood fill along the surface to find a blue region inside the other object. Here we compute a in difference, subtracting 1 running man from the other. The user can move the men around interactively. We achieve a frame rate of about 27 frames per second using a virtual voxel grid of size 128 cubed with surfaces of roughly 160,000 points. [ Silence ] Matching is another important application of hash tables. Here we have 2 different images of a carved relief from Persepolis. We use a sobel edge detector to select distinctive feature points in both images. We encode triples of points from 1 image in a way that is invariant to translation, rotation, and scaling, and store the codes into a hash table. Then we look up triples from the other image in the table. We find the match, we look for a correspondence. In this case the top ranked correspondence matches the 2 images almost perfectly. This technique is called geometric hashing. In this clip of 2 people fighting, you look for the following man on the upper left. The graph shows the number of votes for the best correspondence in each frame. The best match usually aligns the query shape to the man in black, but when he bends down the algorithm cannot find enough matching triples. When there is a perfect match we find it. Here we recognize some of the monuments that Mac Harding dances in front of in a semi famous You Tube video. This image of Saint Basil's Cathedral at the Kremlin matches Mac's video. It does not match this Bangkok street scene. Here we match this image of the Taj Mahal to the video. We don't get a good match at Progue Castle, as expected. We match a video frame, including buoying for all potential correspondences, in about 2 seconds on average. The construction of the hash table for about 3 million items takes 44 milliseconds. Thank you for your attention. [ Applause ] >> Alright. So I'm not promising that the end of 124 you'll be able to do that necessarily, but I am promising that you'll get a lot of the skills that you will need to go forward into your further classes where you will be able to take algorithms and data structures, and find cool things to do with them. I view CS124 as putting the real science in computer science, that it's going to give you the tools you need to do a lot of really interesting things all the way down the line. [ Silence ] Alright, so the course goal for this is to provide everyone with a solid background in algorithms and data structures in preparation for future work, graduate level work, in algorithms, further work in your undergraduate classes, or for jobs in industry. One of the things that I'm always excited and relieved to hear afterwards is that when people go out on their job interviews, they always come back and tell me that they were asked questions that have... exactly do with something they learned in CS124. This is the sort of skills people are looking for when they're looking for a job, not just that you know how to program, but that you know how to think about problems, that you know the ways to attack problems, that you know the sort of tricks and techniques that can be used to solve problems to get working code solutions. And so the key course goal is to actually get you ready, whether this means you're going on in your studies, or going on in the work world. So the class set up, I welcome people from all different areas. Certainly the bulk of people tend to be computer science and applied math concentrations, but we also get people every year from biology, physics, economics, mathematics, and other areas because more and more as other fields are doing more and more computing, the people in those fields are finding that they need to understand algorithms and how they work. The assigments in the class, because I do believe in algorithm practice and not just algorithm theory, they tend to be a mix. There are certainly going to be plenty of theoretical and mathematical work, but we're also going to have programming assignments. My joke is I find that this maximizes the feelings of unhappiness. The mathematicians are unhappy because they have to program, and non mathematicians are unhappy because they have to do math, but really is the best way to get both ends to understand not just how algorithms... to think about them abstractly, but also to implement them and how... the tangible gains they can actually get you in real problems. So this class, CS50, is the minimum prerequisite, but I should point out certainly 51 and 61, 121 and strong math backgrounds are all helpful but not at all required. The sort of topics we're going to cover, a rough list. We'll go over graph algorithms. We'll go over a variety of techniques for solving problems, reading algorithms, divide and conquer, dynamic programming, linear programming. We'll look at things like hashing, something that you just saw a video on, some advanced hashing techniques. And randomness, how to use randomness in algorithms, how to understand how randomness can help you in algorithms. We'll look at MP complete problems, and how to solve them. If you've learned anything about incomplete, MP complete problems, that may seem like an odd statement; that MP complete problems are supposed to be the problems that we're not supposed to be able to solve. But of course in the real world you can't just go to your boss and say, well this problem's MP complete so I don't have to do it anymore. They're going tricks back to solutions, so we'll come up with ways and techniques in thinking that we'll let you solve them for practical situations. Currently scheduled, if I read the book right, Tuesday, Thurday 11:30 to 1:00. I hope you'll consider it for this spring. If not I'll hope you'll consider it for your next spring, offer it every year, and I hope to see you there. Thanks. [ Applause ] >> So next up is going to be Professor Hanspeter Pfister. He teaches computer science 171, which is visualization. This is actually a course like CS61 and CS179 that actually didn't exist in my time. And to be honest it's sort of with envy that I see what kind of courses they've added to the undergraduate roster, because and this isn't just saying this because my former professors are here, or because I think I need to say this. A lot of these, I think, are great fun and the fact that you can take these courses after just 50 alone is really quite empowering. So, Professor Pfister. >> Professor Pfister: Thanks a lot and thanks for coming. So I'm teaching visualization, so you might ask well what is visualization? Visualization is essential to convey information through visual representations. So we're talking about graphs, charts, maps, and all kinds of fancy ways to visually depict information. And in particular this class will focus on interactive applications of visualization where you can actually interact with any of these in an interactive way. So if the previous courses were more about the science of computer science, this class is much more about the design and computer science. So it's a combination of both design and computer science. And as a matter of fact, only about a third of the students in this class are CS concentrators. So this is really a class that should appeal to pretty much everybody because I believe everybody is dealing with information. Now why is this class important? I think actually visualization is 1 of the big topics that we currently have to deal with. It's because we're living in the age of information explosion. So we have too much information online. We can't really make sense of it all in just text form, and so we need some other representations in order to understand it all. This is even more so in the sciences where people have coined the phrase 'industrial revolution of data'. So we're now able to collect data at a much faster rate than what we're able to process or comprehend. And that's because we have all these great sensors from the very small to the very large that provide us with more and more data. So in order to understand all of this we need some means to basically look at the data, and this is probably the worst possible way to do that. So instead of just looking at numbers, we want to look at visual representations. And as Donald Norman, one of the great industrial designers and ACI expert has said, it is things that make us smart. So we learned how to write down things so we can preserve history. We learned how to make pictures and graphs so that we can understand information and convey it. So this class, or I should say visualization in general, will help us think. I hope the class will help you think too. It releases the load on working memory. It all floats from our cognition because we can literally represent the information in a different medium. And it uses the power of human perception. So as you may know, our visual system is the biggest sensory organ and it's represented in our brain with about half of the neurons in the brain, so we have a huge amount of perceptual power that we can actually use to understand information. So the goals of the class are to teach you the principles of how to make effective graphs, charts, maps, and visualizations; and then how to teach you to gather the data, in particular from online sources. So we're going to teach you how to use Python, which is a very nice and simple to learn scripting language, to scrape information from webpages. And then using that information that you gathered, implement interactive visualizations using a Java framework called processing. And again, processing is relatively easy to learn. And finally we have a few programming homeworks and a final project where you can put all of those skills that you're learning in the class to use. So what I'd like to do next is show you a few projects that students have done last year. And I'm going to start with Jason Gowell. So Jason actually in all this, the students start with a question study of personal interest. I'm sorry I can't fit it all on the screen right now, but he started with the question 'what is the energy used of the different buildings on the Harvard campus?' And it turns out that that information is actually available from the Harvard Office of... what is it called... I guess of Public Buildings. So what he did is he plotted the yearly energy use of the different buildings on the map using circles representing the size of the energy use. And as you may be able to see here on the right, he also has a pie chart that shows you how the different buildings use that energy depending on the type of energy. So let me move this over. So green is electricity, red is steam, blue is chilled water. And you can see that depending on the building shown up here with their name, you have very different energy usage patterns. And as a matter of fact, if I were able to scroll, you could actually see that this energy usage pattern changes over the years. So down here he has a visualization of how this changes over the course of the year, and the pie chart up there will automatically update. Now going back here, let me just show you the last bit of the visualization, which is you also have a list view that shows you the biggest and smallest energy consumers on the Harvard campus over here. And again, depending on the time of year, you have different buildings that use different amounts of energy. So this was an effective visualization to show you the energy usage on the Harvard campus. Next project I'd like to show is by Navin Sinha [phonetic]. Navin was interested in finding good restaurants in the Cambridge and Boston area. And instead of just building yet another visualization of the ratings, he actually went to the different newspapers here in town, Boston Magazine, Boston Globe, Boston Herald, and Boston Phoenix, and he computed an average, or compound score for each of the restaurants that those newspapers rate. And then what he's showing here is the deviation from this average score between the different newspapers. So let's quickly look at that. [ Silence ] So here again is this visualization. Let's first turn off the filter here. You can actually see the name of the restaurant here. So number 1 is Hoya, followed Les [inaudible name], followed by Number 9, followed by Uni, and so on. So for those of you who like to eat well, you probably recognize some of these names. So the restaurants are ordered from right to left, the top ranked restaurant on the average rating being on the right most side. But then he's showing the deviation. Bars that go above the line show that this particular newspaper, for example Boston Magazine, has ranked this restaurant here, Aura, on average higher than the compound average score of all of the ratings. And similarly some of them are ranked lower. So you might see for example the Boston Herald seems to be more critical of some of the restaurants that are typically ranked highly, and then some others are ranked a little better over here. Then you can look at the different categories. So here are the inexpensive restaurants. You might find out that Garden at the Cellar is a very good value if you like good food and don't want to spend so much money. You can conversely look at the expensive restaurants. Not surprisingly, most of them are here in the top category. I'm sorry, you can't really see some of the bars on this projector, but I'll just show you these are all top ranked and expensive. But then this one here, Moo, is actually expensive but doesn't get such a good compound ranking. So it might not be such a great value for your money. You may want to find out which cuisine is ranked highest, and it turns out Japanese actually is... Oya and Uni and the Oshi suishi bar. They're all ranked very highly, and so on. Let me show you another example from last year. Karen Hanson, she works in the microcellular and biology laboratory and she's going to apply to grad school, and she wanted to find out how are the different MCB labs around the country deal with admissions. So she put together this nice visualization... again I probably cannot move this, I'm sorry. But she put together a nice visualization of admissions based on different schools and based around the country, showing you what are the areas where admissions are relatively high. So definitely California is 1 of them, Massachusetts being another. And then showing you the sort of statistics of the typical student that gets into these programs. And then over here she also has more information about each school, how many students they admit per year, and what kind of characteristics that each school is looking for. And then also the top faculty in the country that admit students in MCB, and what kind of profile that they're typically looking for. So lastly related to this, there was a project by Shao Hai [phonetic] who also was interested in MCB, but he looked at the publication records of faculty at MCB here at Harvard. What he did, he has a visualization showing you the different publications that the faculty have. So you have the list of faculty here on the right, and then the size of the circle shows you how many papers a year they published. You can see the legion, but I happen to know the big circle is 30 papers a year. So this professor here had 30 publications a year. The X axis is numbers of years since PhD, so this is a senior person, and here is actually a more junior person. And the Y axis is if they were a senior author or a junior author, i.e. first author on the paper. So you can see some of the more junior faculty tend to put themselves as first authors, whereas most of the senior faculty tend to be senior authors. Again he was interested in this particular question. He also did a timeline of publications over time for each of the faculty. So you can see that the senior faculty tend to peter out after a certain number of years of being in the business. So I think these were some really fun examples from last year. I invite you to go to the website and you'll see a lot more. And I also want to briefly mention the topics we'll cover. We'll talk about perception, sort of the fundamentals of interaction, design, and visual design. And then we'll talk about different visualization types such as graphs, maps, trees and networks. We talk about how to visualize high dimensional data. And finally we have some guest lectures of people coming in talking about application areas such as scientific visualization, or life science visualization, or even visualization in the arts. So I hope to see you all during shopping period, and thanks a lot. [ Applause ] >> So we have just 1 more teaser for you. This one from Professor Krzysztof Gajos who's teaching computer science 179, which is sensibly about user interfaces. >> Professor Krzysztof Gajos: Hi my name is Krzysztof Gajos. How many of you here are first years at Harvard? Well so am I. I'm very excited to be here. I think what makes Harvard computer science incredibly exciting is not only the computer scientists, but everybody else at Harvard. It's the fact that Harvard focuses not only on the bits, but also on the rest of the world. We are here to make an impact. Greg, Matt, and Michael told you a lot about how you can solve problems, and the skills that you need for solving problems. And I think this is actually important for computer science. In fact let me start with a story from my own undergraduate experience. My first research opportunity down at the technical college, a little bit down the river, was with an information retrieval group. David Cargo was incredibly interested in web search. It was a time of Alta Vista, Web Crawler, Meta Crawler, and each week he would start the meeting by bringing up 1 of those new search engines because they tended to pop up roughly once a week. He would test it always in the same way. He would type in his name and see what came up. And inevitably the first thing that would come up in those engines would be a publications database that... that said his name more than once, or another website that mentioned him but it was never his homepage. And 1 day I knew that things had changed. He looked different. There was suspense in the classroom. He was going to tell us something very special. He brought up a search engine, a brand new search engine, typed in his name, and out came his name. What was the search engine? No it was not Alta Vista, it was Google. What was special about Google? What made Google special? It was the fact that there was a very important problem, search problem, that has been around for many years. There were lots of people working on it, and Google guys figured out how to solve that problem well. They used all of the skills and computer science and came up with the great solution. They started with a well defined problem and they came up with a great solution, and that made them great. But I argue that besides being incredibly smart, [inaudible name] incredibly lacking. How many great problems, well specified problems, can you name off the top of your head? Probably not that many, right? I think as computer scientists we need another skill, and this is the skill that all of these people have. These people have the skill of going out in the world and finding the problem. They go talk to people, figure out that people have the problem that they themselves might not even know about, and then they come up with such a compelling solution that they change how the world operates. And this actually is the premise of CS179. CS179, called the usable interactive systems, will be offered for the first time, kind of on a permanent basis, this spring. It was offered once before by a visiting faculty. And this class is not about solving problems. It's not about understanding problems. It's about discovering problems. So the fundamental question of this class is what is computer science, and what it's not. Unfortunately many people, not you because you wouldn't be here, think of computer science as something like this. But this is not what computer science is. Computer science is an extremely powerful intellectual tool kit for discovering and solving problems. And the successes of computer scientists depends not only on how well you can solve the problems, but on what problems you solve. And in CS170 we'll precisely focus on this entire lifecycle of innovation. In CS179 you will learn how to discover that there exists a problem, a powerful problem, a problem that if you solved it could change people's lives. You will learn how to invent solutions. And when we invent solutions to problems that we've just discovered, it's not just about writing codes to specification. It's figuring out what aspects of the problem are actually important. It's figuring out who the stakeholders are and whose needs you should address first. It's about figuring out what you need to do in order for your solution to actually be sustainable and productive. An extremely important skill that you will learn in CS179 is that you are not an average person, which is actually a terrible handicap because it means that you have no idea what normal people think. When we design solutions we design for different people. We design for people who love control. We design solutions for people who like to tweek. But this is not what other people like to do, and we will actually learn about our own biases and what makes... and how we can overcome them. We'll also learn how to formally and informally evaluate our ideas, and actually a big part of the class is going to be critique sessions. Once a week we are going to meet in small groups, present our contributions for the week, and other students in that small group will actually present formative critiques of whatever progress you and your group made to help you improve your system. One of the greatest assets that we have as innovators, is other innovators, other people who understand the innovation process and who can offer useful feedback that can help shape your solutions. And finally this class will be about teamwork. All of the projects will be done in teams of 3, and the projects will last most of the semester with a couple of short breaks. So on the practical side, the main prerequisite for this class is CS50 or any kind of programming experience. We will be using... the specific programming skills that you will need to be successful in this class. We'll let you cover in the class all you need to know, is how to think like a programmer. Lectures will be twice a week. We'll have critique sessions, and we'll have regular weekday assignments. There may be final exam. I haven't decided yet. An exciting thing is, Apple has offered to lend us iPod Touchs. Every team of 3 will have their own iPod Touch, and iPod Touch web development will give the platform that we'll use for innovating and discuss. Any questions? OK. I will see you in the spring of 2010. [ Applause ] >> So I thought I'd leave you guys with 1 thought. So truth be told, I think there's something very empowering about studying computer science. And I say this having defected from gov long ago. I can think of enumerable experiences, just in the real world with this background I had; not just in 50, but in networking courses and in theory courses, where I actually realized very quickly I understand better what this person I'm talking to is actually talking about. So case in point, Comcast is my, was for years, my cable modem provider. And every night around like 9 p.m. I would just lose internet connectivity altogether, and this happened again and again and again. And I talked to so many damn representatives on the phone about this pattern, and of course the irony there being such a big company, oh sir we can send someone out between 9 a.m. and 5 p.m. everyday. But it only happens after 5 p.m. and so I hypothesized for them on the phone, you know maybe this has to do with some real world usage patterns. Right? A lot of my neighbors come home around 7 p.m., 8 p.m. Perhaps this is putting more of a load on your equipment. No, no, no, no. It looks fine right now sir, at 9 a.m. in the morning. And long story short, this sort of instinctive understanding of how routers work and how local hardware and electronics work, helped me realize maybe there was something a little more to it then. Cisco, for instance, is a very popular company that makes really expensive and really high powered routers and switches. And in my consulting life some months ago, we had just deployed this real world Cisco switch, a Cisco firewall actually. And I'd never used these things before. A lot of the stuff was over my head. The documentation's like this thick, but I knew from CS143, networking, you know, how a router is supposed to work, how a firewall is supposed to work. And there too this firewall, brand new out of the box, installed by very expensive consultants, was crashing on us again and again everyday. And frankly I whipped out my CS50 skills SSH into this thing, because it's really just a Unix computer that's running, and found all of these core files which I really didn't understand but I knew from my time 10 years ago weren't necessarily a good thing. And sure enough the firmware, the software running on this Cisco, very expensive switch, was just buggy. Cisco had screwed up, and the thing was seg faulting once a night around a particular time. But it was these instincts, being able to solve these problems myself and being able to identify problems, as Krzysztof has mentioned there, in the real world. It's like wow, wouldn't it be great if we had this Twitter agregator. Why? Not because it's terribly useful, but because we can. And that's the sense of empowerment that I got from all of these computer science courses I got. And whether you're considering majoring or minoring, or just dabbling in computer science, hopefully you gather from the faculty who joined us today, that there's many different directions in which you guys can go after this. So I hope you will join us next semester on Wednesday for quiz 1, and also on Monday for our final class. See you on Wednesday! [ Music ] ==== Transcribed by Automatic Sync Technologies ====