>> Alright, welcome back, this is CS50. This is the start of Week 7. So, though, this week we'll continue using the language C for the problems we explore. We'll actually begin to dive in next week into a bit of web programming. And for that we'll introduce a program HTML or XHTML, a little bit of CSS, both of which are not so much programming languages as they are markup or design languages, with which you can construct a web page. We'll introduce a bit of PHP, a language called SQL, with which you can query databases, and then a week after, another language called JavaScript, and this is all toward an end of applying the same kind of ideas that we've been exploring thus far this semester, but no longer at a boring black and white blinking prompt, but rather in a user interface with which you're much more familiar these days, namely the web. So, that's what's on the horizon. Problem Set 5, though, involves a little something like this. So this is one of the images that you might have seen already in pset5, which went out on Friday. Hidden inside of this image is actually a secret message, the answer to a particular riddle, and this is an example of, what is generally known as steganography, taking an image, taking a picture that might just look like red noise, or might look like a photograph that you took, and using a computer to manipulate some of the underlying bits that represent that image to hide some message. Now, we took a fairly simplistic approach, simply hiding in this red noise a cyan or a bluish message with which you can then forward, you can then write code to recover the bluish text, but more generally you can take an arbitrary photograph off of the web, off of your camera, and run it through a program that maybe you write or that you download or buy and actually have it hide entire paragraphs of text, entire books worth of texts depending on the resolution of the image. It's actually quite cool what you can do, and the implications, of course, are that you don't necessarily have to use cryptography per se to send a message. You can send someone an attachment containing an image that looks completely innocuous to anyone who might intercept it, and yet, if they run it through this special program, and change for instance all of the pixels to red or all of the pixels to blue, or some of the pixels to red or blue, something along those lines, do they then see the message that's revealed. So, if you like this sort of stuff, certainly Google around for something called steganography, and you'll see lots of really neat examples. But, in this Problem Set 2 we do introduce the idea of images themselves in file format. So, we introduced a week or two ago the idea of actually persisting your data to disc. Almost all of the programs we wrote in the first several weeks, as soon as they quit, as soon as main ended, that was it. All state, all memory was lost that you might have built up while running this program. And then we introduced this ability to save files with fprintf, and in this problem set we hand you some files, namely some BMP files for bitmap files that essentially look like this. So, if you've ever wondered how you go around representing an image, well, the simplest way, frankly, is just to assume that 1 means white, zero means black, arbitrarily, and then with just those building blocks can you implement the simplest of pictures. This is a little smiley face here. Turns out that all images, all image formats, are pretty much structured as rectangles. And so you need to have a row of pixels on the top and then columns of pixels going top to bottom. And if you kind of take a step back, even though these things are not spaced in the same way, notice how each of the ones at top left match to each of the white squares at top right. Similarly are zeros black. And so this file up here on the left, if you stored those patterns of bits inside of a file, you can then say this represents a smiley face. Now, what does it mean to say it represents a smiley face? Well, you have to standardize how those bits are laid out in a file. This is probably an oversimplification because these days anything that has like a three-letter file extension, .DOC for Microsoft Word documents, .XLS for Excel, .BMP for bitmap files, they are standardized in some way. Microsoft or Apple or someone on the Internet said, you know what, if you have a file that ends in that extension, the bits inside of it are going to be laid out as follows. And so what we do in Problem Set 5 is introduce you to one such format, this BMP, bitmap file format, which pretty much does get implemented to like this. But thankfully you can actually use 24-bit color, not 1-bit color; a zero or one. You can actually have 24 bits per pixel, which gives you millions of total colors as you'll see in the pset. But what Microsoft said years ago was that for this BMP file format, the first several bits, or the first several bytes in any such file, have to be laid out in the following way. In other words, there has to be what we generally call metadata inside of the file. It's not necessarily the data you care about, it's not the zeros and ones that represent the picture, but it's some ancillary information in, like, what's the width of this image and what's the height of it and what's the total size of this file, little clues that might be helpful to programs like Photoshop and Windows and Mac OS. So that when they open this file, they know what to expect and they know how to display it. So, this is explained to some detail in this fact, although we don't care about a lot of these fields, thankfully. But, you'll see fields called like bfSize, which refers to the size of the file, biWidth, biHeight. And notice even though this is laid out in the chart here, this is just Microsoft's way of defining the contents of a struct. So, you'll actually see in a file called bmp.h that we implement this specification using a little bit of C code and it looks a little something like this. Let me go ahead and scroll down, and notice that there is one of the structs. It's a struct called a bitmap file header and it seems to have some of those same variable names, bfSize, is one of the ones I just mentioned there. So, I won't go into great detail and lecture because some of this information you don't need, per se, and a lot of it is covered in this spec. But what is curious is that inside of these files is a whole bunch of data types that you might not have seen before: WORD, DWORD. If we scroll down further, I think we see others, LONG, in all capitals. So it turns out that in different languages, and sometimes different operating systems, there are of course, different types and Microsoft has standardized, at least back in the day, along the lines of these types, but thankfully they map very nicely to a Linux computer or a Mac OS computer, so long as you know what to expect. So, you'll see in the spec and in the comments here that we used typedef, the thing for creating synonyms for data types, to say that, you know what, Microsoft calls a word we are going to call a uint16_t. So, this is actually a type we haven't talked about. But thankfully there do exist in C data types whose sizes do not vary based on the computer you're using. Right? One of the little disclaimers in the quiz 0, even, was assume a 32-bit architecture, like the cloud. Now, we have to say that because if you're on a new or fancier machine like your own laptop, it might very well be 64 bits, which means some of the answers to quiz questions, like how big is a long, how big is an int, might have actually differed, depending on the machine. Now, this is incredibly annoying if you're the computer programmer, the computer scientist and you need to decide for yourself how many bits to use to represent a number. You don't want your program to work or not work just based on what kind of computer your users happen to be using. So, thankfully, even in C there's this additional header file that we have not used before called stdint.h and that gives us access to these cryptic-looking data types, but whose names pretty much say who they are. So, for instance, a WORD, in the world of Microsoft, we have defines as a uint16_t and what does the "u" probably stand for? So, unsigned. This just means that I'm going to use 16 bits to represent a piece of data, but know that this value should be unsigned. In other words, don't waste one of the bits allowing you to have negative numbers in there. I'm only going to put positive numbers, or nonnegative numbers in this unsigned 16-bit int. A LONG, in the world of Microsoft, if you read the documentation at these URLs, that's actually a signed 32-bit integer, so we looked in our stdint.h file and we pulled out this type, which is a 32-bit signed int. And, then, for a byte, and this is frankly, the most useful in the problem set, because you'll probably be able to use this same data type for the forensic aspect of it, and more on that in just a moment. And notice that we have defined what Microsoft calls a byte and what anyone in this room would now just call a chunk of 8 bits, as a uint8_t. So, you could define a byte as pretty much just a char, right? A char as far as we've discussed is just 8 bits, but that's not... Technically, even though we call a char a char, which kind of implies a character is going to go in here, that's not really a strict requirement. A char in our context is technically just an 8-bit value. And even though we tend to use a char to represent A's and B's and C's, technically you can put any 8-bit numbers in there you want: zero on up to 255. So, in this context, when you're actually working with bits in a file, bits inside of an image, a bitmap file, you really don't want to confuse 8-bit chunks, bytes, as numbers, as characters, because obviously an image is not going to have A's, B's and C's in it. It's going to have just zeros and ones. So this is our way of specifying very precisely, assume that a byte is going to be an unsigned value, that is there's no notion of negativity in this context, I only need 8 bits for a byte and this typedef here mandates that in my code for this pset, I will always be treating a byte as 8 bits and with no notion of negativity. And, this is a very useful thing, so that your code doesn't break and in quizzes frankly, you don't have to have these parentheticals saying on this type of architecture. It will just work because now we have more portable types. But, moving to something that's a little more fun, we did in fact, take our stroll across campus, although the weather wasn't great the past week or so our stroll rather devolved into a stroll along Facebook and Google Images where we actually found a whole bunch of fun pictures of computer scientists, one being Matt Chartier here, one of our teaching fellows. And what you'll find is, as promised, I accidentally formatted my CompactFlash card on which I was taking all of these photographs. Now, it's not really possible to share a CompactFlash card, or a little memory stick among 500 people, and so what we did was, sort of industry standard practice, we made a forensic image of this CompactFlash card. I plugged it into one of those memory card readers that most of you have, or I plugged it into a camera with one of those USB cables so that it could talk to my computer and then I ran a special command on my computer that says start grabbing all of the bits from this CompactFlash card until I say stop. And I said stop after about 4 million or so bits because I thought, okay that's enough. I didn't take, what is it, a gigabyte worth of photos, I just had a few megabytes worth. So I then packaged up a forensic image. It's about 4 megabytes and inside of there are all of the JPEGs from this problem set. So the program you're going to have to write for the latter part of the pset is a program called Recover that opens up a file, namely this forensic image, called card.raw and it's going to have to iterate for all of the bits in this file from top to bottom and every time it sees what looks like a JPEG you should actually start copying those bits to a new file called something.jpeg and then the next time you see what looks like another JPEG, you close that previous file and you open a new one using a function called fwrite and fopen, as you'll see, and you start writing those same bits to the next file. And in the end you should find yourself inside of your pset5 JPEG directory with 50 JPEGs, all of which were once on this digital camera's memory card. And then the real fun begins. On a per-section basis you are challenged over the next couple of weeks to find the people in these photos, and Matt being one of them, and photograph yourself with them. And the section that does this first and most accurately will win, as we promised an amazing prize. So, with that said, the Hacker edition. So, the Hacker edition is mostly similar because this is, frankly, too much fun of a pset to deprive even the hacker types of doing some of these same problems. But in Hacker Edition you will find that one of the components of the pset on both standard and Hacker requires that you implement a program with which to resize an image. Now, resizing an image is actually relatively straightforward if it looks like this. If you think of a picture as a grid of pixels, top to bottom, left to right, if I want to make this smiley face twice as big, well, I can simply convert each of these single pixels, individual squares, to just four. Right, because if I just move one pixel and make it 2 wide and 2 tall that will essentially double the size of the image so you can relatively easily do this with a couple of for loops. What's a little trickier, though, is that if you actually have a bigger image, say a bitmap of Matt and we say don't blow Matt up, but shrink him down, right, then, you have to decide which pixels do you throw away, which do you compress and how do you actually take an arbitrarily large bitmap file and shrink it down. You're going to have to make some judgment calls in the Hacker edition of this pset. So, SFTP, quick demonstration of this. It is of course, the case that when you SSH to cloud CS50.net, you're sitting at your own laptop or your own desktop and you're connecting to our servers, cloud.cs50.net, where you have your home directories and your storage space and all of this. Now, this is a bit of an issue if the purpose of this pset is to write a program that creates JPEGs and bitmap files, but those bitmap files and JPEG files are now going to live on the cloud, which means you can't just double click on them because again, SSH has never been a point and click interface and so you need a program with which to transfer those files down to your own computer and thus there exists this other protocol SFTP, for Secure File Transfer Protocol. It's better than its predecessor, File Transfer Protocol, which was not secure. Passwords were sent in the clear. So, on a Mac you can run, as the pset says, a program called Cyberduck. It's a free program that you can download. You'll click a little icon like this, open connection, and there's only a couple of details to be wary of here and we have a how-to online so you don't have to retain all of this right now. You'll see in this dropdown menu, each program can connect to servers in all sorts of ways, to different protocols. The default tends to be this insecure one, so you want to make sure you're using the secure version, if you're a Mac user. For server, you of course type cloud.cs50.net. You want to leave the port as 22. It's the same number as SSH uses. You type in your username, you type in your password, you click connect and then you'll be connected to the cloud. And just as though you typed ls you'll see the contents of your home directory. But now you have a GUI, a Graphical User Interface. So, if you want you can do things like drag a file and then just over this window and let go and the file will be transferred. Now, in the context of Windows, same exact thing, but with a different program. It happens to be called WinSCP, and if you follow the instructions here with screen shots you'll see exactly the same process. And then it just boils down to the obvious. If you've got JPEGs on the server, click and drag them to your hard drive. If you want to upload something to the cloud, do the opposite process. So, that is a little something called SFTP. Before I forget, the "Harvard Science Review," a student group on campus is looking for webmasters with which to make their website. This is actually a very common request this time of year as we look towards final projects in just a few weeks time. If you are interested in joining them as a webmaster go ahead to Google Harvard Science Review Campus publication, click contact and let them know that you might be interested. In fact, these kinds of things are certainly viable options for final projects in the course. More on that on Monday when we dive into web programming. Alright, so a couple of whirlwind comments on quiz 0? So, overall it seemed to go quite well. We had a good time with it, but certainly do reach out to us if you have any questions or concerns especially if you are on the lower end of things. Realize plenty of opportunities still remain with psets, another quiz, and the final project to bolster any scores with which you might be disappointed. But a couple of highlights of very common mistakes that we all saw while grading the quiz. So, this was an excerpt of mine from the problem involving bottles whereby there was a scoping issue. We declared a variable s, assigned at the string bottles, and then we printed it. A very common answer was that the problem in this program is that you actually have to dereference s by doing this. We'll refer to the online answer key to see exactly what the real answers were, but this is in fact, not correct. If s is declared as a char*, it is of course, by nature, a pointer. If you say char* = "hello"; something like this, it is of course a pointer. %s means put a string here. Well, what's a string? Well, we realized in the past week or two that string is just a pointer to a chunk of memory. It's the first byte of an array of characters. So it's absolutely proper to just say something like s here. If instead you really wanted to use this * notation, what is that really pointing at? When you dereference s, what do you get back? The first character, right? Because the star operator is the dereference operator, and it says go there, there being an address. So, really if you really want to use that syntax, that's fine, but you have to realize that the appropriate format code is then going to be %c, because what's at that address, specifically at that byte, is of course just a character. So, do let us know if you do have questions about that question in particular. Another common gotcha was this one. So even though we talk about zero in many different ways, there's the zero that you type in at the keyboard, there's the zero which is the decimal number, there's NULL which is supposedly zero, there's which is supposedly zero, all of these things actually have different data types or they're slightly different in different contexts. So, realize that NULL's type is a pointer. When we say NULL in a program you're technically saying this, in case you've seen this in any book. NULL is typedef'd in stdlib.h to be a ((void *) 0). So, what does this mean? This means yes, it's zero, but it's a pointer. What type of pointer? And I don't really know, so it turns out in C there exists this notion of a void * pointer, which just means it's a generic pointer. It could point at anything, I don't know, but it is in fact, a pointer. It's not in fact, zero. This is different now from saying 'a0', which is actually the NULL terminating character. Even though it's called NULL, most people write it as NULL, which is a stupid distinction to make admittedly, but it is a character so its data type is char. The relevance of this, of course, was in strlen. When some of you were implementing strlen, you probably did something like... you probably wanted to check if (s[0] == NULL). This was not correct because if s was a string and s[i] is a character, what you really wanted to check for was this. So, just do beware of that distinction there. And then finally this guy too. So is a new line and is a carriage return. This is not a sort of important question at first glance, but it is important in the context of portability of files especially as you exit the course, you're doing research, or projects where you have to import files or data sets from other people. The fact of the matter is the stupid distinction amongst operating systems does tend to create problems. Linux systems do use Mac's use, to denote the end of a line, and then Windows uses for good measure both or So, that's why this is germane to anyone actually manipulating files in the end. And this one too. This one's pretty well explained in the quiz solution itself, but do make sure you understand the distinction between including the header file this is linking against a piece of code. So, looking ahead, you guys for the first time this year have to preterm plan, which means unofficially say to the world what courses you're probably going to take next year. So we actually thought we'd use this, and we'll discuss more on Wednesday, as an opportunity to consider what kind of mindset you should have going into final projects. So, we recently surveyed 1600 or so people that used Harvard courses this past fall. This again, was the CS50 app with which you can shop for courses a little more easily. We got back responses from these folks, which we've taken into account all of their suggestions, kind of considered what were the most important feature requests, and on Wednesday we'll release a newer version of this which improves upon this previous version. The point here is not so much to dwell on this particular application. But when it comes time to design final projects, especially if you have your eye on doing something that affects campus, affects other people, builds you a startup of sorts, I'm asking these kinds of questions, how do you design software well, how do you design with user interface in mind, is certainly important. And we'll also reveal some of the gripes some of you had, anonymously, with various programs and machines you've seen in the world, excluding the MBTA system. So, any questions, administratively on really, any of all that? That was a lot of boring stuff, I felt. Alright, no more boring stuff. Alright, so Mather House. So, this is where we left off right before Quiz 0, and it was teasing you with additional data structures. Thus far our toolkit's been pretty sparse. We introduced variables way early on in week 0 with Scratch so you could actually store some data. We introduced lists in the context of Scratch. You can add like, an inventory for the fruit-playing RPG, that kind of context. And then in C we had arrays, which were similar to lists, but we had some frustrations with arrays in C already whereby you have to know their size in advance, you can very easily segfault if you go too far, so there's some downsides there. We began to address some of those downsides with that data structure by introducing linked lists two weeks ago. And what was one of the selling points of a linked list versus an array? So, you could insert at any point. Rather than in an array, whereby if you want to put something in the middle of all these people, we would have to say, you know, hey, please move all the way down, and then every element that has to be moved, which is expensive. That's a linear time operation. With a linked list you find your location and then you just say, you know, excuse me, just move one person over and then update your pointers which only took two or three operations total. You, of course, still have to find that location, but at least you still have this dynamism. Where over in the linked list, you don't necessarily have a bound on how many elements you can fit. You're only limited by the amount of RAM in your computer, the amount of RAM that malloc is willing to return to you. But what's a downside, someone else, of using a linked list versus an array? There's never a 100 percent improvement. What's a downside? [ Inaudible audience response ] >> I'm sorry? [ Inaudible audience response ] >> So, you lose random access. Whereas, an array, you knew that everything was equidistant and so you could use this [ ] notation and just jump magically, immediately to any element in the array. With a linked list, you can't just jump to the middle element because you don't know where that middle element is. In fact, with a linked list, you only know where the first element is. Thankfully he or she knows where the next element is, so everything is chained together nicely, but you do in fact, lose that random access. Now, who cares? What's the downside of losing this random access? [ Inaudible audience response ] >> Linear search. Alright, so the searching process devolves into something linear again, right. You can't leverage binary search, you can't leverage divide and conquer, you can't leverage some of these really useful, simple ideas that we've been taking advantage of sometimes. So, again, there's a tradeoff here. So the data structure we introduced next was this notion of a stack and this was actually just to formalize the data structure we've been using for sometime. So, the stack of trays conveys this idea that when you put a tray on top, the one that you are going to take off first, unless you want to make a huge scene, is going to be which one, obviously? So, the top, right? So, a stack is a data structure where it is last in first out, or a computer scientists would typically write LIFO, last in first out. Now, this might seem a weird structure. For instance, it would be kind of unfair if an amusement park implemented a stack for the line, right? That would mean that the last person to get on line for some roller coaster is going to be the first one to get on. Might be great to be that person but it's not necessarily quite fair. Now, we have certainly used the stack in the context of computing, right? The whole layout of memory leverages this idea of a stack, which is useful because it does make sense that when a function gets called, and another one and another one, you don't want to jump back down to this guy. You want to slowly unravel this stack and pop frames off one at a time. But sometimes a queue makes more sense. So a queue is another data structure in computer science terms. We've now got linked lists, we've got arrays, but arrays are avery limited data structure. We have a notion of a stack, which we can certainly implement, and we have the notion of a queue, which we can certainly implement as well. It turns out, when we start talking about more interesting, more sophisticated data structures, many of them can be implemented in code with previous data structures. For instance, if you want to implement the notion of a queue, a line, the idea being for a queue that you implement a FIFO structure (first in first out). So, the first person on line at 5:00 a.m. in the morning is the first person to get out of line and be able to buy the iPhone, in this case. So, with what data structure could you implement a queue? [ Inaudible audience response ] >> In other words, if you want to implement a new struct in C and you want to call this thing a queue and it's going to have the properties of FIFO (first in first out), well, how do you represent the individual people? Or, let's just number them, how do you store integers in a queue? What data structure could you actually use in C? So, we could just use an array. Right? In theory, Apple, for fire code reasons or management reasons, they might just say, you know what, we're only going to let a thousand people line up outside of this door or a hundred people. So, in a sense it might be reasonable to bound a number of slots in your data structure. So a queue could just really underneath the hood be an array. And that array can be a fixed size. But what do you then have to be careful of? Well, I can implement a queue just sketching this out partly in C and partly in pseudocode here. I might call my struct a queue and my queue might actually have, let's say, int members, the people in this queue, and it's going to be bounded at a hundred. So, now I have a struct called a queue and I could then have a function like, pop. So, let me do int pop (queue *q). And the purpose of this function in C is going to be returned to me the next person that should be removed from the queue. So, in other words, a data structure, more generally is the, rather... A data structure generally has not only data associated with it inside, it also typically has some standard operations associated with it. And even though we didn't highlight this with linked lists, a linked list's operations are the ones we played with. Delete is an operation on a linked list data structure. Insert, search, sort might be another one. So anything we've been calling a function or an algorithm you can think now as applying very specifically to a data structure. So, a common one for a queue is pop, or you can call it different things, but the purpose here is that I call this function, I pass in a queue, I want it to return to me the number of the person who should be removed from that queue next. So, now let's just brainstorm for a moment. If I pass in a queue and I need to be told who the next person in line is that should be allowed into the store, how would I possibly implement that? Do I need additional data inside of this data structure in order to implement that idea? [ Inaudible audience response ] >> What is the rule? Alright, so the rule is again FIFO. So for a data structure like a queue, suppose we filled it up with a hundred people or a hundred integers, the rule is, when I call pop, I want the first person in line. Now, what does that mean in this context? What should pop return in the simplest case? [ Inaudible audience response ] >> The next element or array, or actually, if you want to return the first person, you know, why don't I just keep it simple, return [member]... whoops, return members[0]; right? So, it's just an array. So if you want to return the first person in line, just return members[0]. But what's the problem now? If again, the point here is to implement a queue, if the Apple employee executes the same function again, who's going to be allowed into the store next? The same guy, the same person, same person, same person. So, this is clearly flawed. So the next time around what do I return? Well, pop needs to know something more here. Members... Oh, actually, this is the wrong syntax. Remember, to get at something? It's q->members[0]; Take the struct called queue, go there and return the member's zeroth element. What do I really want to do? Well, more generally, I don't want to return always the first person in line. I probably want to return the person at the head of the line. But, ahead might be a variable. Where should I store this variable? How could I possibly keep track of where someone is? [ Inaudible audience response ] >> So, maybe local, but if I put a local variable inside pop, as soon pop returns I will have forgotten who I just removed from the list. >> Inside the struct. >> So, inside the struct, right? So, don't forget, a couple of weeks ago when we introduced a student struct, whenever we had data we wanted to keep together, like an ID number, a name, and a house, we just lumped it all together inside of the same struct. So, you know, I could just do this: int head; for my queue and the head variable is what ultimately keeps track of who's next in line. So now I might say go ahead and get the person at the head of the line by accessing the same queue structure, its head field and return it. But I'm not quite there yet. What's the problem now? Who's going to get pulled off the head of the line the next time I call pop? [ Inaudible audience response ] >> So, I do need to iterate, but who specifically is going to get popped off the front of the line the next time this is called? [ Inaudible audience response ] >> It's the same person right? It's not sufficient to generalize who gets popped off if I'm not actually going to maintain this data structure. So really I want to do something like this: int next gets the person who's currently at the head of the line. Agreed? Alright, and then ultimately I want to return this next person. But what do I need to do first? [ Inaudible audience response ] >> Right, I need to plus, plus this. So really I want to say the person at the head of the line should now be the person at the head of the line plus one. So, now I've taken in this data structure, I've grabbed the person who supposedly at the front of the line. I then say, okay, the next time let me plan for this, the next time go ahead and call the next person, which is equivalent to plus, plussing this. So, I could actually, simplify this. This is in fact identical to just saying this. And then I return not the current value of head, because then I'd be skipping the first person in line, I then return the person that I called next. Now, there's one bug here. What's going to happen eventually with this implementation of pop? [ Inaudible audience response ] >>At the end of the line I'm going to segfault, or I'm at least going to start touching memory that's not actually there. So this is as though the Apple employee is going to like start talking to no one at the end of the line trying to pull more and more people that just aren't there. So, we need a fix here and what operator probably saves us? [ Inaudible audience response ] >> What's that? [ Inaudible audience response ] >> Not or, but what do we do generally when we want to wrap around? Right, so we could do something this: head = q ->head + 1. And then we can make sure that we always stay within the boundaries of this array with something like this. But again, I'm kind of skipping a step here. Now, if I implement this as follows, this assumes that there's going to be a continual flow of people. I presumably now need, if I'm going to allow this to now back wrap around to the other side, what do I also need to keep track of apparently? [ Inaudible audience response ] >> What's that? [ Inaudible audience response ] >> So the end of the line, right? I need to keep track of how many of the one hundred integers that I reserved space for are actually legitimate candidates to be removed from this data structure. So, I'm probably going to need another integer in here, something like int size; or something along those lines. But here's where things start to get interesting. When you're designing a data structure to solve a problem, you typically wrap up the data you care about inside of a structure. You implement one or more functions with which you then start to manipulate that data, so that if you have a goal in mind like, let me implement in code this notion of a queue so that a computer program, so one of these employees on a little iPhone can be told who should be popped off the list first. Well, you could implement this using these kinds of data members. We'll look at this with more detail with problem set 6 next week, but this is just a taste of the thought process that goes into actually designing interesting data structures; not these most simplistic of ones, arrays and linked lists, that we've looked at thus far. But all of this generally reduces to using this new feature, this power called pointers and memory addresses in C. Unfortunately, you can do a lot of damage using pointers in C, both in terms of security, as we've discussed with buffer overflows, and just in terms of correctness. Probably almost everyone in this room has had a segfault at some point of some sort because you've touched memory that doesn't belong to you. And sometimes these problems can be hard to spot. You might have not looked at your code before submitting it for some pset, felt pretty good that, got it. There are no bugs in this code. And then your TF finds something, nonetheless. Well, this is not necessarily the easiest thing to debug on your own, and so thankfully some tools exist. So Valgrind is a program we'll start using for problem set... you can start using for problem set 5. It's particularly germane, as you'll see, to problem set 6 that essentially analyzes your code for you. And if it can, tells you where you've made memory-related mistakes so that you can proactively fix them so that your code isn't broken, doesn't get downgraded certainly, and ultimately is better written. So, let's take a look at an example. This is among your printouts for today. It's a simple program called memory.c. But I wrote this intentionally to have a couple of flaws. It doesn't do all that much. At the bottom of this file, notice I have what's called main. It's sole purpose in life is to call the function f(); and then it returns 0 just by default. And then what does f(); do? It returns nothing. It takes in no arguments as indicated by void here. But it does do a couple of things, albeit uselessly. It first declares int *x. It then mallocs how many bytes? So 40, right? If we assume that an int is 4 bytes, it's going to allocate 40 bytes using malloc, storing the address in x. And then it appears to, at the tenth location in this chunk of memory, so at the location of the 11th integer, zero indexed, it's going to arbitrarily put the value 0. But this is problematic because how many ints did I actually allocate? So only 10. If I'm indexing into this array at [10] I've gone one step too far. Right? This is a buffer overrun. It's not really an exploit because you're not going to compromise a machine really, with just the number zero. But you are in fact overstepping the bound. So you might be able to catch this just by looking through the code, but not necessarily. So let's see what this program can actually do for us. So, I'm going to go ahead and run this command. It's valgrind. That's the command. -v is a switch that says be verbose. Tell me as much as you can. And then -- leak - check=full says, do all of the memory leak tests that the program is designed to support. How do I know this? I read the man page, or frankly, I ran just valgrind, and it yelled at me and it told me to run this other command instead. So that's how you can remember. So let me go ahead and paste this in. First let me make the program memory. Seems to compile. If I run it, seems to work just fine. If I run gdb on it, and then run it, seemingly a program exits normally, so it looks like it's okay. But let's see if valgrind knows better. So, let me run this command on the program called memory in my current directory. Enter. Wow, so a whole bunch of text flew by. That's what happens when you tell a program to be verbose. A lot of this is not all that interesting. It's a bit more arcane than we'd care about. But you can skim through it even if this is the very first time. And you can kind of look for the proverbial red flag. So, let me scroll down to... Oh, this is interesting: Invalid write of size 4. And I say it's interesting because it's not cryptically referring to all these files that I've never heard of. It's referring actually to something I wrote: memory.c line 23. Then let me scroll down a little further. Oh, and this is bad sounding: 40 bytes in 1 blocks are definitely lost in loss record 1 of 1. Alright, so it couldn't be more verbose. But the point is, again it's my code, right? So it's finding fault with my code, so let me look a little closer. Let's do the first one first. So, Invalid write of size 4 apparently induced at this address, but that's not useful to me, the human, at this line number, 23, in memory.c. So, let me go ahead and open memory.c line 23. Aha! Line 23 is this thing here where I say x[10] = 0; and the error message being indicated there is again, this: Invalid write of size 4. So, what does it mean by saying size 4? So, it's an int, right? I'm storing an int, namely the int 0. It happens to be 4 bytes, and so that's why it's saying Invalid write. You know, it's not completely holding my hand and telling me how to fix this problem. But now I see, oh, line 23, there's definitely a problem there. Bam! I've now detected my buffer overrun. Now, valgrind can't necessarily do this in all contexts, especially if your program is more dynamic, takes user input with GetInt() or GetString(), or the command line. But at least in this case we catch that error. What else did it yell at me for? Let me look at the other error, which was this one. Let me rerun the command. It was toward the bottom: Invalid write of size 4. We saw this... Ooh: definitely lost: 40 bytes in 1 blocks. That was not good. And here it is: (memory.c:22). So, let me go into memory.c, line 22. It's the one above it. Oh, so I'm losing 40 bytes. Well, why is that? Well, apparently I'm not doing what? I'm not calling free. So, let's actually see if I can fix this. Again, the program is still pretty pointless, but let me at least knock off one of these. Any time you call malloc, henceforth you should absolutely be calling free. But you want to be calling free on the same pointer that malloc returned, which is x in this case. And we save that. Let me remake memory. Let me go ahead and now run valgrind again and see if it yells a little less. Okay, good. Now I only have one errors. And... So, they have such a fancy sophisticated program, and again, they didn't even parenthesize the s. This is really just an if condition they could have had. So even experienced programmers cut corners sometimes. Alright. Not that you should. Alright, so this problem still remains. So, in short, there's a whole lot of cryptic output, but thankfully with Valgrind, you don't have to care about most of it. Looking for any mentions of your code or your line numbers can absolutely start helping you chase down any kind of memory-related bugs. But we see this recurring theme here. So every time we look at memory, every time we look at pointers, we see things like this. And I had to use this slide because I thought it was the cutest thing ever. Alright, five of us think this is cute. So, this, of course, hexadecimal. So why do we keep using hexadecimal? So, we already have binary, where you've got eight bits... Or actually, you only have two digits: zero and one. We've got decimal, which we all grew up with, which is zero through nine. Hexadecimal has zero through nine and A through F, so this is 16 total characters. And why hexadecimal? So, someone just asked the other day after class, you know, why use 16 characters? Why not 12 and not 14? Well, we absolutely could. There is base 12. There is base 14, there's base 20, there's any base system you want to come up with. What's particularly nice about hexadecimal though, is that you're representing zero through nine, and a through f. That's 16 total digits. To represent 16 total values how many bits do you need? You only need four. So with four bits, 1, 2, 3, 4, I can represent, of course, the number zero pretty effectively, and I can represent the number 15, so this is a total of 16 possible values. So what's nice about hexadecimal is that because hex, implying 16, is a power of 2, what it means is that with one hexadecimal digit, you can represent four bits, and thus with two hexadecimal digits you can represent eight bits or one byte. So, any time you see 0x, just saying here comes hexadecimal, -01, or 02 or 03, it's just a nice human convention because that zero and 1 are actually represented using 8 bits in total. For instance, 0x01, if we write this out, let's go ahead and do 0x01, well, if you actually implement this is binary, well, this first zero actually represents 4 bits. So, that's actually the bits 1, 2, 3, 4. This 1 is actually four bits, so that's the number this? No, it's just the number 1, so 1, 2, 3, 4. This means the value we say 0x02 is just, zero, zero, zero, zero, then zero, zero, zero, 2. Or, no, right, so what's 2? It should really be 1 0. So it's still just binary, you know, from several weeks ago. This would be 1, 2, 3, 4 for the zero, then 1, 2, 3, 4 for the 3. And then finally, if we keep going up in this way, let's go up to ff, this means four ones, now we have the number 255. So, hex is just a nice way of saying, expressing a number in four-bit chunks at a time. So, why is this actually useful? It's because with this notation it makes it really easy to start manipulating data at the bit level. Thus far we've only focused on things like integers and maybe chars, but we've never really cared about the underlying bit representation. But now for this pset, now for this part of the course, where we're actually talking about files and representing data persistently, being able to twiddle, or tweak or flip, just change a zero to a one or a one to a zero, can actually give us a great deal of new power, which we'll take a look at in just a five-minute break. Okay, so we are back and I haven't exactly vetted this, but given that the subject of steganography did come up here, I thought I'd at least do a quick Google search. And let's assume at the moment that Wikipedia has gotten this right. Here is an image of what appears to be a nice sky and a tree, but it turns out that a subset of the bits that compose this picture have been tweaked in such a way, and by tweaked I mean zeros have become ones, ones have become zeros, or something along those lines. It turns out that if you undo those changes that what you actually get back out of this picture, is that picture. Okay, cuteness was not the point. [ Laughing ] >> So, what's powerful here is that, again, you can read the article. This is just the Wikipedia steganography article. What's powerful here is that you can take really an arbitrary image, and if you were using enough bits to represent each of the pixels, each of the dots, you could hide a lot of additional information, whether it's text or whether it's a cat, or anything else altogether. So realize that this is a very powerful technique, and we're just scratching the surface of it with problem set 5. So we promised to do some of these lower-level bit manipulations and we actually have this ability. So, in the past you've used the ampersand and you've used the vertical bar for the purposes of conditions and boolean expressions. && is a boolean AND, it says, make sure this and this is true. And then we had the vertical bars, which was, make sure this or this was true. Now, it turns out if you actually just use one ampersand or one vertical bar, they have very different meanings. What it allows you to do is what's called a bitwise AND or a bitwise OR, or there's some other operators all together. What bitwise means is that this operator will let you take an 8-bit char or a 32-bit int or a 64-bit long long, and it will allow you to change from 1 to 0, or 0 to 1, any of the 8's, or the 32-bits, or the 64 bits in that particular number. So, it's a step toward implementing something like the tree-to-cat transformation, if you can actually manipulate bits at that level. And in the context of pset 5 you won't necessarily do it at the bit level, you'll do it at the byte level in terms of the red,green, and blue values in the pictures we give you. So, let's actually see an example of this in action. Let's try a simple one first. So, in binary.c, which is among your printouts today, we have this code here. It's not all that long. It's got a do-while loop whose purpose in life is just to ask the user again and again for a non-negative integer so that we have a number to play with. And then down here there's a bit of fanciness. So, first let's take a look at what the program actually does and then we'll delve into how it's doing that. Let me go ahead and make binary. I'm going to go ahead now and run binary, and I need a non-negative integer, please. Let's start with the number 3. Enter. And it looks like I've provided an integer, it's the number 3. But to represent that as binary you have what appeared to be 30 zeros and then a one and a one and this is consistent, presumably with what we know to be the binary representation of three. Let's do one other. So let me go ahead and do something like a 255. I should get a whole bunch more ones, and sure enough, I do. And let's try something even bigger. Let me go ahead and type in like, big number. So that happens to map to this binary number. So, assuming this is correct, how is it actually doing this? Well, let's take a look. So, most of this code, uninteresting. It's just a do-while whose purpose in life is, again, to get an int from the user. So the magic is in here and here's how relatively easy, if it's a little cryptic at first, it is to start manipulating individual bits. What's my goal here? Well, I know just from how computers are implemented, that an integer is inside RAM 32 bits. So what I really want to do is get access to each of those individual bits, from the right-hand side, all the way over to the left. In other words, if I know conceptually that the number 3 looks like this, I want to be able to print out those bits, and so actually I think I misspoke, I want to start at the leftmost bit in this integer, and I want to print out what it is, a zero or a one. I want to then move to the next bit, the next bit, the next bit. So in other words, I want to take the actual integer and iterate over all 32 of its bits. So, how can I do this? Well, let's take a look. So, first I'm going to declare an int i. I'm going to declare i to be the size of an int, so that's 4 times 8; so that's 32, minus 1. So, i is initially 31. Why didn't I just say, i equals 31? I wanted this code to be portable. If you ran it on a newer Mac or PC, I wanted to make sure the code didn't break on you, so I generalized the size of an integer in case it varies. So, i is initially 32, and this is kind of old school now,. It looks like this loop is going to go from i... Sorry, i is 31... on down to zero, equal to zero. So this loop is going to iterate 32 times. Now, why 31 and zero? Well, whenever you write a number in binary, the convention is that if you write, 1, 2, 3, 4, 5, 6, 7, 8, this bit here; the rightmost, is what's generally know as the lowest-order digit, the rightmost digit, the lowest-order digit, lowest order in the sense that this guy is the least contributing number in this whole number. It's got the least weight in terms of the grade school columns, ones columns, twos columns, fours columns; this is the smallest column. So, it's the least order of bits. By contrast, the guy all the way over here on the left, that's the highest-order bit. So, he's the most powerful bit because he contributes a 128. That's the 128s column in the context of binary. So we generally number, then, by convention, the right-hand side. This would be bit number 0, and this guy over here would be bit number 7. And by contrast, if we have 32 bits, 0 would be over there, 31-bit would be over there. So, you number it in sort of the opposite way you might think. Alright, so what does this mean for our code? Well, I'm going to start at the 31st bit and move my way down to the zero bit. So, conceptually this is right. I'm going to iterate over all 32 bits from left to right. What do I need to do? Well, when I have this integer n that the user provided, that just gives me this high-level number, the number 3, the number 1 million, whatever they typed in. I need to dive in and actually look at individual bits. So there's this notion in programming called a mask. So, I've declared a variable of type int called mask. I've set it equal to this fancy thing. So this is literally the number 1 that we're familiar with. What does that really equal? Well, if it's a binary number, this is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32. So, that is the number very tediously represented in binary. So, that's what I get if I do int x = 1;. But I'm not quite doing that, or int mask = 1;. But I'm not quite doing that. I have another part of an expression here. This is the left-shift operator. So it means literally, take the bits in this number and shift them over to the left. Now, obviously that's going to leave you with some blank spaces on the right. So by default those should get filled in with zeros. So the effect is really, with this left shift, it's going to move all the ones and zeros that way and it's going to fill the placeholders with zeros again. So, what does this mean? Well, i is initially 31. So this means that this 1 effectively is going to get shifted over a whole bunch of places. And I can't quite depict this perfectly on both sides, but I shifted over once, now it's there. One more time, now it's there. 31 times total, it ends up there, and as promised, this gets filled in, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, I've got to learn how to copy paste. Alright, paste, paste, zero. Okay, so that is the result of not just initializing mask to 1, but saying initialize it to 1, but then shift those bits over 31 places, or more generally, i = 31. Now, what was the point of this exercise? Well, now I have conceptually this thing called a mask. And a mask is generally... And actually the same idea exists in Photoshop, if you guys have ever played with the notion of a mask there. In that context it's usually black and white, or red and white, something like that. But the point is there's two states. In this mask here, there's two states: zero and one. The one on the left-hand side is essentially conceptually a hint that I care about that bit. All the other zeros mean I don't care about those bits. So now what I can do is essentially line up this mask with a given number and I can just look at the number I care about by, as we'll see, ANDing them together. So, let's look at the code. I initialize mask, that long sequence of bits and then I do this: if (n & mask). So if (n & mask) printf("1");. So, what does this mean? Well, n, we recall is this. This is my mask now, so let me lay this out nicely, make this bigger. So that's my mask. n, let's say, is the number 3. So n is... here we go, that's the number 3. So, now what I'm saying is AND these two things together. So I'll just lay it out kind of like a mathematical expression. So I'm going to AND them together. And what that gives me, it's a very sophisticated ASCII art, what? So, the boolean AND operator, just like... Sorry, the bitwise AND operator, the single &, just like the boolean AND operator says that for an answer to be true, both of the inputs must be true. If you want an output of 1, the inputs must both be 1. So, what this means is, let's do 1 & 0, and the answer is 0. What's 0 & 0? It's also 0, or again in the world of bools is false. And false? The answer is false so we write down a 0. This is the same question again and again and again. What happens when I get down here? 0 & 1? 0. 0 & 1? 0. So the answer to (n & mask) is what number? It's 32 zeros, which we know to be the value false, right? False is anything that's non-positive and non-negative. So anything that's zero is false. So (n & mask) returns the boolean expression false. So does this condition print 1 or 0? So, it prints 0, but this is useful because again, here's n. I want... but n, really when I typed it in is the number 3. But underneath the hood the n is this and I want to print out its leftmost bit. Its leftmost bit is a zero. and indeed I'm about to print out a zero because of this condition here. Now, we don't have to repeat this story 30 more times because it's pretty much the same. If I now let my loop continue, the next iteration i equals what? So, it's not 31; it's instead 30, right? Because I'm doing i--. So, what does this mean? Again, I initialize mask to 1. Then I left-shifted 30 places, which means again if I've got 1... Let's let me cheat here. If this is the value 1 and then I shift it 30 places, that gives me what value? It gives me this value. So, now my mask on the second iteration is almost the same, but the bit I care about is not the leftmost one, it's the second leftmost one. And now as this pattern repeats, i is going to get decremented and decremented, so your mask looks the same except that one, the single one, is kind of inching its way down toward the end of the string. Now, once you get to the end of the string, what's ultimately going to get printed out? Well, at the very last couple of iterations my mask is going to look like this. This is my second-to-last mask, and my last mask is going to look like this. Right? Because the 1 is going to keep moving along to the right-hand side. So, let's go ahead and do this second-to-last one. 1 & 1 is going to give me 1. 0 & 1 is going to give me... Oh, sorry, we're not at that point of this story. Alright, so now the last iteration, my mask is this guy here. 1 & 1 is 1, and so it gets printed out as 1, 1. Alright, sorry, I feel like I didn't say this very well. At the end of the day what is going on? We're using the mask to tell the computer we care about one bit and only one bit and it's boolean AND to essentially extract only that bit and print it out. So the end result, after we've iterated 32 times, is that we've printed out 000000 until finally our mask lines up with this bit, and our mask lines up with this bit, and we actually print the number that we care about. Okay, so let's actually use this power for something that's otherwise been done in a different way, and we'll see what the implications are, this new capability. Let me point out this. I'm going to keep using my little cheat sheet here, and I'm going to remind us that the value A is the number 65. Well, what is that in binary? Number 65 in binary is really 2, 3, 4, 5, 6, 7, 8. That's the number 65 in binary, agree or disagree? Alright, so that's the 64s column, then the 1s column. So that's the number 65. And what was the letter little a? 97. Oh, too fast. 97, alright. So how do we represent 97? We need a 1 there, and then you need another 32. I think that's it. Agreed? 64 plus 32 plus 1, so that's the number 97. Here's a curiosity that we've never really leveraged before. Big A and little a are always 32 digits apart, and we know this just by simple arithmetic: 97 minus 65 is always 32. But there's an interesting side effect, probably deliberate of that fact. What is the only difference between the binary representation of 65 and 97? >> [Inaudible audience response] >> There's this single 1 in, coincidence, the 32's column, right? This is the 1 columns, 2s, 4s, 8s, 16s, 32s. They only differ in the 32s column. If we do a little sanity check here. What about big B and little b? Well, this would be the number 66, which means that is this, and the then the number little b is 98, and that just means this. So you know what? That pattern is going to persist. So these big and little letters are always only differing by one bit position. So, you know what? This gives us a new capability to capitalize a number if we want or lowercase it. Let's go ahead and look at this, tolower.c. Alright, let's scroll down. Again, this program is almost completely boring, except for this one line of code. So, you've called the function tolower(). You might have looked at this on the quiz 0 itself. But notice what we do here. In the top here, I just ask the user for an "Uppercase character, please:". I use GetChar() and I insure that the user gives me uppercase A through uppercase Z. And then I execute one line of code that's going to do something really quite neat. Let me go ahead and make tolower. Run tolower. Enter. Uppercase character, please. Let me give the letter A, and I get back lowercase a. Let's do B, little b; uppercase Z, little z. Well, how is that working? Well, take a look at what's going on here now that we understand how to manipulate individual bits. printf("%c... So that's uninteresting. Print a character. What character? Well, it looks like printf("%c c | 0x20);. Alright, well, let's just bite these off one at a time. So 0x20 is actually what? It's 1, 2, 3, 4, bits, followed by another 4 bits as per our discussion of hexadecimal. Let's do the easiest ones first. The last 4 are all zeros because that's this bit here and then what about these four? What should I replace these four placeholders with? [ Inaudible audience response ] >> 0010. So, this is an interesting thing. It looks like 0x20, because I know what hexadecimal notation is, is just a succinct way of representing a binary number that has a 1 in which position? [ Inaudible audience response ] >> The 32s only. So now I'm saying this expression here in code. Take c, which is going to be the number 65 or 66, or so forth, and OR it with 0x20. So, what does this mean? Well, lets take A. This is A. Let's OR it with this expression here. And now just do our notion of ORing. So, this position here: 0 | 0 is 0. 1 | 0 is? [ Inaudible audience response ] >> No, careful. >> Is 1, right? Because the OR, just like in the world of booleans, something OR something, only one of them has to be true. So true or false gives me true. False or true gives me true. False, false, false, false, true, so my answer to c OR'd with 0x20 is this. And you know what that equals? This here. So, now using this bit flipping or bit twiddling, as people are fond of saying, can we actually flip the bit individually. What about the opposite? If I actually wanted to force something to lowercase and implement on the fly tolower, this is very easy. I can tell the user to give me something different. And now, how would I go about turning off that bit? [ Inaudible audience response ] >> I don't want to OR it. So OR typically has the effect at the bitwise level of turning bits on, changing zeros to ones, as we just saw. So, we do want to use AND. But if we just AND it together, is that going to give us the right answer? In fact if we AND these together, notice what's going to happen. I'm going to get all zeros, which is definitely not an uppercase A. So, what do I really want to do? I don't want to AND it with this. I want to AND it with... What about the opposite? Let me temporarily just come up with a second line. Let me flip these bits: 11011111. What if I do this? What will that give me? This will give me what? [ Inaudible audience response ] >> 01... Hmm. Doesn't seem to be helping me, is it? Or it is. That is in fact capital A, right? If you look here and then look here, that is in fact capital A. So, let's see this in code. Let's go ahead: toupper.c. And in fact, it's as simple as this. Now, how did I come up with this hexadecimal code? Well, df, this one's actually pretty easy. Same thing. I just said df, so 0xdf equals, let's see, 1, 2, 3, 4; 1, 2, 3, 4... the last four of these are pretty easy, 1, 2, 3, 4, and then what's d? So, d is actually 13, because a is 10, b is 11, 12, 13... So what does this represent? [ Inaudible audience response ] >> 1101. df. Alright, now that we have this ability to manipulate bits at the individual level, can we start to implement much more interesting data structures. So, it all comes full circle in that we talked briefly conceptually about stacks and queues, and these allow us to perform different operations on data and also represent them in more sophisticated ways. But suppose that the problems that we want to start solving are not these trivial play examples, of like people in line or students in a queue, and so forth. But suppose that we want to solve some real-world problems. One of them might be implementing a spellchecker, right? Problem set 6, as we've teased you with before, amounts to being handed a huge text file from us, which has 140,000 English words in it. And you are then challenged to represent all of those words in memory. Now, a very simplistic implementation might very well be to just take those 140,000 words, allocate a really big array, and store one word after the other, and maybe you'd alphabetize to keep things slightly optimal, and then you can perform binary search if asked the question, is this word spelled correctly? And I say binary search because given a word, you can quickly look it up in your huge dictionary and check: is it actually present? If so, it's obviously spelled correctly; if not, it's presumably not spelled correctly. So that would give you an answer. But binary search is log n, which isn't actually all that impressive, relative to a better running time, namely Big O of 1. O(1) represents what notion of running time? [ Inaudible audience response ] >> Right, constant time. And in fact, thus far, we've been pretty content to just have log n running time, or even n running time, as opposed to n squared, or the like. But what if the Holy Grail really is to get constant running time. Or if I ask a question, I want the answer now. I don't want to wait n. I don't want to wait O(n squared). I want don't even want to wait log n time. I want that answer now. Case in point, something like Google probably is not doing linear search and probably not even binary search of the billions of pages that it actually has in its database. Hopefully there's a much faster way so that I get my seemingly instantaneous reply. And the ideal, of course, is one step. Give me an answer in just one step. So, one of the questions we'll be able to ask, now that we have some of these new skills, is that if I have a container for data, with something we'll generally call a hash table, into which I can put numbers, I can put students, I can put words from a dictionary, how can I put those words or those humans into these buckets of a hash table in such a way that when I ask for that person or that word again, you can get this answer instantly? And to motivate this, ultimately with problem set 6, not only will you be challenged to implement the fastest spellchecker possible that's correct, you will also be challenged to implement the fastest spell checker vis-a-vis your classmates, if you choose to opt in, whereby we'll rank on the course's homepage what your running time currently is, what your amount of RAM currently is, so we'll have a bit of a race next week, if you choose to partake, to implement the fastest spellchecker in CS50. So with that little teaser we will see you on Wednesday. [ Applause ]