>> Alright, welcome back. This is CS50 and this is the end of week nine. So just a couple of FYIs. This Friday we'll resume the tradition of lunch if you are available, 1:15 p.m. Go to cs50.net/rsvp. This week we'll be joined by Eugene Chung of NEA, a VC firm, as well as by Andrew McCollum, one of the co-founders of Facebook, so if you'd like to eat as well as strike up chats with these kind folks and CS50 staff, do RSVP. Seminars, too, so besides having these videos and these upcoming in-person seminars, know that in addition to the BlackBerrys that RIM has kindly made available for a small number of us to do final projects with, Microsoft has also now contributed the same, a few Windows phones, so we will post later today on the course's home page a link via which you can register your interest if you would like to do a mobile project, and we will see if we can equip you with some hardware. So the rumors are true. A certain someone will be coming to campus on Monday. So as to his name it's Mark Zuckerberg. He's with a company called The Facebook. This is a social networking site. If you're on Myspace go to Facebook.com to try out this one. In case there's been any misinformation, realize that the panel session which will be Friday, ah, Monday evening is open to all undergraduates, but space is going to be limited to only about 200 people. And so OCS has a URL that you can visit if you would like to apply by submitting your resume as well as a question for Mark, and then they and Facebook will decide exactly how the ticketing is done. So I really don't want to see 614 disappointed faces during CS50 on Monday. Mark will be at MIT during CS50 on Monday, not here. So you can keep doing your thing, and if you're watching this right now in your dorm room, well, you can keep doing the same on Monday as well, but in the evening will be this special event. And you can go to this URL and type in that keyword to actually apply for this event. So, it should be fun. Alright.
So imagine my surprise when in reviewing the week's videos and looking at our curriculum I came across this little nugget from our section. You may know the character in this video. We film one of our sections each week for our distance students and our sleepier students, and this here is what we saw on Halloween Monday. [ Pause ] >> [Video Voice] So you might be asking yourself what's with the giant pumpkin? Well, of course tonight is Halloween. Um, this year I decided to wear a giant pumpkin again to overcome my self-consciousness in this costume [Laughter]. >> So he goes on and tells like this really sad two minute story about how he showed up to a Halloween party when he was like ten years old dressed as a pumpkin and was the only one dressed up at this party as anything. So it's a very sweet tale. But the funny thing was just fast forwarding through random points of Monday's section and just seeing a pumpkin, a pumpkin, like, teaching about POST submissions and thinking about web pages. It's actually kind of surreal. But let me play at least the end of this clip so Jason gets closure. >> [Video Voice] So we hope it stays inflated for the full 90 minutes and it doesn't affect the audio too much, and it doesn't distract you, and more importantly that it doesn't distract you from the material. But if it does I can't do anything about it because I'm not wearing anything under here so you'll just have to go with it. [ Laughter and Applause ] >> So this is CS50. So, without further ado today's goal is to equip us with some more of the fundamental concepts with which you can start implementing more and more dynamic websites. We are well beyond the point of HTML and CSS now, and we've begun looking, you'll recall, at PHP, this actual programming language, as well as, briefly on Monday, this database language called SQL. So realize that conceptually we've been throwing a lot at you all at once.
And they're all related but they are all nonetheless autonomous languages and autonomous technologies that just happen to be co-mingled a whole lot together. And so MySQL you'll recall is a very popular database in which you can store data in table form, in row form and the like. Problem set seven, you'll find or are finding, hopefully really walks you through the process of using some database tables and the like, and you'll find that it's a very good stepping stone for any web based final project. But there's even more that we can do. So let me turn us back to our copy of some of the froshims code from the other day. I'm going to go ahead here and open up gedit, I'm going to go ahead and open up froshims5 which was one of the last ones we looked at. And recall that this had at least one nice feature: in addition to checking whether the user gave me their name and their gender and their dorm, if they didn't, I was at least kind enough not only to yell at them with big red text (if there's an error, go ahead and display this div tag in red), but also, recall, we later added this little tidbit over on the right-hand side. These form elements can have value attributes that can themselves have some string inside of quotes. And so here we had $_POST["name"], so that at least if I got my name right I don't need to redo this. And this might seem like a simple thing, but just imagine or take notice in weeks to come just how many websites don't do these very simple user conveniences. A very underappreciated feature of programming and web development is user interface design. And, frankly, one of the reasons that so many of us are enamored with things like Android phones and iPhones and the like is because some companies do actually get this. And so keep these sorts of things in mind as you design your own project. But htmlspecialchars, long a function name though it is, actually does have some compelling use.
Why did we wrap $_POST["name"] with this function call, htmlspecialchars? What did it do for us? [ Pause ] Anyone at all? Yeah? [ Inaudible Audience Answer ] >> Yeah, exactly. So it's to ensure that users can't inject arbitrary code. In this case they can't type like some HTML tags or, as we'll soon see, some JavaScript code that might then accidentally get executed in the browser. And now, recall the simple example I did by outputting like a bold tag, open bracket B closed bracket. It was kind of stupid in that the only one I was really messing with in Monday's example was myself, right? All I did was make the web page look bad and broken to myself. But recall these things called phishing attacks and the spam we all get daily where there are links to websites, and recall, too, that forms can be submitted both by POST as well as by GET. And if you submit a form by GET that just means every piece of information is inside of the URL. So if you can store form submissions inside of a URL and you can put URLs in emails like spam, well, you can trick people effectively into submitting forms. And one of the things that can potentially be compromised if you don't distrust all user input in this fashion is people's cookies, which can be stolen, more on those today. And what this really means is that if you're logged into something sensitive like your Facebook account or your bank account or the like, potentially some bad guy who has duped you into clicking a link in some phishing email can log into your account and then do whatever he or she wants with your own access. So this is good practice and necessary practice these days. But this form was not all that user friendly. If I go ahead and open up Firefox and I go to localhost slash tilde jharvard, froshims5, we see this form here and I can start to register. And let me give at least a dorm and click register, but it was only a partial improvement in terms of UI, user interface.
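As a minimal sketch of the escaping discussed above: the submitted value below is a made-up stand-in for $_POST["name"], and the variable names are assumptions for illustration, not the lecture's actual froshims5 source. It shows what htmlspecialchars does to a string containing tags before it is placed in a value attribute.

```php
<?php
// Hypothetical submitted value; in the form handler this would come from $_POST["name"].
$submitted = '<b>David</b><script>alert("hi")</script>';

// htmlspecialchars turns &, <, > and " into HTML entities
// (and ' as well when ENT_QUOTES is passed).
$safe = htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');

// The browser now displays the text literally instead of interpreting it as markup.
echo '<input name="name" type="text" value="' . $safe . '" />', "\n";
```

The escaped string is plain text as far as the browser is concerned, so neither the bold tag nor the script tag takes effect.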
Notice I did not pre-populate dorm this time, and that's kind of annoying, right? How do we actually do this? Well, unfortunately, we have to do it in a slightly different way because the dorm field recall was implemented not with an input tag but with something called a select tag. So let me zoom in on the HTML for dorm here, and notice that drop down menus are a little different. Open bracket select, you give the parameter a name, and then you have a whole bunch of options and values down, down, down, down the list, and there's a dichotomy here recall. There's the value of the option and then also what the human sees. For simplicity I did exactly the same thing. I put value equal to what the user sees, but you could have something different if you wanted more descriptive text or in quotes maybe even less descriptive text. But notice that there's no mention of selection here. There's no mention of value. So, of course, when I reload this page or spit out the HTML again it's going to forget that Matthews was selected. Somehow what we need to do here is if I scroll down to Matthews I need to somehow output something like this, not just option value equals Matthews but option selected equals selected value equals Matthews. Now, if you're thinking this looks stupid this does look stupid. The fact that you have to say selected equals quote unquote selected, this is sort of a remnant from the early days of HTML where in the beginning you didn't need to have values associated with attributes. You could just have single key words like this. And, indeed, if people go around and read HTML references you might still see people doing things like this. But the folks decided a few years ago that we need to at least start standardizing the syntax, and for any of those anomalies where they were just single key words it's going to be the same key words equals quote unquote itself. So that is just the way things are for better or for worse. 
But this is an uninteresting detail intellectually. Programmatically, though, how do we generate that string among all of these options selectively? In other words, as I'm spitting out or generating in my PHP code this drop down menu, I need to pause with some kind of if condition or branch and say, wait a minute, if this is what the user selected I need to re-select this for him or her by spitting out precisely that HTML. So let's take a look at version 6 of froshims here. So notice at the very top we have a bunch of comments as before, but I've introduced a new feature that you might have seen already in pset7: that of arrays that can be declared using literally a function called array. So, frankly, this is kind of a nasty piece of syntax in PHP, that you can't just declare an array with square brackets or some other syntax as we even could in C, but instead you have to literally call a function called array. But, again, this is simply the way it is. So $DORMS I've capitalized just to convey the idea here that this is a global array that I'm going to be using everywhere. But it's just a variable storing this array of elements. So PHP supports normal arrays, bracket zero, bracket one, bracket two. So this is not an associative array, this is just an array. And I've hit enter on every line just to keep it more readable. So notice at the end I have close paren, semicolon, end of function call. So stored at this moment in time in this variable is that whole array. So this is all copy paste from earlier. I'm just making sure the user has actually submitted all of the fields I care about. So let's see now how I'm generating the HTML. Well, if I scroll down here notice that captain and gender are actually the same, but look how much more elegant now the dorm generation is. So I have a new construct that we predicted would come a couple days ago, which is the foreach construct.
I have this inside of open bracket question mark, which says to the web server here's some PHP code, don't spit this out, interpret it instead. The syntax for this loop is foreach, variable name, as, other variable name. So the array comes first and then a temporary variable. We could call this anything we want, but you might as well choose something that's a little more friendly. And then notice the colon here, and this is important because the moment you close PHP mode here stuff is just going to start getting spit out raw. So you need to just be clear to PHP that what follows is actually inside of, conceptually, this loop. And notice the opposite of this is the somewhat verbosely named endforeach with no space. So alternatively you could do something C-style. You could actually say, well, open curly brace and then close curly brace here, so this would actually be fine as well. It just, hmm, looks a little uglier perhaps. But either style is fine so long as you are consistent. Then what are we doing here? Well, notice this is not inside of PHP mode. So that means if we're inside of this loop this stuff is just going to get spit out literally. So open bracket, option, value equals quote, and then we get back into PHP mode and then we exit PHP mode, close quote, close bracket, open PHP mode and we spit out dorm again. Recall the redundancy of the way I structured this, end PHP mode and then raw HTML again. But I'm not doing something here. I'm kind of skipping a step that I just insisted was important, which is that I'm not escaping dorm here. Why not? [ Pause ] What's that? [ Inaudible Audience Answer ] >> Yeah, that's it. It's as simple as that, right? If I am the one who created this list of elements I don't really need to call a function and incur the slight computational cost of actually executing a function just to escape a string that I myself wrote.
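The loop described above can be sketched roughly as follows. This is not the actual froshims6 source: the dorm list is abbreviated, and the function name render_dorm_menu and the output buffering wrapper are assumptions added so the snippet can be run and reused. The colon and endforeach form is the same one the lecture walks through.

```php
<?php
// Hypothetical wrapper so the template output can be captured as a string.
function render_dorm_menu(array $dorms) {
    ob_start();
?>
<select name="dorm">
<?php foreach ($dorms as $dorm): ?>
  <option value="<?php echo $dorm ?>"><?php echo $dorm ?></option>
<?php endforeach ?>
</select>
<?php
    return ob_get_clean();
}

// An abbreviated list written by the programmer, which is why it is not escaped here.
$DORMS = array('Apley Court', 'Canaday', 'Matthews');
echo render_dorm_menu($DORMS);
```

Everything between the foreach line and endforeach is raw HTML that gets emitted once per element, with PHP mode re-entered only to echo the current dorm.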
Now, if you really wanted to be paranoid and not even trust yourself or the person you're working with you could certainly do this. But realize the distinction here is as simple as, well, this array was from me, not from the user. So now we have the ability to dynamically generate this list. We're not doing any kind of ifs or elses yet to actually see which one the user has already submitted, so the end result is actually going to be pretty similar. If I go to froshim6.php in my browser, dammit, that's the mistake David forgot to fix since Monday, so we're going to cheat and copy this and put this over here, and how about by next week we'll fix that problem just like the WiFi question I keep forgetting to ask. [Laughter] So let's go. That was not intended to be a running gag. Alright, so here we go, and now I messed up my formatting so that's alright. So, here we go, simulate the correctness. [Laughter] Do as I say, not as I do. Alright, so now I can go ahead and type David and I can click register, and I'm actually still going to get yelled at here, but the menu is still the same. It's not pre-populated yet, but if I actually look at the page source by right clicking or control clicking, I at least see the HTML that I saw before. Now, there's a little more white space this time and some weird indentation. And, again, this is not a big deal. Your output does not need to be pretty printed, but the actual PHP code and HTML you write should be; this, though, is now machine generated. And, in fact, if you start looking at the source code of most websites you'll see patterns like this where it's all indented identically. And that's not because a human necessarily did that. It's because there's some programming code, some PHP or whatnot, that's spitting this out in some kind of loop. So I need to be able to ask myself the question: if what I'm about to spit out in this drop down menu is equal to what the user typed in or selected, I need to reselect it for him or her.
So let's go ahead and open up froshim7 now. And we see the same array of dorms up top. We see the same error checking code at top. And if I scroll down now notice that I've done something slightly different, and realize there's a bunch of different ways to do this, and you'll see different ways in section and in online tutorials. But here notice that just to keep things prettier I've only entered PHP mode once at the very top, and then I'm closing it down here, just because, frankly, if you start going in and out and in and out of PHP mode it just gets hard to read. So I decided to write this all in one big foreach loop, so we have the same looping structure, foreach DORMS as dorm, and then, frankly, this is week one stuff again. It's a different language but same idea. If what the user submitted, which recall is stored inside of a special superglobal variable called $_POST, equals equals the value of the current dorm as we are looping through all of them, well, then go ahead and spit out literally option selected equals quote unquote selected, value equals quote unquote dorm, then dorm, then close option, close quote, semicolon. And now notice this. This is C stuff also. If I've got double quotes on the outside I have to have single quotes on the inside. Or, what else could I do to avoid confusing the interpreter by having weird quotes in the middle of quotes? You could escape it, right? When in doubt you could do something like backslash quote and that would also get the job done as well. Frankly, it's just a little harder to read so you might as well toggle in and out of single quotes. Else, if the user did not select the current element, just go ahead and spit this out instead. So, again, week one stuff sort of updated for PHP. So let's now go back to the browser. This is version 7. Let me go ahead and open up froshim7.php, and let me go ahead and learn from my past mistakes. Let's go back to 6.
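A rough sketch of the if/else inside the loop, under some assumptions: the helper name dorm_options is hypothetical, the dorm list is abbreviated, and the second argument stands in for the submitted dorm value in $_POST. It uses double quotes on the outside and single quotes on the inside, as described above.

```php
<?php
// Build the <option> tags, re-selecting whatever the user previously submitted.
// $submitted stands in for the submitted dorm; the function name is hypothetical.
function dorm_options(array $dorms, $submitted) {
    $html = '';
    foreach ($dorms as $dorm) {
        if ($submitted == $dorm) {
            // Double quotes outside, single quotes inside, so nothing needs a backslash.
            $html .= "<option selected='selected' value='$dorm'>$dorm</option>";
        } else {
            $html .= "<option value='$dorm'>$dorm</option>";
        }
    }
    return $html;
}

$DORMS = array('Apley Court', 'Canaday', 'Matthews');
echo dorm_options($DORMS, 'Matthews'), "\n";
```

Only the option whose value matches the submission carries the selected attribute; every other option is emitted exactly as before.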
Alright, uh huh, alright, alright, and reload, problem solved. David was from Matthews. I'm going to skip gender and captain, register. I'm yelled at but notice what did not break this time. Now Matthews is preselected, and if I go into my source code, view page source, scroll down, here, voilà, it's not being formatted as nicely now because I'm not printing out white space but that's fine, too. In fact, we're saving some bytes this way. But if I scroll right, right, right, right, right to Matthews, notice that Matthews does look a little different and that's it. So, again, we're really using PHP now to dynamically generate HTML to structure and stylize our page as we see fit. Any questions? [ Pause ] Anything at all? Alright. Yeah? >> Did you already explain what echo meant? >> Oh, did I explain what echo meant? No, so sorry. I took for granted what echo meant. And it really means kind of what the word suggests, echo this literally. So it's actually synonymous with saying print. However, echo is just built into PHP, so I could either say print and print this out, and let me scroll back to the left, or you can say echo. They're pretty much equivalent. And, in fact, you don't actually need the parentheses for the echo call, which is perhaps useful for some folks. So good question there. Other questions? >> Where did you introduce the lower case dorm variable? >> Ah, good question. Where did I introduce the lower case dorm variable? It was implicitly declared inside of the foreach construct itself. So the moment I mention it after the keyword as, it comes into existence, and it automatically gets updated on every iteration of the loop. Alright, so recall that where we left off on Monday was with this slightly more exciting example in that we generated emails automatically for froshims8. And if we scroll down here notice that we see the same HTML because this form submits to register 8, and recall, oh, sorry, this was not the email example.
This was instead the database example. So recall what we actually did with the data here. So let's focus now on the top which was just some error checking, and below that was the database connection. So there's a couple of things going on here. We mentioned when we first started talking about the internet that a single server can certainly do multiple things these days. It can be a web server, email server, instant messaging server. And so when a packet of data arrives at some server on the internet how does the server know if it is an email or web page request or instant message? What was the underlying technology that answered questions of that form? Yeah? >> Port number. >> Yeah, so it was a port number, right? We talked briefly and we had this fun video talking about IP and TCP, and TCP was simply the protocol that says, one, if data gets dropped somewhere along the way, TCP is responsible for re-sending it. But TCP was also responsible for assigning a number to most services on the internet, like web is 80 by convention and also 443 which is the SSL or HTTPS version, more on that today, 25 is email, 22 is something called SSH, 21 is something called FTP and the like. So upon receiving some packet and seeing, okay, this is for my IP address and it's for port 80, the web server knows that this is indeed for him. But in this case there's another something running on the server, in this case the appliance, but it could be an actual server on the internet, and that's something called a MySQL server, a database server whose purpose in life is to listen for connections, listen for requests for databases, and then listen for things like insert and delete and select and the like. So if we look here at these first two lines the first line is literally doing that, having my PHP code open a connection, a network connection, to the local host using user name jharvard and password crimson.
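The port convention described above can be pictured as a simple lookup table. The sketch below is only an illustration: the array and the helper service_for_port are hypothetical names, the numbers are the conventional defaults named in the lecture, and 3306 is added as MySQL's default port.

```php
<?php
// Conventional (default) TCP ports named in the lecture, plus MySQL's default, 3306.
$PORTS = array(
    'ftp'   => 21,
    'ssh'   => 22,
    'smtp'  => 25,   // email
    'http'  => 80,
    'https' => 443,
    'mysql' => 3306,
);

// Hypothetical helper: which service conventionally listens on a given port?
function service_for_port(array $ports, $port) {
    $service = array_search($port, $ports, true);
    return $service === false ? 'unknown' : $service;
}

echo service_for_port($PORTS, 80), "\n";
echo service_for_port($PORTS, 3306), "\n";
```

A real operating system does this dispatching itself; the table only mirrors the idea that the destination port tells the machine which listening program should receive the data.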
In theory the server could be elsewhere on the internet, but because MySQL traffic is not encrypted by default it's not a good idea to try to connect to a database server elsewhere on the internet from your own web server, so they're usually in the same company, in the same building or the like, or literally on the same machine as is the case here. So what we next do is we select a very specific database. This is like opening a specific Excel file on your desktop even though you might have multiple ones. And then we have to get into this habit of avoiding dangerous input, scrubbing input or error checking really. So notice I'm calling this very long named function, mysql_real_escape_string, which really does something as simple as: any time it sees like a quote mark it makes it backslash quote mark, and a couple of other things. Very simple things so that you're not accidentally tricked into executing a delete statement or an update statement or something potentially dangerous. I don't have to bother scrubbing the captain input, because remember captain is a check box and so its value is going to be nothing at all or it's going to be quote unquote on, so I'm not even passing the user's input literally. I'm just checking implicitly: if the captain field has something in it go ahead and set captain to one, else set captain to zero. And then I'm just going to literally insert my own one or zero. Gender I do want to scrub because the user is submitting either M or F, and I don't want them potentially to submit something other than those. And then here, too, with mysql_real_escape_string I'm making sure that the dorm the user has submitted is also legitimate. Now, why is this even a concern? If we go to the web page, right, the only values I can choose from are those here in the drop down. But we saw very simple examples of this the other day. If I right click or control click on this, recall that there's this inspect element option thanks to a free plugin called Firebug.
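Escaping neutralizes special characters, but it does not by itself confirm that a submitted gender or dorm is one of the values the form offered. A complementary check, sketched below, compares the input against known lists. This is an assumption-laden illustration rather than the lecture's register code: the helper validate_registration, its return shape, and the abbreviated dorm list are all hypothetical.

```php
<?php
// Abbreviated list of dorms that the form is allowed to submit.
$DORMS = array('Apley Court', 'Canaday', 'Matthews');

// Hypothetical helper: returns cleaned values, or false if anything is missing or unexpected.
function validate_registration(array $post, array $dorms) {
    if (empty($post['name']) || empty($post['gender']) || empty($post['dorm'])) {
        return false;
    }
    // Gender must be exactly one of the two values the form offers.
    if (!in_array($post['gender'], array('M', 'F'), true)) {
        return false;
    }
    // Dorm must be one we put in the menu, not something edited into the page.
    if (!in_array($post['dorm'], $dorms, true)) {
        return false;
    }
    return array(
        'name'    => $post['name'],
        // A checkbox is either absent or "on"; store our own 1 or 0, never the raw input.
        'captain' => isset($post['captain']) ? 1 : 0,
        'gender'  => $post['gender'],
        'dorm'    => $post['dorm'],
    );
}

var_dump(validate_registration(
    array('name' => 'David', 'gender' => 'M', 'dorm' => 'University Hall'),
    $DORMS
));
```

With a check like this, a dorm typed into the page by hand is rejected before any query is built, while the name still needs escaping or a parameterized query because it is free text.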
This lets me do a bunch of things. For one thing it, remember, cleans up your HTML, makes it nicely indented and you can expand and collapse it, which just makes it user friendly to navigate and poke around. But you can literally change things in the web page, not permanently but on my own computer. So I could actually change Matthews to be something like University Hall is where I live, and we'll change this to be University Hall, and then I go back here and, voilà, I've changed the form. So now all I have to do is click submit, and even though the web server did not design for University Hall to be in this list, what's going to get submitted is University Hall. Now, this is kind of innocuous and silly, but what if I instead did something in my HTML like, well, not University Hall but maybe something like a quote mark and then delete from users, enter. Now the quote mark, let's get rid of the quote mark because I just broke my own HTML attack, so let's just simulate it with delete, okay? So this is an oversimplification. It's not sufficient just to send delete to the server, but it's this easy to actually change the web page. And, frankly, real hackers don't go using these free GUI tools and changing the HTML. You would actually write a program, a little PHP script like we did on Monday for doing spell checking in PHP, and you can simulate being a web browser. In fact, recall that in the survey from problem set five where we asked you guys to gripe about things that could be better on campus, one of the top contenders was this website here. Wait, let's go back to a different browser here, let's use Chrome, was this guy here. It seems they have a whole lot of information on this site when really all of us apparently only care about like what's on the menu, and even getting there sometimes takes multiple clicks. But there's a lot of news going on at HUDS right now. But in any case we have, here we go, case in point, this week's menu, hot entrees, okay.
So here is the menu, and unfortunately this is hosted I think by some third party product that maybe Harvard has paid for. This is all HTML, and this menu changes every day because presumably HUDs has their own database. And so we could, frankly, as humans just say, alright, well, I'm going to highlight and copy this, I'm going to paste this into my own database. Highlight and copy turkey noodle soup, paste this into my own database or your Excel spreadsheet. And, frankly, if you've ever done a research project or even something for a student group there's probably some very tedious process you've done at some point involving the web that could have ideally been automated. And so what we actually do for CS50 is if you go to manual.cs50.net/apis for application programming interfaces, you'll see that CS50 has a whole bunch of APIs. API we'll talk more about next week but, again, it's a way of interfacing your code and your programs with someone else's code or someone else's data. And so every year we get very common requests for how can I make something related to the course catalog or events on campus or food or maps or news or tweets and the like. So what we as a course have done is created an API via which you can query CS50's server and get information like what's on the dining hall menu today or tomorrow or next week or what was it a year ago today? What's the nutritional content? And so if you go to the HUD's the Harvard food API from CS50 and scroll down you'll see that there's a lot of detail on how to do this. But if you're familiar with Excel files you might also be familiar with CSV files, comma separated values files. These are just sort of simple spreadsheets. And so what you can do by visiting a certain URL that's there on the top right is you can provide a specific date. 
So in this case here, let me scroll down to the menu, not the nutritional one, so notice here that we have told folks that if you want to get the breakfast menu for date of March 21, 2011, you can literally visit this URL on CS50's server, food.cs50.net, and what you will get back effectively is a big CSV file, an Excel spreadsheet. And this is just what we're doing for problem set seven when we grab data from Yahoo Finance. You're getting data back like this. But unfortunately HUDs does not provide like Yahoo does a little download link. So what we as a course had to do was, well, we opened up this page and we looked at view page source, and we ignored all the distractions up at top, and we started looking for patterns. And once we found a pattern like this feels like patterns, let's actually use control f, so let's look for chipotle corn bisque, control f, alright there it is. So here is the HTML in HUD's website, and notice it's in a whole bunch of text, some of which we've seen, some of which we haven't. There's some div tags, some image tags and the like. But, my God, like this is how the data is actually presented on the internet. So long story short what we did as a course is we wrote something called a screen scraper. This is, frankly, a tool of last resort, but we have a program running on cs50.net that every day pretends to be a browser like Firefox, goes to HUD's website, downloads the HTML and it throws away the images and uninteresting stuff like that, and then we quote unquote parse all of this HTML. We look for TD tags, input tags, div tags, and then we look for patterns like, oh, this looks like a piece of food. And then we associate that back with the date and the meal and so forth. And long story short we scrape all of this data into our own database so that we can re-expose this. Now, what's the relevance then to what we just did with the simple form submission? Well, it's very, very easy to pretend to be a browser. 
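The parsing step of a screen scraper can be sketched as follows. Everything specific here is an assumption: the HTML string is a made-up stand-in for a downloaded menu page (the real markup differs), the class name item is invented, and scrape_items is a hypothetical helper. It uses PHP's DOMDocument to look for a pattern among the tags and ignore the rest.

```php
<?php
// Hypothetical snippet standing in for a downloaded menu page; the real markup differs.
$html = '<html><body>'
      . '<div class="item">Turkey Noodle Soup</div>'
      . '<img src="banner.png" />'
      . '<div class="item">Chipotle Corn Bisque</div>'
      . '<div class="footer">Contact us</div>'
      . '</body></html>';

// Hypothetical helper: collect the text of every <div class="item">.
function scrape_items($html) {
    $doc = new DOMDocument();
    // Suppress warnings, since real-world HTML is rarely well formed.
    @$doc->loadHTML($html);
    $items = array();
    foreach ($doc->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('class') === 'item') {
            $items[] = trim($div->textContent);
        }
    }
    return $items;
}

print_r(scrape_items($html));
```

In a real scraper the string would come from an HTTP request rather than a literal, and the pattern would have to be rediscovered whenever the site changes its markup, which is part of why this is a tool of last resort.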
All you have to do is understand HTTP, understand HTML and you can simulate all of this. In fact, for final projects every year we always have folks who want to grab like sports scores from ESPN.com or the like. So realize that in manual.cs50.net there's a long article on how you can write your own screen scraper to get most any data you want from the internet in order to do analyses. People have done this with Facebook friendships and so forth to do research projects and the like, but realize that it is now within your grasp. So, we've scrubbed our inputs for name, captain, gender, dorm. Now we have to construct a SQL query. So this query here is of this form: INSERT INTO table name, and then a comma separated list of fields that we want to insert into, then VALUES and then another comma separated list of the actual values we want to insert and then that's it. mysql_query, passing in that SQL code, and voilà, it's now in our database. So, who cares? What do we actually do when it's in our database? Well, think about what we could now do. We could whip up a little script for a proctor and we could say something like this. And I'll just do a cursory form of this. Let me go ahead and do registrants, oops, let's do open PHP mode, close PHP mode, registrants, actually let's just open this one. Oops, not that. Let's just open this one and voilà, it's already out of the oven. The program now will look a little something like this. So connect to database. Same exact thing up top. We connect with local host, jharvard and crimson to that specific database called week nine. Notice then with mysql_query I just create a variable called $sql. I store in it what's apparently another SQL query, and this one follows the form SELECT field names, so I could do name, captain, dorm, gender in any order, or a little more succinctly I can just say star, which is the wildcard operator, then FROM the table name.
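The INSERT form described above can be tried out without a MySQL server. The sketch below makes two substitutions that are not what the lecture's code does: it uses PDO with an in-memory SQLite database in place of the mysql_ functions, and it uses a prepared statement with placeholders instead of escaping each string by hand. The table and column names follow the lecture's registrants example; the column types are assumptions.

```php
<?php
// Assumption: PDO with an in-memory SQLite database stands in for the MySQL server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE registrants (name TEXT, captain INTEGER, gender TEXT, dorm TEXT)');

// INSERT INTO table (fields) VALUES (values).
// The ? placeholders keep user input out of the SQL text itself.
$stmt = $db->prepare(
    'INSERT INTO registrants (name, captain, gender, dorm) VALUES (?, ?, ?, ?)'
);
$stmt->execute(array('David', 0, 'M', 'Matthews'));

$count = (int) $db->query('SELECT COUNT(*) FROM registrants')->fetchColumn();
echo "rows: $count\n";
```

The order of the values must match the order of the field names in the first list, which is the same rule as in the hand-built query.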
I could write this in lower case here, but as a matter of style it's, I would say, easier to read when at least your special keywords are all in upper case. However, realize this FAQ: table names and field names can be case sensitive. So if you capitalized it in your database do it that way in your code. And then execute this query. But here is the common sticking point. When you execute this database query and you use the SELECT command, what do you actually get back? Well, inside of the database are these tables, right, like Excel worksheets. So what you're getting back is not some one person's name and gender and so forth. Rather you're getting back what we'll call a result set. Think of this as an array of rows from the database. So if we scroll down here and we actually want to write this page that's supposed to show the proctor who's in charge of frosh IMs who has registered, notice we actually have to ask this result set that we got back from mysql_query, which is up here, give me a row, give me a row, give me a row. And we can do that with a while loop. We can say while there is a row to give me, so this function mysql_fetch_array, when passed a so-called result set, a collection of all the rows in the database that match that SELECT query, go ahead and assign one at a time to a variable called $row, and then I can get at the individual fields in that table by doing $row["name"]. So based on this syntax what type of variable or what type of data structure is row at this point in the story? [ Inaudible Audience Answer ] >> Yeah, it's an array but more specifically an? It's an associative array, right? It's an array that can have not just numeric indices but words as its keys. And so what's this going to do? LI is list item, UL is unordered list.
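The row-at-a-time loop can be sketched like this. As an assumption for the sake of a runnable example, PDO with an in-memory SQLite database stands in for the lecture's mysql_query and mysql_fetch_array calls, and the two rows are inserted by the snippet itself. Each fetched row is an associative array keyed by field name, just as described above.

```php
<?php
// Assumption: PDO with in-memory SQLite replaces the MySQL connection.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE registrants (name TEXT, captain INTEGER, gender TEXT, dorm TEXT)');
$db->exec("INSERT INTO registrants VALUES ('David', 0, 'M', 'Matthews')");
$db->exec("INSERT INTO registrants VALUES ('Matt', 1, 'M', 'Apley Court')");

// The query returns a result set, not a single person.
$result = $db->query('SELECT * FROM registrants');

$names = array();
echo "<ul>\n";
// Ask the result set for one row at a time until there are none left.
while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
    $names[] = $row['name'];
    // These names came from users, so they are escaped on the way out.
    echo '<li>', htmlspecialchars($row['name']), "</li>\n";
}
echo "</ul>\n";
```

The fetch call returns false once the rows run out, which is what ends the while loop.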
So let's just jump to the aesthetic results here and go back to registrants.php, this is going to talk to that same database, let me go into registrants and, voila, David has registered. Well, it's not all that interesting right now. Let's go into register, let's say register8.php, let's actually register Matt this time as the team captain from, I don't remember where he lives so we'll say Apley Court, register, okay, Matt is apparently registered. Well, let's check. Let's go back to registrants.php and now we have Matt. And then we could have another person, and we can now see this. If I open up my little administrative tool called phpMyAdmin and I log in as jharvard and crimson, I then get to see this sort of web-based interface for all these tables, one of which is week nine registrants. And if I zoom here notice that despite all the messy words and icons there's David, there's Matt. And if I want to go around here and even modify things, say Matt actually wants to change his name, this is not something Matt himself should do. But you as the administrator could certainly change a row, which then has the effect of changing the actual table. So now we have the ability to store data as long as we want. And, frankly, not to sort of set the expectations too high, this is at the core of what even Facebook did early on. You have someone register, you ask for their name, you ask for their residence, you ask then for them to list off their friends and so forth. You can do all of that simply by creating these tables and storing that kind of information. And any time you have something conceptually different that you want to store, say user profiles and friends and likes and activities, you can have a different MySQL table doing all of that. And recall from problem set seven that we don't just store users' user names and the hashes of their passwords. What other field is also by default in P set seven associated with each user?
[ Inaudible Audience Answer ] >> What's that? So not just money but an ID, an ID specifically. So here's the users table, recall, that you get for P set seven. You have all these user names and these hashes, which are hashes of passwords, but we also gave everyone a unique ID. And if you read through the P set spec you'll see that this ID is auto-incrementing, which means I in code do not have to generate or figure out what the next number should be. The database will do that for me. So Facebook did the same thing. Some of you who have never signed up for nicknames for your URLs might have facebook.com/profile.php?id=12345 or some number much bigger. That's because all of us have unique Facebook IDs. And you can actually infer from those IDs who's been on Facebook the longest. The bigger your number is the later you signed up. So there are certainly folks, this is really sad that I know this, I am number 6,545. That was apparently the number at which I signed up. Mark, I think for some reason his ID equals 3, and then there are also some familiar names if you poke around the Facebook API where you can see everyone's IDs. But long story short, why have IDs for users when we already have user names which themselves are supposed to be unique? Caesar, Chartier, Guest, Jharvard, why have IDs at all? >> It's much easier to index ints. >> What do you mean index ints? >> Well, on the database you can find the past [inaudible] for example the order by number. >> Perfect, right. So it actually is for performance reasons. Caesar is kind of a short word, but it's one, two, three, four, five, six characters. Jharvard is slightly more. Rbowden is a few characters as well. That's a bunch of bytes. And certainly for longer user names like some you have for your @college.harvard.edu accounts, that suggests that you have to do string comparisons a lot, strcmp if you think back to the C function.
It feels like it should be much faster if you instead give everyone a unique number like an integer. Then you're using only four bytes or 32 bits, plus you can then sort things numerically, and you can create, as we've seen in class, more sophisticated data structures like linked lists and hash tables and the like that store those numbers. So, indeed, storing ID numbers that are ints, or something called big ints which are 64-bit integers, tends to be the best way of storing your data so that then if you want to list for, say, Caesar, all of his friends, you can have another table, and let's whip something up here. I'm going to go ahead and create a new table called friends alongside P set seven's tables, which is irrelevant to P set seven itself, but we'll use the same users. I'm going to give this two columns, enter. And the first column I'm going to call user A and over here I'm going to say user B. And if I give these guys both integer types, the idea here is that if I want to associate users with other users to form relationships, whether symmetric or asymmetric, all I have to do is say that Caesar, for instance, is friends with, who was number 2, well, whoever number 2 was. So if I want to say that Caesar and Matt are friends, all I have to store in this table is 1 and 2. And as you'll see before long, there are ways to then join these tables. So as long as you have one field in common like an ID field, you can actually figure out or look up users in one table whose IDs are already in another. And in fact even in CS50's own course shopping tool, we have this notion of what your friends are taking or shopping in the way of courses. This is literally what we do. When you sign into Harvard Courses using your Facebook login, one of the things Facebook gives us because we're using their API is a huge array of all of the IDs of all of your friends. And we can then use those IDs to look up your friends' names and profile pictures and their courses as well.
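A rough sketch of the users-plus-friends idea, again in Python with the standard-library `sqlite3` module rather than the MySQL tables built in lecture. The column names `user_a` and `user_b`, the sample usernames, and the helper `friends_of` are assumptions for illustration; the query covers only the one direction in which a friendship row was stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- the database picks the next ID
        username TEXT UNIQUE
    );
    CREATE TABLE friends (user_a INTEGER, user_b INTEGER);
    """
)

# No IDs are supplied here; they are assigned automatically as 1, 2, 3.
for username in ("caesar", "matt", "jharvard"):
    conn.execute("INSERT INTO users (username) VALUES (?)", (username,))

# caesar (id 1) is friends with matt (id 2): the row is just two integers.
conn.execute("INSERT INTO friends (user_a, user_b) VALUES (1, 2)")


def friends_of(username):
    """Join users to friends and back to users via the shared ID field."""
    rows = conn.execute(
        """
        SELECT f.username
        FROM users AS u
        JOIN friends ON friends.user_a = u.id
        JOIN users AS f ON f.id = friends.user_b
        WHERE u.username = ?
        """,
        (username,),
    )
    return [row[0] for row in rows]


print(friends_of("caesar"))
```

Because only the pair (1, 2) is stored, `friends_of("matt")` comes back empty; a symmetric friendship would need either a second row or a query that checks both columns.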
So if nothing else, if you're just curious as to what Facebook actually makes available to people, you can play around with their tutorials online and see just how much information is being shared and, frankly, how much data we even have on you just because you've logged into your Facebook account. So that little warning that freaks you out, or that you completely ignore and just okay these days, it's actually your consent to giving websites a whole bunch of data about you in machine-readable format, as we'll see quite soon. Any questions? Feels like a tough crowd today. Why don't we take our five minute break here. Alright, it feels like we definitely have a dead feel today. So I'll try to tell a scary story at the end that affects your privacy and security. Alright. [Laughter] Alright. Don't wake that person up in the green there today. Tell her I say hi. Alright. [Laughter] Okay, so we promised that there's this feature to store data on the server even after a user has gone from one web page to another and that icon has stopped spinning, right? We mentioned on Monday that HTTP is stateless, which just means that you don't maintain a persistent connection to the server usually. Rather you have to click on a link or submit a form to actually go from page to page. Now, there are some exceptions to this. Facebook itself right now actually uses a lot of something called AJAX, which means a lot of JavaScript code is constantly querying the server saying do I have an instant message, do I have an instant message, do I have a status update or the like. So not all websites do this, and most websites do not actually maintain a persistent connection to the server. But suppose we want to remember state anyway. Let me go ahead and open up a file called counter.php in gedit. And this is a very short program among this week's code that looks like this.
At the very top I've got some comments, and then I've got this new function that you may have seen in problem set seven but just taken for granted, called session_start. session_start is simply a function that PHP uses to tell the web server give me access to a special global variable called dollar sign, underscore, session. This is another associative array inside of which you can put anything, for instance, the contents of someone's shopping cart, their user ID to remember that they've logged in, any data that you want to persist from page load to page load. And, frankly, any website that has users logging in these days uses sessions so that, again, the icon can stop spinning and the connection can close, so this is partially for scalability's sake. If you go to Amazon.com, it might take a second or two to download the whole web page, but you the human might spend five seconds, a minute on that web page, and it would just be a waste of resources to have your browser constantly connected to the web server for all of that minute. So instead the browser disconnects, you see the content, and then you can click something or submit something to actually get more content. So session_start ensures that even if this user disconnects and, frankly, even if this user closes their laptop lid, walks across campus and goes back to their dorm and opens the laptop lid, then it's going to still remember who you are and that you're logged in. It's relatively rare that you need to re-log into Facebook and other sites because they're remembering your login, especially if you click that little check box that most sites have. So this line of code just means give me access to dollar sign, underscore, session. So what am I doing next? Well, we've seen the isset function before. It just says is this variable set, does it have a value? And I'm going to say if isset, dollar sign, underscore, session, quote unquote counter. So if this variable has been set, what do I want to do?
I want to grab its value, and for reasons we'll see in a moment I want to store it in a variable called dollar sign counter, all lower case. Else, if this variable is not set in the session, that is, we've never seen this user before, go ahead and initialize this counter variable to zero. Now, down here we have dollar sign, underscore, session, quote unquote counter, gets counter plus one. So this is a counter in the literal sense. We're doing plus one, plus one, plus one every time this code is executed, that is, every time this page is loaded. What does the page do? It's actually very simple. You have visited this site some number of times. I'm kind of regressing back to week two or three where I cut grammatical corners, but so be it. The point here is that I'm outputting the counter value. So let's open this up. Let me go ahead and open up Firefox. Let me go back to localhost in John Harvard's account and then open up counter and, okay, I've visited the site zero times. Let me zoom in. Let me go ahead and hit reload or control R one time, two times, three times, and this will very quickly get boring, but notice at the very top of Firefox it is connecting to the server and then disconnecting, so it's apparently remembering somehow that I've been here before. Now, this is a simple example, but in real websites like Facebook and the like, it just remembers that you've logged in. That user 6,545 is logged in so they don't have to pester me on every page, give me your password, give me your password. Or, in the case of Amazon, so that they don't forget as you go from page to page what's already in your shopping cart. So you could put product IDs, not user IDs, in a shopping cart as well. So let's see how this actually works.
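The counter logic just walked through can be mirrored in a few lines. This is a Python sketch, not the contents of counter.php: a plain dictionary stands in for PHP's session array, and the function name `handle_request` and the exact wording of the message are assumptions for illustration.

```python
def handle_request(session):
    """Simulate one load of the counter page; `session` persists between loads."""
    # If the session already holds a counter, grab it; otherwise start at zero.
    if "counter" in session:
        counter = session["counter"]
    else:
        counter = 0

    # Remember the incremented value for the next page load.
    session["counter"] = counter + 1

    # Report the value as it stood before this visit was counted.
    return f"You have visited this site {counter} time(s)."


session = {}  # stands in for the per-user session data kept by the server
for _ in range(3):
    print(handle_request(session))
```

Copying the value into a local `counter` before incrementing is what makes the first visit report zero while the stored value is already one, which matches the behavior seen in the browser.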
Let me go ahead and open up, let me first go ahead and clear my cache, and so this, too, should be a good habit to get into when doing anything web related when writing software, to clear your browser's cache constantly just so that if you already changed some code on the server but the browser didn't realize for efficiency reasons, this way you're telling it to re-download the code. So I'm going to go ahead and clear now. I'm going to go ahead and open up Firefox anew. And I'm going to visit this URL, but before I hit enter I'm going to click this guy up here. So pre-installed in the appliance is that Firebug tool. If I click this it's going to open here at the bottom, and what I'm going to do is scroll not to the HTML part this time but to this tab, the Net tab. So by default it's off for performance. I'm going to click enable the Net tab, and now I just get another tab whose purpose in life is going to be to sniff all of my HTTP traffic, very similar to what that Live HTTP Headers plug-in did a while ago for us. So here we go. I'm going to go ahead and hit enter and, voila, apparently I've only visited one web page. It's called counter.php. It used HTTP GET so there's no POST involved. It was on this server called localhost. The file that came back was 140 bytes. The IP address that it's on is apparently 127.0.0.1. That is the numeric synonym for quote, unquote localhost, and port 80 means web server. That's all. So we're seeing all these fundamentals here. So let me go ahead and expand this thing, and we'll see exactly what was sent from client to server. So let me scroll down here, the request headers. So in addition to sending that GET request, it included, my browser, all this information, the name of the server it's contacting, the user agent which is the browser, so this cryptic string identifies this version of Firefox on this version of Fedora Linux.
Then there's some slightly uninteresting stuff about what the browser supports, and it also mentions here, oh, I accept gzip, so one way that web browsers and servers save time is they compress information, HTML, on the fly automatically, so this is the browser's way of saying, hey, I can support compression if you want to compress the responses you're going to send me. So what did the server send back? Well, here are the server's headers. The server announces the date and time in Greenwich Mean Time here. It mentions what server software we're running. It mentions the version of the server software we're running, which is actually a potential security hole, but in the appliance we leave everything on for debugging purposes. This is also a potential security concern: the server is also very freely saying, by the way, I have PHP installed. Moreover I have PHP 5.3.8 installed. Why is this probably not the best practice for a server in general to do this security wise? [ Inaudible Audience Comment ] >> So it's saying you could inject code potentially. So, one, you're obviously revealing that you're running PHP as opposed to other languages, and it's not always obvious from the URL. Two, suppose that there's some bug discovered in PHP's interpreter, and so some big announcement goes out on the internet on various email security lists and says, hey, everyone beware, PHP 5.3.8 is buggy. There's this security hole in it, here's how people can take advantage of it, so be sure to update. Well, the whole world is not going to update instantaneously. So all the bad guys have to do now is troll around on the internet looking for IP addresses of web servers that say, hey, I'm running that buggy version of PHP, here is my proclamation thereof. So bad practice in production servers, but for development purposes on an appliance it's okay in this case. But here's the magic. Set cookie.
So you might generally know that cookies are some kinds of files or information planted by web servers on your browser. How is that done? Literally as simple as this. When you request any web page from a server, a response comes back that includes all of this juicy information, date and time, server name and so forth, but also potentially an HTTP header that literally says set cookie, and then it gives the cookie a name and a value and then potentially some other details. So what's really happened here is that PHP, thanks to the session_start function, has automatically sent a cookie to my browser called PHPSESSID, which is just the convention, and then a big crazy value like this. And this is essentially a pseudorandom string that's ideally supposed to be unique, associated with and given only to me. So henceforth the connection closes, and there is no spinning icon or anything. I've visited the website zero times, but notice if I reload this page now, and let me collapse this back to just one line, if I reload this page it does indeed say at top left I've now visited one time, but let's now look at this request. In this request, in the request header, notice what my browser has perhaps presumptuously sent to the server. It's sending not a set cookie header, just a cookie header. And so my browser is saying, hey, by the way, the last time I visited you, you gave me a cookie called PHPSESSID and here is the value that it was equal to. So you can think of this as a handstamp at like an amusement park or a club where they stamp your hand to indicate that you've paid or that you're 21 plus. And if you're ever asked this question again you don't have to take out your ID or your ticket, you instead just show your handstamp. And this is really what's going on with cookies. You're saying I've been here before, I've been here before.
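The header exchange described above can be sketched with Python's standard-library `http.cookies` module. This only illustrates the shape of the two headers; it is not how PHP generates its session ID, and the use of `secrets.token_hex` and the variable names are assumptions for the example.

```python
import secrets
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for a first-time visitor.
session_id = secrets.token_hex(16)  # hard-to-guess value, assumed format
outgoing = SimpleCookie()
outgoing["PHPSESSID"] = session_id
outgoing["PHPSESSID"]["path"] = "/"
set_cookie_header = outgoing.output()  # "Set-Cookie: PHPSESSID=...; Path=/"

# Browser side: later requests send only name=value back in a Cookie header.
cookie_header = f"Cookie: PHPSESSID={session_id}"

# Server side again: parse the Cookie header to recover the ID.
incoming = SimpleCookie()
incoming.load(cookie_header.split(": ", 1)[1])
recovered = incoming["PHPSESSID"].value

print(set_cookie_header)
print(cookie_header)
print(recovered == session_id)
```

The asymmetry is the point: the server sends `Set-Cookie` once, and the browser then volunteers `Cookie` on each subsequent request to that site.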
And what's stamped on your hand is this really big number, because what the web server then does is it stores in its own database that big number and associates with that big number the contents of dollar sign, underscore, session. So that big associative array in which you can put user IDs, friendship IDs, shopping cart contents, it's stored somehow on the server, and it's associated with that big number so that the browser and server in the future assume that anyone who presents this number must be the guy that I gave this number to in the first place so, voila, let me let you pass. So it works really nicely, and it doesn't require that you maintain a constant connection to the server. It doesn't require that you physically remain in the amusement park or club. You can get back in just by showing this handstamp. So where is the opportunity for bad guys now? How do you exploit this very useful HTTP feature? What can you do? [ Inaudible Audience Comment ] >> So getting the cookie, right? So this cookie is being sent back and forth from server to client, but it's being sent over HTTP, specifically over port 80. And 80 is generally not encrypted, 443 or URLs that start with HTTPS are encrypted. So what this literally means is that if my browser is requesting this web page and then getting a response, and the server is not in the appliance, on the same physical computer, but it's a normal server on the internet, Facebook.com, Amazon.com, what this means is that that server is replying and saying here is your big secret number, send this back to us every time you revisit. But if you're using HTTP you're literally showing your hand to everyone on the internet saying here's my big secret number, right? It's not really a secret. Now, there are solutions to this, namely HTTPS which means if you know just casually as users it encrypts everything. And that also encrypts your handstamp and all of these cookies that are involved. 
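A minimal sketch of the server-side bookkeeping described above, and of why a sniffed cookie is enough to impersonate someone. An in-memory Python dictionary stands in for wherever the server actually keeps session data; `SESSIONS`, `start_session`, and the ID format are hypothetical names and choices made for illustration.

```python
import secrets

SESSIONS = {}  # session ID -> that visitor's session data, kept on the server


def start_session(cookie_value=None):
    """Return (session_id, session_data), creating a new session if needed."""
    if cookie_value in SESSIONS:
        # Known handstamp: hand back whatever was stored for it.
        return cookie_value, SESSIONS[cookie_value]
    # Unknown or missing cookie: issue a fresh ID with empty data.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {}
    return session_id, SESSIONS[session_id]


# First visit: no cookie yet, so the server issues an ID and records a login.
sid, session = start_session()
session["user_id"] = 6545

# Later visit from the same browser: presenting the ID restores the session.
_, same = start_session(sid)

# An eavesdropper who sniffed the ID presents it too; the server sees no difference.
_, hijacked = start_session(sid)

print(same["user_id"], hijacked["user_id"])
```

Nothing in `start_session` checks who is presenting the ID, which is why the ID has to stay secret in transit; encrypting the connection protects it on the wire but does not change this lookup.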
But most websites, or at least many, do not encrypt traffic by default. In fact, it was only a few months ago that Facebook itself started offering across the board this ability to use HTTPS for all of your connections. And so, as I think I mentioned a week or two ago, it was perfect timing: in week eight of CS50 in fall 2010 this researcher released a tool, a plug-in for Firefox called Firesheep. And this tool simply automated the process of looking around the room with WiFi capabilities and saying who is showing their handstamp at this point in time? And the fellow special-cased websites, popular ones like Facebook and Gmail and Twitter and a bunch of others. So he was looking for patterns like Gmail.com, Facebook.com, and any time he saw in the room someone with a handstamp for or from that domain name, he would then listen to it and store it in the program's memory and then display it to the user. And so what you see when using a tool like this is a little something like this. So here is a screenshot. Things have gotten a little more locked down now both in terms of WiFi and also in terms of this tool working. But what you would see is you load up Firefox, and we actually did this in class if you want to watch last year's video, you click the start capturing button, and then what you see within a few seconds are all of the unsuspecting people who are logged into these various websites. So these are actual screenshots from the fellow's presentation where he logged into one of his buddy's accounts. But rewind about 12 months to CS50, this was great fun. We had, like, a list of 33 CS50 students who were using, like, Facebook at that moment in time. And what this tool allows you to do is, with a single double click, log into that person's Facebook account as them. So how does this actually work? It's actually really simple, right?
If all of the internet trusts that you'll just present this handstamp, this cookie, to prove I'm already logged in, you don't need to ask me to log in again, well, anyone who can sniff that cookie then can present him or herself as that person. And they don't know your Facebook password or your Gmail password, but that doesn't matter because they're already into your account. So double click that name and what you would see in the browser is, voila, Ian Gallagher's account, or in last year's case one of our TF's accounts, at which point I could proceed and post on her wall or poke people or do anything because Facebook is not going to re-authenticate me. It's going to assume, hey, you have the cookie and it must, in fact, be you. So it seems that this wonderfully useful system that, like, every website on the internet uses today is fundamentally flawed. In fact, this is still possible. I think, unless Facebook's made it the default, you have to go into account at the top right and tinker around with the security settings and say always use secure connections. Gmail started doing this a few months back as a result of some of the hacking incidents that they had. But most websites don't do this, partly for performance reasons, partly out of naivety, whereby the consumers just aren't demanding this or they're just not cognizant of this. In fact, the only websites that really tend to enforce SSL, the HTTPS-type sites, all the time are now Facebook and Gmail and banks and cs50.net, since we got bitten, too, shortly after that presentation. [Laughter] So what's the takeaway here? Well, how do you solve this problem, right? Like we introduced this last year, albeit at the risk of teaching 500 students how to then hack into their roommates' computers that day, for good purposes. How do you defend against this? Because it is still possible.
And if you're sitting somewhere on campus, if you're sitting in Starbucks, an airport or even your own home with siblings, you are vulnerable to interceptions of data, especially if your WiFi connection is not secure. If you don't see that little padlock icon or have to type a password to get onto the WiFi network you're particularly vulnerable. So what can you do? Well, websites unfortunately have to do most of the solving for us. But at least at home, back home home, if you control your wireless network you can at least turn on that padlock icon. It's not just for the sake of keeping random people outside your house from using your WiFi for free. It genuinely is a security concern. They could not just log into your Facebook account but see all of the traffic you're sending, right? If you haven't realized, most of the instant messages you send are typically not encrypted unless you're using Gmail these days over SSL. So all of those IMs you're sending friends, all of those emails you're sending friends, are still going out on the internet in the clear. Even with Gmail, if you send from your college.harvard.edu account to your personal Gmail account, even if you're using SSL, the moment you hit send, if you're emailing an outsider on the internet who is not using Gmail's servers, bam, that email is out there. And anyone in theory between points A and B can see all of your traffic. So WPA2 refers to an encryption protocol that you can use on your own home wireless network. Using HTTPS is probably the most resilient approach to protecting your Facebook account. But that's only because they now support this. A lot of websites again assume or face the reality that turning on SSL might just be expensive computationally. And this isn't always the case. It depends on the hardware and software you have. But in theory if you need to not just send data but encrypt the data and then send that data, just intuitively that's going to take some CPU cycles.
And even if it's just a few cycles, you only have a finite number. So that means if you need to do more work per user, well, you're going to have to have more servers to sustain the same number of users potentially. But there's also these other tools so that you're well equipped. Force-TLS is a plugin for Firefox if that's your browser of choice that will try to force all of your connections to SSL if the website actually supports it. Another one from the EFF it's called HTTPS Everywhere which does something quite similar as well. So with these mechanisms and honestly just a bit of savvy you can protect yourself. But another very robust mechanism that assumes you have access to something special is that of a VPN. So Harvard has something called a Virtual Private Network. A lot of companies have this. You can even set one up in your home. So henceforth if you're ever particularly worried about doing something sensitive like checking mail or financial data in a public space, whether it's at the Harvard University SSID or if it's in Starbucks or the like, even if that wireless access point does not offer encryption, you can connect to VPN.FAS.Harvard.edu. You'll get redirected to an SSL connection. You can then log into Harvard's VPN. Your computer will then be given an IP address on Harvard's network, not on Starbucks network or the like, and henceforth all of your traffic will be encrypted between you and Harvard. After that who knows where it's going to go. And, frankly, after that Harvard knows everything you're doing, so just realize when you connect to Harvard now all of their servers have access to your data. But if you're at least trying to prevent the sketchy guy next to you in Starbucks from looking over your shoulder virtually and sniffing your passwords and poking your friends, you can at least secure yourself with any of these particular mechanisms. So I promised a sketchy security oriented ending. 
Why don't we go ahead and end on that note today early, and I'll take one-on-one questions up here. See you next week.