1 00:00:10,046 --> 00:00:11,416 >> Alright, welcome back. 2 00:00:11,656 --> 00:00:15,346 This is CS50 and this is the end of week nine. 3 00:00:15,346 --> 00:00:17,366 So just a couple of FYIs. 4 00:00:17,366 --> 00:00:19,906 This Friday we'll resume the tradition of lunch 5 00:00:19,906 --> 00:00:24,556 if you are available, 1:15 p.m. Go to cs50.net/rsvp. 6 00:00:24,556 --> 00:00:28,846 This week we'll be joined by Eugene Chung of NEA, a VC firm, 7 00:00:28,846 --> 00:00:30,986 as well as by Andrew McCollum, one of the co-founders 8 00:00:30,986 --> 00:00:33,896 of Facebook, so if you'd like to eat as well as strike up chats 9 00:00:33,896 --> 00:00:37,236 with these kind folks and CS50 staff, do rsvp. 10 00:00:37,976 --> 00:00:40,636 Seminars, too, so besides having these videos 11 00:00:40,636 --> 00:00:43,966 and these upcoming in-person seminars, know that in addition 12 00:00:43,966 --> 00:00:46,476 to the BlackBerry's that RIM has kindly made available 13 00:00:46,476 --> 00:00:49,836 for a small number of us to do final projects with, 14 00:00:49,836 --> 00:00:51,886 Microsoft has also now contributed the same, 15 00:00:51,886 --> 00:00:54,516 a few Windows phones, so we will post later today 16 00:00:54,516 --> 00:00:57,476 on the course's home page a link via you can register an interest 17 00:00:57,706 --> 00:01:00,116 in doing if you would like a mobile project, and we will see 18 00:01:00,116 --> 00:01:01,826 if we equip you with some hardware. 19 00:01:02,536 --> 00:01:04,566 So the rumors are true. 20 00:01:04,806 --> 00:01:08,186 A certain someone will be coming to campus on Monday. 21 00:01:08,816 --> 00:01:14,666 So as to his name it's Mark Zuckerberg. 22 00:01:16,196 --> 00:01:18,996 He's with a company called The Facebook. 23 00:01:19,466 --> 00:01:21,316 This is a social networking site. 24 00:01:21,316 --> 00:01:22,556 If you're on Myspace go 25 00:01:22,556 --> 00:01:24,706 to Facebook.com to try out this one. 
26 00:01:25,246 --> 00:01:27,866 In case there's been any misinformation, 27 00:01:27,866 --> 00:01:30,556 realize that the panel session which will be Friday, ah, 28 00:01:30,556 --> 00:01:33,506 Monday evening is open to all the undergraduates, 29 00:01:33,546 --> 00:01:36,256 but space is going to be limited to only about 200 people. 30 00:01:36,256 --> 00:01:39,656 And so OCS has a URL that you can visit if you would 31 00:01:39,656 --> 00:01:43,066 like to apply by submitting your resume as well as a question 32 00:01:43,066 --> 00:01:44,876 for Mark, and then they 33 00:01:44,876 --> 00:01:47,656 and Facebook will decide exactly how the ticketing is done. 34 00:01:47,656 --> 00:01:49,506 So I really don't want 35 00:01:49,506 --> 00:01:53,026 to see 614 disappointed faces during CS50 on Monday. 36 00:01:53,256 --> 00:01:56,816 Mark will be at MIT during CS50 on Monday, not here. 37 00:01:57,096 --> 00:01:59,036 So you can keep doing your thing, 38 00:01:59,036 --> 00:02:01,306 and if you're watching this right now in your dorm room, 39 00:02:01,526 --> 00:02:03,456 well, you can keep doing the same on Monday as well, 40 00:02:03,586 --> 00:02:06,486 but in the evening will be this special event. 41 00:02:06,486 --> 00:02:09,336 And you can go to this URL and type in that keyword 42 00:02:09,336 --> 00:02:11,806 to actually apply for this event. 43 00:02:12,396 --> 00:02:13,856 So, it should be fun. 44 00:02:14,416 --> 00:02:16,796 Alright. So imagine my surprise 45 00:02:16,866 --> 00:02:19,886 when in reviewing the week's videos and looking 46 00:02:19,886 --> 00:02:21,446 at our curriculum I came 47 00:02:21,446 --> 00:02:23,516 across this little nugget of our section. 48 00:02:24,156 --> 00:02:25,776 You may know the character in this video. 
49 00:02:25,776 --> 00:02:28,396 We film one of our sections each week for our distance students 50 00:02:28,396 --> 00:02:29,496 and our sleepier students, 51 00:02:29,496 --> 00:02:32,866 and this here is what we saw on Halloween Monday. 52 00:02:33,516 --> 00:02:36,516 [ Pause ] 53 00:02:37,016 --> 00:02:37,083 . 54 00:02:37,083 --> 00:02:38,426 >> [Video Voice] So you might be asking yourself what's 55 00:02:38,776 --> 00:02:39,776 with the giant pumpkin? 56 00:02:39,776 --> 00:02:43,276 Well, of course tonight is Halloween. 57 00:02:43,276 --> 00:02:46,806 Um, this year I decided to wear a giant pumpkin again 58 00:02:46,896 --> 00:02:51,516 to overcome my self-consciousness 59 00:02:51,516 --> 00:02:52,416 in this costume [Laughter]. 60 00:02:52,586 --> 00:02:58,266 >> So he goes on and tells like this really sad two-minute story 61 00:02:58,266 --> 00:03:00,566 about how he showed up to a Halloween party when he was 62 00:03:00,566 --> 00:03:03,676 like ten years old dressed as a pumpkin and was the only one dressed 63 00:03:03,676 --> 00:03:05,246 up at this party as anything. 64 00:03:05,566 --> 00:03:07,886 So it's a very sweet tale. 65 00:03:07,886 --> 00:03:11,126 But the funny thing was just fast forwarding 66 00:03:11,126 --> 00:03:13,006 through random points of Monday's section 67 00:03:13,006 --> 00:03:14,966 and just seeing pumpkin, pumpkin, like, teaching 68 00:03:14,966 --> 00:03:18,596 about the POST submissions and thinking about web pages. 69 00:03:18,596 --> 00:03:19,906 It's actually kind of surreal. 70 00:03:20,136 --> 00:03:21,836 But let me play at least the end of this clip 71 00:03:21,836 --> 00:03:23,126 so Jason gets closure. 
72 00:03:23,186 --> 00:03:25,936 >> [Video Voice] So we hope it stays inflated 73 00:03:25,936 --> 00:03:26,916 for the full 90 minutes 74 00:03:26,916 --> 00:03:28,546 and it doesn't affect the audio too much, 75 00:03:28,586 --> 00:03:30,906 and it doesn't distract you, and more importantly 76 00:03:30,906 --> 00:03:32,376 that it doesn't distract you from the material. 77 00:03:32,416 --> 00:03:34,186 But if does I can't do anything about it 78 00:03:34,186 --> 00:03:36,196 because I'm not wearing anything under here 79 00:03:36,196 --> 00:03:37,666 so you'll just have to go with it. 80 00:03:38,516 --> 00:03:42,616 [ Laughter and Applause ] 81 00:03:43,116 --> 00:03:46,846 >> So this is CS50. 82 00:03:47,586 --> 00:03:52,206 So, without further ado today's goal is to equip us 83 00:03:52,206 --> 00:03:54,556 with some more of the fundamental concepts 84 00:03:54,556 --> 00:03:56,526 with which you can start implementing more 85 00:03:56,526 --> 00:03:57,806 and more dynamic websites. 86 00:03:57,806 --> 00:04:00,816 We are well beyond the point of HTML and CSS now, 87 00:04:00,966 --> 00:04:02,546 and we've begun looking, you'll recall, at PHP, 88 00:04:02,546 --> 00:04:05,816 this actual programming language as well as briefly 89 00:04:05,816 --> 00:04:09,216 on Monday this database language called SQL. 90 00:04:09,536 --> 00:04:12,326 So realize that conceptually we've been throwing a lot 91 00:04:12,326 --> 00:04:13,386 at you all at once. 92 00:04:13,386 --> 00:04:16,836 And they're all related but they are all nonetheless autonomous 93 00:04:17,086 --> 00:04:20,646 languages and autonomous technologies that just happen 94 00:04:20,646 --> 00:04:22,786 to be co-mingled a whole lot together. 95 00:04:23,016 --> 00:04:26,326 And so MySQL you'll recall is a very popular database 96 00:04:26,326 --> 00:04:28,856 in which you can store data in table form, 97 00:04:28,856 --> 00:04:29,856 in row form and the like. 
98 00:04:30,116 --> 00:04:31,606 Problem set seven you'll find 99 00:04:31,606 --> 00:04:34,156 or are finding hopefully really walks you through the process 100 00:04:34,156 --> 00:04:37,056 of using some database tables and the like, and you'll find 101 00:04:37,056 --> 00:04:38,596 that it's a very good stepping stone 102 00:04:38,596 --> 00:04:40,156 for any web-based final project. 103 00:04:40,386 --> 00:04:41,836 But there's even more that we can do. 104 00:04:41,836 --> 00:04:45,856 So let me turn us back to our copy of some 105 00:04:45,856 --> 00:04:47,706 of the frosh IMs code from the other day. 106 00:04:47,706 --> 00:04:49,506 I'm going to go ahead here and open up gedit, 107 00:04:49,506 --> 00:04:53,576 I'm going to go ahead and open up froshims5 which was one 108 00:04:53,576 --> 00:04:55,286 of the last ones we looked at. 109 00:04:55,286 --> 00:04:59,096 And recall that this had at least one nice feature whereby 110 00:04:59,646 --> 00:05:02,446 in addition to checking did the user give me their name 111 00:05:02,446 --> 00:05:03,666 and their gender and their dorm? 112 00:05:03,726 --> 00:05:07,246 Also, later on if they didn't I was at least kind enough 113 00:05:07,246 --> 00:05:09,026 to not only yell at them with big red text 114 00:05:09,676 --> 00:05:12,526 if there's an error, go ahead and display this div tag 115 00:05:12,526 --> 00:05:14,826 in red color, while we also later 116 00:05:14,826 --> 00:05:17,066 on recall added this little tidbit 117 00:05:17,066 --> 00:05:18,406 over on the right hand side. 118 00:05:18,406 --> 00:05:21,046 These form elements can have value attributes 119 00:05:21,046 --> 00:05:24,286 that can themselves have some string inside of quotes. 120 00:05:24,466 --> 00:05:27,736 And so here we had post bracket quote unquote name, 121 00:05:27,976 --> 00:05:30,086 so that at least if I got my name right I don't need 122 00:05:30,086 --> 00:05:31,336 to redo this. 
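The pre-populated name field described above can be sketched roughly as follows. This is only an illustration, not the lecture's froshims5 source: the helper name_input() and the exact markup are assumptions, and the submitted values are passed in as a parameter (rather than read from $_POST directly) so the snippet can run on its own.

```php
<?php
// Hypothetical helper (not from the lecture's source): builds a text input
// whose value attribute is pre-filled with whatever was submitted last time.
function name_input($post)
{
    // Fall back to an empty string if nothing has been submitted yet.
    $name = isset($post['name']) ? $post['name'] : '';

    // Escape the value before echoing user input back into the page.
    $safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

    return '<input name="name" type="text" value="' . $safe . '">';
}

// In a real page this would be name_input($_POST).
echo name_input(array('name' => 'David')), "\n";
```

If the form comes back with an error, the user sees the name they already typed instead of an empty box.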
123 00:05:31,336 --> 00:05:33,996 And this might seem like a simple thing, but just imagine 124 00:05:33,996 --> 00:05:37,856 or take notice in weeks to come just how many websites don't do 125 00:05:38,056 --> 00:05:40,066 these very simple user conveniences. 126 00:05:40,066 --> 00:05:42,886 A very underappreciated feature of programming 127 00:05:42,886 --> 00:05:45,206 and web development is user interface design. 128 00:05:45,206 --> 00:05:47,876 And, frankly, one of the reasons that so many of us are enamored 129 00:05:47,876 --> 00:05:50,676 with things like Android phones and iPhones and the like is 130 00:05:50,676 --> 00:05:53,176 because some companies do actually get this. 131 00:05:53,386 --> 00:05:56,236 And so keep these sorts of things in mind 132 00:05:56,236 --> 00:05:57,486 as you design your own project. 133 00:05:57,816 --> 00:06:00,756 But htmlspecialchars, long though a function name it is, 134 00:06:00,866 --> 00:06:02,706 actually does have some compelling use. 135 00:06:02,706 --> 00:06:05,526 Why did we wrap post bracket name 136 00:06:05,816 --> 00:06:08,476 with this function call, htmlspecialchars? 137 00:06:09,126 --> 00:06:09,976 What did it do for us? 138 00:06:10,516 --> 00:06:12,546 [ Pause ] 139 00:06:13,046 --> 00:06:13,706 Anyone at all? 140 00:06:13,906 --> 00:06:13,973 Yeah? 141 00:06:14,516 --> 00:06:17,036 [ Inaudible Audience Answer ] 142 00:06:17,536 --> 00:06:18,916 >> Yeah, exactly. 143 00:06:19,296 --> 00:06:22,426 So it's to ensure that users can't inject arbitrary code. 144 00:06:22,426 --> 00:06:25,666 In this case they can't type like some HTML tags or, 145 00:06:25,666 --> 00:06:27,676 as we'll soon see, some JavaScript code 146 00:06:27,676 --> 00:06:30,236 that might then accidentally get executed on the browser. 147 00:06:30,406 --> 00:06:32,746 And now, recall the simple example I did by outputting 148 00:06:32,746 --> 00:06:35,406 like a bold tag, open bracket B closed bracket. 
149 00:06:35,406 --> 00:06:37,836 It was kind of stupid in that the only one I'm really messing 150 00:06:37,836 --> 00:06:40,386 with on Monday's example was myself, right? 151 00:06:40,386 --> 00:06:44,256 All I did was make the web page look bad and broken to myself. 152 00:06:44,306 --> 00:06:46,486 But recall these things called phishing attacks 153 00:06:46,486 --> 00:06:49,116 and the spams we all get daily where there's links linking 154 00:06:49,116 --> 00:06:53,296 to websites, and recall, too, that forms can be submitted both 155 00:06:53,296 --> 00:06:55,256 by post as well as by GET. 156 00:06:55,256 --> 00:06:59,006 And if you submit a form by a GET that just means every piece 157 00:06:59,006 --> 00:07:00,556 of information is inside of the URL. 158 00:07:01,066 --> 00:07:04,226 So if you can store form submissions inside of a URL 159 00:07:04,376 --> 00:07:07,206 and you can put URLs in emails like spam, well, 160 00:07:07,206 --> 00:07:10,026 you can trick people effectively into submitting forms. 161 00:07:10,026 --> 00:07:12,566 And one of the things you'll be able to compromise potentially 162 00:07:12,836 --> 00:07:15,376 if you don't distrust all user input 163 00:07:15,376 --> 00:07:18,056 in this fashion is peoples' cookies can be stolen, 164 00:07:18,056 --> 00:07:18,856 more on those today. 165 00:07:18,856 --> 00:07:21,086 And what this really means, it means if you're logged 166 00:07:21,086 --> 00:07:23,146 into something sensitive like your Facebook account 167 00:07:23,146 --> 00:07:26,076 or your bank account or the like, potentially some bad guy 168 00:07:26,076 --> 00:07:28,986 who has duped you into clicking some phishing email can log 169 00:07:28,986 --> 00:07:30,706 into your account and then do whatever he 170 00:07:30,706 --> 00:07:33,086 or she wants with your own access. 171 00:07:33,256 --> 00:07:36,826 So this is good practice and necessary practice these days. 
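To make the risk above concrete, here is a small sketch of what escaping changes. The submitted string, the example.com URL, and the variable names are all made up for illustration; only htmlspecialchars and http_build_query are real PHP functions.

```php
<?php
// Input a malicious user might submit in place of a name: markup plus script.
$submitted = '<b>David</b><script>alert(document.cookie)</script>';

// Unescaped, a browser would treat this as real tags and run the script.
$unsafe = 'Hello, ' . $submitted;

// Escaped, the special characters become entities and show up as plain text.
$safe = 'Hello, ' . htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');

// With a GET form every field travels in the URL, so a link in an email
// can carry a whole "form submission" (hypothetical URL).
$link = 'http://example.com/froshims.php?' . http_build_query(array('name' => $submitted));

echo $safe, "\n";
echo $link, "\n";
```

The escaped version is what should be echoed back into a page; the unescaped one is what an attacker is hoping for.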
172 00:07:37,076 --> 00:07:38,936 But this form was not all that user friendly. 173 00:07:38,936 --> 00:07:41,566 If I go ahead an open up Firefox and I go 174 00:07:41,566 --> 00:07:45,206 to local host slash tilde jharvard froshims5, 175 00:07:45,406 --> 00:07:47,826 we see this form here and I can start to register. 176 00:07:47,826 --> 00:07:50,126 And let me give at least a dorm and click register, 177 00:07:50,586 --> 00:07:52,956 but it was only a partial improvement in terms 178 00:07:52,956 --> 00:07:54,946 of UI, user interface. 179 00:07:54,946 --> 00:07:57,446 Notice I did not pre-populate dorm this time, 180 00:07:57,776 --> 00:07:59,286 and that's kind of annoying, right? 181 00:07:59,286 --> 00:08:00,376 How do we actually do this? 182 00:08:00,376 --> 00:08:01,466 Well, unfortunately, we have to do it 183 00:08:01,466 --> 00:08:02,646 in a slightly different way 184 00:08:02,946 --> 00:08:07,066 because the dorm field recall was implemented not 185 00:08:07,066 --> 00:08:10,126 with an input tag but with something called a select tag. 186 00:08:10,126 --> 00:08:13,056 So let me zoom in on the HTML for dorm here, 187 00:08:13,166 --> 00:08:15,276 and notice that drop down menus are a little different. 188 00:08:15,276 --> 00:08:17,936 Open bracket select, you give the parameter a name, 189 00:08:17,936 --> 00:08:20,476 and then you have a whole bunch of options and values down, 190 00:08:20,476 --> 00:08:23,196 down, down, down the list, and there's a dichotomy here recall. 191 00:08:23,196 --> 00:08:24,686 There's the value of the option 192 00:08:24,686 --> 00:08:26,666 and then also what the human sees. 193 00:08:26,666 --> 00:08:28,806 For simplicity I did exactly the same thing. 
194 00:08:28,806 --> 00:08:30,736 I put value equal to what the user sees, 195 00:08:30,956 --> 00:08:32,316 but you could have something different 196 00:08:32,316 --> 00:08:33,856 if you wanted more descriptive text 197 00:08:34,126 --> 00:08:36,456 or in quotes maybe even less descriptive text. 198 00:08:37,006 --> 00:08:39,516 But notice that there's no mention of selection here. 199 00:08:39,516 --> 00:08:40,826 There's no mention of value. 200 00:08:41,046 --> 00:08:43,416 So, of course, when I reload this page or spit 201 00:08:43,416 --> 00:08:45,346 out the HTML again it's going to forget 202 00:08:45,346 --> 00:08:47,066 that Matthews was selected. 203 00:08:47,306 --> 00:08:50,196 Somehow what we need to do here is if I scroll 204 00:08:50,196 --> 00:08:53,716 down to Matthews I need to somehow output something 205 00:08:53,716 --> 00:08:56,386 like this, not just option value equals Matthews 206 00:08:56,596 --> 00:09:01,576 but option selected equals selected value equals Matthews. 207 00:09:01,616 --> 00:09:04,246 Now, if you're thinking this looks stupid this does 208 00:09:04,246 --> 00:09:04,796 look stupid. 209 00:09:04,796 --> 00:09:05,426 The fact that you have 210 00:09:05,426 --> 00:09:08,206 to say selected equals quote unquote selected, this is sort 211 00:09:08,206 --> 00:09:10,336 of a remnant from the early days of HTML 212 00:09:10,606 --> 00:09:13,126 where in the beginning you didn't need 213 00:09:13,126 --> 00:09:16,076 to have values associated with attributes. 214 00:09:16,076 --> 00:09:18,186 You could just have single key words like this. 215 00:09:18,406 --> 00:09:19,556 And, indeed, if people go around 216 00:09:19,556 --> 00:09:22,116 and read HTML references you might still see people doing 217 00:09:22,116 --> 00:09:22,836 things like this. 
218 00:09:23,196 --> 00:09:26,786 But the folks decided a few years ago that we need 219 00:09:26,786 --> 00:09:28,786 to at least start standardizing the syntax, 220 00:09:29,026 --> 00:09:30,396 and for any of those anomalies 221 00:09:30,396 --> 00:09:31,976 where they were just single key words it's going 222 00:09:31,976 --> 00:09:34,786 to be the same key words equals quote unquote itself. 223 00:09:34,956 --> 00:09:37,516 So that is just the way things are for better or for worse. 224 00:09:37,786 --> 00:09:40,466 But this is an uninteresting detail intellectually, 225 00:09:40,726 --> 00:09:43,306 but programmatically now how do we generate 226 00:09:43,456 --> 00:09:47,206 that string among all of these options selectively? 227 00:09:47,246 --> 00:09:49,536 In other words, as I'm spitting out or generating 228 00:09:49,536 --> 00:09:52,936 in my PHP code this drop down menu, I need to pause 229 00:09:52,936 --> 00:09:54,846 with some kind of if condition or branch and say, 230 00:09:54,846 --> 00:09:57,936 wait a minute, if this is what the user selected I need 231 00:09:57,936 --> 00:10:00,766 to re-select this for him or her by spitting 232 00:10:00,766 --> 00:10:02,606 out precisely that HTML. 233 00:10:03,116 --> 00:10:06,596 So let's take a look at version 6 then here frosh IMs. 234 00:10:06,596 --> 00:10:09,226 So notice at the very top we have a bunch of comments 235 00:10:09,226 --> 00:10:11,526 as before, but I've introduced a new feature 236 00:10:11,526 --> 00:10:12,966 that you might have seen already in pset7. 237 00:10:13,066 --> 00:10:17,336 That of arrays that can be declared using literally a 238 00:10:17,336 --> 00:10:18,896 function called array. 
239 00:10:19,016 --> 00:10:21,556 So, frankly, this is kind of a nasty piece of syntax in PHP 240 00:10:21,556 --> 00:10:24,716 that you can't just declare an array with square brackets 241 00:10:24,716 --> 00:10:27,496 or some other syntax as we even could in C, but instead you have 242 00:10:27,496 --> 00:10:29,866 to literally call a function called array. 243 00:10:30,146 --> 00:10:32,066 But, again, this is simply the way it is. 244 00:10:32,386 --> 00:10:34,996 So dollar sign dorms I've capitalized just 245 00:10:34,996 --> 00:10:39,016 to convey the idea here that this is a global array 246 00:10:39,016 --> 00:10:40,356 that I'm going to be using everywhere. 247 00:10:40,616 --> 00:10:43,796 But it's just a variable storing this array of elements. 248 00:10:44,096 --> 00:10:47,426 So PHP supports normal arrays, bracket zero, 249 00:10:47,426 --> 00:10:48,506 bracket one, bracket two. 250 00:10:48,506 --> 00:10:51,356 So this is not an associative array, this is just an array. 251 00:10:51,556 --> 00:10:53,956 And I've hit enter on every line just to keep it more readable. 252 00:10:54,346 --> 00:10:56,826 So notice at the end I have close paren, 253 00:10:56,966 --> 00:10:58,996 semicolon, end of function call. 254 00:10:59,206 --> 00:11:00,916 So stored in this moment in time then 255 00:11:01,206 --> 00:11:03,656 in this variable is that whole array. 256 00:11:03,746 --> 00:11:05,546 So this is all copy paste from earlier. 257 00:11:05,736 --> 00:11:08,356 I'm just making sure the user has actually submitted all 258 00:11:08,356 --> 00:11:09,456 of the fields I care about. 259 00:11:09,746 --> 00:11:12,616 So let's see now how I'm generating the HTML. 260 00:11:12,846 --> 00:11:15,266 Well, if I scroll down here notice that captain 261 00:11:15,266 --> 00:11:16,766 and gender are actually the same, 262 00:11:17,026 --> 00:11:20,166 but look how much more elegant now the dorm generation is. 
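The array declaration described above looks roughly like this. The three dorm names are an illustrative subset rather than the lecture's full list. As an aside, newer versions of PHP (5.4 and later) also accept square brackets for this, which was not yet the case in the version used here.

```php
<?php
// A numerically indexed array built with array(); the dorm names are an
// illustrative subset, not the lecture's full list.
$DORMS = array(
    'Apley Court',
    'Canaday',
    'Matthews'
);

// Elements are reached by position, starting from zero.
echo $DORMS[0], "\n";       // first element
echo count($DORMS), "\n";   // number of elements
```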
263 00:11:20,626 --> 00:11:22,136 So I have a new construct 264 00:11:22,136 --> 00:11:24,706 that we predicted would come a couple days ago 265 00:11:24,706 --> 00:11:26,236 which is the foreach construct. 266 00:11:26,426 --> 00:11:29,086 I have this inside of open bracket question mark which says 267 00:11:29,086 --> 00:11:31,206 to the web server, here's some PHP code, 268 00:11:31,206 --> 00:11:33,706 don't spit this out, interpret it instead. 269 00:11:34,046 --> 00:11:37,316 The syntax for this loop is foreach variable name 270 00:11:37,566 --> 00:11:39,566 as other variable name. 271 00:11:39,566 --> 00:11:42,376 So the array comes first and then a temporary variable. 272 00:11:42,596 --> 00:11:44,636 We could call this anything we want, but you might 273 00:11:44,636 --> 00:11:46,516 as well choose something that's a little more friendly. 274 00:11:46,786 --> 00:11:49,086 And then notice the colon here, and this is important 275 00:11:49,386 --> 00:11:53,766 because the moment you close PHP mode here stuff is just going 276 00:11:53,766 --> 00:11:55,146 to start getting spit out raw. 277 00:11:55,146 --> 00:11:57,646 So you need to just be clear to PHP 278 00:11:57,646 --> 00:12:00,456 that what follows is actually inside 279 00:12:00,456 --> 00:12:02,496 of conceptually this loop. 280 00:12:02,496 --> 00:12:06,616 And notice the opposite of this is the somewhat verbosely named 281 00:12:06,756 --> 00:12:08,106 endforeach with no space. 282 00:12:08,676 --> 00:12:10,856 So alternatively you could do something C-style. 283 00:12:10,886 --> 00:12:13,166 You could actually say, well, open curly brace 284 00:12:13,616 --> 00:12:15,276 and then close curly brace here, 285 00:12:15,606 --> 00:12:18,176 so this would actually be fine as well. 286 00:12:18,416 --> 00:12:21,166 It just, hmm, looks a little uglier perhaps. 287 00:12:21,166 --> 00:12:23,716 But either style is fine so long as you are consistent. 
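The two loop styles just described can be put side by side as a sketch. The wrapper functions and the output buffering are assumptions added here so the generated HTML can be captured and compared; in a page you would simply write the loop inline. The dorm names are again an illustrative subset.

```php
<?php
// Illustrative subset of dorms (assumed, not the lecture's full list).
$DORMS = array('Apley Court', 'Canaday', 'Matthews');

// Colon style: "foreach (...):" opens the loop, "endforeach;" closes it.
// The HTML between the PHP tags is emitted once per iteration.
function render_options($dorms)
{
    ob_start();
    foreach ($dorms as $dorm): ?>
<option value="<?php echo $dorm; ?>"><?php echo $dorm; ?></option>
<?php endforeach;
    return ob_get_clean();
}

// C style: the same loop written with curly braces instead.
function render_options_braces($dorms)
{
    ob_start();
    foreach ($dorms as $dorm) { ?>
<option value="<?php echo $dorm; ?>"><?php echo $dorm; ?></option>
<?php }
    return ob_get_clean();
}

echo render_options($DORMS);
```

Both functions produce identical HTML; the choice between them is purely about readability.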
288 00:12:23,996 --> 00:12:24,956 Then what are we doing here? 289 00:12:24,956 --> 00:12:27,566 Well, notice this is not inside of PHP mode. 290 00:12:27,566 --> 00:12:29,026 So that means if we're inside 291 00:12:29,026 --> 00:12:32,026 of this loop this stuff is just going to get spit out literally. 292 00:12:32,026 --> 00:12:35,006 So open bracket, option, value equals quote, 293 00:12:35,446 --> 00:12:40,006 and then we get back into PHP mode and then we exit PHP mode, 294 00:12:40,186 --> 00:12:42,976 close quote, close bracket, open PHP mode 295 00:12:43,166 --> 00:12:44,456 and we spit out dorm again. 296 00:12:44,516 --> 00:12:48,246 Recall the redundancy of the way I structured this, end PHP mode 297 00:12:48,456 --> 00:12:50,516 and then raw HTML again. 298 00:12:51,036 --> 00:12:52,266 But I'm not doing something here. 299 00:12:52,266 --> 00:12:53,946 I'm kind of skipping a step 300 00:12:53,946 --> 00:12:56,586 that I'd just insisted was important, 301 00:12:56,816 --> 00:12:58,846 which is I'm not escaping dorm here. 302 00:12:59,536 --> 00:12:59,916 Why not? 303 00:13:00,516 --> 00:13:04,556 [ Pause ] 304 00:13:05,056 --> 00:13:05,706 What's that? 305 00:13:06,031 --> 00:13:08,031 [ Inaudible Audience Answer ] 306 00:13:08,046 --> 00:13:08,316 >> Yeah, that's it. 307 00:13:08,316 --> 00:13:09,166 It's as simple as that, right? 308 00:13:09,166 --> 00:13:12,426 If I am the one who created this list of elements I don't really 309 00:13:12,426 --> 00:13:13,826 need to call a function 310 00:13:13,826 --> 00:13:16,406 and incur the slight computational cost 311 00:13:16,636 --> 00:13:18,346 of actually executing a function just 312 00:13:18,346 --> 00:13:20,816 to escape a string that I myself wrote. 313 00:13:20,816 --> 00:13:22,296 Now, if you really wanted to be paranoid 314 00:13:22,296 --> 00:13:24,336 and not even trust yourself or the person you're working 315 00:13:24,336 --> 00:13:25,576 with you could certainly do this. 
316 00:13:25,736 --> 00:13:28,226 But realize the distinction here is as simple as, well, 317 00:13:28,226 --> 00:13:30,916 this array was from me not from the user. 318 00:13:31,086 --> 00:13:34,626 So now we have the ability to dynamically generate this list. 319 00:13:34,626 --> 00:13:39,066 We're not doing any kind of ifs or elses yet to actually see 320 00:13:39,066 --> 00:13:41,076 which one the user has already submitted, 321 00:13:41,226 --> 00:13:43,256 so the end result is actually going to be pretty similar. 322 00:13:43,256 --> 00:13:47,066 If I go to froshim6.php in my browser, dammit, 323 00:13:47,896 --> 00:13:50,286 that's the mistake David forgot to fix since Monday, 324 00:13:50,376 --> 00:13:54,016 so we're going to cheat and copy this and put this over here, 325 00:13:54,076 --> 00:13:57,616 and how about by next week we'll fix that problem just 326 00:13:57,616 --> 00:13:59,556 like the WiFi question I keep forgetting to ask. 327 00:13:59,796 --> 00:14:02,726 [Laughter] So let's go. That was not intended 328 00:14:02,726 --> 00:14:03,776 to be a running gag. 329 00:14:04,076 --> 00:14:07,826 Alright, so here we go, and now I messed 330 00:14:07,826 --> 00:14:09,126 up my formatting so that's alright. 331 00:14:09,426 --> 00:14:12,446 So, here we go, simulate the correctness. 332 00:14:12,796 --> 00:14:15,926 [Laughter] Do as I say, not as I do. 333 00:14:16,396 --> 00:14:19,146 Alright, so now I can go ahead and type David 334 00:14:19,146 --> 00:14:21,126 and I can click register, and I'm actually still going 335 00:14:21,126 --> 00:14:23,486 to get yelled at here, but the menu is still the same. 336 00:14:23,486 --> 00:14:25,716 It's not pre-populated yet, but if I actually look 337 00:14:25,716 --> 00:14:28,216 at the page source by right clicking or control clicking, 338 00:14:28,416 --> 00:14:31,066 I at least see the HTML that I saw before. 
339 00:14:31,156 --> 00:14:33,056 Now, there's a little more white space this time 340 00:14:33,056 --> 00:14:34,266 and some weird indentation. 341 00:14:34,376 --> 00:14:35,546 And, again, this is not a big deal. 342 00:14:35,546 --> 00:14:37,586 Your output does not need to be pretty printed, 343 00:14:37,586 --> 00:14:40,396 but your actual PHP code and HTML you write should be, 344 00:14:40,676 --> 00:14:42,146 but this is now machine generated. 345 00:14:42,326 --> 00:14:44,226 And, in fact, if you start looking at the source code 346 00:14:44,226 --> 00:14:46,646 of most websites you'll see patterns like this 347 00:14:46,646 --> 00:14:48,466 where it's all indented identically. 348 00:14:48,466 --> 00:14:50,506 And that's not because a human necessarily did that. 349 00:14:50,706 --> 00:14:53,336 It's because there's some programming code, some PHP 350 00:14:53,406 --> 00:14:56,276 or whatnot that's spitting this out in some kind of loop. 351 00:14:56,506 --> 00:14:59,246 So I need to be able to ask myself the question if what I'm 352 00:14:59,246 --> 00:15:01,586 about to spit out in this drop down menu is equal 353 00:15:01,586 --> 00:15:04,006 to what the user typed in or selected, 354 00:15:04,296 --> 00:15:07,026 I need to reselect it for him or her. 355 00:15:07,296 --> 00:15:10,706 So let's go ahead and open up froshim7 now. 356 00:15:10,826 --> 00:15:13,466 And we see the same array of dorms up top. 357 00:15:13,706 --> 00:15:16,226 We see the same error checking code at top. 358 00:15:16,276 --> 00:15:18,776 And if I scroll down now notice 359 00:15:18,776 --> 00:15:20,506 that I've done something slightly different, 360 00:15:20,826 --> 00:15:23,176 and realize there's a bunch of different ways to do this, 361 00:15:23,176 --> 00:15:25,816 and you'll see different ways in section and in online tutorials. 
362 00:15:26,126 --> 00:15:27,296 But here notice that just 363 00:15:27,296 --> 00:15:30,876 to keep things prettier I've only entered PHP mode once 364 00:15:30,876 --> 00:15:33,196 at the very top, and then I'm closing it down here, 365 00:15:33,496 --> 00:15:35,876 just because, frankly, if you start going in and out and in 366 00:15:35,876 --> 00:15:37,906 and out of PHP mode it just gets hard to read. 367 00:15:38,136 --> 00:15:40,596 So I decided to write this all in one big for loop 368 00:15:40,766 --> 00:15:43,546 so we have the same looping structure, foreach dorms as dorm, 369 00:15:43,796 --> 00:15:46,106 and then, frankly, this is week one stuff again. 370 00:15:46,106 --> 00:15:47,766 It's a different language but same idea. 371 00:15:47,976 --> 00:15:51,346 If what the user submitted which recall is stored inside 372 00:15:51,346 --> 00:15:54,186 of a special superglobal variable called dollar sign, 373 00:15:54,186 --> 00:15:58,746 underscore, post, equals, equals the value of the current dorm 374 00:15:58,856 --> 00:16:02,046 as we are looping through all of them, we'll then go ahead and spit 375 00:16:02,046 --> 00:16:05,416 out literally option selected equals quote unquote selected, 376 00:16:05,496 --> 00:16:09,076 value equals quote unquote dorm, then dorm, then closed option, 377 00:16:09,136 --> 00:16:11,316 closed quote, semicolon. 378 00:16:11,456 --> 00:16:12,316 And now notice this. 379 00:16:12,316 --> 00:16:13,516 This is C stuff also. 380 00:16:13,516 --> 00:16:15,666 If I've got double quotes on the outside I have 381 00:16:15,666 --> 00:16:17,426 to have single quotes on the inside. 382 00:16:17,526 --> 00:16:22,156 Or, what else could I do to avoid confusing the interpreter 383 00:16:22,156 --> 00:16:24,236 by having weird quotes in the middle of quotes? 384 00:16:25,516 --> 00:16:26,826 You could escape it, right? 
385 00:16:26,826 --> 00:16:29,846 When in doubt you could do something like backslash quote 386 00:16:30,046 --> 00:16:31,796 and that would also get the job done as well. 387 00:16:31,796 --> 00:16:33,466 Frankly, it's just a little harder to read so you might 388 00:16:33,466 --> 00:16:35,646 as well toggle in and out of single quotes. 389 00:16:35,696 --> 00:16:37,766 Else, if the user did not type 390 00:16:37,766 --> 00:16:40,576 in the current element just go ahead and spit this out instead. 391 00:16:40,926 --> 00:16:43,976 So, again, week one stuff sort of updated for PHP. 392 00:16:43,976 --> 00:16:45,536 So let's now go back to the browser. 393 00:16:45,786 --> 00:16:46,846 This is version 7. 394 00:16:46,846 --> 00:16:49,096 Let me go ahead and open up froshim7.php, 395 00:16:49,096 --> 00:16:53,056 and let me go ahead and learn from my past mistakes. 396 00:16:53,756 --> 00:16:55,496 Let's go back to 6. 397 00:16:56,696 --> 00:17:03,026 Alright, uh huh, alright, alright, 398 00:17:03,276 --> 00:17:05,456 and reload, problem solved. 399 00:17:05,456 --> 00:17:07,666 David was from Matthews. 400 00:17:07,666 --> 00:17:10,666 I'm going to skip gender and captain, register. 401 00:17:10,666 --> 00:17:13,216 I'm yelled at but notice what did not break this time. 402 00:17:13,216 --> 00:17:16,246 Now Matthews is preselected, and if I go into my source code, 403 00:17:16,246 --> 00:17:18,866 view page source, scroll down here, voilà, 404 00:17:19,076 --> 00:17:20,856 it's not being formatted as nicely now 405 00:17:20,856 --> 00:17:21,656 because I'm not printing 406 00:17:21,656 --> 00:17:23,236 out white space but that's fine, too. 407 00:17:23,236 --> 00:17:24,806 In fact, we're saving some bytes this way. 408 00:17:25,086 --> 00:17:27,636 But if I scroll right, right, right, right, right to Matthews, 409 00:17:27,886 --> 00:17:29,046 notice that Matthews does look 410 00:17:29,046 --> 00:17:30,606 a little different and that's it. 
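The re-selection logic walked through above might be sketched like this. It is not the lecture's froshims7 source: the helper dorm_options() is hypothetical, it builds and returns a string where the lecture echoes directly, and the form field is assumed to be named dorm.

```php
<?php
// Hypothetical helper mirroring the loop described above: one option per
// dorm, adding selected="selected" to the one the user already chose.
function dorm_options($dorms, $selected)
{
    $html = '';
    foreach ($dorms as $dorm) {
        if ($selected == $dorm) {
            // Single quotes inside a double-quoted string need no escaping.
            // The alternative is backslashes: "<option value=\"$dorm\">".
            $html .= "<option selected='selected' value='$dorm'>$dorm</option>";
        } else {
            $html .= "<option value='$dorm'>$dorm</option>";
        }
    }
    return $html;
}

// In a real page the second argument would come from the submitted form,
// e.g. $_POST["dorm"] (assumed field name).
echo dorm_options(array('Canaday', 'Matthews'), 'Matthews'), "\n";
```

Only the option matching the submitted value gets the extra attribute, which is why the source looks the same everywhere except at Matthews.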
411 00:17:30,856 --> 00:17:32,576 So, again, we're really using PHP now 412 00:17:32,576 --> 00:17:35,236 to dynamically generate HTML to structure 413 00:17:35,236 --> 00:17:38,006 and stylize our page as we see fit. 414 00:17:39,416 --> 00:17:39,976 Any questions? 415 00:17:40,516 --> 00:17:44,426 [ Pause ] 416 00:17:44,926 --> 00:17:45,396 Anything at all? 417 00:17:45,396 --> 00:17:46,366 Alright. Yeah? 418 00:17:46,766 --> 00:17:48,716 >> Did you already explain what echo meant? 419 00:17:48,716 --> 00:17:49,926 >> Oh, did I explain what echo meant? 420 00:17:49,926 --> 00:17:50,796 No, so sorry. 421 00:17:50,796 --> 00:17:52,406 I took for granted what echo meant. 422 00:17:52,406 --> 00:17:55,036 And it really means kind of what the word suggests, 423 00:17:55,036 --> 00:17:56,076 echo this literally. 424 00:17:56,076 --> 00:17:58,566 So it's actually synonymous with saying print. 425 00:17:59,036 --> 00:18:02,506 However, echo is just a built-in option in PHP 426 00:18:02,506 --> 00:18:04,966 so I could either say print and print this out, 427 00:18:05,106 --> 00:18:08,046 and let me scroll back to the left, or you can say echo. 428 00:18:08,536 --> 00:18:09,796 They're pretty much equivalent. 429 00:18:09,936 --> 00:18:12,496 And, in fact, you don't actually need the parentheses 430 00:18:12,496 --> 00:18:15,156 for the echo call which is perhaps useful for some folks. 431 00:18:15,706 --> 00:18:16,526 So good question there. 432 00:18:17,186 --> 00:18:17,756 Other questions? 433 00:18:17,756 --> 00:18:23,816 >> Where did you introduce the lower case dorm variable? 434 00:18:23,816 --> 00:18:24,506 >> Ah, good question. 435 00:18:24,506 --> 00:18:26,486 Where did I introduce the lower case dorm variable? 436 00:18:26,486 --> 00:18:29,096 It was implicitly declared inside 437 00:18:29,096 --> 00:18:30,666 of the foreach construct itself. 
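Both answers above can be checked with a few lines. This is just a sketch with made-up values: it shows echo and print side by side, and it records whether the lowercase loop variable exists before and after the loop.

```php
<?php
// Both constructs write a string to the output; parentheses are optional.
echo "hello, ";
print "world\n";

// Illustrative list (assumed names).
$DORMS = array('Canaday', 'Matthews');

// $dorm has not been declared anywhere yet.
$existed_before = isset($dorm);

// foreach creates $dorm and reassigns it on every iteration.
foreach ($DORMS as $dorm) {
    echo $dorm, "\n";
}

// After the loop, $dorm still holds the last element.
$existed_after = isset($dorm);
```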
438 00:18:30,926 --> 00:18:33,426 So the moment I mention it after the keyword 439 00:18:33,656 --> 00:18:36,216 as, it comes into existence, 440 00:18:36,266 --> 00:18:37,776 and it automatically gets updated 441 00:18:37,776 --> 00:18:39,576 on every iteration of the loop. 442 00:18:40,536 --> 00:18:43,306 Alright, so recall that where we left off on Monday was 443 00:18:43,306 --> 00:18:45,986 with this slightly more exciting example 444 00:18:45,986 --> 00:18:49,436 in that we generated emails automatically for froshims8. 445 00:18:49,436 --> 00:18:52,786 And if we scroll down here notice that we see the same HTML 446 00:18:53,046 --> 00:18:56,916 because this form submits to register8, and recall, oh, 447 00:18:56,916 --> 00:18:58,526 sorry, this was not the email example. 448 00:18:58,526 --> 00:19:00,936 This was instead the database example. 449 00:19:01,196 --> 00:19:03,446 So recall what we actually did with the data here. 450 00:19:03,446 --> 00:19:06,106 So let's focus now on the top 451 00:19:06,106 --> 00:19:07,476 which was just some error checking, 452 00:19:07,726 --> 00:19:09,976 and below that was the database connection. 453 00:19:09,976 --> 00:19:11,556 So there's a couple of things going on here. 454 00:19:11,556 --> 00:19:13,136 We mentioned when we first started talking 455 00:19:13,136 --> 00:19:16,346 about the internet that a single server can certainly do multiple 456 00:19:16,346 --> 00:19:17,146 things these days. 457 00:19:17,146 --> 00:19:18,706 It can be a web server, email server, 458 00:19:18,706 --> 00:19:19,826 instant messaging server. 459 00:19:20,156 --> 00:19:23,046 And so when a packet of data arrives at some server 460 00:19:23,046 --> 00:19:26,186 on the internet how does the server know if it is an email 461 00:19:26,306 --> 00:19:28,776 or web page request or instant message?
462 00:19:29,886 --> 00:19:31,086 What was the underlying technology 463 00:19:31,086 --> 00:19:32,866 that answered questions of that form? 464 00:19:33,446 --> 00:19:33,536 Yeah? 465 00:19:34,416 --> 00:19:34,816 >> Port number. 466 00:19:35,496 --> 00:19:37,686 >> Yeah, so it was a port number, right? 467 00:19:37,686 --> 00:19:40,946 We talked briefly and we had this fun video talking about IP 468 00:19:40,946 --> 00:19:45,106 and TCP, and TCP was simply the protocol that says, one, 469 00:19:45,166 --> 00:19:47,436 if data gets dropped somewhere along the way, 470 00:19:47,436 --> 00:19:49,346 TCP is responsible for re-sending it. 471 00:19:49,616 --> 00:19:52,516 But TCP was also responsible for assigning a number 472 00:19:52,746 --> 00:19:56,076 to most services on the internet, like web is 80 473 00:19:56,246 --> 00:20:00,446 by convention and also 443 which is the SSL or HTTPS version, 474 00:20:00,446 --> 00:20:05,036 more on that today, 25 is email, 22 is something called SSH, 475 00:20:05,036 --> 00:20:07,366 21 is something called FTP and the like. 476 00:20:07,626 --> 00:20:09,726 So upon receiving some packet and seeing, okay, 477 00:20:09,726 --> 00:20:12,906 this is for my IP address and it's for port 80, 478 00:20:13,096 --> 00:20:15,336 the web server knows that this is indeed for him. 479 00:20:15,586 --> 00:20:19,276 But in this case there's another something running on the server, 480 00:20:19,276 --> 00:20:21,466 in this case the appliance, but it could be an actual server 481 00:20:21,466 --> 00:20:24,016 on the internet, and that's something called a MySQL server, 482 00:20:24,016 --> 00:20:26,486 a database server whose purpose in life is to listen 483 00:20:26,486 --> 00:20:29,466 for connections, listen for requests for databases, 484 00:20:29,466 --> 00:20:31,866 and then listen for things like insert and delete 485 00:20:31,866 --> 00:20:33,116 and select and the like.
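To make the port-number idea concrete, here is a small Python sketch of how a machine could decide which service a packet is for. The port numbers for FTP, SSH, email, HTTP and HTTPS are the ones named above; 3306 for MySQL is that server's usual default and is added here as an assumption, since the lecture does not state it. The function name `service_for` is hypothetical.

```python
# Conventional port numbers named in the lecture, plus MySQL's usual
# default (3306), which is an addition not stated in the lecture.
WELL_KNOWN_PORTS = {
    21: "FTP",
    22: "SSH",
    25: "email (SMTP)",
    80: "web (HTTP)",
    443: "web (HTTPS)",
    3306: "MySQL",
}


def service_for(port):
    """Return the service conventionally listening on the given port."""
    return WELL_KNOWN_PORTS.get(port, "unknown service")


print(service_for(80))
print(service_for(3306))
```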
486 00:20:33,446 --> 00:20:37,196 So if we look here at these first two lines the first line 487 00:20:37,196 --> 00:20:41,466 is literally doing that, having my PHP code open a connection, 488 00:20:41,506 --> 00:20:42,526 a network connection 489 00:20:42,526 --> 00:20:44,986 to localhost using username jharvard 490 00:20:45,066 --> 00:20:46,166 and password crimson. 491 00:20:46,486 --> 00:20:49,046 In theory the server could be elsewhere on the internet, 492 00:20:49,376 --> 00:20:51,896 but because MySQL traffic is not encrypted 493 00:20:51,896 --> 00:20:54,816 by default it's not a good idea to try to connect 494 00:20:54,886 --> 00:20:57,186 to a database server elsewhere on the internet 495 00:20:57,406 --> 00:20:59,456 from your own web server, so they're usually 496 00:20:59,456 --> 00:21:01,516 in the same company, in the same building or the like, 497 00:21:01,786 --> 00:21:04,376 or literally on the same machine as is the case here. 498 00:21:04,686 --> 00:21:08,016 So what we next do is we select a very specific database. 499 00:21:08,156 --> 00:21:10,306 This is like opening a specific Excel file 500 00:21:10,306 --> 00:21:12,566 on your desktop even though you might have multiple ones. 501 00:21:12,906 --> 00:21:14,566 And then we have to get into this habit 502 00:21:14,816 --> 00:21:18,516 of avoiding dangerous input, scrubbing input 503 00:21:18,516 --> 00:21:19,996 or error checking really. 504 00:21:20,266 --> 00:21:23,006 So notice I'm calling this very long named function 505 00:21:23,246 --> 00:21:26,286 which really does something as simple as any time it sees 506 00:21:26,286 --> 00:21:29,526 like a quote mark it makes it backslash quote mark 507 00:21:29,526 --> 00:21:30,706 and a couple of other things.
508 00:21:30,706 --> 00:21:34,056 Very simple things so that you're not accidentally tricked 509 00:21:34,056 --> 00:21:37,726 into executing a delete statement or an update statement 510 00:21:37,726 --> 00:21:39,436 or something potentially dangerous. 511 00:21:39,756 --> 00:21:42,296 I don't have to bother scrubbing the captain input, 512 00:21:42,496 --> 00:21:44,246 because remember captain is a checkbox 513 00:21:44,246 --> 00:21:46,696 and so its value is going to be nothing at all or it's going 514 00:21:46,696 --> 00:21:47,846 to be quote unquote on, 515 00:21:48,156 --> 00:21:50,336 so I'm not even passing the user's input literally. 516 00:21:50,386 --> 00:21:55,306 I'm just checking implicitly if the captain field has something 517 00:21:55,426 --> 00:21:57,786 in it go ahead and set captain to one, 518 00:21:58,026 --> 00:21:59,486 else set captain to zero. 519 00:21:59,486 --> 00:22:00,086 And then I'm just going 520 00:22:00,086 --> 00:22:02,336 to literally insert my own one or zero. 521 00:22:02,606 --> 00:22:04,486 Gender I do want to scrub 522 00:22:04,486 --> 00:22:06,866 because the user is submitting either M or F, 523 00:22:07,386 --> 00:22:08,926 and I don't want them potentially 524 00:22:08,926 --> 00:22:10,646 to submit something other than those. 525 00:22:10,646 --> 00:22:14,426 And then here, too, with mysql_real_escape_string I'm making 526 00:22:14,426 --> 00:22:17,766 sure that the dorm the user has submitted is also legitimate. 527 00:22:18,166 --> 00:22:19,946 Now, why is this even a concern? 528 00:22:19,946 --> 00:22:22,976 If we go to the web page, right, the only values I can choose 529 00:22:22,976 --> 00:22:24,916 from are those here in the drop down. 530 00:22:24,916 --> 00:22:27,146 But we saw very simple examples of this the other day.
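The scrubbing steps described above can be sketched in Python. Two things here are assumptions rather than the lecture's code: the sketch uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL, and instead of escaping quote marks by hand it uses query placeholders, a different technique with the same goal of keeping user input from being executed as SQL. The `register` function and the table layout are hypothetical.

```python
import sqlite3


def register(conn, name, captain_field, gender, dorm):
    """Insert one registrant without trusting any submitted value."""
    # Checkbox: never store the raw value, only our own 1 or 0.
    captain = 1 if captain_field else 0
    # Gender: accept only the two values the form is supposed to offer.
    if gender not in ("M", "F"):
        raise ValueError("gender must be M or F")
    # Placeholders (?) make the driver treat each value as data, so a
    # quote mark inside name or dorm cannot end the SQL string early.
    conn.execute(
        "INSERT INTO registrants (name, captain, gender, dorm) VALUES (?, ?, ?, ?)",
        (name, captain, gender, dorm),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registrants (name TEXT, captain INTEGER, gender TEXT, dorm TEXT)"
)

# A hostile dorm value of the kind a tampered form could submit.
register(conn, "David", "on", "M", "'; DELETE FROM registrants; --")
rows = conn.execute("SELECT name, captain, dorm FROM registrants").fetchall()
print(rows)
```

The hostile string ends up stored as an ordinary dorm name, and the table still holds its one row.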
531 00:22:27,146 --> 00:22:29,066 If I right click or control click on this, 532 00:22:29,456 --> 00:22:32,166 recall that there's this inspect element option thanks 533 00:22:32,166 --> 00:22:33,996 to a free plugin called Firebug. 534 00:22:34,276 --> 00:22:35,636 This lets me do a bunch of things. 535 00:22:35,636 --> 00:22:38,066 For one thing it, remember, cleans up your HTML, 536 00:22:38,066 --> 00:22:42,266 makes it nicely indented and you can expand and collapse it 537 00:22:42,266 --> 00:22:43,656 which just makes it user friendly 538 00:22:43,656 --> 00:22:44,916 to navigate and poke around. 539 00:22:45,186 --> 00:22:47,496 But you can literally change things in the web page, 540 00:22:47,496 --> 00:22:49,726 not permanently but on my own computer. 541 00:22:49,726 --> 00:22:53,136 So I could actually change Matthews to be something 542 00:22:53,566 --> 00:22:57,706 like University Hall is where I live, and we'll change this 543 00:22:57,706 --> 00:23:02,556 to be University Hall, and then I go back here and, voilà, 544 00:23:02,556 --> 00:23:03,986 I've changed the form. 545 00:23:04,186 --> 00:23:06,046 So now all I have to do is click submit, 546 00:23:06,046 --> 00:23:09,066 and even though the web server did not design 547 00:23:09,296 --> 00:23:11,566 for University Hall to be in this list, what's going 548 00:23:11,566 --> 00:23:13,896 to get submitted is University Hall. 549 00:23:14,056 --> 00:23:15,756 Now, this is kind of innocuous and silly, 550 00:23:15,876 --> 00:23:18,716 but what if I instead did something in my HTML like, well, 551 00:23:18,716 --> 00:23:22,626 not University Hall but maybe something like a quote mark 552 00:23:22,626 --> 00:23:29,806 and then delete from users enter, now the quote mark, 553 00:23:29,806 --> 00:23:31,006 let's get rid of the quote mark 554 00:23:31,006 --> 00:23:33,846 because I just broke my own HTML attack, 555 00:23:34,446 --> 00:23:37,136 so let's just simulate it with delete, okay?
556 00:23:37,236 --> 00:23:38,856 So this is an oversimplification. 557 00:23:38,856 --> 00:23:41,186 It's not sufficient just to send delete to the server, 558 00:23:41,286 --> 00:23:43,926 but it's this easy to actually change the web page. 559 00:23:43,926 --> 00:23:47,136 And, frankly, real hackers don't go using these free GUI tools 560 00:23:47,136 --> 00:23:48,076 and changing the HTML. 561 00:23:48,316 --> 00:23:50,616 You would actually write a program, a little PHP script 562 00:23:50,616 --> 00:23:54,446 like we did on Monday for doing spell checking in PHP, 563 00:23:54,446 --> 00:23:57,336 and you can simulate being a web browser. 564 00:23:57,516 --> 00:24:01,166 In fact, recall that the survey from problem set five 565 00:24:01,166 --> 00:24:03,496 where we asked you guys to gripe about things 566 00:24:03,816 --> 00:24:05,866 that could be better on campus, 567 00:24:05,866 --> 00:24:10,006 one of the top contenders was this website here. 568 00:24:10,006 --> 00:24:11,966 Wait, let's go back to a different browser here, 569 00:24:12,996 --> 00:24:15,496 let's use Chrome, was this guy here. 570 00:24:16,806 --> 00:24:20,416 It seems they have a whole lot of information on this site 571 00:24:20,416 --> 00:24:22,836 when really all of us apparently only care about like what's 572 00:24:22,886 --> 00:24:24,896 in the menu, and even getting there sometimes takes 573 00:24:24,896 --> 00:24:25,606 multiple clicks. 574 00:24:26,016 --> 00:24:28,606 But there's a lot of news going on at HUDS right now. 575 00:24:28,736 --> 00:24:32,556 But in any case we have here we go, case in point, 576 00:24:32,556 --> 00:24:35,766 this week's menu, hot entrees, okay. 577 00:24:35,766 --> 00:24:36,676 So here is the menu, 578 00:24:36,786 --> 00:24:38,626 and unfortunately this is hosted I think 579 00:24:38,626 --> 00:24:41,266 by some third party product that maybe Harvard has paid for.
580 00:24:41,416 --> 00:24:44,096 This is all HTML, and this menu changes every day 581 00:24:44,096 --> 00:24:46,346 because presumably HUDs has their own database. 582 00:24:46,636 --> 00:24:50,306 And so we could, frankly, as humans just say, alright, well, 583 00:24:50,306 --> 00:24:51,866 I'm going to highlight and copy this, 584 00:24:51,866 --> 00:24:53,416 I'm going to paste this into my own database. 585 00:24:53,416 --> 00:24:55,016 Highlight and copy turkey noodle soup, 586 00:24:55,276 --> 00:24:57,776 paste this into my own database or your Excel spreadsheet. 587 00:24:57,776 --> 00:24:59,836 And, frankly, if you've ever done a research project 588 00:25:00,116 --> 00:25:03,336 or even something for a student group there's probably some very 589 00:25:03,336 --> 00:25:06,046 tedious process you've done at some point involving the web 590 00:25:06,346 --> 00:25:08,286 that could have ideally been automated. 591 00:25:08,526 --> 00:25:11,376 And so what we actually do for CS50 is if you go 592 00:25:11,376 --> 00:25:15,126 to manual.cs50.net/apis 593 00:25:15,176 --> 00:25:19,506 for application programming interfaces, 594 00:25:19,796 --> 00:25:21,696 you'll see that CS50 has a whole bunch of APIs. 595 00:25:21,696 --> 00:25:24,596 API we'll talk more about next week but, again, 596 00:25:24,596 --> 00:25:28,226 it's a way of interfacing your code and your programs 597 00:25:28,466 --> 00:25:30,666 with someone else's code or someone else's data. 598 00:25:31,006 --> 00:25:33,106 And so every year we get very common requests 599 00:25:33,106 --> 00:25:36,126 for how can I make something related to the course catalog 600 00:25:36,126 --> 00:25:38,756 or events on campus or food or maps or news 601 00:25:38,756 --> 00:25:40,436 or tweets and the like. 
602 00:25:40,696 --> 00:25:44,156 So what we as a course have done is created an API via 603 00:25:44,156 --> 00:25:47,176 which you can query CS50's server and get information 604 00:25:47,176 --> 00:25:49,786 like what's on the dining hall menu today or tomorrow 605 00:25:49,786 --> 00:25:52,296 or next week or what was it a year ago today? 606 00:25:52,296 --> 00:25:53,636 What's the nutritional content? 607 00:25:53,946 --> 00:25:57,806 And so if you go to the HUDS, that is, the Harvard food API from CS50 608 00:25:57,806 --> 00:26:01,186 and scroll down you'll see that there's a lot 609 00:26:01,186 --> 00:26:02,396 of detail on how to do this. 610 00:26:02,736 --> 00:26:03,416 But if you're familiar 611 00:26:03,416 --> 00:26:06,616 with Excel files you might also be familiar with CSV files, 612 00:26:06,716 --> 00:26:09,196 comma separated values files. 613 00:26:09,506 --> 00:26:11,646 These are just sort of simple spreadsheets. 614 00:26:11,936 --> 00:26:13,126 And so what you can do 615 00:26:13,126 --> 00:26:14,986 by visiting a certain URL that's there 616 00:26:14,986 --> 00:26:17,986 on the top right is you can provide a specific date. 617 00:26:18,516 --> 00:26:21,896 So in this case here, let me scroll down to the menu, 618 00:26:21,896 --> 00:26:24,896 not the nutritional one, so notice here 619 00:26:24,896 --> 00:26:26,836 that we have told folks that if you want 620 00:26:26,836 --> 00:26:32,636 to get the breakfast menu for date of March 21, 2011, 621 00:26:32,816 --> 00:26:37,126 you can literally visit this URL on CS50's server, food.cs50.net, 622 00:26:37,296 --> 00:26:40,666 and what you will get back effectively is a big CSV file, 623 00:26:40,666 --> 00:26:41,486 an Excel spreadsheet. 624 00:26:41,486 --> 00:26:44,136 And this is just what we're doing for problem set seven 625 00:26:44,336 --> 00:26:47,246 when we grab data from Yahoo Finance. 626 00:26:47,246 --> 00:26:48,496 You're getting data back like this.
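Once a CSV file like this comes back, reading it takes only a few lines of code. The Python sketch below assumes a made-up layout: the column names (`date`, `meal`, `category`, `item`) and the sample rows are hypothetical and are not the real output of food.cs50.net. To stay self-contained it parses a string rather than fetching any URL.

```python
import csv
import io

# Hypothetical sample; the real API's columns and values may differ.
SAMPLE_CSV = """date,meal,category,item
2011-03-21,BREAKFAST,Hot Entrees,Scrambled Eggs
2011-03-21,BREAKFAST,Breads,Blueberry Muffin
2011-03-21,LUNCH,Soups,Turkey Noodle Soup
"""


def items_for_meal(csv_text, meal):
    """Return the item names listed for one meal, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["item"] for row in reader if row["meal"] == meal]


print(items_for_meal(SAMPLE_CSV, "BREAKFAST"))
```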
627 00:26:48,766 --> 00:26:51,616 But unfortunately HUDS does not provide 628 00:26:51,616 --> 00:26:53,836 like Yahoo does a little download link. 629 00:26:53,836 --> 00:26:56,376 So what we as a course had to do was, well, 630 00:26:56,376 --> 00:26:59,106 we opened up this page and we looked at view page source, 631 00:26:59,616 --> 00:27:02,406 and we ignored all the distractions up at top, 632 00:27:02,406 --> 00:27:03,946 and we started looking for patterns. 633 00:27:04,376 --> 00:27:08,406 And once we found a pattern like this feels like patterns, 634 00:27:08,406 --> 00:27:10,456 let's actually use Control-F, so let's look 635 00:27:10,456 --> 00:27:16,516 for chipotle corn bisque, Control-F, alright there it is. 636 00:27:16,866 --> 00:27:20,386 So here is the HTML in HUDS's website, and notice it's 637 00:27:20,386 --> 00:27:22,276 in a whole bunch of text, some of which we've seen, 638 00:27:22,276 --> 00:27:23,036 some of which we haven't. 639 00:27:23,036 --> 00:27:25,216 There's some div tags, some image tags and the like. 640 00:27:25,586 --> 00:27:29,146 But, my God, like this is how the data is actually presented 641 00:27:29,146 --> 00:27:29,816 on the internet. 642 00:27:29,816 --> 00:27:31,246 So long story short what we did 643 00:27:31,246 --> 00:27:32,706 as a course is we wrote something called a 644 00:27:32,706 --> 00:27:33,546 screen scraper. 645 00:27:33,776 --> 00:27:35,936 This is, frankly, a tool of last resort, 646 00:27:36,066 --> 00:27:40,016 but we have a program running on cs50.net that every day pretends 647 00:27:40,016 --> 00:27:43,016 to be a browser like Firefox, goes to HUDS's website, 648 00:27:43,246 --> 00:27:45,716 downloads the HTML and it throws away the images 649 00:27:45,716 --> 00:27:47,066 and uninteresting stuff like that, 650 00:27:47,306 --> 00:27:50,296 and then we quote unquote parse all of this HTML.
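The parsing half of a screen scraper can be sketched with Python's standard `html.parser` module. This is only an illustration of the idea, not CS50's actual scraper: the markup in `SAMPLE_HTML`, including the `item` class name, is invented for the example, and the download step is left out so the sketch runs without a network connection.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real dining site's HTML is messier and
# does not necessarily use these class names.
SAMPLE_HTML = """
<div class="header"><img src="logo.png"></div>
<table>
  <tr><td class="item">Chipotle Corn Bisque</td></tr>
  <tr><td class="item">Turkey Noodle Soup</td></tr>
  <tr><td class="note">Menu subject to change</td></tr>
</table>
"""


class MenuScraper(HTMLParser):
    """Collect the text of every <td class="item"> cell."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "td" and ("class", "item") in attrs:
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_item = False

    def handle_data(self, data):
        # Keep only text that appears inside a matching cell.
        if self.in_item and data.strip():
            self.items.append(data.strip())


def scrape_items(html_text):
    scraper = MenuScraper()
    scraper.feed(html_text)
    return scraper.items


print(scrape_items(SAMPLE_HTML))
```

Images and other tags are simply ignored, which matches the "throw away the uninteresting stuff" step described above.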
651 00:27:50,296 --> 00:27:53,796 We look for TD tags, input tags, div tags, and then we look 652 00:27:53,796 --> 00:27:56,616 for patterns like, oh, this looks like a piece of food. 653 00:27:56,836 --> 00:27:58,756 And then we associate that back with the date 654 00:27:58,756 --> 00:27:59,916 and the meal and so forth. 655 00:27:59,916 --> 00:28:02,346 And long story short we scrape all of this data 656 00:28:02,346 --> 00:28:05,386 into our own database so that we can re-expose this. 657 00:28:05,666 --> 00:28:09,176 Now, what's the relevance then to what we just did 658 00:28:09,176 --> 00:28:10,496 with the simple form submission? 659 00:28:10,706 --> 00:28:14,056 Well, it's very, very easy to pretend to be a browser. 660 00:28:14,056 --> 00:28:17,056 All you have to do is understand HTTP, understand HTML 661 00:28:17,056 --> 00:28:18,436 and you can simulate all of this. 662 00:28:18,766 --> 00:28:21,016 In fact, for final projects every year we always have folks 663 00:28:21,016 --> 00:28:22,786 who want to grab like sports scores 664 00:28:22,786 --> 00:28:24,526 from ESPN.com or the like. 665 00:28:24,806 --> 00:28:27,596 So realize that in manual.cs50.net there's a long 666 00:28:27,596 --> 00:28:30,476 article on how you can write your own screen scraper 667 00:28:30,476 --> 00:28:32,936 to get most any data you want from the internet 668 00:28:33,026 --> 00:28:34,166 in order to do analyses. 669 00:28:34,166 --> 00:28:35,836 People have done this with Facebook friendships 670 00:28:35,836 --> 00:28:37,706 and so forth to do research projects and the like, 671 00:28:38,066 --> 00:28:42,366 but realize that it is now within your grasp. 672 00:28:42,366 --> 00:28:44,526 So, we've scrubbed out inputs both for name, 673 00:28:44,716 --> 00:28:46,066 captain, gender, dorm. 674 00:28:46,306 --> 00:28:48,106 Now we have to construct a SQL query. 
675 00:28:48,106 --> 00:28:51,356 So this query here is of this form, insert into table name, 676 00:28:51,626 --> 00:28:54,456 and then a comma separated list of fields that we want 677 00:28:54,456 --> 00:28:56,296 to insert into, then values 678 00:28:56,296 --> 00:28:57,686 and then another comma separated list 679 00:28:57,686 --> 00:29:00,456 of the actual values we want to insert and then that's it. 680 00:29:00,796 --> 00:29:04,116 mysql_query, passing in that SQL code, and voilà, 681 00:29:04,116 --> 00:29:05,236 it's now in our database. 682 00:29:06,296 --> 00:29:07,456 So, who cares? 683 00:29:07,456 --> 00:29:09,516 What do we actually do when it's in our database? 684 00:29:09,746 --> 00:29:11,326 Well, think about what we could now do. 685 00:29:11,326 --> 00:29:14,006 We could whip up a little script for a proctor 686 00:29:14,226 --> 00:29:15,946 and we could say something like this. 687 00:29:15,946 --> 00:29:17,776 And I'll just do a cursory form of this. 688 00:29:18,356 --> 00:29:20,796 Let me go ahead and do registrants, oops, 689 00:29:21,126 --> 00:29:27,086 let's do open PHP mode, close PHP mode, registrants, 690 00:29:27,196 --> 00:29:30,576 actually let's just open this one. 691 00:29:30,576 --> 00:29:32,116 Oops, not that. 692 00:29:32,336 --> 00:29:36,606 Let's just open this one and voilà, 693 00:29:36,606 --> 00:29:37,766 it's already out of the oven. 694 00:29:38,146 --> 00:29:40,456 The program now will look a little something like this. 695 00:29:40,846 --> 00:29:41,756 So connect to database. 696 00:29:42,036 --> 00:29:43,046 Same exact thing up top. 697 00:29:43,046 --> 00:29:44,986 We connect with localhost, jharvard and crimson 698 00:29:44,986 --> 00:29:46,816 to that specific database called week nine. 699 00:29:47,116 --> 00:29:49,916 Notice then, for mysql_query, I just create a variable called dollar 700 00:29:49,916 --> 00:29:50,566 sign SQL.
701 00:29:50,786 --> 00:29:53,776 I store in it what's apparently another SQL query, 702 00:29:53,946 --> 00:29:56,736 and this one follows the form select field names, 703 00:29:56,856 --> 00:30:02,816 so I could do name, captain, dorm, gender in any order, 704 00:30:02,846 --> 00:30:05,396 or a little more succinctly I can just say star 705 00:30:05,396 --> 00:30:08,386 which is the wildcard operator from the table name. 706 00:30:08,596 --> 00:30:11,576 I could write this in lower case here, but as a matter 707 00:30:11,636 --> 00:30:14,186 of style it's I would say easier to read 708 00:30:14,186 --> 00:30:16,686 when at least your special key words are all in upper case. 709 00:30:16,886 --> 00:30:19,276 However, realize this FAQ, table names 710 00:30:19,276 --> 00:30:21,176 and field names should be case sensitive. 711 00:30:21,176 --> 00:30:23,336 So if you capitalized it in your database do it 712 00:30:23,336 --> 00:30:24,426 that way in your code. 713 00:30:24,556 --> 00:30:26,026 And then execute this query. 714 00:30:26,186 --> 00:30:27,906 But here is the common sticking point. 715 00:30:27,906 --> 00:30:29,446 When you execute this database query 716 00:30:29,576 --> 00:30:32,516 and you use the select command, what do you actually get back? 717 00:30:32,776 --> 00:30:35,066 Well, inside of the database are these tables, right, 718 00:30:35,066 --> 00:30:36,296 like Excel worksheets. 719 00:30:36,996 --> 00:30:41,146 So what you're getting back is not some one person's name 720 00:30:41,206 --> 00:30:42,546 and gender and so forth. 721 00:30:42,546 --> 00:30:44,946 Rather you're getting back what we'll call a result set. 722 00:30:45,326 --> 00:30:49,696 Think of this as an array of rows from the database.
723 00:30:50,006 --> 00:30:52,056 So if we scroll down here and we actually want 724 00:30:52,056 --> 00:30:53,346 to write this page that's supposed 725 00:30:53,346 --> 00:30:55,496 to show the proctor who's in charge of frosh IMs 726 00:30:55,786 --> 00:30:58,146 who has registered, notice we actually have 727 00:30:58,146 --> 00:31:02,546 to ask this result set that we got back from mysql_query 728 00:31:02,546 --> 00:31:06,556 which is up here, give me a row, give me a row, give me a row. 729 00:31:06,676 --> 00:31:08,056 And we can do that with a while loop. 730 00:31:08,056 --> 00:31:10,946 We can say while there is a row to give me, 731 00:31:11,216 --> 00:31:14,326 so this function mysql_fetch_array, when passed a 732 00:31:14,326 --> 00:31:16,966 so-called result set, a collection of all the rows 733 00:31:16,966 --> 00:31:19,626 in the database that match that select query, go ahead 734 00:31:19,626 --> 00:31:23,666 and assign one at a time to a variable called row, 735 00:31:23,996 --> 00:31:27,226 and then I can get at the individual fields in that table 736 00:31:27,536 --> 00:31:31,626 by doing dollar sign, row, open bracket, quote, 737 00:31:31,626 --> 00:31:33,076 unquote, name, closed bracket. 738 00:31:33,666 --> 00:31:37,116 So based on this syntax what type of variable or what type 739 00:31:37,116 --> 00:31:39,926 of data structure is row at this point in the story? 740 00:31:40,516 --> 00:31:42,576 [ Inaudible Audience Answer ] 741 00:31:43,076 --> 00:31:44,546 >> Yeah, it's an array but more specifically an? 742 00:31:44,746 --> 00:31:46,086 It's an associative array, right? 743 00:31:46,086 --> 00:31:48,716 It's an array that can have not just numeric indices 744 00:31:48,836 --> 00:31:50,586 but words as its keys. 745 00:31:50,836 --> 00:31:51,936 And so what's this going to do? 746 00:31:51,936 --> 00:31:54,596 LI is list item, UL is unordered list.
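The select-then-loop pattern described above can be sketched in Python as follows. The assumptions: `sqlite3` with an in-memory database stands in for MySQL, the two sample registrants are illustrative data, and `render_registrants` is a hypothetical name. Setting `row_factory` to `sqlite3.Row` makes each row behave like the associative array in the lecture, so a field can be fetched by its name.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
# sqlite3.Row lets a row be indexed by column name, like row["name"].
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE registrants (name TEXT, captain INTEGER, gender TEXT, dorm TEXT)"
)
# Illustrative sample data.
conn.executemany(
    "INSERT INTO registrants VALUES (?, ?, ?, ?)",
    [("David", 0, "M", "Matthews"), ("Matt", 1, "M", "Apley Court")],
)


def render_registrants(conn):
    """Build an unordered HTML list with one item per registrant."""
    # The query returns a result set; the loop asks it for one row at a time.
    result = conn.execute("SELECT * FROM registrants ORDER BY rowid")
    items = []
    for row in result:
        items.append(f"<li>{escape(row['name'])}</li>")
    return "<ul>" + "".join(items) + "</ul>"


print(render_registrants(conn))
```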
747 00:31:54,916 --> 00:31:58,966 So let's just jump to the aesthetic results here 748 00:31:59,036 --> 00:32:02,296 and go back to registrants.php, this is going to talk 749 00:32:02,296 --> 00:32:05,706 to that same database, let me go into registrants and, 750 00:32:05,706 --> 00:32:07,856 voilà, David has registered. 751 00:32:08,086 --> 00:32:10,346 Well, it's not all that interesting right now. 752 00:32:10,346 --> 00:32:13,576 Let's go into register, let's say 8.php, 753 00:32:13,576 --> 00:32:17,406 let's actually register Matt this time as the team captain 754 00:32:17,406 --> 00:32:22,016 from I don't remember where he lives so we'll say Apley Court, 755 00:32:22,516 --> 00:32:25,166 register, okay, Matt is apparently registered. 756 00:32:25,166 --> 00:32:25,886 Well, let's check. 757 00:32:26,136 --> 00:32:29,166 Let's go back to registrants.php and now we have Matt. 758 00:32:29,496 --> 00:32:31,926 And then we could have another person, and we can now see this. 759 00:32:32,236 --> 00:32:34,806 If I open up my little administrative tool called 760 00:32:34,806 --> 00:32:38,656 phpMyAdmin and I log in as jharvard and crimson, 761 00:32:39,356 --> 00:32:41,666 I then get to see this sort of web based interface 762 00:32:41,666 --> 00:32:45,126 for all these tables one of which is week nine registrants. 763 00:32:45,476 --> 00:32:49,276 And if I zoom here notice that despite all the messy words 764 00:32:49,276 --> 00:32:51,156 and icons there's David, there's Matt. 765 00:32:51,336 --> 00:32:54,086 And if I want to go around here and even modify things, 766 00:32:54,086 --> 00:32:55,876 say Matt actually wants to change his name, 767 00:32:55,876 --> 00:32:58,026 this is not something Matt himself should do. 768 00:32:58,146 --> 00:33:01,036 But you as the administrator could certainly change a row 769 00:33:01,226 --> 00:33:04,066 which then has the effect of changing the actual table.
770 00:33:04,066 --> 00:33:06,456 So now we have the ability to store data as long as we want. 771 00:33:06,456 --> 00:33:09,736 And, frankly, not to sort of set the expectations too high, 772 00:33:09,946 --> 00:33:12,826 this is at the core of what even Facebook did early on. 773 00:33:12,826 --> 00:33:14,736 You have someone register, you ask for their name, 774 00:33:14,736 --> 00:33:17,296 you ask for their residence, you ask then for them 775 00:33:17,296 --> 00:33:19,546 to list off their friends and so forth. 776 00:33:19,776 --> 00:33:22,326 You can do all of that simply by creating these tables 777 00:33:22,546 --> 00:33:24,096 and storing that kind of information. 778 00:33:24,096 --> 00:33:26,236 And any time you have something conceptually different 779 00:33:26,236 --> 00:33:30,046 that you want to store, say user profiles and friends and likes 780 00:33:30,046 --> 00:33:33,176 and activities, you can have a different MySQL table doing all 781 00:33:33,176 --> 00:33:33,496 of that. 782 00:33:33,696 --> 00:33:35,166 And recall from problem set seven 783 00:33:35,166 --> 00:33:37,376 that we don't just store user's user names 784 00:33:37,746 --> 00:33:39,446 and their hashes of passwords. 785 00:33:39,446 --> 00:33:41,746 What other field is also by default 786 00:33:41,746 --> 00:33:43,976 in P set seven associated with each user? 787 00:33:44,516 --> 00:33:47,626 [ Inaudible Audience Answer] 788 00:33:48,126 --> 00:33:48,356 >> What's that? 789 00:33:48,356 --> 00:33:51,176 So not just money but an ID, and ID specifically. 790 00:33:51,176 --> 00:33:52,586 So here's the user's table, 791 00:33:52,586 --> 00:33:54,246 recall that you get for P set seven. 792 00:33:54,496 --> 00:33:56,666 You have all these user names and these hashes 793 00:33:56,666 --> 00:33:57,826 which are hashes of passwords, 794 00:33:58,046 --> 00:33:59,766 but we also gave everyone a unique ID. 
795 00:33:59,766 --> 00:34:01,726 And if you read through the P set spec you'll see 796 00:34:01,726 --> 00:34:04,306 that this ID is auto incrementing, which means I 797 00:34:04,306 --> 00:34:06,436 in code do not have to generate or figure 798 00:34:06,436 --> 00:34:07,826 out what the next number should be. 799 00:34:07,826 --> 00:34:09,286 The database will do that for me. 800 00:34:09,516 --> 00:34:10,956 So Facebook did the same thing. 801 00:34:10,956 --> 00:34:13,196 Some of you who have never signed up for nicknames 802 00:34:13,196 --> 00:34:15,986 for your URLs might have facebook.com/ 803 00:34:15,986 --> 00:34:22,076 profile.php?ID=12345 or some number much bigger. 804 00:34:22,296 --> 00:34:25,456 That's because all of us have unique Facebook IDs. 805 00:34:25,456 --> 00:34:28,136 And you can actually infer from those IDs who's been 806 00:34:28,136 --> 00:34:29,246 on Facebook the longest. 807 00:34:29,326 --> 00:34:32,786 The bigger your number is the later you signed up. 808 00:34:32,786 --> 00:34:36,046 So there are certainly folks this is really sad 809 00:34:36,046 --> 00:34:40,576 that I know this, I am number 6,545. 810 00:34:40,746 --> 00:34:42,916 That was apparently the number in which I signed up. 811 00:34:43,136 --> 00:34:45,576 Mark I think for some reason his ID equals 3, 812 00:34:45,866 --> 00:34:47,796 and then there's also some familiar names if you poke 813 00:34:47,796 --> 00:34:50,356 around the Facebook API where you can see everyone's IDs. 814 00:34:50,636 --> 00:34:53,746 But long story short why have IDs for users 815 00:34:53,746 --> 00:34:55,246 when we already have user names 816 00:34:55,246 --> 00:34:57,026 which themselves are supposed to be unique? 817 00:34:57,106 --> 00:35:00,806 Caesar, Chartier, Guest, Jharvard, why have IDs at all? 818 00:35:01,436 --> 00:35:03,436 >> It's much easier to index ints. 819 00:35:04,056 --> 00:35:05,326 >> What do you mean index ints?
820 00:35:05,736 --> 00:35:07,286 >> Well, on the database you can find the past [inaudible] 821 00:35:07,286 --> 00:35:09,646 for example the order by number. 822 00:35:09,736 --> 00:35:10,386 >> Perfect, right. 823 00:35:10,386 --> 00:35:12,186 So it actually is for performance reasons. 824 00:35:12,286 --> 00:35:15,496 Caesar is kind of a short word, but it's one, two, three, 825 00:35:15,496 --> 00:35:16,936 four, five, six characters. 826 00:35:17,146 --> 00:35:19,456 Jharvard is slightly more. 827 00:35:19,456 --> 00:35:21,676 Rbowden is a few characters as well. 828 00:35:21,866 --> 00:35:22,806 That's a bunch of bytes. 829 00:35:22,806 --> 00:35:25,176 And certainly for longer user names like some of you have 830 00:35:25,176 --> 00:35:28,306 for your @college.harvard.edu accounts, that suggests 831 00:35:28,336 --> 00:35:30,106 that you have to do string comparisons a lot, 832 00:35:30,106 --> 00:35:32,346 strcmp if you think back to the C function. 833 00:35:32,596 --> 00:35:34,696 It feels like it should be much faster 834 00:35:34,896 --> 00:35:38,286 if you instead give everyone a unique number like an integer, 835 00:35:38,436 --> 00:35:41,316 then you're using only four bytes or 32 bits, 836 00:35:41,316 --> 00:35:43,496 plus you can then sort things numerically, 837 00:35:43,746 --> 00:35:44,826 you can create as we've seen 838 00:35:44,826 --> 00:35:47,476 in class more sophisticated data structures like linked lists 839 00:35:47,476 --> 00:35:50,146 and hash tables and the like that store those numbers.
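The auto-incrementing integer ID described above can be tried out in a few lines of Python. As before, `sqlite3` with an in-memory database is a stand-in for MySQL, whose syntax for this feature differs slightly. The table layout loosely follows the users table mentioned in the lecture, while `add_user` and the hash value are placeholders invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The database, not our code, picks the next id for each new row.
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "username TEXT UNIQUE, "
    "hash TEXT)"
)


def add_user(conn, username, password_hash):
    """Insert a user and return the id the database assigned."""
    cursor = conn.execute(
        "INSERT INTO users (username, hash) VALUES (?, ?)",
        (username, password_hash),
    )
    return cursor.lastrowid


# "placeholder-hash" stands in for a real password hash.
ids = [
    add_user(conn, name, "placeholder-hash")
    for name in ("caesar", "chartier", "guest", "jharvard")
]
print(ids)
```

The IDs come back in sign-up order, which is why a smaller number implies an earlier registration.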
840 00:35:50,146 --> 00:35:53,596 So, indeed, storing ID numbers that are ints 841 00:35:53,596 --> 00:35:57,076 or something called big ints which are 64 bit integers tends 842 00:35:57,076 --> 00:35:59,736 to be the best way of storing your data so that then 843 00:35:59,946 --> 00:36:04,566 if you want to list for, say, Caesar, all of his friends, 844 00:36:04,846 --> 00:36:08,396 you can have another table, and let's whip something up here. 845 00:36:08,396 --> 00:36:11,366 I'm going to go ahead and create a new table for P set seven, 846 00:36:11,366 --> 00:36:12,716 which is irrelevant to P set seven, 847 00:36:12,716 --> 00:36:14,836 but we'll use the same users called friends. 848 00:36:14,836 --> 00:36:17,086 I'm going to give this two columns, enter. 849 00:36:17,086 --> 00:36:22,356 And the first column I'm going to say user A 850 00:36:22,356 --> 00:36:25,106 and over here I'm going to say user B. 851 00:36:25,176 --> 00:36:29,246 And if I give these guys both integer types, the idea here is 852 00:36:29,246 --> 00:36:32,316 that if I want to associate users with other users 853 00:36:32,316 --> 00:36:34,826 to form relationships, whether symmetric or asymmetric, 854 00:36:34,956 --> 00:36:37,436 all I have to do is say that Caesar, for instance, 855 00:36:37,436 --> 00:36:44,106 is friends with, who was number 2, well whoever number 2 was. 856 00:36:44,896 --> 00:36:47,906 So if I want to say that Caesar and Matt are friends, 857 00:36:48,136 --> 00:36:51,276 all I have to store in this table is 1 and 2. 858 00:36:51,276 --> 00:36:53,596 And as you'll see before long, there are ways 859 00:36:53,636 --> 00:36:55,576 to then join these tables. 860 00:36:55,576 --> 00:36:58,346 So as long as you have one field in common like an ID field, 861 00:36:58,576 --> 00:37:00,996 you can actually figure out or look up users 862 00:37:00,996 --> 00:37:04,036 in one table whose IDs are already in another.
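A friends table of this shape, and the join that looks names back up from IDs, might look like the Python sketch below. It again uses `sqlite3` in memory as a stand-in for MySQL. The column names `user_a` and `user_b` follow the "user A" and "user B" columns created above, and the usernames, including which user is number 2, are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT UNIQUE);
    CREATE TABLE friends (user_a INTEGER, user_b INTEGER);
    INSERT INTO users (username) VALUES ('caesar'), ('matt'), ('jharvard');
    -- Caesar (id 1) is friends with Matt (id 2): just two integers.
    INSERT INTO friends (user_a, user_b) VALUES (1, 2);
    """
)


def friends_of(conn, username):
    """Return the usernames that the given user lists as friends."""
    query = """
        SELECT other.username
        FROM users AS me
        JOIN friends ON friends.user_a = me.id
        JOIN users AS other ON other.id = friends.user_b
        WHERE me.username = ?
        ORDER BY other.username
    """
    return [row[0] for row in conn.execute(query, (username,))]


print(friends_of(conn, "caesar"))
print(friends_of(conn, "matt"))
```

Because only the row (1, 2) is stored, the relationship here is asymmetric: Caesar lists Matt, but Matt lists no one. Storing (2, 1) as well, or querying both columns, would make it symmetric.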
863 00:37:04,226 --> 00:37:06,796 And in fact even in CS50's own course shopping tool, 864 00:37:06,796 --> 00:37:10,026 we have this notion of what your friends are taking or shopping 865 00:37:10,026 --> 00:37:10,846 in the way of courses. 866 00:37:10,996 --> 00:37:12,286 This is literally what we do. 867 00:37:12,286 --> 00:37:15,356 When you sign into Harvard Courses using your Facebook log 868 00:37:15,356 --> 00:37:17,246 in, one of the things Facebook gives us 869 00:37:17,246 --> 00:37:21,926 because we're using their API is a huge array of all of the IDs 870 00:37:21,926 --> 00:37:22,776 of all of your friends. 871 00:37:23,046 --> 00:37:25,856 And we can then use those IDs to look up your friends' names 872 00:37:25,856 --> 00:37:28,406 and profile pictures and their courses as well. 873 00:37:28,686 --> 00:37:30,676 So if nothing else if you're just curious 874 00:37:30,676 --> 00:37:32,836 as to what Facebook actually makes available 875 00:37:32,836 --> 00:37:35,696 to people you can play around with their tutorials online 876 00:37:36,256 --> 00:37:38,916 and see just how much information is being shared and, 877 00:37:38,916 --> 00:37:42,246 frankly, how much data we even have on you just 878 00:37:42,246 --> 00:37:44,986 because you've logged into your Facebook account. 879 00:37:44,986 --> 00:37:47,106 So that little warning that freaks you out 880 00:37:47,106 --> 00:37:49,416 or you completely ignore and just okay these days, 881 00:37:49,676 --> 00:37:53,176 it's actually your consent to giving websites a whole bunch 882 00:37:53,276 --> 00:37:55,686 of data about you in machine readable format 883 00:37:56,356 --> 00:37:58,266 as we'll see quite soon. 884 00:37:59,076 --> 00:37:59,676 Any questions? 885 00:37:59,676 --> 00:38:02,926 Feels like a tough crowd today. 886 00:38:02,926 --> 00:38:04,366 Why don't we take our five minute break here.
887 00:38:04,726 --> 00:38:11,586 Alright, it feels like we definitely have a dead 888 00:38:11,586 --> 00:38:11,966 feel today. 889 00:38:12,056 --> 00:38:14,246 So I'll try to tell a scary story at the end 890 00:38:14,286 --> 00:38:17,356 that affects your privacy and security. 891 00:38:18,516 --> 00:38:21,086 Alright. [Laughter] Alright. 892 00:38:21,376 --> 00:38:26,366 Don't wake that person up in the green there today. 893 00:38:26,366 --> 00:38:28,896 Tell her I say hi. 894 00:38:28,896 --> 00:38:34,906 Alright. [Laughter] Okay, so we promised 895 00:38:34,906 --> 00:38:38,906 that there's this feature to store data on the server even 896 00:38:39,126 --> 00:38:41,636 after a user has gone from one web page to another 897 00:38:41,756 --> 00:38:43,426 and that icon has stopped spinning, right? 898 00:38:43,426 --> 00:38:46,816 We mentioned on Monday that HTTP is stateless which just means 899 00:38:46,816 --> 00:38:49,366 that you don't maintain a persistent connection 900 00:38:49,366 --> 00:38:50,526 to the server usually. 901 00:38:50,756 --> 00:38:54,496 Rather you have to click on a link or submit a form 902 00:38:54,496 --> 00:38:56,346 to actually go from page to page. 903 00:38:56,556 --> 00:38:57,966 Now, there are some exceptions to this. 904 00:38:57,966 --> 00:38:59,916 Facebook itself right now actually uses a lot 905 00:38:59,916 --> 00:39:02,046 of something called AJAX which means a lot 906 00:39:02,046 --> 00:39:05,116 of JavaScript code is constantly querying the server saying do I 907 00:39:05,176 --> 00:39:07,876 have an instant message, do I have an instant message, 908 00:39:07,876 --> 00:39:09,296 do I have a status update or the like. 909 00:39:09,596 --> 00:39:10,896 So not all websites do this, 910 00:39:10,896 --> 00:39:15,216 but most websites do not actually maintain state, or rather, 911 00:39:15,216 --> 00:39:17,586 do not maintain a persistent connection 912 00:39:17,626 --> 00:39:18,306 to the server.
913 00:39:18,456 --> 00:39:19,796 But suppose we want to do that. 914 00:39:19,796 --> 00:39:20,846 Let me go ahead and open 915 00:39:20,846 --> 00:39:24,676 up a file called counter.php in gedit. 916 00:39:24,676 --> 00:39:29,716 And this is a very short program among this week's code 917 00:39:30,136 --> 00:39:31,256 that looks like this. 918 00:39:31,596 --> 00:39:33,986 At the very top I've got some comments, 919 00:39:33,986 --> 00:39:36,406 and then I've got this new function that you may have seen 920 00:39:36,406 --> 00:39:38,106 in problem set seven but just taken 921 00:39:38,106 --> 00:39:39,986 for granted called session start. 922 00:39:40,386 --> 00:39:43,536 Session start is simply a function that PHP uses 923 00:39:43,536 --> 00:39:46,326 to tell the web server give me access 924 00:39:46,326 --> 00:39:50,066 to a special global variable called dollar sign, 925 00:39:50,136 --> 00:39:51,416 underscore, session. 926 00:39:51,416 --> 00:39:53,506 This is another associative array inside 927 00:39:53,506 --> 00:39:55,566 of which you can put anything, for instance, the contents 928 00:39:55,566 --> 00:39:58,126 of someone's shopping cart, their user ID to remember 929 00:39:58,126 --> 00:40:00,786 that they've logged in, any data that you want to persist 930 00:40:01,136 --> 00:40:02,646 from page load to page load. 931 00:40:02,646 --> 00:40:05,166 And, frankly, any website that has users logging 932 00:40:05,166 --> 00:40:08,306 in these days uses sessions so that, again, 933 00:40:08,436 --> 00:40:11,806 the icon can stop spinning and the connection can close 934 00:40:12,006 --> 00:40:14,096 so this is partially for scalability sake. 
935 00:40:14,096 --> 00:40:16,546 If you go to Amazon.com, it might take a second or two 936 00:40:16,546 --> 00:40:17,776 to download the whole web page, 937 00:40:18,036 --> 00:40:20,086 but you the human might spend five seconds, 938 00:40:20,126 --> 00:40:22,856 a minute on that web page, and it would just be a waste 939 00:40:22,856 --> 00:40:26,266 of resources to have your browser constantly connected 940 00:40:26,266 --> 00:40:28,366 to the web server for all of that minute. 941 00:40:28,466 --> 00:40:29,936 So instead the browser disconnects, 942 00:40:29,936 --> 00:40:32,176 you see the content, and then you can click something 943 00:40:32,176 --> 00:40:34,696 or submit something to actually get more content. 944 00:40:34,926 --> 00:40:37,156 So session start ensures that even 945 00:40:37,156 --> 00:40:38,916 if this user disconnects and, frankly, 946 00:40:38,946 --> 00:40:42,656 even if this user closes their laptop lid, walks across campus 947 00:40:42,656 --> 00:40:45,026 and goes back to their dorm and opens the laptop lid, 948 00:40:45,276 --> 00:40:48,646 then it's going to still remember who you are 949 00:40:48,646 --> 00:40:49,706 and that you're logged in. 950 00:40:49,936 --> 00:40:52,936 It's relatively rare that you need to re-log into Facebook 951 00:40:52,936 --> 00:40:55,496 and other sites because they're remembering that's your log in, 952 00:40:55,496 --> 00:40:56,426 especially if you click 953 00:40:56,426 --> 00:40:58,186 that little check box that most sites have. 954 00:40:58,516 --> 00:41:01,456 So this line of code just means give me access 955 00:41:01,456 --> 00:41:03,226 to dollar sign, underscore, session. 956 00:41:03,606 --> 00:41:04,796 So what am I doing next? 957 00:41:04,796 --> 00:41:06,926 Well, we've seen the isset function before. 958 00:41:06,926 --> 00:41:09,756 It just says is this variable set, does it have a value? 
959 00:41:09,986 --> 00:41:13,116 And I'm going to say if isset, dollar sign, underscore, 960 00:41:13,116 --> 00:41:14,766 session, quote unquote counter. 961 00:41:14,966 --> 00:41:17,946 So if this variable has been set what do I want to do? 962 00:41:17,946 --> 00:41:20,136 I want to grab its value, and for reasons we'll see 963 00:41:20,136 --> 00:41:21,496 in a moment I want to store it 964 00:41:21,536 --> 00:41:24,516 in a variable called dollar sign counter, all lower case. 965 00:41:24,916 --> 00:41:27,616 Else, if this variable is not set in the session, 966 00:41:27,856 --> 00:41:31,246 that is we've never seen this user before, go ahead 967 00:41:31,246 --> 00:41:34,356 and initialize this counter variable to zero. 968 00:41:34,536 --> 00:41:36,756 Now, down here we have dollar sign, underscore, session, 969 00:41:36,756 --> 00:41:39,376 quote unquote counter, gets counter plus one. 970 00:41:39,576 --> 00:41:41,686 So this is a counter in the literal sense. 971 00:41:41,686 --> 00:41:43,126 We're doing plus one, plus one, 972 00:41:43,126 --> 00:41:45,826 plus one every time this code is executed, 973 00:41:46,086 --> 00:41:48,616 that is every time this page is loaded. 974 00:41:48,616 --> 00:41:49,396 What does the page do? 975 00:41:49,396 --> 00:41:50,616 It's actually very simple. 976 00:41:50,806 --> 00:41:53,706 You have visited this site some number of times. 977 00:41:54,166 --> 00:41:56,576 I'm kind of regressing back to week two or three 978 00:41:56,576 --> 00:41:59,026 where I cut grammatical corners but so be it. 979 00:41:59,426 --> 00:42:01,916 The point here is that I'm outputting the counter value. 980 00:42:02,186 --> 00:42:02,926 So let's open this up. 981 00:42:02,926 --> 00:42:04,646 Let me go ahead and open up Firefox. 982 00:42:04,956 --> 00:42:09,026 Let me go back to localhost in John Harvard's account 983 00:42:09,026 --> 00:42:11,246 and then open up counter and, okay, 984 00:42:11,246 --> 00:42:12,906 I've visited the site zero times.
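The logic of counter.php as just described can be sketched like this. It is a Python stand-in rather than the lecture's PHP, and it rests on assumptions: a plain dictionary plays the role of the session array, the function name handle_request is invented for the example, and the exact wording of the page is a guess based on the description above.

```python
def handle_request(session):
    """Mimic one page load: read the counter, store counter + 1, render the page."""
    if "counter" in session:
        # The key has been set, so this visitor has been seen before.
        counter = session["counter"]
    else:
        # Never seen this visitor before: start counting from zero.
        counter = 0

    # Remember the incremented value for the next page load.
    session["counter"] = counter + 1

    # Output the value read at the start, so the first visit reports zero.
    return f"You have visited this site {counter} time(s)."


if __name__ == "__main__":
    session = {}  # stands in for the per-visitor session data kept by the server
    for _ in range(3):
        print(handle_request(session))
```

The point of reading into a local variable first is that the page shows the count from before this load, while the session already holds the value for the next one.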
985 00:42:12,936 --> 00:42:13,766 Let me zoom in. 986 00:42:14,066 --> 00:42:17,436 Let me go ahead and hit reload or control R one time, 987 00:42:17,826 --> 00:42:21,176 two times, three times, and this will very quickly get boring, 988 00:42:21,326 --> 00:42:24,046 but notice at the very top of Firefox it is connecting 989 00:42:24,046 --> 00:42:26,036 to the server and then disconnecting, 990 00:42:26,316 --> 00:42:28,386 so it's apparently remembering somehow 991 00:42:28,696 --> 00:42:30,276 that I've been here before. 992 00:42:30,486 --> 00:42:32,716 Now, this is a simple example, but in real websites 993 00:42:32,716 --> 00:42:34,876 like Facebook and the like, it just remembers 994 00:42:34,936 --> 00:42:35,966 that you've logged in. 995 00:42:35,966 --> 00:42:40,526 That user 6,545 is logged in so they don't have to pester me 996 00:42:40,526 --> 00:42:43,116 on every page give me your password, give me your password. 997 00:42:43,286 --> 00:42:45,716 Or, in the case of Amazon, so that they don't forget as you go 998 00:42:45,716 --> 00:42:48,526 from page to page what's already in your shopping cart. 999 00:42:48,526 --> 00:42:51,086 So you could put product IDs, not user IDs, 1000 00:42:51,376 --> 00:42:52,606 in a shopping cart as well. 1001 00:42:52,886 --> 00:42:54,746 So let's see how this actually works. 
1002 00:42:54,746 --> 00:42:57,956 Let me go ahead and open up, well, let me first go ahead 1003 00:42:57,956 --> 00:43:00,746 and clear my cache, and so a good habit 1004 00:43:00,746 --> 00:43:03,066 to get into when doing anything web related 1005 00:43:03,066 --> 00:43:04,256 when writing software is 1006 00:43:04,496 --> 00:43:07,556 to clear your browser's cache constantly just so that 1007 00:43:07,786 --> 00:43:09,906 if you already changed some code on the server 1008 00:43:10,026 --> 00:43:12,976 but the browser didn't realize for efficiency reasons, 1009 00:43:12,976 --> 00:43:15,106 this way you're telling it to re-download the code. 1010 00:43:15,106 --> 00:43:16,736 So I'm going to go ahead and clear now. 1011 00:43:17,026 --> 00:43:19,996 I'm going to go ahead and open up Firefox anew. 1012 00:43:19,996 --> 00:43:23,216 And I'm going to visit this URL, but before I hit enter I'm going 1013 00:43:23,216 --> 00:43:24,506 to click this guy up here. 1014 00:43:24,506 --> 00:43:27,206 So pre-installed in the appliance is that Firebug tool. 1015 00:43:27,206 --> 00:43:29,826 If I click this it's going to open here at the bottom, 1016 00:43:30,446 --> 00:43:33,736 and what I'm going to do is scroll not 1017 00:43:33,736 --> 00:43:37,256 to the HTML part this time but to this tab, the Net tab. 1018 00:43:37,466 --> 00:43:39,296 So by default it's off for performance. 1019 00:43:39,346 --> 00:43:40,936 I'm going to click enable the Net tab, 1020 00:43:40,936 --> 00:43:43,636 and now I just get another tab whose purpose in life is going 1021 00:43:43,636 --> 00:43:47,686 to be to sniff all of my HTTP traffic very similar to what 1022 00:43:47,686 --> 00:43:51,206 that Live HTTP Headers plugin did a while ago for us. 1023 00:43:51,606 --> 00:43:52,076 So here we go. 1024 00:43:52,076 --> 00:43:55,266 I'm going to go ahead and hit enter and, voila, 1025 00:43:55,266 --> 00:43:57,516 apparently I've only visited one web page.
1026 00:43:57,756 --> 00:43:58,966 It's called counter.php. 1027 00:43:58,966 --> 00:44:02,336 It used HTTP GET so there's no post involved. 1028 00:44:02,626 --> 00:44:04,326 It was on this server called local host. 1029 00:44:04,326 --> 00:44:07,156 The file that came back was 140 bytes. 1030 00:44:07,456 --> 00:44:11,196 The IP address that it's on is apparently 127.0.0.1. 1031 00:44:11,446 --> 00:44:14,416 That is the numeric synonym for quote, unquote local host, 1032 00:44:14,416 --> 00:44:17,316 and port 80 means web server. 1033 00:44:17,316 --> 00:44:17,736 That's all. 1034 00:44:17,736 --> 00:44:19,536 So we're seeing all these fundamentals here. 1035 00:44:19,776 --> 00:44:21,606 So let me go ahead and expand this thing, 1036 00:44:21,726 --> 00:44:24,636 and we'll see exactly what was sent from client to server. 1037 00:44:24,636 --> 00:44:27,016 So let me scroll down here, the request headers. 1038 00:44:27,286 --> 00:44:31,736 So in addition to sending that GET request, it included, 1039 00:44:31,736 --> 00:44:33,866 my browser, all this information, 1040 00:44:33,866 --> 00:44:36,776 the name of the server it's contacting, the user agent 1041 00:44:36,776 --> 00:44:39,706 which is the browser, so this cryptic string uniquely 1042 00:44:39,706 --> 00:44:41,716 identifies this version of Firefox 1043 00:44:41,716 --> 00:44:43,516 on this version of Fedora Linux. 
1044 00:44:43,796 --> 00:44:45,916 Then there's some slightly uninteresting stuff 1045 00:44:45,916 --> 00:44:49,546 about what the browser supports, and it also mentions here, oh, 1046 00:44:49,546 --> 00:44:52,826 I accept gzip, so one way that web browsers 1047 00:44:52,826 --> 00:44:55,666 and servers save time is they compress information, 1048 00:44:55,666 --> 00:44:59,086 HTML on the fly automatically, so this is the browser's way 1049 00:44:59,086 --> 00:45:01,776 of saying, hey, I can support compression if you want 1050 00:45:01,776 --> 00:45:04,016 to compress the responses you're going to send me. 1051 00:45:04,306 --> 00:45:05,676 So what did the server send back? 1052 00:45:05,776 --> 00:45:06,986 Well, here are the server's headers. 1053 00:45:07,196 --> 00:45:09,096 The server announces the date and time 1054 00:45:09,096 --> 00:45:10,586 in Greenwich Mean Time here. 1055 00:45:10,776 --> 00:45:13,656 It mentions what server software we're running. 1056 00:45:13,866 --> 00:45:16,476 It mentions the version of the server software we're running 1057 00:45:16,716 --> 00:45:18,646 which is actually a potential security hole, 1058 00:45:18,646 --> 00:45:20,086 but in the appliance we leave everything 1059 00:45:20,086 --> 00:45:21,336 on for debugging purposes. 1060 00:45:21,596 --> 00:45:24,346 This is also a potential security concern: 1061 00:45:24,346 --> 00:45:26,586 the server is also very freely saying, 1062 00:45:26,896 --> 00:45:28,846 by the way, I have PHP installed. 1063 00:45:28,846 --> 00:45:31,806 Moreover I have PHP 5.3.8 installed. 1064 00:45:32,386 --> 00:45:35,076 Why is this probably not the best practice for a server 1065 00:45:35,076 --> 00:45:35,976 in general to do this security wise? 1066 00:45:36,516 --> 00:45:43,546 [ Inaudible Audience Comment] 1067 00:45:44,046 --> 00:45:45,926 >> So it's saying you could inject code potentially.
1068 00:45:45,926 --> 00:45:47,566 So, one, you're obviously revealing 1069 00:45:47,566 --> 00:45:50,436 that you're running PHP as opposed to other languages, 1070 00:45:50,556 --> 00:45:53,806 and it's not always obvious from the URL. 1071 00:45:53,806 --> 00:45:56,996 Two, suppose that there's some bug discovered 1072 00:45:56,996 --> 00:45:59,776 in PHP's interpreter, and so some big announcement goes 1073 00:45:59,776 --> 00:46:02,386 out on the internet on various email security lists and says, 1074 00:46:02,386 --> 00:46:05,886 hey, everyone beware PHP 5.3.8 is buggy. 1075 00:46:06,096 --> 00:46:07,726 There's this security hole in it, 1076 00:46:07,726 --> 00:46:09,306 here's how people can take advantage 1077 00:46:09,306 --> 00:46:10,466 of it so be sure to update. 1078 00:46:10,786 --> 00:46:13,026 Well, the whole world is not going to update instantaneously. 1079 00:46:13,026 --> 00:46:15,056 So all the bad guys have to do now is troll 1080 00:46:15,056 --> 00:46:18,226 around on the internet looking for IP addresses of web servers 1081 00:46:18,226 --> 00:46:20,546 that say, hey, I'm running that buggy version of PHP, 1082 00:46:20,546 --> 00:46:23,246 here is my proclamation thereof. 1083 00:46:23,246 --> 00:46:25,766 So bad practice in production servers, 1084 00:46:25,766 --> 00:46:27,106 but for development purposes 1085 00:46:27,106 --> 00:46:29,106 on an appliance it's okay in this case. 1086 00:46:29,476 --> 00:46:31,166 But here's the magic. 1087 00:46:31,786 --> 00:46:32,526 Set cookie. 1088 00:46:32,526 --> 00:46:36,146 So you might generally know that cookies are some kinds of files 1089 00:46:36,146 --> 00:46:39,076 or information planted by web servers on your browser. 1090 00:46:39,306 --> 00:46:40,146 How is that done? 1091 00:46:40,416 --> 00:46:41,746 Literally as simple as this. 
1092 00:46:42,236 --> 00:46:44,566 When you request any web page from a server, 1093 00:46:44,566 --> 00:46:46,996 a response comes back that includes all 1094 00:46:46,996 --> 00:46:49,256 of this juicy information, date and time, server name 1095 00:46:49,256 --> 00:46:52,376 and so forth, but also potentially an HTTP header 1096 00:46:52,576 --> 00:46:54,516 that literally says set cookie, 1097 00:46:54,746 --> 00:46:58,116 and then it gives the cookie a name and a value 1098 00:46:58,196 --> 00:46:59,996 and then potentially some other details. 1099 00:47:00,266 --> 00:47:02,366 So what's really happened here is that PHP, 1100 00:47:02,366 --> 00:47:05,056 thanks to the session start function, 1101 00:47:05,286 --> 00:47:10,776 has automatically sent a cookie to my browser called PHPSESSID, 1102 00:47:10,776 --> 00:47:12,146 which is just the convention, 1103 00:47:12,416 --> 00:47:14,046 and then a big crazy value of this. 1104 00:47:14,636 --> 00:47:18,406 And this is essentially a pseudo random string that's ideally 1105 00:47:18,406 --> 00:47:22,096 supposed to be unique associated and given only to me. 1106 00:47:22,516 --> 00:47:24,766 So henceforth the connection closes, 1107 00:47:24,846 --> 00:47:27,146 and there is no spinning icon or anything. 1108 00:47:27,146 --> 00:47:28,826 I've visited the website zero times, 1109 00:47:29,096 --> 00:47:31,206 but notice if I reload this page now, 1110 00:47:31,766 --> 00:47:34,626 and let me collapse this back to just one line, 1111 00:47:34,656 --> 00:47:36,926 if I reload this page it does indeed say 1112 00:47:36,926 --> 00:47:39,066 at top left I've now visited one time, 1113 00:47:39,316 --> 00:47:40,806 but let's now look at this request. 1114 00:47:41,096 --> 00:47:43,536 In this request, in the request header, 1115 00:47:44,306 --> 00:47:47,496 notice what my browser has perhaps presumptuously sent 1116 00:47:47,496 --> 00:47:48,216 to the server. 
1117 00:47:48,466 --> 00:47:51,626 It's sending not a set cookie header, just a cookie header. 1118 00:47:52,016 --> 00:47:53,566 And so my browser is saying, hey, 1119 00:47:53,566 --> 00:47:55,736 by the way the last time I visited you, 1120 00:47:55,876 --> 00:47:58,376 you gave me a cookie called PHPSESSID 1121 00:47:58,566 --> 00:48:01,166 and here is the value that it was equal to. 1122 00:48:01,326 --> 00:48:02,836 So you can think of this as a handstamp 1123 00:48:02,836 --> 00:48:05,356 at like an amusement park or a club where they stamp your hand 1124 00:48:05,356 --> 00:48:08,386 to indicate that you've paid or that you're 21 plus. 1125 00:48:08,386 --> 00:48:10,696 And if you're ever asked this question again you don't have 1126 00:48:10,696 --> 00:48:12,286 to take out your ID or your ticket, 1127 00:48:12,286 --> 00:48:13,896 you instead just show your handstamp. 1128 00:48:14,136 --> 00:48:16,426 And this is really what's going on with cookies. 1129 00:48:16,426 --> 00:48:19,016 You're saying I've been here before, I've been here before. 1130 00:48:19,136 --> 00:48:21,726 And what's stamped on your hand is this really big number, 1131 00:48:21,956 --> 00:48:24,486 because what the web server then does is it stores 1132 00:48:24,486 --> 00:48:28,116 in its own database that big number and associates 1133 00:48:28,116 --> 00:48:30,376 with that big number the contents 1134 00:48:30,376 --> 00:48:33,086 of dollar sign, underscore, session.
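The handstamp exchange above, a Set-Cookie header on the first response and a Cookie header on every later request, can be sketched roughly as follows. This is a simplified Python illustration, not how PHP implements sessions internally: the SESSIONS dictionary and the respond function are invented for the example, and the token format (32 hexadecimal characters) is an assumption.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side store: session ID -> that visitor's session data.
SESSIONS = {}


def respond(cookie_header):
    """Handle one request.

    Returns (session data, value for a Set-Cookie header or None).
    `cookie_header` is the raw Cookie header sent by the browser, or None.
    """
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("PHPSESSID")
    if morsel is not None and morsel.value in SESSIONS:
        # The browser showed a handstamp we issued: reuse its session.
        return SESSIONS[morsel.value], None

    # No (valid) handstamp yet: issue a new hard-to-guess ID.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {}
    return SESSIONS[session_id], f"PHPSESSID={session_id}; path=/"


if __name__ == "__main__":
    session, set_cookie = respond(None)           # first visit
    print("Set-Cookie:", set_cookie)
    session["counter"] = 1
    cookie_header = set_cookie.split(";")[0]      # what the browser sends back
    same_session, _ = respond(cookie_header)      # second visit
    print(same_session)
```

Only the ID travels between browser and server; the session data itself never leaves the server.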
1135 00:48:33,536 --> 00:48:37,166 So that big associative array in which you can put user IDs, 1136 00:48:37,166 --> 00:48:40,916 friendship IDs, shopping cart contents, it's stored somehow 1137 00:48:40,916 --> 00:48:44,036 on the server, and it's associated with that big number 1138 00:48:44,256 --> 00:48:47,876 so that the browser and server in the future assume that anyone 1139 00:48:47,876 --> 00:48:51,586 who presents this number must be the guy that I gave this number 1140 00:48:51,586 --> 00:48:55,156 to in the first place so, voila, let me let you pass. 1141 00:48:56,506 --> 00:48:59,846 So it works really nicely, and it doesn't require 1142 00:48:59,846 --> 00:49:01,926 that you maintain a constant connection to the server. 1143 00:49:02,016 --> 00:49:03,906 It doesn't require that you physically remain 1144 00:49:03,906 --> 00:49:05,406 in the amusement park or club. 1145 00:49:05,596 --> 00:49:07,826 You can get back in just by showing this handstamp. 1146 00:49:08,566 --> 00:49:11,096 So where is the opportunity for bad guys now? 1147 00:49:11,096 --> 00:49:15,676 How do you exploit this very useful HTTP feature? 1148 00:49:16,776 --> 00:49:18,976 What can you do? 1149 00:49:19,516 --> 00:49:22,556 [ Inaudible Audience Comment ] 1150 00:49:23,056 --> 00:49:23,816 >> So getting the cookie, right? 1151 00:49:23,816 --> 00:49:27,636 So this cookie is being sent back and forth from server 1152 00:49:27,636 --> 00:49:29,206 to client, but it's being sent over HTTP, 1153 00:49:29,206 --> 00:49:31,396 specifically over port 80. 1154 00:49:31,676 --> 00:49:34,476 And 80 is generally not encrypted, 1155 00:49:34,476 --> 00:49:38,916 443 or URLs that start with HTTPS are encrypted. 
1156 00:49:39,096 --> 00:49:40,806 So what this literally means is 1157 00:49:40,806 --> 00:49:43,476 that if my browser is requesting this web page 1158 00:49:43,586 --> 00:49:45,836 and then getting a response, and the server is not 1159 00:49:45,836 --> 00:49:48,056 in the appliance, on the same physical computer, 1160 00:49:48,056 --> 00:49:50,636 but it's a normal server on the internet, Facebook.com, 1161 00:49:50,666 --> 00:49:53,626 Amazon.com, what this means is that that server is replying 1162 00:49:53,626 --> 00:49:57,296 and saying here is your big secret number, send this back 1163 00:49:57,296 --> 00:49:58,676 to us every time you revisit. 1164 00:49:58,846 --> 00:50:02,746 But if you're using HTTP you're literally showing your hand 1165 00:50:02,746 --> 00:50:05,736 to everyone on the internet saying here's my big secret 1166 00:50:05,736 --> 00:50:06,196 number, right? 1167 00:50:06,196 --> 00:50:07,366 It's not really a secret. 1168 00:50:07,496 --> 00:50:11,436 Now, there are solutions to this, namely HTTPS which means 1169 00:50:11,476 --> 00:50:14,776 if you know just casually as users it encrypts everything. 1170 00:50:14,776 --> 00:50:17,286 And that also encrypts your handstamp and all 1171 00:50:17,286 --> 00:50:18,446 of these cookies that are involved. 1172 00:50:18,636 --> 00:50:21,746 But most, or many, I dare say most, websites do not encrypt 1173 00:50:21,746 --> 00:50:22,706 traffic by default. 1174 00:50:22,886 --> 00:50:24,336 In fact, not until a few months ago, 1175 00:50:24,336 --> 00:50:27,726 only a few months ago, did Facebook itself start offering 1176 00:50:27,726 --> 00:50:31,306 across the board this ability to use HTTPS 1177 00:50:31,786 --> 00:50:32,936 for all of your connections.
1178 00:50:33,116 --> 00:50:34,456 And so as I think I mentioned a week 1179 00:50:34,456 --> 00:50:38,146 or two ago it was perfect timing in week eight of CS50 1180 00:50:38,146 --> 00:50:41,326 in fall 2010 this researcher released a tool, 1181 00:50:41,326 --> 00:50:44,716 a plugin for Firefox called Firesheep. 1182 00:50:45,096 --> 00:50:48,476 And this tool simply automated the process of looking 1183 00:50:48,476 --> 00:50:50,886 around the room with WiFi capabilities and saying 1184 00:50:50,926 --> 00:50:53,026 who is showing their handstamp at this point in time? 1185 00:50:53,346 --> 00:50:55,846 And the fellow special cased websites, popular ones 1186 00:50:55,846 --> 00:50:58,666 like Facebook and Gmail and Twitter and a bunch of others. 1187 00:50:58,866 --> 00:51:02,126 So he was looking for patterns like Gmail.com, Facebook.com, 1188 00:51:02,126 --> 00:51:05,916 and any time he saw in the room someone with a handstamp for 1189 00:51:05,916 --> 00:51:09,136 or from that domain name, he would then listen to it 1190 00:51:09,256 --> 00:51:11,016 and store it in the program's memory 1191 00:51:11,266 --> 00:51:13,346 and then display it to the user. 1192 00:51:13,586 --> 00:51:15,756 And so what you see when using a tool 1193 00:51:15,756 --> 00:51:18,266 like this is a little something like this. 1194 00:51:19,026 --> 00:51:20,736 So here is a screenshot. 1195 00:51:20,736 --> 00:51:23,406 Things have gotten a little more locked down now both in terms 1196 00:51:23,406 --> 00:51:27,716 of WiFi and also in terms of this tool working.
1197 00:51:27,716 --> 00:51:29,736 But what you would see is you load up Firefox, 1198 00:51:29,796 --> 00:51:31,356 and we actually did this in class if you want 1199 00:51:31,356 --> 00:51:32,326 to watch last year's video, 1200 00:51:32,566 --> 00:51:35,626 you click the start capturing button, and then what you see 1201 00:51:35,626 --> 00:51:38,406 within a few seconds are all of the unsuspecting people 1202 00:51:38,406 --> 00:51:40,886 who are logged into these various websites. 1203 00:51:40,886 --> 00:51:43,246 So these are actual screenshots from the fellow's presentation 1204 00:51:43,506 --> 00:51:46,506 where he logged into one of his buddy's accounts. 1205 00:51:46,776 --> 00:51:49,756 But rewind about 12 months to CS50; this was great fun. 1206 00:51:49,756 --> 00:51:53,216 We had like a list of 33 CS50 students who were using 1207 00:51:53,216 --> 00:51:55,256 like Facebook at that moment in time. 1208 00:51:55,496 --> 00:51:56,976 And what this tool allows you to do is 1209 00:51:56,976 --> 00:51:59,026 with a single double click log 1210 00:51:59,026 --> 00:52:01,676 into that person's Facebook account as them. 1211 00:52:01,966 --> 00:52:03,476 So how does this actually work? 1212 00:52:03,476 --> 00:52:05,286 It's actually really simple, right? 1213 00:52:05,286 --> 00:52:07,176 If all of the internet trusts 1214 00:52:07,266 --> 00:52:09,946 that you'll just present this handstamp, this cookie, 1215 00:52:09,946 --> 00:52:12,846 to prove I'm already logged in, you don't need to ask me to log 1216 00:52:12,846 --> 00:52:14,756 in again, well anyone who can sniff 1217 00:52:14,826 --> 00:52:18,036 that cookie then can present him or herself as that person. 1218 00:52:18,036 --> 00:52:20,146 And they don't know your Facebook password 1219 00:52:20,146 --> 00:52:22,636 or your Gmail password but that doesn't matter 1220 00:52:22,636 --> 00:52:24,436 because they're already into your account.
1221 00:52:24,616 --> 00:52:26,246 So double click that name and what you would see 1222 00:52:26,246 --> 00:52:28,666 in the browser is, voila, Ian Gallagher's account, 1223 00:52:28,666 --> 00:52:30,666 or in last year's case one of our TF's accounts 1224 00:52:30,836 --> 00:52:32,936 at which point I could proceed and post on her wall 1225 00:52:32,936 --> 00:52:36,496 or poke people or do anything because Facebook is not going 1226 00:52:36,496 --> 00:52:38,106 to re-authenticate me. 1227 00:52:38,106 --> 00:52:39,736 It's going to assume, hey, you have the cookie 1228 00:52:39,736 --> 00:52:41,696 and it must, in fact, be you. 1229 00:52:42,376 --> 00:52:44,796 So it seems that this wonderfully useful system 1230 00:52:44,796 --> 00:52:45,996 that like every website 1231 00:52:45,996 --> 00:52:48,726 on the internet uses today is fundamentally flawed. 1232 00:52:48,726 --> 00:52:50,586 In fact, this is still possible. 1233 00:52:50,586 --> 00:52:53,516 I think, unless Facebook's made it the default, you have to go 1234 00:52:53,516 --> 00:52:56,066 into account at the top right and tinker 1235 00:52:56,066 --> 00:52:57,456 around with the security settings 1236 00:52:57,456 --> 00:52:59,996 and say always use secure connections. 1237 00:53:00,246 --> 00:53:03,336 Gmail started doing this a few months back as a result of some 1238 00:53:03,336 --> 00:53:04,946 of the hacking incidents that they had. 1239 00:53:05,186 --> 00:53:08,946 But most websites don't do this partly for performance reasons, 1240 00:53:08,946 --> 00:53:12,686 partly for naivety reasons whereby just the consumers 1241 00:53:12,686 --> 00:53:14,926 aren't demanding this or they're just not cognizant of this.
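Why a sniffed cookie is enough can be seen in a toy model of the server's lookup. This is a deliberately simplified Python sketch, not any real site's code: the who_is_logged_in function, the sample session ID, and the layout of the sessions dictionary are all made up for illustration. The user ID 6545 echoes the example number used earlier in lecture.

```python
def who_is_logged_in(sessions, cookie_header):
    """Return the user ID tied to the PHPSESSID in a Cookie header, else None."""
    for part in cookie_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name == "PHPSESSID":
            # The lookup depends only on the value presented, not on who sent it.
            return sessions.get(value, {}).get("user_id")
    return None


if __name__ == "__main__":
    # Hypothetical server-side state: one logged-in user.
    sessions = {"d41d8cd98f00b204": {"user_id": 6545}}

    victim_header = "PHPSESSID=d41d8cd98f00b204"
    # Someone on the same unencrypted network copies the header verbatim.
    sniffed_header = victim_header

    print(who_is_logged_in(sessions, victim_header))
    print(who_is_logged_in(sessions, sniffed_header))
```

Both requests look identical to the server, which is why the defenses discussed next focus on keeping the cookie from being seen in transit rather than on changing the lookup.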
1242 00:53:14,926 --> 00:53:18,466 In fact, the only websites that really tend to enforce SSL, 1243 00:53:18,556 --> 00:53:23,136 the HTTPS-type sites all the time, are now Facebook and Gmail 1244 00:53:23,176 --> 00:53:26,376 and banks and cs50.net since we got bitten, too, 1245 00:53:26,376 --> 00:53:27,516 shortly after that presentation. 1246 00:53:28,916 --> 00:53:30,556 [Laughter] So what's the takeaway here? 1247 00:53:30,746 --> 00:53:32,886 Well, how do you solve this problem, right? 1248 00:53:32,886 --> 00:53:35,986 Like we introduced this last year, albeit at the risk 1249 00:53:35,986 --> 00:53:37,966 of teaching 500 students how to then hack 1250 00:53:37,966 --> 00:53:41,586 into their roommates' computers that day, for good purposes. 1251 00:53:41,586 --> 00:53:43,036 How do you defend against this? 1252 00:53:43,036 --> 00:53:44,426 Because it is still possible. 1253 00:53:44,426 --> 00:53:46,386 And if you're sitting somewhere on campus, if you're sitting 1254 00:53:46,386 --> 00:53:49,666 in Starbucks, an airport or even your own home with siblings, 1255 00:53:49,876 --> 00:53:52,726 you are vulnerable to interceptions of data especially 1256 00:53:52,726 --> 00:53:55,416 if your WiFi connection is not secure. 1257 00:53:55,416 --> 00:53:57,376 If you don't see that little padlock icon or have 1258 00:53:57,376 --> 00:53:58,596 to type a password to get 1259 00:53:58,596 --> 00:54:01,146 onto the WiFi network you're particularly vulnerable. 1260 00:54:01,616 --> 00:54:02,626 So what can you do? 1261 00:54:02,816 --> 00:54:05,726 Well, websites unfortunately have to do most 1262 00:54:05,726 --> 00:54:06,756 of the solving for us. 1263 00:54:06,756 --> 00:54:09,146 But at least at home, back home home, 1264 00:54:09,386 --> 00:54:11,176 if you control your wireless network you can 1265 00:54:11,176 --> 00:54:12,866 at least turn on that padlock icon. 
1266 00:54:13,146 --> 00:54:14,136 It's not just for the sake 1267 00:54:14,136 --> 00:54:16,066 of keeping random people outside your house 1268 00:54:16,066 --> 00:54:17,656 from using your WiFi for free. 1269 00:54:17,896 --> 00:54:19,726 It genuinely is a security concern. 1270 00:54:19,726 --> 00:54:23,976 They could not just log into your Facebook account 1271 00:54:23,976 --> 00:54:26,096 but see all of the traffic you're sending, right? 1272 00:54:26,096 --> 00:54:27,606 If you haven't realized, most 1273 00:54:27,606 --> 00:54:30,936 of the instant messages you send are typically not encrypted 1274 00:54:30,936 --> 00:54:33,546 unless you're using Gmail these days over SSL. 1275 00:54:33,766 --> 00:54:36,446 So all of those IMs you're sending friends, 1276 00:54:36,446 --> 00:54:38,926 all of those emails you're sending friends, are still going 1277 00:54:38,926 --> 00:54:40,676 out on the internet in the clear. 1278 00:54:40,676 --> 00:54:43,176 Even with Gmail, if you send from your college.harvard.edu 1279 00:54:43,176 --> 00:54:46,306 to your personal Gmail account, even if you're using SSL, 1280 00:54:46,486 --> 00:54:48,906 the moment you hit send if you're emailing an outsider 1281 00:54:48,906 --> 00:54:51,646 on the internet who is not using Gmail servers, bam, 1282 00:54:51,646 --> 00:54:52,846 that email is out there. 1283 00:54:52,846 --> 00:54:54,466 And anyone in theory between points A 1284 00:54:54,466 --> 00:54:56,896 and B can see all of your traffic. 1285 00:54:56,896 --> 00:55:00,736 So WPA2 refers to an encryption protocol that you can use 1286 00:55:00,736 --> 00:55:03,416 on your own home wireless network. 1287 00:55:03,416 --> 00:55:08,316 Using HTTPS is probably the most resilient approach 1288 00:55:08,316 --> 00:55:10,036 to protecting your Facebook account. 1289 00:55:10,226 --> 00:55:12,716 But that's only because they now support this.
1290 00:55:12,716 --> 00:55:17,896 A lot of websites again assume or face the reality that turning 1291 00:55:17,896 --> 00:55:20,776 on SSL might just be expensive computationally. 1292 00:55:20,776 --> 00:55:21,856 And this isn't always the case. 1293 00:55:21,856 --> 00:55:23,776 It depends on the hardware and software you have. 1294 00:55:24,126 --> 00:55:27,316 But in theory if you need to not just send data 1295 00:55:27,526 --> 00:55:29,726 but encrypt the data and then send that data, 1296 00:55:30,036 --> 00:55:32,336 just intuitively that's going to take some CPU cycles. 1297 00:55:32,336 --> 00:55:34,006 And even if it's just a few cycles, 1298 00:55:34,256 --> 00:55:35,536 you only have a finite number. 1299 00:55:35,796 --> 00:55:39,336 So that means if you need to do more work per user, well, 1300 00:55:39,336 --> 00:55:41,076 you're going to have to have more servers 1301 00:55:41,076 --> 00:55:43,246 to sustain the same number of users potentially. 1302 00:55:43,556 --> 00:55:44,816 But there's also these other tools 1303 00:55:44,816 --> 00:55:45,726 so that you're well equipped. 1304 00:55:45,726 --> 00:55:49,286 Force-TLS is a plugin for Firefox if that's your browser 1305 00:55:49,286 --> 00:55:53,406 of choice that will try to force all of your connections to SSL 1306 00:55:53,406 --> 00:55:55,256 if the website actually supports it. 1307 00:55:55,476 --> 00:55:59,076 Another one from the EFF it's called HTTPS Everywhere 1308 00:55:59,286 --> 00:56:02,036 which does something quite similar as well. 1309 00:56:02,086 --> 00:56:04,196 So with these mechanisms and honestly just a bit 1310 00:56:04,196 --> 00:56:05,876 of savvy you can protect yourself. 1311 00:56:05,876 --> 00:56:07,616 But another very robust mechanism 1312 00:56:07,906 --> 00:56:09,366 that assumes you have access 1313 00:56:09,406 --> 00:56:11,356 to something special is that of a VPN. 1314 00:56:11,666 --> 00:56:13,966 So Harvard has something called a Virtual Private Network. 
1315 00:56:13,966 --> 00:56:14,986 A lot of companies have this. 1316 00:56:14,986 --> 00:56:16,306 You can even set one up in your home. 1317 00:56:16,596 --> 00:56:19,246 So henceforth if you're ever particularly worried 1318 00:56:19,516 --> 00:56:21,916 about doing something sensitive like checking mail 1319 00:56:21,916 --> 00:56:24,286 or financial data in a public space, whether it's 1320 00:56:24,286 --> 00:56:27,456 at the Harvard University SSID or if it's in Starbucks 1321 00:56:27,456 --> 00:56:30,896 or the like, even if that wireless access point does not 1322 00:56:30,896 --> 00:56:35,896 offer encryption, you can connect to VPN.FAS.Harvard.edu. 1323 00:56:35,896 --> 00:56:38,376 You'll get redirected to an SSL connection. 1324 00:56:38,606 --> 00:56:41,296 You can then log into Harvard's VPN. 1325 00:56:41,506 --> 00:56:43,836 Your computer will then be given an IP address 1326 00:56:43,976 --> 00:56:47,086 on Harvard's network, not on Starbucks' network or the like, 1327 00:56:47,426 --> 00:56:49,976 and henceforth all of your traffic will be encrypted 1328 00:56:50,166 --> 00:56:51,836 between you and Harvard. 1329 00:56:51,896 --> 00:56:54,446 After that who knows where it's going to go. 1330 00:56:54,446 --> 00:56:56,976 And, frankly, after that Harvard knows everything you're doing, 1331 00:56:56,976 --> 00:56:59,026 so just realize when you connect to Harvard now all 1332 00:56:59,026 --> 00:57:00,686 of their servers have access to your data. 1333 00:57:00,996 --> 00:57:03,866 But if you're at least trying to prevent the sketchy guy next 1334 00:57:03,866 --> 00:57:06,306 to you in Starbucks from looking over your shoulder virtually 1335 00:57:06,306 --> 00:57:08,446 and sniffing your passwords and poking your friends, 1336 00:57:08,446 --> 00:57:11,306 you can at least secure yourself with any 1337 00:57:11,306 --> 00:57:12,976 of these particular mechanisms. 1338 00:57:13,986 --> 00:57:16,976 So I promised a sketchy security-oriented ending.
1339 00:57:16,976 --> 00:57:18,876 Why don't we go ahead and end on that note today early, 1340 00:57:19,006 --> 00:57:20,446 and I'll take one-on-one questions up here. 1341 00:57:20,726 --> 00:57:21,706 See you next week.