This is CS50, and this is the start of week 9. Today we focus in particular on design, no longer in the context of C but in the context of PHP and a bit of SQL and a bit of JavaScript, particularly with an eye toward both pset 7 and your final project. In fact, if you are at that point in your final project where, presumably as of an hour or so ago, you at least started to give it some thought and you're thinking you'd like to collaborate with 1 or 2 classmates, and you're having trouble connecting with said classmates, feel free to fill out the form at cs50.net/partners/form. It just asks you who you are, what kind of project you're thinking about, and where you live, just for logistical reasons. And then if you want to keep an eye on the spreadsheet URL there over the next week or so, you can see a read-only version of the Google doc in which we're collecting that information. So if you want to work with someone, by all means feel free to reach out to people via that mechanism. But the majority of folks do work solo. That's totally fine. So don't feel that this is in any way obligatory.

On Friday it was just me and a few of the team in here, empty theater for the most part. There were 3 tourists sitting up there, so that was a little awkward. What we talked about was databases and we talked about pset 7 a little bit. And if you didn't happen to catch that on video just yet, that's fine. I'll try to define any terms that we would otherwise take for granted based on Friday's lecture.
But today we're going to try to get you to the point of not just being able to do something like pset 7 but really understanding what's going on underneath the hood, particularly some of the abstractions that we put in place in the functions.php file to make your lives a bit easier, so that when the training wheels come off in a few weeks you can still survive in the real world and do this stuff without any CS50 framework underneath you. This $_SESSION, for those of you who are familiar or who already caught the video on Friday, what does SESSION let us do in a PHP-based web application? This is a superglobal variable, which means it's similar in spirit to GET and POST and a few others, but what is this thing useful for?

What is SESSION used for? Yeah. [student] Logging in. Sorry? [student] Logging in. Logging in. Indeed. In pset 7 we're using this SESSION superglobal to facilitate logging in. And what's nice about this superglobal is that it's an associative array. An associative array, recall, is just an array but whose indices no longer have to be numbers like 0, 1, 2. They can be numbers or they can even be strings.
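To make the idea of an associative array concrete, here is a minimal sketch in PHP. The variable names and values are made up for illustration; only the square-bracket syntax and the mix of numeric and string keys are the point.

```php
<?php

// A numerically indexed array: the keys are 0, 1, 2.
$numbers = ["zero", "one", "two"];

// An associative array: the keys are strings chosen by the programmer.
// (These particular keys and values are hypothetical examples.)
$user = ["id" => 123, "name" => "David"];

// Both kinds of array use the same square-bracket syntax for lookups.
$first = $numbers[0];
$id = $user["id"];

// Adding a new key is just an assignment.
$user["email"] = "someone@example.com";
```

A superglobal like $_SESSION is indexed the same way, so reading a value out of it looks just like the lookup of "id" above.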
And so if you've dived into pset 7 yet, you may recall that we are storing a key called ID inside of this associative array whose value is something like 123--whatever the currently logged in user's ID is. The motivation for this is that even after the user has visited localhost or my website more generally and then they've logged in, even if they don't click a link or return to my website for 5 minutes or even an hour or even a day but they leave their browser window open, via this superglobal I can remember that they are logged in.

In other words, it allows me to store slightly long term anything I want about a user. And you can think of it really as the incarnation of a shopping cart. Places like Amazon obviously let you put things into a shopping cart, but HTTP, the protocol that powers the Web, is stateless in the sense that when you visit a website, for the most part you don't have some constant network connection between your browser and the server. As soon as you've downloaded the HTML and the JPEGs and the GIFs and all that, the connection goes away and you just have a copy of the HTML and whatnot from the server. But if the server wants to remember something about you, the burden is on the server to actually record that information.
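As a rough sketch of what remembering a login might look like, assuming the key is spelled "id" in lowercase: the three helper functions below are hypothetical and are not pset 7's actual code. Each takes the session array as a parameter so the same logic can be run against $_SESSION in a real page or against a plain array when experimenting.

```php
<?php

// Hypothetical helpers for illustration; pset 7's own code may differ.

function remember_login(array &$session, int $id): void
{
    // Store the user's ID under the key "id".
    $session["id"] = $id;
}

function current_user_id(array $session): ?int
{
    // Return the stored ID, or null if nobody has logged in.
    return isset($session["id"]) ? $session["id"] : null;
}

function forget_login(array &$session): void
{
    // Removing the key is effectively logging the user out.
    unset($session["id"]);
}

// In a real page you would call session_start() first and then pass
// $_SESSION, for example: remember_login($_SESSION, 123);
```

The design choice here is only for convenience: passing the array in makes the functions easy to try without a web server.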
And so you the programmer who have control over the server can put most anything you want inside of this superglobal associative array and it will be there the next time the user comes back, whether it's minutes or even days later, unless they close their browser window, at which point SESSION disappears. So it's ephemeral storage, it's non-persistent, and it's meant to go away as soon as the user closes their browser--not just that tab, often the entire browser, thereby effectively logging the user out. So how is this thing actually implemented? Let's take a quick look at a simple example we looked at on Friday. For those unfamiliar, it was as simple as this. This is a web page whose sole purpose in life is to tell me how many times I have visited this page. This is the first time here on Monday that I visited it, so it says 0 times.

But if I start reloading this page, it says 1 time, 2, 3, 4, 5, and this will eventually just keep on counting up, up, up, up, up for each time I actually click Reload on it. So how is this working? Let me go inside of this file called counter.php. The top part of it is all blue comments, but the interesting part is here.
On line 13 we call this function session_start, and that is literally all you need to do if you want to have access to this special superglobal called $_SESSION. That makes it all possible, and we'll see in a moment how that's all possible. In line 16 notice what I'm doing. If the key, called counter--in other words, the index value--"counter" exists inside of this array called SESSION, then what am I doing with it in the line below? What is line 18 doing?

[inaudible student response] What's that? [student] Storing the value. Good. It's storing the value that's in SESSION right now in a new local temporary variable, $counter in all lowercase. Notice that PHP is already being a little lazy here. Notice we don't have any mention of int or float or string or anything like that because PHP is weakly typed, whereby you don't have to specify the type of a variable, and in this case here I've not even declared it yet. I'm declaring it inside of these curly braces and unlike C, this is actually okay. No matter how deeply nested a variable's declaration is in PHP--inside of curly brace, inside of curly brace and the like--it will at that moment in time exist for the remainder of the program, for better or for worse.
So it immediately becomes global as soon as you define it as we're doing here.

Otherwise, if I do not find that there's anything in the SESSION superglobal, I'm apparently initializing this variable counter to 0, thereby just assuming the user has never been here before. And then this of course is incrementing the counter how? I'm updating the value that's inside of this associative array by setting it equal to whatever counter currently is + 1. If I scroll down here to the HTML of the page, it's actually pretty simple. All I have in the body of this page is, "You have visited this site so-and-so times." And this is a PHP construct. If you do <?= followed by a value, it's really equivalent to something like printf, which we've seen many times in C, although as you may know already from the spec in pset 7, print is also a function that just prints something out, it doesn't actually use format codes, and you can actually say echo as well. They're all ever so slightly different even though the net effect is ultimately the same. So this use of the equals sign is just sort of an elegant way of doing it more succinctly than you might otherwise be able to. So that's all this site does. It prints out the value of counter. How is this all actually happening?
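The counting logic just described can be sketched as follows. This is a reconstruction from the description, not the actual contents of counter.php, and the function name count_visit is hypothetical; wrapping the logic in a function simply makes it possible to try it without a browser.

```php
<?php

// A sketch of the counting logic, written as a function so it can be
// exercised without a browser. The function name is hypothetical.
function count_visit(array &$session): int
{
    // If a counter is already stored for this user, read it.
    if (isset($session["counter"])) {
        $counter = $session["counter"];
    } else {
        // Otherwise assume the user has never been here before.
        $counter = 0;
    }

    // Remember, for next time, one more than the current count.
    $session["counter"] = $counter + 1;

    // The page reports the count as it was before this visit.
    return $counter;
}
```

In a real page you would presumably call session_start() and then count_visit($_SESSION), and the HTML would print the returned value. The first call returns 0 and each later call returns one more, which matches the page saying 0 times on the first visit.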
You may recall a week or so ago we started looking underneath the hood at how a web page works by using this Inspector tab.

Chrome has this in the Mac version, the Windows version, and even the Linux version, and Firefox and IE have similar mechanisms whereby you have this built-in debugger inside of the browser. Let's take a look at the following. We've got a whole bunch of tabs here, and recall that the leftmost one is Elements, and no matter how godawful the HTML and JavaScript is in a page, recall that with the Elements tab you can actually navigate the HTML hierarchically and nice and neatly. So if you're trying to learn from a website like Google or Facebook or really any website, realize that you're probably better off looking at the source code this way as opposed to viewing the raw source, which can be a mess, as we've seen especially on Google's site. So if I instead click on the Network tab here, let's see what's going on when I visit this page. First let me clear my cache. I'm going to go into Settings in Chrome and then go to History and then Clear all browsing data.
You might be used to doing this for other purposes, [laughter] but when it comes to developing websites, it's actually useful--if you're laughing you know. [laughter] It's actually really useful when developing websites because the reality is things like cookies and things like cached HTML files, cached JavaScript files can actually become a big headache, because if for whatever reason the browser decides to cache some file and yet you've made changes to that file on the server but the browser hasn't really realized that the file has changed and therefore does not actually re-download it even when you click the Reload button, one of the most surefire ways to just make sure the fault is not with your code, it's with the behavior of the browser, is to go in here in your browser and just clear the entire history so that there's no confusion.

And then if you really want to be paranoid, quit the browser, restart it, and then make sure all is working as expected. So in short, clearing cache is good when doing development. So here we have the Network tab. I previously had visited the site 9 times, but let me go ahead now and click Reload. And I'm back down to 0. Let's actually see how it is that this SESSION superglobal is being implemented.
I'm going to click on the 1 HTTP request that was made, and this debugging window lets me look inside of that. Here I see just the response from the server, which isn't interesting. I've seen this in any number of ways. But what's technically interesting are the headers. If I scroll down here and focus on the request headers and click view source, what I'm going to see is literally the HTTP request that just went from my browser to the server, GET being the operative word and then /counter.php being the file name, HTTP/1.1 just being the version of HTTP that my browser is using. This line here is a little reminder from browser to server what the name of the server is that it wants to talk to. And then the rest of this is sometimes interesting but not relevant right now.

This is just kind of a curiosity. Cryptic though this string is, any time your browser visits a website it is informing the server what browser you're using and what operating system you're using and what version thereof. So if you've ever wondered how websites like CNN and whatnot know what the percentages are of Mac users on the Web, PC users, IE users, Chrome users and the like, it's because all of our browsers are telling every single website out there what we are.
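To show how simple the text of such a request is, here is a minimal sketch that splits one apart. The helper parse_request is hypothetical (real web servers do this for you), the Host value is assumed to be localhost, and the User-Agent value is a made-up placeholder rather than the string shown in the demo.

```php
<?php

// Hypothetical helper: split the text of an HTTP request into its request
// line and headers. This is only to show how simple the format is.
function parse_request(string $raw): array
{
    // Lines in HTTP are separated by CRLF.
    $lines = explode("\r\n", trim($raw));

    // The first line holds the method, the path, and the HTTP version.
    list($method, $path, $version) = explode(" ", array_shift($lines), 3);

    // Each remaining line is "Name: value".
    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ":") === false) {
            continue;
        }
        list($name, $value) = explode(":", $line, 2);
        $headers[trim($name)] = trim($value);
    }

    return [
        "method" => $method,
        "path" => $path,
        "version" => $version,
        "headers" => $headers,
    ];
}

// An abbreviated request in the shape described above. The header values
// are placeholders, not the real ones from the demo.
$request = parse_request(
    "GET /counter.php HTTP/1.1\r\n" .
    "Host: localhost\r\n" .
    "User-Agent: ExampleBrowser/1.0 (ExampleOS)\r\n"
);
```

After this runs, $request["method"] holds the operative word GET, and $request["headers"] is itself an associative array keyed by header name.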
That string doesn't necessarily contain personally identifiable information, but it does tell the server what your IP address is and what browser and OS you are using. So that's where this information is. But what's more interesting now when it comes to these sessions is the response header. Let me click view source next to response. What's interesting here is a few things. 1, we got back a status code of 200. We never see this status code because that means all is well. It means literally okay in contrast to something else. What's a number we sometimes see that's bad? [student] 404. 404, file not found, 403 you might be stumbling upon already, which is forbidden, which means you forgot to chmod something, most likely. And there's a bunch of others.

Down here, this is a little crazy. I really just wrote this file a few minutes ago by pasting it into gedit. Why did this page expire in 1981 before there really was a Web? What's going on there?

[inaudible student response] The time stamp. But why? It's somewhat arbitrary, but it's actually useful. What this is saying to my browser is this PHP file you've just requested has already expired. In fact, it expired 30 years ago. But what does that really mean?
It just means the next time the user visits this page, whether by reloading or typing the URL in the address bar, make sure you go and fetch a new copy of it. This is sort of an example of cache busting, a stupid word that just means trying to discourage browsers from actually caching HTML that's been sent from a server so that you don't accidentally hit reload and then see the same version of the file. You actually want the server to send a new copy. So the fact that it's 1981 just means that that's what the appliance is choosing as an arbitrary date in the past. But the real juicy line is now this one. Even before 50 you're probably vaguely familiar with cookies. As of right now, especially among those less comfortable or in between, what is a cookie in your understanding right now even though we're about to make your understanding more technical? What's a cookie? Yeah. [student] Information about the user, like if they've written their user name or something.

Good. It's information about the user, whether they've typed in their user name already. Cookies are a way whereby servers can remember something about a user.
And what a cookie really is is a text file or some sequence of bytes that's planted by the server inside of your browser, and inside of that file or among those bytes is some kind of identifier. Maybe it's literally your user name, but more often it's something more cryptic-looking like this thing here--bo8dal3ct and so forth--this really big alphanumeric string that's really just meant to be a unique identifier for you. Or you can think of it as sort of a virtual hand stamp. If you go to some club or an amusement park, to remember that you've actually paid and gone in, they put a little red sticker on your hand of some sort, and that reminds the people at the counter that you've already paid and you can come and go as you please. Cookies are a little similar in spirit to that. The first time I visited this website, as I just did after clearing my cache, the web server, the appliance in this case, put a stamp on my hand whose name is PHPSESSID, session ID, whose value is this really long alphanumeric string.

So that's now sort of emblazoned on my hand so that the next time I hit reload or manually visit this URL in a browser, my browser by definition of HTTP is going to present the hand stamp again and again and again.
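The hand stamp exchange can be sketched as two small steps: the server's response carries a Set-Cookie header, and the browser echoes the same name and value back in a Cookie header on later requests. Both helper functions below are hypothetical, and the value abc123 is a short stand-in for the long alphanumeric string a real session would use.

```php
<?php

// Hypothetical helpers sketching the two halves of the exchange.

// Extract the cookie's name and value from a Set-Cookie response header,
// ignoring attributes such as "path=/" that follow the first semicolon.
function parse_set_cookie(string $header): array
{
    // Drop the "Set-Cookie:" prefix.
    $rest = trim(substr($header, strlen("Set-Cookie:")));

    // Keep only the "name=value" pair before any attributes.
    $pair = explode(";", $rest, 2)[0];
    list($name, $value) = explode("=", $pair, 2);

    return [trim($name), trim($value)];
}

// Build the Cookie request header the browser sends on later visits.
function cookie_header(string $name, string $value): string
{
    return "Cookie: " . $name . "=" . $value;
}

// "abc123" is a stand-in; real session IDs are longer and random.
list($name, $value) = parse_set_cookie("Set-Cookie: PHPSESSID=abc123; path=/");
$echoed = cookie_header($name, $value);
```

Browsers do all of this automatically; the sketch only makes visible what is being sent in each direction.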
So even though the server doesn't necessarily know who I am, they at least know that I'm the same user or at least, more specifically, the same browser. And so this is ultimately how the SESSION superglobal is implemented. The server has no idea who you are when you revisit a website for the second or the third time unless you present this hand stamp. And as soon as you present that hand stamp, the web server essentially goes into a little database of its own and checks, okay, I have just seen the hand stamp of user bo8dal3ct and so forth. Let me see what information the programmer has stored inside of the superglobal about this user, and then let me make sure that that data is again inside of the SESSION superglobal so that the programmer can re-access that data even if it was set some minutes or hours ago. So in other words, cookies, which got a bad rap for some time because of insecurities in browsers and they can really violate our privacy and all this, they actually have great utility because without them you would constantly be logging in to every Facebook page you visit or every Gmail email you read if the browser didn't have some way of remembering that you've already authenticated.

So in this way cookies are sent back and forth across the wire.
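The server's "little database of its own" can be modeled, very loosely, as an array keyed by session ID. The class below is a toy for illustration only: its name and methods are hypothetical, it keeps everything in memory, and it is not how PHP itself stores session data.

```php
<?php

// A toy model of the server's "little database": an array keyed by session
// ID. The class name and methods are hypothetical, for illustration only.
class ToySessionStore
{
    private $data = [];

    // Issue a fresh hand stamp for a browser we have not seen before.
    public function create(): string
    {
        $id = bin2hex(random_bytes(16));
        $this->data[$id] = [];
        return $id;
    }

    // Look up what was stored for the browser presenting this ID.
    public function load(string $id): array
    {
        return isset($this->data[$id]) ? $this->data[$id] : [];
    }

    // Save the (possibly updated) data for the next request.
    public function save(string $id, array $session): void
    {
        $this->data[$id] = $session;
    }
}

// First visit: no cookie yet, so the server issues an ID.
$store = new ToySessionStore();
$id = $store->create();

// Later visit: the browser presents the same ID and the data comes back.
$session = $store->load($id);
$session["counter"] = 1;
$store->save($id, $session);
$again = $store->load($id);
```

Note that the store knows nothing about who the user is. Whoever presents a given ID gets that ID's data back.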
Another curiosity about cookies, especially here, is that this is completely in cleartext. There's no encryption going on here whatsoever, and indeed I'm using HTTP at the moment. One of our favorite moments in CS50, which is now 2 years ago, was around the time a tool called Firesheep came out. This was a free piece of software that was made by a security researcher as a wake-up call for the community to say just how atrociously implemented certain authentication mechanisms on the Web were. So for some time, Facebook was almost entirely over HTTP, no HTTPS. And even if you have no idea how the crypto works, S is secure so it means there's at least some encryption involved. Facebook did used to encrypt user names and passwords, but as soon as you looked at your pokes or your messages or your news feed, all of that was unencrypted. So was Gmail until just a year or 2 ago. Any time you logged in, yes, they used secure encryption, but thereafter they didn't. And why might that be? Why not just use cryptography all of the time in use cases like this? What's that? I think I heard something. [student] Speed. Speed, right? There are ways around this.
But if you just kind of think about it logically, if you encrypt something, you have to do at least a little more work. In pset 2 when you implemented Caesar or Vigenere or even Crack, just printing a string is relatively easy. Encrypting and then printing a string minimally requires a bit more work.

For super popular websites like Google and Facebook, if you have to do more work for each user for every single web page they visit, that just takes more CPU time. And if you need more CPU time, you might need more servers, which means you might need more money. And so for many years this just really wasn't best practice. People would use SSL encryption only when they needed to. But it turned out, and as this fellow with Firesheep made super clear, when you guys who are currently on Facebook right now--Out of curiosity, let's see if you'll fess up. If you're on Facebook right now in some tab, even if it's not foregrounded, is your URL HTTP or HTTPS? [multiple students] S. S? [laughter] Okay. Any HTTP? Just 1? Okay. So all of us can hack that guy's Facebook account right now. For the most part this has become turned on by default, at least in some websites.
And long story short, if your web traffic is not encrypted, not only does the HTML go back and forth across the WiFi unencrypted, so do things like cookies go back and forth throughout the air without any form of encryption. So if you have just a bit of programming savvy or a bit of Googling skills to find free software that does this, all you have to do is sit in Starbucks or sit in an airport where there's generally unencrypted WiFi and just watch for keywords like Set-Cookie: or PHPSESSID because if you have the technical savvy to just watch the WiFi for all of the bits that flow throughout the air for this pattern, you can then say that guy's PHPSESSID happens to be bo8dal and so forth. And then again if you're sufficiently technically savvy or have the right tool, you can then just reconfigure your own browser to start presenting that hand stamp to Facebook.com, and Facebook is just going to assume that you are that guy because all they know is not who you are but that you have this unique identifier. So if you steal that unique identifier and present it to the web server as your own, they are just going to show you that person's news feed or that person's messages or pokes.

And I would Google now how to activate HTTPS for Facebook perhaps. But it really is as simple as that.
And so Facebook and Google and the like have gotten really good at this, but keep an eye out all the more for any websites you visit that don't use HTTPS and have some kind of sensitive information on them, whether it's financial or personal or the like. If they're not using this, cookies like this can quite possibly be very easily stolen and then forged, and that's exactly what Firesheep did. You didn't have to be a programmer. All you had to do was have an Internet connection, download this free tool, and what it would do is you log in and then it would show you the Facebook names of everyone in Sanders, in this particular demonstration, around you and all you had to do was click on their name and the software automated the process of sniffing that cookie, presenting it to Facebook as your own, and, voila, you're logged in. So this is another one of those "don't do this" officially. If you have your own home network and you want to tinker, by all means, but realize this does cross the line on a university environment.

But the goal here is really to emphasize not how to do this but how to defend against these kinds of things. And the trivial solution here, even though it itself is flawed, is to really reduce use of any sites that aren't using HTTPS constantly.
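From the point of view of someone building a site rather than visiting one, one common way to use HTTPS constantly is to redirect any plain HTTP request to its HTTPS equivalent. This goes slightly beyond what was shown in lecture, so treat it as a sketch under assumptions: the function name is hypothetical, and it relies only on the HTTPS, HTTP_HOST, and REQUEST_URI entries that PHP provides in $_SERVER.

```php
<?php

// Hypothetical helper: given the request details PHP exposes in $_SERVER,
// return the HTTPS URL to redirect to, or null if the request is already
// encrypted.
function https_redirect_target(array $server): ?string
{
    // PHP sets $_SERVER["HTTPS"] to a non-empty value when the request
    // arrived over HTTPS; some servers use "off" to mean it did not.
    $secure = !empty($server["HTTPS"]) && $server["HTTPS"] !== "off";
    if ($secure) {
        return null;
    }
    return "https://" . $server["HTTP_HOST"] . $server["REQUEST_URI"];
}

// In a real page, before sending any output:
//   $target = https_redirect_target($_SERVER);
//   if ($target !== null) {
//       header("Location: " . $target);
//       exit;
//   }
```

A redirect like this does not help if the very first request already carried a cookie in cleartext, which is one reason the simple fix is, as noted, itself flawed.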
So sites like Facebook and Google increasingly have checkboxes where you can opt in to this sort of thing, and banks have had this for years for similar reasons. So just a little bit of a fear factor if we can. But that's it in a nutshell. That is how a server remembers who you are. And as soon as they can remember who you are, they can remember anything about you that the programmer has stored inside of this special superglobal called $_SESSION. And for pset 7 we're using it trivially just to remember an int, namely the unique ID of the user who has logged in, so that we know they've been there before. Any questions then on sessions or cookies or the like? Firesheep doesn't work as well anymore, and you have to put your computer into a special promiscuous mode so you're actually listening for traffic besides your own. So if you're currently downloading Firesheep, realize it's not quite as easy as it once was to demonstrate. All right. And don't do it in Sanders. Do it at home. Databases. One of the things we did in pset 7 very deliberately was we give you a sample database table for users that has some user IDs, some user names, and some encrypted passwords therein.
And as you'll see, if you haven't already, you're going to have to change the table a little bit. You're going to have to add some cash to each of the users in that table, and you're going to have to add another history table, a portfolios table, or perhaps call it something else. But in terms of thinking about how to do this, let's open up this tool which we used on Friday, but if unfamiliar, the appliance comes with a tool called phpMyAdmin which is coincidentally written in PHP, but its purpose in life, after I log in here as jharvard with crimson, is to give me a user-friendly way of viewing and changing my database.
The database that I'm running on the appliance is called MySQL. This is very popular, and it's a free open source database that's wonderfully easy to use, especially with front ends like this. What this tool allows me to do, for instance, is poke around tables. Let me go ahead and do this. On Friday we created a table called students that was super simple. It had 3 columns--id, name, and email--and I manually inserted a couple of rows like David and Mike in this particular example. Let's take this a bit further, and let's assume that we want to remember more than just name and email about a user. Let me click Structure up here at the top.
And again, the pset walks you through the requisite steps here, so don't worry if some of this is a bit quick. Then I'm going to click on here. I'm going to add some number of columns after email because I want to add something like house. I forgot to record a student's house. Let me click Go, and now we have this form that unfortunately is a little wide from left to right, but I'm going to call the name of this field house, and then the type I now have to choose. So let's have a brief chat about the various types in MySQL because whereas PHP is weakly typed and it sort of plays fast and loose with types, in a database especially it's super important to actually use typing to your advantage because one of the things MySQL and other database engines can do for you is ensure that you don't put bogus data into your database. This is sort of free error checking available to you.
For house we obviously don't want it to be an int, which is a 32-bit value in MySQL. We did talk briefly on Friday about varchar, which stands for variable length char. What is this? This allows you to specify that you want this to be a string of some sort.
You don't really know in advance how long it is, so we'll arbitrarily say a house name can be 255 characters, but you could go with 32, 64--any number really. But the advantage of using a varchar over a field called char is what? Just intuitively if I scroll down here, notice there's char and there's varchar. Varchar is variable length char; char is a fixed length char. So based only on that definition, what's the advantage or disadvantage of each of these? In other words, who cares about the distinction, or why should you care?
Yeah. [student] Varchar has more flexibility but takes up more memory. Good. Varchar takes up more-- Let's see. I'm not sure if I heard that right. Can you say that once more? [student] I said varchar probably has more flexibility but it takes up more memory. Interesting. Okay. Varchar probably gives you more flexibility but takes up more memory. The latter isn't necessarily true. It depends on the context, but let's come back to that.
[inaudible student response] Exactly. It's actually the case that char will typically use more memory because a char, like in C, is like a string, it's an array of characters. So if you say a char field of length 255, the database is literally going to give you 255 characters.
And if the house ends up being M-A-T-H-E-R and 6 characters total, you're wasting over 200 characters.
So a varchar effectively only uses as many characters as is necessary up to a maximum amount. But the price you pay is actually performance, potentially. If you know in advance that all of your strings are going to be 8 characters--for instance, suppose that you require passwords of length 8--the upside of using a char field on occasion, though not often, is to specify a fixed length for something like a password because now the database can be even smarter. If it knows that every char field, every string in a column is the same length, you get back the feature of random access. You can jump around among the various char fields in your database table because think of a database as rows and columns. So if every one of the strings is the same length, you know that the first one is at byte 0, the next one is at byte 8 and then 16 and then 24 and so forth. So if all the strings are of the same length, you can jump around much more efficiently. So that can be a benefit in terms of performance, but typically you don't have the luxury of knowing in advance, so a varchar is the way to go.
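The random-access argument for fixed-length fields can be sketched in a few lines. This is only an illustration of the arithmetic (byte 0, 8, 16, ...), not how MySQL lays out rows on disk; the data and the names `RECORD_LEN` and `nth_value` are hypothetical.

```python
RECORD_LEN = 8  # every value is exactly 8 bytes, as with a char field of length 8

# Three fixed-length values stored back to back (made-up data).
column = b"abcdefgh" + b"12345678" + b"qwertyui"


def nth_value(data, n):
    """Jump straight to value n: it starts at byte n * RECORD_LEN."""
    start = n * RECORD_LEN
    return data[start:start + RECORD_LEN]


# No scanning is needed to find the third value; its offset is computed.
print(nth_value(column, 2))
```

With variable-length values there is no such formula, so finding the nth value means either walking past the earlier ones or keeping extra bookkeeping about where each one starts.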
Here's another detail that even Facebook ran into eventually. Ints are great, and we sort of use them by default any time we want a number, but it's only 32 bits.
And even though Facebook doesn't quite have 4 billion users now, there's definitely some people out there with multiple accounts or accounts that have been opened and then closed, and so Facebook itself I believe a few years ago had to transition from int to, as is aptly called, bigint, which is just 64 bits instead. So this too is a design decision. You would be amazingly lucky if your final project turns startup, has 4 billion and 1 users, give or take, in which case using ints might be a little shortsighted. But in reality, your users table is probably fine with ints. But for something like pset 7, like your history table, you might have thousands, millions of users if you evolve into etrade.com. So whereas you might not have more than 4 billion users, those users you do have might have more than 4 billion transactions over time--buys and sells and things in their history.
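The "4 billion" figure falls straight out of the bit widths. A small sketch of the arithmetic follows; the helper name `distinct_values` is just for illustration. Note that a signed 32-bit int, which is what MySQL's INT is unless you mark it unsigned, tops out at roughly 2 billion rather than 4 billion.

```python
INT_BITS = 32
BIGINT_BITS = 64


def distinct_values(bits):
    """How many different values fit in this many bits."""
    return 2 ** bits


print(distinct_values(INT_BITS))     # 4294967296, the "4 billion" in question
print(2 ** (INT_BITS - 1) - 1)       # 2147483647, largest signed 32-bit value
print(distinct_values(BIGINT_BITS))  # 18446744073709551616
```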
So if you do anticipate--again, these are good problems to have if you have this much data--if you do anticipate data exceeding the size of an int, going with something like bigint is a direction not frequently enough adopted by designers because people figure that's not going to be a problem, but it's this easy to choose something bigger than that. Decimal we're using in pset 7, which specifies fixed precision so you can avoid the issues involving floats and doubles and reals and the like.
And then there's some other fields here. We'll wave our hands at them to some extent. But dates, times all have a prescribed format in MySQL, and the advantage of storing dates as dates and not varchars means that the database can actually reformat them into different formats, whether a US format or European format or the like--however you want it--much more efficiently than if it were just some generic varchar. And then there's some other binary, varbinary, blobs. These are binary large objects, and you can also store binary data as well as geometric data in a database. But for us we'll typically care about ints and varchars and the like. Let's finish up this example with house. House I'm going to arbitrarily say will be 255 chars. Then default value we could do this.
We could by default put everyone in Mather House, for instance. That's how we could specify that the database should ensure that someone always has a value. But I'll leave that be. In fact, for people who live off campus and not in a house, maybe I actually want to specify that the default value for house is NULL, and then I need to check this box and tell the database it's okay if the user's house is NULL.
Again, this is another defense mechanism you can put in place so you don't even have to put it in your PHP code necessarily. The database will ensure that things are or are not NULL. And then lastly, Attributes. None of these are really relevant. Binary, unsigned--none of those are relevant to a varchar. Index. Does anyone know or remember or have a guess as to what an index is for something like house? This too is actually an important and relatively easy design decision. For those who haven't yet seen it, on Friday we talked briefly about primary keys. In a database table, a primary key is the field or column that uniquely identifies rows in the table. So in the current table we have IDs, we have names and emails. Which of those is the best candidate to be a primary key, whose role is to uniquely identify rows?
Probably ID. Arguably, we could also use what though? Maybe you could use email because in theory it's unique unless people are sharing email accounts. But the reality is that if you're using a numeric ID like 1234, that's only 32 bits, whereas an email address could be this many bytes or this many bytes. So in terms of efficiency for unique identifiers, it tends to be good practice just to use an int even if you have some string candidate that you could arguably use.
For something like house, this should not be a primary key because then only 1 person could live in Mather and 1 person in Currier and the like. Similarly, this should not be unique. The difference between primary and unique is that in the case of our current table, ID would be primary but email is not primary for the reason we just mentioned--performance--but it should still be unique. So you can still enforce uniqueness without making the claim that it's a super important primary field. But this one is quite helpful: Index.
If you know in advance for your final project, for pset 7, or in general, that this field house is going to be something you search on a lot using the select keyword or something else, then you can preemptively tell the database to work its magic and make sure that it creates in memory any fancy data structures necessary to expedite searches based on house. Maybe it will use a hash table, maybe it will use a linked list. In reality, it tends to use a tree, often a structure called a B-tree--not a binary tree but a B-tree--which is a very wide tree that you might see in a class like CS124, the data structures class. But in short, you don't have to worry about that when using smart database software. You can just tell it, "Index this field so I can search on it more efficiently."
If you leave this off and you try to search for everyone in the database who lives in Mather, it will devolve into linear search. And if you've got 6,000 undergrads all living in some house, you're going to search the entire table to find the Matherites, whereas if you say Index, hopefully it will be something close to a logarithmic search to find those kinds of students. This is just a free feature to turn on, even though it does come at a price of some amount of space.
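The scan-versus-index difference can be observed directly by asking a database how it plans to run a query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, so the exact wording of the plan is SQLite's, not MySQL's, and the index name `house_index` and helper `plan` are made up. At this point in the lecture the house column still holds a string, so the sketch does too.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, house TEXT)")
db.executemany(
    "INSERT INTO students (name, house) VALUES (?, ?)",
    [("David", "Mather"), ("Mike", "Pfoho")],
)

QUERY = "SELECT name FROM students WHERE house = 'Mather'"


def plan(conn, sql):
    """Return SQLite's one-line description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


before = plan(db, QUERY)  # without an index: a scan of the whole table
db.execute("CREATE INDEX house_index ON students (house)")
after = plan(db, QUERY)   # with the index: a search that uses house_index

print(before)
print(after)
```

The first plan mentions a scan of students; the second mentions house_index. The table is tiny here, so there is no measurable speedup, only a change in strategy.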
Lastly, auto-increment, this AI field, which just means if it's an int and you don't want to care to increment it yourself every time there's a new user, check that, and each user that gets inserted will automatically get a new ID. Let's click Save, and now let's find fault with this design. If I go into Browse, notice that both Mike and my house is NULL. I can use phpMyAdmin to edit this manually. I can go in here and type in Mather and then hit Enter, and now notice the table is different. But notice I could do something else as well. David's ID is 1, so phpMyAdmin again is just an administrative tool; this is not something your users are ever going to see. So if I instead click the SQL tab up top--and again, pset 7 will introduce you to more of these queries--I can manually execute the SQL structured query language command UPDATE users SET house = 'Pfoho' WHERE id = 1. These SQL queries are, nicely enough, pretty readable from left to right. Update the users table, set the field called house to Pfoho where the user's ID is 1. Or I could even do where email = 'malan@harvard.edu'. So long as that uniquely identifies me, that would work as well. But ID tends to be higher performance, so let's do that.
Let's click Go. Okay, lecture.users doesn't exist. What's my error? What's the table actually called here? It's called students just because that's what we did up here at top left. It's called students, not users. So click Go now. 1 row affected. Query took 0.01 seconds. If I click Browse now, now Malan lives in Pfoho. So that's another taste of SQL, but the pset will walk you through a bit more of that.
There's a stupid decision I've already made here. I would argue that this database design is inefficient because the more people I add to the students table, the more of us I start adding, the more of the TFs I start adding, we're going to start to see what redundancies in this table?
Yeah. [student] Seeing that it's in students, we're using the same [inaudible] The same-- Right, exactly. So if 400 people live in Mather, give or take, eventually this table is going to have 400 rows that say "Mather," "Mather," "Mather," "Mather," "Mather." We're wasting all of these bytes, and there's a couple of takeaways there. 1, there's the crazy corner case where if someone pays a lot of money and renames Mather, we now have to change our whole database table.
That's not going to happen often, though Pfoho was once called North House 15 years ago, so it happens. But that's not all that compelling. More compelling than a corner case like that of needing to update the data in bulk for a database is why are you storing M-A-T-H-E-R again and again and again and again? That's a lot of chars, 6 chars. Can't we do even better than that, especially for Pforzheimer? Surely we can do better than that many characters. Why not just associate a unique identifier with each house and store that for each user? So let's try this. Rather than just use the students table, let me go up to my lecture database up here at top left. Notice here it says Create table. Let me create a new table called houses. The number of columns is going to be 2. Enter. Now I have 2 fields. I'm going to call this the name, and it's going to be a varchar of length 255, but that's pretty arbitrary. Let me put this down here by convention. So put an ID up here. Let's give every house a unique identifier. Let's give every house a name. Let's specify that the identifier will be unsigned just by convention to only use positive numbers. Let's go ahead and give this an auto-increment field for now. And do we need anything else?
Let's go ahead and click Save. Now I have a second table. Notice as an aside this is the slightly cryptic SQL command that you would have had to type manually if not using an administrative tool like phpMyAdmin. So another reason we use it. It's wonderfully useful sort of pedagogically because you can click around and figure out how things work by just copying and pasting what phpMyAdmin did. But the Create table command is what was just executed, and here is my table. Let me go ahead now and use raw SQL rather than oversimplify by clicking the Insert tab. Let me do INSERT INTO houses, and I'm going to say the name of the house is going to have a value of 'Mather'. That's it. This syntax is a little more cryptic. This is the name of the fields we want to insert. These are the values we want to insert into those fields. Let me click Go. 1 row inserted took 0.02 seconds. Let me click Browse now.
Notice if I click Browse, there's Mather, whose ID is by automation the number 1. Let me do another one. Let me go into the SQL tab. INSERT INTO houses. The name of the house is going to have a value of Pfoho and so forth. Go. And I can keep doing this again and again and again.
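For reference, the statements spoken aloud above look roughly like the following when written out. This sketch runs them through Python's `sqlite3` module rather than MySQL, so the column definitions are SQLite's spelling (MySQL would use something like INT UNSIGNED AUTO_INCREMENT for the id), and it is not the exact command phpMyAdmin generated.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# A two-column houses table: an automatically assigned id and a name.
db.execute(
    "CREATE TABLE houses ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name VARCHAR(255) NOT NULL)"
)

# Only the name is supplied; the database picks the next id itself.
for house in ("Mather", "Pfoho", "Currier"):
    db.execute("INSERT INTO houses (name) VALUES (?)", (house,))

rows = db.execute("SELECT id, name FROM houses ORDER BY id").fetchall()
print(rows)
```

Each insert names the field in the first pair of parentheses and the value in the second, and the ids come back as 1, 2, 3 in insertion order.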
Or if you get bored using phpMyAdmin, you can just use the Insert tab and not have to type the raw SQL. You can just bang it out more quickly by typing, for instance, Currier, Enter, and now if we click Browse, there's Currier with an ID of 3. So this is what we mean by auto-increment. But now we have to fix something in students. In students what should the data type of the house field now be? It should be an int, right? So the goal here is to factor out, otherwise known as normalize, the tables so that we don't store information redundantly in any of my tables. And again, the path we were on here is going to say Mather, Mather, Mather, Mather, Pfoho, Pfoho, Pfoho, Pfoho, which is very redundant in terms of the wastefulness of the chars. So let me go ahead and change this by clicking Structure, and let me go ahead and check off the house field, click Change, and now I'm going to change this to be an int. 255 is no longer relevant. Let me go ahead and say that's fine if it's still NULL. Save. Now table students has been altered successfully, and notice again house is an int. As an aside, ignore the number in parentheses when it comes to ints. This is for legacy reasons.
Back in the day when you didn't have GUIs, you instead had a command line environment, the 10 and 11 respectively specified how many characters you should show in the terminal window to actually display fields. It has nothing to do with the bit length of the actual field, so we'll just ignore that for now. Now I have to go into this table. And if David lives in Mather, house should not be 0, which is a default int value closest to NULL. He should live in house 1. Let's arbitrarily say that Mike lives in Pfoho, so house number 2. Now my table looks a little more cryptic. But consider the efficiency. I'm now using only 32 bits to identify the house, which means there's only 1 canonical definition of my house Mather and Pfoho and that's in the houses table. So if I want to now rejoin these tables, think of it this way. Here I have my students table, and on the right-hand side there's these numbers, 1 and 2. 1 is Mather, 2 is Pfoho. We have those same numbers in this other table, which is called houses, 1 and 2 and 3 for those 3 houses.
What we now want to do is have the ability in code, PHP and SQL, to sort of rejoin these tables, where if these are the students and these are the houses, we want to somehow combine them so that 1 lines up with 1, 2 lines up with 2, and so that we can figure out where David and where Mike and where everyone else lives. To do this we can execute a SQL query like the following. SELECT * FROM students JOIN houses ON-- And now what fields do we want to join on? So students.house = houses.id.
A little cryptic, but this part means literally create a new temporary table that's the result of joining students and houses. And how do you want to combine the tips of my fingers here? Set the students' house field equal to the houses' ID field. And if I now click Go, I get back exactly what I hoped to. David is in Mather, Mike is in Pfoho, and I also see the unique identifiers. But the point is now I have a complete table.
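The normalized design and the JOIN can be reproduced end to end in a few lines. As before, this is a sketch using Python's `sqlite3` in place of MySQL; Mike's email address is a placeholder, and the query selects two named columns instead of * so the result is easier to read.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE houses (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (
        id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT,
        house INTEGER  -- refers to houses.id; may be NULL
    );
    INSERT INTO houses (id, name) VALUES (1, 'Mather'), (2, 'Pfoho'), (3, 'Currier');
    INSERT INTO students (name, email, house) VALUES
        ('David', 'malan@harvard.edu', 1),
        ('Mike', 'mike@example.com', 2);
""")

# Line up each student's house number with the matching houses.id.
rows = db.execute(
    "SELECT students.name, houses.name FROM students "
    "JOIN houses ON students.house = houses.id "
    "ORDER BY students.id"
).fetchall()
print(rows)
```

Each house name is stored once, in houses. Renaming a house is then a single-row update, and the join picks up the new name for every student.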
And so the takeaway here for pset 7 or really for the final project: If you find that you're storing any piece of information redundantly, whether it's a house, maybe it's a city, state, and ZIP where ZIP can usually but not always be used as a unique identifier, do go through the exercise mentally and then with something like phpMyAdmin of factoring out that common data because especially as your website gets more well used and more popular, this is how you make sure that everything is super fast, by giving the database as many hints as to uniqueness as possible. That was a lot. Any questions? All right. Let's take a 5-minute break there and regroup. All right. The following is an example that was used some years ago when I took CS161, which is the operating systems class at the college which is known for being amazing but a crazy amount of work, and it focuses really on some of the low-level problems that arise in operating systems and also even in the world of databases.
The story that was told by my professor, Margo Seltzer, that year was as follows. Suppose that you have a little dorm fridge for you and your roommate and both of you really like milk.
So you come home from class one day, your roommate is not yet there, you open the fridge, and you realize, "Oh damn, we're out of milk." So you close the fridge, you walk across the street to CVS and get in the increasingly long lines to buy some milk at CVS. Meanwhile, your roommate comes home from his or her class, comes into the room, opens the fridge really wanting some milk, opens the fridge and, "Damn, no milk." So he or she closes the fridge, walks out the door, and goes to ABP or somewhere other than CVS where you're not going to bump into each other to go get some milk. Of course a few minutes later, both of you get back home and now you have twice as much milk as you actually wanted. And being milk, now it's going to go bad because you like milk but you don't really like milk, so now you have too much milk, so it's going to sour. This is an awful, awful situation. What could have solved this predicament if you were the first roommate home? Yes. [student] You should have left a note. [laughter] Good. You should have left a note. You should have put a Post-it note or the like saying, "Gone for milk," and then your roommate conceptually would have been locked out of actually doing that. Or you could go 1 step further.
You could literally lock the refrigerator with some kind of padlock, and now your roommate will literally be locked out of the fridge. If we generalize back to programming, you can almost think of the fridge as some kind of variable or a struct, some kind of container for information. The problem fundamentally here is that both of you were allowed to inspect or read the state of this data structure, but you viewed it at different times and yet both of you made a decision based on the state of the world at those different moments in time. So had you locked the refrigerator, you would have at least prevented your roommate from being able to inspect the state of the world, so he or she could not have made that same decision. So databases, as it turns out, have this problem constantly.
Let's see if we can construct a scenario. Suppose that you're sort of a bad guy and you go to Bank of America or one of the other places in the square that have a couple ATMs side by side, and somehow you figured out how to duplicate an ATM card--not all that hard. It's just a magnetic strip.
And so what you want to try to do is play this game whereby you put 1 card into 1 machine, another card into the other machine, and you essentially want to try to withdraw money simultaneously, because imagine that the story goes as follows. The machine on the left takes your card and your PIN, and then you say, "Give me $100." The ATM is programmed to first do a select on its database or the equivalent--whatever database it's using--to see does this user have at least $100 in his or her account? If so, then spit out the $100 and subtract $100 from their balance. But of course if there's multiple machines here or multiple ways of inspecting the state of that world, the bank vault, to see how much money you have, suppose that just by chance the machine on the left and the right both ask that question at roughly the same moment in time.
And this can certainly happen. ATMs are computers these days. So if the machine on the left says, "Yes, you have at least $100," meanwhile the machine on the right says, "Yes, you have at least $100," then both of them proceed to finish their programs and actually spit out the $100 and say, "Previously you had $200." "Let me update the variable to now be $100 left in the account."
But if both of them have checked your account balance and found that it's $200 and both of them then do the math and say 200 - 100, the machines have potentially spit out two $100 bills, one from each machine, but they've only updated your account balance to be $100. In other words, you've taken out $200, but because they inspected the state of the world simultaneously and then made a decision based on that value, they might not do the math ultimately correctly. So in a bank situation too you really want to have some kind of lockout so that as soon as you've checked the state of some variable that's really important, like your account balance, don't let anyone else make decisions based on that until you are done doing your thing, where in this case you are the ATM on the left. Lock everyone else out. You can actually achieve this effect in a couple of different ways.
The simplest way in MySQL is a line of SQL that we gave you in the problem set specification that looks exactly like this. Insert into the table--whatever it's called--an id, a symbol, and a share, a number of shares, the following values, for instance.
If you haven't read the spec yet, this is an example involving how do you go about buying 10 shares of this penny stock for President Skroob, whose user ID happens to be the number 7? This says INSERT INTO table the following id, symbol, and number of shares of 7, 'DVN.V', and 10. But--but, but, but--the second line is the important one. ON DUPLICATE KEY UPDATE shares = shares + VALUES(shares). So totally cryptic-looking at first glance. But the fact that this SQL query, even though it wraps onto 2 lines, is 1 long query means it's atomic in the sense that this query will either be executed all together or not at all. And by definition of MySQL, that's how they implemented this query. It is by definition in the manual guaranteed to execute all at once or not at all. The motivation for this is as follows. If in this case you are trying to buy 10 shares of stock, it's kind of the same story as the milk, it's kind of the same story as the ATM.
If you make the mistake of not using this syntax but instead selecting from the database to see how many shares of this penny stock does President Skroob have, and suppose he has 10 shares, and then some split second later you then do an UPDATE statement, which is another statement in SQL that says go ahead and add 10 more shares to his current 10 so that ideally the total is 20, the problem is that, because in today's database systems and in today's computers you have multiple processors, multiple cores--in other words, computers can literally be doing multiple things at once--there's no guarantee that your SELECT and your UPDATE in this case are going to happen back to back. So a bad scenario would be you do the SELECT to see how many shares of this penny stock does Skroob have, and then just by chance another database query is executed--maybe it's Skroob in another browser window trying to buy 10 shares in another window altogether, much like the ATM--and suppose that another query gets in between SELECT and the UPDATE. It could be the case that Skroob now loses some number of shares because another process is inspecting the state of his world, or he gets more shares than he should have.
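As a rough illustration of one such bad interleaving, here is a small JavaScript sketch. It assumes a hypothetical in-memory row standing in for Skroob's record, and it writes the interleaving out by hand, since this JavaScript runs one step at a time; it is not how MySQL works internally, only a picture of the lost update.

```javascript
// Hypothetical in-memory stand-in for one row of the shares table.
var row = { id: 7, symbol: "DVN.V", shares: 10 };

// Two purchases of 10 shares each, interleaved the bad way:
// both SELECTs happen before either UPDATE.
var seenByA = row.shares;    // request A's SELECT sees 10
var seenByB = row.shares;    // request B's SELECT also sees 10
row.shares = seenByA + 10;   // A's UPDATE writes 20
row.shares = seenByB + 10;   // B's UPDATE writes 20 again, overwriting A's work
var racyTotal = row.shares;  // 20, even though 20 shares were bought on top of 10

// The same two purchases, each done as one indivisible read-and-add step,
// which is the spirit of shares = shares + VALUES(shares).
function buy(r, n) {
    r.shares = r.shares + n;
}
var safeRow = { id: 7, symbol: "DVN.V", shares: 10 };
buy(safeRow, 10);
buy(safeRow, 10);
var safeTotal = safeRow.shares;  // 30

console.log(racyTotal, safeTotal);
```

In the racy version one purchase is silently lost; when the read and the write cannot be separated, both purchases count.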
We won't go into the particulars of exactly what those particular story lines would be, but the point is if you have to check a variable's value and then make a decision, if there's a risk of someone else doing something in between those 2 statements, as can happen in multiprocessor systems, in multicore systems, computers with the ability to do multiple things at once, bad things can happen like bank accounts being debited incorrectly, buying twice as much milk, or in this case the wrong number of shares. But there's an easier way to think about this.
It turns out that SQL also supports, if you configure your table correctly, something called transactions, which I would argue is actually even easier to understand than this, but it's not a 1-liner, so it's actually a bit more involved. There is literally a statement in SQL called START TRANSACTION. Just like there's SELECT, UPDATE, INSERT, DELETE, and JOIN and a bunch of others, there are keywords like START TRANSACTION.
And what you then do in the context of pset 7--you don't have to do this for pset 7; it's explicitly disclaimed as not necessary, but for final projects it can be useful--if you call a query of START TRANSACTION and then another query and then another query and then another, another, and another, those queries will not actually be executed until you call the SQL statement COMMIT, at which point, whether it's 2 statements or 20 statements, they will all be executed at once, which means no one else can accidentally buy too much milk or debit too much money or buy too many shares because all of your queries will execute back to back to back to back. And this is super important, especially when you're doing something like this. This is an arbitrary example that says let's update the bank account by setting a balance equal to balance - $1000 where the account number is 2. And then the second statement is now let's deposit that $1000 into someone else's bank account whose account number is 1.
In other words, this is a perfect example of where you want to make sure that either both of these statements happen or neither does because otherwise the customer is going to get screwed and you're going to take their money and not deposit it elsewhere, or the bank is going to get screwed where you're going to deposit the money but not actually subtract it from the user's account. So you want both of them to execute together. Thus enters into the world transactions. So that's something to keep in the back of your mind, not so much for the purposes of just a final project, but if you want to take your final project somewhere, if you want to start up some company around it, if you want to solve some student group's problem on campus and actually have a live, active website, these are the sort of subtle bugs that can arise if you don't quite think through what can happen if 2 people are trying to access your website at literally the same moment in time, whereby their queries might otherwise get interwoven.
Ready for some JavaScript, a teaser thereof? This is our last language for the semester. All right. Thankfully, JavaScript looks very, very, very similar to the 2 languages, C and PHP, we've done thus far.
There's no JavaScript in pset 7, but it's an incredibly useful tool when it comes to doing web-based final projects or really just web programming more generally. So a quick overview of something called DOM. Here is a super simple web page that really just says hello, world both in the title and in the body. As the indentation has been suggesting for some time, there is indeed a hierarchy to web pages. I could draw this same snippet of HTML as a tree, thinking back to our discussions of data structures in C, as follows. I have some special root node called the document node, and we'll see the analog of this in JavaScript in just a moment. The first child and only child of that in this case is the HTML tag. There's no direct mapping of the doctype. That's a special thing, so we should just ignore it when it comes to this DOM, this Document Object Model tree. Notice that the HTML tag, which I've depicted arbitrarily as a rectangle, has 2 children: head and body. Those are similarly drawn as rectangles. It is meaningful pictorially that head is to the left of body. The implication is that head comes first in the tree. So there's actually an ordering to a tree when you draw it like this, even though the shapes and whatnot are arbitrary. Head meanwhile has a single child called title, and title actually has its own child, which is "hello, world", which I deliberately drew as an oval here to make it slightly different from the rectangle. These rectangles are elements, whereas hello, world is really a text node. So it's a node in the tree, but it's a different type of node so I drew it arbitrarily differently. Similarly does body have a child called hello, world as well, so different node even though they're coincidentally the same text, but I've drawn it using the same shape. So who cares? Well, what's nice about HTML is that it does have this hierarchical nature.
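The tree just described can be sketched as nested JavaScript objects, with a recursive walk much like the tree traversals from C. This is only a model for illustration: the property names type, tag, children, and text and the collectText helper are made up here and are not the browser's actual DOM API.

```javascript
// Elements (the rectangles) carry a tag and children;
// text nodes (the ovals) carry only text.
var documentNode = {
    type: "document",
    children: [
        {
            type: "element",
            tag: "html",
            children: [
                {
                    type: "element",
                    tag: "head",
                    children: [
                        {
                            type: "element",
                            tag: "title",
                            children: [{ type: "text", text: "hello, world" }]
                        }
                    ]
                },
                {
                    type: "element",
                    tag: "body",
                    children: [{ type: "text", text: "hello, world" }]
                }
            ]
        }
    ]
};

// Recursively gather the text nodes, visiting children left to right,
// so head's subtree is visited before body's.
function collectText(node) {
    if (node.type === "text") {
        return [node.text];
    }
    var found = [];
    for (var i = 0; i < node.children.length; i++) {
        found = found.concat(collectText(node.children[i]));
    }
    return found;
}

var texts = collectText(documentNode);
console.log(texts);
```

The two text nodes hold the same string but are distinct nodes, one under title and one under body.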
And what's nice about JavaScript and particularly libraries that are freely available and popular like jQuery is that you can navigate the tree structure so amazingly easily. Any of the stuff we did in C with pointers and traversing trees and recursing on nodes left child to right child, all of a sudden we can sort of take for granted as being amazingly enlightening if not a bit frustrating but not nearly as efficient a way to go about programming. And so with these higher level languages like JavaScript we'll be able to navigate this tree much more intuitively.
And indeed the syntax is going to be quite familiar. If you've never seen JavaScript before, this is a really nice reference from the Mozilla folks, the people who make Firefox, so do feel free to browse that at your convenience. What you'll find--and these slides are identical to what we used the other day--similarly, main is gone. So when you write a program in JavaScript, there is no main function. You just start writing code. But a key distinction between JavaScript and C and PHP is that whereas C and PHP thus far have been executed server side by the appliance in this case or more generally by a server, JavaScript by design is usually executed by a browser.
In other words, you might write JavaScript code, as we're about to, on a server in the appliance, but you include it among your HTML, among your CSS, among your GIFs and your PNGs and your JPEGs so that when the user visits your web page, if you're using JavaScript, that JavaScript code comes from server to browser, and it's the browser that actually executes it. So this has meaningful implications for even intellectual property. It's kind of silly to even think about protecting your IP when it comes to JavaScript code because by nature of the language it gets executed usually browser side.
You can obfuscate it, which means you can make it look crazy and ugly with no whitespace, horrible variable names, to make it harder for people to steal your IP, but the key is that it is executed browser side. Even though as an aside JavaScript can be used server side, the most common use case right now is still on the browser. And here's what it looks like. Here is an if-else if-else construct just like C, just like PHP. Here is a Boolean expression when you "or" 2 things together. Here is when you "and" 2 things together. Here is a switch statement, which is similar to PHP in that you can switch on different types of values.
Loops, similarly: we have for loops here, which are structured identically to what we've seen before. While loops; we've got do while loops. Variables, ever so slightly different. You do declare variables like you do in PHP and C, but similarly is JavaScript weakly typed. You don't specify int or float or string or anything like that usually. You can specify var. You don't have to specify var, but it has implications if you don't. Usually if you omit var, you accidentally create a global variable instead of local. So let me propose that you almost always just say var and then the name of the variable. It's not a type, it's just var for variable. This would be an example, whether it's 123 or "hello, world". Arrays are present and syntactically similar to PHP. I'll say var numbers and then I use square brackets again to declare a variable whose type is array that has these particular numbers in it separated by commas. And then lastly, this is the only one that really looks different. Recall that in PHP we would have implemented an associative array for a student like Zamyla that might look like this, where the variable is called student. The square brackets mean here comes an array.
The fact that I'm not using numeric indices but strings--id, house, and name--means that this is an associative array, and these arrows with the equals sign and the angled bracket mean that the key is "id", the value is 1; the key is "house", the value is Winthrop House; the key is "name", the value is Zamyla Chan. So there's 3 keys inside of this associative array, each of which has its own value. We've seen that in pset 7, or you soon will. In JavaScript, same idea, but it's going to look like this. So var student--no dollar sign and no mention of type still but var--equals and then open curly braces because in JavaScript when you have key value pairs, you actually use something called an object. And those of you who did take APCS or the like might recall objects from Java or similar languages. JavaScript is not Java, first of all. It was a deliberate design decision years ago to knock off something else that was popular, its name, even though it has no fundamental relation to Java itself. JavaScript has objects, and you create them by way of the curly brace notation. Objects in JavaScript are pretty much equivalent to associative arrays in PHP when it comes to storing data inside of them.
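For reference, the student object described above might look roughly like this in JavaScript. The keys and values come from the example in the lecture; the exact layout of the slide is not reproduced in the transcript, so treat the formatting as a sketch.

```javascript
// An object literal: curly braces, with key: value pairs separated by commas.
var student = {
    id: 1,
    house: "Winthrop House",
    name: "Zamyla Chan"
};

// Values can be read with dot notation or with a string key in square brackets,
// the latter resembling an associative array lookup in PHP.
var byDot = student.name;
var byBracket = student["house"];

// Object.keys lists the object's keys.
var keys = Object.keys(student);

console.log(byDot, byBracket, keys);
```

Note there is no dollar sign on the variable and no type beyond var.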
But even more powerfully in JavaScript can you associate very easily functions inside of an object, and though you can do this in other languages, it's quite a common paradigm, as we'll see. In short, this object represents a student, who is particularly Zamyla, and it's similar conceptually, just syntactically different from this. Let's actually use JavaScript in a file. It turns out there's a script tag. We've seen a style tag and we've seen other HTML tags. The script tag actually will contain some JavaScript code. Let me go into the appliance where we have some source code pre-made. I haven't posted it yet on the website, but I'll do that after class. Let's open up this one, blink.html. Back in the 1990s, there was literally an HTML tag called the blink tag, and this was one of the most wonderfully overused tags on the Internet whereby you'd visit some 1990s style web page and start seeing text flashing at you like this, or the results of the marquee tag, which had text going like this. One of the few times where the world has actually agreed on a web standard, everyone across the board killed the blink tag some years ago. But we can resurrect it with JavaScript as a demonstration of the power you have when you can write a program inside of a web page.
First let's skip over the new stuff and focus only on the old.
Here is the old stuff in this example. I have an HTML tag, a head tag, and a title tag. Then I have a body tag here with a div, which recall is just a rectangular division of the page that I've given a unique ID arbitrarily of "greeting" to, just so I have a way of uniquely referring to it, that has some very simple text: hello, world. Now let me scroll up to the top of this file and see what's new. The first thing that's new up top is the script tag, and inside of the script tag notice I've declared a function. To declare a function in JavaScript, pretty similar to PHP, you literally write function then the name of the function, parentheses, and maybe some arguments if it takes any. Then I've got my curly brace as usual, and now we have some slightly new code, but let's see what this means. So var div, this just means give me a variable called div. I could have called it foo, but I wanted it to be called div for reasons that will be clear in a second. Then it turns out in JavaScript--and this is JavaScript code embedded in my web page--there is a special global variable of sorts called document. JavaScript is in fact an object-oriented language.
We won't go into detail in 50 as to what that means, but for now know that an object is pretty much like a struct. Like we saw way back when in one of the earliest problem sets where we put a lot of information in a struct, similarly is document a special struct that comes with the browser, comes with any web page. It's not something I created. Inside of this document structure, though, you have not only data but you also have functions.
And any time you have a function inside of a structure, inside of an object, it's called a method. But it's the same thing. A method is a function that just so happens to be inside of something else. So this means that this special global variable called document has a function called getElementById that literally does that. It will get you an element from the DOM, Document Object Model tree, whose ID is in this case greeting. In other words, all that time we spent on data structures comes into play here. This picture of a DOM that we had a moment ago, even though the page is a little different, if I had a div in this picture, what document.getElementById would return to me would effectively be a pointer to the rectangle in the tree, a reference to the rectangle in the tree.
So that's what it means to actually call one of those functions. In this case again it's a div. It's not a body or a title. So let's see what I then do with this div now that I have it inside of this variable called div. It turns out with JavaScript you have the ability to tweak the CSS of your page dynamically. Up until now, all of the CSS we've done, albeit limited, is in style attributes, or where else have we put CSS? I kind of spoiled that one. In the style tag at the top of the file. Or third place has been in?
An external file, something .css. So those are the 3 places we've done CSS thus far, but the catch is we've hard coded it all. You decided as you dove into pset 7, we decided before lecture what our CSS would be. But if you want to change your CSS, you can actually do that once you have an actual programming language. CSS, HTML--not programming languages. JavaScript is. So it turns out that as soon as you have one of those rectangles from the tree called the DOM, it has itself some data inside of it. So the div that I just grabbed from the tree has what we'll call a property inside of it called style, and the style property has itself a property called visibility. I would know this only by looking up a CSS user's manual.
It turns out there's a visibility CSS property that does what it says. It makes something visible or not. And how you do that is this. I'm asking programmatically if the visibility of this div is hidden, what do I change it to? Visible. Else if the visibility of this div is not hidden, logically I do make it hidden. I have no idea why it's visible and hidden and not visible and invisible. This was a poor design decision along the way. But those are indeed opposites in CSS: visible and hidden. All this means is change the CSS of my file on and off, on and off for that particular div. But again, this is a function called blink. When is the blink function called? It turns out that there's another special global variable called window, similar in spirit to document, but whereas the document refers to your web page, like the DOM tree, the HTML you sent from the server, window refers to the chrome around it, the address bar, the title bar, and all of that stuff around your web page.

And it turns out that the window object has a special function inside of it called setInterval that does what it says.
It will set an interval--in this case every 500 milliseconds--and, take a guess, what's it going to do every 500 milliseconds? It's going to execute that function blink. And what's nice here is that we could have done this in C even though we never did. C does have something called function pointers where you can pass functions around as arguments. Similarly in JavaScript can you pass the name of a function into another function. And notice what I'm doing. I'm not doing this. If I put parentheses after the blink, that would mean call the blink function. If I omit them, that means here is the blink function so that setInterval can call it every 500 milliseconds. So the end result, atrocious though it is, is that if I go into localhost and go to blink.html, I now have this happening again and again. And if I actually Inspect Element, let's see if we can see this. Let me Inspect Element, let me scroll down just a little bit, let me choose Elements over here, and notice the DOM inside of Chrome's inspector. It's literally changing back and forth every 500 milliseconds. If we go to our friend Nate, if you ever wondered how this is working, similar idea with an interval, but Nate is actually making very effective use of color in this particular case here.
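The blink example described above can be sketched roughly as follows. This is a reconstruction from the spoken description, not the actual contents of blink.html. So that it runs outside a browser, a plain object stands in for the div; in a real page you would get it from document.getElementById("greeting") instead. The two-second cutoff at the end is also an addition for the sketch.

```javascript
// Stand-in for the div that document.getElementById("greeting") would return in a browser.
const div = { style: { visibility: "visible" } };

// Flip the div between the two CSS values: hidden and visible.
function blink() {
  if (div.style.visibility === "hidden") {
    div.style.visibility = "visible";
  } else {
    div.style.visibility = "hidden";
  }
}

// Pass the function itself (no parentheses) so setInterval can call it every 500 ms.
// Writing blink() here would instead call it once, right now, and hand over its return value.
const timer = setInterval(blink, 500);

// Only for this sketch: stop after about two seconds so the script can exit.
setTimeout(function () {
  clearInterval(timer);
}, 2100);
```

The distinction between blink and blink() is the same one C makes between a function pointer and a function call: the first hands over the function, the second runs it.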
So what more can we actually do with this? Let's open up another example and try something that's programmatically even more useful than making things blink. Let me go into our forms directory today and go into form0. This was the ugliest possible form that I could come up with, and let me just show you what it looks like in a browser.

Let me go into localhost/forms, and this is form0. This is a super ugly HTML form that has a few fields for email, for password, password, and then a little checkbox to agree to some terms and conditions. The catch is if I visit this form and I don't want to give you my email address, I don't want to agree to the terms and conditions maybe, I can click Register and it lets me through anyway. This happens to submit to a stupid PHP file called dump.php. All it does is print out the contents of $_GET just for diagnostic purposes. That was what was submitted by the user just now. But suppose we actually want to validate the user's form submission. Let me go into version 1. This is form1.html. It looks aesthetically just as bad, but notice how fancy it is. If I click Register without cooperating, I get yelled at. "You must provide your email address." All right. So let me try that. So malan@harvard.edu.
I don't need a password. Register. "You must provide a password." All right. So I will provide a password of crimson. Register. "Passwords do not match." I have to now type in crimson here. I accidentally checked that. Register. "You must agree to the terms and conditions." All right. Agree there. Register. And now it shows me the diagnostic output over there.

So what just happened? We've had this ability to validate form submissions. In fact, if you did dive into pset 7, there's an apologize function that makes it pretty easy to yell at the user with a message on the screen. I'm using a slightly different mechanism, the alert function, which is not a function that's smiled upon since it makes very ugly user messages. But let's see what I'm doing here. This is form1.html, and notice that I have some pretty familiar syntax: body tag, form tag, action attribute, method attribute. But notice I've given my form a unique ID for convenience. Then I've got an email field whose type is text, a password field whose type is password, confirmation field whose type is password, and then a checkbox whose name is agreement over here, type is checkbox. And then I've got a submit button. But notice at the top what more I have.
First of all, there's another use of the script tag. If you have some JavaScript code in another file, just like with CSS you can include it. And you do that with script source, and then notice I'm connecting apparently to googleapis.com to a very long path but whose file name ends in jquery.min.js, min for minified. jQuery is a super popular library for JavaScript that just makes JavaScript all the more user-friendly to use. It's effectively become a de facto standard. So even though what you're about to see is not pure JavaScript per se, it is a library on top of JavaScript much like the CS50 library is a layer on top of low-level C code; the reality is almost everyone on the Internet uses it. So these are not training wheels. This is just best practice these days. Now notice below that is my own script tag, and notice what I've done here. It turns out that jQuery does something a little fancy. JavaScript allows dollar signs in names, but they have no special meaning. They are like the letter A or B or C. jQuery has simply adopted the convention or sort of laid claim to the fact that $ will be their special symbol. So as soon as you load this global JavaScript file up here with the script tag, you have access to a special global variable that's called $.
It's more properly called jQuery, but that doesn't look nearly as sexy as $. But $ has no special meaning. In PHP it had special meaning. You had to have it in front of a variable. This is just a sexy thing that they took on. What is going on here? Notice I'm passing to the jQuery function my global variable document and then I'm calling .ready. What jQuery essentially does is it allows you to take some vanilla JavaScript things like the document object, the window object, and if you pass it in to the jQuery function--and again, to be clear, this is a function called jQuery--what it does is it returns to you a special version of document that has more functionality associated with it. So in raw JavaScript there is no ready function, but if you pass document to the jQuery function first, it returns to you a special version of the document object that has more fancy features. And that's why people like it. It just makes things easier to do, as we're about to see. So what does this line of code mean? This line of code here means when the document is ready--in other words, once the browser is done reading this file top to bottom--go ahead and execute the following function.
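The wrapping idea can be illustrated with a toy sketch. To be clear, this is not jQuery and not how jQuery is implemented: wrap, fakeDocument, and finishLoading are hypothetical names invented here so the example runs without a browser. All it shows is a function that takes a plain object and hands back a version with an extra ready method, which holds on to a function and calls it later.

```javascript
// Hypothetical stand-in for a page that finishes loading at some later point.
const fakeDocument = { loaded: false, listeners: [] };

// Toy version of the wrapping idea: returns an object with an added ready() method.
function wrap(doc) {
  return {
    ready: function (callback) {
      if (doc.loaded) {
        callback(); // already loaded, so run it now
      } else {
        doc.listeners.push(callback); // otherwise remember it for later
      }
    },
  };
}

// Simulates the browser finishing its read of the page, top to bottom.
function finishLoading(doc) {
  doc.loaded = true;
  for (const callback of doc.listeners) {
    callback();
  }
  doc.listeners = [];
}

const log = [];

// The function passed to ready() is not executed here; it is only handed over.
wrap(fakeDocument).ready(function () {
  log.push("page is ready");
});

log.push("still loading");
finishLoading(fakeDocument);

console.log(log); // "still loading" comes first, then "page is ready"
```

With the real library loaded in a browser, the corresponding call is $(document).ready(...), with the function to run later passed in the same way.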
What's really interesting in JavaScript--and PHP has this as well--is anonymous functions. In JavaScript you can declare functions that have no name but they do have a body. Notice what's happening here.

This is a function called ready, and it just means do the following when the whole web page is ready, when it's all been read in from the server. What do you want to do? I want to execute a chunk of code. Notice that we don't want to execute this code right away. If I omitted this, this would mean immediately start executing these lines of code. But the fact that I'm saying no, no, no, wrap this in an anonymous function like this means don't execute it yet; call it eventually. We saw this a moment ago in our previous blink example. What function did we call eventually, 500 milliseconds later? Blink. So the same idea. Again, even if this looks a little weird, just take for now on faith that to declare an anonymous function that's called eventually, you simply write function() { }. So what code are we going to execute eventually? The following. This too looks a little new, but this means here's the jQuery function, and this now is a shortcut. This snippet of HTML at the bottom of the screen of course has some tree representation.
It's not this. This page is more interesting than this hello, world example. But there's some tree that corresponds to this HTML. It would be a pain in the neck to have to implement some kind of recursive function to start at the root node and then find the node whose ID is registration. So what jQuery makes super easy for us is literally this. Go ahead and get me whatever div or whatever form, whatever HTML element has an ID of registration. This is roughly equivalent to document.getElementById('registration').

Why do people like jQuery? Because it's shorter to type. But that's all it is. It's the same idea. Get me the tag whose ID is registration. And when that tag, which happens to be a form, is submitted, go ahead and execute this code. So let's take one look now at how we're doing form validation. The syntax is admittedly cryptic at first, but what's going on? If this line of code is true, I'm going to yell at the user to provide his or her email address. So what is this line of code? $ means jQuery. Now notice this. This is kind of like CSS. If you've dived into CSS yet, you'll know that this means the element whose ID is registration. The space means find a child or a descendant of registration whose tag name is input.
And then this thing in square brackets is a little filter. And even if this looks cryptic, this just means go to the form whose ID is registration, go to the input element inside of that whose name is email, and then get its value, whatever its value happens to be--asdf if that's all I typed or malan@harvard.edu if that's what I typed. So if the value of the form's email field == nothing, yell at the user. Else if the value of the password field == nothing, yell at the user. Else if the value of the password field does not equal the value of the confirmation field, which was the other form element, yell at the user. And then lastly--and this one too has some new syntax of its own, but once you've seen it, it's at least a little more reasonable--else if the form whose ID is registration has an input element whose name is agreement and it is not checked, go ahead and yell at the user. So I totally admit this is completely overwhelming at first glance. It's a lot of new syntax. But all of jQuery follows these kinds of patterns. And honestly, I didn't even know this existed until a few minutes ago. I Googled, "How do you check if a checkbox is checked in jQuery?" and this is the syntax, because there's different ways of doing it with actual raw JavaScript code.
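Setting the jQuery selectors aside, the chain of checks just described can be sketched as a plain function. This is an illustration of the logic only, not the code in form1.html: the validate function and the shape of its form argument are assumptions made here, with field names taken from the form described earlier (email, password, confirmation, agreement) and messages taken from the demo. It uses === where the lecture says ==, which for these string comparisons behaves the same.

```javascript
// Returns the message for the first problem found, or null if the form is acceptable.
function validate(form) {
  if (form.email === "") {
    return "You must provide your email address.";
  } else if (form.password === "") {
    return "You must provide a password.";
  } else if (form.password !== form.confirmation) {
    return "Passwords do not match.";
  } else if (!form.agreement) {
    // agreement is true only when the checkbox is checked
    return "You must agree to the terms and conditions.";
  }
  return null;
}

// Mirrors the demo: everything filled in, but the checkbox left unchecked.
const message = validate({
  email: "malan@harvard.edu",
  password: "crimson",
  confirmation: "crimson",
  agreement: false,
});
console.log(message); // You must agree to the terms and conditions.
```

In the page itself, a non-null message would be shown with alert and the submission stopped; keeping the checks in a function that just returns a message makes them easy to try outside a browser.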
So as the very first page of Problem Set 7 emphasizes, pset 7 is very much an exercise in bootstrapping yourself where we've provided, hopefully, a conceptual framework with which to tackle the pset. But as is often the case with web design, it's up to you really to poke around, incorporate snippets of code and examples from the Web so long as you cite them per the terms on that first sheet, and realize that learning HTML, CSS, JavaScript and even SQL is really meant to be this at-home exercise as we begin to take these training wheels off.

And realize too there's so many more things you can do with a browser. Inside of most of these elements there are other things called event handlers. And even though we just looked at ones called onsubmit and onready, you can do things like onkeydown and onkeyup, so when the user presses a key or lets it back up, you can listen for that. Gmail has keyboard shortcuts. How does Google implement keyboard shortcuts like C for compose? They listen for events, as they're called, like onkeypress or onkeyup and onkeydown. If you've ever hovered your mouse over some menu option and all of a sudden, voila, a menu appears or the graphic changes color, how are they doing that? Rather than listen for onready or onsubmit, you listen for onmouseover or onmouseout. So in short, with these very simple basics that we've begun to scratch the surface of today and we'll dive in further to on Wednesday, you have, increasingly, power to implement the kinds of things that you're already familiar with. So let's end there, and we'll continue this on Wednesday.