1 00:00:00,000 --> 00:00:02,570 [Week 9] 2 00:00:02,570 --> 00:00:04,740 [David J. Malan - Harvard University] 3 00:00:04,740 --> 00:00:07,170 [This is CS50. - CS50.TV] 4 00:00:07,170 --> 00:00:12,350 All right. Welcome back. This is CS50, and this is the start of week 9. 5 00:00:12,350 --> 00:00:16,600 Today we focus in particular on design, no longer in the context of C 6 00:00:16,600 --> 00:00:20,010 but in the context of PHP and a bit of SQL and a bit of JavaScript, 7 00:00:20,010 --> 00:00:23,730 particularly toward an end of both pset 7 and also your final project. 8 00:00:23,730 --> 00:00:26,310 In fact, if you are at that point in your final project 9 00:00:26,310 --> 00:00:30,100 where presumably as of an hour or so ago you at least started to give some thought 10 00:00:30,100 --> 00:00:33,730 to your final project and you're thinking you'd like to collaborate with 1 or 2 classmates, 11 00:00:33,730 --> 00:00:36,150 if you're having trouble connecting with said classmates, 12 00:00:36,150 --> 00:00:40,570 feel free to fill out the form at cs50.net/partners/form. 13 00:00:40,570 --> 00:00:42,880 It just asks you who you are, what kind of project you're thinking about, 14 00:00:42,880 --> 00:00:44,870 where you live just for logistical reasons. 15 00:00:44,870 --> 00:00:49,510 And then if you want to keep an eye on over the next week or so the spreadsheet URL there, 16 00:00:49,510 --> 00:00:53,520 you can then see a read-only version of the Google doc 17 00:00:53,520 --> 00:00:56,010 in which we're collecting that information. 18 00:00:56,010 --> 00:00:58,930 So if you want to work with someone, by all means feel free to reach out to people 19 00:00:58,930 --> 00:01:00,480 via that mechanism. 20 00:01:00,480 --> 00:01:02,690 But the majority of folks do work solo. That's totally fine. 21 00:01:02,690 --> 00:01:06,120 So don't feel that this is in any way obligatory. 
22 00:01:06,120 --> 00:01:09,680 On Friday it was just me and a few of the team in here, 23 00:01:09,680 --> 00:01:11,100 empty theater for the most part. 24 00:01:11,100 --> 00:01:14,600 There were 3 tourists sitting up there, so that was a little awkward. 25 00:01:14,600 --> 00:01:18,970 What we talked about was databases and we talked about pset 7 a little bit. 26 00:01:18,970 --> 00:01:22,200 And if you didn't happen to catch that on video just yet, that's fine. 27 00:01:22,200 --> 00:01:26,770 I'll try to define any terms that we would otherwise take for granted 28 00:01:26,770 --> 00:01:28,840 based on Friday's lecture. 29 00:01:28,840 --> 00:01:32,550 >> But today we're going to try to get you to the point 30 00:01:32,550 --> 00:01:34,990 of not just being able to do something like pset 7 31 00:01:34,990 --> 00:01:37,360 but really understanding what's going on underneath the hood, 32 00:01:37,360 --> 00:01:41,910 particularly some of the abstractions that we put in place in the functions.php file 33 00:01:41,910 --> 00:01:45,780 to make your lives a bit easier but so that you ultimately understand 34 00:01:45,780 --> 00:01:48,760 so that when the training wheels come off in a few weeks you can still survive 35 00:01:48,760 --> 00:01:53,750 in the real world and do this stuff without any CS50 framework underneath you. 36 00:01:53,750 --> 00:01:57,500 This $_SESSION, for those of you who are familiar 37 00:01:57,500 --> 00:02:01,960 or who already caught the video on Friday, what does SESSION let us do 38 00:02:01,960 --> 00:02:04,330 in a PHP-based web application? 39 00:02:04,330 --> 00:02:09,650 This is a superglobal variable, which means it's similar in spirit to GET and POST 40 00:02:09,650 --> 00:02:13,970 and a few others, but what is this thing useful for? 41 00:02:13,970 --> 00:02:18,320 >> What is SESSION used for? Yeah. [student] Logging in. 42 00:02:18,320 --> 00:02:21,040 Sorry? [student] Logging in. Logging in. Indeed. 
43 00:02:21,040 --> 00:02:25,100 In pset 7 we're using this SESSION superglobal to facilitate logging in. 44 00:02:25,100 --> 00:02:28,600 And what's nice about this superglobal is that it's an associative array. 45 00:02:28,600 --> 00:02:33,190 An associative array, recall, is just an array but whose indices no longer have to be numbers 46 00:02:33,190 --> 00:02:37,670 like 0, 1, 2. They can be numbers or they can be even strings. 47 00:02:37,670 --> 00:02:44,890 And so if you've dived into pset 7 yet, you may recall that we are storing a key called ID 48 00:02:44,890 --> 00:02:50,330 inside of this associative array whose value is something like 123-- 49 00:02:50,330 --> 00:02:53,780 whatever the currently logged in user's ID is. 50 00:02:53,780 --> 00:02:59,470 The motivation for this is that even after the user has visited localhost 51 00:02:59,470 --> 00:03:02,720 or my website more generally and then they've logged in, 52 00:03:02,720 --> 00:03:07,320 even if they don't click a link or return to my website for 5 minutes 53 00:03:07,320 --> 00:03:10,730 or even an hour or even a day but they leave their browser window open, 54 00:03:10,730 --> 00:03:14,370 via this superglobal can I remember that they are logged in. 55 00:03:14,370 --> 00:03:21,140 >> In other words, it allows me to store slightly long term anything I want about a user. 56 00:03:21,140 --> 00:03:24,390 And you can think of it really as the incarnation of a shopping cart. 57 00:03:24,390 --> 00:03:27,740 Places like Amazon obviously let you put things into a shopping cart, 58 00:03:27,740 --> 00:03:32,230 but HTTP, the protocol that powers the Web, is stateless 59 00:03:32,230 --> 00:03:34,230 in the sense that when you visit a website, 60 00:03:34,230 --> 00:03:37,290 for the most part you don't have some constant network connection 61 00:03:37,290 --> 00:03:39,270 between your browser and the server. 
62 00:03:39,270 --> 00:03:42,190 As soon as you've downloaded the HTML and the JPEGs and the GIFs and all that, 63 00:03:42,190 --> 00:03:48,200 the connection goes away and you just have a copy of the HTML and whatnot from the server. 64 00:03:48,200 --> 00:03:53,000 But if the server wants to remember something about you, 65 00:03:53,000 --> 00:03:57,580 the burden is on the server to actually record that information. 66 00:03:57,580 --> 00:04:00,130 And so you the programmer who have control over the server 67 00:04:00,130 --> 00:04:04,400 can put most anything you want inside of this superglobal associative array 68 00:04:04,400 --> 00:04:06,850 and it will be there the next time the user comes back, 69 00:04:06,850 --> 00:04:12,070 whether it's minutes or even days later, unless they close their browser window, 70 00:04:12,070 --> 00:04:14,360 at which point SESSION disappears. 71 00:04:14,360 --> 00:04:17,779 So it's ephemeral storage, it's non-persistent, and it's meant to go away 72 00:04:17,779 --> 00:04:22,360 as soon as the user closes their browser--not just that tab, often the entire browser, 73 00:04:22,360 --> 00:04:24,930 thereby effectively logging the user out. 74 00:04:24,930 --> 00:04:28,000 So how is this thing actually implemented? 75 00:04:28,000 --> 00:04:31,360 Let's take a quick look at a simple example we looked at on Friday. 76 00:04:31,360 --> 00:04:33,340 For those unfamiliar, it was as simple as this. 77 00:04:33,340 --> 00:04:35,910 This is a web page whose sole purpose in life is to tell me 78 00:04:35,910 --> 00:04:38,000 how many times I have visited this page. 79 00:04:38,000 --> 00:04:41,670 This is the first time here on Monday that I visited it, so it says 0 times. 
80 00:04:41,670 --> 00:04:46,940 >> But if I start reloading this page, it says 1 time, 2, 3, 4, 5, 81 00:04:46,940 --> 00:04:49,800 and this will eventually just keep on counting up, up, up, up, up 82 00:04:49,800 --> 00:04:53,130 for each time I actually click Reload on it. 83 00:04:53,130 --> 00:04:58,830 So how is this working? Let me go inside of this file called counter.php. 84 00:04:58,830 --> 00:05:02,490 The top part of it is all blue comments, but the interesting part is here. 85 00:05:02,490 --> 00:05:06,670 On line 13 we call this function session_start, 86 00:05:06,670 --> 00:05:09,600 and that is literally all you need to do if you want to have access 87 00:05:09,600 --> 00:05:13,610 to this special superglobal called $_SESSION. 88 00:05:13,610 --> 00:05:17,430 That makes it all possible, and we'll see in a moment how that's all possible. 89 00:05:17,430 --> 00:05:20,350 In line 16 notice what I'm doing. 90 00:05:20,350 --> 00:05:25,960 If the key, called counter--in other words, the index value--"counter" 91 00:05:25,960 --> 00:05:32,310 exists inside of this array called SESSION, then what am I doing with it in the line below? 92 00:05:32,310 --> 00:05:36,650 What is line 18 doing? 93 00:05:36,650 --> 00:05:40,360 >> [inaudible student response] What's that? [student] Storing the value. Good. 94 00:05:40,360 --> 00:05:45,800 It's storing the value that's in SESSION right now in a new local temporary variable, 95 00:05:45,800 --> 00:05:48,250 $counter in all lowercase. 96 00:05:48,250 --> 00:05:50,770 Notice that PHP is already being a little lazy here. 97 00:05:50,770 --> 00:05:55,550 Notice we don't have any mention of int or float or string or anything like that 98 00:05:55,550 --> 00:06:00,480 because PHP is weakly typed, whereby you don't have to specify the type of a variable, 99 00:06:00,480 --> 00:06:03,310 and in this case here I've not even declared it yet. 
100 00:06:03,310 --> 00:06:08,980 I'm declaring it inside of these curly braces and unlike C, this is actually okay. 101 00:06:08,980 --> 00:06:13,800 No matter how deeply nested a variable's declaration is in PHP-- 102 00:06:13,800 --> 00:06:16,650 inside of curly brace, inside of curly brace and the like-- 103 00:06:16,650 --> 00:06:21,230 it will at that moment in time exist for the remainder of the program, 104 00:06:21,230 --> 00:06:22,680 for better or for worse. 105 00:06:22,680 --> 00:06:26,930 So it immediately becomes global as soon as you define it as we're doing here. 106 00:06:26,930 --> 00:06:31,620 >> Otherwise, if I do not find that there's anything in the SESSION superglobal, 107 00:06:31,620 --> 00:06:34,680 I'm apparently initializing this variable counter to 0, 108 00:06:34,680 --> 00:06:37,580 thereby just assuming the user has never been here before. 109 00:06:37,580 --> 00:06:40,030 And then this of course is incrementing the counter how? 110 00:06:40,030 --> 00:06:44,480 I'm updating the value that's inside of this associative array 111 00:06:44,480 --> 00:06:49,530 by setting it equal to whatever counter currently is + 1. 112 00:06:49,530 --> 00:06:53,520 If I scroll down here to the HTML of the page, it's actually pretty simple. 113 00:06:53,520 --> 00:06:58,920 All I have in the body of this page is, "You have visited this site so-and-so times." 114 00:06:58,920 --> 00:07:00,350 And this is a PHP construct. 115 00:07:00,350 --> 00:07:06,080 If you do <?= ... ?> 116 00:07:06,080 --> 00:07:12,600 It's really equivalent to something like printf, which we've seen many times in C, 117 00:07:12,600 --> 00:07:15,940 although as you may know already from the spec in pset 7, 118 00:07:15,940 --> 00:07:20,160 print is also a function that just prints something out, it doesn't actually use format codes, 119 00:07:20,160 --> 00:07:23,270 and you can actually say echo as well. 
120 00:07:23,270 --> 00:07:27,460 They're all ever so slightly different even though the net effect is ultimately the same. 121 00:07:27,460 --> 00:07:31,270 So this use of the equals sign is just sort of an elegant way of doing it 122 00:07:31,270 --> 00:07:34,910 more succinctly than you might otherwise be able to. 123 00:07:34,910 --> 00:07:38,370 So that's all this site does. It prints out the value of counter. 124 00:07:38,370 --> 00:07:40,550 How is this all actually happening? 125 00:07:40,550 --> 00:07:43,250 You may recall a week or so ago we started looking underneath the hood 126 00:07:43,250 --> 00:07:47,910 at how a web page works by using this Inspector tab. 127 00:07:47,910 --> 00:07:51,900 >> Chrome has this both in the Mac version, the Windows version, and even the Linux version, 128 00:07:51,900 --> 00:07:59,510 and Firefox and IE have similar mechanisms whereby you have this built-in debugger 129 00:07:59,510 --> 00:08:01,400 inside of the browser. 130 00:08:01,400 --> 00:08:03,040 Let's take a look at the following. 131 00:08:03,040 --> 00:08:06,960 We've got a whole bunch of tabs here, and recall that the leftmost one is Elements, 132 00:08:06,960 --> 00:08:10,700 and no matter how godawful the HTML and JavaScript is in a page, 133 00:08:10,700 --> 00:08:15,710 recall that with the Elements tab you can actually navigate the HTML hierarchically 134 00:08:15,710 --> 00:08:17,050 and nice and neatly. 135 00:08:17,050 --> 00:08:19,370 So if you're trying to learn from a website like Google or Facebook 136 00:08:19,370 --> 00:08:22,370 or really any website, realize that you're probably better off 137 00:08:22,370 --> 00:08:26,360 looking at the source code this way as opposed to viewing the raw source, 138 00:08:26,360 --> 00:08:29,580 which can be a mess, as we've seen especially on Google's site. 
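A minimal sketch of the counter.php page walked through above. It is reconstructed from the spoken description rather than copied from the course's distribution code, so the comments, the exact line positions (the lecture mentions lines 13, 16, and 18), and the HTML wording are assumptions.

```php
<?php

    // enable sessions; this must run before any output is sent,
    // because the session cookie travels in an HTTP header
    session_start();

    // if this browser has been here before, recover its count
    if (isset($_SESSION["counter"]))
    {
        $counter = $_SESSION["counter"];
    }
    else
    {
        // otherwise assume this is the first visit
        $counter = 0;
    }

    // remember one more visit for the next request
    $_SESSION["counter"] = $counter + 1;

?>
<!DOCTYPE html>
<html>
    <head>
        <title>counter</title>
    </head>
    <body>
        You have visited this site <?= $counter ?> time(s).
    </body>
</html>
```

Here `<?= $counter ?>` is shorthand for `<?php echo $counter; ?>`, which is why the page can print the value without a format string.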
139 00:08:29,580 --> 00:08:32,220 So if I instead click on the Network tab here, 140 00:08:32,220 --> 00:08:34,830 let's see what's going on when I visit this page. 141 00:08:34,830 --> 00:08:38,669 First let me clear my cache. 142 00:08:38,669 --> 00:08:43,570 I'm going to go into Settings in Chrome and then go to History 143 00:08:43,570 --> 00:08:46,420 and then Clear all browsing data. 144 00:08:46,420 --> 00:08:48,170 You might be used to doing this for other purposes, [laughter] 145 00:08:48,170 --> 00:08:51,990 but when it comes to developing websites, it's actually useful-- 146 00:08:51,990 --> 00:08:55,980 if you're laughing you know. [laughter] 147 00:08:55,980 --> 00:08:59,310 It's actually really useful when developing websites because the reality is 148 00:08:59,310 --> 00:09:04,100 things like cookies and things like cached HTML files, cached JavaScript files 149 00:09:04,100 --> 00:09:06,390 can actually become a big headache, because if for whatever reason 150 00:09:06,390 --> 00:09:11,500 the browser decides to cache some file and yet you've made changes to that file on the server 151 00:09:11,500 --> 00:09:14,670 but the browser hasn't really realized that the file has changed 152 00:09:14,670 --> 00:09:19,060 and therefore does not actually re-download it even when you click the Reload button, 153 00:09:19,060 --> 00:09:23,210 one of the most surefire ways to just make sure the fault is not with your code, 154 00:09:23,210 --> 00:09:26,480 it's with the behavior of the browser, is to go in here in your browser 155 00:09:26,480 --> 00:09:29,950 and just clear the entire history so that there's no confusion. 156 00:09:29,950 --> 00:09:33,210 >> And then if you really want to be paranoid, quit the browser, restart it, 157 00:09:33,210 --> 00:09:35,660 and then make sure all is working as expected. 158 00:09:35,660 --> 00:09:38,820 So in short, clearing cache is good when doing development. 
159 00:09:38,820 --> 00:09:40,690 So here we have the Network tab. 160 00:09:40,690 --> 00:09:46,020 I previously had visited the site 9 times, but let me go ahead now and click Reload. 161 00:09:46,020 --> 00:09:47,500 And I'm back down to 0. 162 00:09:47,500 --> 00:09:52,100 Let's actually see how it is that this SESSION superglobal is being implemented. 163 00:09:52,100 --> 00:09:55,990 I'm going to click on the 1 HTTP request that was made, 164 00:09:55,990 --> 00:09:58,810 and this debugging window lets me look inside of that. 165 00:09:58,810 --> 00:10:01,970 Here I see just the response from the server, which isn't interesting. 166 00:10:01,970 --> 00:10:04,030 I've seen this in any number of ways. 167 00:10:04,030 --> 00:10:06,350 But what's technically interesting are the headers. 168 00:10:06,350 --> 00:10:11,770 If I scroll down here and focus on the request headers and click view source, 169 00:10:11,770 --> 00:10:14,400 what I'm going to see is literally the HTTP request 170 00:10:14,400 --> 00:10:17,250 that just went from my browser to the server, 171 00:10:17,250 --> 00:10:21,400 GET being the operative word and then /counter.php being the file name, 172 00:10:21,400 --> 00:10:25,670 HTTP/1.1 just being the version of HTTP that my browser is using. 173 00:10:25,670 --> 00:10:31,070 This line here is a little reminder from browser to server what the name of the server is 174 00:10:31,070 --> 00:10:33,020 that it wants to talk to. 175 00:10:33,020 --> 00:10:38,200 And then the rest of this is sometimes interesting but not relevant right now. 176 00:10:38,200 --> 00:10:40,090 >> This is just kind of a curiosity. 177 00:10:40,090 --> 00:10:43,530 Cryptic though this string is, any time your browser visits a website 178 00:10:43,530 --> 00:10:47,110 it is informing the server what browser you're using 179 00:10:47,110 --> 00:10:50,040 and what operating system you're using and what version thereof. 
180 00:10:50,040 --> 00:10:52,650 So if you've ever wondered how websites like CNN and whatnot 181 00:10:52,650 --> 00:10:56,860 know what the percentages are of Mac users on the Web, PC users, 182 00:10:56,860 --> 00:11:00,820 IE users, Chrome users and the like, it's because all of our browsers 183 00:11:00,820 --> 00:11:04,300 are telling every single website out there what we are. 184 00:11:04,300 --> 00:11:07,410 It doesn't necessarily contain personally identifiable information, 185 00:11:07,410 --> 00:11:13,060 but it does tell the server what your IP address is and what browser and OS you are using. 186 00:11:13,060 --> 00:11:14,720 So that's where this information is. 187 00:11:14,720 --> 00:11:19,960 But what's more interesting now when it comes to these sessions is the response header. 188 00:11:19,960 --> 00:11:22,530 Let me click view source next to response. 189 00:11:22,530 --> 00:11:24,590 What's interesting here is a few things. 190 00:11:24,590 --> 00:11:27,580 1, we got back a status code of 200. 191 00:11:27,580 --> 00:11:29,840 We never see this status code because that means all is well. 192 00:11:29,840 --> 00:11:32,920 It means literally okay in contrast to something else. 193 00:11:32,920 --> 00:11:36,380 What's a number we sometimes see that's bad? [student] 404. 194 00:11:36,380 --> 00:11:39,860 404, file not found, 403 you might be stumbling upon already, 195 00:11:39,860 --> 00:11:43,660 which is forbidden, which means you forgot to chmod something, most likely. 196 00:11:43,660 --> 00:11:45,190 And there's a bunch of others. 197 00:11:45,190 --> 00:11:47,760 >> Down here, this is a little crazy. 198 00:11:47,760 --> 00:11:52,340 I really just wrote this file a few minutes ago by pasting it into gedit. 199 00:11:52,340 --> 00:11:57,100 Why did this page expire in 1981 before there really was a Web? 200 00:11:58,010 --> 00:12:00,730 What's going on there? 201 00:12:00,730 --> 00:12:04,390 >> [inaudible student response] The time stamp. 
But why? 202 00:12:06,110 --> 00:12:09,120 It's somewhat arbitrary, but it's actually useful. 203 00:12:09,120 --> 00:12:15,500 What this is saying to my browser is this PHP file you've just requested has already expired. 204 00:12:15,500 --> 00:12:18,580 In fact, it expired 30 years ago. 205 00:12:18,580 --> 00:12:20,260 But what does that really mean? 206 00:12:20,260 --> 00:12:22,500 It just means the next time the user visits this page, 207 00:12:22,500 --> 00:12:25,540 whether by reloading or typing the URL in the address bar, 208 00:12:25,540 --> 00:12:28,010 make sure you go and fetch a new copy of it. 209 00:12:28,010 --> 00:12:30,840 This is sort of an example of cache busting, 210 00:12:30,840 --> 00:12:33,790 a stupid word that just means trying to discourage browsers 211 00:12:33,790 --> 00:12:37,260 from actually caching HTML that's been sent from a server 212 00:12:37,260 --> 00:12:41,490 so that you don't accidentally hit reload and then see the same version of the file. 213 00:12:41,490 --> 00:12:43,730 You actually want the server to send a new copy. 214 00:12:43,730 --> 00:12:47,440 So the fact that it's 1981 just means that that's what the appliance is choosing 215 00:12:47,440 --> 00:12:50,280 as an arbitrary date in the past. 216 00:12:50,280 --> 00:12:53,380 But the real juicy line is now this one. 217 00:12:53,380 --> 00:12:57,550 Even before 50 you're probably vaguely familiar with cookies. 218 00:12:57,550 --> 00:13:01,820 As of right now, especially among those less comfortable or in between, 219 00:13:01,820 --> 00:13:04,120 what is a cookie in your understanding right now 220 00:13:04,120 --> 00:13:06,980 even though we're about to make your understanding more technical? 221 00:13:08,150 --> 00:13:10,070 What's a cookie? Yeah. 222 00:13:10,070 --> 00:13:13,890 [student] Information about the user, like if they've written their user name or something. 223 00:13:13,890 --> 00:13:17,370 >> Good. 
It's information about the user, whether they've typed in their user name already. 224 00:13:17,370 --> 00:13:21,190 Cookies are a way whereby servers can remember something about a user. 225 00:13:21,190 --> 00:13:25,810 And what a cookie really is is a text file or some sequence of bytes 226 00:13:25,810 --> 00:13:28,340 that's planted by the server inside of your browser, 227 00:13:28,340 --> 00:13:31,960 and inside of that file or among those bytes is some kind of identifier. 228 00:13:31,960 --> 00:13:35,640 Maybe it's literally your user name, but more often it's something more cryptic-looking 229 00:13:35,640 --> 00:13:43,700 like this thing here--bo8dal3ct and so forth--this really big alphanumeric string 230 00:13:43,700 --> 00:13:47,050 that's really just meant to be a unique identifier for you. 231 00:13:47,050 --> 00:13:49,790 Or you can think of it as sort of a virtual hand stamp. 232 00:13:49,790 --> 00:13:53,020 If you go to some club or an amusement park, to remember that you've actually paid 233 00:13:53,020 --> 00:13:55,850 and gone in, they put a little red sticker on your hand of some sort, 234 00:13:55,850 --> 00:13:59,270 and that reminds the people at the counter that you've already paid 235 00:13:59,270 --> 00:14:01,340 and you can come and go as you please. 236 00:14:01,340 --> 00:14:04,250 Cookies are a little similar in spirit to that. 237 00:14:04,250 --> 00:14:08,070 The first time I visited this website, as I just did after clearing my cache, 238 00:14:08,070 --> 00:14:11,620 the web server, the appliance in this case, put a stamp on my hand 239 00:14:11,620 --> 00:14:15,030 whose name is PHPSESSID, session ID, 240 00:14:15,030 --> 00:14:18,260 whose value is this really long alphanumeric string. 
241 00:14:18,260 --> 00:14:22,470 >> So that's now sort of emblazoned on my hand so that the next time I hit reload 242 00:14:22,470 --> 00:14:25,230 or manually visit this URL in a browser, 243 00:14:25,230 --> 00:14:29,230 my browser by definition of HTTP is going to present the hand stamp 244 00:14:29,230 --> 00:14:31,940 again and again and again. 245 00:14:31,940 --> 00:14:34,550 So even though the server doesn't necessarily know who I am, 246 00:14:34,550 --> 00:14:39,610 they at least know that I'm the same user or at least, more specifically, the same browser. 247 00:14:39,610 --> 00:14:45,660 And so this is ultimately how the SESSION superglobal is implemented. 248 00:14:45,660 --> 00:14:51,200 The server has no idea who you are when you revisit a website for the second or the third time 249 00:14:51,200 --> 00:14:53,410 unless you present this hand stamp. 250 00:14:53,410 --> 00:14:55,530 And as soon as you present that hand stamp, 251 00:14:55,530 --> 00:14:59,370 the web server essentially goes into a little database of its own 252 00:14:59,370 --> 00:15:06,040 and checks, okay, I have just seen the hand stamp of user bo8dal3ct and so forth. 253 00:15:06,040 --> 00:15:09,850 Let me see what information the programmer has stored 254 00:15:09,850 --> 00:15:12,380 inside of the superglobal about this user, 255 00:15:12,380 --> 00:15:17,000 and then let me make sure that that data is again inside of the SESSION superglobal 256 00:15:17,000 --> 00:15:19,830 so that the programmer can re-access that data 257 00:15:19,830 --> 00:15:23,360 even if it was set some minutes or hours ago. 
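The hand-stamp lookup described above can be sketched as a small simulation. This is not how PHP is implemented internally; it is a toy model under stated assumptions: `$store` stands in for the server's own "little database" of sessions, `visit()` is a hypothetical helper that plays the role of one HTTP request, and the ID format (26 hex characters) is made up for illustration.

```php
<?php

// Hypothetical stand-in for the server's session storage, keyed by session ID.
// Real PHP keeps this data in files or another configured store.
$store = [];

/**
 * Simulate one request. $cookie is the PHPSESSID the browser presented,
 * or null on a first visit. Returns [session ID, visits seen before this one].
 */
function visit(?string $cookie, array &$store): array
{
    // No hand stamp, or one the server does not recognize: issue a fresh one.
    if ($cookie === null || !isset($store[$cookie]))
    {
        $cookie = bin2hex(random_bytes(13)); // made-up ID format
        $store[$cookie] = ["counter" => 0];
    }

    // Load this visitor's data, much as PHP fills in $_SESSION.
    $session = $store[$cookie];
    $counter = $session["counter"];

    // Update the data and save it for the next request.
    $session["counter"] = $counter + 1;
    $store[$cookie] = $session;

    return [$cookie, $counter];
}

// First visit: no cookie yet, so the server hands one out via Set-Cookie.
[$id, $count] = visit(null, $store);
echo "Set-Cookie: PHPSESSID=$id (visited $count times)\n";

// Reload: the browser presents the same cookie, so the count is remembered.
[$id2, $count] = visit($id, $store);
echo "Cookie: PHPSESSID=$id2 (visited $count times)\n";

// A different browser with no cookie starts over at 0.
[$other, $count] = visit(null, $store);
echo "Set-Cookie: PHPSESSID=$other (visited $count times)\n";
```

The server never learns who the visitor is; it only matches the presented identifier against what it stored earlier, which is also why a stolen identifier is enough to impersonate someone.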
258 00:15:23,360 --> 00:15:26,150 So in other words, cookies, which got a bad rap for some time 259 00:15:26,150 --> 00:15:29,990 because of insecurities in browsers and they can really violate our privacy and all this, 260 00:15:29,990 --> 00:15:31,900 they actually have great utility because without them 261 00:15:31,900 --> 00:15:36,110 you would constantly be logging in to every Facebook page you visit 262 00:15:36,110 --> 00:15:40,680 or every Gmail email you read if the browser didn't have some way of remembering 263 00:15:40,680 --> 00:15:43,320 that you've already authenticated. 264 00:15:43,320 --> 00:15:46,640 >> So in this way cookies are sent back and forth across the wire. 265 00:15:46,640 --> 00:15:52,470 Another curiosity about cookies, especially here, is that this is completely in cleartext. 266 00:15:52,470 --> 00:15:54,930 There's no encryption going on here whatsoever, 267 00:15:54,930 --> 00:15:57,240 and indeed I'm using HTTP at the moment. 268 00:15:57,240 --> 00:16:00,890 One of our favorite moments in CS50, which is now 2 years ago, 269 00:16:00,890 --> 00:16:04,750 was around the time a tool called Firesheep came out. 270 00:16:04,750 --> 00:16:08,320 This was a free piece of software that was made by a security researcher 271 00:16:08,320 --> 00:16:13,250 as a wake-up call for the community to say just how atrociously implemented 272 00:16:13,250 --> 00:16:17,900 certain authentication mechanisms on the Web were. 273 00:16:17,900 --> 00:16:22,880 So for some time, Facebook was almost entirely over HTTP, no HTTPS. 274 00:16:22,880 --> 00:16:25,640 And even if you have no idea how the crypto works, S is secure 275 00:16:25,640 --> 00:16:27,950 so it means there's at least some encryption involved. 
276 00:16:27,950 --> 00:16:30,610 Facebook did used to encrypt user names and passwords, 277 00:16:30,610 --> 00:16:33,560 but as soon as you looked at your pokes or your messages or your news feed, 278 00:16:33,560 --> 00:16:35,360 all of that was unencrypted. 279 00:16:35,360 --> 00:16:37,870 So was Gmail until just a year or 2 ago. 280 00:16:37,870 --> 00:16:41,100 Any time you logged in, yes, they used secure encryption, 281 00:16:41,100 --> 00:16:44,300 but thereafter they didn't. And why might that be? 282 00:16:44,300 --> 00:16:49,210 Why not just use cryptography all of the time in use cases like this? 283 00:16:49,210 --> 00:16:53,700 What's that? I think I heard something. [student] Speed. 284 00:16:53,700 --> 00:16:56,250 Speed, right? There are ways around this. 285 00:16:56,250 --> 00:16:59,610 But if you just kind of think about it logically, if you encrypt something, 286 00:16:59,610 --> 00:17:01,820 you have to do at least a little more work. 287 00:17:01,820 --> 00:17:05,460 In pset 2 when you implemented Caesar or Vigenere or even Crack, 288 00:17:05,460 --> 00:17:07,760 just printing a string is relatively easy. 289 00:17:07,760 --> 00:17:12,040 Encrypting and then printing a string minimally requires a bit more work. 290 00:17:12,040 --> 00:17:14,520 >> For super popular websites like Google and Facebook, 291 00:17:14,520 --> 00:17:18,839 if you have to do more work for each user for every single web page they visit, 292 00:17:18,839 --> 00:17:20,520 that just takes more CPU time. 293 00:17:20,520 --> 00:17:22,920 And if you need more CPU time, you might need more servers, 294 00:17:22,920 --> 00:17:24,270 which means you might need more money. 295 00:17:24,270 --> 00:17:27,579 And so for many years this just really wasn't best practice. 296 00:17:27,579 --> 00:17:31,440 People would use SSL encryption only when they needed to. 
297 00:17:31,440 --> 00:17:34,960 But it turned out, and as this fellow with Firesheep made super clear, 298 00:17:34,960 --> 00:17:37,920 when you guys who are currently on Facebook right now-- 299 00:17:37,920 --> 00:17:39,880 Out of curiosity, let's see if you'll fess up. 300 00:17:39,880 --> 00:17:42,620 If you're on Facebook right now in some tab, even if it's not foregrounded, 301 00:17:42,620 --> 00:17:46,610 is your URL HTTP or HTTPS? 302 00:17:46,610 --> 00:17:50,560 [multiple students] S. S? [laughter] 303 00:17:50,560 --> 00:17:55,510 Okay. Any HTTP? Just 1? Okay. 304 00:17:55,510 --> 00:17:58,940 So all of us can hack that guy's Facebook account right now. 305 00:17:58,940 --> 00:18:04,100 For the most part this has become turned on by default, at least in some websites. 306 00:18:04,100 --> 00:18:08,120 And long story short, if your web traffic is not encrypted, 307 00:18:08,120 --> 00:18:12,960 not only does the HTML go back and forth across the WiFis unencrypted, 308 00:18:12,960 --> 00:18:16,760 so do things like cookies go back and forth throughout the air 309 00:18:16,760 --> 00:18:18,940 without any form of encryption. 310 00:18:18,940 --> 00:18:23,540 So if you have just a bit of programming savvy or a bit of Googling skills 311 00:18:23,540 --> 00:18:27,410 to find free software that does this, all you have to do is sit in Starbucks 312 00:18:27,410 --> 00:18:30,680 or sit in an airport where there's generally unencrypted WiFi 313 00:18:30,680 --> 00:18:36,070 and just watch for keywords like Set-Cookie: or PHPSESSID 314 00:18:36,070 --> 00:18:39,300 because if you have the technical savvy to just watch the WiFi 315 00:18:39,300 --> 00:18:43,010 for all of the bits that flow throughout the air for this pattern, 316 00:18:43,010 --> 00:18:50,840 you can then say that guy's PHPSESSID happens to be bo8dal and so forth. 
317 00:18:50,840 --> 00:18:53,890 And then again if you're sufficiently technically savvy or have the right tool, 318 00:18:53,890 --> 00:18:58,890 you can then just reconfigure your own browser to start presenting that hand stamp 319 00:18:58,890 --> 00:19:05,030 to Facebook.com, and Facebook is just going to assume that you are that guy 320 00:19:05,030 --> 00:19:09,880 because all they know is not who you are but that you have this unique identifier. 321 00:19:09,880 --> 00:19:14,650 So if you steal that unique identifier and present it to the web server as your own, 322 00:19:14,650 --> 00:19:16,860 they are just going to show you that person's news feed 323 00:19:16,860 --> 00:19:18,980 or that person's messages or pokes. 324 00:19:18,980 --> 00:19:23,190 >> And I would Google now how to activate HTTPS for Facebook perhaps. 325 00:19:23,190 --> 00:19:25,150 But it really is as simple as that. 326 00:19:25,150 --> 00:19:27,660 And so Facebook and Google and the like have gotten really good at this, 327 00:19:27,660 --> 00:19:31,870 but keep an eye out all the more for any websites you visit that don't use HTTPS 328 00:19:31,870 --> 00:19:35,020 and have some kind of sensitive information on them, 329 00:19:35,020 --> 00:19:37,490 whether it's financial or personal or the like. 330 00:19:37,490 --> 00:19:43,180 If they're not using this, quite possibly can cookies like this be very easily stolen 331 00:19:43,180 --> 00:19:46,270 and then forged, and that's exactly what Firesheep did. 332 00:19:46,270 --> 00:19:48,250 You didn't have to be a programmer. 
333 00:19:48,250 --> 00:19:51,680 All you had to do was have an Internet connection, download this free tool, 334 00:19:51,680 --> 00:19:56,490 and what it would do is you log in and then it would show you the Facebook names 335 00:19:56,490 --> 00:20:00,170 of everyone in Sanders, in this particular demonstration, around you 336 00:20:00,170 --> 00:20:03,260 and all you had to do was click on their name and the software automated the process 337 00:20:03,260 --> 00:20:05,970 of sniffing that cookie, presenting it to Facebook as your own, 338 00:20:05,970 --> 00:20:07,990 and, voila, you're logged in. 339 00:20:07,990 --> 00:20:11,190 So this is another one of those "don't do this" officially. 340 00:20:11,190 --> 00:20:14,660 If you have your own home network and you want to tinker, by all means, 341 00:20:14,660 --> 00:20:17,530 but realize this does cross the line on a university environment. 342 00:20:17,530 --> 00:20:20,030 >> But the goal here is really to emphasize not how to do this 343 00:20:20,030 --> 00:20:22,320 but how to defend against these kinds of things. 344 00:20:22,320 --> 00:20:26,180 And the trivial solution here, even though it itself is flawed, 345 00:20:26,180 --> 00:20:31,360 is to really reduce use of any sites that aren't using HTTPS constantly. 346 00:20:31,360 --> 00:20:34,520 So sites like Facebook and Google increasingly have checkboxes 347 00:20:34,520 --> 00:20:36,200 where you can opt in to this sort of thing, 348 00:20:36,200 --> 00:20:40,000 and banks have had this for years for similar reasons. 349 00:20:40,000 --> 00:20:43,580 So just a little bit of a fear factor if we can. But that's it in a nutshell. 350 00:20:43,580 --> 00:20:46,420 That is how a server remembers who you are. 351 00:20:46,420 --> 00:20:50,760 And as soon as they can remember who you are, they can remember anything about you 352 00:20:50,760 --> 00:20:56,140 that the programmer has stored inside of this special superglobal called $_SESSION. 
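On the defending side, a site's programmer can also ask browsers never to send the session cookie over plain HTTP. The lecture does not cover this, so treat it as a supplementary sketch: it assumes the site is already served over HTTPS and that PHP 7.3 or later is available (the options-array form of `session_set_cookie_params()` was added in that version).

```php
<?php

// Assumption: every page of this site is served over HTTPS.
session_set_cookie_params([
    "secure" => true,    // browser sends PHPSESSID only over HTTPS
    "httponly" => true,  // hide the cookie from JavaScript running in the page
]);

// Must come after the parameters are set and before any output.
session_start();
```

With the secure flag set, someone sniffing unencrypted WiFi traffic should not see the PHPSESSID value go by, because the browser withholds it from any non-HTTPS request.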
353 00:20:56,140 --> 00:20:59,750 And for pset 7 we're using it trivially just to remember an int, 354 00:20:59,750 --> 00:21:02,260 namely the unique ID of the user who has logged in, 355 00:21:02,260 --> 00:21:05,880 so that we know they've been there before. 356 00:21:05,880 --> 00:21:12,450 Any questions then on sessions or cookies or the like? 357 00:21:12,450 --> 00:21:15,130 Firesheep doesn't work as well anymore, 358 00:21:15,130 --> 00:21:18,310 and you have to put your computer into a special promiscuous mode 359 00:21:18,310 --> 00:21:20,700 so you're actually listening for traffic besides yourselves. 360 00:21:20,700 --> 00:21:23,940 So if you're currently downloading Firesheep, realize it's not quite as easy 361 00:21:23,940 --> 00:21:26,850 as it once was to demonstrate. 362 00:21:26,850 --> 00:21:29,070 All right. And don't do it in Sanders. Do it at home. 363 00:21:29,070 --> 00:21:30,890 Databases. 364 00:21:30,890 --> 00:21:33,580 One of the things we did in pset 7 very deliberately 365 00:21:33,580 --> 00:21:37,780 was we give you a sample database table for users that has some user IDs, 366 00:21:37,780 --> 00:21:41,020 some user names, and some encrypted passwords therein. 367 00:21:41,020 --> 00:21:44,520 And as you'll see, if you haven't already, you're going to have to change the table a little bit. 368 00:21:44,520 --> 00:21:47,710 You're going to have to add some cash to each of the users in that table, 369 00:21:47,710 --> 00:21:51,130 and you're going to have to add another history table, a portfolios table, 370 00:21:51,130 --> 00:21:53,310 or perhaps call it something else.
371 00:21:53,310 --> 00:21:56,740 But in terms of thinking about how to do this, let's open up this tool 372 00:21:56,740 --> 00:22:00,570 which we used on Friday, but if unfamiliar, the appliance comes with a tool 373 00:22:00,570 --> 00:22:04,680 called phpMyAdmin which is coincidentally written in PHP, 374 00:22:04,680 --> 00:22:07,950 but its purpose in life, after I log in here as jharvard with crimson, 375 00:22:07,950 --> 00:22:15,160 is to give me a user-friendly way of viewing and changing my database. 376 00:22:15,160 --> 00:22:18,040 >> The database that I'm running on the appliance is called MySQL. 377 00:22:18,040 --> 00:22:23,420 This is very popular, and it's a free open source database that's wonderfully easy to use, 378 00:22:23,420 --> 00:22:25,620 especially with front ends like this. 379 00:22:25,620 --> 00:22:29,350 What this tool allows me to do, for instance, is poke around tables. 380 00:22:29,350 --> 00:22:30,890 Let me go ahead and do this. 381 00:22:30,890 --> 00:22:36,580 On Friday we created a table called students that was super simple. 382 00:22:36,580 --> 00:22:41,680 It had 3 columns--id, name, and email--and I manually inserted a couple of rows 383 00:22:41,680 --> 00:22:44,420 like David and Mike in this particular example. 384 00:22:44,420 --> 00:22:47,290 Let's take this a bit further, and let's assume that we want to remember more 385 00:22:47,290 --> 00:22:49,660 than just name and email about a user. 386 00:22:49,660 --> 00:22:53,090 Let me click Structure up here at the top. 387 00:22:53,090 --> 00:22:55,440 And again, the pset walks you through the requisite steps here, 388 00:22:55,440 --> 00:22:58,150 so don't worry if some of this is a bit quick. 389 00:22:58,150 --> 00:22:59,690 Then I'm going to click on here. 390 00:22:59,690 --> 00:23:02,270 I'm going to add some number of columns after email 391 00:23:02,270 --> 00:23:04,130 because I want to add something like house. 
392 00:23:04,130 --> 00:23:06,640 I forgot to record a student's house. 393 00:23:06,640 --> 00:23:11,400 Let me click Go, and now we have this form that unfortunately is a little wide from left to right, 394 00:23:11,400 --> 00:23:13,710 but I'm going to call the name of this field house, 395 00:23:13,710 --> 00:23:16,050 and then the type I now have to choose. 396 00:23:16,050 --> 00:23:18,870 So let's have a brief chat about the various types in MySQL 397 00:23:18,870 --> 00:23:24,590 because whereas PHP is weakly typed and it sort of plays fast and loose with types, 398 00:23:24,590 --> 00:23:29,430 in a database especially it's super important to actually use typing to your advantage 399 00:23:29,430 --> 00:23:33,260 because one of the things MySQL and other database engines can do for you 400 00:23:33,260 --> 00:23:37,910 is ensure that you don't put bogus data into your database. 401 00:23:37,910 --> 00:23:41,850 This is sort of free error checking available to you. 402 00:23:41,850 --> 00:23:46,250 >> For house we obviously don't want it to be an int, which is a 32-bit value in MySQL. 403 00:23:46,250 --> 00:23:49,810 We did talk briefly on Friday about varchar, which stands for variable length char. 404 00:23:49,810 --> 00:23:54,720 What is this? This allows you to specify that you want this to be a string of some sort. 405 00:23:54,720 --> 00:23:56,840 You don't really know in advance how long it is, 406 00:23:56,840 --> 00:24:00,100 so we'll arbitrarily say a house name can be 255 characters, 407 00:24:00,100 --> 00:24:04,190 but you could go with 32, 64--any number really. 408 00:24:04,190 --> 00:24:10,700 But the advantage of using a varchar over a field called char is what? 409 00:24:10,700 --> 00:24:15,110 Just intuitively if I scroll down here, notice there's char and there's varchar. 410 00:24:15,110 --> 00:24:19,520 Varchar is variable length char; char is a fixed length char. 
411 00:24:19,520 --> 00:24:24,730 So based only on that definition, what's the advantage or disadvantage of each of these? 412 00:24:24,730 --> 00:24:30,490 In other words, who cares about the distinction, or why should you care? 413 00:24:31,660 --> 00:24:35,750 >> Yeah. [student] Varchar has more flexibility but takes up more memory. 414 00:24:35,750 --> 00:24:40,730 Good. Varchar takes up more-- Let's see. I'm not sure if I heard that right. 415 00:24:40,730 --> 00:24:42,360 Can you say that once more? 416 00:24:42,360 --> 00:24:45,850 [student] I said varchar probably has more flexibility but it takes up more memory. 417 00:24:45,850 --> 00:24:51,170 Interesting. Okay. Varchar probably gives you more flexibility but takes up more memory. 418 00:24:51,170 --> 00:24:53,220 The latter isn't necessarily true. 419 00:24:53,220 --> 00:24:56,290 It depends on the context, but let's come back to that. 420 00:24:56,290 --> 00:25:03,230 >> [inaudible student response] Exactly. 421 00:25:03,230 --> 00:25:06,900 It's actually the case that char will typically use more memory 422 00:25:06,900 --> 00:25:10,950 because a char, like in C, is like a string, it's an array of characters. 423 00:25:10,950 --> 00:25:13,690 So if you say a char field of length 255, 424 00:25:13,690 --> 00:25:16,910 the database is literally going to give you 255 characters. 425 00:25:16,910 --> 00:25:22,290 And if the house ends up being M-A-T-H-E-R and 6 characters total, 426 00:25:22,290 --> 00:25:25,090 you're wasting over 200 characters. 427 00:25:25,090 --> 00:25:29,640 >> So a varchar effectively only uses as many characters as is necessary 428 00:25:29,640 --> 00:25:31,590 up to a maximum amount. 429 00:25:31,590 --> 00:25:35,470 But the price you pay is actually performance, potentially. 
430 00:25:35,470 --> 00:25:39,740 If you know in advance that all of your strings are going to be 8 characters-- 431 00:25:39,740 --> 00:25:43,090 for instance, suppose that you require passwords of length 8-- 432 00:25:43,090 --> 00:25:47,350 the upside of using a char field on occasion, though not often, 433 00:25:47,350 --> 00:25:51,100 is to specify a fixed length for something like a password 434 00:25:51,100 --> 00:25:53,300 because now the database can be even smarter. 435 00:25:53,300 --> 00:25:58,160 If it knows that every char field, every string in a column is the same length, 436 00:25:58,160 --> 00:26:00,780 you get back the feature of random access. 437 00:26:00,780 --> 00:26:05,110 You can jump around among the various char fields in your database table 438 00:26:05,110 --> 00:26:07,940 because think of a database as rows and columns. 439 00:26:07,940 --> 00:26:11,670 So if every one of the strings is the same length, 440 00:26:11,670 --> 00:26:17,820 you know that the first one is at byte 0, the next one is at byte 8 441 00:26:17,820 --> 00:26:20,240 and then 16 and then 24 and so forth. 442 00:26:20,240 --> 00:26:24,500 So if all the strings are of the same length, you can jump around much more efficiently. 443 00:26:24,500 --> 00:26:26,710 So that can be a benefit in terms of performance, 444 00:26:26,710 --> 00:26:29,420 but typically you don't have the luxury of knowing in advance, 445 00:26:29,420 --> 00:26:32,170 so a varchar is the way to go. 446 00:26:32,170 --> 00:26:36,030 Here's another detail that even Facebook ran into eventually. 447 00:26:36,030 --> 00:26:39,670 Ints are great, and we sort of use them by default any time we want a number, 448 00:26:39,670 --> 00:26:41,750 but it's only 32 bits. 
449 00:26:41,750 --> 00:26:46,210 >> And even though Facebook doesn't quite have 4 billion users now, 450 00:26:46,210 --> 00:26:48,680 there's definitely some people out there with multiple accounts 451 00:26:48,680 --> 00:26:50,960 or accounts that have been opened and then closed, 452 00:26:50,960 --> 00:26:55,130 and so Facebook itself I believe a few years ago had to transition from int 453 00:26:55,130 --> 00:27:00,010 to, as is aptly called, bigint, which is just 64 bits instead. 454 00:27:00,010 --> 00:27:02,230 So this too is a design decision. 455 00:27:02,230 --> 00:27:06,570 You would be amazingly lucky if your final project turns startup, 456 00:27:06,570 --> 00:27:10,010 has 4 billion and 1 users, give or take, 457 00:27:10,010 --> 00:27:13,200 in which case using ints might be a little shortsighted. 458 00:27:13,200 --> 00:27:16,230 But in reality, your users table is probably fine with ints. 459 00:27:16,230 --> 00:27:19,340 But for something like pset 7, like your history table, 460 00:27:19,340 --> 00:27:23,700 you might have thousands, millions of users if you evolve into etrade.com. 461 00:27:23,700 --> 00:27:26,020 So whereas you might not have more than 4 billion users, 462 00:27:26,020 --> 00:27:30,070 those users you do have might have more than 4 billion transactions over time-- 463 00:27:30,070 --> 00:27:33,200 buys and sells and things in their history. 464 00:27:33,200 --> 00:27:38,090 So if you do anticipate--again, these are good problems to have if you have this much data-- 465 00:27:38,090 --> 00:27:40,920 if you do anticipate data exceeding the size of an int, 466 00:27:40,920 --> 00:27:47,740 going with something like bigint is a direction not frequently enough adopted by designers 467 00:27:47,740 --> 00:27:49,710 because people figure that's not going to be a problem, 468 00:27:49,710 --> 00:27:51,930 but it's this easy to choose something bigger than that. 
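[Editor's sketch] The int versus bigint trade-off above comes down to simple arithmetic. The following is a minimal Python sketch, not anything from the lecture itself; the constant names and the helper fits_in_unsigned_int are made up for illustration.

```python
# Largest values that fit in 32-bit and 64-bit integer columns.
# All names here are illustrative, not part of MySQL or of pset 7.
INT_SIGNED_MAX = 2**31 - 1       # signed 32-bit int
INT_UNSIGNED_MAX = 2**32 - 1     # unsigned 32-bit int
BIGINT_SIGNED_MAX = 2**63 - 1    # signed 64-bit bigint


def fits_in_unsigned_int(n):
    """Return True if n could be stored in an unsigned 32-bit column."""
    return 0 <= n <= INT_UNSIGNED_MAX


print(INT_UNSIGNED_MAX)                      # 4294967295, the "4 billion" figure
print(fits_in_unsigned_int(4_000_000_000))   # True: just under the limit
print(fits_in_unsigned_int(2**32))           # False: one past the limit
```

So roughly 4 billion rows is the ceiling for an unsigned 32-bit ID, which is why a table of transactions can plausibly outgrow it long before a table of users does.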
469 00:27:51,930 --> 00:27:55,380 Decimal we're using in pset 7, which specifies fixed precision 470 00:27:55,380 --> 00:27:59,840 so you can avoid the issues involving floats and doubles and reals and the like. 471 00:27:59,840 --> 00:28:02,440 >> And then there's some other fields here. We'll wave our hands at them to some extent. 472 00:28:02,440 --> 00:28:07,270 But dates, times all have a prescribed format in MySQL, 473 00:28:07,270 --> 00:28:10,830 and the advantage of storing dates as dates and not varchars 474 00:28:10,830 --> 00:28:15,730 means that the database can actually reformat them into different formats, 475 00:28:15,730 --> 00:28:18,800 whether a US format or European format or the like--however you want it-- 476 00:28:18,800 --> 00:28:22,700 much more efficiently than if it were just some generic varchar. 477 00:28:22,700 --> 00:28:25,150 And then there's some other binary, varbinary, blobs. 478 00:28:25,150 --> 00:28:28,580 These are binary large objects, and you can also store binary data 479 00:28:28,580 --> 00:28:30,750 as well as geometric data in a database. 480 00:28:30,750 --> 00:28:34,350 But for us we'll typically care about ints and varchars and the like. 481 00:28:34,350 --> 00:28:36,230 Let's finish up this example with house. 482 00:28:36,230 --> 00:28:40,030 House I'm going to arbitrarily say will be 255 chars. 483 00:28:40,030 --> 00:28:42,850 Then default value we could do this. 484 00:28:42,850 --> 00:28:47,440 We could by default put everyone in Mather House, for instance. 485 00:28:47,440 --> 00:28:49,710 That's how we could specify that the database 486 00:28:49,710 --> 00:28:52,460 should ensure that someone always has a value. But I'll leave that be. 
487 00:28:52,460 --> 00:28:55,270 In fact, for people who live off campus and not in a house, 488 00:28:55,270 --> 00:28:59,590 maybe I actually want to specify that the default value for house is NULL, 489 00:28:59,590 --> 00:29:04,890 and then I need to check this box and tell the database it's okay if the user's house is NULL. 490 00:29:04,890 --> 00:29:07,270 >> Again, this is another defense mechanism you can put in place 491 00:29:07,270 --> 00:29:10,590 so you don't even have to put it in your PHP code necessarily. 492 00:29:10,590 --> 00:29:14,630 The database will ensure that things are or are not NULL. 493 00:29:14,630 --> 00:29:17,310 And then lastly, Attributes. 494 00:29:17,310 --> 00:29:18,920 None of these are really relevant. 495 00:29:18,920 --> 00:29:22,880 Binary, unsigned--none of those are relevant to a varchar. 496 00:29:22,880 --> 00:29:24,220 Index. 497 00:29:24,220 --> 00:29:27,320 Does anyone know or remember or have a guess as to what an index is 498 00:29:27,320 --> 00:29:29,510 for something like house? 499 00:29:29,510 --> 00:29:35,240 This too is actually an important and relatively easy design decision. 500 00:29:35,240 --> 00:29:39,200 For those who haven't yet seen it, on Friday we talked briefly about primary keys. 501 00:29:39,200 --> 00:29:43,240 In a database table, a primary key is the field or column 502 00:29:43,240 --> 00:29:46,270 that uniquely identifies rows in the table. 503 00:29:46,270 --> 00:29:49,150 So in the current table we have IDs, we have names and emails. 504 00:29:49,150 --> 00:29:52,050 Which of those is the best candidate to be a primary key, 505 00:29:52,050 --> 00:29:55,810 whose role is to uniquely identify rows? 506 00:29:55,810 --> 00:29:57,530 Probably ID. 507 00:29:57,530 --> 00:29:59,930 Arguably, we could also use what though? 508 00:29:59,930 --> 00:30:02,860 Maybe you could use email because in theory it's unique 509 00:30:02,860 --> 00:30:05,380 unless people are sharing email accounts. 
510 00:30:05,380 --> 00:30:09,980 But the reality is that if you're using a numeric ID like 1234, 511 00:30:09,980 --> 00:30:14,170 that's only 32 bits, whereas an email address could be this many bytes or this many bytes. 512 00:30:14,170 --> 00:30:16,610 So in terms of efficiency for unique identifiers, 513 00:30:16,610 --> 00:30:19,270 it tends to be good practice just to use an int 514 00:30:19,270 --> 00:30:23,090 even if you have some string candidate that you could arguably use. 515 00:30:23,090 --> 00:30:26,760 >> For something like house, this should not be a primary key 516 00:30:26,760 --> 00:30:30,770 because then only 1 person could live in Mather and 1 person in Currier and the like. 517 00:30:30,770 --> 00:30:32,790 Similarly, this should not be unique. 518 00:30:32,790 --> 00:30:37,830 The difference between primary and unique is that in the case of our current table, 519 00:30:37,830 --> 00:30:42,620 ID would be primary but email is not primary for the reason we just mentioned-- 520 00:30:42,620 --> 00:30:44,740 performance--but it should still be unique. 521 00:30:44,740 --> 00:30:47,200 So you can still enforce uniqueness without making the claim 522 00:30:47,200 --> 00:30:49,520 that it's a super important primary field. 523 00:30:49,520 --> 00:30:52,610 But this one is quite helpful: Index. 524 00:30:52,610 --> 00:30:56,180 If you know in advance for your final project, for pset 7, or in general, 525 00:30:56,180 --> 00:30:59,480 that this field house is going to be something you search on a lot 526 00:30:59,480 --> 00:31:01,910 using the select keyword or something else, 527 00:31:01,910 --> 00:31:05,180 then you can preemptively tell the database to work its magic 528 00:31:05,180 --> 00:31:10,510 and make sure that it creates in memory any fancy data structures necessary 529 00:31:10,510 --> 00:31:13,770 to expedite searches based on house. 530 00:31:13,770 --> 00:31:17,860 Maybe it will use a hash table, maybe it will use a linked list. 
531 00:31:17,860 --> 00:31:21,260 In reality, it tends to use a tree, often a structure called a B-tree-- 532 00:31:21,260 --> 00:31:24,090 not a binary tree but a B-tree--which is a very wide tree 533 00:31:24,090 --> 00:31:27,370 that you might see in a class like CS124, the data structures class. 534 00:31:27,370 --> 00:31:31,800 But in short, you don't have to worry about that when using smart database software. 535 00:31:31,800 --> 00:31:35,890 You can just tell it, "Index this field so I can search on it more efficiently." 536 00:31:35,890 --> 00:31:40,250 >> If you leave this off and you try to search for everyone in the database who lives in Mather, 537 00:31:40,250 --> 00:31:42,710 it will devolve into linear search. 538 00:31:42,710 --> 00:31:45,360 And if you've got 6,000 undergrads all living in some house, 539 00:31:45,360 --> 00:31:47,900 you're going to search the entire table to find the Matherites, 540 00:31:47,900 --> 00:31:52,190 whereas if you say Index, hopefully it will be something close to a logarithmic search 541 00:31:52,190 --> 00:31:54,510 to find those kinds of students. 542 00:31:54,510 --> 00:31:56,750 This is just a free feature to turn on, 543 00:31:56,750 --> 00:31:59,530 even though it does come at a price of some amount of space. 544 00:31:59,530 --> 00:32:02,690 Lastly, auto-increment, this AI field, 545 00:32:02,690 --> 00:32:05,830 which just means if it's an int and you don't want to care to increment it yourself 546 00:32:05,830 --> 00:32:07,570 every time there's a new user, check that, 547 00:32:07,570 --> 00:32:11,910 and each user that gets inserted will automatically get a new ID. 548 00:32:11,910 --> 00:32:15,620 Let's click Save, and now let's find fault with this design. 549 00:32:15,620 --> 00:32:20,200 If I go into Browse, notice that both Mike and my house is NULL. 550 00:32:20,200 --> 00:32:22,420 I can use phpMyAdmin to edit this manually. 
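[Editor's sketch] Going back to the Index option discussed a moment ago, the effect can be observed by asking a database how it plans to run a search on house before and after the index exists. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL, the index name idx_students_house is made up, and Mike's email address is a placeholder.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL "lecture" database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT, email TEXT, house TEXT)"
)
conn.executemany(
    "INSERT INTO students (name, email, house) VALUES (?, ?, ?)",
    [
        ("David", "malan@harvard.edu", "Mather"),
        ("Mike", "mike@example.com", "Pfoho"),  # placeholder address
    ],
)


def plan(connection):
    """Return SQLite's description of how it would search by house."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM students WHERE house = 'Mather'"
    ).fetchall()
    # The last column of each row is the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = plan(conn)  # no index yet: a scan of the whole table
conn.execute("CREATE INDEX idx_students_house ON students (house)")
plan_after = plan(conn)   # with the index: a search that names the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, and MySQL has its own EXPLAIN output, but the shift from scanning every row to searching via the index is the point being made in the lecture.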
551 00:32:22,420 --> 00:32:25,110 I can go in here and type in Mather and then hit Enter, 552 00:32:25,110 --> 00:32:27,740 and now notice the table is different. 553 00:32:27,740 --> 00:32:29,270 But notice I could do something else as well. 554 00:32:29,270 --> 00:32:33,530 David's ID is 1, so phpMyAdmin again is just an administrative tool; 555 00:32:33,530 --> 00:32:35,970 this is not something your users are ever going to see. 556 00:32:35,970 --> 00:32:38,810 So if I instead click the SQL tab up top-- 557 00:32:38,810 --> 00:32:41,450 and again, pset 7 will introduce you to more of these queries-- 558 00:32:41,450 --> 00:32:45,260 I can manually execute the SQL structured query language command 559 00:32:45,260 --> 00:32:56,410 UPDATE users SET house = 'Pfoho' WHERE id = 1. 560 00:32:56,410 --> 00:33:00,830 These SQL queries are, nicely enough, pretty readable from left to right. 561 00:33:00,830 --> 00:33:04,350 Update the users table, set the field called house to Pfoho 562 00:33:04,350 --> 00:33:06,830 where the user's ID is 1. 563 00:33:06,830 --> 00:33:11,480 Or I could even do where email = 'malan@harvard.edu'. 564 00:33:11,480 --> 00:33:14,860 So long as that uniquely identifies me, that would work as well. 565 00:33:14,860 --> 00:33:18,810 But ID tends to be higher performance, so let's do that. 566 00:33:18,810 --> 00:33:22,950 Let's click Go. Okay, lecture.users doesn't exist. What's my error? 567 00:33:22,950 --> 00:33:26,220 What's the table actually called here? 568 00:33:26,220 --> 00:33:28,770 It's called students just because that's what we did up here at top left. 569 00:33:28,770 --> 00:33:31,860 It's called students, not users. So click Go now. 570 00:33:31,860 --> 00:33:34,330 1 row affected. Query took 0.01 seconds. 571 00:33:34,330 --> 00:33:38,010 If I click Browse now, now Malan lives in Pfoho. 572 00:33:38,010 --> 00:33:42,070 So that's another taste of SQL, but the pset will walk you through a bit more of that. 
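[Editor's sketch] The UPDATE query just run in phpMyAdmin can also be issued from code. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory SQLite database in place of MySQL; the table mirrors the students table from the demo, and the placeholders (?) are how this particular module passes values into a query.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL "lecture" database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT, email TEXT, house TEXT)"
)
# The first inserted row gets id 1 automatically; house starts out NULL.
conn.execute(
    "INSERT INTO students (name, email) VALUES ('David', 'malan@harvard.edu')"
)

# Same shape as the query in the lecture: UPDATE ... SET ... WHERE id = 1.
cursor = conn.execute(
    "UPDATE students SET house = ? WHERE id = ?", ("Pfoho", 1)
)
rows_affected = cursor.rowcount  # how many rows the UPDATE changed

house = conn.execute(
    "SELECT house FROM students WHERE id = 1"
).fetchone()[0]

print(rows_affected)  # 1
print(house)          # Pfoho
```

As in the demo, naming a table that does not exist (users instead of students) would make the query fail rather than silently do nothing.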
573 00:33:42,070 --> 00:33:44,710 >> There's a stupid decision I've already made here. 574 00:33:44,710 --> 00:33:47,820 I would argue that this database design is inefficient 575 00:33:47,820 --> 00:33:51,650 because the more people I add to the students table, 576 00:33:51,650 --> 00:33:54,730 the more of us I start adding, the more of the TFs I start adding, 577 00:33:54,730 --> 00:33:58,320 we're going to start to see what redundancies in this table? 578 00:34:00,840 --> 00:34:06,020 >> Yeah. [student] Seeing that it's in students, we're using the same [inaudible] 579 00:34:06,020 --> 00:34:07,360 The same-- Right, exactly. 580 00:34:07,360 --> 00:34:10,400 So if 400 people live in Mather, give or take, 581 00:34:10,400 --> 00:34:15,000 eventually this table is going to have 400 rows that say "Mather," "Mather," 582 00:34:15,000 --> 00:34:16,590 "Mather," "Mather," "Mather." 583 00:34:16,590 --> 00:34:19,820 We're wasting all of these bytes, and there's a couple of takeaways there. 584 00:34:19,820 --> 00:34:23,080 1, there's the crazy corner case where if someone pays a lot of money 585 00:34:23,080 --> 00:34:25,949 and renames Mather, we now have to change our whole database table. 586 00:34:25,949 --> 00:34:29,730 That's not going to happen often, though Pfoho was once called North House 15 years ago, 587 00:34:29,730 --> 00:34:32,310 so it happens. But that's not all that compelling. 588 00:34:32,310 --> 00:34:36,000 More compelling than a corner case like that of needing to update the data in bulk 589 00:34:36,000 --> 00:34:41,150 for a database is why are you storing M-A-T-H-E-R again and again and again and again? 590 00:34:41,150 --> 00:34:43,020 That's a lot of chars, 6 chars. 591 00:34:43,020 --> 00:34:45,500 Can't we do even better than that, especially for Pforzheimer? 592 00:34:45,500 --> 00:34:48,320 Surely we can do better than that many characters. 
593 00:34:48,320 --> 00:34:51,790 Why not just associate a unique identifier with each house 594 00:34:51,790 --> 00:34:55,020 and store that for each user? So let's try this. 595 00:34:55,020 --> 00:35:00,610 Rather than just use the students table, let me go up to my lecture database up here at top left. 596 00:35:00,610 --> 00:35:02,600 Notice here it says Create table. 597 00:35:02,600 --> 00:35:04,550 Let me create a new table called houses. 598 00:35:04,550 --> 00:35:08,880 The number of columns is going to be 2. Enter. 599 00:35:08,880 --> 00:35:11,200 Now I have 2 fields. 600 00:35:11,200 --> 00:35:14,600 I'm going to call this the name, and it's going to be a varchar of length 255, 601 00:35:14,600 --> 00:35:18,770 >> but that's pretty arbitrary. Let me put this down here by convention. 602 00:35:18,770 --> 00:35:22,840 So put an ID up here. Let's give every house a unique identifier. 603 00:35:22,840 --> 00:35:25,360 Let's give every house a name. 604 00:35:25,360 --> 00:35:30,980 Let's specify that the identifier will be unsigned just by convention to only use positive numbers. 605 00:35:30,980 --> 00:35:35,020 Let's go ahead and give this an auto-increment field for now. 606 00:35:35,020 --> 00:35:38,160 And do we need anything else? 607 00:35:38,160 --> 00:35:41,010 Let's go ahead and click Save. 608 00:35:41,010 --> 00:35:42,480 Now I have a second table. 609 00:35:42,480 --> 00:35:45,860 Notice as an aside this is the slightly cryptic SQL command 610 00:35:45,860 --> 00:35:50,280 that you would have had to type manually if not using an administrative tool like phpMyAdmin. 611 00:35:50,280 --> 00:35:51,990 So another reason we use it. 612 00:35:51,990 --> 00:35:55,480 It's wonderfully useful sort of pedagogically because you can click around 613 00:35:55,480 --> 00:36:01,050 and figure out how things work by just copying and pasting what phpMyAdmin did. 
614 00:36:01,050 --> 00:36:04,150 But the Create table command is what was just executed, and here is my table. 615 00:36:04,150 --> 00:36:11,370 Let me go ahead now and use raw SQL rather than oversimplify by clicking the Insert tab. 616 00:36:11,370 --> 00:36:15,040 Let me do INSERT INTO houses, 617 00:36:15,040 --> 00:36:22,230 and I'm going to say the name of the house is going to have a value of 'Mather'. 618 00:36:22,230 --> 00:36:24,790 That's it. This syntax is a little more cryptic. 619 00:36:24,790 --> 00:36:26,660 This is the name of the fields we want to insert. 620 00:36:26,660 --> 00:36:30,390 These are the values we want to insert into those fields. Let me click Go. 621 00:36:30,390 --> 00:36:34,410 1 row inserted took 0.02 seconds. Let me click Browse now. 622 00:36:34,410 --> 00:36:42,020 >> Notice if I click Browse, there's Mather, whose ID is by automation the number 1. 623 00:36:42,020 --> 00:36:45,000 Let me do another one. Let me go into the SQL tab. 624 00:36:45,000 --> 00:36:52,950 INSERT INTO houses. The name of the house is going to have a value of Pfoho and so forth. 625 00:36:52,950 --> 00:36:56,350 Go. And I can keep doing this again and again and again. 626 00:36:56,350 --> 00:36:59,470 Or if you get bored using phpMyAdmin, you can just use the Insert tab 627 00:36:59,470 --> 00:37:01,000 and not have to type the raw SQL. 628 00:37:01,000 --> 00:37:04,690 You can just bang it out more quickly by typing, for instance, Currier, Enter, 629 00:37:04,690 --> 00:37:07,610 and now if we click Browse, there's Currier with an ID of 3. 630 00:37:07,610 --> 00:37:09,920 So this is what we mean by auto-increment. 631 00:37:09,920 --> 00:37:12,280 But now we have to fix something in students. 632 00:37:12,280 --> 00:37:16,240 In students what should the data type of the house field now be? 633 00:37:16,240 --> 00:37:19,450 It should be an int, right? 
634 00:37:19,450 --> 00:37:23,950 So the goal here is to factor out, otherwise known as normalize, the tables 635 00:37:23,950 --> 00:37:27,940 so that we don't store information redundantly in any of my tables. 636 00:37:27,940 --> 00:37:31,130 And again, the path we were on here is going to say Mather, Mather, 637 00:37:31,130 --> 00:37:34,220 Mather, Mather, Pfoho, Pfoho, Pfoho, Pfoho, which is very redundant 638 00:37:34,220 --> 00:37:36,240 in terms of the wastefulness of the chars. 639 00:37:36,240 --> 00:37:40,820 So let me go ahead and change this by clicking Structure, 640 00:37:40,820 --> 00:37:44,620 and let me go ahead and check off the house field, click Change, 641 00:37:44,620 --> 00:37:46,990 and now I'm going to change this to be an int. 642 00:37:46,990 --> 00:37:49,490 255 is no longer relevant. 643 00:37:49,490 --> 00:37:54,010 Let me go ahead and say that's fine if it's still NULL. Save. 644 00:37:54,010 --> 00:37:55,870 Now table students has been altered successfully, 645 00:37:55,870 --> 00:37:59,090 and notice again house is an int. 646 00:37:59,090 --> 00:38:02,220 As an aside, ignore the number in parentheses when it comes to ints. 647 00:38:02,220 --> 00:38:03,770 >> This is for legacy reasons. 648 00:38:03,770 --> 00:38:06,920 Back in the day when you didn't have GUIs, you instead had a command line environment, 649 00:38:06,920 --> 00:38:11,580 the 10 and 11 respectively specified how many characters you should show 650 00:38:11,580 --> 00:38:13,950 in the terminal window to actually display fields. 651 00:38:13,950 --> 00:38:19,150 It has nothing to do with the bit length of the actual field, so we'll just ignore that for now. 652 00:38:19,150 --> 00:38:20,990 Now I have to go into this table. 653 00:38:20,990 --> 00:38:24,610 And if David lives in Mather, house should not be 0, 654 00:38:24,610 --> 00:38:27,350 which is a default int value closest to NULL. 655 00:38:27,350 --> 00:38:29,810 He should live in house 1. 
656 00:38:29,810 --> 00:38:36,870 Let's arbitrarily say that Mike lives in Pfoho, so house number 2. 657 00:38:36,870 --> 00:38:40,160 Now my table looks a little more cryptic. 658 00:38:40,160 --> 00:38:41,960 But consider the efficiency. 659 00:38:41,960 --> 00:38:44,860 I'm now using only 32 bits to identify the house, 660 00:38:44,860 --> 00:38:49,530 which means there's only 1 canonical definition of my house Mather and Pfoho 661 00:38:49,530 --> 00:38:52,090 and that's in the houses table. 662 00:38:52,090 --> 00:38:55,880 So if I want to now rejoin these tables, think of it this way. 663 00:38:55,880 --> 00:39:01,980 Here I have my students table, and on the right-hand side there's these numbers, 1 and 2. 664 00:39:01,980 --> 00:39:04,180 1 is Mather, 2 is Pfoho. 665 00:39:04,180 --> 00:39:08,580 We have those same numbers in this other table, which is called houses, 666 00:39:08,580 --> 00:39:11,020 1 and 2 and 3 for those 3 houses. 667 00:39:11,020 --> 00:39:14,990 What we now want to do is have the ability in code, PHP and SQL, 668 00:39:14,990 --> 00:39:18,800 to sort of rejoin these tables, where if these are the students and these are the houses, 669 00:39:18,800 --> 00:39:22,050 we want to somehow combine them so that 1 lines up with 1, 670 00:39:22,050 --> 00:39:25,670 2 lines up with 2, and so that we can figure out where David 671 00:39:25,670 --> 00:39:28,000 and where Mike and where everyone else lives. 672 00:39:28,000 --> 00:39:31,850 To do this we can execute a SQL query like the following. 673 00:39:31,850 --> 00:39:40,470 SELECT * FROM students JOIN houses ON-- 674 00:39:40,470 --> 00:39:43,000 And now what fields do we want to join on? 675 00:39:43,000 --> 00:39:49,520 So students.house = houses.id. 676 00:39:49,520 --> 00:39:54,150 >> A little cryptic, but this part means literally create a new temporary table 677 00:39:54,150 --> 00:39:56,690 that's the result of joining students and houses. 
678 00:39:56,690 --> 00:40:00,340 And how do you want to combine the tips of my fingers here? 679 00:40:00,340 --> 00:40:05,280 Set the students' house field equal to the houses' ID field. 680 00:40:05,280 --> 00:40:10,220 And if I now click Go, I get back exactly what I hoped to. 681 00:40:10,220 --> 00:40:15,890 David is in Mather, Mike is in Pfoho, and I also see the unique identifiers. 682 00:40:15,890 --> 00:40:18,640 But the point is now I have a complete table. 683 00:40:18,640 --> 00:40:23,020 And so the takeaway here for pset 7 or really for the final project: 684 00:40:23,020 --> 00:40:25,830 If you find that you're storing any piece of information redundantly, 685 00:40:25,830 --> 00:40:28,850 whether it's a house, maybe it's a city, state, and ZIP 686 00:40:28,850 --> 00:40:32,050 where ZIP can usually but not always be used as a unique identifier, 687 00:40:32,050 --> 00:40:35,810 do go through the exercise mentally and then with something like phpMyAdmin 688 00:40:35,810 --> 00:40:40,660 of factoring out that common data because especially as your website gets more well used 689 00:40:40,660 --> 00:40:45,440 and more popular, this is how you make sure that everything is super fast, 690 00:40:45,440 --> 00:40:51,930 by giving the database as many hints as to uniqueness as possible. 691 00:40:51,930 --> 00:40:53,860 That was a lot. 692 00:40:53,860 --> 00:40:59,010 Any questions? All right. Let's take a 5-minute break there and regroup. 693 00:41:01,600 --> 00:41:03,540 All right. 
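[Editor's sketch] The factoring-out of houses and the JOIN that puts the two tables back together can be reproduced end to end. This is a sketch under assumptions: Python's built-in sqlite3 module with an in-memory SQLite database stands in for MySQL and phpMyAdmin, and the query selects just the two name columns rather than the * used in the lecture so the result is easier to read.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL "lecture" database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each house name is stored exactly once.
    CREATE TABLE houses (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Students refer to a house by its numeric id, not by its name.
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, house INTEGER);

    INSERT INTO houses (name) VALUES ('Mather');   -- id 1
    INSERT INTO houses (name) VALUES ('Pfoho');    -- id 2
    INSERT INTO houses (name) VALUES ('Currier');  -- id 3

    INSERT INTO students (name, house) VALUES ('David', 1);
    INSERT INTO students (name, house) VALUES ('Mike', 2);
    """
)

# Line up students.house with houses.id, as in the lecture's JOIN.
residents = conn.execute(
    "SELECT students.name, houses.name "
    "FROM students JOIN houses ON students.house = houses.id "
    "ORDER BY students.id"
).fetchall()

print(residents)  # [('David', 'Mather'), ('Mike', 'Pfoho')]
```

If a house were ever renamed, only the one row in houses would need to change, and every student's joined result would pick up the new name.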
694 00:41:03,540 --> 00:41:08,680 The following is an example that was used some years ago when I took CS161, 695 00:41:08,680 --> 00:41:10,960 which is the operating systems class at the college 696 00:41:10,960 --> 00:41:15,160 which is known for being amazing but a crazy amount of work, 697 00:41:15,160 --> 00:41:19,810 and it focuses really on some of the low-level problems that arise in operating systems 698 00:41:19,810 --> 00:41:22,700 and also even in the world of databases. 699 00:41:22,700 --> 00:41:27,040 >> The story that was told by my professor, Margo Seltzer, that year was as follows. 700 00:41:27,040 --> 00:41:30,990 Suppose that you have a little dorm fridge for you and your roommate 701 00:41:30,990 --> 00:41:34,030 and both of you really like milk. 702 00:41:34,030 --> 00:41:36,360 So you come home from class one day, your roommate is not yet there, 703 00:41:36,360 --> 00:41:39,650 you open the fridge, and you realize, "Oh damn, we're out of milk." 704 00:41:39,650 --> 00:41:42,070 So you close the fridge, you walk across the street to CVS 705 00:41:42,070 --> 00:41:45,830 and get in the increasingly long lines to buy some milk at CVS. 706 00:41:45,830 --> 00:41:48,470 Meanwhile, your roommate comes home from his or her class, 707 00:41:48,470 --> 00:41:51,690 comes into the room, opens the fridge really wanting some milk, 708 00:41:51,690 --> 00:41:54,130 opens the fridge and, "Damn, no milk." 709 00:41:54,130 --> 00:41:57,890 So he or she closes the fridge, walks out the door, and goes to ABP 710 00:41:57,890 --> 00:42:00,910 or somewhere other than CVS where you're not going to bump into each other 711 00:42:00,910 --> 00:42:02,790 to go get some milk. 712 00:42:02,790 --> 00:42:04,820 Of course a few minutes later, both of you get back home 713 00:42:04,820 --> 00:42:07,740 and now you have twice as much milk as you actually wanted. 
714 00:42:07,740 --> 00:42:10,670 And being milk, now it's going to go bad because you like milk 715 00:42:10,670 --> 00:42:14,200 but you don't really like milk, so now you have too much milk, so it's going to sour. 716 00:42:14,200 --> 00:42:16,830 This is an awful, awful situation. 717 00:42:16,830 --> 00:42:22,920 What could have solved this predicament if you were the first roommate home? Yes. 718 00:42:22,920 --> 00:42:25,970 [student] You should have left a note. [laughter] 719 00:42:25,970 --> 00:42:28,090 Good. You should have left a note. 720 00:42:28,090 --> 00:42:32,320 You should have put a Post-it note or the like saying, "Gone for milk," 721 00:42:32,320 --> 00:42:36,830 and then your roommate conceptually would have been locked out of actually doing that. 722 00:42:36,830 --> 00:42:38,010 Or you could go 1 step further. 723 00:42:38,010 --> 00:42:41,060 You could literally lock the refrigerator with some kind of padlock, 724 00:42:41,060 --> 00:42:44,870 and now your roommate will literally be locked out of the fridge. 725 00:42:44,870 --> 00:42:48,520 If we generalize back to programming, 726 00:42:48,520 --> 00:42:51,610 you can almost think of the fridge as some kind of variable or a struct, 727 00:42:51,610 --> 00:42:53,500 some kind of container for information. 728 00:42:53,500 --> 00:42:58,290 The problem fundamentally here is that both of you were allowed to inspect 729 00:42:58,290 --> 00:43:02,370 or read the state of this data structure, 730 00:43:02,370 --> 00:43:08,050 but you viewed it at different times and yet both of you made a decision 731 00:43:08,050 --> 00:43:11,920 based on the state of the world at those different moments in time. 732 00:43:11,920 --> 00:43:15,570 So had you locked the refrigerator, you would have at least avoided your roommate 733 00:43:15,570 --> 00:43:19,070 from having been able to inspect the state of the world, 734 00:43:19,070 --> 00:43:22,530 so he or she could not have made that same decision. 
735 00:43:22,530 --> 00:43:25,780 So databases, as it turns out, have this problem constantly. 736 00:43:25,780 --> 00:43:31,050 >> Let's see if we can construct a scenario. 737 00:43:31,050 --> 00:43:34,310 Suppose that you're sort of a bad guy and you go to Bank of America 738 00:43:34,310 --> 00:43:37,950 or one of the other places in the square that have a couple ATMs side by side, 739 00:43:37,950 --> 00:43:41,200 and somehow you figured out how to duplicate an ATM card--not all that hard. 740 00:43:41,200 --> 00:43:42,730 It's just a magnetic strip. 741 00:43:42,730 --> 00:43:45,180 And so what you want to try to do is play this game 742 00:43:45,180 --> 00:43:49,060 whereby you put 1 card into 1 machine, another card into the other machine, 743 00:43:49,060 --> 00:43:51,980 and you essentially want to try to withdraw money simultaneously, 744 00:43:51,980 --> 00:43:54,930 because imagine that story goes as follows. 745 00:43:54,930 --> 00:43:57,350 The machine on the left takes your card and your PIN, 746 00:43:57,350 --> 00:44:00,240 and then you say, "Give me $100." 747 00:44:00,240 --> 00:44:04,790 The ATM is programmed to first do a select on its database or the equivalent-- 748 00:44:04,790 --> 00:44:10,780 whatever database it's using--to see does this user have at least $100 in his or her account? 749 00:44:10,780 --> 00:44:16,180 If so, then spit out the $100 and subtract $100 from their balance. 750 00:44:16,180 --> 00:44:20,470 But of course if there's multiple machines here or multiple ways of inspecting 751 00:44:20,470 --> 00:44:23,560 the state of that world, the bank vault, to see how much money you have, 752 00:44:23,560 --> 00:44:26,780 suppose that just by chance the machine on the left and the right 753 00:44:26,780 --> 00:44:30,140 both ask that question at roughly the same moment in time. 754 00:44:30,140 --> 00:44:34,160 >> And this can certainly happen. ATMs are computers these days. 
755 00:44:34,160 --> 00:44:37,670 So if the machine on the left says, "Yes, you have at least $100," 756 00:44:37,670 --> 00:44:42,150 meanwhile the machine on the right says, "Yes, you have at least $100," 757 00:44:42,150 --> 00:44:47,420 then both of them proceed to finish their programs and actually spit out the $100 758 00:44:47,420 --> 00:44:50,820 and say, "Previously you had $200." 759 00:44:50,820 --> 00:44:54,890 "Let me update the variable to now be $100 left in the account." 760 00:44:54,890 --> 00:44:58,780 But if both of them have checked your account balance and found that it's $200 761 00:44:58,780 --> 00:45:02,000 and both of them then do the math and say 200 - 100, 762 00:45:02,000 --> 00:45:06,990 the machines have potentially spit out two $100 bills in each machine, 763 00:45:06,990 --> 00:45:11,360 but they've only updated your sum account balance to be $100. 764 00:45:11,360 --> 00:45:15,130 In other words, you've taken out $200, but because they inspected the state of the world 765 00:45:15,130 --> 00:45:18,840 simultaneously and then made a decision based on that value, 766 00:45:18,840 --> 00:45:21,930 they might not do the math ultimately correctly. 767 00:45:21,930 --> 00:45:25,520 So in a bank situation too you really want to have some kind of lockout 768 00:45:25,520 --> 00:45:28,450 so that as soon as you've checked the state of some variable 769 00:45:28,450 --> 00:45:31,220 that's really important, like your account balance, 770 00:45:31,220 --> 00:45:36,070 don't let anyone else make decisions based on that until you are done doing your thing, 771 00:45:36,070 --> 00:45:38,920 where in this case you are the ATM on the left. 772 00:45:38,920 --> 00:45:41,160 Lock everyone else out. 773 00:45:41,160 --> 00:45:44,650 You can actually achieve this effect in a couple of different ways. 
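To make the two-ATM story concrete, here is a minimal sketch in JavaScript. It is only a simulation: the in-memory `account` object and the `check` and `finish` helpers are hypothetical stand-ins for whatever database a real bank uses, and the interleaving is forced by hand rather than happening by chance.

```javascript
// Hypothetical in-memory "bank"; a real ATM would query a database instead.
var account = { balance: 200 };

// Step 1 of a withdrawal: read the balance and decide whether to proceed.
function check(amount) {
    var seen = account.balance;
    return { ok: seen >= amount, seen: seen };
}

// Step 2: hand out the cash and write back a balance computed
// from the (possibly stale) value read in step 1.
function finish(result, amount) {
    if (!result.ok) {
        return 0;
    }
    account.balance = result.seen - amount;
    return amount;
}

// Both machines ask "at least $100?" before either one finishes.
var left = check(100);
var right = check(100);
var dispensed = finish(left, 100) + finish(right, 100);

console.log(dispensed);        // total cash handed out: 200
console.log(account.balance);  // balance the bank now believes: 100
```

Both machines saw $200, so both wrote back $100. If the second `check` were forced to wait until the first `finish` was done, which is the lockout idea, the second write would start from $100 and the balance would end at $0.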
774 00:45:44,650 --> 00:45:48,660 >> The simplest way in MySQL is a line of SQL that we gave you 775 00:45:48,660 --> 00:45:52,030 in the problem set specification that looks exactly like this. 776 00:45:52,030 --> 00:45:57,420 Insert into the table--whatever it's called--an id, a symbol, and a share, a number of shares, 777 00:45:57,420 --> 00:45:59,660 the following values, for instance. 778 00:45:59,660 --> 00:46:03,370 If you haven't read the spec yet, this is an example involving how do you go about 779 00:46:03,370 --> 00:46:07,340 buying 10 shares of this penny stock for President Skroob, 780 00:46:07,340 --> 00:46:10,340 whose user ID happens to be the number 7? 781 00:46:10,340 --> 00:46:14,070 This says INSERT INTO table the following id, symbol, and number of shares 782 00:46:14,070 --> 00:46:18,200 of 7, 'DVN.V', and 10. 783 00:46:18,200 --> 00:46:21,510 But--but, but, but--the second line is the important one. 784 00:46:21,510 --> 00:46:26,310 ON DUPLICATE KEY UPDATE shares = shares + VALUES(shares). 785 00:46:26,310 --> 00:46:28,350 So totally cryptic-looking at first glance. 786 00:46:28,350 --> 00:46:31,990 But the fact that this SQL query, even though it wraps onto 2 lines, 787 00:46:31,990 --> 00:46:35,920 is 1 long query, it means it's atomic 788 00:46:35,920 --> 00:46:41,000 in the sense that this query will either be executed all together or not at all. 789 00:46:41,000 --> 00:46:45,100 And by definition of MySQL, that's how they implemented this query. 790 00:46:45,100 --> 00:46:51,010 It is by definition in the manual guaranteed to execute all at once or not at all. 791 00:46:51,010 --> 00:46:54,020 The motivation for this is as follows. 792 00:46:54,020 --> 00:46:58,540 If in this case you are trying to buy 10 shares of stock, 793 00:46:58,540 --> 00:47:02,260 it's kind of the same story as the milk, it's kind of the same story as the ATM. 
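As a rough model of what that one query does, here is a sketch in JavaScript rather than SQL. The `portfolio` map and the `buy` function are hypothetical, and the sketch assumes the table has a unique key covering id and symbol together, since ON DUPLICATE KEY only fires when a unique key collides. It models the behavior only; it is not how MySQL implements it.

```javascript
// The query from the lecture, with "table" as the placeholder name:
//   INSERT INTO table (id, symbol, shares) VALUES(7, 'DVN.V', 10)
//   ON DUPLICATE KEY UPDATE shares = shares + VALUES(shares)

// Hypothetical stand-in for the table, keyed by id and symbol together.
var portfolio = new Map();

// Insert a new row, or add to the existing row, in a single step.
function buy(id, symbol, shares) {
    var key = id + "|" + symbol;
    var current = portfolio.has(key) ? portfolio.get(key) : 0;
    portfolio.set(key, current + shares);
    return portfolio.get(key);
}

buy(7, "DVN.V", 10);              // no row yet, so this acts like INSERT
var total = buy(7, "DVN.V", 10);  // row exists, so this acts like UPDATE

console.log(total);           // 20
console.log(portfolio.size);  // 1
```

The point of the design is that the check for an existing row and the change to it live in one operation, so there is no gap between them for another query to slip into.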
794 00:47:02,260 --> 00:47:04,970 >> If you make the mistake of not using this syntax 795 00:47:04,970 --> 00:47:09,610 but instead selecting from the database to see how many shares of this penny stock 796 00:47:09,610 --> 00:47:13,750 does President Skroob have, and suppose he has 10 shares, 797 00:47:13,750 --> 00:47:19,330 and then some split second later you then do an UPDATE statement, 798 00:47:19,330 --> 00:47:24,810 which is another statement in SQL that says go ahead and add 10 more shares 799 00:47:24,810 --> 00:47:28,700 to his current 10 so that ideally the total is 20, 800 00:47:28,700 --> 00:47:33,490 the problem is because in today's database systems and because in today's computers 801 00:47:33,490 --> 00:47:35,990 you have multiple processors, multiple cores-- 802 00:47:35,990 --> 00:47:38,920 in other words, computers can literally be doing multiple things at once-- 803 00:47:38,920 --> 00:47:44,270 there's no guarantee that your SELECT and your UPDATE in this case 804 00:47:44,270 --> 00:47:46,150 are going to happen back to back. 805 00:47:46,150 --> 00:47:49,140 So a bad scenario would be you do the SELECT 806 00:47:49,140 --> 00:47:51,670 to see how many shares of this penny stock does Skroob have, 807 00:47:51,670 --> 00:47:54,710 and then just by chance another database query is executed-- 808 00:47:54,710 --> 00:47:57,740 maybe it's Skroob in another browser window trying to buy 10 shares 809 00:47:57,740 --> 00:48:00,700 in another window altogether, much like the ATM-- 810 00:48:00,700 --> 00:48:05,410 and suppose that another query gets in between SELECT and the UPDATE. 811 00:48:05,410 --> 00:48:10,210 It could be the case that Skroob now loses some number of shares 812 00:48:10,210 --> 00:48:14,340 because another process is inspecting the state of his world, 813 00:48:14,340 --> 00:48:17,800 or he gets more shares than he should have.
814 00:48:17,800 --> 00:48:23,250 We won't go into the particulars of exactly what those particular story lines would be, 815 00:48:23,250 --> 00:48:28,380 but the point is if you have to check a variable's value and then make a decision, 816 00:48:28,380 --> 00:48:32,500 if there's a risk of someone else doing something in between those 2 statements, 817 00:48:32,500 --> 00:48:36,220 as can happen in multiprocessor systems, in multicore systems, 818 00:48:36,220 --> 00:48:41,220 computers with the ability to do multiple things at once, bad things can happen 819 00:48:41,220 --> 00:48:44,530 like bank accounts being debited incorrectly, buying twice as much milk, 820 00:48:44,530 --> 00:48:46,730 or in this case the wrong number of shares. 821 00:48:46,730 --> 00:48:48,370 But there's an easier way to think about this. 822 00:48:48,370 --> 00:48:53,290 >> It turns out that SQL also supports, if you configure your table correctly, 823 00:48:53,290 --> 00:48:56,920 something called transactions, which I would argue is actually even easier to understand 824 00:48:56,920 --> 00:49:00,650 than this, but it's not a 1-liner, so it's actually a bit more involved. 825 00:49:00,650 --> 00:49:04,960 There is literally a statement in SQL called START TRANSACTION. 826 00:49:04,960 --> 00:49:08,300 Just like there's SELECT, UPDATE, INSERT, DELETE, and JOIN and a bunch of others, 827 00:49:08,300 --> 00:49:10,970 there are keywords like START TRANSACTION.
828 00:49:10,970 --> 00:49:13,560 And what you then do in the context of pset 7-- 829 00:49:13,560 --> 00:49:17,270 you don't have to do this for pset 7; it's explicitly disclaimed as not necessary, 830 00:49:17,270 --> 00:49:18,830 but for final projects it can be useful-- 831 00:49:18,830 --> 00:49:22,820 if you call a query of START TRANSACTION and then another query 832 00:49:22,820 --> 00:49:25,620 and then another query and then another, another, and another, 833 00:49:25,620 --> 00:49:31,860 those queries will not actually be executed until you call the SQL statement COMMIT, 834 00:49:31,860 --> 00:49:37,220 at which point, whether it's 2 statements or 20 statements, they will all be executed at once, 835 00:49:37,220 --> 00:49:42,770 which means no one else can accidentally buy too much milk or debit too much money 836 00:49:42,770 --> 00:49:46,340 or buy too many shares because all of your queries will execute 837 00:49:46,340 --> 00:49:48,410 back to back to back to back. 838 00:49:48,410 --> 00:49:51,580 And this is super important, especially when you're doing something like this. 839 00:49:51,580 --> 00:49:54,900 This is an arbitrary example that says let's update the bank account 840 00:49:54,900 --> 00:50:00,200 by setting a balance equal to balance - $1000 where the account number is 2. 841 00:50:00,200 --> 00:50:04,260 And then the second statement is now let's deposit that $1000 842 00:50:04,260 --> 00:50:07,310 into someone else's bank account whose account number is 1. 
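Here is a small sketch of that all-or-nothing idea, written in JavaScript as a model rather than as real SQL. The `balances` object, its starting amounts, and the `transfer` function are hypothetical. One caveat worth stating: in MySQL the statements inside a transaction do run as they are issued, but their changes only become permanent at COMMIT and can be thrown away with ROLLBACK, which is what the private `draft` copy imitates here.

```javascript
// Hypothetical accounts table: account number -> balance.
var balances = { 1: 500, 2: 1500 };

// Move money between accounts so that both updates apply, or neither does.
function transfer(from, to, amount) {
    // "START TRANSACTION": work on a private copy of the data.
    var draft = Object.assign({}, balances);

    // UPDATE ... SET balance = balance - amount WHERE number = from
    draft[from] -= amount;
    if (draft[from] < 0) {
        return false;  // "ROLLBACK": the draft is simply discarded
    }

    // UPDATE ... SET balance = balance + amount WHERE number = to
    draft[to] += amount;

    balances = draft;  // "COMMIT": both changes become visible together
    return true;
}

var first = transfer(2, 1, 1000);   // succeeds
var second = transfer(2, 1, 1000);  // would overdraw account 2, so nothing changes

console.log(first, second);  // true false
console.log(balances);       // { '1': 1500, '2': 500 }
```

The failed second transfer leaves no trace: the money is neither taken from account 2 nor deposited into account 1.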
843 00:50:07,310 --> 00:50:10,400 >> In other words, this is a perfect example of where you want to make sure 844 00:50:10,400 --> 00:50:13,590 that both of these statements happen or not at all 845 00:50:13,590 --> 00:50:15,450 because otherwise the customer is going to get screwed 846 00:50:15,450 --> 00:50:17,670 and you're going to take their money and not deposit it elsewhere, 847 00:50:17,670 --> 00:50:20,470 or the bank is going to get screwed where you're going to deposit the money 848 00:50:20,470 --> 00:50:23,140 but not actually subtract it from the user's account. 849 00:50:23,140 --> 00:50:25,810 So you want both of them to execute together. 850 00:50:25,810 --> 00:50:29,140 Thus enters into the world transactions. 851 00:50:29,140 --> 00:50:31,360 So that's something to keep in the back of your mind, 852 00:50:31,360 --> 00:50:34,710 not so much for the purposes of just a final project, 853 00:50:34,710 --> 00:50:36,700 but if you want to take your final project somewhere, 854 00:50:36,700 --> 00:50:39,040 if you want to start up some company around it, 855 00:50:39,040 --> 00:50:41,270 if you want to solve some student group's problem on campus 856 00:50:41,270 --> 00:50:45,210 and actually have a live, active website, these are the sort of subtle bugs that can arise 857 00:50:45,210 --> 00:50:49,480 if you don't quite think through what can happen if 2 people 858 00:50:49,480 --> 00:50:54,190 are trying to access your website at literally the same moment in time, 859 00:50:54,190 --> 00:50:56,890 whereby their queries might otherwise get interwoven. 860 00:50:58,840 --> 00:51:01,420 >> Ready for some JavaScript, a teaser thereof? 861 00:51:01,420 --> 00:51:04,320 This is our last language for the semester. All right. 862 00:51:04,320 --> 00:51:09,940 Thankfully, JavaScript looks very, very, very similar to the 2 languages, C and PHP, 863 00:51:09,940 --> 00:51:11,140 we've done thus far. 
864 00:51:11,140 --> 00:51:14,340 There's no JavaScript in pset 7, but it's an incredibly useful tool 865 00:51:14,340 --> 00:51:18,840 when it comes to doing web-based final projects or really just web programming more generally. 866 00:51:18,840 --> 00:51:20,950 So a quick overview of something called DOM. 867 00:51:20,950 --> 00:51:23,600 Here is a super simple web page that really just says hello, world 868 00:51:23,600 --> 00:51:25,970 both in the title and in the body. 869 00:51:25,970 --> 00:51:29,270 As the indentation has been suggesting for some time, 870 00:51:29,270 --> 00:51:31,380 there is indeed a hierarchy to web pages. 871 00:51:31,380 --> 00:51:34,220 I could draw this same snippet of HTML as a tree, 872 00:51:34,220 --> 00:51:37,470 thinking back to our discussions of data structures in C, as follows. 873 00:51:37,470 --> 00:51:40,710 I have some special root node called the document node, 874 00:51:40,710 --> 00:51:43,650 and we'll see the analog of this in JavaScript in just a moment. 875 00:51:43,650 --> 00:51:48,330 The first child and only child of that in this case is the HTML tag. 876 00:51:48,330 --> 00:51:49,880 There's no direct mapping of the doctype. 877 00:51:49,880 --> 00:51:53,170 That's a special thing, so we should just ignore it when it comes to this DOM, 878 00:51:53,170 --> 00:51:55,810 this Document Object Model tree. 879 00:51:55,810 --> 00:51:59,530 Notice that the HTML tag, which I've depicted arbitrarily as a rectangle, 880 00:51:59,530 --> 00:52:02,890 has 2 children: head and body. 881 00:52:02,890 --> 00:52:04,840 >> Those are similarly drawn as rectangles. 882 00:52:04,840 --> 00:52:08,970 It is meaningful pictorially that head is to the left of body. 883 00:52:08,970 --> 00:52:11,960 The implication is that head comes first in the tree. 
884 00:52:11,960 --> 00:52:14,910 So there's actually an ordering to a tree when you draw it like this, 885 00:52:14,910 --> 00:52:17,460 even though the shapes and whatnot are arbitrary. 886 00:52:17,460 --> 00:52:20,360 Head meanwhile has a single child called title, 887 00:52:20,360 --> 00:52:25,170 and title actually has its own child, which is "hello, world", 888 00:52:25,170 --> 00:52:32,210 which I deliberately drew as an oval here to make it slightly different from the rectangle. 889 00:52:32,210 --> 00:52:37,420 These rectangles are elements, whereas hello, world is really a text node. 890 00:52:37,420 --> 00:52:39,850 So it's a node in the tree, but it's a different type of node 891 00:52:39,850 --> 00:52:41,730 so I drew it arbitrarily differently. 892 00:52:41,730 --> 00:52:45,000 Similarly does body have a child called hello, world as well, 893 00:52:45,000 --> 00:52:47,910 so different node even though they're coincidentally the same text, 894 00:52:47,910 --> 00:52:52,100 but I've drawn it using the same shape. So who cares? 895 00:52:52,100 --> 00:52:56,820 Well, what's nice about HTML is that it does have this hierarchical nature. 896 00:52:56,820 --> 00:53:01,010 And what's nice about JavaScript and particularly libraries that are freely available 897 00:53:01,010 --> 00:53:07,120 and popular like jQuery, you can navigate the tree structure so amazingly easy. 898 00:53:07,120 --> 00:53:11,790 Any of the stuff we did in C with pointers and traversing trees and recursing on nodes 899 00:53:11,790 --> 00:53:15,300 left child to right child, all of a sudden we can sort of take for granted 900 00:53:15,300 --> 00:53:19,450 as being amazingly enlightening if not a bit frustrating 901 00:53:19,450 --> 00:53:22,470 but not nearly an efficient way to go about programming. 902 00:53:22,470 --> 00:53:24,470 And so with these higher level languages like JavaScript 903 00:53:24,470 --> 00:53:28,340 we'll be able to navigate this tree much more intuitively. 
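To tie the picture back to the data structures from C, the same tree can be sketched as nested JavaScript objects and walked recursively. This is a hand-built model, not the browser's real DOM: the `name`, `text`, and `children` property names and the `collect` function are made up for illustration.

```javascript
// Hand-built model of the hello, world page; rectangles have a name,
// ovals (text nodes) have text.
var tree = {
    name: "document",
    children: [{
        name: "html",
        children: [
            { name: "head", children: [
                { name: "title", children: [{ text: "hello, world" }] }
            ] },
            { name: "body", children: [{ text: "hello, world" }] }
        ]
    }]
};

// Visit a node, then each of its children from left to right.
function collect(node, out) {
    if (node.text !== undefined) {
        out.push(node.text);
        return out;
    }
    out.push(node.name);
    for (var i = 0; i < node.children.length; i++) {
        collect(node.children[i], out);
    }
    return out;
}

var order = collect(tree, []);
console.log(order.join(" > "));
```

Because head is listed before body in the `children` array, it is visited first, which is the left-to-right ordering the drawing implies.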
904 00:53:28,340 --> 00:53:30,430 >> And indeed the syntax is going to be quite familiar. 905 00:53:30,430 --> 00:53:32,950 If you've never seen JavaScript before, this is a really nice reference 906 00:53:32,950 --> 00:53:35,910 from the Mozilla folks, the people who make Firefox, 907 00:53:35,910 --> 00:53:38,370 so do feel free to browse that at your convenience. 908 00:53:38,370 --> 00:53:41,590 What you'll find--and these slides are identical to what we used the other day-- 909 00:53:41,590 --> 00:53:44,030 similarly, main is gone. 910 00:53:44,030 --> 00:53:47,010 So when you write a program in JavaScript, there is no main function. 911 00:53:47,010 --> 00:53:48,690 You just start writing code. 912 00:53:48,690 --> 00:53:51,660 But a key distinction between JavaScript and C and PHP 913 00:53:51,660 --> 00:53:55,890 is that whereas C and PHP thus far have been executed server side 914 00:53:55,890 --> 00:53:59,180 by the appliance in this case or more generally by a server, 915 00:53:59,180 --> 00:54:04,270 JavaScript by design is usually executed by a browser. 916 00:54:04,270 --> 00:54:08,440 In other words, you might write JavaScript code, as we're about to, 917 00:54:08,440 --> 00:54:13,080 on a server in the appliance, but you include it among your HTML, among your CSS, 918 00:54:13,080 --> 00:54:16,100 among your GIFs and your PNGs and your JPEGs 919 00:54:16,100 --> 00:54:19,170 so that when the user visits your web page, if you're using JavaScript, 920 00:54:19,170 --> 00:54:21,770 that JavaScript code comes from server to browser, 921 00:54:21,770 --> 00:54:24,540 and it's the browser that actually executes it. 922 00:54:24,540 --> 00:54:27,960 So this has meaningful implications for even intellectual property. 923 00:54:27,960 --> 00:54:32,600 It's kind of silly to even think about protecting your IP when it comes to JavaScript code 924 00:54:32,600 --> 00:54:37,560 because by nature of the language it gets executed usually browser side. 
925 00:54:37,560 --> 00:54:40,360 >> You can obfuscate it, which means you can make it look crazy and ugly 926 00:54:40,360 --> 00:54:45,400 with no whitespace, horrible variable names, to make it harder for people to steal your IP, 927 00:54:45,400 --> 00:54:48,120 but the key is that it is executed browser side. 928 00:54:48,120 --> 00:54:51,790 Even though as an aside JavaScript can be used server side, 929 00:54:51,790 --> 00:54:54,480 the most common use case right now is still on the browser. 930 00:54:54,480 --> 00:54:59,800 And here's what it looks like. Here is an if-else if-else construct just like C, just like PHP. 931 00:54:59,800 --> 00:55:02,420 Here is a Boolean expression when you "or" 2 things together. 932 00:55:02,420 --> 00:55:04,330 Here is when you "and" 2 things together. 933 00:55:04,330 --> 00:55:08,300 Here is a switch statement, which is similar to PHP 934 00:55:08,300 --> 00:55:10,810 in that you can switch on different types of values. 935 00:55:10,810 --> 00:55:15,180 Loops similarly have for loops here, which are structured identically to what we've seen before. 936 00:55:15,180 --> 00:55:18,110 While loops; we've got do while loops. 937 00:55:18,110 --> 00:55:20,290 Variables, ever so slightly different. 938 00:55:20,290 --> 00:55:24,560 You do declare variables like you do in PHP and C, 939 00:55:24,560 --> 00:55:27,860 but similarly is JavaScript weakly typed. 940 00:55:27,860 --> 00:55:32,730 You don't specify int or float or string or anything like that usually. 941 00:55:32,730 --> 00:55:34,240 You can specify var. 942 00:55:34,240 --> 00:55:38,040 You don't have to specify var, but it has implications if you don't. 943 00:55:38,040 --> 00:55:42,000 Usually if you omit var, you accidentally create a global variable instead of local. 944 00:55:42,000 --> 00:55:46,420 So let me propose that you almost always just say var and then the name of the variable. 
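A few lines are enough to see what var without a type looks like in practice. This is just an illustrative sketch; the variable names and the `classify` function are made up.

```javascript
// One variable, declared with var, holding two different types over time.
var value = 123;
var typeBefore = typeof value;

value = "hello, world";
var typeAfter = typeof value;

// A switch statement can switch on strings, not just on integers.
function classify(v) {
    switch (typeof v) {
        case "number":
            return "a number";
        case "string":
            return "a string";
        default:
            return "something else";
    }
}

console.log(typeBefore, typeAfter);  // number string
console.log(classify(123));          // a number
console.log(classify("hello"));      // a string
```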
945 00:55:46,420 --> 00:55:48,740 It's not a type, it's just var for variable. 946 00:55:48,740 --> 00:55:52,930 This would be an example, whether it's 123 or "hello, world". 947 00:55:52,930 --> 00:55:58,910 Arrays are present and syntactically similar to PHP. 948 00:55:58,910 --> 00:56:03,690 I'll say var numbers and then I use square brackets again to declare a variable 949 00:56:03,690 --> 00:56:08,870 whose type is array that has these particular numbers in it separated by commas. 950 00:56:08,870 --> 00:56:11,740 And then lastly, this is the only one that really looks different. 951 00:56:11,740 --> 00:56:16,700 Recall that in PHP we would have implemented an associative array for a student 952 00:56:16,700 --> 00:56:20,220 like Zamyla that might look like this, where the variable is called student. 953 00:56:20,220 --> 00:56:23,370 The square brackets mean here comes an array. 954 00:56:23,370 --> 00:56:28,500 >> The fact that I'm not using numeric indices but strings--id, house, and name-- 955 00:56:28,500 --> 00:56:30,990 means that this is an associative array, 956 00:56:30,990 --> 00:56:34,490 and these arrows with the equals sign and the angled bracket 957 00:56:34,490 --> 00:56:37,310 means that the key is "id", the value is 1; 958 00:56:37,310 --> 00:56:39,310 the key is "house", the value is Winthrop House; 959 00:56:39,310 --> 00:56:41,800 the key is "name", the value is Zamyla Chan. 960 00:56:41,800 --> 00:56:47,110 So there's 3 keys inside of this associative array, each of which has its own value. 961 00:56:47,110 --> 00:56:52,880 We've seen that in pset 7, or you soon will, in JavaScript same idea, 962 00:56:52,880 --> 00:56:55,220 but it's going to look like this. 
963 00:56:55,220 --> 00:57:00,070 So var student--no dollar sign and no mention of type still but var-- 964 00:57:00,070 --> 00:57:05,860 equals and then open curly braces because in JavaScript when you have key value pairs, 965 00:57:05,860 --> 00:57:08,900 you actually use something called an object. 966 00:57:08,900 --> 00:57:13,490 And those of you who did take APCS or the like might recall objects from Java 967 00:57:13,490 --> 00:57:15,140 or similar languages. 968 00:57:15,140 --> 00:57:17,880 JavaScript is not Java, first of all. 969 00:57:17,880 --> 00:57:21,600 It was a deliberate design decision years ago to knock off something else that was popular, 970 00:57:21,600 --> 00:57:25,640 its name, even though it has no fundamental relation to Java itself. 971 00:57:25,640 --> 00:57:31,490 JavaScript has objects, and you create them by way of the curly brace notation. 972 00:57:31,490 --> 00:57:36,710 Objects in JavaScript are pretty much equivalent to associative arrays in PHP 973 00:57:36,710 --> 00:57:40,030 when it comes to storing data inside of them. 974 00:57:40,030 --> 00:57:44,100 >> But even more powerfully in JavaScript can you associate very easily functions 975 00:57:44,100 --> 00:57:48,040 inside of an object, and though you can do this in other languages, 976 00:57:48,040 --> 00:57:50,040 it's quite a common paradigm, as we'll see. 977 00:57:50,040 --> 00:57:54,380 In short, this object represents a student, who is particularly Zamyla, 978 00:57:54,380 --> 00:58:00,380 and it's similar conceptually, just syntactically different from this. 979 00:58:00,380 --> 00:58:03,840 Let's actually use JavaScript in a file. 980 00:58:03,840 --> 00:58:05,570 It turns out there's a script tag. 981 00:58:05,570 --> 00:58:08,180 We've seen a style tag and we've seen other HTML tags. 982 00:58:08,180 --> 00:58:11,510 The script tag actually will contain some JavaScript code. 
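Written out, the student object described a moment ago might look like the sketch below. The three keys and values come from the example; the `describe` method is a hypothetical addition, included only to show a function living inside an object.

```javascript
// Key-value pairs go inside curly braces; no dollar sign, no type, just var.
var student = {
    id: 1,
    house: "Winthrop House",
    name: "Zamyla Chan",

    // Hypothetical method: a function stored inside the object.
    describe: function () {
        return this.name + " lives in " + this.house;
    }
};

console.log(student.name);        // dot notation: Zamyla Chan
console.log(student["house"]);    // bracket notation: Winthrop House
console.log(student.describe());  // Zamyla Chan lives in Winthrop House
```

The bracket notation is what makes the object feel like a PHP associative array, while the dot notation is the more common style in JavaScript.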
983 00:58:11,510 --> 00:58:15,500 Let me go into the appliance where we have some source code pre-made. 984 00:58:15,500 --> 00:58:18,700 I haven't posted it yet on the website, but I'll do that after class. 985 00:58:18,700 --> 00:58:21,770 Let's open up this one, blink.html. 986 00:58:21,770 --> 00:58:27,560 Back in the 1990s, there was literally an HTML tag called the blink tag, 987 00:58:27,560 --> 00:58:30,340 and this was one of the most wonderfully overused tags on the Internet 988 00:58:30,340 --> 00:58:36,140 whereby you'd visit some 1990s style web page and start seeing text flashing you like this, 989 00:58:36,140 --> 00:58:39,810 the results of the marquee tag, which had text going like this. 990 00:58:39,810 --> 00:58:45,070 One of the few times where the world has actually agreed on a web standard, 991 00:58:45,070 --> 00:58:48,250 everyone across the board killed the blink tag some years ago. 992 00:58:48,250 --> 00:58:52,860 But we can resurrect it with JavaScript as a demonstration of the power you have 993 00:58:52,860 --> 00:58:56,660 when you can write a program inside of a web page. 994 00:58:56,660 --> 00:59:00,240 First let's skip over the new stuff and focus only on the old. 995 00:59:00,240 --> 00:59:01,780 >> Here is the old stuff in this example. 996 00:59:01,780 --> 00:59:06,350 I have an HTML tag, a head tag, and a title tag. 997 00:59:06,350 --> 00:59:11,210 Then I have a body tag here with a div, which recall is just a rectangular division of the page 998 00:59:11,210 --> 00:59:14,720 that I've given a unique ID arbitrarily of "greeting" to, 999 00:59:14,720 --> 00:59:18,320 just so I have a way of uniquely referring to it, that has some very simple text: 1000 00:59:18,320 --> 00:59:20,220 hello, world. 1001 00:59:20,220 --> 00:59:23,940 Now let me scroll up to the top of this file and see what's new.
1002 00:59:23,940 --> 00:59:27,710 The first thing that's new up top is the script tag, 1003 00:59:27,710 --> 00:59:31,280 and inside of the script tag notice I've declared a function. 1004 00:59:31,280 --> 00:59:34,610 To declare a function in JavaScript, pretty similar to PHP, 1005 00:59:34,610 --> 00:59:37,930 you literally write function then the name of the function, parentheses, 1006 00:59:37,930 --> 00:59:40,400 and maybe some arguments if it takes any. 1007 00:59:40,400 --> 00:59:43,510 Then I've got my curly brace as usual, and now we have some slightly new code, 1008 00:59:43,510 --> 00:59:45,230 but let's see what this means. 1009 00:59:45,230 --> 00:59:48,670 So var div, this just means give me a variable called div. 1010 00:59:48,670 --> 00:59:50,530 I could have called it foo, but I wanted it to be called div 1011 00:59:50,530 --> 00:59:52,620 for reasons that will be clear in a second. 1012 00:59:52,620 --> 00:59:57,480 Then it turns out in JavaScript--and this is JavaScript code embedded in my web page-- 1013 00:59:57,480 --> 01:00:01,760 there is a special global variable of sorts called document. 1014 01:00:01,760 --> 01:00:04,780 JavaScript is in fact an object-oriented language. 1015 01:00:04,780 --> 01:00:07,230 We won't go into detail in 50 as to what that means, 1016 01:00:07,230 --> 01:00:11,180 but for now know that an object is pretty much like a struct. 1017 01:00:11,180 --> 01:00:14,740 Like we saw way back when in one of the earliest problem sets 1018 01:00:14,740 --> 01:00:17,150 where we put a lot of information in a struct, 1019 01:00:17,150 --> 01:00:21,330 similarly is document a special struct that comes with the browser, 1020 01:00:21,330 --> 01:00:24,810 comes with any web page. It's not something I created. 1021 01:00:24,810 --> 01:00:28,210 Inside of this document structure, though, you have not only data 1022 01:00:28,210 --> 01:00:30,010 but you also have functions. 
1023 01:00:30,010 --> 01:00:34,090 >> And any time you have a function inside of a structure, inside of an object, 1024 01:00:34,090 --> 01:00:36,490 it's called a method. But it's the same thing. 1025 01:00:36,490 --> 01:00:40,110 A method is a function that just so happens to be inside of something else. 1026 01:00:40,110 --> 01:00:42,990 So this means that this special global variable called document 1027 01:00:42,990 --> 01:00:47,690 has a function called getElementById that literally does that. 1028 01:00:47,690 --> 01:00:52,460 It will get you an element from the DOM, Document Object Model tree, 1029 01:00:52,460 --> 01:00:55,520 whose ID is in this case greeting. 1030 01:00:55,520 --> 01:00:59,200 In other words, all that time we spent on data structures comes into play here. 1031 01:00:59,200 --> 01:01:01,400 This picture of a DOM that we had a moment ago, 1032 01:01:01,400 --> 01:01:06,100 even though the page is a little different, if I had a div in this picture, 1033 01:01:06,100 --> 01:01:11,180 what document.getElementById would return to me would effectively be a pointer 1034 01:01:11,180 --> 01:01:15,440 to the rectangle in the tree, a reference to the rectangle in the tree. 1035 01:01:15,440 --> 01:01:18,410 So that's what it means to actually call one of those functions. 1036 01:01:18,410 --> 01:01:21,960 In this case again it's a div. It's not a body or a title. 1037 01:01:21,960 --> 01:01:26,480 So let's see what I then do with this div now that I have it inside of this variable called div. 1038 01:01:26,480 --> 01:01:32,580 It turns out with JavaScript you have the ability to tweak the CSS of your page dynamically. 1039 01:01:32,580 --> 01:01:39,060 Up until now, all of the CSS we've done, albeit limited, is in style attributes, 1040 01:01:39,060 --> 01:01:41,730 or where else have we put CSS? 1041 01:01:42,730 --> 01:01:45,810 I kind of spoiled that one. In the style tag at the top of the file. 
1042 01:01:45,810 --> 01:01:49,180 Or third place has been in? 1043 01:01:50,710 --> 01:01:54,590 >> An external file, something .css. 1044 01:01:54,590 --> 01:01:56,730 So those are the 3 places we've done CSS thus far, 1045 01:01:56,730 --> 01:01:59,310 but the catch is we've hard coded it all. 1046 01:01:59,310 --> 01:02:04,060 You decided as you dove into pset 7, we decided before lecture what our CSS would be. 1047 01:02:04,060 --> 01:02:07,380 But if you want to change your CSS, you can actually do that 1048 01:02:07,380 --> 01:02:09,370 once you have an actual programming language. 1049 01:02:09,370 --> 01:02:13,910 CSS, HTML--not programming languages. JavaScript is. 1050 01:02:13,910 --> 01:02:18,200 So it turns out that as soon as you have one of those rectangles from the tree 1051 01:02:18,200 --> 01:02:23,050 called the DOM, it has itself some data inside of it. 1052 01:02:23,050 --> 01:02:27,820 So the div that I just grabbed from the tree has what we'll call a property inside of it 1053 01:02:27,820 --> 01:02:34,390 called style, and the style property has itself a property called visibility. 1054 01:02:34,390 --> 01:02:37,330 I would know this only by looking up a CSS user's manual. 1055 01:02:37,330 --> 01:02:41,160 It turns out there's a visibility CSS property that does what it says. 1056 01:02:41,160 --> 01:02:44,530 It makes something visible or not, visible or not. 1057 01:02:44,530 --> 01:02:46,810 And how you do that is this. 1058 01:02:46,810 --> 01:02:50,510 I'm asking programmatically if the visibility of this div is hidden, 1059 01:02:50,510 --> 01:02:53,390 what do I change it to? Visible. 1060 01:02:53,390 --> 01:02:58,840 Else if the visibility of this page is not hidden, logically I do make it hidden. 1061 01:02:58,840 --> 01:03:04,070 I have no idea why it's visible and hidden and not visible and invisible. 1062 01:03:04,070 --> 01:03:06,000 This was a poor design decision along the way. 
1063 01:03:06,000 --> 01:03:09,530 But those are indeed opposites in CSS: visible and hidden. 1064 01:03:09,530 --> 01:03:15,520 All this does is it means change the CSS of my file on and off, on and off 1065 01:03:15,520 --> 01:03:16,870 for that particular div. 1066 01:03:16,870 --> 01:03:20,630 But again, this is a function called blink. When is the blink function called? 1067 01:03:20,630 --> 01:03:24,080 It turns out that there's another special global variable called window, 1068 01:03:24,080 --> 01:03:28,220 similar in spirit to document, but whereas the document refers to your web page, 1069 01:03:28,220 --> 01:03:31,700 like the DOM tree, the HTML you sent from the server, 1070 01:03:31,700 --> 01:03:35,250 window refers to the chrome around it, the address bar, the title bar, 1071 01:03:35,250 --> 01:03:37,880 and all of that stuff around your web page. 1072 01:03:37,880 --> 01:03:42,800 >> And it turns out that the window object has a special function inside of it called setInterval 1073 01:03:42,800 --> 01:03:44,360 that does what it says. 1074 01:03:44,360 --> 01:03:48,600 It will set an interval--in this case every 500 milliseconds-- 1075 01:03:48,600 --> 01:03:52,270 and, take a guess, what's it going to do every 500 milliseconds? 1076 01:03:52,270 --> 01:03:55,240 It's going to execute that function blink. 1077 01:03:55,240 --> 01:03:58,560 And what's nice here is that we could have done this in C even though we never did. 1078 01:03:58,560 --> 01:04:01,580 C does have something called function pointers where you can pass functions around 1079 01:04:01,580 --> 01:04:03,140 as arguments. 1080 01:04:03,140 --> 01:04:07,620 Similarly in JavaScript can you pass the name of a function into another function. 1081 01:04:07,620 --> 01:04:10,630 And notice what I'm doing. I'm not doing this. 1082 01:04:10,630 --> 01:04:14,380 If I put parentheses after the blink, that would mean call the blink function. 
1083 01:04:14,380 --> 01:04:17,430 If I omit them, that means here is the blink function 1084 01:04:17,430 --> 01:04:21,330 so that setInterval can call it every 500 milliseconds. 1085 01:04:21,330 --> 01:04:28,200 So the end result, atrocious though it is, is that if I go into localhost and go to blink.html, 1086 01:04:28,200 --> 01:04:32,120 I now have this happening again and again. 1087 01:04:32,120 --> 01:04:34,950 And if I actually Inspect Element, let's see if we can see this. 1088 01:04:34,950 --> 01:04:38,550 Let me Inspect Element, let me scroll down just a little bit, 1089 01:04:38,550 --> 01:04:44,320 let me choose Elements over here, and notice the DOM inside of Chrome's inspector. 1090 01:04:44,320 --> 01:04:48,840 It's literally changing back and forth every 500 milliseconds. 1091 01:04:48,840 --> 01:04:55,660 If we go to our friend Nate, 1092 01:04:55,660 --> 01:05:00,020 if you ever wondered how this is working, similar idea with an interval, 1093 01:05:00,020 --> 01:05:04,810 but Nate is actually making very effective use of color in this particular case here. 1094 01:05:04,810 --> 01:05:07,350 So what more can we actually do with this? 1095 01:05:07,350 --> 01:05:09,990 Let's open up another example and try something 1096 01:05:09,990 --> 01:05:12,940 that's programmatically even more useful than making things blink. 1097 01:05:12,940 --> 01:05:17,990 Let me go into our forms directory today and go into form0. 1098 01:05:17,990 --> 01:05:20,820 This was the ugliest possible form that I could come up with, 1099 01:05:20,820 --> 01:05:23,290 and let me just show you what it looks like in a browser. 1100 01:05:23,290 --> 01:05:28,960 >> Let me go into localhost/forms, and this is form0. 1101 01:05:28,960 --> 01:05:33,400 This is a super ugly HTML form that has a few fields for email, for password, 1102 01:05:33,400 --> 01:05:37,190 password, and then a little checkbox to agree to some terms and conditions. 
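[Editor's sketch] Looking back at the blink example for a moment before the form: the difference between passing blink and calling blink() can be sketched as below. This is only an illustration under stated assumptions. The counter blinks stands in for toggling the div, and the two-second cutoff is added so the sketch stops on its own; neither is part of the lecture's code.

```javascript
// Count how many times blink has run; a stand-in for toggling the div.
let blinks = 0;

function blink() {
    blinks++;
}

// Pass the function itself (no parentheses) so setInterval can call it later,
// every 500 milliseconds. Writing blink() here would call it once,
// immediately, and pass its return value (undefined) instead.
// In a browser, setInterval is the same function as window.setInterval.
const timer = setInterval(blink, 500);

// Stop after about two seconds so the sketch does not run forever.
setTimeout(function () {
    clearInterval(timer);
}, 2100);
```

Right after the setInterval line runs, blink has not been called yet; the first call only happens once the interval has elapsed.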
1103 01:05:37,190 --> 01:05:41,350 The catch is if I visit this form and I don't want to give you my email address, 1104 01:05:41,350 --> 01:05:44,730 I don't want to agree to the terms and conditions maybe, I can click Register 1105 01:05:44,730 --> 01:05:46,920 and it lets me through anyway. 1106 01:05:46,920 --> 01:05:50,800 This happens to submit to a stupid PHP file called dump.php. 1107 01:05:50,800 --> 01:05:58,420 All it does is print out the contents of $_GET just for diagnostic purposes. 1108 01:05:58,420 --> 01:06:01,580 That was what was submitted by the user just now. 1109 01:06:01,580 --> 01:06:05,010 But suppose we actually want to validate the user's form submission. 1110 01:06:05,010 --> 01:06:06,530 Let me go into version 1. 1111 01:06:06,530 --> 01:06:11,420 This is form1.html. It looks aesthetically just as bad, but notice how fancy it is. 1112 01:06:11,420 --> 01:06:15,450 If I click Register without cooperating, I get yelled at. 1113 01:06:15,450 --> 01:06:17,320 "You must provide your email address." 1114 01:06:17,320 --> 01:06:21,670 All right. So let me try that. So malan@harvard.edu. I don't need a password. 1115 01:06:21,670 --> 01:06:25,100 Register. "You must provide a password." All right. 1116 01:06:25,100 --> 01:06:28,470 So I will provide a password of crimson. Register. 1117 01:06:28,470 --> 01:06:32,300 "Passwords do not match." I have to now type in crimson here. 1118 01:06:32,300 --> 01:06:35,710 I accidentally checked that. Register. 1119 01:06:35,710 --> 01:06:39,860 "You must agree to the terms and conditions." All right. Agree there. Register. 1120 01:06:39,860 --> 01:06:43,700 And now it shows me the diagnostic output over there. 1121 01:06:43,700 --> 01:06:45,630 >> So what just happened? 1122 01:06:45,630 --> 01:06:48,330 We've had this ability to validate form submissions. 
1123 01:06:48,330 --> 01:06:51,420 In fact, if you did dive into pset 7, there's an apologize function 1124 01:06:51,420 --> 01:06:54,620 that makes it pretty easy to yell at the user with a message on the screen. 1125 01:06:54,620 --> 01:06:57,580 I'm using a slightly different mechanism, the alert function, 1126 01:06:57,580 --> 01:07:03,690 which is not a function that's smiled upon since it makes very ugly user messages. 1127 01:07:03,690 --> 01:07:05,710 But let's see what I'm doing here. 1128 01:07:05,710 --> 01:07:09,620 This is form1.html, and notice that I have some pretty familiar syntax: 1129 01:07:09,620 --> 01:07:12,920 body tag, form tag, action attribute, method attribute. 1130 01:07:12,920 --> 01:07:17,050 But notice I've given my form a unique ID for convenience. 1131 01:07:17,050 --> 01:07:19,190 Then I've got an email field whose type is text, 1132 01:07:19,190 --> 01:07:23,780 a password field whose type is password, confirmation field whose type is password, 1133 01:07:23,780 --> 01:07:28,070 and then a checkbox whose name is agreement over here, type is checkbox. 1134 01:07:28,070 --> 01:07:30,380 And then I've got a submit button. 1135 01:07:30,380 --> 01:07:33,050 But notice at the top what more I have. 1136 01:07:33,050 --> 01:07:35,810 First of all, there's another use of the script tag. 1137 01:07:35,810 --> 01:07:40,520 If you have some JavaScript code in another file, just like with CSS you can include it. 1138 01:07:40,520 --> 01:07:44,530 And you do that with script source, and then notice I'm connecting apparently 1139 01:07:44,530 --> 01:07:50,349 to googleapis.com to a very long path but whose file name ends in jquery.min 1140 01:07:50,349 --> 01:07:52,420 for minimum .js. 1141 01:07:52,420 --> 01:07:55,969 jQuery is a super popular library for JavaScript that just makes JavaScript 1142 01:07:55,969 --> 01:07:58,230 all the more user-friendly to use. 1143 01:07:58,230 --> 01:08:00,610 It's effectively become a de facto standard. 
1144 01:08:00,610 --> 01:08:04,090 So even though what you're about to see is not pure JavaScript per se, 1145 01:08:04,090 --> 01:08:09,340 it is a library on top of JavaScript much like the CS50 library is a layer 1146 01:08:09,340 --> 01:08:13,670 on top of low-level C code; the reality is almost everyone on the Internet uses it. 1147 01:08:13,670 --> 01:08:18,030 So these are not training wheels. This is just best practice these days. 1148 01:08:18,030 --> 01:08:22,830 Now notice below that is my own script tag, and notice what I've done here. 1149 01:08:22,830 --> 01:08:27,450 It turns out that jQuery does something a little fancy. 1150 01:08:27,450 --> 01:08:29,660 JavaScript has dollar signs, but they are meaningless. 1151 01:08:29,660 --> 01:08:32,870 >> They are like the letter A or B or C. 1152 01:08:32,870 --> 01:08:36,670 jQuery has simply adopted the convention or sort of laid claim to the fact 1153 01:08:36,670 --> 01:08:40,280 that $ will be their special symbol. 1154 01:08:40,280 --> 01:08:44,950 So as soon as you load this global JavaScript file up here with the script tag, 1155 01:08:44,950 --> 01:08:49,080 you have access to a special global variable that's called $. 1156 01:08:49,080 --> 01:08:53,009 It's more properly called jQuery, but that doesn't look nearly as sexy as $. 1157 01:08:53,009 --> 01:08:56,250 But $ has no special meaning. In PHP it had special meaning. 1158 01:08:56,250 --> 01:08:58,440 You had to have it in front of a variable. 1159 01:08:58,440 --> 01:09:01,670 This is just a sexy thing that they took on. 1160 01:09:01,670 --> 01:09:03,389 What is going on here? 1161 01:09:03,389 --> 01:09:08,830 Notice I'm passing to the jQuery function my global variable document 1162 01:09:08,830 --> 01:09:10,860 and then I'm calling .ready. 
1163 01:09:10,860 --> 01:09:15,480 What jQuery essentially does is it allows you to take some vanilla JavaScript things 1164 01:09:15,480 --> 01:09:17,889 like the document object, the window object, 1165 01:09:17,889 --> 01:09:20,790 and if you pass it in to the jQuery function-- 1166 01:09:20,790 --> 01:09:24,429 and again, to be clear, this is a function called jQuery-- 1167 01:09:24,429 --> 01:09:28,240 what it does is it returns to you a special version of document 1168 01:09:28,240 --> 01:09:30,700 that has more functionality associated with it. 1169 01:09:30,700 --> 01:09:34,760 So in raw JavaScript there is no ready function, 1170 01:09:34,760 --> 01:09:37,810 but if you pass document to the jQuery function first, 1171 01:09:37,810 --> 01:09:40,960 it returns to you a special version of the document object 1172 01:09:40,960 --> 01:09:43,030 that has more fancy features. 1173 01:09:43,030 --> 01:09:48,230 And that's why people like it. It just makes things easier to do, as we're about to see. 1174 01:09:48,230 --> 01:09:49,820 So what does this line of code mean? 1175 01:09:49,820 --> 01:09:52,690 This line of code here means when the document is ready-- 1176 01:09:52,690 --> 01:09:56,830 in other words, once the browser is done reading this file top to bottom-- 1177 01:09:56,830 --> 01:09:59,200 go ahead and execute the following function. 1178 01:09:59,200 --> 01:10:03,540 What's really interesting in JavaScript--and PHP has this as well-- 1179 01:10:03,540 --> 01:10:05,450 is anonymous functions. 1180 01:10:05,450 --> 01:10:10,560 In JavaScript you can declare functions that have no name but they do have a body. 1181 01:10:10,560 --> 01:10:12,570 Notice what's happening here. 1182 01:10:12,570 --> 01:10:16,220 >> This is a function called ready, and it just means do the following 1183 01:10:16,220 --> 01:10:20,220 when the whole web page is ready, when it's all been read in from the server. 1184 01:10:20,220 --> 01:10:23,090 What do you want to do? 
I want to execute a chunk of code. 1185 01:10:23,090 --> 01:10:27,120 Notice that we don't want to execute this code right away. 1186 01:10:27,120 --> 01:10:34,350 If I omitted this, this would mean immediately start executing these lines of code. 1187 01:10:34,350 --> 01:10:39,040 But the fact that I'm saying no, no, no, wrap this in an anonymous function like this 1188 01:10:39,040 --> 01:10:43,000 means don't execute it yet; call it eventually. 1189 01:10:43,000 --> 01:10:45,430 We saw this a moment ago in our previous blink example. 1190 01:10:45,430 --> 01:10:49,990 What function did we call eventually, 500 milliseconds later? Blink. 1191 01:10:49,990 --> 01:10:51,480 So the same idea. 1192 01:10:51,480 --> 01:10:53,950 Again, even if this looks a little weird, just take on faith for now 1193 01:10:53,950 --> 01:10:57,060 that to declare an anonymous function that's called eventually, 1194 01:10:57,060 --> 01:11:01,720 you simply write function() { 1195 01:11:01,720 --> 01:11:05,380 So what code are we going to execute eventually? The following. 1196 01:11:05,380 --> 01:11:10,460 This too looks a little new, but this means here's the jQuery function, 1197 01:11:10,460 --> 01:11:13,430 and this now is a shortcut. 1198 01:11:13,430 --> 01:11:18,830 This snippet of HTML at the bottom of the screen of course has some tree representation. 1199 01:11:18,830 --> 01:11:21,730 It's not this. This page is more interesting than this hello, world example. 1200 01:11:21,730 --> 01:11:25,210 But there's some tree that corresponds to this HTML. 1201 01:11:25,210 --> 01:11:28,910 It would be a pain in the neck to have to implement some kind of recursive function 1202 01:11:28,910 --> 01:11:34,380 to start at the root node and then find the node whose ID is registration. 1203 01:11:34,380 --> 01:11:38,340 So what jQuery makes super easy for us is literally this.
1204 01:11:38,340 --> 01:11:43,000 Go ahead and get me whatever div or whatever form, whatever HTML element 1205 01:11:43,000 --> 01:11:45,820 has an ID of registration. 1206 01:11:45,820 --> 01:11:52,440 This is equivalent to document.getElementById('registration'). 1207 01:11:52,440 --> 01:11:54,170 >> Why do people like jQuery? 1208 01:11:54,170 --> 01:12:00,110 Because it's shorter to type. But that's all it is. It's the same idea. 1209 01:12:00,110 --> 01:12:02,630 Get me the tag whose ID is registration. 1210 01:12:02,630 --> 01:12:06,300 And when that tag, which happens to be a form, is submitted, 1211 01:12:06,300 --> 01:12:08,300 go ahead and execute this code. 1212 01:12:08,300 --> 01:12:11,320 So let's take one look now at how we're doing form validation. 1213 01:12:11,320 --> 01:12:15,950 The syntax is admittedly cryptic at first, but what's going on? 1214 01:12:15,950 --> 01:12:21,050 If this line of code is true, I'm going to yell at the user to provide his or her email address. 1215 01:12:21,050 --> 01:12:22,970 So what is this line of code? 1216 01:12:22,970 --> 01:12:25,560 $ means jQuery. Now notice this. 1217 01:12:25,560 --> 01:12:27,920 This is kind of like CSS. 1218 01:12:27,920 --> 01:12:33,370 If you've dived into CSS yet, you'll know that this means the element whose ID is registration. 1219 01:12:33,370 --> 01:12:39,840 The space means find a child or a descendant of registration whose name is input. 1220 01:12:39,840 --> 01:12:42,970 And then this thing in square brackets is a little filter. 1221 01:12:42,970 --> 01:12:47,010 And even if this looks cryptic, this just means go to the form whose ID is registration, 1222 01:12:47,010 --> 01:12:51,230 go to the input element inside of that whose name is email, 1223 01:12:51,230 --> 01:12:55,440 and then get its value, whatever its value happens to be-- 1224 01:12:55,440 --> 01:12:59,670 asdf if that's all I typed or malan@harvard.edu if that's what I typed. 
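[Editor's sketch] The selector described above (the element whose ID is registration, then an input inside it whose name is email, then its value) can also be written without jQuery, using the browser's built-in querySelector, which accepts the same CSS-style syntax. This is a sketch for comparison, not the lecture's code: the helper name emailValue is hypothetical, and the jQuery spelling in the comment is inferred from the description rather than copied from the file.

```javascript
// Read the email field's value using a CSS-style selector.
// With jQuery this would presumably be written as:
//     $('#registration input[name=email]').val()
// emailValue is a hypothetical helper; doc is any document-like object.
function emailValue(doc) {
    const input = doc.querySelector('#registration input[name=email]');
    // querySelector returns null when nothing matches the selector.
    return input ? input.value : null;
}

// In a browser, pass the real document object.
if (typeof document !== 'undefined') {
    console.log(emailValue(document));
}
```

Passing the document in as a parameter is a design choice made here so the helper can be exercised with a stand-in object.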
1225 01:12:59,670 --> 01:13:05,250 So if the value of the form's email field == nothing, yell at the user. 1226 01:13:05,250 --> 01:13:09,700 Else if the value of the password field == nothing, yell at the user. 1227 01:13:09,700 --> 01:13:19,520 >> Else if the value of the password field does not equal the value of the confirmation field, 1228 01:13:19,520 --> 01:13:22,850 which was the other form element, yell at the user. 1229 01:13:22,850 --> 01:13:25,680 And then lastly--and this one too has some new syntax of its own, 1230 01:13:25,680 --> 01:13:29,270 but once you've seen it, it's at least a little more reasonable-- 1231 01:13:29,270 --> 01:13:34,060 else if the form whose ID is registration has an input element whose name is agreement 1232 01:13:34,060 --> 01:13:39,720 and it is not checked, go ahead and yell at the user. 1233 01:13:39,720 --> 01:13:42,520 So I totally admit this is completely overwhelming at first glance. 1234 01:13:42,520 --> 01:13:46,530 It's a lot of new syntax. But all of jQuery follows these kinds of patterns. 1235 01:13:46,530 --> 01:13:49,880 And honestly, I didn't even know this existed until a few minutes ago. 1236 01:13:49,880 --> 01:13:53,640 I Googled, "How do you check if a checkbox is checked in jQuery?" 1237 01:13:53,640 --> 01:13:55,680 and this is the syntax, because there are different ways of doing it 1238 01:13:55,680 --> 01:13:58,010 with actual raw JavaScript code. 1239 01:13:58,010 --> 01:14:01,030 So as the very first page of Problem Set 7 emphasizes, 1240 01:14:01,030 --> 01:14:04,500 pset 7 is very much an exercise in bootstrapping yourself 1241 01:14:04,500 --> 01:14:08,650 where we've provided, hopefully, a conceptual framework with which to tackle the pset.
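[Editor's sketch] Stripped of the jQuery syntax, the chain of checks on the registration form comes down to the logic below. This is a minimal sketch that assumes the field values have already been read out of the form into a plain object; the function name validate and the shape of that object are hypothetical. The messages are the ones shown in the demo, and the last check complains when the agreement box is not checked.

```javascript
// Return the message for the first failed check, or null if all pass.
// validate is a hypothetical helper, not the lecture's code; fields is
// assumed to hold the values already read from the form.
function validate(fields) {
    if (fields.email === '') {
        return 'You must provide your email address.';
    } else if (fields.password === '') {
        return 'You must provide a password.';
    } else if (fields.password !== fields.confirmation) {
        return 'Passwords do not match.';
    } else if (!fields.agreement) {
        // The checkbox is not checked.
        return 'You must agree to the terms and conditions.';
    }
    return null;
}
```

A submit handler could then show the returned message and cancel the submission whenever the result is not null.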
1242 01:14:08,650 --> 01:14:12,280 >> But as is often the case with web design, it's up to you really to poke around, 1243 01:14:12,280 --> 01:14:16,680 incorporate snippets of code and examples from the Web so long as you cite them 1244 01:14:16,680 --> 01:14:17,960 per the terms on that first sheet, 1245 01:14:17,960 --> 01:14:21,460 and realize that learning HTML, CSS, JavaScript and even SQL 1246 01:14:21,460 --> 01:14:26,020 is really meant to be this at-home exercise as we begin to take these training wheels off. 1247 01:14:26,020 --> 01:14:29,150 And realize too there's so many more things you can do with a browser. 1248 01:14:29,150 --> 01:14:33,790 Inside of most of these elements there are other things called event handlers. 1249 01:14:33,790 --> 01:14:37,140 And even though we just looked at ones called onsubmit and onready, 1250 01:14:37,140 --> 01:14:40,310 you can do things like onkeydown, onkeyup, 1251 01:14:40,310 --> 01:14:43,410 like when the user touches a key, you can listen for that and key up. 1252 01:14:43,410 --> 01:14:45,940 Gmail has keyboard shortcuts. 1253 01:14:45,940 --> 01:14:49,490 How does Google implement keyboard shortcuts like C for compose? 1254 01:14:49,490 --> 01:14:54,120 They listen for events, as they're called, like onkeypress or onkeyup and onkeydown. 1255 01:14:54,120 --> 01:14:56,360 If you've ever hovered your mouse over some menu option 1256 01:14:56,360 --> 01:15:00,180 and all of a sudden, voila, a menu appears or the graphic changes color, 1257 01:15:00,180 --> 01:15:01,920 how are they doing that? 1258 01:15:01,920 --> 01:15:06,940 Rather than listen for onready or onsubmit, you listen for onmouseover or onmouseout. 
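[Editor's sketch] As a rough illustration of listening for a key event, the following maps the C key to a "compose" action. It is only a sketch of the general idea and makes no claim about how Gmail is actually implemented; the helper name shortcutFor and the action name are hypothetical.

```javascript
// Map a pressed key to an action name, or null if the key has no shortcut.
// shortcutFor and the 'compose' action are made up for illustration.
function shortcutFor(key) {
    if (key === 'c') {
        return 'compose';
    }
    return null;
}

// In a browser, listen for keydown events on the whole document.
if (typeof document !== 'undefined') {
    document.addEventListener('keydown', function (event) {
        const action = shortcutFor(event.key);
        if (action !== null) {
            console.log('shortcut: ' + action);
        }
    });
}
```

The same pattern applies to the other events mentioned above, such as keyup, mouseover, and mouseout, by changing the event name passed to addEventListener.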
1259 01:15:06,940 --> 01:15:10,920 >> So in short, with these very simple basics that we've begun to scratch the surface of today 1260 01:15:10,920 --> 01:15:13,940 and we'll dive in further to on Wednesday, you have, increasingly, 1261 01:15:13,940 --> 01:15:17,530 power to implement the kinds of things that you're already familiar with. 1262 01:15:17,530 --> 01:15:21,620 So let's end there, and we'll continue this on Wednesday. 1263 01:15:22,690 --> 01:15:24,320 >> [CS50.TV]