PROFESSOR DAVID J MALAN: All right, we are back. And our final session is ostensibly on web programming and some of the ingredients that are commonly used therein. Thought we'd make this a mix of concepts and design, as well as some hands-on, which will end the day, using a little bit of JavaScript in a couple of different contexts. But first, let's make sure we knock off the last of these two as well as a more general answer to where do we actually put all of the data when making web-based applications. We've talked about databases in the abstract. We've talked about them in the infrastructure sense. But we never actually opened up the database and talked about how you actually put data in there. And what are some of the kinds of questions to consider? What makes a database design good? What makes an engineer designing a database good or not so good? And so let's begin with that. So in terms of database technologies, there are generally two classes of databases, one of which is called relational databases and one of which might be called object-oriented databases. So relational and OO, or object oriented, or document stores as they're typically called. And we'll do these in reverse order, these object-oriented databases, specifically these document stores, things like MongoDB, which is a very common one. So let me start another list of ingredients. These give you the ability to store data in what effectively resembles something called JavaScript Object Notation, JSON. And we'll actually wrap with a look at JavaScript itself. And it looks a little something like this. So if I were to store my data in JSON format and my data, for instance, was a customer, I might represent a customer with the following kind of syntax. I might have the customer's name being David. I might have the customer's address as being, say, 1 Brattle Square, Cambridge, Mass. 02138 USA. I might have the email address be malan@harvard.edu. 
And now this user might also have an ID of, say, 123, which is just a unique numeric identifier. And there might be other fields still. So JSON generally refers to this key value pair approach to data, where you have keys like name, address, email, ID, and their respective values. And there are some other features. You can have arrays built into this. So for instance, let's see, what might I have a whole bunch of? So if this isn't so much-- actually, not so much customers but courses I've taken, maybe we could do something like this, an array of courses. Like I've taken courses with ID number 5 and 7 and 18. So we could represent things like arrays using square bracket notation, as is the convention, to represent a list of things associated with me. So this is the general idea. And if I had a second such customer, I would start another one of these objects, if you will, using the curly brace notation. So this is totally language specific. But in this context, curly braces represent an object and square brackets an array. An object is just a data structure containing keys and values for our purposes. And an array is exactly what we discussed earlier but in JavaScript as opposed to C. So one of the upsides of storing your data in this object-oriented way, where you think about a customer or a student as having data in this way, is that there is some hierarchy to it. You might have a list of courses, which itself has a child, which is this array of course IDs. You could actually explode this so that, yes, it's an array of courses. But you know what, we don't have to think of those courses in terms of their IDs, we can think of them in terms of their names like, say, Computer Science for Business Leaders, whose ID is, for instance, the number 5, and whose start date is something. And so forth. So you can have this nested structure where you just continually and progressively associate more and more data with whatever object is of interest to you. 
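The nested customer described above can be sketched in a few lines. This is a minimal illustration in Python rather than a real document store such as MongoDB; the field names follow the lecture, while the exact layout of each course object is an assumption, and the start date is left out because the lecture does not give one.

```python
import json

# Hypothetical customer document mirroring the fields described above.
customer = {
    "name": "David",
    "address": "1 Brattle Square, Cambridge, MA 02138 USA",
    "email": "malan@harvard.edu",
    "id": 123,
    # An array (square brackets) of nested course objects (curly braces),
    # instead of a bare list of course IDs.
    "courses": [
        {"id": 5, "name": "Computer Science for Business Leaders"},
        {"id": 7},
        {"id": 18},
    ],
}

# Serialize to JSON text, then parse it back into Python objects.
text = json.dumps(customer, indent=2)
restored = json.loads(text)

# Reaching into the hierarchy takes very little code.
course_ids = [course["id"] for course in restored["courses"]]
print(course_ids)
```

The short list comprehension at the end is the kind of "relatively little code" access that makes this hierarchical style attractive.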
This is nice and it lends itself, let me say, in programming to accessing the data easily. Just in terms of code, you can write relatively little code to get at data like this. Unfortunately, it doesn't necessarily give you as much expressiveness, depending on the service you're using, as something more traditional called a relational database does. And indeed, this is one of these religious things, or at least trends right now, whereby people are absolutely still using SQL databases, as they're typically called. By contrast, there are NoSQL databases, which generally means you're not using SQL and it happens to be something a little more like this. And there's a larger array of options here, but let's focus on one of the more traditional ones, if only because it lends itself to, at least I think, some more intuitive design decisions that certainly relate to object-oriented databases as well. And one of the design decisions you initially have to make, typically, is what type of data do you want to store and how do you want to store it? And by contrast to this hierarchical approach, a relational database typically has you store your data in a very flat way, very much like an Excel or Google spreadsheet. So for instance, if we want to create a database of customers in a relational database context, we might do the following. Well, what makes a customer? I have a few fields here, so name, address, email, maybe unique ID, maybe-- courses is irrelevant because I'm changing the story back to customers. What else might you associate with a customer? Phone number is a good one. What else? What they bought, so like an order history. Anything else? AUDIENCE: Age. PROFESSOR DAVID J MALAN: OK, age, good one. AUDIENCE: Contacts within the company. PROFESSOR DAVID J MALAN: Contacts within the company, so like, let's say, like customer service history kind of thing. I'm sure there's a better word for that. 
AUDIENCE: Mailing list, yes or no. PROFESSOR DAVID J MALAN: OK, so yeah, let's do an opt in kind of field. OK, so that's a good list. And I'm sure there are innumerable more we could come up with. So now let's dive in a little deeper. What does this data look like and why do we care? So in a relational database, we would typically specify a data type for these kinds of fields. And we would ultimately store this kind of information in the equivalent of a Microsoft Excel worksheet or Google Spreadsheets sheet that allows us-- once Excel opens. Come on. Open a new file. OK, so over here we might put, what do we have, name, address, email, phone, order history, age, customer service history, opt in. And I deliberately left room for ID just because I'm going to put the ID to the left here. OK, so here's how we might lay out this data. And I might be customer number 123. David Malan, 1 Brattle Square, Cambridge, Mass, 02138, malan@harvard.edu, 617-495-9000. Order history, we'll come back to that because that sounds hard. Age, we'll just leave that blank and customer service we'll just leave that blank. And opt in will be a 1 for yes, I've opted in to emails. So this is all fine and good in Excel. And Excel has a little bit of expressiveness for how you can display your data. But you don't really specify what type of data it must be. You can always override Excel's default settings. And indeed, you can go to Format and specify this is a number. This is how many digits to show. But it really is just an aesthetic detail for the most part, that you can actually impact for better or for worse your data by specifying those things. So instead here, let's consider the question of how we actually represent this information. So name feels just like a sequence of characters. Age feels like a number. Phone number is a little weird but it's more like words. And it's like alphanumeric with some punctuation. So it's not just strictly a number. 
Customer service history and order history kind of are scary right now, so we'll come back to that. Opt in feels like a Boolean, like a Boolean value meaning 1 or 0, true or false, anything like that, yes or no. Email has a pretty standard format, something at something dot something, maybe something more and so forth. And then address, which is just like a phrase or sentence or something like that. But we can dive in a little deeper. In particular, we have a whole bunch of data types in SQL, Structured Query Language, which itself is a programming language with which you can query for data on a database. And indeed, SQL and relational databases more generally are examples of CRUD systems, whereby you can create data, read data, update data, and delete data, silly acronym. And specifically, they have instructions that are called insert and select and update and delete. So in other words, even though, theoretically, these operations are generally referred to with these words, in actuality when you're programming in SQL, you use these four keywords instead. And they kind of do what they mean, where select is the only non-obvious one, where select means search the database and give me back some rows. So for instance, with Microsoft Excel here, you could do this with formulas or macros or whatnot, but generally you don't-- many people, myself included, tend to use Excel more for storing data and not necessarily for writing software against it, in particular because it's going to be slow overall, especially with thousands and thousands of rows. You would generally use a database, something like a SQL database or the like. So what it means to select data is to take a database like this that presumably has more customers than just me and select subsets of them. Select all the customers that we have that are in the age range of like 18 to 49. Or give me all customers who live in Massachusetts. Give me all customers in specifically 02138, that zip code. 
Or give me all customers who have spent more than $100 this month with us. Any number of queries can be solved using SQL as a language. But before you even get to the point of using your database, you have to design it. And among the data types we have at our disposal are data types like char for character, one or more, varchar for variable number of characters, where you don't necessarily know how long the thing is going to be. We have things like int. We have data types like big int. We have data types like decimal, float, year, date, date time, which is both, time, and there are many, many more than this. But this is a decent list with which to start, which is to say, if we want to store this data in our database, we first have to ask ourselves how should that data be stored, for a couple of reasons. One, among the features of a database is to ideally give you data quickly and to make updates or insertions or deletions quick. And to help the database do that, it helps to tell it what type of data it is. Because it turns out, storing things as numbers is often faster than storing them as alphabetical characters. For instance, the number, let's say, 1 million. 1-0-0-0-0-0-0. That's 7 characters to type at a keyboard. So if I type that using text or store that using text, a.k.a. ASCII from yesterday, I need 7 bytes. However, the number 1 million is far less than our special value 4 billion. And you know how many bits, perhaps now, we need to store 4 billion. Which is how many bits? 32. Which if you divide by 8 is how many bytes? 4. So if we instead store the number 1 million as an integer, so to speak, as an int, and not as a string of characters, we can go from 7 down to 4 bytes. 
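The 7-bytes-versus-4-bytes arithmetic above is easy to check. The sketch below uses Python's standard `struct` module to pack the number as an unsigned 32-bit integer; it illustrates the size difference only and is not how any particular database lays out its rows on disk.

```python
import struct

number = 1_000_000

# Stored as text: one ASCII byte per digit.
as_text = str(number).encode("ascii")

# Stored as an unsigned 32-bit integer: always 4 bytes,
# whether the value is 7 or 4 billion.
as_int = struct.pack("<I", number)

print(len(as_text), len(as_int))  # sizes in bytes

# The largest value that fits in 32 unsigned bits ("4 billion" above).
max_uint32 = 2**32 - 1
print(max_uint32)
```

The saving grows with the number of digits: a ten-digit value still fits in the same 4 bytes as long as it stays below the 32-bit maximum.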
So this is an example of why you actually want to care about your underlying representation because you can speed things up, you can save on space and generally help the database to do its job better, which doesn't matter for small websites but for medium to large scale websites, absolutely, all of these kinds of things can start to matter. But the second concern, too, is you can leverage your database to protect yourself from yourself. You can have the database make sure that the only type of value that can go here is an integer. The only type of value that can go here is a year, which means it must be a four digit number only. You must be able-- you can specify that this has to be a date. So it has to be year, year, year, year, dash, month, month, dash, day, day. And even if you or your programmers accidentally screw up, the database will prevent you from inserting a bogus value. And this is an added layer of defense and a good thing in general. So you can also specify what types of numbers you're storing, how long those numbers might be. And there, too, we have an opportunity to discuss design decisions where the length of these, in particular, matters. So for someone's name, the first question when designing a relational database might be, how many characters shall a user's name be? So what's the typical length of a human's full name? 10? Feels a little short, maybe for a single name. D-A-V-I-D space M-A-L-A, dammit, I'm one short already. So 11 minimally seems to be the current lower bound. If we polled everyone on their full names, probably 20, 30. What's that? AUDIENCE: 25, 30. PROFESSOR DAVID J MALAN: 25, 30. And I bet, just to play devil's advocate, longest name in the world. How many characters is this? That's crazy. I don't have to count-- his full name. All right, let me take out a program that will do the counting for me. 
OK, 226 characters, I think, or 225 if I'm interpreting this correctly. 226 characters. So incorrect. So we need at least 226 characters. But this is, actually, it's kind of a can of worms. Like I don't know what the upper bound is. And apparently it seems to be this, pragmatically speaking. So it turns out there are certain conventions. Like in a SQL database, it was very common for years for a name-- or rather, for a character-based field to be 255. Why? That was the maximum length for some time. And thankfully it just about fits this fellow's name. So there's a difference, though, between data types, as these are. We talked about int earlier in the context of C, and SQL has these data types, too. And we're going to need to assign one of these to each of these. A character field is defined as a fixed number of characters. So if you specify 255 and the data type is char, or character, that means you will always use 255 characters to store the data. And if it's only D-A-V-I-D M-A-L-A-N with a space, all of the other 200 plus characters are just blanks, essentially, but they're allocated. A varchar is when I don't really know what the biggest name is going to be so you say, 255. But the database only uses as many characters or bytes as it needs. So for David Malan it might store 11 total bytes, give or take. And it won't waste the other 200 plus of them. Well, this seems silly. This sounds better. Like I give it an upper bound but it's less wasteful. Why might char still even exist? Why might you want to commit a priori to a specific number of bytes? Even wastefully. Yeah? Anessa? AUDIENCE: [INAUDIBLE]. PROFESSOR DAVID J MALAN: So that's true. But this decision would be independent of the front end. 
So presumably, you want to support users whose names are of any length because some of us may certainly have run into websites, or even if you've been filling out comment forms, like big, obnoxious companies, tend not to let you leave very long comments. And they will literally count down the characters, a la Twitter. And why is that? Well, one, they probably, from a business perspective, don't want to read too much text. But two, they've probably specified that we're going to store your message in a field of a specific length. So let's separate the front end and those kinds of decisions from the underlying distinction between what's stored in the database. So a char field will store a fixed number of characters, even if a lot of them are blank. But varchar will only store up to a certain amount. So why would you want one or the other? AUDIENCE: Scalable. PROFESSOR DAVID J MALAN: More scalable, let's come back to that. AUDIENCE: So maybe a phone number or all of the area codes or something, be able to know at a certain point if the character is going to be something. PROFESSOR DAVID J MALAN: OK. Yeah, so that helps. So if you have a fixed format you could certainly go with char because you know in advance how long it's going to be. Phone number could break down if you want to support international folks who have longer, different length phone numbers. Maybe zip code, if we're just US customers and we throw away the extra four digits, we just have five digit zip codes or maybe nine character zip codes, that could work as well, or 10 with the dash. But yeah, if you know in advance how long it is going to be, you might as well tell the database it's a fixed length. But it's actually for a scalability or really a performance reason. 
It turns out that if you think of your data as being stored in a column in Microsoft Excel, if you specify that your field, every value in this column, is going to be 5 characters, 5 characters, 5 characters, for a zip code, each one is going to be exactly this length. And much like our discussion earlier, you can address these things. So this is address 0, 5, 10, 15, 20, 25, and so forth. And that, specifically, is the number of bytes away from which each of these things is. In other words, there's a gap of five bytes because I'm assuming 5 characters. And what feature do we gain when we know our data is back to back to back to back at predictable gaps? AUDIENCE: Binary search. PROFESSOR DAVID J MALAN: Potentially binary search, if it's sorted. And it also allows us, more generally, random access. I can jump to the middle of my rows because I just do some simple arithmetic, x minus y, like the total minus wherever I start at. And that's a feature. If by contrast we don't know what the length of the strings are going to be, deterministically, and we say it could be as many as 255, the visual effect might be this first string might be pretty long. This next one might be half the length. This next one might be like 7 3/4. This one might be really short. This one might be blank. And now you have these ragged edges, which means the numbers no longer apply. This row might start at location 0. This might still start at a location 5. This might start now at location 7. This might start at 5, 11, let's say, 12. This is also going to be 13. And then this one might be 14 or something like that, depending on the lengths. In other words, the numbers are now useless because there is not a predictable offset, which means you can't just skip around randomly. So this is the kind of thing where the database can leverage the data's structure if you help inform it. 
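The offset arithmetic above can be sketched as follows. This is a toy model under stated assumptions: the values sit back to back in one byte string, the helper names `fixed_lookup` and `variable_lookup` are hypothetical, the zip codes other than 02138 are made up for illustration, and a newline delimiter stands in for whatever length bookkeeping a real database uses for varchar.

```python
WIDTH = 5  # every zip code occupies exactly 5 bytes, like char(5)

zips = ["02138", "10001", "60601", "94105"]
fixed = "".join(zips).encode("ascii")  # values at offsets 0, 5, 10, 15


def fixed_lookup(buf: bytes, index: int, width: int = WIDTH) -> str:
    """Jump straight to the value: its offset is simply index * width."""
    start = index * width
    return buf[start:start + width].decode("ascii")


# Variable-length values have ragged edges, so a delimiter marks each end.
names = ["David Malan", "Avi", "Nicholas", ""]
variable = "\n".join(names).encode("ascii")


def variable_lookup(buf: bytes, index: int) -> str:
    """No predictable offset: walk past every earlier value first."""
    position = 0
    for _ in range(index):
        position = buf.index(b"\n", position) + 1
    end = buf.find(b"\n", position)
    if end == -1:
        end = len(buf)
    return buf[position:end].decode("ascii")


print(fixed_lookup(fixed, 2))        # one multiplication, no scanning
print(variable_lookup(variable, 2))  # scans past the first two names
```

The fixed-width lookup costs the same no matter which row is requested, while the variable-width lookup gets slower the further into the data it has to go.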
And as a DBA, database administrator, or just generally a developer who's doing database design, you can provide these kinds of hints to the database so as to perform better, to help things like Twitter analyze or search through their data or store their data more quickly. So we have to specify a length. So for a name, how long of an upper bound do we want to give a name? Probably don't want to use char because most people don't have 226 characters in their name. So it feels wasteful to have all those blanks. So let me propose varchar. But what's a reasonable upper bound, then? What's that? AUDIENCE: 30. PROFESSOR DAVID J MALAN: 30? Well, the only catch with going small again is we're kind of screwing over the gentleman who was written up in the Guinness Book and probably other people. There's probably thousands of people who have pretty long names. So what would be common convention would be, you know what, I'm going to make this a varchar with a max length of 255, partly just by convention. 255 is a little arbitrary but it happens to be the former boundary on a string. We know, empirically, there is no one with a longer name right now. But if someone does create a name on their birth certificate that's 256 characters, their name will get truncated. They're going to have to sacrifice one of their names when they register. But that's one of the tradeoffs here. By contrast, address is fundamentally harder to think about. How long might an address be? I don't know. Let's see. We have our little Excel file here. One, so I proposed this address here where we currently are. So 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, looks like 50 characters or so. Is that long enough? No, maybe 100. I don't know, 255. Here, too, there might be some common defaults. AUDIENCE: [INAUDIBLE]. PROFESSOR DAVID J MALAN: Not necessarily for that reason. 
Because at the end of the day, even if you split it up, the total effect is probably about the same, but there's a more compelling reason to split up the address. What I have done is very lazy and very bad design at the moment. Adam? AUDIENCE: I was just going to say so that they're searchable. PROFESSOR DAVID J MALAN: Yeah, right now there is no easy way to search this because Cambridge is kind of sandwiched in the middle of the comma and MA. The number is all the way at the end but it's not alone. It would be nice to kind of clean this up. And indeed, let me go ahead and do that. Let me go ahead and insert a few new columns. And instead of calling this address, how about we'll call this street, city, state, zip, and for now, we could do country, but let's bias ourselves to the US for now just so we can discuss zip codes specifically. So in this case, I might now rewrite this as 1 Brattle Square, Cambridge-- no. Wait, I got those backwards. Cambridge, Mass, 02138. So it's a little cleaner. And to your intuition, we can now search those fields individually. So each of these fields I really don't know, but it should no longer be called just address. This should be called Street, City, State and Zip. Meanwhile, each of city and street should be, I don't know, maybe varchar 255, if only because it's kind of an arbitrary but conventional default and doesn't paint us too much into a corner. As an aside, you can have longer strings of text in your database than 255. Indeed, varchar can be bigger. I think it can be 65,535 nowadays. But there comes a point where if you have even bigger blobs of text, because maybe you're letting people upload their resume or maybe you're letting people upload a college essay or really large documents or something, there are other data types that are on this list called, quite literally, text and large text, I believe, which are even bigger. But they're stored in the database in a different way, in a way that's slightly slower to access. 
So that would be one of the motivations for using varchar. And again, you'd have to read the fine print of your database, although they do tend to follow certain conventions. But that's the kind of intuition behind that. But state, let's just assume for simplicity the US, how long should that be? AUDIENCE: Two. PROFESSOR DAVID J MALAN: Two. There is an advantage to using not varchar but char two, because if we use the two letter codes we can save some space there, which I like. Zip code, again, we can be a little presumptuous here. We could do char 5 or char 9 or char 10 if we include the dash, depending on whether we want to store that. But we'll keep it simple, just do five. Post office will figure it out. Email, what data type should it be? AUDIENCE: Varchar. PROFESSOR DAVID J MALAN: Yeah, probably. And here, I'm getting a little lazy by sort of encouraging us to use 255 for everything. But it is just common. But you know, so long as you're comfortable with the value, that's what matters in the end. Unfortunately, in a database, typically, you can't impose a formatting constraint. You can't say, has to have an at sign, has to have a .com or .net. That has to be in your code. But at least you can specify its maximum length here. Now things get a little more interesting, an ID. Typically an ID would have been the first thing we discussed. But now that we've kind of had a logical progression, now it's time to go back and improve this and give everyone a unique identifier. But wait a minute, wouldn't their email be a unique identifier? AUDIENCE: What if they don't have an email? PROFESSOR DAVID J MALAN: What if they don't have an email? So reasonable problem. Let's make the business decision that, to hell with these people, they need to have an email address to use our website, for whatever reason. So not concerned about that. AUDIENCE: Two people share an email. 
PROFESSOR DAVID J MALAN: If two people share an email, could be a corner case. Grace? AUDIENCE: That was mine. PROFESSOR DAVID J MALAN: Could be people sharing email for family or significant others, or you just happened to be logged in, it's easier to use the same email account. So that could certainly happen. And there's another more technical reason. What's that? AUDIENCE: Change in email. PROFESSOR DAVID J MALAN: If they want to change their email, that's actually a good one, too. It turns out, even though we're talking right now about one worksheet, one database table, so to speak, it turns out that the unique identifier is probably going to end up in other worksheets or other database tables like customer service history, order history. In other words, whatever we're using to uniquely identify the user, we probably are not going to put their order history in the same worksheet, if only because, like, where do I put it? Well, I could put a column here for their first order. And then a column here for their second. And then their third order. But this very quickly becomes messy because where does it end? Some users are going to have one order. Some users might have 100 orders. Doesn't feel like a very clean way of organizing your data. Rows should really be what you keep adding to a database, not columns. So as such, much like you might in your own spreadsheets, you'll probably put orders or customer service history in their own worksheets, where each email or each order is its own row. But to do that, if we have another worksheet, there needs to be some common link among them, and maybe that's their email address. But that could be problematic, then, if they change their email address, oh my god, now I have to change it in so many different places. There's yet another reason to use something other than email to uniquely identify your users. Yeah? AUDIENCE: Does it have the @ symbol as an integer? 
PROFESSOR DAVID J MALAN: Yes, so it will-- an integer is better. So let's clarify the question further. I'll claim, it is better to identify your users via a unique integer than by an email address. Why might that claim further be true? AUDIENCE: An int is going to take up less space. PROFESSOR DAVID J MALAN: Yeah, that's the biggie. An int is going to take up four bytes. An email address might take up five bytes, 10 bytes, I mean, 20 bytes, depends on how long your email address is. And that just seems unnecessarily inefficient. So indeed, it's the case in databases the ID will almost always be an arbitrary but consistent unique number per user. And it's usually just auto incremented. So I might be user one. Nicholas might be user two. Avi might be user three, and so forth. And it just keeps getting incremented automatically in the database each time someone registers for the site. Phone number, integer? AUDIENCE: You put dashes between [INAUDIBLE]. PROFESSOR DAVID J MALAN: Hm, could do that. We could just store it as an integer and just, because we know we're dealing with only Americans in the US right now, we can just forcibly insert, visually, in the presentation of our data the parentheses or the dashes, or whatever. Possible, and this won't really bite us because, again, not to belabor math too much, this is how many digits, just to be safe? So we are-- oh, actually. Three, no, no good. Why? Did I count correctly? Four of these, three of these, yep. Can't represent the zip code for 430-- the area code 430 or 431 or 432. All right, so big int to the rescue. So it turns out there is big int on the list. It's 64 bits instead of 32. But this, too, is kind of foolish but for a different reason. That is plenty big to represent a phone number. AUDIENCE: Another int. PROFESSOR DAVID J MALAN: What's that? AUDIENCE: [INAUDIBLE]. 
PROFESSOR DAVID J MALAN: OK, so we can kind of cheat and just use another int, which is reasonable, especially back in the day, an int for the area code and then an int for the number. We could even do it for the exchange and then the last four digits. But not necessary. In fact, there is sort of a semantic thing that should start to rub you the wrong way here. Like a phone number, we call it a number. It's a collection of numbers, but it's not an arbitrary number from 0 to 4 billion or so, or 0 to whatever this number is. It is a pattern of 10 digits, in the US case, only. And so arguably, you know, I'd probably store this as a character field of length 10, or maybe a character field of length 10 plus 2, 12, to have hyphens. Or maybe a couple more characters if you want parentheses. But frankly, there's no reason to store any of the punctuation in the database. I would probably just store a 10 character field because now I know that the length is bounded and I'm going to have to relegate to my code the check of whether it's all numeric. So that feels better. An integer really should be unbounded except by the size of the data type itself. A phone number feels to me, but you could argue it both ways, that it should be something else. But an int, too. An integer should be something you do math on. Shouldn't be doing math on your phone number. Feels wrong and feels irrelevant, too. You'd never have a use case for that. So let's jump to another number, age. Here is a good candidate for an integer, right? Who hates this idea? OK, someone should hate this idea, leading question. But why? It's fine to represent age with an int. Dan? AUDIENCE: Because it would change. PROFESSOR DAVID J MALAN: Yeah, I don't want to really be changing my database 365 times a year by incrementing 1/365 of my customer base by one just because their birthday is any given day. Better than representing their age would be what? 
Their birth date using this data type, which happens to be in the format yyyy, month month, day day, typically, which sorts nicely. In fact, this is an interesting aside. Computer scientists tend to think in this way. There is a huge benefit, well, huge is subjective, I suppose, to storing dates, whether it's in your file names or whether it's in your database, in this format as opposed to the silly American convention of month month, day day, year year year year, or even the EU approach of like this, and ignore the errors. I'm using a calculator to type out words. Why is the first way, which I claimed is what the database uses, better? Dan? AUDIENCE: It makes sense to sort by year, rather than which day in the year or which month in the year it is. PROFESSOR DAVID J MALAN: Exactly. AUDIENCE: If you're going to have a date on an item, it would make sense to do it by year first, so you could see. PROFESSOR DAVID J MALAN: Exactly. Everything sorts chronologically as a result because if you have something like 2016, 07, 20-whatever today is, 6 or 7 or so. So here's one filename or here's one row in my database. And now let's pick a day in August, 2016 08 29. If you compare these alphabetically, or lexicographically, as they would appear in a dictionary, this later date actually will come alphabetically later than everything else and so it sorts properly. So you can tell who a computer scientist is if they cringe when people store their dates in the wrong format. Anyhow, slight tangent. But age, bad, date of birth, better, would be a better design decision here. So date of birth. Opt in can really just be a Boolean field. It turns out most databases can't just give you a bit, they can give you a byte. So you have to waste a few of those bits to effectively store true or false, 1 or 0, or the like. So let's talk about one last detail here that also rears its head in programming languages as well. 
It turns out that there's different types of numbers in the world you might recall from grade school, some of them have decimal points and some of them don't. Integers do not have decimal points. It's numbers like negative 1, 0, 1, dot, dot, dot, to infinity in both directions. Then there are real numbers which are a superset of those numbers, which tend to be represented with decimal points. And even though there is an infinite number of integers, there are even more real numbers in some sense because of the decimal point. And there's sort of an interesting theoretical argument there. But for our purposes, know that computers, of course, only use a finite amount of memory. This is why the biggest int a computer can typically represent is 4 billion, if using 32 bits. And even that's an overstatement. If you want to support negative numbers, you have to steal one of those bits, essentially for the equivalent of the negative sign or positive sign. So that gives you only 2 billion numbers, from negative 2 billion to positive 2 billion, give or take. So float, as a real number is called, a float in a programming language or a database is a number that has a decimal point. This is even more problematic because if you have a finite number of bits, 32 or 64, you can only represent a finite number of digits in a number. Unfortunately, there's a lot of numbers in the world that have an infinite number of digits in them. And they're not dot 0, 0, 0, 0. It's things like pi, 3.14159. And I don't know the rest of pi, but it's a lot. And it goes on forever. And so at some point, the computer essentially has to truncate the number or round the number. Which is to say, if you choose a float for a data type in a computer program or in a database program, you will be, occasionally, making mathematical errors. And unfortunately I can't cite these examples in the undergrad class anymore because none of them have actually watched Superman 3 or even Office Space.
I mentioned that one in a high school class recently and I felt old. But you might recall if you did see either or both of those movies, that Richard Pryor's and Ron Livingston's characters made an awful lot of money, sort of accidentally, by skimming fractions of pennies off of their companies. Because they realized that in financial transactions, they were only looking at the number, dot, cents. And if it were half a cent or a quarter of a cent, that would normally get rounded away, truncated away. And so they figured out in both scenarios, and Office Space stole the idea from Superman 3 in the narrative of the story, they just put all of those fractions of cents in their bank account. And as I recall, spoiler alert, but I think the movie's been out for 10 or 20 years, 30 or 40 years, they ended up with a whole lot of money in their bank account because of this. And that was because the computers were effectively using floats, and therefore imprecise data types. Thankfully, databases like MySQL, PostgreSQL, Oracle and Microsoft SQL Server support decimal types instead, which are numbers also that have decimal points but you have the luxury of specifying how many digits to the left and how many digits to the right of the decimal point. And so for a database storing financial information, you would absolutely want to use this over the more familiar, because of programming languages, float, because you get exact precision. And the database figures out how best to do that. So this is the common sort of subtle thing. Maybe it doesn't matter for most companies, but certainly banking companies should be in the know as to details like this because it can actually add or subtract money from the total account balances as a result. All right, so at the end of the day, what do some of the queries look like? I'll just give you a couple of samples but we won't actually play with an actual database here.
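The integer limits and the floating-point rounding described above are easy to see in JavaScript, whose ordinary numbers are 64-bit floats. The sketch below is only an illustration: counting whole cents as integers is one common workaround in application code, assumed here for demonstration, and is not the database decimal type itself. The helper name `formatCents` is hypothetical.

```javascript
// The 32-bit limits mentioned above: about 4 billion unsigned,
// about 2 billion once one bit is spent on the sign.
const maxUnsigned32 = 2 ** 32 - 1; // 4294967295
const maxSigned32 = 2 ** 31 - 1;   // 2147483647

// Floats cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
const floatSum = 0.1 + 0.2;
console.log(floatSum === 0.3); // false

// Workaround sketch: keep money as a whole number of cents.
const centsSum = 10 + 20;
console.log(centsSum === 30); // true

// Format a non-negative whole number of cents as dollars.cents.
function formatCents(cents) {
  const dollars = Math.floor(cents / 100);
  const remainder = String(cents % 100).padStart(2, "0");
  return `${dollars}.${remainder}`;
}

console.log(formatCents(centsSum));
```

A fixed-point decimal column gives the same exactness inside the database, with the number of digits on each side of the decimal point chosen up front.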
If you want to select data from a table called customers, you would typically see programmers type something like this, or even analysts or less technical people often pick up a bit of SQL so that they can do their own data analytics or answer their own questions based on the data set. You don't necessarily have to feel like you are or actually be a professional programmer. Select star from customers, semicolon. This is a representative SQL query that would select all of the rows from the table called customers and let me iterate over them, a la Scratch, one at a time, like the repeat block or the forever block. If I want to insert into my customers table, I might want to insert a new name, email-- just name and email, let's say. Specifically these values, David and then malan@harvard.edu. That's how I might insert a customer into the database. Delete from customers where email equals malan@harvard.edu. That would delete me as a customer. And I deliberately chose to delete me based on my email, why? AUDIENCE: [INAUDIBLE]. PROFESSOR DAVID J MALAN: Yeah, I don't want to accidentally unregister all of the Davids in the database. And frankly, even email is a little sloppy for the reasons we discussed earlier. Two people might have the same email address, if you allow that. So better still would be where ID equals 123, where 123 happens to be my unique identifier. But notice, I didn't insert a unique identifier. One of the features you get from databases, typically, is that they will generate the ID for you and let you know what it is, which in this case I'm assuming was 123. We can update values as well. But we can also scope our queries to be more limited. Customers where, let's say, zip code equals 02138. This would give me the ability to select only customers in this particular zip code. Notice I quoted the string.
I did not do this, because in many programming languages, SQL among them, quoting a value or not actually has semantic meaning. When you quote a value, it's a string, a sequence of alphabetical characters or alphanumeric characters. When you don't quote a sequence of characters, it's interpreted, generally, as being a number. Unfortunately, a zip code is not a number, semantically. It's a sequence of digits, so to speak. But of course, in the decimal notation, that would be equal to this, which also suggests if we now rewind, what data type should we use to be clear for our zip code? Characters is probably better than numbers. And in fact, I learned this the hard way when years ago, I was using Microsoft Outlook for years for email. I eventually decided to switch to Gmail and I exported all of my contacts using Outlook as like a big CSV file, comma separated values, which is like an Excel spreadsheet. And then I must have done a spot check and I double clicked and opened it in Excel, looked at it, must have instinctively or reflexively hit Save and then quit it without really making any changes. But dammit if Excel didn't presumptuously decide that any column that has numbers must surely be numbers, not zip codes. So to this day I occasionally look up a friend's address for like mailing them something and I find that they live in Cambridge, Massachusetts 2138 USA because Excel treated the data as a number and not as a string. And so to this day I always sort of cringe, like years and years later, I'm still finding friends who live in 2138. But it sort of speaks to these kinds of corner cases or issues. So this should have been considered a string or sequence of characters in both cases. All right, so we've only just scratched the surface. But this should give you, hopefully, a sense of the sort of litany of design decisions that have to be made. And this is the kind of thing that actually does determine whether someone is good or not so good at this.
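The zip code story above can be reproduced in a few lines of JavaScript. This is only a sketch of the general effect of treating a digit string as a number; it makes no claim about how Excel or any particular database does its conversion.

```javascript
// A zip code kept as a string keeps its leading zero.
const zipAsString = "02138";

// The same characters interpreted as a number lose it.
const zipAsNumber = Number("02138");
console.log(zipAsNumber);         // 2138
console.log(String(zipAsNumber)); // "2138", the "Cambridge, Massachusetts 2138" problem

// A repair is only possible if you already know the intended width.
const repaired = String(zipAsNumber).padStart(5, "0");
console.log(repaired === zipAsString); // true
```

The repair step works here only because US zip codes are known to be five digits; in general, information lost in a conversion like this cannot be recovered.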
And it determines how well your website performs under load. Because even beyond this, just to give you three final ingredients, or one final ingredient, there are things in databases called indexes and primary keys, which we've only just alluded to. And let's see, full text is a feature of MySQL and other databases, and unique. And these are just keywords where I can specify in advance that a field in my database should be optimized for searchability. In other words, if I know in advance that I'm really going to search on zip codes a lot, I should tell my database to index that field. And I do this with a certain command or by clicking something in a web page. And I do that once when I first set up my database and I'm designing my website. And then thereafter, the database's purpose in life, and why I am paying Oracle or why I am using a popular open source free tool, is because they claim to be more high performing than others. And that's because they are good at building fancy tree-like data structures underneath the hood to get me my data quickly. But I have to give them these hints. And I need to tell them, hey, this is unique. Don't let me-- don't let two users with the same email address register. Hey, let me search free form text. So if the user just types in some random words, I want to be able to search over their whole profile using something like this. And then primary is a way of saying, this ID number in this field shall uniquely identify my users. I am sort of contractually agreeing to that as the programmer so that the database can actually leverage that detail. So it's all about sort of educating machines in this way. And while this is not machine learning, it's a decent opportunity to mention a couple of these topics which generally fall into the category of ingredients that we can bring to solve problems in a software sense. Machine learning is one incarnation of AI, or artificial intelligence, whereby you write software that somehow learns. 
And you typically provide your program with training data, sort of representative financial data or maybe sales data, or any type of data, that is sort of retrospective. And you want to sort of train the software you've written to leverage that data and predict future results. So there's this kind of feedback loop whereby you train on your data set and then you try to apply it to new problems. Or what's the stock price tomorrow going to be like? What are our projections for sales going to be like in the future? And so this is very much a trendy and fundamentally compelling subfield of computer science, whereby you can leverage this to try to answer questions more effectively. Things like Siri and Cortana are really about machine learning. Apple and Microsoft and others trying to train software to interpret my own voice better. And in fact, machine learning can sometimes take individual ingredients. They don't do this so much anymore, but what's it called, Dragon speak, I think, the software where you could actually talk to your computer for dictation software, would often train the software based on your own voice, having you read certain things. And that, too, would be an example of machine learning as well. Hadoop, meanwhile, is a piece of software that's commonly used in distributed applications. It's software that you can run. And this would have tied in pretty well to yesterday's chat about cloud computing where you have access to lots and lots of machines. Hadoop allows you to take some job, for instance, even something like the New York Times, for example, generating a whole lot of PDFs of millions and millions of articles, but distributing that load over a whole bunch of worker nodes, whereby there is one master node that somehow orchestrates all of this in a cluster but then it just kind of farms out all of the actually hard or interesting work to these worker nodes, who eventually report back. And that data all gets aggregated somehow.
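The master-and-workers idea just described can be sketched in plain JavaScript. To be clear about the assumptions: this runs in a single process, the "workers" are ordinary function calls, the job (counting words) and all names are invented for illustration, and none of it is Hadoop's actual API. It only shows the shape of the pattern: split the work, let each worker handle its share, aggregate the partial results.

```javascript
// Hypothetical job: count the words in a pile of articles.
const articles = ["all the news", "that is fit", "to print", "today and tomorrow"];

// The "master" deals the articles out to workers round-robin.
function assignToWorkers(items, workerCount) {
  const batches = Array.from({ length: workerCount }, () => []);
  items.forEach((item, i) => batches[i % workerCount].push(item));
  return batches;
}

// Each "worker" sees only its own batch and reports a partial count.
function workerCountWords(batch) {
  return batch.reduce(
    (total, text) => total + text.split(/\s+/).filter(Boolean).length,
    0
  );
}

// The master farms out the batches, then aggregates what comes back.
const batches = assignToWorkers(articles, 3);
const partials = batches.map(workerCountWords);
const totalWords = partials.reduce((a, b) => a + b, 0);

console.log(partials, totalWords);
```

In a real cluster each batch would go to a different machine, which is where the speedup comes from; the aggregation step stays the same in spirit.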
And so Hadoop is very popular for that. And it's very popular in the cloud context because people often want to spin up or turn on a whole bunch of machines at once, run some distributed job and then that's it. It doesn't necessarily have to be run ongoingly but it certainly could on premise as well. Damn it, I've got to keep thinking of an answer to this one, now. All right, any questions, then, on database design or those kinds of topics? Yeah? AUDIENCE: How does MongoDB fit into all of this? Just like an online database, a program online? PROFESSOR DAVID J MALAN: No, online wouldn't have meaning here. It's software that you can download. You can run it here on my laptop. You could run it in the cloud on Heroku or Amazon Web Services. It is an answer to the first type of database that we talked about, an object-oriented database, where you can store things that look like those JSON objects that I first wrote with the more textual syntax. They're especially trendy now because they're easier to use in some sense. You can think they're designed to allow you to think a lot less about your data but you do pay a price sometimes in terms of redundancy. You might sometimes have the same data stored in multiple locations, though there is the notion of unique identifiers that allows you to factor that out. MongoDB and things like that are a little more conducive to languages that are in vogue these days, JavaScript specifically. So it, too, is just a trend and representative of a class of type of databases. Yeah? AUDIENCE: Is JSON like XML? PROFESSOR DAVID J MALAN: Yes, it's sort of a lighter weight version of XML. XML is just very verbose. It's kind of dying off as a popular format because it was such a pain to use. Good intentions, just very heavyweight. Yeah? Anessa? AUDIENCE: [INAUDIBLE]. PROFESSOR DAVID J MALAN: Possibly.
I would need to know more and would need to read up on some specific technology to speak to that better. But the general principle is absolutely. Irrespective of how you store your data, you might need to massage it into some other format, as some would say, whereby you ready it for some other analytical process. So hard to answer in the abstract but absolutely, that would be a commonly done thing, especially for analytics if you're trying to aggregate the data somehow. Yeah, Avi and Marco? AUDIENCE: [INAUDIBLE]. PROFESSOR DAVID J MALAN: Short answer, yes. For instance, when I mentioned that really big textual strings are stored elsewhere, I meant that literally. So if you think of a table as really just being an array of memory, when you have really big chunks of text that are bigger than a varchar supports, they wouldn't be stored in this rectangular region of memory, so to speak. It might be stored over here where there is more space, albeit at the cost of slower access. And there would be the equivalent of a pointer where that cell would be in the database pointing over here. So your schema decisions, your design decisions do affect the lower level details for sure. AUDIENCE: Otherwise, the data is actually stored in the physical table? PROFESSOR DAVID J MALAN: Physical disk and on top of that is layered the idea of a table. So at the end of the day, everything is stored permanently on disks these days, so like mechanical disks, maybe SSDs. But you get more space from mechanical disks, still. And it might live temporarily in memory. So to yesterday's comment about in memory as being a feature, all the data is hopefully being still stored on disk. But the system probably comes with a lot of RAM or memory to hold it temporarily. Marco? AUDIENCE: I don't know if it's true or not, but some months ago there was a story about a woman with the last name Null, N-U-L-L. PROFESSOR DAVID J MALAN: OK.
AUDIENCE: Every time she tried to register or to buy an airplane ticket, for instance, she had problems, because the website crashed because of her last name. PROFESSOR DAVID J MALAN: Really? AUDIENCE: I don't know if it's true. PROFESSOR DAVID J MALAN: It could be. I mean, it doesn't fundamentally need to be the case. There are bugs in the software, then, that are not handling her name properly. I can imagine what was happening, whereby they were just plugging her name into it a-- context. Would that do it? No. It's possible. I can't think of a specific language where that would happen. So it could be kind of a myth or a joke but maybe. Let me think about what language could trick-- you could trick null into thinking it's zero. AUDIENCE: [INAUDIBLE]. PROFESSOR DAVID J MALAN: No, not a problem here. Yeah? AUDIENCE: If where does-- [INAUDIBLE]. Like if you delete a field from a table and you might delete all that data with it too. What does it-- PROFESSOR DAVID J MALAN: Good question. That one I think will depend much more on the database. That's a level of detail that the database user wouldn't necessarily know. In reality, yesterday, I was definitely oversimplifying a bit because there are so many layers in between us and our files these days. There is the physical hard drive. There is the software or the firmware, so to speak, that's running on the hard drive. There's the device driver built into the operating system that talks to the hard drive. There is the operating system that talks to the device driver and each of those can do whatever it wants with the layer below it. So it's hard to say. Odds are, space is re-used where possible, except for performance reasons, sometimes it might be packed especially tightly together. For instance, there's an archive data format for certain databases whereby the moment you write or insert a row into the database, it gets compressed.
And something tells me that is really compacted in memory back to back to back because you're making the contract with the database that you're not going to change that data. You want it to be archived. But hard to say without looking at the actual source code or documentation. Other questions? All right, so let's take a final look at web programming through the lens of an actual language, JavaScript, playing in turn with some sample code and a sample API. So we'll get to an API. Let's start with a bit of JavaScript. And let's do it as follows first. If you go to, let's say, this screen here. Let me give a couple definitions. Here's a very simple web page, again. And I've highlighted in yellow two new tags, the script tag. And we saw these briefly when we looked at Google source code, but in no detail yesterday. But we also saw other tags in the head of a web page when we looked at CSS, for Cascading Style Sheets. So we're introducing script because in this scenario you can actually put programming code between that open script tag and close script tag. Specifically, the language is JavaScript. Back in the day, you could use something like VBScript, Visual Basic script, in Microsoft IE. No one really did that and it's not cross-platform. So JavaScript is really the only thing you can put there these days. Let me stipulate that putting JavaScript code in the head of your web page, not good, for all of the reasons we discussed yesterday because you're co-mingling your data with the presentation with now some business logic that you would express in code. And so while possible, this is generally not the right approach. A more correct approach tends to be this, where you write all of your JavaScript code, a bit of which we'll write in just a moment. But you put it in a separate file, maybe it's called scripts.js or whatever. But you reference that file in this way. You then get the benefits of caching.
You then get the benefits of separating your logic from your markup language and all of the same answers we gave yesterday for Cascading Style Sheets. So let's play with this in the following way. So these are some examples from a colleague at Stanford. So if you could, from today's slides, go to this URL, this URL here. And let me introduce you to the simplest of APIs as follows. Let me grab one thing. I'm looking at the source code of the page for just a moment so I can remember something. Where is that? OK. OK, I'm about to define the following API for us. And that ties together nicely enough a whole bunch of topics. So an API, or application programming interface, is a fancy way of describing a way of using a library, if you will. A library is a bunch of code that someone else wrote that does something that you can use. An API is kind of a higher level concept. It is the documentation for how you use that code. If you're using an API, you are using a library in a prescribed way, if you will. And this can be more concretely defined in the following way. We're about to introduce you to JavaScript. But using your keyboard only, no mouse, no clicking and dragging, because JavaScript is a textual language. So what you're about to see are the textual equivalent of Scratch's puzzle pieces, or the programming blocks we used a moment ago. You're about to have the ability to call, so to speak, a few different puzzle pieces. A puzzle piece, or a function, or method as we would call it, called getRed, that takes two values, x and y, where x and y are the Cartesian coordinates of a pixel in an image. Henceforth, we're going to assume that an image is really just a rectangle on the screen. And it's a GIF or PNG or JPEG, things that we see every day on Facebook and Gmail and the like. And generally speaking, this is 0, 0 over here.
This would be like something comma 0. This would be 0 comma something. And this would be something comma something. So you count this way and that way, generally. So when I say x and y, this means get me the x-th y-th pixel at x comma y location. So there are two other functions besides getRed: getGreen, x comma y. And, as you might guess, getBlue, x comma y. So those are three API calls, so to speak, three functions that you can call in this way. And then there's three others, set red. And actually, capitalization is important so I should be a little less sloppy. setRed, x, y, and I'm going to call it n, where n is the number, in this case, from 0 to 255, I believe, for Nick's code. setGreen, x, y, n. setBlue, x, y, n. And I've deliberately written my method names in what's called camel case, where camels have humps and so similarly does the text kind of have humps to it where the convention is, you start lowercase. And then you capitalize each subsequent word in the method or in the function's name. So this is a convention. And Nick, the professor at Stanford who wrote this code, just adhered to convention. But this is not a technical thing, it's more of a human convention. And it varies by language what people tend to do. So here's the challenge at hand. Number one, an iron image puzzle. So this iron puzzle.png image is a puzzle. It contains an image of something famous. However, the image has been distorted. The famous object is in the red values. However, the red values have all been divided by 10. So they're too small by a factor of 10. So all of the redness in these pixels has been dulled down so much that you can't really tell what the image is anymore. The blue and green values are just all meaningless random values, a.k.a. noise, added to obscure the real image. You must first undo these distortions to reveal the real image. And how to do this. First, set all of the blue and green values to zero to get them out of the way. Look at the result.
If you look very carefully, you may see the real image, though it is very dark, way down towards zero. Then multiply each red value by 10, scaling it back up to approximately its proper value. What's the famous object? So this ties together our discussion yesterday of RGB, whereby each of these thousands of dots on the screen has three numbers associated with it, how much red, how much green, how much blue. What Nick is saying is that he's just added a whole bunch of green and blue values. So for every pixel that has three values, two of them are just random numbers that Nick has thrown at the puzzle creating this noisy static-y image. The red, meanwhile, he's turned the dial all the way down. So there's still a little red there. If he turned it all the way to zero, there'd be no information. But there's enough information, it's just a tenth as much information as you want. So we're going to have to zero out the blue and green values and ratchet up, magnified by a factor of 10, the red values. So let me get you started and you're welcome to work with the person next to you. And the goal here, really, is just to give you a taste of programming in JavaScript with a very nice visual impact. And here, in this text box below the image, is some sample code. Let me walk you through it and give you a bit of syntax and then send you on your way to see if you can recover this image. Here's how it works. This top line on the left declares a variable called im for image. It's arbitrary, Nick just was succinct, so im is what he chose. Right hand side says new simple image and then iron puzzle png. This is just code that's using a library called the simple image library. And Nick knows that to open a file using this library, you literally type new simple image quote unquote "filename" with some parentheses.
The effect of that is to store in the variable called im, not a number, not a word like we've discussed in the past as in Scratch, but to store in a variable a whole image, a whole grid of pixels, if you will. The next line of code is similar to Scratch's repeat block. It's a for loop, so to speak. And the syntax here is saying, initialize a variable called x to zero. Then increment x on each iteration of this loop by one. So x plus plus just means add 1, add 1, add 1, add 1, starting from zero. And then this condition, notice the less than sign, says, keep doing this so long as x is less than the width of that image. So in this syntax here, im is the variable name. Dot means go inside of that variable and call, that is, use the puzzle piece called getWidth, whose purpose in life is to just give you the width of that image. So excuse me, in layman's terms, this just means do the following thing x times where x is the width of the image. So it's like iterating over every column of pixels, if you will. And then you can perhaps guess what does the inner loop do, the for loop that involves y? If the outer loop is iterating over the columns, probably y is representing the rows, down and down and down. So this here is just a comment. So I'm going to delete this. And let me give you this tidbit. If you want to set the green value to something, you would do im.setGreen(?, ?, ?). If you want to do im.setBlue, you would do something, something, something. And if you want to get the value of red, you might say red gets im.getRed of something, something. And that's it. red, here, is a variable. And I'm omitting one final line, which will allow you to set the amount of red. But let me turn on some music for a couple minutes, even if you've never programmed before, you're welcome to work with the person or persons to the left, to the right, in front and behind, whoever helps you get this done. And re-read the problem statement if you need to.
But I claim that my little hints here are probably enough puzzle pieces for you to figure out how to implement this in JavaScript. So let me start to fill in some blanks. Would someone like to offer up, what is the line of code with which I can set all of the green values to zero? AUDIENCE: im.setGreen(x, y, 0). PROFESSOR DAVID J MALAN: OK, good. So let me run this per my suggestion of baby steps. Click Run, Save. And notice, it suddenly gets much more blue. Why is that? Well we've essentially turned off the green. Just for demonstration's sake, let me do the opposite. Let me ratchet it up to 255 instead of 0. And now the image is really green. So really, we're just kind of turning a knob there. But let's leave it at zero. And someone else, how do I set the blue to zero as well? setBlue x, y, 0. So now let me hit Run Save. Now unfortunately, it looks really, really black and really washed out on this screen, certainly. And you can probably tilt your laptop and turn up your brightness and kind of see something, and that's just because the red value is so close to zero. But there's information there. But as they would say in the cheesy TV shows, we need to enhance the image so as to increase the fidelity. So we need one other line of code. And I gave you this one. I said red gets im.getRed at x, y. And I gave that hint to you so that you would have a way of referencing the amount of red currently in the image. And how did people go about magnifying it by a factor of 10? And I did not give you this ingredient, so it's perhaps non-obvious. How can I multiply this value? So it turns out, if you want to take the red value and set it equal to its current value times 10, you might think it's x. But of course x, we've already seen, is a variable in this case. So it turns out that many programming languages use an asterisk as multiplication.
You wouldn't know that so it's fine if you struggled with the final step. But let me multiply it by 10. But it's not enough to just change the variable. What do I now need to do with the variable called red? AUDIENCE: Set the red to it. PROFESSOR DAVID J MALAN: I need to set the red to it. So you can think of red, this variable, as a puzzle piece that I now need to drag and drop into one of those question mark placeholders and say im.setRed at x comma y to not 0, not 255, but to whatever this red value is. So if I now click Run Save, if you've not solved it on your screen, the answer is the Eiffel Tower. And it's just there by nature of having ratcheted up the red value so that there's still black in the image, the Eiffel Tower itself is mostly black. But against this red sky, it rather pops out as the result. So very nice. And this is an example of a general technique known as steganography, or the art of hiding information in other information. And the world starts to get kind of spooky when you think about this. Because we've clearly hidden in what was previously a whole bunch of seemingly noise, an actual image. Now that image could have been text. This could have been my secret message to [INAUDIBLE] earlier. It could have been in the form of an image, not even in the form of a note or an email. And you can imagine artists leveraging this to watermark their images. We typically see pretty blatant ugly watermarks on images, but there's no reason you couldn't embed much more subtly in the pixels of an image your name, your initials, even more information. So that if someone is ripping off your images you can claim, especially if you're in the media, that you are the original source of these images. Or you can actually transmit messages. I mean, what more clever a way for two bad guys to communicate on the internet than to both have seemingly very innocuous websites, a blog if you will, photos of what they've done during the day.
But if you actually run code on the images, embedded in every one of those publicly accessible images on Tumblr or Facebook or whatever, might very well be secret messages using a technique not unlike this one. Let's do one more. This next one is a reddish image, also showing something famous. And the definition here is that the true image this time is in the blue and green values. However, all of the blue and green values have been divided by 20. So the values are very small. Excuse me. The red values are just random numbers, noise, that's been added on top. So you need to undo those distortions to reveal the image. Let me take the first line of code that would allow us to set the red values to zero. What do I have to do this time? im.setRed at x, y to 0. All right, next. Let me go ahead and run this. Pretty black. So let's see what comes next. Multiply the blue and green values by 20 to get them back approximately. So how did someone do the green values first? Any suggestions? Green, I hear Alycia mouthing green equals-- AUDIENCE: [INAUDIBLE]. PROFESSOR DAVID J MALAN: getGreen at x, y. And I'm going to be a little presumptuous, blue gets im.getBlue at x, y. And now what do I want to do with these values? AUDIENCE: [INAUDIBLE]. PROFESSOR DAVID J MALAN: OK, so green gets green times 20. Blue gets blue times 20. And then lastly, im.setGreen to x, y, green. im.setBlue to x, y, blue. Holding our breath. Ah. A little more color this time because we're using both the green and the blue channels and just not the red this time. So if in an at home exercise you'd like to tackle the west image puzzle, there is one more in here that you might enjoy as well. So let's take a gamble here. And in our remaining time ratchet things up so that you either feel-- so hopefully, this isn't one of those demos that backfires because there's a few moving parts.
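The two image puzzles walked through above can be sketched end to end in JavaScript. Because the in-browser image library used in class is not available outside that page, the sketch substitutes a tiny stand-in class, `TinyImage`, which is hypothetical and exists only so the loops can run; only the method names (getWidth, getRed, setRed, and so on) mirror the ones described in the session, and `getHeight` is assumed by analogy with `getWidth`. The function names and sample pixel values are likewise invented for illustration.

```javascript
// Keep channel values inside the 0 to 255 range.
function clamp(n) {
  return Math.max(0, Math.min(255, n));
}

// Hypothetical stand-in for the image object used in class.
// rows[y][x] is an [r, g, b] triple.
class TinyImage {
  constructor(rows) {
    this.rows = rows.map((row) => row.map((pixel) => [...pixel]));
  }
  getWidth() { return this.rows[0].length; }
  getHeight() { return this.rows.length; }
  getRed(x, y) { return this.rows[y][x][0]; }
  getGreen(x, y) { return this.rows[y][x][1]; }
  getBlue(x, y) { return this.rows[y][x][2]; }
  setRed(x, y, n) { this.rows[y][x][0] = clamp(n); }
  setGreen(x, y, n) { this.rows[y][x][1] = clamp(n); }
  setBlue(x, y, n) { this.rows[y][x][2] = clamp(n); }
}

// First puzzle: green and blue are noise, red was divided by 10.
function recoverRedChannel(im) {
  for (let x = 0; x < im.getWidth(); x++) {
    for (let y = 0; y < im.getHeight(); y++) {
      im.setGreen(x, y, 0);
      im.setBlue(x, y, 0);
      let red = im.getRed(x, y);
      red = red * 10;
      im.setRed(x, y, red);
    }
  }
  return im;
}

// Second puzzle: red is noise, green and blue were divided by 20.
function recoverGreenBlue(im) {
  for (let x = 0; x < im.getWidth(); x++) {
    for (let y = 0; y < im.getHeight(); y++) {
      im.setRed(x, y, 0);
      im.setGreen(x, y, im.getGreen(x, y) * 20);
      im.setBlue(x, y, im.getBlue(x, y) * 20);
    }
  }
  return im;
}

// Made-up pixels: a 2x1 image for the first puzzle, 1x1 for the second.
const first = recoverRedChannel(new TinyImage([[[20, 200, 13], [3, 77, 250]]]));
const second = recoverGreenBlue(new TinyImage([[[180, 5, 12]]]));
console.log(first.rows, second.rows);
```

Because the original values were divided and rounded before being stored, multiplying back only recovers them approximately, which is why the problem statement says "approximately its proper value."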
The goal is to get everyone up and running with your own instantiation of a tiny, tiny web app that implements the Google Maps API. So how are we going to do this? First, if you would, go to cs50.io like yesterday, and go ahead and log in. And I'm going to go ahead and do the same here. And I'm going to go ahead and sign in to this here. 01:08:10,280 --> 01:08:20,450 And take a moment to just get back to where you were yesterday, which should load after a moment or so. 01:08:29,960 --> 01:08:32,260 So eventually you should be at a screen like this. And in the meantime, if you could open up today's slides and also open in a separate tab, this URL here, which is the entry point for an API from Google that folks like Uber, I believe, and lots and lots and lots of people on the internet use to embed maps into their own applications so that you can start to do things that use maps, but simply consume it as an ingredient to your own, more interesting, application. So that URL there. 01:09:03,990 --> 01:09:08,660 And at this point in the story, hopefully everyone has Cloud9 open to roughly this state? It's OK if you have other tabs open from yesterday. And let's go ahead and do the following. Go ahead and go to File, New File. 01:09:21,569 --> 01:09:23,880 And that will give you a new tab. And then just type in the word, Tuesday, or something, just so we have a quick and dirty test of whether or not this is working. Go to File, Save. And call it map.html. And odds are this will co-exist alongside yesterday's file, which was hello.html. So when you hit Enter, odds are your interface looks roughly like mine, with map.html open in the editor, also in the file browser at left. And you probably have your little blue terminal window open at the bottom. So now, if you would, typically, since we're using the free accounts, the web server typically turns itself off and your account hibernates after some amount of time.
So just for good measure, go ahead in your terminal window and run apache50 start . (that's apache50, start, period, with spaces in between). And hit Enter. 01:10:18,190 --> 01:10:20,100 And if it's still running, that's fine. It might say stopping and then starting. And you should see the same URL that you were encouraged to visit yesterday. And if you could, the third and final window to open in a tab here is click on that URL, open your website, and visit map.html. And you should see one of two things, ultimately. Either forbidden, like mine, or you see Tuesday, or whatever you typed. If you see forbidden, what was the solution in your terminal window? Yeah. chmod a+r, for read, on map.html. Let all of the world read it. 01:11:02,850 --> 01:11:05,500 And again, that's just giving global permissions. Nothing should seem to happen when you hit Enter. But if you go back now to the forbidden window-- and notice I didn't mention this yesterday-- if you look at the tab, it says 403 forbidden. There's that HTTP status code. Not 404 but 403. If you reload, hopefully you see Tuesday or whatever it is you typed into your tab. Just catch my eye if you want me to run over or look on with the person next to you. [INAUDIBLE], question? 01:11:31,340 --> 01:11:32,320 Yeah? AUDIENCE: Oh, I was-- I have-- PROFESSOR DAVID J MALAN: OK. 01:11:40,980 --> 01:11:41,900 Oh, OK. Down here you want a terminal window. Somehow you closed it. AUDIENCE: OK. PROFESSOR DAVID J MALAN: So use the blue window there. Sure. 01:12:00,480 --> 01:12:02,540 Oh, you capitalized map, which is fine. But just when you type the name, you're going to have to chmod it with a capital letter. 01:12:11,570 --> 01:12:13,380 All right, any questions? Use the buddy system or use me to run over to unstick. 01:12:19,750 --> 01:12:20,290 All right. 01:12:27,630 --> 01:12:28,510 All right. Meanwhile, in that other tab I invited you to open earlier, you probably see this Google screen. And we will just barely scratch the surface.
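As an aside on the chmod a+r step above: the command turns on the read bit for the file's owner, its group, and everyone else, which is what lets the web server read the file instead of answering 403. A minimal sketch of that bit arithmetic follows; the function names are hypothetical, and the starting mode of 600 used in the comments is an assumption, since the session does not say what the file's permissions were.

```javascript
// Read permission for owner, group, and others, in octal (r--r--r--).
const READ_FOR_ALL = 0o444;

// What "a+r" does to a mode: switch on all three read bits and leave
// every other bit untouched.
function addReadForAll(mode) {
  return mode | READ_FOR_ALL;
}

// True if "others" (for example, the web server's user) may read the file.
function isWorldReadable(mode) {
  return (mode & 0o004) !== 0;
}

// Show the permission bits the way chmod and ls users usually write them.
function formatMode(mode) {
  return (mode & 0o777).toString(8).padStart(3, "0");
}

// Assumed starting point: a file only its owner can read and write (600).
const before = 0o600;
const after = addReadForAll(before);
console.log(formatMode(before), isWorldReadable(before));
console.log(formatMode(after), isWorldReadable(after));
```

In Node.js the same change could be applied to a real file with `fs.chmodSync("map.html", addReadForAll(fs.statSync("map.html").mode & 0o777))`, though in the session the terminal command does the job directly.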
The goal here is not to build an application, per se, but really just to get you up and running with their very simple sample map just so that you understand the workflow and feel like if you do want to tinker afterward, you have a little something to build on if you would like. Notice that Google offers maps for different platforms, Android, iOS, web and web services. Web is what we want. So if you're on this screen, go ahead and click Web. That will lead you to a page that looks like this. And notice, maps user love-- there's different ways to embed maps. And frankly, it all can be a little overwhelming at first. So sometimes Google is your-- ironically, Google is your friend to figure out what you actually want by just googling around for recommendations. But I figured it out for us. So go to the Google Maps JavaScript API, the very first link. 01:13:24,972 --> 01:13:27,180 And now here, too, they have not made it very obvious because there's a lot of fluffy, like, images and text here. But click Guides at the top here. So not Overview but Guides. And that should finally lead you to something more technical. So what you are looking at is essentially API documentation. This is not a standard format. Every company will do this a little bit differently, but generally, good API documentation will have formal definitions of what their API does, the functionality they're giving you, and how you can actually use it with sample code. So we will literally do the Hello World sample here. And it's going to be relatively straightforward. But I'll run around and unstick any issues people are having. Notice that down below underneath Hello World there's a whole bunch of HTML. And for better or for worse there is some JavaScript commingled in the page. So not best practices, but it makes the simple example Google's giving us all self-contained. The objective at hand is to quite simply copy and paste that sample code into map.html and save it, but with one change.
Notice down here, and they've highlighted it, they are using a script tag in the sample program that is src=https://maps.googleapis.com/maps-- but notice it says key equals your API key. So the way they keep track of who's using their API, and impose limits on how often people query their API, is everyone gets assigned a big pseudo-random number that they save in their database. Rather than have everyone here sign up for this, and hopefully mine has not been overused, you can go to the next slide in today's handouts, and definitely go to the slides, don't try to transcribe this. Here is an API key that I created for us that you can copy paste. So again, if you need today's slides, you'll never be able to transcribe that URL. Recall that today's slides exist here, just like yesterday. And definitely copy and paste from the slides. Don't manually transcribe. And again, the goal is copy the Hello World example into your own map.html file. Change the API key, save it, and reload. And hopefully, you will have your very own map.html with an embedded Google map. The goal really was to give you a sense of, one, JavaScript as a language, and two, using an API. And frankly, just how exciting the world of programming can be, when you have these APIs and libraries and third party support on top of which you can build your own product. And indeed, what's especially exciting about software development these days, is it's so much more increasingly about weaving together various ingredients and standing on the shoulders of others equivalently in order to make some really cool applications. And case in point is something like Uber where they are not in the mapping business, per se. But having access to the ability to embed interactive maps into their application was the enabling technology, dare I say, on top of which they could then build a car-sharing service as well. So it's really quite cool what you can do.
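For readers tinkering afterward, the "one change" described above (putting your own key into the script URL) can be sketched in JavaScript as follows. Several things here are assumptions rather than quotes from the session: the full `/maps/api/js` path and the `callback=initMap` parameter follow Google's commonly documented loader URL, since the talk only reads out the start of the address; the element id `map`, the coordinates, the zoom level, and the `YOUR_API_KEY` placeholder are illustrative. Check Google's current documentation before relying on any of it.

```javascript
// Build the loader URL with the key filled in. The path and the callback
// parameter are assumptions based on Google's commonly documented URL.
function mapsScriptUrl(apiKey) {
  const url = new URL("https://maps.googleapis.com/maps/api/js");
  url.searchParams.set("key", apiKey);
  url.searchParams.set("callback", "initMap");
  return url.toString();
}

// Called by the Maps script once it has loaded. `google` only exists in
// the browser after that script runs.
function initMap() {
  const container = document.getElementById("map"); // assumed <div id="map">
  return new google.maps.Map(container, {
    center: { lat: 42.3736, lng: -71.1097 }, // roughly Cambridge, Mass. (illustrative)
    zoom: 8,
  });
}

// Only touch the page when running in a browser.
if (typeof document !== "undefined") {
  window.initMap = initMap; // the loader looks the callback up by name
  const script = document.createElement("script");
  script.src = mapsScriptUrl("YOUR_API_KEY"); // placeholder, not a real key
  script.async = true;
  document.head.appendChild(script);
}
```

Keeping the URL construction in its own function makes the key the single thing to change, which mirrors the instruction to copy the sample as-is and edit only the key. The map's container element also needs a height set in CSS, or the map will not be visible.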
Thank you so much to the whole team who's been behind the scenes both in the room and outside the room for the videos today. We'll edit these and make them available online and follow up via email at some point. The slides are already available, so all those references are there. Do feel free to keep in touch if you have any questions. But otherwise, let me officially step out so you feel comfortable filling out the evaluations. And I'll linger in the lobby if anyone has questions. But thanks so much for coming to town this week. See you soon. Thanks. [APPLAUSE]