## Week 9 Monday Andrew Sellergren ### Announcements and Demos (0:00-2:00) + If you're looking for a collaborator for the Final Project, fill out the form at [cs50.net/partners/form](http://cs50.net/partners/form) and check out the spreadsheet at [cs50.net/partners/spreadsheet](http://cs50.net/partners/spreadsheet). + David lectured to an empty Sanders on Friday! So sad. He'll do his best to re-define any terms from that lecture in case you haven't watched it online yet. ### From Last Time (2:00-21:00) + `$_SESSION` is a superglobal variable and an associative array, like `$_GET` and `$_POST`. We use `$_SESSION` to store information about a user that we want to persist as long as he or she is signed in. In Problem Set 7, we store in `$_SESSION` a key "id" with a unique numeric identifier for the logged-in user. + You can think of `$_SESSION` as an online shopping cart. Information will usually remain in `$_SESSION` as long as the user keeps his browser open. `$_SESSION` provides an answer to the problem of HTTP being a *stateless* protocol, meaning that the connection between the browser and server is not persistent. + The example we looked at to demonstrate the functionality of `$_SESSION` was a simple visit counter webpage. When you first visit it, it says "You have visited this site 0 time(s)." However, as you refresh it, that number increments. + Within `counter.php`, we called `session_start()` to enable the `$_SESSION` variable. Then we simply stored within it at index "counter" a numeric value that we increment whenever the user visits. We made a copy of this value in a temporary variable named `$counter` before incrementing it. Finally, we print out `$counter` using the special `` PHP syntax, equivalent to `printf`, `print`, or `echo`. + Chrome's Developer Tools (as well as similar tools in IE and Firefox) allow you to dig into a webpage in a structured way. In the Elements tab, you can examine the HTML as a hierarchical tree which is much preferable to simply viewing the messy source on most webpages. In the Network tab, we can see the files that are being passed from server to browser. Within the Network tab, we can click on the Headers tab and see the raw HTTP headers that are passed between server and browser. One such header is the User-Agent, which is a string indicating the user's browser. Another is the Expires header, which occasionally is set to a time far in the past as part of cache-busting, ensuring that the browser will fetch a new copy of the page when it visits again. As an aside, clearing your cookies and cache is an essential part of web development as caching sometimes prevents your recent changes from rendering. + More interestingly, though, in the response headers, there's a Set-Cookie line. Cookies are a mechanism for servers to remember users. They are actually files that contain text or sequences of bytes which act as unique identifiers for users, sort of like handstamps at amusement parks. In this case, we see that it's an alphanumeric string following the `PHPSESSID` keyword. When the user revisits this page, his or her browser will present this alphanumeric string to the server, which will then look up the user in its database and fill in the `$_SESSION` variable with his or her information. + When using HTTP (not HTTPS), cookies are transmitted in plaintext from browser to server. What that means is that they can easily be intercepted and used by malicious users. A few years ago, a tool named [Firesheep](http://codebutler.com/firesheep/) was released which made this interception all the easier. Services like Gmail and Facebook only recently transitioned to mostly using HTTPS (which encrypts cookies), probably because it's more expensive to encrypt data before transmitting it. ### Database Design (21:00-41:00) + For Problem Set 7, we'll be giving you a database that's already been designed. But what if you had to design one yourself, what things would you consider? + phpMyAdmin is a user interface designed to help you administer your MySQL databases. Let's say that we want to add a `house` column to our `students` table from Friday. We click the Structure tab and then click on the Go button next to "Add 1 column(s)..." Don't worry, these steps are outlined in the Problem Set 7 specification. + First, we need to decide what type our `house` column will be. Obviously, it won't be an `INT`. It could be a `VARCHAR`, which is a string of variable length. In contrast to `CHAR`, which has a fixed length, `VARCHAR` only uses the minimum number of bytes necessary to store a string. So if you insert the value `Mather` into a `VARCHAR` column of length 255, it uses 6 bytes whereas if you insert the value `Mather` into a `CHAR` column of length 255, it uses 255 bytes. The flipside of this, however, is performance. If your column values have a fixed length, as with the `CHAR` type, the database engine can optimize for their lookups because it knows exactly where to jump to access them. + The `INT` type stores 32-bit integers. Having used this for its user id field, Facebook recently faced the problem of running out of integers and had to switch to a `BIGINT` type, which store 64-bit integers. `DECIMAL` is a type that specified fixed precision, as described in Problem Set 7. + So we settle on `VARCHAR` of length 255 for `house`. If we choose to, we can also specify a default value of `NULL` so long as we check the Null checkbox to indicate that these values are allowed. + We spoke on Friday about keys and indices for SQL tables. The primary key is a field which uniquely identifies a row in a table. For our `students` table, the best candidate for this is the `id` field. `email` is also a possibility, but is less ideal because it's a string instead of an integer and thus proably requires more bytes on average. We can also specify some columns as being `UNIQUE`, which will guarantee that duplicate values cannot be inserted. Even if a column is not `UNIQUE`, however, we can specify it as an index if we know that we're going to search it frequently. In the case of `house`, for example, the values won't be unique, but we'll want MySQL to build up a data structure (usually a B-tree) behind the scenes so that it's easily searchable. This is what specifying a column as an index does. + Once we click Save and add `house` to our `students` table, the existing rows will have a value of `NULL` for the new column. If we want to change David's `house` value to "Pfoho," we can do it by clicking Edit or we can click the SQL tab and manually execute the following query: UPDATE students SET house = 'Pfoho' WHERE id = 1; + This query is actually pretty readable. It just says to update the `students` table and set the `house` value to "Pfoho" wherever the `id` value is 1. Since the `id` value is 1 only for David, we're only updating a single row. + This database design is pretty inefficient. If some 400 people live in Mather, then this `students` table will have 400 rows that have the value "Mather" as a value for `house`. We're wasting bytes by storing this string over and over again. Instead of storing the whole "Mather" string for each of these rows, let's instead store an identifier that maps to "Mather" in another table. + We start by creating a new table named `houses` which has just 2 columns, `id` with type `INT` and attributes `UNSIGNED` and `AUTO_INCREMENT` as well as `name` with type `VARCHAR` of length 255. Then we insert the values of "Mather" and "Pfoho" into this table by executing the following SQL queries: INSERT INTO houses (name) VALUES('Mather'); INSERT INTO houses (name) VALUES('Pfoho'); + Now that we have the names of these houses mapped to integer values, we need to change how we store them in the `students` table. The type of the `house` field should be changed from `VARCHAR` to `INT`. David's `house` value then becomes 1 and Mike's becomes 2. Henceforth, we'll only be using 32 bits to store the `house` value as opposed to potentially many more for long house names. This process of factoring out redundant information from our database is called *normalization*. + Incidentally, the number that sometimes follows `INT` in parentheses, e.g. `INT(11)`, has nothing to do with the size of the column. It refers to the display width of the column. + But how do we search for all students who live in Mather? We have to join our `students` table with our `houses` table using a SQL query like the following: SELECT * FROM students JOIN houses ON students.house = houses.id; + The `*` means select all fields from both tables. The `ON` syntax specifies which columns map to each other in the two tables. The takeaway here is that if you find yourself storing information redundantly, you should consider factoring it out into separate tables. ### Atomicity and Transactions (41:00-51:00) + Suppose you have a dorm fridge that you share with your roommates. One day, you notice that you're out of milk, so you go to CVS to buy some more. While you're there, your roommate also notices that you're out of milk and goes somewhere other than CVS to buy some more. A few minutes later, you both return with more milk, some of which is bound to go bad because you won't drink it in time. A tragedy indeed! + What could solve this predicament? You could have left a note saying "Gone for milk." You could also have locked the refrigerator door to prevent your roommate from even opening it. If you think about this in computer science terms, the refrigerator might represent some object that both you and your roommate have access to read. The problem arose because both of you were allowed to read the object. This could have been prevented by locking your roommate out from reading the object while you were reading it. + Now imagine that the object in question is a bank account instead of a refrigerator. If there are two ATM cards that can withdraw from this bank account, it's conceivable that they could access the account nearly simultaneously. Maybe the account has $200 in it and both cards ask to withdraw $100. If they've both queried for the balance nearly simultaneously, they both might see the balance as $200. Once the withdrawals have gone through, the two machines do the math and subtract $100 from the original $200. Finally, the balance is set to $100 by both machines, but $200 has in fact been withdrawn from an account that only had $200 in it to begin with. + In Problem Set 7, we provided you with queries like the following: INSERT INTO table (id, symbol, shares) VALUES(7, 'DVN.V', 10) ON DUPLICATE KEY UPDATE shares = shares + VALUES(shares); + Although it's a bit cryptic, you can tell that this query is doing some inserting as well as some updating. The fact that it's written as a single query guarantees that it's *atomic*, meaning it's executed as a single database operation or not at all. If we took the approach of first querying to see how many shares of this stock the user owns and then calculating the new value to update based on this, we could end up with a situation like we did with the refrigerator or the ATM in which two back-to-back queries corrupt the true value of a column. + SQL also offers *transactions*. By enclosing several SQL operations with a `START TRANSACTION` and a `COMMIT` statement, you guarantee that all of them get executed simultaneously. Although this isn't necessary for Problem Set 7, it may prove useful for your Final Project. ### DOM and JavaScript (51:00-76:00) + Consider for the sake of this discussion the following very simple webpage: hello, world hello, world + As the indentation in our code has suggested, there is a hierarchy to HTML. We can represent this simple webpage with the following tree: ![DOM](dom.png "The DOM.") + This tree representing a webpage we'll refer to as the Document Object Model, or DOM. The root node is the document itself. The first and only child of this root document node is the html element. The html element has two children, the head and the body. The head is to the left pictorially because it comes first in our HTML code. The head element has a child named title which itself has a child containing the text "hello, world." This text is represented with a node to distinguish it from HTML elements like body, head, and title. + Given that webpages have this well-defined tree structure, traversing them with JavaScript is quite straightforward. + For a nice starter guide to JavaScript, check out [this one](https://developer.mozilla.org/en-US/docs/JavaScript/Guide) provided by Mozilla, the developers of Firefox. + Like PHP, JavaScript has no `main` function. Unlike PHP, however, JavaScript is executed clientside, not serverside. The browser itself downloads and executes the JavaScript code on a webpage. This has meaningful implications for intellectual property since it's hard to protect your JavaScript source code from being seen by others. + Conditions, Boolean expressions, switch statements, and loops are the same in JavaScript as they are in C and PHP. Variables have a slightly different syntax: var s = "hello, world"; + JavaScript is weakly typed, so you don't need to specify the type of a variable. However, you do generally need to include the keyword `var` lest you accidentally declare your variable to be global. + Arrays in JavaScript are syntactically similar to arrays in PHP: var numbers = [4, 8, 15, 16, 23, 42]; + Associative arrays don't really exist in JavaScript. Instead, JavaScript has objects: var student = { id: 1, house: "Winthrop House", name: "Zamyla Chan" }; + As an aside, note that JavaScript has no relation whatsoever to Java. A somewhat confusing naming choice. + One special type of loop in JavaScript mimics the `foreach` syntax of PHP: for (var i in array) { // do this with array[i] } + With this syntax, JavaScript will do the work of figuring out how long the array is and iterate from 0 to that length, storing the iterator in `i`. + To include JavaScript in a webpage, we use the ` blink
hello, world
+ In `blink.html`, we resurrect the `` tag which was deprecated from HTML. Ignoring the ` form1
Email:
Password:
Password (again):
I agree to the terms and conditions:

+ The first `