## Week 8 Friday Andrew Sellergren ### From Last Time (0:00-5:00) + Our first foray into web programming was actually also David's first foray into web programming: a website to register freshman for intramural sports. To create this website, we wrote some HTML and CSS for the look and feel as well as some PHP for the logic. + The website itself was in two parts, one which collected user input via a form and one which processed this input. In the last version of this website, processing the user input mostly entailed constructing an object which would send an e-mail to `jharvard@cs50.net`. + HTTP is *stateless* protocol. This means that it doesn't account for a persistent connection between client and server. How then do we remember a user after his or her client has disconnected from the server? In PHP, we can make use of certain superglobal variables which retain information about a user. So far, we've used the superglobal variables `$_GET` and `$_POST` which store input from forms. In addition to these superglobal variables, there are also the following: + `$_SERVER` + `$_COOKIE` + `$_SESSION` + `$_SERVER` stores some interesting information about the server and user (e.g. IP address, request method). `$_COOKIE` keeps track of cookies, unique identifiers that are assigned to users. Most useful to us, however, is `$_SESSION`. This variable can store anything we want to keep track of, e.g. strings, arrays, objects, for the duration of a user's visit to our website. Think of it like a shopping cart on a merchant's website. Whatever's in the cart stays there until the user closes his browser or until the user or web server decides to delete it. + To see how we might use the `$_SESSION` variable, take a look at `counter.php`: Counter You have visited this site time(s). + At the beginning of our PHP, we start a session by calling the appropriately named `session_start` function. Then we check if our `$_SESSION` variable already has an index in it named "counter." If it does, we initialize a temporary variable to its value. If it doesn't, we initialize the temporary variable to 0. In both cases, we increment the value stored at the "counter" key in `$_SESSION`. Finally we spit out some HTML that prints out the value of the temporary variable. To dynamically insert the value of a PHP variable into some HTML, we simply enclose it with the `` tags. This is just shorthand for a number of other functions we could have called, including `print`, `printf`, and `echo`. + When we visit this webpage for the first time, the counter will read 0. If we keep hitting refresh, this value will increment. + Under the hood, the web server and PHP are using cookies to remember this counter value between refreshes. This is what enables our webpage to give the illusion of maintaining state even though HTTP is a stateless protocol. + This `$_SESSION` variable will be invaluable to you as you implement Problem Set 7, a fake stock-trading website. We'll provide you with the ability to query Yahoo! Finance for a stock price using a single function. It will be up to you to empower users to register for your website as well as to buy and sell stocks. ### Model-View-Controller (13:00-27:00) + As you begin to design web applications, you'll want to think about how to organize your code. One paradigm for organizing code is called Model-View-Controller (MVC). The View encapsulates the aesthetics of the website. The Model handles interactions with the database. The Controller handles user requests, passing data to and from the Model and View as needed. + Let's try to create a course website for CS50 using the MVC framework. In version 0, we have `index.php` which links to `lectures.php`, which links to `week1.php` and `week2.php`, each of which has PDF files for download. This is all well organized, but if we want to change the aesthetics of the website, we have a lot of files to modify. + Version 1 improves upon version 0 by factoring out the header and footer for all pages. Everything that was common to all of our pages was extracted into `header.php` and `footer.php`. Now each of our pages need only call the `require` function to include these common files. + We can take this abstraction one step further in version 2 by writing some helper functions to render the header and footer: + Both functions take a single argument named `$data`. By writing `$data = []`, we specify that the default value for `$data` should be an empty array. `extract` is a function which takes the keys of an associative array and turns them into local variables. So, if you have an array like `[ "foo" => "bar" ]`, then `extract` will create a local variable named `$foo` whose value is "bar." Having done this, `header.php` and `footer.php` will then have access to these local variables. Indeed, in `header.php`, we now don't have to hardcode "CS50" as the title, but rather can use a variable `$title`: <?php echo htmlspecialchars($title) ?>

+ The `htmlspecialchars` function ensures that any special HTML characters that are passed to it are first escaped so that they aren't treated as actual HTML. This helps prevent what's called a cross-site scripting (XSS) attack in which someone else can inject HTML code into your website. + Let's go [deeper](http://i1.kym-cdn.com/photos/images/original/000/062/798/heard-you-like-dreams-e1280242797220.png) in version 3. Our `renderFooter` and `renderHeader` function look pretty similar, so why not write one `render` function that does the work of both: + Here we take an extra argument (e.g. "footer" or "header"), concatenate the file extension ".php" to it, then include it via `require` as before. Our `index.php` now looks like this: 'CS50']); ?> + In version 4, we finally introduce some hierarchy: there are now subdirectories named `includes/` and `templates/`. Now that we've moved our `helpers.php` file into the `includes/` directory, we need to be sure to pass `includes/helpers.php` to `require`. As you will also notice in `index.php`, we're now using the `require_once` function in place of `require`. As you might guess, `require_once` ensures that a file is only included once. + Version 5 introduces a third subdirectory named `html/` which has `index.php`, `lectures.php`, `week1.php`, and `week2.php` within it. These files are the *only* ones that need to be publicly accessible. The others, which may be more sensitive, are kept separate in the `templates/` and `includes/` directories. + To accommodate this new directory structure, we need to pass `../includes/helpers.php` to `require_once`. This tells the function to go up one directory before looking for the `includes/helpers.php` path. + In these examples, we've focused on the V in MVC. You'll get much more familiar with the C in Problem Set 7. As for the M, we'll at least begin now to look databases in general as we dive into MySQL. ### Databases and SQL (27:00-46:00) + Thus far, we've stored data in variables during our programs' execution and even in text or CSV files before and after our programs' execution. Text and CSV files are methods of persistent storage, but they're not very conducive to operations like search or insert. Thankfully, SQL databases *are* conducive to search and insert. + SQL stands for structured query language. It's a language that allows you to insert and extract data from databases. You can think of a database as an Excel file. Within an Excel file, there might be multiple spreadsheets, each of which is analogous to a SQL table. Spreadsheets, like SQL tables, have rows and columns. SQL databases are said to be *relational*, where *relation* is synonymous with table. + SQL databases are accessible via the command line as well as more user-friendly interfaces. One of the tools we'll use to interact with our SQL databases is phpMyAdmin, a free, web-based utility. Here's a snapshot of what a users table might look like in phpMyAdmin: ![A users table in phpMyAdmin](users.png "A users table in phpMyAdmin.") + This users table has three columns: id, username, and hash. The hash is an encrypted version of a password. Storing passwords in plaintext in databases is a really bad idea. All too frequently are databases hacked and their data leaked. If passwords are stored in plaintext in databases, it's just one less hoop that bad guys have to jump through. + The fact that all the values for the hash column start with `$1$` is an artifact of MD5, the method by which they were encrypted. The `50$` is a *salt* that is combined with the password to enhance security. You don't need to worry about understanding MD5, though, because this users table will be provided for you in Problem Set 7. + SQL relies on a small number of operations, a few of which are listed below: + `DELETE` + `INSERT` + `UPDATE` + `SELECT` + To get to the phpMyAdmin interface within the Appliance, open up the web browser and navigate to http://localhost/phpMyAdmin. To get us started, we can click on the Databases tab and create a new database named `lecture`. On the lefthand menu, we now see the word "lecture" which, when clicked, takes us to a prompt to create tables. Let's imagine that we want to store a id, name, and e-mail address for some number of students, so we can create a table named `students` that has 3 columns. This will present us with a form that has 3 rows, one for each column we just asked for. + For each column in the table we're creating, we need to provide a number of options. The names of our columns should be pretty straightforward. The type of the `id` column should probably be `INT` and we'll take it one step further by specifying `UNSIGNED` from the Attributes dropdown menu so that none of our IDs are negative. The `name` and `email` columns should probably be strings. In SQL, a string which has a variable length is designated a `VARCHAR`. For this type, we have to provide the Length/Values option, which means the maximum length that this column's values can be. By convention, we'll use 255, a power of 2 (minus 1). Note that a `VARCHAR` is smart enough to store only the number of bytes that a string actually requires, not the maximum for every string. + Since we don't actually care what the value of `id` is (just so long as it's unique), we can have MySQL generate it for us by checking the A_I checkbox. A_I stands for `AUTO_INCREMENT`. + In relational databases, columns can be designated as *keys* so that the database engine optimizes them for lookup. For example, if we choose `PRIMARY` from the Index dropdown for our `id` column, then we've told MySQL that the value of `id` will always be unique. Knowing this, MySQL can build up a special data structure called a B-tree to make lookups against this column as fast as possible. In theory, `email` could also have been the *primary key* for this table, since e-mail addresses might be unique across users. However, in general it's better to use an integer as an index rather than a string since an integer only requires 4 bytes of memory whereas a string might require many more bytes. Still, if we want `email` to be unique, we can choose `UNIQUE` from the Index dropdown menu. This tells MySQL that the same e-mail address should not be inserted more than once. + One interesting feature of phpMyAdmin is that when we click Save, we'll actually be presented with the SQL that is used to create our table. + To add some data to our newly created table, we click on the Insert tab. For our first entry, we leave the Value for `id` column blank and we set `name` to be "David Malan" and `email` to be "malan@harvard.edu." For our second entry, we again leave `id` blank and we set `name` to be "Mike Smith" and `email` to be "smith@example.com." We click Go and suddenly we have two rows in our table! We can see these rows by clicking on the Browse tab for this table. + Soon you'll be writing your own SQL without the help of phpMyAdmin. Let's try getting some data out of this table we just created: SELECT * FROM students WHERE email="malan@harvard.edu" + Imagine that in your web form, a user has provided his e-mail address and you need to look up his other details. We might write a query like the above which, in this case, gives us a result set with a single row. + In Problem Set 7, your users will visit a Controller via their browsers. That Controller might contain some SQL that will interact with a database. We'll also provide you with some templates that will act as the View portion of the MVC framework.