Week 9, continued

From Last Time

We talked about how web servers send web pages to your browser when it requests them. Those web pages are written in a markup language called HTML. CSS is a language that controls aesthetics like font size and color. Ideally, for the sake of reusability and organization, CSS lives in a separate file that gets linked into the HTML. One downside of this approach is that the CSS file must be fetched with a separate HTTP request, which might increase latency. Browsers will often save a copy of CSS files and other static content locally so that they don’t have to be refetched. This local repository is called a cache and the act of saving to it is called caching.
Unlike HTML, PHP is a true programming language in that it can express logic. When we need our web pages to change their content dynamically, we use PHP to output HTML that the browser then renders. Note that style-wise, we don’t really care about indentation in the HTML, only in the PHP.

Reimplementing `speller`

Unless you really need your program to be highly performant, perhaps because your data sets are huge, you’ll probably reach for a language like PHP rather than a language like C. Your C implementation of speller is no doubt very fast, but think how long it took you to program. In contrast, let’s see how easy it is to re-implement the logic in PHP.

<?php
    $size = 0;

    $table = [];

    function load($dictionary)
    {
        global $size, $table;
        foreach (file($dictionary) as $word)
        {
            $table[chop($word)] = true;
            $size++;
        }
        return true;
    }

?>

A hash table in PHP is as easy to implement as declaring an associative array. We can use the words themselves as indices in the array. file is a function that reads in a file and hands it back to you as an array of lines.
A few other details include declaring our variables as globals, stripping the whitespace from the word, and incrementing the $size variable.

Let’s take a stab at check:

<?php
    $size = 0;

    $table = [];

    function check($word)
    {
        global $table;
        if (isset($table[strtolower($word)]))
        {
            return true;
        }
        else
        {
            return false;
        }
    }

    function load($dictionary)
    {
        global $size, $table;
        foreach (file($dictionary) as $word)
        {
            $table[chop($word)] = true;
            $size++;
        }
        return true;
    }

?>

All we need to check is if the index for a particular word is set to know whether it’s in the dictionary. size and unload are similarly trivial to implement:

<?php
    $size = 0;

    $table = [];

    function check($word)
    {
        global $table;
        if (isset($table[strtolower($word)]))
        {
            return true;
        }
        else
        {
            return false;
        }
    }

    function load($dictionary)
    {
        global $size, $table;
        foreach (file($dictionary) as $word)
        {
            $table[chop($word)] = true;
            $size++;
        }
        return true;
    }

    function size()
    {
        global $size;
        return $size;
    }

    function unload()
    {
        return true;
    }

?>
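
For comparison, here is a minimal sketch of the same associative-array idea in JavaScript, the other language covered in these notes. It is not part of the lecture's code: the function names mirror the PHP ones, a Set stands in for the associative array, and the two-word dictionary is made up for illustration.

```javascript
// Hypothetical stand-ins for speller's load, check, and size.
// A Set plays the role of PHP's associative array.
const table = new Set();

// Load an array of dictionary lines, stripping trailing whitespace.
function load(lines) {
    for (const line of lines) {
        table.add(line.trimEnd());
    }
    return true;
}

// A word is spelled correctly if its lowercase form is in the table.
function check(word) {
    return table.has(word.toLowerCase());
}

// Number of distinct words loaded.
function size() {
    return table.size;
}

load(["cat\n", "dog\n"]);
console.log(check("Cat"), check("emu"), size());
```

As in the PHP version, a lookup does not scan the whole dictionary. It asks the underlying hash table whether the key exists.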

If it’s this easy to implement this in PHP, why bother implementing it in C? When you run speller in C over the King James Bible, you get a runtime of about 0.5 seconds. When you run speller in PHP over the King James Bible, you get a runtime of about 3 seconds. That’s about six times slower, approaching an order of magnitude! One reason is that PHP is an interpreted language: the interpreter has to read and translate the source code every time the program runs. C, on the other hand, is already compiled into 0s and 1s by the time it is executed. There are ways to cache the results of the PHP interpreter so that content can be served up faster on the web, but it will likely never compete with C in terms of performance.

Sessions and Cookies

There are a number of variables called superglobals that are available everywhere in PHP programs:
- $_COOKIE
- $_GET
- $_POST
- $_SERVER
- $_SESSION
A cookie is a small piece of data, often a long unique identifier, that is planted on your computer when you visit a web page, typically via the "Set-cookie" HTTP header. This identifier gets passed back to the web server via the "Cookie" HTTP header when you revisit that web page so that it knows who you are and can serve you content that is specific to you. Think of it like a hand stamp at an amusement park.
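
To make the header exchange concrete, here is a rough sketch in JavaScript of building a "Set-cookie" value and parsing a "Cookie" value. The function names, the cookie value abc123, and the theme cookie are all hypothetical, and the parsing is simplified (it ignores quoting and encoding), so treat it as an illustration rather than how any real server does it.

```javascript
// Parse the value of a "Cookie" request header into an object.
// Simplified: ignores quoting and percent-encoding.
function parseCookieHeader(header) {
    const cookies = {};
    for (const pair of header.split(";")) {
        const index = pair.indexOf("=");
        if (index === -1) {
            continue; // skip malformed pieces with no "="
        }
        const name = pair.slice(0, index).trim();
        const value = pair.slice(index + 1).trim();
        cookies[name] = value;
    }
    return cookies;
}

// Build the value of a "Set-cookie" response header.
function buildSetCookie(name, value) {
    return name + "=" + value + "; path=/";
}

const header = buildSetCookie("PHPSESSID", "abc123");
const sent = parseCookieHeader("PHPSESSID=abc123; theme=dark");
console.log(header, sent.PHPSESSID);
```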

`counter.php`

counter.php demonstrates the use of sessions to implement a simple counter for a user’s visits:

<?php
    // enable sessions
    session_start();

    // check counter
    if (isset($_SESSION["counter"]))
    {
        $counter = $_SESSION["counter"];
    }
    else
    {
        $counter = 0;
    }

    // increment counter
    $_SESSION["counter"] = $counter + 1;

?>

<!DOCTYPE html>
<html>
    <head>
        <title>counter</title>
    </head>
    <body>
        You have visited this site <?= $counter ?> time(s).
    </body>
</html>

Note that $counter exists in scope even outside the curly braces of the if-else blocks.
session_start() needs to be called in order to use HTTP sessions, which allow a web server to keep track of a user across visits. This function call tells the server to send the "Set-cookie" header with a long alphanumeric string after the keyword PHPSESSID.
Question: are PHP variables always global? Yes, unless they are declared within a function.
The <?= $counter ?> syntax is shorthand for switching into PHP mode and printing out a variable.
If cookies are used to identify users, then impersonating a user is as easy as stealing a cookie. The defense against this is to encrypt HTTP traffic, headers included, using SSL. This might be familiar to you as HTTPS. Even SSL encryption can be broken though!
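
What the server does behind session_start() can be sketched roughly as follows. This is a JavaScript illustration under stated assumptions, not PHP's actual implementation: the in-memory sessions map and the visit function are hypothetical, and a real server would persist sessions and expire them.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory session store: session ID -> session data.
const sessions = new Map();

// Handle one visit. cookieId is the ID the browser sent back, or
// undefined on a first visit. Returns the ID to set and the count shown.
function visit(cookieId) {
    let id = cookieId;
    if (id === undefined || !sessions.has(id)) {
        // New (or unrecognized) visitor: mint a long random identifier.
        id = crypto.randomBytes(16).toString("hex");
        sessions.set(id, {});
    }
    const session = sessions.get(id);
    const counter = session.counter === undefined ? 0 : session.counter;
    session.counter = counter + 1;
    return { id: id, counter: counter };
}

const first = visit(undefined);
const second = visit(first.id);
console.log(first.counter, second.counter);
```

Note that anyone who presents the same ID is treated as the same user, which is exactly why a stolen cookie is enough to impersonate someone.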

SQL

We introduced SQL last time as a language to interact with databases. You can think of a database as an Excel spreadsheet: it stores data in tables made up of rows and columns.
There are four basic SQL commands:
- SELECT
- INSERT
- UPDATE
- DELETE
For Problem Set 7, we’ve set you up with a database and an application named phpMyAdmin (not affiliated with PHP) to interact with it. In that database, there is a users table with id, username, and hash columns.
Presumably, username is unique, so why bother with an id column? An id is only 32 bits because it’s an integer, whereas username is a variable-length string. Comparing strings is not as fast as comparing integers, so lookups by id will be faster than lookups by username.
SQL supports the following types:
- CHAR
- VARCHAR
- INT
- BIGINT
- DECIMAL
- DATETIME
A CHAR is a fixed-length string whereas a VARCHAR is a variable-length string. It’s slightly faster to search on a CHAR than a VARCHAR.
SQL tables also have indexes:
- PRIMARY
- INDEX
- UNIQUE
- FULLTEXT
Adding indexes makes searching a table faster. Because we know that we’ll be using id to look up rows and we know that id will be unique, we can specify it as a primary key by choosing PRIMARY from the Index dropdown menu. Although we decided that id should be the lookup field rather than email, we still want email to be unique, so we choose UNIQUE from the Index dropdown menu. This tells MySQL that the same e-mail address should not be inserted more than once. Choosing INDEX tells MySQL to build a data structure to make searching this column more efficient, even though it’s not unique. Similarly, FULLTEXT tells MySQL to build an index for searching for words within a column’s text.
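
Why an index speeds up lookups can be sketched in JavaScript as below. The rows and function names are made up, and a Map stands in for the index; real database engines typically use other structures such as B-trees, so this only illustrates the idea of trading a full scan for a direct lookup.

```javascript
// Hypothetical rows of a users table.
const users = [
    { id: 1, username: "alice" },
    { id: 2, username: "bob" },
    { id: 3, username: "carol" }
];

// Without an index: scan every row until one matches.
function findByScan(rows, column, value) {
    for (const row of rows) {
        if (row[column] === value) {
            return row;
        }
    }
    return null;
}

// Build a unique index: a map from column value to row.
// Throws if a value appears twice, much as a UNIQUE index rejects it.
function buildUniqueIndex(rows, column) {
    const index = new Map();
    for (const row of rows) {
        if (index.has(row[column])) {
            throw new Error("duplicate value: " + row[column]);
        }
        index.set(row[column], row);
    }
    return index;
}

const byId = buildUniqueIndex(users, "id");
console.log(findByScan(users, "id", 2).username, byId.get(2).username);
```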
Databases can be powered by one of several different engines:
- InnoDB
- MyISAM
- Archive
- Memory
This is one more design decision that we have to make, but we won’t trouble ourselves with the particulars for right now.

Race Conditions

Let’s motivate our discussion of race conditions with a short story. Imagine that you come home from class and open the refrigerator to find that there’s no milk. You close the door, walk to the store, buy some milk, and come home. In the meantime, your roommate has also come home and noticed that there’s no milk. He went out to buy some more, so when he comes back, you now have two cartons of milk, one of which will certainly spoil before you can drink it.
How could you have avoided this? You could have left a note on the refrigerator or you could have even padlocked the door. In the real world, we might run into this same situation when withdrawing money from one account at two different ATMs. If both ATMs query for the balance of the account at the same time, they will get back the same number. Then when you withdraw $100 from each ATM, each one subtracts 100 from that same number and saves the result, so the balance drops by only 100 instead of the cumulative 200 that should be subtracted. Free money!
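
The ATM scenario can be replayed step by step. The JavaScript sketch below hard-codes the unlucky ordering of reads and writes rather than using real concurrency, and the starting balance of 500 and the function names are assumptions made for illustration.

```javascript
// Unsafe: both ATMs read the balance before either one writes.
function unsafeWithdrawals(acct, amount) {
    const seenByAtm1 = acct.balance;    // ATM 1 reads the balance
    const seenByAtm2 = acct.balance;    // ATM 2 reads the same number
    acct.balance = seenByAtm1 - amount; // ATM 1 writes its result
    acct.balance = seenByAtm2 - amount; // ATM 2 overwrites it with a stale result
    return acct.balance;
}

// Safer: each withdrawal reads and writes in one uninterrupted step.
function withdraw(acct, amount) {
    acct.balance -= amount;
    return acct.balance;
}

console.log(unsafeWithdrawals({ balance: 500 }, 100)); // 400, not 300

const safe = { balance: 500 };
withdraw(safe, 100);
withdraw(safe, 100);
console.log(safe.balance); // 300
```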
For Problem Set 7, you’ll need to keep track of users' cash balances and shares of stock. If you issue a SELECT statement to get these numbers and then issue an UPDATE statement to update them, you may run into the same problem as the milk or ATM situations we just discussed. Better to run one atomic operation like so:
```
INSERT INTO table (id, symbol, shares) VALUES(7, 'DVN.V', 10)
    ON DUPLICATE KEY UPDATE shares = shares + VALUES(shares);
```
Atomicity means that multiple operations happen together as a single indivisible unit or don’t happen at all. The above statement is both an INSERT and an UPDATE statement, where the UPDATE is executed only if the INSERT fails because there’s already an entry for that stock.
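
The insert-or-update behavior of the statement above can be modeled with a small JavaScript sketch. It assumes, hypothetically, that the table's key is the pair (id, symbol); the portfolio map and upsertShares function are illustrative names, not anything from the problem set.

```javascript
// Hypothetical portfolio table keyed by (id, symbol).
const portfolio = new Map();

// Insert a row, or add to its shares if the key already exists.
// The check and the update happen inside one call, with no gap between them.
function upsertShares(table, id, symbol, shares) {
    const key = id + ":" + symbol;
    const existing = table.get(key);
    if (existing === undefined) {
        table.set(key, { id: id, symbol: symbol, shares: shares });
    } else {
        existing.shares += shares;
    }
    return table.get(key).shares;
}

upsertShares(portfolio, 7, "DVN.V", 10);
console.log(upsertShares(portfolio, 7, "DVN.V", 10)); // 20
```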

Some database engines also support transactions:

```
START TRANSACTION;
UPDATE account SET balance = balance - 1000 WHERE number = 2;
UPDATE account SET balance = balance + 1000 WHERE number = 1;
COMMIT;
```

Because these two UPDATE statements are part of the same transaction, either both take effect or neither does.
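
The all-or-nothing behavior can be imitated in JavaScript by working on a copy and committing only at the end. This is a sketch, not how a database engine implements transactions; the starting balances and the insufficient-funds rule are assumptions added to give the transfer a way to fail.

```javascript
// Hypothetical accounts: account number -> balance.
const accounts = new Map([[1, 0], [2, 1500]]);

// Move money between accounts as one all-or-nothing unit.
function transfer(balances, from, to, amount) {
    // Work on a copy so that a failure leaves the original untouched.
    const draft = new Map(balances);
    if (!draft.has(from) || !draft.has(to)) {
        return false; // roll back: unknown account
    }
    draft.set(from, draft.get(from) - amount);
    if (draft.get(from) < 0) {
        return false; // roll back: insufficient funds (assumed rule)
    }
    draft.set(to, draft.get(to) + amount);
    // Commit: copy every change back to the real table.
    for (const [number, balance] of draft) {
        balances.set(number, balance);
    }
    return true;
}

console.log(transfer(accounts, 2, 1, 1000), accounts.get(1), accounts.get(2));
```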

JavaScript

JavaScript is an interpreted programming language that executes client-side. Whereas PHP is executed on the server, JavaScript is downloaded by the browser and executed there.
If you’ve ever been on a page that updates its content without reloading, you’ve seen JavaScript in action. Behind the scenes, it has actually made another HTTP request.
JavaScript is also used to manipulate the DOM, the document object model, the tree of HTML that we saw earlier.
The syntax for conditions, Boolean expressions, loops, and switch statements is much the same in JavaScript as it is in PHP and C. One new type of loop exists, however:
```
for (var i in array)
{
    // do this with array[i]
}
```
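Functionally, this loop is similar to the foreach loop in PHP, except that i takes on each index (key) of the array rather than each value, which is why the body uses array[i].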
The syntax for arrays is slightly different:
```
var numbers = [4, 8, 15, 16, 23, 42];
```
Note that we don’t have the $ prefix for variables anymore. We still don’t specify a type for the variable, though.
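
Putting the loop and the array together, a short sketch shows what the loop variable actually holds. The variable names other than numbers are made up; the for...of loop at the end is included only for contrast, since it hands back values rather than indices.

```javascript
var numbers = [4, 8, 15, 16, 23, 42];

// for...in hands back each index (as a string), not each value.
var indices = [];
var total = 0;
for (var i in numbers) {
    indices.push(i);
    total += numbers[i];
}

// for...of hands back the values themselves.
var values = [];
for (var n of numbers) {
    values.push(n);
}

console.log(indices[0], typeof indices[0], total, values[0]);
```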
Another built-in data structure in JavaScript is the object:
```
var quote = {symbol: "FB", price: 49.26};
```
Objects are similar in functionality to associative arrays in PHP or structs in C.
JavaScript Object Notation, or JSON, is a very popular format these days. If you work with APIs like Facebook’s for your Final Project, you’ll be passed data in JSON. JSON is quite useful because it is self-describing: each of the fields in an object is named.
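
As a small illustration of JSON being self-describing, the sketch below converts the quote object to JSON text and back using JavaScript's built-in JSON.stringify and JSON.parse. The variable names text and parsed are made up for the example.

```javascript
var quote = {symbol: "FB", price: 49.26};

// Serialize the object to JSON text, as an API might send it.
var text = JSON.stringify(quote);

// Parse JSON text back into an object and read a field by name.
var parsed = JSON.parse(text);

console.log(text, parsed.price);
```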

`dom-0.html`

Let’s start with a "hello, world" for JavaScript:

<!DOCTYPE html>

<html>
    <head>
        <script>

            function greet()
            {
                alert('hello, ' + document.getElementById('name').value + '!');
            }

        </script>
        <title>dom-0</title>
    </head>
    <body>
        <form id="demo" onsubmit="greet(); return false;">
            <input id="name" placeholder="Name" type="text"/>
            <input type="submit"/>
        </form>
    </body>
</html>

What’s interesting here is that we can embed JavaScript directly in HTML using the <script> tag. The alert function is a quick and dirty way of displaying output via a pop-up window. Convention in JavaScript is to use single quotes for strings, by the way.
In JavaScript, there exists a special global variable named document that contains the entire tree structure of the HTML. The document object also has functions associated with it known as methods. One such method is getElementById which retrieves an HTML element with the specified id attribute.
greet is called when the form is submitted because it is passed to the onsubmit attribute of the form. After that, we have to return false because otherwise the form will actually submit and redirect to whatever its action attribute is.

`dom-2.html`

jQuery is a library that adds a great deal of convenience to writing JavaScript. Take a look at it in dom-2.html:

<!DOCTYPE html>

<html>
    <head>
        <script src="http://code.jquery.com/jquery-latest.min.js"></script>
        <script>

            $(document).ready(function() {
                $('#demo').submit(function(event) {
                    alert('hello, ' + $('#name').val() + '!');
                    event.preventDefault();
                });
            });

        </script>
        <title>dom-2</title>
    </head>
    <body>
        <form id="demo">
            <input id="name" placeholder="Name" type="text"/>
            <input type="submit"/>
        </form>
    </body>
</html>

For now we’ll wave our hands at the first line of JavaScript above. Basically, it just waits till the document has loaded before executing anything. $('#demo') is roughly equivalent to document.getElementById('demo'), except that it returns a jQuery object wrapping the element rather than the element itself.
One feature of JavaScript that we’re leveraging here is the ability to pass functions as objects to other functions. The only argument to the submit method is an anonymous function that takes in its own argument event.
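
Passing a function to another function can be tried without a browser. In the sketch below, FakeForm is a hypothetical stand-in for a form, not jQuery's real implementation: its submit method stores the function it is given, and its trigger method calls that function later with a made-up event object.

```javascript
// Hypothetical stand-in for a form: it remembers handlers and
// calls them later, much as jQuery's submit method does.
function FakeForm() {
    this.handlers = [];
}

// Register a function to be called on submission.
FakeForm.prototype.submit = function(handler) {
    this.handlers.push(handler);
};

// Simulate a submission by calling each handler with an event object.
FakeForm.prototype.trigger = function() {
    var event = {
        defaultPrevented: false,
        preventDefault: function() { this.defaultPrevented = true; }
    };
    this.handlers.forEach(function(handler) {
        handler(event);
    });
    return event;
};

var greetings = [];
var form = new FakeForm();

// The argument here is an anonymous function, just as in dom-2.html.
form.submit(function(event) {
    greetings.push('hello, world!');
    event.preventDefault();
});

var result = form.trigger();
console.log(greetings[0], result.defaultPrevented);
```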

Ajax

`ajax-2.html`

Ajax is the technology that allows us to request external data without a page refresh. Take a look at ajax-2.html:

<!--

ajax-2.html

Gets stock quote from quote.php via Ajax with jQuery, embedding result in page itself.

David J. Malan
malan@harvard.edu

-->

<!DOCTYPE html>

<html>
    <head>
        <script src="http://code.jquery.com/jquery-latest.min.js"></script>
        <script>

            /**
             * Gets a quote via JSON.
             */
            function quote()
            {
                var url = 'quote.php?symbol=' + $('#symbol').val();
                $.getJSON(url, function(data) {
                    $('#price').html(data.price);
                });
            }

        </script>
        <title>ajax-2</title>
    </head>
    <body>
        <form onsubmit="quote(); return false;">
            Symbol: <input autocomplete="off" id="symbol" type="text"/>
            <br/>
            Price: <span id="price">to be determined</span>
            <br/><br/>
            <input type="submit" value="Get Quote"/>
        </form>
    </body>
</html>

The quote function appears to be constructing a URL with a stock symbol as a parameter. If we visit quote.php?symbol=GOOG directly, we’ll get some JSON spit out to the screen that includes a stock price. In the JavaScript above, we’re asking for that JSON programmatically, then passing it to a function that inserts it into the DOM.
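
The two halves of that exchange, building the URL and reading the price out of the response, can be sketched without a browser. The helper names and the sample price below are hypothetical. Note that this sketch also passes the symbol through encodeURIComponent, which the lecture code does not do, so that unusual characters cannot break the query string.

```javascript
// Build the request URL, encoding the symbol so that characters
// like "&" or spaces cannot break the query string.
function buildQuoteUrl(symbol) {
    return 'quote.php?symbol=' + encodeURIComponent(symbol);
}

// Pull the price out of the JSON text the server sends back.
// Returns null if the text is not valid JSON or lacks a price.
function extractPrice(jsonText) {
    try {
        var data = JSON.parse(jsonText);
        return data.price === undefined ? null : data.price;
    } catch (error) {
        return null;
    }
}

console.log(buildQuoteUrl('GOOG'));
console.log(extractPrice('{"symbol": "GOOG", "price": 1000.5}'));
```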

Teaser

Check out geolocation-0.html, which seems to know where you are!