Announcements and Demos

  • This week we start our foray into web programming! The fundamentals and concepts from the first 9 weeks of the course will still play a role in how you program, but you’ll find that tasks are an order of magnitude easier to accomplish now.

  • After taking a course like CS50, you may start noticing mistakes in a show like Numb3rs. First, IP addresses (at least IPv4 addresses) are of the form w.x.y.z, where w, x, y, and z are each numbers between 0 and 255. Second, the code you see in Charlie’s browser window is written in Objective-C and appears to manipulate a variable named crayon, which has nothing to do with what Amita was ostensibly programming.

From Last Time

  • Every computer on the internet has a (more or less) unique IP address. That address can be used for good, since it’s what lets computers send and receive information. It can also be used for evil, for example by an adversary who wants to track a particular computer.

  • When you visit a website, your browser (the client) communicates with another computer on the internet (the server) via HTTP. HTTP is just one of many services, alongside others like FTP and SMTP, that sit on top of the internet. A single computer can support multiple services at once, each on a different port: by convention, HTTP uses port 80, HTTPS 443, FTP 21, and SMTP 25.

  • HTTP responses are tagged with a status code like one of the following:

    • 200 - OK

    • 301 - Moved Permanently

    • 302 - Found

    • 401 - Unauthorized

    • 403 - Forbidden

    • 404 - Not Found

    • 500 - Internal Server Error

  • You’re probably familiar with 404, which you’ll see if a file has been deleted from a web server or if you’ve mistyped a URL. You can think of 500 as the analog of a segmentation fault: it means something really bad happened!
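The status codes above fall into classes by their first digit: 2xx means success, 3xx redirection, 4xx a client error, and 5xx a server error. As a sketch, here is a lookup table plus two hypothetical helpers, status_class and status_line (not part of any library). They are written in PHP, the language we meet later in this lecture. The closing ?> tag is optional at the end of a PHP file, so it’s omitted here.

```php
<?php

// Reason phrases for the status codes listed above.
$REASONS = [
    200 => "OK",
    301 => "Moved Permanently",
    302 => "Found",
    401 => "Unauthorized",
    403 => "Forbidden",
    404 => "Not Found",
    500 => "Internal Server Error",
];

// Hypothetical helper: classify a status code by its first digit.
function status_class($code)
{
    switch (intdiv($code, 100))
    {
        case 1: return "informational";
        case 2: return "success";
        case 3: return "redirection";
        case 4: return "client error";
        case 5: return "server error";
        default: return "unknown";
    }
}

// Hypothetical helper: build a status line like "HTTP/1.1 404 Not Found".
function status_line($code, $reasons)
{
    $reason = isset($reasons[$code]) ? $reasons[$code] : "";
    return trim("HTTP/1.1 $code $reason");
}

foreach ($REASONS as $code => $reason)
{
    printf("%s (%s)\n", status_line($code, $REASONS), status_class($code));
}
```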

Web Debugging Tools

  • Browsers these days have very powerful tools for debugging. In Chrome, there’s Developer Tools, in IE, there’s Developer Toolbar, and in Firefox, there’s Firebug. Chrome is installed by default on the Appliance, so we’ll take a peek at Developer Tools.

  • If you right-click on any part of a web page and select Inspect Element, the Developer Tools pane opens at the bottom of the window. The Elements tab shows a much more readable and organized version of the page’s HTML. The Network tab lets you poke around the HTTP requests that your browser makes. If we navigate to facebook.com, we’ll see that the first such HTTP request is met with a 301 response code. If we click view source next to Request Headers, we can also see the raw GET / HTTP/1.1 request that we sent.

  • Why is Facebook responding with a "Moved Permanently" status? We didn’t type www before facebook.com, so Facebook is adding it for us. They might be doing this for technical or even branding reasons.

  • But the second HTTP request is also met with a 301 response code! This time we’re being redirected to the HTTPS version of the site.

  • When our request is finally answered with some content, it’s a lot more than just one page of HTML. There are a number of scripts, stylesheets, and images that are received as well. This makes sense since web pages consist of more than just HTML.
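Under the hood, the request the Network tab displays is just text. Here is a PHP sketch of a hypothetical get_request helper that builds a minimal HTTP/1.1 GET request. A real browser sends many more headers.

```php
<?php

// Hypothetical helper: the text of a minimal HTTP/1.1 GET request.
// Real browsers also send headers like User-Agent, Accept, and Cookie.
function get_request($host, $path = "/")
{
    // each line ends in CRLF; an empty line marks the end of the headers
    return "GET $path HTTP/1.1\r\n" .
           "Host: $host\r\n" .
           "\r\n";
}

echo get_request("facebook.com");
```

A 301 response carries a Location: header that names the new URL. The browser then repeats its request there, which is how we ended up at www.facebook.com and then at the HTTPS version of the site.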

Intro to HTML and PHP

HTML

  • HTML stands for HyperText Markup Language. The word "markup" implies that it’s not a programming language, per se: for the most part you can’t (and shouldn’t) express logic in HTML. Rather, HTML uses start and end tags to tell the browser how to structure and render a web page.

  • A very simple web page we looked at last time was as follows:

    <!DOCTYPE html>
    
    <html>
        <head>
            <title>hello, world</title>
        </head>
        <body>
            hello, world
        </body>
    </html>
    
  • The first line is the doctype declaration, which tells the browser "here comes some HTML" (HTML5, specifically). Everything after that is enclosed in the <html> tag. This tag has two children, <head> and <body>. We’ll call each of these, from its open tag through its matching close tag, an HTML element. Because elements are nested inside one another, we can represent the HTML of a web page as a tree:

    A tree structure representing HTML.
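To make the tree concrete, here is a PHP sketch that writes the page above as nested arrays and prints it depth-first. Each element maps its tag name to an array of its children, and text nodes get integer keys. This representation is an assumption chosen for illustration. It can’t hold two sibling elements with the same tag name, which real parsers such as PHP’s DOMDocument handle.

```php
<?php

// The hello, world page as nested arrays: tag name => children.
$tree = [
    "html" => [
        "head" => [
            "title" => ["hello, world"],
        ],
        "body" => ["hello, world"],
    ],
];

// Walk the tree depth-first, indenting each level by two spaces.
function outline($nodes, $depth = 0)
{
    $out = "";
    foreach ($nodes as $key => $child)
    {
        if (is_string($key))
        {
            // an element: print its tag, then recurse into its children
            $out .= str_repeat("  ", $depth) . $key . "\n";
            $out .= outline($child, $depth + 1);
        }
        else
        {
            // a text node (integer key): print the text itself
            $out .= str_repeat("  ", $depth) . $child . "\n";
        }
    }
    return $out;
}

echo outline($tree);
```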

PHP

  • As with other languages, let’s start by writing a "hello, world" program in PHP:

    printf("hello, world\n");
    
  • PHP is an interpreted language, so there’s no compiler. To run this program, we pass it to the PHP interpreter, which is conveniently named php. When we do so, though, our line of code is just printed back to the screen. What went wrong? PHP uses tags (<?php and ?>) to tell the interpreter where code begins and ends. Anything outside those tags is treated as raw HTML (or plain text) and passed through unchanged. Let’s add them in:

    <?php
    
        printf("hello, world\n");
    
    ?>
    
  • If you run the command ls in your home directory on the Appliance, you’ll see a directory called vhosts. Thanks to this concept of virtual hosts, a single web server can run multiple websites. Each request a browser sends includes the domain name it wants a response from (in its Host header), so the server knows which site to answer for. Within the vhosts folder, there’s a single folder named localhost. A server hosting several websites might have one folder here per site. Within the localhost folder is a public folder where we can store our HTML source code. The Appliance has been configured such that http://localhost maps to this directory.

  • If we navigate to http://localhost/index.html in Chrome on our Appliance, we get a 403 Forbidden error. Sad trombone. We can diagnose the problem by running ls -l in our public directory:

    -rw------- 1 jharvard students 135 Nov  4 13:29 index.html
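  • The first character, -, means this is a regular file. The next nine characters are three groups of read (r), write (w), and execute (x) permissions: one group for the file’s owner, one for its group, and one for everyone else. Here only the owner’s triplet is rw-, while the group’s and everyone else’s are ---. That means only the owner, jharvard, can read (or write) this file, and the web server evidently isn’t running as jharvard, so it can’t read it either. We want everyone to be able to read it, so we run the command chmod a+r index.html, which adds (+) read permission (r) for all (a). Now when we reload the page in our browser, we see "hello, world."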
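To see how ls -l derives those letters, here is a PHP sketch built around a hypothetical permission_string helper. It converts a numeric mode into symbols: 0600 is what ls shows as rw-------. The script then applies the equivalent of chmod a+r to a temporary file using PHP’s built-in chmod and fileperms.

```php
<?php

// Hypothetical helper: turn the nine permission bits of a mode into the
// symbols ls -l shows after the file-type character.
function permission_string($mode)
{
    $bits = [0400, 0200, 0100, 040, 020, 010, 04, 02, 01];
    $symbols = "";
    foreach ($bits as $i => $bit)
    {
        // the letters cycle r, w, x for owner, then group, then others
        $symbols .= ($mode & $bit) ? "rwx"[$i % 3] : "-";
    }
    return $symbols;
}

printf("%s\n", permission_string(0600));        // rw-------
printf("%s\n", permission_string(0600 | 0444)); // rw-r--r--

// The same change on a real temporary file, mimicking chmod a+r:
$file = tempnam(sys_get_temp_dir(), "perm");
chmod($file, 0600);
clearstatcache();
chmod($file, (fileperms($file) & 0777) | 0444); // add r for user, group, others
clearstatcache();
printf("%s\n", permission_string(fileperms($file)));
unlink($file);
```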

More HTML

  • What’s our favorite thing to do on a web page? Click on links of course! To add a link, we use the <a> tag:

    <!DOCTYPE html>
    
    <html>
        <head>
            <title>hello, world</title>
        </head>
        <body>
            hello, <a href="https://www.cs50.net/">CS50</a>.
        </body>
    </html>
    
  • href is an attribute of the <a> tag. In this case, it contains the URL that we want to link to. What goes inside the element is the text we want the link to show: "CS50."

  • Consider the security implications of this. If we change the href attribute to https://www.fakecs50.net/, the link will still display CS50. This is misleading, and potentially harmful if the site the user ends up on is malicious. Tricking users this way is known as a phishing attack. Don’t fall for it!

  • Another useful tag is <div>, which has no visual styling of its own but lets us organize our HTML into sections. If we want to add styling as well, we can use the style attribute like so:

    <!DOCTYPE html>
    
    <html>
        <head>
            <title>hello, world</title>
        </head>
        <body>
            <div style="background-color: red;">
                Top of Page
            </div>
            <div>
                Bottom of Page
            </div>
        </body>
    </html>
    
  • The value of this style attribute is written in a language known as CSS, or Cascading Style Sheets, which specifies properties and their values to control what the page looks like. Instead of the English word "red," we can even specify a hexadecimal RGB value like #ff0000 (note the leading #) for background-color and achieve the same effect.

  • As you can see, it’s very easy to make very hideous web pages. To see all the attributes of a given tag and all the different CSS properties, consult an online reference like W3Schools.

  • Better than embedding CSS within our tags is factoring it out into a <style> element in the page’s <head>:

    <!DOCTYPE html>
    
    <html>
        <head>
            <style>
                #top
                {
                    background-color: #ff0000;
                }
                #bottom
                {
                    background-color: #abcdef;
                    font-size: 24pt;
                }
            </style>
            <title>hello, world</title>
        </head>
        <body>
            <div id="top">
                Top of Page
            </div>
            <div id="bottom">
                Bottom of Page
            </div>
        </body>
    </html>
    
  • Note that we can apply CSS properties to a single <div> by giving it an id attribute and then referencing that id, prefixed with #, within the <style> tag.

  • Collecting CSS in a single <style> element is good practice because it keeps presentation apart from content, which makes styles easier to find, change, and reuse. In fact, we can (and should) move the CSS into a file of its own:

    <!DOCTYPE html>
    
    <html>
        <head>
            <link href="styles.css" rel="stylesheet"/>
            <title>hello, world</title>
        </head>
        <body>
            <div id="top">
                Top of Page
            </div>
            <div id="bottom">
                Bottom of Page
            </div>
        </body>
    </html>
    
  • The <link> tag is an example of an empty tag: it has no content, so there’s no need for a separate close tag. Unless we specify a full or relative path, styles.css must be in the same folder as index.html, and, like index.html, it must be world-readable.
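Each pair of hex digits in a color like #ff0000 or #abcdef is one byte (0 to 255) of red, green, and blue. The PHP sketch below performs that decoding with a hypothetical hex_to_rgb helper. It assumes the six-digit form, not the three-digit shorthand like #f00.

```php
<?php

// Hypothetical helper: split a six-digit CSS hex color into its
// red, green, and blue components, each 0 through 255.
function hex_to_rgb($hex)
{
    $hex = ltrim($hex, "#");
    return [
        hexdec(substr($hex, 0, 2)), // red
        hexdec(substr($hex, 2, 2)), // green
        hexdec(substr($hex, 4, 2)), // blue
    ];
}

print_r(hex_to_rgb("#ff0000")); // pure red: 255, 0, 0
print_r(hex_to_rgb("#abcdef")); // the light blue used for #bottom
```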

Implementing Google

  • When you search for something on Google, the URL changes from google.com to something with a lot of information after the /. That information is actually a series of parameters that Google uses to build your search results, one of which is your search query. If you navigate to http://www.google.com/search?q=cats, you’ll notice that the search term "cats" is already filled in for you. Here q is a key meaning "query," and its value, "cats," follows the =.

  • Let’s implement Google! First, we need an input box for a user’s query:

    <!DOCTYPE html>
    
    <html>
        <head>
            <title>CS50 Search</title>
        </head>
        <body>
            <h1>CS50 Search</h1>
            <form>
                <input type="text"/>
                <input type="submit"/>
            </form>
        </body>
    </html>
    
  • With this <form> tag and a few <input> tags, we have a very simple search engine…that doesn’t do anything. We need to specify an action for the <form> tag:

    <!DOCTYPE html>
    
    <html>
        <head>
            <title>CS50 Search</title>
        </head>
        <body>
            <h1>CS50 Search</h1>
            <form action="https://www.google.com/search" method="get">
                <input name="q" type="text"/>
                <br/>
                <input type="submit" value="CS50 Search"/>
            </form>
        </body>
    </html>
    
  • Now we’re telling the form to submit its information directly to Google using the GET method. There are two methods for submitting form information, GET and POST. For now, just know that with GET the information is appended to the URL as a query string (after a ?).

  • We’ve also added a name attribute to the text input to match the URL parameter we saw that Google was using. We changed the text that the submit button displays to "CS50 Search" using its value attribute. Finally, we added a line break using the <br/> tag between the two inputs.

  • When we type "cats" and click "CS50 Search," we end up on http://www.google.com/search?q=cats! We’ve implemented Google!
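The PHP sketch below reproduces what the browser did when we submitted the form. build_get_url is a hypothetical helper, while http_build_query, parse_url, and parse_str are PHP built-ins. The reverse direction is roughly how a server recovers the parameters, which PHP does for us in $_GET.

```php
<?php

// Hypothetical helper mimicking a GET form: append the named fields to the
// form's action URL as a query string (spaces are encoded as +).
function build_get_url($action, $fields)
{
    return $action . "?" . http_build_query($fields);
}

// What submitting "cats" to our form produces
$url = build_get_url("https://www.google.com/search", ["q" => "cats"]);
printf("%s\n", $url); // https://www.google.com/search?q=cats

// ...and the reverse: recovering the parameters from the URL
$query = parse_url($url, PHP_URL_QUERY); // "q=cats"
parse_str($query, $params);              // ["q" => "cats"]
printf("%s\n", $params["q"]);            // cats

// Multi-word queries are encoded too
printf("%s\n", build_get_url("https://www.google.com/search", ["q" => "cute cats"]));
// https://www.google.com/search?q=cute+cats
```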

Frosh IMs

froshims-0.php

  • Back at the turn of the 19th century when David was a freshman at Harvard, the process of registering for intramural sports was painfully manual. You had to fill out a paper form and actually drop it off at the dorm room of the proctor in charge. David decided to change all that by implementing an online registration form. Although his original implementation was in Perl, we can recreate it in HTML and PHP:

    <?php
    
        /**
         * froshims-0.php
         *
         * David J. Malan
         * malan@harvard.edu
         *
         * Implements a registration form for Frosh IMs.
         * Submits to register-0.php.
         */
    
    ?>
    
    <!DOCTYPE html>
    
    <html>
        <head>
            <title>Frosh IMs</title>
        </head>
        <body style="text-align: center;">
            <h1>Register for Frosh IMs</h1>
            <form action="register-0.php" method="post">
                Name: <input name="name" type="text"/>
                <br/>
                <input name="captain" type="checkbox"/> Captain?
                <br/>
                <input name="gender" type="radio" value="F"/> Female
                <input name="gender" type="radio" value="M"/> Male
                <br/>
                Dorm:
                <select name="dorm">
                    <option value=""></option>
                    <option value="Apley Court">Apley Court</option>
                    <option value="Canaday">Canaday</option>
                    <option value="Grays">Grays</option>
                    <option value="Greenough">Greenough</option>
                    <option value="Hollis">Hollis</option>
                    <option value="Holworthy">Holworthy</option>
                    <option value="Hurlbut">Hurlbut</option>
                    <option value="Lionel">Lionel</option>
                    <option value="Matthews">Matthews</option>
                    <option value="Mower">Mower</option>
                    <option value="Pennypacker">Pennypacker</option>
                    <option value="Stoughton">Stoughton</option>
                    <option value="Straus">Straus</option>
                    <option value="Thayer">Thayer</option>
                    <option value="Weld">Weld</option>
                    <option value="Wigglesworth">Wigglesworth</option>
                </select>
                <br/>
                <input type="submit" value="Register"/>
            </form>
        </body>
    </html>
    
  • As before, we have <head> and <body> tags. Within the <body>, there’s a <form> whose action attribute is register-0.php. We see <input> tags with type set to "text," "checkbox," and "radio." Text and checkbox should be self-explanatory. Radio refers to round buttons that share a name, from which the user can choose only one. To create a dropdown menu, we use the <select> tag with <option> tags within it. Finally, we have our submit button, whose value attribute makes it display "Register."

  • When we fill in this form and click "Register," we’re taken to register-0.php, the URL specified in the form’s action attribute. Unlike with our CS50 Search example, this URL doesn’t have any of our inputs embedded in it. That’s because we used the POST method of sending data rather than the GET method.

  • register-0.php does nothing more than print out our inputs as an associative array. Whereas we only worked with numerically indexed arrays in C, PHP arrays can also use strings as keys. An associative array is really just a hash table! Because POST sends data in the body of the request rather than in the URL, it’s useful for submitting passwords, credit card numbers, and anything else that’s sensitive. That data won’t show up in the address bar, in your browser history, or in server logs of requested URLs, though only HTTPS actually encrypts it. POST is also useful for sending data that’s too large to embed in a URL, for example an uploaded photo.
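To make "in the body" concrete, here is a PHP sketch of roughly what a browser sends when this form is submitted via POST. post_request is a hypothetical helper, and the path /register-0.php and the field values are assumptions. The request carries the same name=value pairs a GET form would put in the URL, but they appear after the headers.

```php
<?php

// Hypothetical helper: the text of a minimal form-submitting POST request.
// Real browsers send more headers than this.
function post_request($host, $path, $fields)
{
    // same encoding as a GET query string, but destined for the body
    $body = http_build_query($fields);
    return "POST $path HTTP/1.1\r\n" .
           "Host: $host\r\n" .
           "Content-Type: application/x-www-form-urlencoded\r\n" .
           "Content-Length: " . strlen($body) . "\r\n" .
           "\r\n" .
           $body;
}

// A checked checkbox without a value attribute is submitted as "on".
echo post_request("localhost", "/register-0.php", [
    "name" => "David",
    "captain" => "on",
    "gender" => "M",
    "dorm" => "Apley Court",
]);
```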

conditions-1.php

  • To get a feel for this new language, let’s take a look at how we would implement conditions-1.c in PHP:

    <?php
    
        /**
         * conditions-1.php
         *
         * David J. Malan
         * malan@harvard.edu
         *
         * Tells user if his or her input is positive, zero, or negative.
         *
         * Demonstrates use of if-else construct.
         */
    
        // ask user for an integer
        $n = readline("I'd like an integer please: ");
    
        // analyze user's input
        if ($n > 0)
        {
            printf("You picked a positive number!\n");
        }
        else if ($n == 0)
        {
            printf("You picked zero!\n");
        }
        else
        {
            printf("You picked a negative number!\n");
        }
    
    ?>
    
  • The syntax of PHP is actually quite similar to that of C. Variable names in PHP are prefixed with a $. Variables also don’t need to be declared with explicit types, because PHP is a loosely typed language: depending on context, PHP implicitly converts values from one type to another. readline, which prompts the user and returns their input as a string, is a new function, but the if-else construct is identical to C’s.
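readline returns a string, yet $n > 0 still works because PHP converts ("juggles") types as needed. The one-liners below illustrate that behavior, and var_dump shows each result’s type and value.

```php
<?php

var_dump("42" > 0);    // bool(true): a numeric string compares as a number
var_dump("10" + 5);    // int(15): + converts numeric strings to numbers
var_dump("10" . 5);    // string(3) "105": . converts numbers to strings
var_dump("10" == 10);  // bool(true): loose comparison juggles types
var_dump("10" === 10); // bool(false): strict comparison also checks type
```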

register-0.php

  • register-0.php is a quick example of commingling PHP and HTML:

    <!DOCTYPE html>
    
    <html>
        <head>
            <title>Frosh IMs</title>
        </head>
        <body>
            <pre>
                <?php print_r($_POST); ?>
            </pre>
        </body>
    </html>
    
  • Within the <pre> HTML tags, we enter PHP mode by inserting <?php and ?>. Once we’re in PHP mode, we access a variable named $_POST. This is an associative array that PHP constructs for you whenever data is submitted via the POST method. If we had used the GET method, the data would be available in the $_GET variable instead. print_r prints a human-readable representation of a variable, recursing into anything nested within it. When we pass the $_POST variable to print_r, we see the inputs that the user provided, each with a key that matches the name attribute of its input. There are up to four, since an unchecked checkbox isn’t submitted at all. $_POST and $_GET are known as superglobal variables because they’re available in every scope, even inside functions, without any declaration.
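Here is a sketch that simulates one possible $_POST after the form is submitted with Captain? unchecked. The values are made up, and a local $post stands in for the real superglobal so the script runs from the command line. Passing true as print_r’s second argument returns the text instead of printing it, and isset guards against the missing key.

```php
<?php

// Made-up submission: "captain" is absent because the box was unchecked.
$post = ["name" => "David", "gender" => "M", "dorm" => "Weld"];

// print_r's second argument, true, returns the dump as a string
$dump = print_r($post, true);
echo $dump;

// The key may be missing, so test for it rather than reading it blindly.
$captain = isset($post["captain"]) ? "yes" : "no";
echo "Captain: $captain\n"; // Captain: no
```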

register-3.php

  • In register-3.php, we take the extra step of actually e-mailing the user’s information:

    <?php
    
        /**
         * register-3.php
         *
         * Computer Science 50
         * David J. Malan
         *
         * Implements a registration form for Frosh IMs.  Reports registration
         * via email.  Redirects user to froshims-3.php upon error.
         */
    
        // require PHPMailer
        require("PHPMailer/class.phpmailer.php");
    
        // validate submission
        if (!empty($_POST["name"]) && !empty($_POST["gender"]) && !empty($_POST["dorm"]))
        {
            // instantiate mailer
            $mail = new PHPMailer();
    
            // use SMTP
            $mail->IsSMTP();
            $mail->Host = "smtp.fas.harvard.edu";
    
            // set From:
            $mail->SetFrom("jharvard@cs50.net");
    
            // set To:
            $mail->AddAddress("jharvard@cs50.net");
    
            // set Subject:
            $mail->Subject = "registration";
    
            // set body (an unchecked checkbox isn't submitted, so check for it)
            $mail->Body =
                "This person just registered:\n\n" .
                "Name: " . $_POST["name"] . "\n" .
                "Captain: " . (isset($_POST["captain"]) ? "yes" : "no") . "\n" .
                "Gender: " . $_POST["gender"] . "\n" .
                "Dorm: " . $_POST["dorm"];
    
            // send mail, printing PHPMailer's error message if it fails
            if (!$mail->Send())
            {
                die($mail->ErrorInfo);
            }
        }
        else
        {
            header("Location: http://localhost/src9m/froshims/froshims-3.php");
            exit;
        }
    ?>
    
    <!DOCTYPE html>
    
    <html>
        <head>
            <title>Frosh IMs</title>
        </head>
        <body>
            You are registered!  (Really.)
        </body>
    </html>
    
  • require is PHP’s analog of the #include directive in C: it pulls in the PHPMailer library’s code. Technically, require is a language construct rather than a function. First, we use the appropriately named empty to check that the required inputs (name, gender, and dorm) aren’t empty. If any of them is, we call header to send a Location header that redirects the user back to the form, and then we exit. Otherwise, we use a library called PHPMailer to create and send an e-mail. To use this library, we create a new object of type PHPMailer named $mail, and we call the IsSMTP method of that object by writing $mail->IsSMTP(). We set the mail server to be smtp.fas.harvard.edu by assigning to the Host property. We then call the SetFrom and AddAddress methods to set the from and to addresses, and assign the Subject and Body properties. We know the names of these methods and properties simply by reading the documentation for PHPMailer. To construct a body for our message, we use the dot operator (.) to concatenate the user’s inputs into one long string. Finally, we call the Send method and voilà, we have just registered a user for freshman intramurals!

  • One interesting implication of this is that it’s pretty easy to send e-mail from any address to any address, since basic SMTP doesn’t verify the From: address. Be wary!
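The validation step in register-3.php could be factored into a reusable function, as the sketch below shows. is_complete is a hypothetical name, and the required keys match the form. One subtlety worth knowing: empty is true not only for a missing key or "", but also for the string "0".

```php
<?php

// Hypothetical helper: true only if every required field is non-empty.
function is_complete($post)
{
    foreach (["name", "gender", "dorm"] as $key)
    {
        // empty() doesn't warn about missing keys; it treats them as empty
        if (empty($post[$key]))
        {
            return false;
        }
    }
    return true;
}

var_dump(is_complete(["name" => "David", "gender" => "M", "dorm" => "Weld"])); // bool(true)
var_dump(is_complete(["name" => "David", "gender" => "M"]));                   // bool(false)
```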

Crash Course in PHP

  • PHP is a programming language which thankfully is quite accessible since its syntax is very similar to that of C.

  • There’s no main function in PHP: the interpreter simply executes a file from top to bottom.

  • Conditions, Boolean expressions, switches, and loops all have the same syntax in PHP as they do in C. Switches in PHP can additionally use strings as case values.

  • In PHP, variable names begin with $. To declare an array of numbers, you can write the following:

    $numbers = [4, 8, 15, 16, 23, 42];
    
  • Interestingly, there’s no explicit size or type associated with the variable $numbers. C is a so-called statically typed language in that it requires you to declare an explicit type for every variable. PHP, by contrast, is dynamically and loosely typed. You don’t need to tell PHP in advance what type of data you’re going to store in a variable, and PHP converts between types as needed. Consequently, a function can also return values of different types.

  • One extra loop type available in PHP is the foreach loop:

    foreach ($numbers as $number)
    {
        // do this with $number
    }
    
  • PHP arrays can also be associative, meaning their keys can be strings rather than just integers:

    $quote = ["symbol" => "FB", "price" => "49.26"];
    
  • Under the hood, PHP arrays are implemented as (ordered) hash tables. Performance-wise, that makes them slower to access and more memory-hungry than the plain arrays we worked with in C, but they’re very convenient to use!

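  • So far, we’ve used the superglobal variables $_GET and $_POST, which store input from forms. In addition to these superglobal variables, there are also the following:

    • $_SERVER, information about the current request and the server environment (e.g., $_SERVER["REQUEST_METHOD"])

    • $_COOKIE, cookies sent by the browser

    • $_SESSION, data saved across requests for the current user, once session_start() has been called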
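The short PHP script below touches several of the points above. It loops with foreach over both kinds of array, switches on a string, and calls a function, strpos, that returns either an int or false. The describe helper is hypothetical.

```php
<?php

// 1. foreach over a numerically indexed array (no index or length needed)
$numbers = [4, 8, 15, 16, 23, 42];
$sum = 0;
foreach ($numbers as $number)
{
    $sum += $number;
}
printf("sum: %d\n", $sum); // sum: 108

// 2. foreach over an associative array, exposing each key too
$quote = ["symbol" => "FB", "price" => "49.26"];
foreach ($quote as $key => $value)
{
    printf("%s: %s\n", $key, $value);
}

// 3. switch on a string (PHP compares case labels with loose ==)
function describe($symbol)
{
    switch ($symbol)
    {
        case "FB":
            return "Facebook";
        default:
            return "unknown symbol";
    }
}
printf("%s\n", describe($quote["symbol"])); // Facebook

// 4. one function, two return types: strpos returns an int position,
//    or false if the needle is missing. Compare with === false, since
//    position 0 would otherwise be mistaken for "not found".
$haystack = "hello, world";
$position = strpos($haystack, "hello");
if ($position === false)
{
    printf("not found\n");
}
else
{
    printf("found at %d\n", $position); // found at 0
}
var_dump(strpos($haystack, "cats")); // bool(false)
```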

Model-view-controller

  • As you begin to design web applications, you’ll want to think about how to organize your code. One paradigm for organizing code is called Model-view-controller (MVC). The View encapsulates the aesthetics of the website. The Model handles interactions with the database. The Controller handles user requests, passing data to and from the Model and View as needed.

  • Let’s try to create a course website for CS50 using the MVC paradigm. In version 0, the pages are well organized into separate directories, but there are a lot of files with very similar code.

  • To see how we might abstract away some of the logic, let’s jump ahead to version 5:

    <?php require("../includes/helpers.php"); ?>
    
    <?php render("header", ["title" => "CS50"]); ?>
    
    <ul>
        <li><a href="lectures.php">Lectures</a></li>
        <li><a href="http://cdn.cs50.net/2013/fall/lectures/0/w/syllabus/syllabus.html">Syllabus</a></li>
    </ul>
    
    <?php render("footer"); ?>
    
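  • Now the header and footer are generated by the function render, presumably defined in helpers.php. render takes the name of what to render ("header" or "footer") and, optionally, an associative array of values, such as the page’s title, to fill in. This is better design because we can change the header and footer of every page on our site by changing the code in just one place.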
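The notes don’t show render itself, so here is a minimal sketch of how such a function might work. It assumes templates are PHP files in a directory of our choosing, and the constant TEMPLATES_DIR is hypothetical. The sketch uses PHP’s extract to turn the array of values into variables, then require to run the template.

```php
<?php

// Hypothetical template location; the real helpers.php may differ.
define("TEMPLATES_DIR", sys_get_temp_dir() . "/cs50-templates");

/**
 * Outputs TEMPLATES_DIR/$template.php, exposing each key in $values to the
 * template as a variable of the same name (e.g., "title" => $title).
 */
function render($template, $values = [])
{
    $path = TEMPLATES_DIR . "/" . $template . ".php";
    if (!is_file($path))
    {
        throw new InvalidArgumentException("no such template: $path");
    }

    // ["title" => "CS50"] becomes $title = "CS50"; EXTR_SKIP refuses to
    // overwrite render's own variables ($template, $values, $path)
    extract($values, EXTR_SKIP);

    // the template runs in this function's scope, so it can use $title
    require($path);
}

// Create two tiny templates so the sketch runs on its own.
if (!is_dir(TEMPLATES_DIR))
{
    mkdir(TEMPLATES_DIR);
}
file_put_contents(TEMPLATES_DIR . "/header.php",
    "<!DOCTYPE html>\n<html>\n    <head>\n        <title><?= htmlspecialchars(\$title) ?></title>\n    </head>\n    <body>\n");
file_put_contents(TEMPLATES_DIR . "/footer.php", "    </body>\n</html>\n");

// Usage, as in version 5 of the course site:
render("header", ["title" => "CS50"]);
render("footer");
```

Each template is an ordinary PHP file, so redesigning every page’s header means editing header.php alone.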

Teaser

  • Soon we’ll dive into the world of databases and even implement our own e-trading website!