Day 4

Questions from Yesterday00:00

Suppose that a company does not want its employees visiting Facebook while at work, and so it blocks outbound Internet access to the site. Technically speaking, how might the company be implementing that restriction? How might a clever troublesome employee potentially circumvent the block?
- The company might block any DNS requests for www.facebook.com, but an employee can just visit the IP address of Facebook directly if they know it, or change their DNS server in Network Preferences to something custom, like Google’s easy-to-remember 8.8.8.8. But now Google knows every site you visit, since you’re asking it for DNS information …
- The default ones that were there already are configured by DHCP, when you first connect to a network.
- The company also just block packets to any IP address that belonged to Facebook, or even look inside a packet, since HTTP requests include a Host: field, and block packets they don’t like.
- A proxy is just some server that we might send all our traffic to, and then it forwards it along for us. Routers are proxies too. Another type of proxy is a VPN (virtual private network), which we can use to go around website blocking, since it has a completely encrypted connection. So our data, including all the headers and packets, are secure, up until it reaches our VPN, after which it is decrypted and forwarded along to its actual destination. This keeps anything in between us and our VPN from reading our traffic.
  - We can use this to avoid people in public locations from seeing our data on a public network, or firewalls in other countries that might block certain websites.
- The cost of this, though, is latency, or time between requests and responses, since we need to send our traffic to another server first.
- Tor is a similar software where all the computers running it form a network, and routes data randomly among them, so it would be extremely difficult to trace the origin, or even destination, of data on the network.
  - Not too long ago there was a bomb threat made on campus by email, but the student, though he was using Tor, was among only a few people on campus connected to the network, which made him easier to find.
When should GET be used in an HTTP request?
- Sometimes, when we submit a query to a webpage, we do want the input to be in the URL and saved, like for images of cats, so we can bookmark it or link to it more easily, instead of having to type it in every time.
When should POST be used in an HTTP request?
- POST is for data to remain private, like passwords or credit card numbers, or also when sending large files that don’t fit in the URL of a GET request.
What’s the difference between a char and a short (both have 16 bits) and similarly between int/float and long/double?
- A char has no negative sign, and is meant to store characters, while a short has a sign bit in its leftmost bit, and used to store integers that can fit in the 15 bits left.
- An int is an integer, with no decimal place, while a float is a real number which does.
- A long is an integer with more (twice as many, to be precise) bits, and a double is a float that also has twice as many bits.
- We have these data types because we want the flexibility of being able to choose between saving space and not having as many empty bits, or having larger numbers or more precision. With thousands or millions of numbers, that difference is much greater.
Is the 2011 Citibank hack related to the dangers of using GET requests when they should be structured as POST?
- David had trouble finding sources that detailed the hack, but the gist is as follows:
- The bank’s website, at the time, used to take you to a URL that looked something like this, after you logged in:
  http://www.citibank.com/balance?account=123
- But now all we would need to do is to change the URL to any account number we want, since the website didn’t check whether you were actually logged into that account, as it should have. Sometimes really simple programming errors can lead to really large security holes.
Why do companies have multiple IP addresses? For example, why did Google show ten or so IP addresses opposed to just one?
- Google has multiple servers, for redundancy and latency, and having all those IP addresses just allows for any of them to be contacted. Facebook only has one IP address, but all of their traffic might just be automatically balanced by that one server to others.
Can we showcase some of the Scratch projects? I’m curious what my classmates did! Thanks!
- We have a gallery here, if you’re interested!
I understand why the Patriot’s failure happened. However, how did they fix it? Did they change the time unit from 1/10 of second to something with a terminary binary expansion, e.g. 1/4 of second?
- David looked around and there’s not much information, but it seems that though there was a function that convert clock time to floats relatively precisely, it was only being used some of the time, so operations on imprecise and precise values created more errors than there otherwise would be.
IP addresses only correspond to domain names, or something like "www.citibank.com", or "cs50.harvard.edu".

HTML533:00

HTML5 is the fifth version of a language used to markup webpages, and we’ll manually connect to Google from the command line like last time:

% telnet www.google.com 80
Trying 173.194.219.99...
Connected to www.google.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.google.com

We get a lot of code as text, but if we visit the page in the browser, we’ll see the usual sparse text, buttons, input boxes, images, and links:
We can right-click a blank space, and click Inspect Element, to get this feature:
Today we’ll look at the Elements tab, and all the code on the left is just a pretty-printed version of the code that we saw when we ran our telnet program.
We notice that the code is now colored, by some pattern, and not just black and white and indented nicely with triangles for blocks we can expand and collapse.
The version we got earlier was all black and white with no spacing, to save on bandwidth (and therefore costs) and time, but the computer can read it just the same. The colors and indentation are just for us human.
If we start looking at the actual code, now, we see that the first word in color is something familiar, <html:
- We’ll ignore that first gray line for now, and look at the second line. The < means that we’re starting a new tag, which is just something that tells the browser that there’s something special ahead. In this case, <html> (the > is further to the right, and we can ignore everything else within for now) tells the computer that we’re starting a new HTML document.
- The next purple tags we see are <head> and </head>, which is telling the browser that whatever is in between them should go in the head of the webpage. You can think of the head as the stuff at the top, like the title, that blue "g" icon, and other stuff that’s not displayed. The next chunk, <body>, is everything that you see in the actual page.
So HTML isn’t really a programming language in that it has anything fancy like variables or loops, but rather a markup language that describes pieces of a webpage.
Let’s look at a really simple webpage:
```
 1<!DOCTYPE html>
 2
 3<html>
 4    <head>
 5        <title>hello, world</title>
 6    </head>
 7    <body>
 8        hello, world
 9    </body>
10</html>
```
- All we needed to write this was Notepad or TextEdit.
- The first line is required by the HTML standard, and just says the document type is HTML.
- After that, we notice the <html> as usual, but now at the bottom we see </html>, which is telling the browser that the page has ended.
- We’ll start calling everything within an open (start) and close (end) tag an element.
- Notice how everything inside an element is indented, and that every tag that’s opened is closed.
- <title> and </title> are on the same line because hello, world was just one line, but it makes no difference to the computer.
- Also notice how tags are closed symmetrically, as in <title> is closed before <head>, since it was opened inside <head>.
We can draw this page as a tree:
- The shapes don’t really matter, and the root element is called the document, which we can think of as everything in the file.
- Then within the document, we have the html element, and now we see how the children of that are labeled head and body, matching what’s in the source code.
- The head element has the title element, with the text hello, world inside, and since that’s actual text, we’ve made it an oval rather than a rectangle.
We’ll save this file as hello.html to the Desktop, and open it:
- Notice the title is in the top of the tab, and our text is in the body.

We can add tabs to make our text bold:

 1<!DOCTYPE html>
 2
 3<html>
 4    <head>
 5        <title>hello, world</title>
 6    </head>
 7    <body>
 8        <b>hello, world</b>
 9    </body>
10</html>

(The text is bigger because David zoomed in more in the browser, but it’s still the same actual size.)

If we remove the title, we see this:

 1<!DOCTYPE html>
 2
 3<html>
 4    <head>
 5        <title></title>
 6    </head>
 7    <body>
 8        <b>hello, world</b>
 9    </body>
10</html>

So Chrome chooses to display the name of the file, instead of nothing, but that was a decision made by the browser and not us, since we didn’t specify anything.

We can make a link, too:
```
 1<!DOCTYPE html>
 2
 3<html>
 4    <head>
 5        <title></title>
 6    </head>
 7    <body>
 8        hello, <a href="http://www.disney.com/">Disney World</a>
 9    </body>
10</html>
```
- Every time we’ve made a change, we had to save the file from TextEdit, and refresh in Chrome.
- A link has historically been called an anchor, hence the <a> tag, but we also pass in, as an attribute, the href, or hyper-reference (also an ancient word for the actual link) to the place we want the link to actually go.
- Notice the fundamental syntax of the href portion, where the attribute name is to the left of an equals sign, with the value to the right.
- If we left out the http:// part of the link, or even the ending tag of </a>, it would still work, but only because browsers automatically guess what it thinks is missing. Only if we use complete syntax can we be sure that the page will be displayed as we intended. (Well, browsers actually all display things slightly differently, and making web pages that look the same on all browsers (cross-browser compatibility) has actually been a pain for the longest time.)

What if we wanted to say something else a few lines down? Let’s try this:

 1<!DOCTYPE html>
 2
 3<html>
 4    <head>
 5        <title></title>
 6    </head>
 7    <body>
 8        hello, <a href="http://www.disney.com/">Disney World</a>
 9
10        come visit soon!!!!
11    </body>
12</html>

Hmm, not what we intended, but this makes sense because we have all those spaces and lines, and browsers will only place a maximum of one space between text.

Thankfully, we can manually add a line break:

 1<!DOCTYPE html>
 2
 3<html>
 4    <head>
 5        <title></title>
 6    </head>
 7    <body>
 8        hello, <a href="http://www.disney.com/">Disney World</a>
 9        <br/>
10        come visit soon!!!!
11    </body>
12</html>

And notice how we closed our <br>, line break, tag. This is because a line break is an element that can’t have anything inside, and is either there or not, so we use a slightly different syntax where the open and close tags are combined as you see there.

But this web page only lives on our laptop, and not online. Today, there are web-based web development environments, and that’s just a fancy way of saying there are web apps that you can code in, with all the setup and configuration done for you.
We’ll use one such service, Cloud9, where we’ve already signed up and created a workspace (the instructions for you for which are in Project 1):
We’ll press the Start editing button, and paste our code in:
- What we’ve done here is connect to one of Cloud9’s servers, somewhere in the cloud, and save our hello.html there, with the built-in text editor that we can access in our browser.
When we sign up for an account, we also get assigned a nice URL for our server, that looks something like this:
And if we go to it, we get a page that’s just a listing of our files on our server:
If we clicked hello.html, we see what we expected, and now it’s actually live on the internet:
We’ll go through a few of the examples that we’ve put up here.

First is paragraphs.html:

 1<!DOCTYPE html>
 2
 3<!--
 4
 5paragraphs.html
 6
 7David J. Malan
 8malan@harvard.edu
 9
10Displays some text.
11
12Demonstrates paragraphs.
13
14-->
15
16<html>
17    <head>
18        <title>paragraphs</title>
19    </head>
20    <body>
21        <p>
22            Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam in tincidunt augue. Duis imperdiet, justo ac iaculis rhoncus, erat elit dignissim mi, eu interdum velit sapien nec risus. Praesent ullamcorper nibh at volutpat aliquam. Nam sed aliquam risus. Nulla rutrum nunc augue, in varius lacus commodo in. Ut tincidunt nisi a convallis consequat. Fusce sed pulvinar nulla.
23        </p>
24        <p>
25            Ut tempus rutrum arcu eget condimentum. Morbi elit ipsum, gravida faucibus sodales quis, varius at mi. Suspendisse id viverra lectus. Etiam dignissim interdum felis quis faucibus. Integer et vestibulum eros, non malesuada felis. Pellentesque porttitor eleifend laoreet. Duis sit amet pellentesque nisi. Aenean ligula mauris, volutpat sed luctus in, consectetur id turpis. Phasellus mattis dui ac metus blandit volutpat. Donec lorem arcu, sollicitudin in risus a, imperdiet condimentum augue. Ut at facilisis mauris. Curabitur sagittis augue in dictum gravida. Integer sed sem sed justo tempus ultrices eu non magna. Phasellus semper eros erat, a posuere nisi auctor et. Praesent dignissim orci aliquam laoreet scelerisque.
26        </p>
27        <p>
28            Mauris eget erat arcu. Maecenas ac ante vel ipsum bibendum varius. Nunc tristique nulla eget tincidunt molestie. Morbi sed mauris eu lectus vehicula iaculis ac id lacus. Etiam sit amet magna massa. In pulvinar sapien ac mi ultrices, quis consequat nisl hendrerit. Aliquam pharetra nec sem non vehicula. In et risus leo. Ut tristique ornare nisl et lacinia.
29        </p>
30    </body>
31</html>

Turns out, on line 3, the <!-- marks a comment, which is something about the source code that isn’t displayed.
The new tag around all that text, <p>, marks paragraphs.

Turns out, to make a website mobile-friendly, we only need another line of code.
The original paragraphs.html looked like this:
The text is a bit small, so let’s open responsive.html:

"Responsive" is just a buzzword for websites that "respond" to different device sizes, and if we looked at the source code:

 1<!DOCTYPE html>
 2
 3<!--
 4
 5responsive.html
 6
 7David J. Malan
 8malan@harvard.edu
 9
10Displays some text.
11
12Demonstrates responsive layout.
13
14-->
15
16<html>
17    <head>
18        <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
19        <title>responsive</title>
20    </head>
21    <body>
22        <p>
23            Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam in tincidunt augue. Duis imperdiet, justo ac iaculis rhoncus, erat elit dignissim mi, eu interdum velit sapien nec risus. Praesent ullamcorper nibh at volutpat aliquam. Nam sed aliquam risus. Nulla rutrum nunc augue, in varius lacus commodo in. Ut tincidunt nisi a convallis consequat. Fusce sed pulvinar nulla.
24        </p>
25...

The new tag on line 18, <meta>, has an attribute called name which is viewport, kind of cryptic, but the next attribute, content, has something inside called width=device-width, initial-scale=1.0, which is just scaling the page to 100%, of the width of the device.
And all of these standard HTML values and descriptors can be found by looking in a book or on Google for what you want to do. There’s no way for anyone to know all of these tags, and most everyone looks up what they need as they need it. The important part is the syntax, with the tag name, an attribute, an =, and the values in the correct place and order.

Let’s look at open list.html:

 1<!DOCTYPE html>
 2
 3<!--
 4
 5list.html
 6
 7David J. Malan
 8malan@harvard.edu
 9
10Displays houses in the Quad.
11
12Demonstrates (unordered) lists.
13
14-->
15
16<html>
17    <head>
18        <title>list</title>
19    </head>
20    <body>
21        <ul>
22            <li>Cabot</li>
23            <li>Currier</li>
24            <li>Pforzheimer</li>
25        </ul>
26    </body>
27</html>

<ul> stands for unordered list, which means there will be bullet points for each <li>, or list item, and if we wanted a numbered list, we could use an <ol>, an ordered list.

Let’s look at image.html:

All we need to do is to upload or download a picture to our server, and refer to it in HTML like this:

 1<!DOCTYPE html>
 2
 3<!--
 4
 5image.html
 6
 7David J. Malan
 8malan@harvard.edu
 9
10This is Grumpy Cat.
11
12Demonstrates images.
13
14-->
15
16<html>
17    <head>
18        <title>image</title>
19    </head>
20    <body>
21        <!-- http://knowyourmeme.com/memes/grumpy-cat -->
22        <img alt="Grumpy Cat" src="cat.jpg"/>
23    </body>
24</html>

Notice line 21 is a comment that reminds us where the image is from, and line 22 is a new tag, <img>, that has alternative text that reads Grumpy Cat in case the image doesn’t load, and refers to a src, source, file of cat.jpg.

We can actually replace that with any image URL on the internet, and borrow one from HBS’s home page:
The cat.jpg image was in the same directory where we had image.html, src4.
We can also make tables in HTML, that we can later style to make prettier:

CSS301:10:34

So let’s look at css-0.html:
CSS stands for Cascading Style Sheets, which is used to style webpages. It’s another language that you can use inside HTML, or in a separate file that your HTML page links to.
Let’s look at the source code:
```
 1<!DOCTYPE html>
 2
 3
15
16<html>
17    <head>
18        <title>css-0</title>
19    </head>
20    <body>
21        <div style="text-align: center;">
22            <div style="font-size: 36px; font-weight: bold;">
23                John Harvard
24            </div>
25            <div style="font-size: 24px;">
26                Welcome to my home page!
27            </div>
28            <div style="font-size: 12px;">
29                Copyright &#169; John Harvard
30            </div>
31        </div>
32    </body>
33</html>
```
- There are lots of divs here, like invisible rectangles on the screen that divides a page, and each of them have a style attribute.
- The values of each of the style attributes seem to have their own structure too. It’s not just a single word, but key-value pairs, where "key" is some property, and "value" is just the value of that property.
- For example, on line 21, the property called text-align is being set to a value of center. So it seems like the text inside this particular div is going to be centered. And we only know the particular names of these properties and values by looking it up on online references.
- And inside that division, we have another one, where we’ve set the font size to 36 pixels and the font weight. The <b> tag in HTML is a little outdated, and it’s better to factor out design into CSS anyways.
- On line 29, we have something else that’s new, ©, which is an HTML entity, or just a tiny bit of code that represents the copyright symbol.

In css-1.html, we’ve separated the CSS from the HTML somewhat:

 1<!DOCTYPE html>
 2
 3<!--
 4
 5css-1.html
 6
 7David J. Malan
 8malan@harvard.edu
 9
10Implements a (simple) home page for John Harvard.
11
12Demonstrates CSS selectors.
13
14-->
15
16<html>
17    <head>
18        <style>
19
20            body
21            {
22                text-align: center;
23            }
24
25            #top
26            {
27                font-size: 36px;
28                font-weight: bold;
29            }
30
31            #middle
32            {
33                font-size: 24px;
34            }
35
36            #bottom
37            {
38                font-size: 12px;
39            }
40
41        </style>
42        <title>css-1</title>
43    </head>
44    <body>
45        <div id="top">
46            John Harvard
47        </div>
48        <div id="middle">
49            Welcome to my home page!
50        </div>
51        <div id="bottom">
52            Copyright &#169; John Harvard
53        </div>
54    </body>
55</html>

We’ve replaced the style attribute with ones called id, and those IDs are referred to again in the <head> of the page.
This allows us to change the styling more easily, rather than going through and finding the correct spots within a webpage.
The <style> tag in the <head> is where we’ll have all the styles now, and the curly braces and semicolons are again standard syntax we can look up in online references.
#top, for example, means that everything inside should be applied to elements with an id of top.

But we can do even better, with css-2.html:

 1<!DOCTYPE html>
 2
 3<!--
 4
 5css-2.html
 6
 7David J. Malan
 8malan@harvard.edu
 9
10Implements a (simple) home page for John Harvard.
11
12Demonstrates external stylesheets.
13
14-->
15
16<html>
17    <head>
18        <link href="css-2.css" rel="stylesheet"/>
19        <title>css-2</title>
20    </head>
21    <body>
22        <div id="top">
23            John Harvard
24        </div>
25        <div id="middle">
26            Welcome to my home page!
27        </div>
28        <div id="bottom">
29            Copyright &#169; John Harvard
30        </div>
31    </body>
32</html>

Now all the stuff that was once in the <style> tag has been moved to a file that looks like css-2.css. So multiple webpages on your website can share the same style just by linking to that file, without you having to copy all the styles manually.

Finally, let’s look at search-0.html:

 1<!DOCTYPE html>
 2
 3<!--
 4
 5search-0.html
 6
 7David J. Malan
 8malan@harvard.edu
 9
10Demonstrates form submission.
11
12-->
13
14<html>
15    <head>
16        <title>CS50 Search</title>
17    </head>
18    <body>
19        <h1>CS50 Search</h1>
20        <form action="https://www.google.com/search" method="get">
21            <input name="q" type="text"/>
22            <br/>
23            <input type="submit" value="CS50 Search"/>
24        </form>
25    </body>
26</html>

We can type something like cats, and then we’ll get to the Google results page for cats. (This is sort of an example of front-end versus back-end, where we’ve implemented the form, the front-end, but all the processing is still done by Google, the back-end.)
So if we try to think about how this might be implemented, the page has an input box and a submit button, and when we click it, the browser is probably making a GET request.

The URL we were taken to was this:
So it seem that we used HTML to build a link that looked like that, and if we look back at the source code, we first see an <h1> tag, or heading 1 tag, in the body, for the big bold title of CS50 Search. Then we have a <form> tag that has an action value of the first, hardcoded part of the URL, and then we have an input box for text with the name q. Then the submit button that we click will tell our browser to build the URL that we ended up at, based on what was inside the input box.
We’ll quickly upload a new file, search.gif that says "HBS Search", and get something like this with search-4.html: