DAVID MALAN: We're back, and today we'll end with a little bit of web development. Based on everyone's experience, it sounds like some folks have a little bit of experience with this. So we'll try to fill in some gaps and go in any number of directions that you might like. But ultimately we'll give you a sense of exactly the next layering on top of infrastructure-as-a-service and platform-as-a-service, and now software-as-a-service. So a very common thing for engineers to use is an IDE, an integrated development environment. This is a piece of software with which they write code. Now, technically speaking, as we'll see tomorrow, you don't need anything more than Notepad or TextEdit to actually write code, because most every coding language these days is text-based. So all you need generally is a program with which to write it and then a program with which to run it. Cloud9, and in turn CS50 IDE, which is the web-based tool we use in the class I teach during the year, is a web-based programming environment that gives us all of the requisite tools that we might need for any number of languages. It also gives us a built-in web server, it will give us a built-in database server, although we won't use that today, and it gives us the ability to actually write code, all within the confines of a web browser. The alternative to this would be for me to write a whole bunch of instructions for how everyone here can install their own web server, whether it's Apache or IIS, on their own Mac or PC. And then we're going to run into different issues of Windows and Mac OS and the whole litany of headaches that might happen. With the web, now, we're just already up and running and ready to go. So what you'll see before you is a screen that looks like this. I've maximized mine, so you might just see a few more icons and buttons around the way. If you don't see this, call me over in just a moment. 
And let me orient you to what's going on here. So in the top left-hand corner is your so-called workspace or file browser. So as we start to create files, just like in Windows and Mac OS, they'll start to appear there on the left. In the top right, you'll see a window where we can actually write code ultimately. And I noticed some of you might have clicked and dragged and pulled something like this up; notice that the workspace windows can be moved like in common software. But go ahead, and in the window above, see this little plus up here? Go ahead and click that. And you'll see at least two options, New File and New Terminal. Click New File and you'll get a little tab, like a typical editing program. And that's where we're going to write ultimately some code, but more on that in just a moment. In the bottom right-hand corner is where you ran update50 a moment ago. update50 is a command that we wrote for the course I teach, and it automatically updates students' workspaces to have the very latest versions of software. But even though this thing is positioned as CS50 IDE, at the end of the day, this is designed to be and is actually representative of a real-world development environment, complete with the ability to browse your file system or files and directories, complete with the ability to write code in multiple tabs, and complete with the ability to run your own server. Indeed, this blue window here is what's called a terminal window. And it's giving you command-line access-- text-based access to the underlying operating system, which in this case is Linux, specifically Ubuntu 14.04, where 14.04 is the version number. Ubuntu is a very popular distribution of the free operating system called Linux. Moreover, in this environment, you have what are called superuser privileges. So you can write commands like sudo, which is substitute user do, and actually run commands as though you're the administrator. We won't do that and try not to do that, lest we break things right now. 
But what's powerful about this is that it gives you, and in turn students more generally, the ability to do anything they want in a nice sort of sandbox environment that's nonetheless representative. Now you wouldn't typically use Cloud9 in this context with our free accounts to run your web server, because the way they achieve free accounts is that the service, when you're not using it, automatically turns off your container. What you have access to, here, is a container. So you couldn't run a website 24/7 on it, but you could if you actually paid for one of the monthly plans. Thank you. So what are we now going to do with this? So we're not going to type this up here. Let me go ahead and propose the following. Go ahead and first do this. Back on your own computer's desktop, whether it's Mac OS or Windows, go ahead and open up TextEdit if you're on a Mac or open up Notepad.exe if you're on Windows. If you're not sure how to find those, just call me over or ask someone near you. These are just the simplest text editing programs on both operating systems. And they're simple insofar as they really don't do all that much. Now just as an aside, odds are if you're using TextEdit on a Mac, it's actually not as simple as would be ideal. Odds are by default, you are seeing a window that looks like this with a whole font thing up top. This is bad; this is not going to create a simple text file for us. This is going to create RTF, or Rich Text Format, which is actually formatted text. So Mac users only, if you wouldn't mind, go to TextEdit, Preferences and then change the default format from Rich Text to Plain Text. Otherwise you'll be saving the file in the wrong file format. And we won't use this for very long; it's meant to just be demonstrative of the relative simplicity with which we can start writing web pages. So what is a web page? A web page is written in a language called HTML, HyperText Markup Language. This is not a programming language. 
As you'll see tomorrow, you can't use HTML to make the computer do something. You can only use HTML to make the computer show something, really. And we'll see what distinguishes the two by tomorrow. And HTML is this markup language that works essentially as follows. This is perhaps the simplest web page that you could write. And by that I mean it has only the minimally required elements for a page. So this is a page that apparently is just going to tell the user hello, world. But how does it do that? Well, up top, there is just this declaration called the document type declaration that frankly you just kind of have to copy and paste. It's anomalous; it doesn't look like anything else exactly. So you just copy and paste this to the start of your document. And this signifies to the browser, which is going to read this file top to bottom, left to right: hey browser, this is an HTML page written in version 5 of the language. That's just the symbol for 5; it's not as intuitive as would be ideal. But thereafter, what's nice about HTML is that everything follows a pattern. So you'll notice, and I'll point to this one as it's closer, the parallelism between open bracket, this is the angle bracket, HTML, close bracket, and then notice the opposite of it, so to speak, here at the bottom. And by opposite I mean if this is an open tag or a start tag, as we'll call it, this would be the close tag or the end tag here. It's different only insofar as there is this forward slash in front of the same word HTML. Now, up here is a division of the page into two sections, the head of the page and the body of the page. These are the only two sections inside of a web page. The head is really just the top menu bar space and the body is 99% of the page, the so-called viewport, the big rectangular region where actual information is presented. The head of the page, open bracket head, close bracket head, here, for now contains only one child element, a title element. 
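The slide being described is not part of the transcript, so here is a minimal sketch of the page as the speaker walks through it: a document type declaration, an html element, and a head and body inside it. The exact indentation and the lowercase spelling of the title text are assumptions.

```html
<!DOCTYPE html>
<!-- The line above is the document type declaration: it tells the browser this is HTML5. -->
<html>
    <!-- The head holds information about the page; here, only its title. -->
    <head>
        <title>hello, world</title>
    </head>
    <!-- The body is what appears in the big rectangular viewport. -->
    <body>
        hello, world
    </body>
</html>
<!-- Each start tag, like <html>, is matched by an end tag with a forward slash, like </html>. -->
```

Saved as plain text with a name ending in .html and opened in a browser, this should show "hello, world" both in the tab and in the page itself.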
And you might have guessed, this is how I specify what the title of my page should be in the menu bar or in the tab that you see on the page. Meanwhile, the body of the page is, again, the big rectangular region with which we're all familiar. The only words there are going to be, apparently, hello, world. So notice that this is kind of like a tree structure, if you will. It's kind of like a family tree, whereby if the root, or the patriarch or matriarch of the family, is this HTML element, everything else kind of descends from that. And we can see that. Can we see that? No, can't see that here. We can see that if I draw it. We can see that we have an HTML element that I'll draw as the root of this family tree. It has two children, so to speak. So it has the head on the left, because it came first. And the body on the right, because it came second. Then the head, meanwhile, has a title child. And then the title child has literally the words hello comma world. The body, now, has how many children? Seems like just one. So I'll draw it in quotes because it's different. Hello comma world. So I offer this just as kind of a different way of thinking about things. There's this whole hierarchy or tree structure to a web page. And that's why we have the nice pretty indentation, and why every tag that's opened is symmetrically closed. But notice the order: the first tag to be opened was HTML, ergo the last tag to be closed should be HTML. There should be this symmetry built in. So what does this do for us? I'm going to go ahead and cut a corner and just copy this for a moment. And I'm going to go into TextEdit on my Mac. And I'll slow down in just a moment. But let me just show you the revelation here. Let me go ahead and save this. And on my Mac, I'm going to arbitrarily call it hello.HTML, where dot HTML is just a convention that humans use for web pages typically. And I'm going to click Save. 
Mac OS is going to be minorly annoying and yell at me because it wants me to save it as a text file because it's text. But no, I know better, it's HTML. And now let me go to my actual desktop, where I see this file here. Today's slides, a screenshot, and now hello.HTML. And if I double-click on this, I'm on the internet, it would seem. But not quite. Well, let's see. Let me zoom in, and what did I mean by the head of the page? Well, notice where the title is: it's up there in the tab. The body of the page happens to be identically worded, but that's the body of the page. But I'm not really on the internet yet. I have not made a website. I've made a web page, but it's not on the web. Where is it? It's on my desktop. So if I told you all-- hey everyone, go to File colon slash slash slash users jharvard desktop, none of you are going to be able to see that because it's only on my computer. Now I could run a web server on my Mac or on your PC, and then we could, via its private IP address, share the files here internally. But then, frankly, firewalls on my computer or yours might get in the way. So it's just generally annoying. So ironically, going way outside the bounds of Harvard and using the cloud, Cloud9 in this case, or CS50 IDE, which is just our customisation thereof, allows us to actually do everything publicly instead. So let me go ahead and pause here. And just as a proof of concept and to get everyone on the same page, go ahead and in Notepad or in TextEdit, on a PC or Mac, respectively, go ahead and just copy and paste that example or whip it up yourself or some variant thereof. Save it and double-click it and make sure it's working for you. If you haven't already, just do File, Save. And save it there, too, as hello.HTML or some such file name. So even if you've never used an IDE before, you should find it fairly similar to most any editing program. 
And then there is one step that we'll need to do together to get this to work. Stop me if I'm going too fast, but at this point in the story, if you're following along, you should have hello.HTML in a tab, saved as such. And on the top left, now, you should see hello.HTML having suddenly appeared in your file browser. So even though we are sort of paradoxically on the web right now and using a web-based app, the web server is not running-- our web server. Cloud9's web server is running, but not our personal web server. So to make this work, in the little terminal window, the blue window down below, go ahead and run a command; it's a slightly custom command called apache50. Recall that Apache is the very popular open-source software, for Linux especially, that runs many websites out there. In fact, it's still, I think, the most popular web server out there. Run apache50, which is our own customisation of it so that it's easier to start. And do apache50 space start space period. And what this command does is it says, hey Apache, hey web server, start yourself using the current directory as the root of the web server. Dot means the current directory. And that's important because the hello.HTML file we just created is in this folder. And insofar as that's the file we want to serve up to the internet, we need to go ahead and start the web server in this folder. So if you hit Enter, you should hopefully see a success message followed by a URL, which you can hover over. And then click and Open. And you should see something that looks like my screen here. So these are the contents of the directory in which we started our web server. And so now, if you click on hello.HTML, you should see an error message. So what's-- yes? No? You see your file? OK. Let me-- does anyone else see forbidden? You see forbidden. All right. So let me do this. Let me go back here. Oh, I know what that-- OK. 
That's curious that you didn't see this, but let's go ahead and do this. Go ahead and do-- in your terminal window, the blue window, chmod, for change mode, then a plus r, for all plus read. So let everyone read this file, which is by design since we want it to be on the internet. And then hello.HTML. So again, we're mixing a graphical interface, which is the tabs and the menu options and the things with which we're all familiar and comfortable, with a command-line interface, which is purely text-based, which is older school but more powerful and versatile and still very much in vogue for software development. Nothing should happen when you run that command unless you see an error message, in which case it's probably a typo. But then if you go back to that other window and click Reload now, you should see it, albeit somewhat small. OK. And what we've done, to be clear, is we've given read permission to all, to everyone in the world. And that now allows our file to be publicly accessible on the internet. So incredibly underwhelming. So let's now actually do something that's a bit more interesting in the following ways. But before I forge ahead, I know we have a few folks who just need to catch up. But just call me over during the next lull if you'd like. Any questions? So that we don't lose folks. Yeah? You are. OK. Where did this get you? STUDENT: [INAUDIBLE] DAVID MALAN: OK. So click Private. And then CS50 Harvard University. And then Create Workspace, the green button down below. And let me come back in a moment again; this will take a moment to create. All right. So this is all very underwhelming, but let's do a couple of different things here. Let me go ahead and go into my web page-- the editor, here. And suppose I want to do the following: let me grab some sample Latin text for just a moment. I'll get myself five paragraphs-- just three paragraphs is fine. So this is sort of nonsensical text and I'm going to go ahead and paste it into my page like this. 
And you can see it's not wrapping because it's quite long. Can I wrap that? I won't worry about that for now. OK. So I have three long paragraphs. And I'm going to go ahead and reload my page. And it looks like one bigger paragraph. What is going on? Well, it turns out HTML is a markup language. And it's only going to do what you tell it to do. And what I have not told it to do is to give me line breaks or paragraph breaks. So if I actually want these three things to be paragraphs, it turns out in HTML there's a Paragraph tag, or p tag to be succinct. And if I open the Paragraph tag like this, here. And I then close it, here. And then I open it again, here. Notice that the editor is actually trying to be a little too helpful. The moment it notices that I have opened a tag, so to speak, it tries to be helpful by closing it for me. But of course, I need to now move that to be actually below the text so that I keep everything nicely hierarchical as intended. So now it's gotten more verbose, but HTML is kind of like, if you remember many, many years ago, WordPerfect: before there were WYSIWYG editors, What You See Is What You Get, with a way to make things bold and italics, you would actually have to be emphatic and say make this bold, make this underlined, make this italics. So now that I've done that, if I save and reload, it's still pretty ugly, but at least now I have the semblance of three actual paragraphs. So let me go ahead and rewind for just a moment; that's all fine and good. What about something like a list of my three favorite things? So I'm going to give myself an ordered list with ol. So you can tell the authors of HTML tried to be very succinct, if cryptic sometimes. Then I want a list item, li, and I will say my favorite things are-- I don't know, like movies, and then li TV, and then how about books, number three. So if I save that, how do you think this is going to render, based on other things you've seen on the internet already? 
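As a rough sketch of the two steps just described, the page below marks up paragraphs with p tags and then lists three favorite things with ol and li. The placeholder Latin sentences are an assumption standing in for the generated sample text, and the talk replaces the paragraphs with the list rather than showing both together as done here.

```html
<!DOCTYPE html>
<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        <!-- Without p tags, the browser would run these together as one block of text. -->
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
        <p>Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
        <p>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.</p>

        <!-- ol is an ordered list; each li is one list item inside it. -->
        <ol>
            <li>movies</li>
            <li>TV</li>
            <li>books</li>
        </ol>
    </body>
</html>
```

Note that each closing tag sits after the text it wraps, which keeps the elements properly nested.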
STUDENT: [INAUDIBLE] DAVID MALAN: Yeah, a numerical list of things; let me reload. And indeed I get this automatically. Still pretty simple, but it's sort of adding the logical structure of numbering for me. If I didn't like that and I actually love reading equally with everything else, I can change this to an unordered list, ul, and reload, and now I get the familiar bulleted list. And there are other ways to stylize this that we'll come back to in just a moment. This, of course, is not really what the web is great for-- static text. The whole point of the web and HyperText Markup Language was to have hypertext, text that links elsewhere. So why don't I greet the user with my favorite website is, say, a href-- and what am I doing here? So notice the following. Actually, let me do this. I got ahead of myself-- www.harvard.edu. So all of us have been very acclimated by websites-- any time you type a URL in certain websites, it automatically becomes clickable. And this is a feature of modern browsers, or really of modern websites using a language called JavaScript, more on that tomorrow. But this, of course, is not a link. Like nothing is actually happening here. I need to, again, be emphatic and tell the browser. So I want to say a for anchor, which means link. href, which means hypertext reference, or what is the URL I want to link to. And now I want to go to harvard.edu. Close quote, close bracket, and now Harvard's website. So notice here, there's some fundamentally new syntax. The open bracket a says give me a link; href modifies the behavior of this tag. So it turns out that whereas a is a tag, href is an attribute. And an attribute just modifies the behavior of that tag. And in this case, and you would only know this from having been told by someone or by reading the documentation, href controls the destination of that hyperlink. And then notice, here, that I'm still closing the tag. So what is this going to look like visually on a web page once I reload? What am I going to see, literally? OK. 
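To make the tag-versus-attribute distinction concrete, here is a small sketch of the bulleted list and the hyperlink. The http:// scheme and trailing slash on the URL are assumptions, since the talk only says the address aloud.

```html
<!DOCTYPE html>
<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        <!-- ul is an unordered list, rendered with bullets instead of numbers. -->
        <ul>
            <li>movies</li>
            <li>TV</li>
            <li>books</li>
        </ul>

        <!-- a is the tag (an anchor, i.e. a link); href is an attribute giving its destination. -->
        <p>
            My favorite website is <a href="http://www.harvard.edu/">Harvard's website</a>.
        </p>
    </body>
</html>
```

The reader sees only the words between the start and end tags; the URL in href stays hidden until they hover over the link.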
Yeah, Harvard's website. My favorite website is Harvard's website. I'm not going to see the URL. But it's going to work-- reload. And now we get the old school but familiar blue underlined hyperlink. And if I scroll down-- if I hover over it, it's really small, especially on the screen. What do you see in the bottom left-hand corner of either screen? The actual destination. So this is kind of a juicy moment to mention a potential security concern. How many of you have ever been phished before? P-H-I-S-H-E-D, which means you've received an email purporting to be from usually like PayPal or eBay or your bank or something like that, that's actually just trying to phish for your username and password. Most everyone-- I mean it probably ends up in your spam folder these days, because the mail providers have gotten pretty good about this. But what feature of HTML do these phishers take advantage of? Well, they might actually do something like this. They could have this be-- let's call it HTTP bad guy website phish-- I'm trying to pick a domain that doesn't actually exist and not creep everyone out. So bad guy phishing website-- OK. That probably doesn't exist, but don't visit it just in case. And if I reload, the text, of course, still seems the same. And if you hover over it, of course, you see the malicious destination. But more maliciously, as is the case in a lot of these phishing attacks, what if I do this? And deliberately type the link text to be dichotomous with the actual URL. And if I reload now, most of us are probably, given a link like this, going to click it. I mean most of us are not so uptight as to first hover over every link we're about to visit, look at what it is in the bottom left-hand corner, and then proceed to click. So how do these phishing attacks work? What are they trying to get you to do? This is the how. What's the why? Go to their website. But why is that useful? So you log in. And in fact, you know what, let me try something. 
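The mismatch being demonstrated can be sketched in one line of markup: the visible text of a link and its href are independent. The destination below is a hypothetical placeholder under the reserved .example domain, not the address typed in the talk.

```html
<!DOCTYPE html>
<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        <!-- The visible text looks like one URL, but href (hypothetical here) points somewhere else. -->
        <p>
            My favorite website is
            <a href="http://badguy.example/">http://www.harvard.edu/</a>.
        </p>
    </body>
</html>
```

Hovering over the link still reveals the real destination in the corner of the browser window, which is why checking before clicking is worthwhile.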
Let me go to-- let's say a bank that's in Harvard Square, BankofAmerica.com. Here's their website. Recall from earlier that you can go in Chrome to View, Developer, View Source, and IE and Firefox all have the same. Here is their HTML; it's a little cryptic, and it's certainly longer than the pages we ran. That's crazy long. I'm just going to go ahead and copy it and paste it into Cloud9, all 1,390 lines of it. Save that in hello.HTML. And now reload the page. And notice my URL, I'm on Cloud9, c9users.io. So this is certainly not Bank of America's site. Let me go ahead and reload. Woo. I have reimplemented Bank of America's website. But fortunately, you know, actually I'm stealing material from tomorrow now. You know it's secure, because look at that padlock icon there. But what does that mean? Notice my URL. Actually, ironically, it is secure. But my-- connection is secure, it's at c9users.io, not Bank of America. So what does this mean? No, it just means someone who works for Bank of America knows how to make icons that look like padlocks. I mean, it literally is as simple as that. So there's this whole can of worms; I'll try to defer this till tomorrow because it's kind of a rabbit hole of interesting, frightening topics. But it means nothing. And a phishing attack, really, is someone who spent what? All of 30 seconds copying someone else's website, trying to trick you into going to it. But the takeaway here isn't so much that specific example, but just how HTML works. And how you can so quickly, after like 5 minutes of the language, start to abuse it in a way that leverages what we call social engineering. These aren't technological attacks so much as they are human social attacks. All right, so let's do something else. The internet, of course, is filled with cats. So if we Google for a cat, let's just grab-- that's cute, a little image like this. 
And now, notice, this file, by nature of being on the internet, has a URL. And so let me go ahead and, if this were my own cat's URL on my own web server and I want to embed it, let me go ahead and do this. Instead of my favorite website, my favorite cat is image source equals quote unquote that URL. And then, here, I'm going to do alt picture of a cat. An underappreciated feature of HTML that more websites should be sensitive to is alternative text for folks who are without sight or rely on screen readers for recitation of what a site is. Of course, if someone's blind and they can't see this image, it would be nice, certainly, if the computer could tell them what it is. And so simply by providing alternative text, a picture of a cat, you can go part way toward actually helping the user follow along with what's on your page. And if I now hit reload, here, my favorite cat is this thing here. And it turns out there are going to be mechanisms that allow us to scale this appropriately, although we could just open it in Photoshop or some other program and actually integrate it better into the website. But the point is, it's these very simple building blocks with which the entirety of the web is made. We can do other things like, if for whatever reason I want to emphasize cat, we can style that and other text differently. And actually is there-- let's see. Let's start to do exactly that. Let me go ahead and grab some Latin text again, just so we have some actual text to play with. And grab two paragraphs of this again. This is just a popular way of generating sample text with which to play. Let me go ahead and paste it into the site, give myself a couple of paragraphs. Let me go ahead and fix this real fast. Close the paragraph. Whoops. Close the paragraph. Open the Paragraph. Delete this. OK. So here's where we're at now. We're back to just a couple of simple paragraphs. And let's suppose I want to change the font of this. 
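The image embed described at the start of this passage might look like the sketch below. The image URL is a hypothetical placeholder, since the talk pastes in whatever address the search result happened to have.

```html
<!DOCTYPE html>
<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        <!-- src is where the browser fetches the image (hypothetical URL here);
             alt is the text a screen reader recites in place of the picture. -->
        <p>
            My favorite cat is
            <img src="https://example.com/cat.jpg" alt="picture of a cat">
        </p>
    </body>
</html>
```

Unlike p or a, the img tag wraps no text, so it has no separate closing tag.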
As best I can tell, this is like Times New Roman in 14 points or something like that, whatever the browser's default is. Well, turns out that back in the day, you would actually do something like color equals red or something like that. But the world eventually realized that mixing your data with metadata, specifically mixing your data with the presentation thereof, is generally bad practice. Certainly these days, because it makes it much harder to maintain your website and in turn your data long-term. It makes it much harder to change the aesthetics of your website over time, if you want to do a refresh, change the colors, change the iconography. And so the world has gotten a lot better at factoring out anything related to aesthetics to a separate language called not HTML, but CSS, cascading style sheets. And unfortunately, the syntax for this is a little different. But it still follows some simple patterns. If I go into the head of my web page and introduce a style tag, notice that I can do the following. I can specify that, you know what, I want every paragraph in my page to have the following properties. And notice the new syntax, here, where we have curly braces like this. And I want to go ahead and say, you know what, make the font size-- oh, it's too big. Let's make it 11 point. And the color is just a little annoying. Let's go ahead and make it a nice shade of blue. So with CSS, you've now kind of seen the entirety of its grammar, so to speak, although there are a few other features: you have key-value pairs. The key is a word like color or font size; the value of those, respectively, is blue or 11 point. You would only know what the valid keys and values are by reading the documentation or taking a class or reading a book or whatnot. But if I now reload this, the effect is going to be to make all paragraph tags' content match these properties. So if I go to Hello World and click Reload, it's not very pretty, but the text is indeed a little smaller and it's much bluer. 
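A minimal sketch of the style tag just described follows, with the two key-value pairs named in the talk. The brace placement and the placeholder paragraph text are assumptions.

```html
<!DOCTYPE html>
<html>
    <head>
        <!-- The style tag holds CSS: a selector (p), then key-value pairs in curly braces. -->
        <style>
            p
            {
                font-size: 11pt;
                color: blue;
            }
        </style>
        <title>hello, world</title>
    </head>
    <body>
        <!-- Every p element on the page picks up the properties above. -->
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
        <p>Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
    </body>
</html>
```

Each property ends with a semicolon, and the key and value are separated by a colon.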
And so I've achieved that effect. If instead I want to do something different, suppose that, for font family, I really don't like the old school serif approach, here; I want it to look a little more modern, a little fresher. I can change the family to be sans serif. And if you notice, this is the before, after, and now we've changed the font entirely. And we've just scratched the surface, here, of what we can do. But ultimately, the paradigm, now, is that we have the ability to separate the aesthetics of our page, the stylistic decisions, from the content, but there's still a problem. So notice that we're still inside the same file, hello.HTML. And it turns out that even though yes, there is this style tag right here-- it turns out that's not best practice. Best practice would have us put it not up here, but instead use a tag as follows. A link tag, which, annoyingly named, is not a hyperlink; it just links to another resource. That resource, in this case, is a file called styles.CSS, which I'll stipulate is just a file containing a whole bunch of key-value pairs, a bunch of properties as we just saw. And then the relationship, here, is that it's a style sheet, which you just have to copy and paste. So I've removed from the file, apparently, per this yellow highlight, all of the properties and moved them into a separate file. Why might it be good to do that? STUDENT: [INAUDIBLE] DAVID MALAN: Or was that just a stretch? STUDENT: [INAUDIBLE] DAVID MALAN: OK. Exactly, you can factor out the different jobs. So one of you can focus on the actual content that you want to display, the images, the text, the sort of business or the products you're trying to sell. And then someone else, who's perhaps more artistic, and better than you at that, can actually do the refinements. What does the text look like? Where is it placed? Where to get all the aesthetics? So that makes sense. 
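The factored-out version can be sketched as follows, assuming the stylesheet is saved in lowercase as styles.css in the same folder as the page (the transcript writes the name as styles.CSS). The stylesheet's contents are shown in a comment so that the example stays in one file.

```html
<!DOCTYPE html>
<html>
    <head>
        <!-- link is not a hyperlink: rel says what the resource is, href says where it lives. -->
        <link href="styles.css" rel="stylesheet">
        <title>hello, world</title>
    </head>
    <body>
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
    </body>
</html>
<!--
    Assumed contents of styles.css, moved out of the style tag unchanged:

    p
    {
        color: blue;
        font-family: sans-serif;
        font-size: 11pt;
    }
-->
```

Any number of pages can include the same link tag, so a change to styles.css restyles all of them at once.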
What else might be compelling about factoring out CSS from the style tag in the page to a separate file like this? Yeah, Vanessa. STUDENT: [INAUDIBLE] DAVID MALAN: Exactly, if you're-- right now, we're assuming naively we have just one page, hello.HTML. If you have 2 pages, 10 pages, 1,000 pages-- how else are you going to make all of the text blue or all of the text sans serif? You don't want to have to copy and paste that same block in every one of those files, if only because, God forbid, you want to change the aesthetics of the site tomorrow or in a year, and now you have to go through and change 2 or 10 or 1,000 pages separately. Much better to factor that out, put it in one central place. What about more technically? If you were a browser, why might you prefer, too-- or even a user, why might you, too, prefer that the CSS be factored out into the separate file styles.CSS? Vanessa? Easier to read. A little bit. STUDENT: [INAUDIBLE] DAVID MALAN: That's fair. So the browser definitely won't care, because it's just going to read it as text, top to bottom, left to right, no matter how messy it is. And a user, yeah, that's nice. But I don't really care as the business owner about making my source code, so to speak, easier for humans to read. After all, I don't want them copying and pasting it even more easily, like Bank of America. Oh, OK. Sure, then that's fair. Factoring out into some central, more readable place. Why might a browser or an end user actually benefit from factoring out your CSS into a file like this? STUDENT: [INAUDIBLE] DAVID MALAN: Runs it one time, or more specifically downloads it one time. If you have a user visiting this page and that page and that page, the content is, of course, changing; after all, that's why they're visiting different web pages, to see different things. 
But the stylization is global in the sense that all of those pages are including this same file. The upside of that is that, especially if it's a really big file, the web browser only has to download it once and do what with it? It's a book. Copy. Or just to borrow the more technical term from earlier, to actually cache it for some amount of time. Now, caching, we know, can work against you, because if the browser remembers it longer than you intend it might actually backfire. But at least if you're not changing the CSS that often or if the browser only caches it for a few minutes or hours, it can certainly help, especially on devices like this. What's frightening these days-- let's go ahead and do this real fast. If I go to Chrome and go to View, Developer, but not Source, but Developer Tools. Most browsers these days have fairly arcane features like this built in, whereby if I click on the Network tab, I can actually see all of the HTTP requests that Chrome is about to make. So let me go somewhere-- whoops. Let's see, let's go somewhere like CNN by moving this up. Come on, come on. No, come on. OK. Let me move this all the way up. Let me go to CNN.com, Enter. And notice that just visiting CNN's web page-- dear God. OK, it's even worse than the last time I did this. How many HTTP requests did my browser just make in order to visit CNN.com? 300-- 324 requests. Each of which-- oh, now it's up to 325. Each of which represents apparently a JPEG or PNG, which are image file formats, which is not unreasonable for like a news site. Some of these are script files, JavaScript, which we discussed. GIF, which is an image format. GIF, GIF, GIF, GIF. Script, Script, Script, Script. I mean-- my God, this is actually just remarkable. Wow! A lot of this frankly is advertising, too. Wow! All right. So, OK. Why is this bad? Never mind the content from CNN, but why is this technologically bad? Yeah. 
STUDENT: [INAUDIBLE] DAVID MALAN: Consumes bandwidth, and it's not just bandwidth because at the end of the day-- well, it's pretty big, it's 3.8 megabytes. But it turns out downloading one 3.8 megabyte image would probably reach me faster than 332, now, files that collectively represent 3.8 megabytes. So it's not so much the bandwidth that's concerning, but it's another measure of sort of speed that users experience. And the word's come up a couple times already. Late-- latency. OK, good. OK, latency, which is different from bandwidth. Especially if you've ever used, like, YouTube or Netflix or Hulu or the like, latency is that delay from when you visit a video and it takes like a second or five seconds to start playing. But then it looks beautiful because you have good bandwidth but bad latency. By contrast, if you had good latency, the video might start streaming instantly but very suddenly get very pixelated or hang or buffer, and that's because you have bad bandwidth. So latency describes the amount of time it takes for a response to start coming back. And for the browser to be doing this, what's happening? Well, recall from my simple example earlier of the cat, that an HTML file can, inside of it, reference other files or other URLs. A browser is designed, upon reading a web page, to look inside of it for all of the images, all of the movie files, audio files, anything that's mentioned inside of it, and to go fetch all of those URLs as well, one at a time or a few at a time. So the result is that CNN's web page, index.html as it might be called, itself mentions all of these other darn files. So we are inducing, by visiting CNN.com, 335 separate HTTP requests, some of which might be parallelized to be fair, but that's over 300 requests. Each of which might have 200 milliseconds of latency, and actually you can see how long each one of these takes. It's all between 0 milliseconds and 150 or 200. And imagine doing the same on your phone.
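To put rough numbers on the latency-versus-bandwidth point, here is a back-of-the-envelope sketch in Python. The 335 requests, 3.8 megabytes, and 200 milliseconds come from the demo; the 10 megabit-per-second connection and the six parallel connections are assumptions chosen only for illustration, and the model ignores everything else a real browser does.

```python
import math

def total_seconds(num_requests, total_megabytes, latency_ms, bandwidth_mbps, parallel=1):
    """Rough load time: latency paid once per batch of requests, plus raw transfer time."""
    rounds = math.ceil(num_requests / parallel)            # batches of simultaneous requests
    latency_cost = rounds * latency_ms / 1000.0            # seconds spent waiting on round trips
    transfer_cost = total_megabytes * 8 / bandwidth_mbps   # seconds spent moving the bytes
    return latency_cost + transfer_cost

# Assumed connection: 10 Mbps, 200 ms of latency per request.
one_big_file = total_seconds(1, 3.8, 200, 10)
many_serial = total_seconds(335, 3.8, 200, 10)
many_parallel = total_seconds(335, 3.8, 200, 10, parallel=6)  # 6 at a time is an assumption

print(f"one 3.8 MB file:        {one_big_file:.2f} s")
print(f"335 files, one by one:  {many_serial:.2f} s")
print(f"335 files, 6 at a time: {many_parallel:.2f} s")
```

The bytes transferred are identical in all three cases; only the number of round trips changes, which is why the request count matters more than the page size here.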
So phones have even less bandwidth and it's often higher latency, so this is not necessarily a great user experience on the phone. So how might websites mitigate this concern? Like I feel like having my phone download 335 files is not very good for business, for making me want to come back. They have a mobile version. So you can detect with high probability if a user is coming from a mobile device. How is that? How would you know? Oh. STUDENT: [INAUDIBLE] DAVID MALAN: Yeah, the browser should tell the website and indeed all browsers do. In fact, if I scroll back up in time and go to the very first request for www.CNN.com and I click on Headers, this is fairly arcane information that would now be found inside of those virtual envelopes we were discussing this morning. And if I zoom in on this, notice here, these are exactly the headers so to speak, the text that my browser put inside of that virtual envelope. Odds are the first two lines look familiar based on my quick example earlier, when I manually typed it in. But what is minorly revealing about yourself? What's the takeaway here, in terms of privacy perhaps? Or curiosity? What else is my browser presumptuously telling CNN? Yeah, it's telling CNN that I own a Mac running Mac OS version 10.11.2, no less, which is oddly precise. What browser-- actually this is historically confusing. But what browser am I apparently using? Chrome. So that's mildly interesting information, especially since it turns out web development is still kind of a headache even all these years after its inception, since so many of the manufacturers, Google and Microsoft and Mozilla, can't all agree on all of the implementation details.
So one of the frustrations in fact of web development is you might design something on Chrome, looks amazing on your Mac, looks awful or broken or somehow other on a PC or on Firefox or on safari or on IE or Edge or any-- I mean it's the biggest nightmare that libraries, more on those tomorrow, are helping with. Because you have other people figure out all these headaches and you build on top of their software. So that's mildly disconcerting. I also told CNN, already, my IP address, because that had to be on the envelope for the response to come back. So there's a decent amount of information being leaked here that CNN is inferring. But they can at least use that as a feature to realize oh, by way of a different user agent, that's what this header is called, they can infer if I'm on a mobile device or an Android device or an iPhone device, which can either be used for statistical purposes or to actually decide what kind of data to send or to request once you visit that first web page. All right. So with that said, what does this allow us to do ultimately when we have tags like this? So ultimately, we're going to be able to do things a little more efficiently by factoring out. This would be an example of best practices so to speak. But we've really only just scratched the surface of HTML. But I think it's the kind of language, frankly, where if that's the extent of our formal instruction, here is our-- here's a tag, here's what an attribute is, and here's how to find more information. What I thought I'd propose, so that we can get everyone back on the same page, give you an opportunity to get your hands a little dirty, is propose a few different problems. Almost all of which can be solved with Google or Bing or your favorite search engine. Or by me whispering in your ear or offering a little bit of tips. So I wanted to propose, I'll turn on some music, I'll wander about. 
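As a rough sketch of how a server might use the User-Agent header to guess at a mobile device, consider the Python below. The substring list and both sample header values are illustrative assumptions, not CNN's actual logic or the exact header from the demo, and real sites tend to rely on more thorough libraries.

```python
# Hypothetical hints; real detection lists are far longer and change over time.
MOBILE_HINTS = ("Mobi", "Android", "iPhone", "iPad")

def looks_mobile(user_agent: str) -> bool:
    """Guess whether a User-Agent string belongs to a mobile browser."""
    return any(hint in user_agent for hint in MOBILE_HINTS)

# Illustrative strings in the style of a desktop Chrome and an iPhone Safari.
desktop = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36")
phone = ("Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like Mac OS X) AppleWebKit/601.1.46 "
         "(KHTML, like Gecko) Version/9.0 Mobile/13C75 Safari/601.1")

print(looks_mobile(desktop))
print(looks_mobile(phone))
```

Notice that the desktop string names Mozilla, Chrome, and Safari all at once, which is the historical confusion mentioned above, so any check like this is only ever a high-probability guess.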
And let me propose that you tackle one or two of these problems, trying to bring to bear the very basic conceptual ingredients we provided but also Google. So literally, it is totally acceptable to type in web page how make text bigger or CSS change color of text or the like totally fine. And indeed, you'll find that this is how many developers early on are self-taught. Once they understand the conceptual framework, it's so much easier to bootstrap yourself to understanding and applying yet new techniques. So I'll turn on some music, tackle one or two of these, I'll wander around fielding some questions. Yeah, Griff. STUDENT: When you're creating a separate style sheet, are you just creating a new tab at the top of the file? DAVID MALAN: Ah, yes. Correct. I did omit that part. But yes, you would simply in the IDE, create a new tab and new file. And essentially repeat the steps from before but call this something.CSS, styles.CSS. And do remember that chmod command again. This time for this filename, where again, you're changing the mode of the file to give all read permission of styles.CSS for instance. You only have to run that once. Because by default for security sake, when you create files, they are typically viewable only by you. All right, feel free to continue tinkering. But I thought I'd try to tie everything together for our final segment here. This is, of course, Google. And let me go ahead and see. Google is constantly changing some of this stuff-- even more from Google. Is this going to work? OK. Let me go ahead and do the following in Google. I'm going to go ahead and search for cats again. Come on, search for cat. Oh. Oh, here we go. Google.com and cats. Enter. And now notice what happens when I search for cats. So the URL changes from just www.google.com to slash search questionmark num equals 20 site equals something source equals something-- oh, q equals cats. And in fact, let me go ahead and just delete all of the visual clutter here. 
And whittle this URL down to just this canonical form, if you will. And hit Enter, and there's apparently no difference. Which is to say, it seems that we can distill Google's functionality down to its essence in terms of its URL and kind of tie this morning's conversation together with this afternoon's and now our HTML focus to figure out how this exactly works. So let's go ahead and use this tool that developers would super often use these days. Going to Developer Tools, down here going for instance, to the Network tab as I did before. I'm going to click Preserve Log. And I'm going to go ahead and hit Reload because I want to see exactly what happens when I visit or search for cats on Google. So a little more modestly that results in 57 HTTP requests, which at least is decently fewer than earlier. But let me look at the headers here. Rather, let me look at this top part. So this is the URL that I just requested and the request method is something we didn't really talk about explicitly before, but I used this keyword get. So get being the operative verb when I made that verbal request earlier for something of Google. Status code is 200 and this is a code you don't often see. But it turns out, when you visit websites there is a number that you perhaps annoyingly occasionally see. Yeah, 404, perhaps the biggest. 404 is what? Error more specifically, file not found. And indeed, there's this whole laundry list, HTTP status codes that the official list is-- not on Wikipedia-- is here. And let me scroll down, it turns out that you rarely hear about 200 because it means everything is OK. But, indeed, when you visit a web page by default, if all is well, you're getting back in the virtual envelope from the server, as, say, Sean did for me with the cat, a 200 message saying OK, here is the satisfaction of your request. Less common would be to 201, 202. But if I scroll down here to the 400s, 400s are bad. 
Literally, bad request, if the browser or client has malformed its request. Unauthorized would mean there's some kind of authentication required. 402, you don't really see payment required. Apparently future use has been ongoing for some time. Forbidden means, like some of you saw, like me, you had to chmod your files to make them world viewable. And then here's the famous 404 not found. So just years ago, people decided somewhat arbitrarily that 404 shall henceforth mean the file is not found on the server. And thus was born millions of mistakes that we all see later. Worse is-- well this one's kind of funny, gone. Like the page you're looking for is gone, although you won't typically see that. 500, internal server error, you might see if you're developing a web-based business and you or your engineering team is prone to mistakes. Sometimes this means there's a particularly bad problem with the code on the backend, with load balancers these days. Not uncommon is to occasionally see 503 service unavailable, which typically means the load balancer is responding to you but no back servers are actually available, they're overloaded or offline or broken for some reason. And then another juicy one, that's worth noting now because it ties together our chat earlier, is moved permanently. This web page has moved permanently. Or found, the implication of which is that it actually wants you to go elsewhere. So let me do this. Let me go to this little text-based program, Telnet, and let me try going to Google.com port 80. Let me get their home page using version 1.1 of HTTP, which I'm pretending to speak. And the host shall be Google.com. Enter. Notice what actually comes back if I visit not www.Google.com, which I did do earlier but just Google.com. A very, very small web page that literally says the document has moved. And most humans don't even see that because their browsers understand HTTP, specifically status codes. 
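The status codes walked through above can be looked up programmatically. This small Python sketch uses the standard library's `http.HTTPStatus` enumeration; the particular list of codes is just the ones mentioned in this discussion.

```python
from http import HTTPStatus

# The codes mentioned in the discussion above.
MENTIONED = (200, 201, 202, 301, 302, 400, 401, 402, 403, 404, 410, 500, 503)

def describe(code: int) -> str:
    """Return the code with its standard reason phrase, e.g. '404 Not Found'."""
    return f"{code} {HTTPStatus(code).phrase}"

for code in MENTIONED:
    print(describe(code))
```

The first digit carries the broad meaning: 2xx for success, 3xx for redirection, 4xx for a problem with the request, and 5xx for a problem on the server.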
And what's the status code apparently that came back here? It's 301, moved permanently. What does that mean? Well, if you look a little lower, it includes a Location header, which we've not seen before even though we've been experiencing it with some of today's demos. Google wants me to go to which URL instead? Yeah, so they apparently want me to go to www instead. So let me try that. Let me instead go to-- sorry, let me do this again. To www, Enter. Get slash HTTP 1.1, host www.Google.com, Enter. And now it actually comes back. So Google wants me to go to www.Google.com. But both work apparently, it's not like the site is unresponsive like Harvard's was years ago as I mentioned. So why does Google want me to standardize on www.Google.com? STUDENT: [INAUDIBLE] DAVID MALAN: Yeah, force me to go to one central place so that they don't have to maintain two websites. Sure, why else? Why did I tell you to go to CS50.io as opposed to www.CS50.io? Or why do you go to bit.ly instead of www.bit.ly for URL shortening? It's a popular URL shortening service. Easier and more to the point there. Shorter, not to have to type it. And so indeed, it's just this human convention that we have long had these subdomains, so to speak-- or not even subdomains, host names, www.something.com. It was all the more of a visual cue that the user should be going there, that it's a web page, but it's slightly more verbose. And also for technical reasons, turns out there's these things called cookies-- more on those tomorrow. And cookies are actually scoped to the domain name in question. So it turns out, especially for bigger companies, if you want to have multiple web applications, all of them in something.com, you can give them each their own cookies based on what that initial host name is. So you want to force your user to go to some host name, not just your root domain name, so that you actually have a bit more control over what they're actually receiving.
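What a browser does with that 301 can be sketched in a few lines of Python: read the status line, and if it is a redirect, pull the next URL out of the Location header. The sample response below is hand-written for illustration in the shape of the Telnet output, not a capture of what Google actually sent, and the parser is deliberately minimal.

```python
def parse_response_head(raw: str):
    """Split a raw HTTP response into (status code, reason phrase, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]            # everything before the blank line
    lines = head.split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")      # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers

# Hand-written sample, assumed for illustration.
SAMPLE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.google.com/\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    "<html>The document has moved.</html>"
)

code, reason, headers = parse_response_head(SAMPLE)
next_url = headers["location"] if code in (301, 302) else None
print(code, reason, "->", next_url)
```

A browser repeats the request against `next_url` automatically, which is why most people never see the tiny "document has moved" page.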
But we started this conversation by looking at Google, here, whose queries ended up looking like this. So it turns out, you know what, I bet we could re implement Google. And we don't even have to do it with Cloud9, but I'll do it over here. You could do this on your desktop as well. Let me go ahead and whittle this down to be called my version of Google. And then down here, let me get rid of the fake Latin text. And then here, let me introduce a new technique, a form tag, the action of which-- we'll come back to that actually. Form tag that has an input whose type shall equal text. And whose name is going to be oh, say, q. And then another input, whose type is submit, whose value is use my version of Google. Close quote, close tag, Save. Excuse me, it's not complete yet. But let me reload this page. Interesting. It's not all that sexy, either. So let me add a logo here. My version of Google, which notice is using an H1 tag, heading one, which just means big and bold. Reload. OK. It's getting a little better. And frankly, Google 1999, it's not all that far off now from what Google was if you want to relive the 1990s there. All right, we need a little bit of color, but it's not all that dissimilar to what it is now ironically or remarkably. So let's just keep it here, we won't worry about aesthetics. I want to go ahead and search for cats. Whoops! Not cats, but cats. Huh. It doesn't work, but notice here, it did automatically append a question mark and a q equals cats as soon as I search. But I don't want the user to go to my page because I have no database, I have no search results. What could I do? It turns out that I could go in here and add an action whose value is https www.Google.com/search and just to be clear, I'm going to use a method of Get as opposed to something else, which we'll come back to called Post. Get, now, if I save this, reload my page. And type cats, notice the URL from which I go and to which I go. I seem to have re-implemented Google. 
I'm sort of-- I've implemented the front-end so to speak, Google has implemented a much harder backend. But notice, it even works for dogs. And notice it's pre-populating this field up here. So what's actually going on? Well, if we actually look at this HTTP request, let me go to search for cats again. But let me go ahead and open up that same developer toolbar that I showed earlier. And preserve the logs so we can see everything that happens. And now I'm going to click Use my version of Google. Let's see what actually goes through. Here, in Google's case, notice that I'm requesting this URL, q equals cats. Method is get 200, but where did that request come from? Whereas before it came from Google's own server. Well, their server was just a web page that I happened to visit at www.Google.com, so I filled it out and submitted it. But there's nothing stopping me from filling out or creating my own web form whose action, so to speak, goes elsewhere for fulfillment. That's not typically the common case but it does rather tie all of these various ingredients together. So what does this mean now for the alternatives? There's Get and then lastly, there's this thing called Post. When might you not want to use Get, and why, based only on the example we've seen here. 00:51:58,176 --> 00:51:59,754 STUDENT: [INAUDIBLE] DAVID MALAN: OK. So if you just want one result, you might not use Get. You could still use Get for that. And in fact, actually we can-- let's see, if we can distill this. If we go to google.com, you've reminded me of the I'm feeling lucky button. If you do search, click on I'm feeling lucky. Let me inspect this so we can sniff our own network traffic so to speak. Click I'm feeling lucky. Let's see what's slightly different. We went to the top hit, which apparently is Wikipedia. And notice the one thing that's different here besides all of this nonsense is site source-- is it this? No. What is the I'm feeling lucky icon? Ah. 
Let's see if we can figure this out real fast. btnI. OK, I think it might be this. Let me secretly make one quick change-- or not so secretly, while everyone watches. Input, type equals hidden, value equals 1. And then the name of that field shall equal this. So I'm guessing this could backfire. I'm guessing that the means by which Google has implemented their I'm feeling lucky button is so long as the web form submits a name of btnI with a value of 1. My hunch is that's going to take us immediately to the first search result. So let's try that. Let me go back to my version of this. Click Reload, to refresh the form, and search this time for giraffes. Whoops, that's not giraffes. Giraffes, just to prove that this is different. Enter. Yeah, this is the top hit for giraffes on google.com. So what am I playing with here? I'm playing with HTTP parameters. And indeed, this is what ultimately drives the web. There's all the silly aesthetics of HTML, of hyperlinks and images and bold facing and blue text and sans serif and the like. But none of that is functional per se, it's all aesthetic, markup. With HTTP though, the protocol that web browsers and servers speak, we have the ability to pass input from browser, and in turn human, to server. And the simplest of schemes is used to do that. Literally, what a browser does to send input is to send a name like q with a value like cats. And if there is a second argument or input to provide, there is literally an ampersand and then you say and-- what was it? btnI equals 1. And then another ampersand if you have a third key-value pair or HTTP parameter, and that is how the entire web works with web forms. If you are using Facebook Messenger and you fill out the little box to send a message and hit Enter, what are you doing? You are sending the equivalent of a message like this, that probably has not cats for q, but whatever message you typed in.
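The name-equals-value-ampersand scheme just described can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch of what a browser does when a form with method Get is submitted; the helper name `build_get_url` is made up for illustration, while the action URL and field names are the ones used in the demo.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_get_url(action: str, fields: dict) -> str:
    """Append form fields to the action URL as key=value pairs joined by '&'."""
    return action + "?" + urlencode(fields)

url = build_get_url("https://www.google.com/search", {"q": "cats", "btnI": "1"})
print(url)

# A server does the reverse: split the query string back into parameters.
params = parse_qs(urlsplit(url).query)
print(params)

# Characters that are not URL-safe get escaped; a space becomes '+'.
print(urlencode({"q": "cute cats"}))
```

Each value comes back from `parse_qs` as a list because the same name may legally appear more than once in a query string.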
And it's probably not called q for query, it's probably called m for message or whatever Facebook has decided to call it. But these very simple basic paradigms are the entirety of what drive the web and these basic interactions. But sometimes you might not want to see this in the URL. So to come back to the earlier question, when might you not want to use Get, whereby Get means that this kind of stuff ends up in the URL? Indeed, I keep showing in the browser, precisely the results of clicking Submit. When would you not want your query or your human input to appear in the URL, would you say? Yeah. Typing in a password. Don't want it to be there, why, to be clear? STUDENT: [INAUDIBLE] 00:55:39,394 --> 00:55:42,310 DAVID MALAN: But I mean suppose it was just you in from your computer. Let me push back harder, still bad, why? 00:55:47,604 --> 00:55:49,000 STUDENT: [INAUDIBLE] DAVID MALAN: Yeah, very reasonable, right? Especially since these days there's always this notion of auto complete or a history in your browser. And when it gets stored in the URL it's being saved for your later convenience presumably or for your sibling or significant other or parents or kids or whoever, prying eyes as well. So you might not want to do that. What else besides passwords might be sensitive? 00:56:13,800 --> 00:56:14,750 Usernames. Usernames. Yeah, a little less sensitive but maybe not something you want to reveal for your accounts. Credit card information, I mean anything that mildly personal, you probably don't want it appearing in the URL insofar as it ends up in the browser's cache and history and autocomplete and so forth. In fact, why do some people use-- well, let's be more technical. Technically, why do people use incognito mode or private mode browsing? Well, what does it not do? No caches. Yeah, no caching, no cookies and more on cookies tomorrow. 
So by using incognito mode or private mode in your browser, among the things it does is it doesn't save the autocomplete and the history. So if you did have the misfortune of going to a website that foolishly is putting your credit card and email address and username and such in the URL such that it gets saved, at least incognito mode, when you close it, throws that information away. Now in practice, most people use it not for that defense, because this is not a common problem, but really because they don't want websites they're visiting and other such things to end up at all in their history as well. But the same would hold true for credit card numbers or the like. So I can think of at least a few other things like YouTube, like you upload videos to YouTube. How in the world do you put a video in the URL? Or how do you put a picture in a URL? All right, so there are kind of some non-obvious problems, too, with non-obvious solutions, that arise if we're using Get and in turn the URL alone. And so that too might kind of bite us. And indeed, there's no official limit on the length of URLs. But there is sort of a realistic limit of like 1,000 characters, 2,000 characters, and unfortunately it totally differs by browser. So thankfully, there exists an alternative to Get called not Get but Post that functionally does the same kind of thing, it still has key-value pairs and equal signs and ampersands. But it doesn't put it in the URL, it instead so to speak puts it deeper into the virtual envelope. So it's still there, but it's not exposed in the browser's address bar in this way. And it lets us upload gigabyte video files, megabyte image files, not to mention our email addresses and passwords and credit card numbers and the like. And to enable that you simply do this, change it to Post. Unfortunately the server has to support it.
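The difference between Get and Post can be seen without sending anything over the network. This Python sketch only constructs two request objects with the standard library's `urllib.request.Request`; the example.com login URL and the field values are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Made-up form fields for a hypothetical login page.
fields = {"username": "alice", "password": "hunter2"}
encoded = urlencode(fields)  # same key=value&key=value encoding either way

# Get: the parameters ride along in the URL itself.
get_request = Request("https://example.com/login?" + encoded)

# Post: the URL stays clean and the parameters go in the request body.
post_request = Request("https://example.com/login", data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

With Post the password never lands in the address bar or history, but it is still inside the envelope in plain form, so hiding it from the network is a separate matter.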
So if I now search for cats, again, using Post, unfortunately Google just doesn't cooperate because for whatever technical or policy reason they don't want to support Post. And it's probably just because it's an unnecessary feature on their servers. And in the case of search results, they probably want people linking to them for advertising reasons, for deep linking reasons. So in fact, one of the most compelling reasons to use Get is when you do want stuff to end up in the URL. If I highlight this URL and paste it into an email, I want you to see the exact same search results. And there's nothing more annoying-- and this is common to big business, who have pretty shoddy websites, where there's no state maintained in the URL. So if you're shopping on some website trying to buy something and the state, the unique product identifier is not in the URL simply because the website has been using Post or some other mechanism, there's no way to deep link to that page that you're seeing. And so the effect of this is you copy something, you paste it into an email or instant message, someone clicks on it, I just see the home page or I'm not seeing what you're seeing. And this is all too common with big e-commerce sites, not Amazon, but sort of less trendy ones that haven't really given this much thought or care. Woo! That was a lot. Any questions? 00:59:47,931 --> 00:59:48,430 No? All right. So where are we going with all of this? So today was meant to start-- well, we started pretty low so to speak with binary and we sort of built our way back up. And then we reset ourselves looking at how the web works and incarnation of all of these ideas that have been-- on top of which we've built to get to this point. Where do we go tomorrow? So in the morning, we'll take a look at the general notion of privacy, security, and as it relates to society. 
Drawing upon a couple of current events from a few months ago like the FBI issues with Apple and how that played out and what the underlying questions were, the general idea of encryption and how that works and what it's good for and what risks it still presents to you. A case study of something like Dropbox or SkyDrive or OneDrive or any number of incarnations of web-based storage for consumers. Looking then a little bit at programming. Talking about some of the basic building blocks, some of the most common data structures and algorithms that people use that you might typically learn in an introductory computer science course. And the kinds of ingredients you bring into a design meeting when you're trying to design some piece of software, thinking through, say on a white board, how is it we're going to build this or how are we going to build this efficiently, how are we going to do our analytics in a more efficient way than just throwing it all in a database and just searching in an obvious way, actually engineering non-obvious solutions. Technology stacks, which is just kind of a generic way of describing types of software and workflows and design patterns so to speak, with which you can build something, sort of general methodologies. And then finally, web programming, looking not at HTML and CSS and the aesthetics of today's focus, but rather at some of the technologies that underlie the most dynamic of websites. For instance, when you click and drag on Google Maps and you suddenly see an infinite number of squares from around the world, tiles that represent that map-- how is something like that working? When you're using Facebook Messenger and all of a sudden a new message pops up without the whole web page reloading or without you needing to reload the page, how does all of that work? It's ultimately driven by this language called JavaScript, which has no relationship to Java, another programming language altogether.
But as part of our technology stacks discussion tomorrow, we'll start sort of a laundry list of yet more words of jargon and product names and the like just so that you're not necessarily comfy with all of those topics, but at least have seen them and can kind of roughly mentally categorize them all. So as to look more effectively up that additional information. Any questions? All right. Well, why don't we officially wrap here. I'll turn on some music, I'll spend some one-on-one time. If you have any questions you want me to rewind with any of today's topics or hands on material, I believe there are some snacks and drinks outside. There's a reception officially til 6, I'll stick around for any questions. But otherwise we'll see you in the morning and I'll send around an e-mail tonight with maybe some URLs from today that you might want to look at to play. So enjoy.