[MUSIC PLAYING] DAVID MALAN: All right. This is CS50. And this is already Week 8. But this is a CS50 bingo board from one of your classmates at Yale, Shoshannah, who kindly sent this to us. 

And she's apparently been taking close notice of certain expressions that I apparently tend to say quite a bit. Some of which I'm aware of, but not all of them. And the idea here as she described it is that if and when I say any of these expressions on the screen, you can draw a line through that box. And if you get five in a row, you win a fabulous prize. 

It seems only fair, then, if we maybe give away some cookies today if and when I actually do say five such things in a row. Perhaps it'll be all the more motivation to keep a rapt ear against everything we're talking about today. So if and when that happens, feel free to just yell out bingo. And then please see Carter during the break or after class for adjudication. 

All right. So today, ultimately though, Week 8 is about the internet and, in turn, how it works and, in fact, how we can start building software on top of it. So up until now, of course, we've experimented with Scratch, spent quite a bit of time with C, only really spent a week plus so far on Python, and about the same on SQL. But ultimately, we're going to come full circle next week and tie all of those languages together. 

But we're going to do it in the context of the web. And in fact to do that, we're going to introduce three different languages today, but only one of which is a proper programming language. The other two are more about presentation, markup languages, so to speak. 

And those languages are HTML and CSS, commonly used in conjunction. Some of you might have done this middle school, high school even, if you ever made a personal website of sorts. And JavaScript, a programming language that is very commonly used in the context of browsers to make interfaces that are all the more interactive. But it can also be used server side. 

And what you'll find is that our goal this week, like last week, like two weeks ago, is really to teach you ultimately how to program, how to program procedurally and also with elements of what we'll call functional programming, object-oriented programming, concepts that you'll explore more if you pursue more programming or higher level classes. But at the end of the day, you will exit this class having learned how to program, particularly in a context that's very much in vogue nowadays, be it for the web or be it for mobile devices. And all of the ideas thus far will be applicable as we now begin to build on top of the internet. 

So what is it? So back in the late '60s and 1970s, it wasn't much of anything. This is an early diagram depicting a few access points on the West Coast of the United States, which represents what was originally called ARPANET. And this was a project from the US Department of Defense to begin to internetwork computers by enabling them to exchange data using what's now known as packets, packets of information back and forth. 

It wasn't too long before the East Coast was eventually connected through MIT, Harvard, and others. And nowadays, fast forward to present day, just a few decades later, everything, it would seem, is somehow interconnected, either with wires or wirelessly. But how do you actually get data from any of these points to any of these other points or all of the points that now exist? Well, let me stipulate, for today's purposes, that the world nowadays is filled with routers, simply computers, servers whose purpose in life is to route information from left to right, top to bottom, geographically, so to speak, to just get data from point A to point B. 

But typically, you're not going to have a direct connection between points A and B. You might have C, D, E. In other words, you might have many different servers between you and someone else. So if you have a friend at Stanford University and you simply send them an email, well, odds are that email is going to be put inside what we're soon going to call a packet. And that packet might actually pass through the hands, so to speak, of any number of routers, typically more than one or two, but typically fewer than 30 such routers. 

And it's up to the IT administrators of the world to figure out how to route data between these servers. And we have software nowadays that dynamically figures out the best path. It's not necessarily a straight line, as it might be in the world of mathematics. But hopefully, it's the fastest way to get data from point A to point B. 

So the teaching fellows, thanks to Zoom, kindly put together in years past a demonstration of this. Consider each of the teaching fellows or TAs that you see on the screen here as representing a router, that is, a device on the internet whose purpose in life is to get data north, south, east, or west, between two points ultimately. And if we assume that Phyllis, for instance, wants to send a packet of information to Brian up here at top left, from bottom right, it turns out that by design the internet can send that data over any number of routes. It can go up and to the left. It can go left and then up. It can double back a little bit. 

Again, it's not necessarily a straight line. And this is a feature, not a bug. The intent of the internet early on was to be able to route around downed servers. So if one router is overwhelmed, or if one router is offline, the internet can still adapt dynamically and just route it some other direction. 
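As a rough illustration of routing around a downed router, here is a minimal sketch that treats the Zoom grid as a 3-by-4 grid of routers and uses breadth-first search to find a route. The grid size, the coordinates, and the `find_route` function are assumptions made for this example. Real routers use distributed routing protocols rather than one global search.

```python
from collections import deque


def find_route(rows, cols, start, goal, down=frozenset()):
    """Breadth-first search over a grid of routers, skipping any that are down.

    Returns a list of (row, col) positions from start to goal, or None if no
    route exists.
    """
    if start in down or goal in down:
        return None
    previous = {start: None}  # remembers how we reached each router
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk backwards from the goal to reconstruct the path
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        row, col = current
        for neighbor in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            r, c = neighbor
            in_grid = 0 <= r < rows and 0 <= c < cols
            if in_grid and neighbor not in down and neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


# Phyllis sits at bottom right (2, 3) and Brian at top left (0, 0)
route = find_route(3, 4, (2, 3), (0, 0))

# Two routers are offline, yet a route still exists
detour = find_route(3, 4, (2, 3), (0, 0), down={(1, 2), (2, 2)})
print(route)
print(detour)
```

If every neighbor of the sender is down, `find_route` returns `None`, which is the one case the network cannot adapt to.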

So here, for instance, is one representative route that our packets might take. Thanks to the team. 

[CLASSICAL MUSIC PLAYING] 

So my thanks to the team. And if you've ever used Zoom before, you know that you don't often see exactly the same layout that someone else sees. So it took us forever to actually get that right. Because no one actually knew to whom they were necessarily passing it. 

But if all of those TFs and TAs represent routers, well, what is it they were handing? What is it that Phyllis wanted to send to Brian? Well, I've called it generically a packet. And a packet is a generic term for some amount of information. 

But it's kind of analogous to an envelope in the real world. If you're still in the habit of sending letters or postal mail, you typically put your information inside of an envelope such as this. And then you hand it off to the mail carrier, or you drop it into the mail box. And then humans, in the case of the Postal Service, actually get it from point A to point B. 

But odds are it goes through different cities, different countries, even. So you can think of that as roughly analogous to these things called routers. But the technical term for what it is the TFs were just doing is they were implementing a protocol that we know as TCP/IP. 

And this is actually probably a pair of acronyms that you've probably seen, maybe on your Mac, PC, or phone, even if you haven't really thought much about it. But this is actually a pair of protocols, two protocols that the internet generally uses nowadays and has for some time to get data from point A to point B. And let's consider each of these halves so you have a sense of what it is the internet is doing when you do send an email or do anything else. 

Well, first, IP stands for internet protocol. And you've probably even heard this in popular media, since a lot of humans are indeed familiar with this notion of IP. And they associate it typically with IP addresses, as you might. 

So I'll stipulate for today that every computer, every internetworked device in the world has an IP address, an internet protocol address similar in spirit to buildings in the physical world. Here we are at 45 Quincy Street, Cambridge, Massachusetts, 02138, USA. That is a unique string, theoretically, that uniquely identifies this building. 

Similarly, in the world of computers, we use a simpler mechanism, just numbers of this format that uniquely represent computers. Now that's a bit of a white lie. Because there's actually a way to share IP addresses. And within your home, often within your dorm, or your apartment, you'll actually have what appears to be the same IP address as your roommates or family members. 

But for now, let's keep things simple and assume that every Mac, PC, and phone in the world has a unique IP address that's formatted like this, number dot number dot number dot number. Each of these number signs represents a value between 0 and 255. And even though we haven't played around with this kind of arithmetic in some time, if each of these placeholders is 0 through 255, how many bits are being used to represent each number? 

Think back to Week 0, Week 1. Yeah, so 8, in fact. 8 bits in total, or 1 byte. 

So IP addresses are generally 4 bytes or 32 bits. And the other math we kept doing early on is if you've got 4 bytes or 32 bits, that's a maximum of 2 to the 32nd power total number of values. How many IP addresses can you have, it would seem, maximally in the world? 

Enough. Actually, not enough would be a better answer nowadays. But roughly, 4 billion was the rough math that we typically did anytime 2 to the 32 was involved. 
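The arithmetic above can be checked in a few lines of Python. This is only a sketch, and the address 1.2.3.4 is just a sample value. It shows that each of the four numbers fits in 8 bits and that an IPv4 address is really a single 32-bit number.

```python
import ipaddress

# Each of the four numbers ranges from 0 to 255, so it fits in 8 bits (1 byte)
bits_per_number = (255).bit_length()
total_bits = 4 * bits_per_number
total_addresses = 2 ** total_bits  # roughly 4 billion

# An IPv4 address is one 32-bit number written in a friendlier dotted format
address = ipaddress.IPv4Address("1.2.3.4")
as_integer = int(address)  # 1*2**24 + 2*2**16 + 3*2**8 + 4

print(bits_per_number, total_bits, total_addresses, as_integer)
```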

But it turns out with all of the humans, and all of the devices, servers, clients, PCs, Macs, phones, and everything else, internet of things devices nowadays, even 4 billion is not quite enough. So the world is gradually in the process of transitioning from this format, which is technically IPv4, version 4, to IPv6. And in the world of IPv6, we've actually bumped things up from 32 bits to 128 bits, which is a crazy number of possible permutations, 2 to the 128. 

So you'll gradually see that over time. But those are a lot messier of a format because they're so much larger. So we'll use the more commonplace ones, IPv4. 
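To get a feel for how much larger and messier IPv6 is, here is a small sketch. The address `2001:db8::1` is an assumption for illustration, taken from a range reserved for documentation.

```python
import ipaddress

ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

# 2001:db8::1 is a documentation-only address, used here purely as an example
example = ipaddress.IPv6Address("2001:db8::1")

print(example.exploded)           # the full form, with nothing abbreviated
print(example.max_prefixlen)      # number of bits in an IPv6 address
print(ipv6_total // ipv4_total)   # how many times larger the address space is
```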

Now just to get into the weeds briefly, this is some ASCII art. That is, someone wrote this up decades ago in a text file to represent the layout of one of these packets. So think of this as, like, the digital representation of this here envelope. 

And even though we won't get into the weeds of what this represents, up here you just have some values saying that this is bit 0. This is bit 10. This is bit 20. And this is bit 31, since it's 0 indexed. 

So that is to say that this is just kind of an artist's rendition of a grid of bits, top to bottom, left to right. And what's going to be interesting for us today is not most of these fields. There's a whole bunch of information that's encapsulated inside of any one of these packets. 

But we'll focus initially on these two, source address and destination address. Maybe the most important thing IP does is it standardizes what you put, so to speak, on the outside of these envelopes. It says that every computer is going to have a unique address of that form, something dot something dot something dot something. 

And so just like in the real world, if I want to send this packet from Phyllis to Brian, and suppose that Brian's IP address is a number, like, very simply 1.2.3.4, what Phyllis would do is put that IP address in the middle of this envelope, just like you would address a letter in the real world. But so that Brian could reply to her, if only to confirm receipt, she's also going to put in the top left of this envelope, virtually, her own IP address, which for the sake of discussion is maybe 5.6.7.8. In practice, they won't be as pretty as that. But it's the general idea. 

So you have a source address from which it's coming and a destination address to which it's going. 

And that's what IP does. It sort of standardizes, in addition to a bunch of other numbers and values that need to be in this envelope, too. It really just mandates that computers on the internet minimally provide a source address and a destination address so that the envelope can get from point A to point B. 
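As a sketch of what "the outside of the envelope" looks like in bytes, the following packs a minimal 20-byte IPv4 header and then reads the two addresses back out. The field layout follows the standard IPv4 header, but the time to live is arbitrary and the checksum is left at 0, so this is for illustration and not a packet you could actually send. The function names are hypothetical.

```python
import socket
import struct

# 20 bytes in network (big-endian) byte order: version/length, type of service,
# total length, identification, flags/fragment offset, time to live, protocol,
# checksum, source address, destination address
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_header(source, destination, payload_length=0):
    version_and_length = (4 << 4) | 5  # IPv4, header of five 32-bit words
    return IPV4_HEADER.pack(
        version_and_length,
        0,                                  # type of service
        IPV4_HEADER.size + payload_length,  # total length in bytes
        0,                                  # identification
        0,                                  # flags and fragment offset
        64,                                 # time to live (arbitrary here)
        6,                                  # protocol number 6 means TCP
        0,                                  # checksum left as 0 in this sketch
        socket.inet_aton(source),
        socket.inet_aton(destination),
    )


def read_addresses(header):
    fields = IPV4_HEADER.unpack(header[:IPV4_HEADER.size])
    return socket.inet_ntoa(fields[8]), socket.inet_ntoa(fields[9])


# Phyllis (5.6.7.8) addresses an envelope to Brian (1.2.3.4)
header = build_header("5.6.7.8", "1.2.3.4")
print(len(header), read_addresses(header))
```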

But that's not quite enough. Because it turns out, and if you saw the bloopers from the TFs' Zoom session there, you would see that it's very common not only for humans to physically drop an envelope like that, and frankly, even in the real world, for mail carriers to lose mail occasionally, undelivered to recipients. And so it turns out that IP alone is not enough to guarantee delivery because sometimes the packet just might not get to its destination. 

More technically, that might happen because the router is overwhelmed. It only has so much memory. It only has so fast a CPU. And if it's receiving way too many packets because so many people are on the internet at some moment in time, well, it might just kind of get overwhelmed and metaphorically drop certain packets in the sense that there's just not enough room in its memory to keep up with the traffic. So the effect for the sender is that the packet just doesn't get through. 
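A toy model can make the idea of an overwhelmed router concrete. The `Router` class and its `capacity` parameter are assumptions for this sketch. A real router's queueing is far more sophisticated, but the effect for the sender is the same: some packets silently disappear.

```python
from collections import deque


class Router:
    """A toy router with room for only `capacity` packets at a time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0

    def receive(self, packet):
        # If memory is full, the packet is discarded and the sender is not told
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def forward(self):
        # Send the oldest waiting packet onward, freeing up room
        return self.buffer.popleft() if self.buffer else None


router = Router(capacity=3)
results = [router.receive(f"packet {n}") for n in range(1, 6)]
print(results, router.dropped)
```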

And so there's this other protocol, TCP, that humans typically use in conjunction with IP via their Macs, PCs, and phones that does a couple of other things for us. One, it guarantees delivery, or really "guarantees" delivery. And it does that actually by doing this. 

It does that by having Phyllis write on the outside of the envelope not just the source address and destination address, but also what we'll call a sequence number. So, for instance, this would be packet one of two that she might be sending to Brian. So maybe in, like, the memo field, she could write one of two. 

And then, if she happens to send a second packet to Brian, she might write similarly a source address and destination address. But she might write two out of two. Because now, logically, if Brian only gets one of these, that sequence number is enough information for him to know wait a minute, I need to ask Phyllis to resend number one, or maybe resend number two. If both of them don't get through, I mean, honestly, that's probably when Phyllis hits reload or resends the email. But in general, these sequence numbers help with guaranteeing delivery. 
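Here is a minimal sketch of how sequence numbers let Brian notice what is missing. It follows the lecture's "one of two, two of two" simplification. Real TCP numbers bytes rather than whole packets, and the function names here are hypothetical.

```python
def missing_packets(received, total):
    """Return the sequence numbers (1 through total) that never arrived."""
    return [number for number in range(1, total + 1) if number not in received]


def reassemble(received, total):
    """Join the payloads in order, or return None if any packet is missing."""
    if missing_packets(received, total):
        return None
    return "".join(received[number] for number in range(1, total + 1))


# Brian's view: packet 2 of 2 arrived, but packet 1 of 2 did not
received = {2: "world"}
to_resend = missing_packets(received, total=2)
incomplete = reassemble(received, total=2)

# After Phyllis resends packet 1, the message can be rebuilt in order
received[1] = "hello, "
message = reassemble(received, total=2)
print(to_resend, message)
```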

But if Phyllis and Brian are each representing computers in this story, they can be doing different things. They can be doing email, chat, video conferencing, direct messaging, or any number of services on the internet nowadays. So TCP gives us one other feature, namely port numbers. 

Because when Brian receives that envelope, assuming he's indeed a computer, how does he know that what's inside of that envelope is indeed an email, versus a direct message, versus a little bit of video, versus sound, versus any other type of media. Ideally, the outside of the envelope would have a bit of a clue for him that indicates this is the type of data herein. Or more specifically, this is the program, really, that should open this envelope, the email program, the video conferencing program, or whatever else. 

So what Phyllis would typically do on the outside of this envelope lastly, in addition to the source address, destination address, and the memo field, the sequence number, she would also write a port number. And it turns out two of the most common port numbers in the world of TCP are these two. The first is 80, which represents the web, that is to say, something called HTTP, more on that today. The second is for HTTPS, which most everyone nowadays probably knows means secure, so it's some kind of secure version of HTTP. And that number happens to be 443. 

There's no mathematical significance of these. They're just kind of arbitrary. But humans decades ago decided to standardize on these numbers. 

So what it means for Phyllis is that on the outside of her envelope, she should generally put a colon after the destination address and then the number of the port that she wants to receive this packet. So if she's actually not sending an email, but maybe making a web request, and Brian is a web server and Phyllis is a web browser, she would write colon 80. Or if she's using HTTPS securely, we would change that 80 to a 443. 
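The "address, colon, port" notation is easy to take apart in code. This sketch assumes the destination is written as an IPv4 address followed by a port, and the `parse_destination` helper is hypothetical.

```python
# The two port numbers mentioned above
WELL_KNOWN_PORTS = {80: "HTTP", 443: "HTTPS"}


def parse_destination(text):
    """Split 'address:port' into the address and the port as an integer."""
    address, _, port = text.rpartition(":")
    return address, int(port)


address, port = parse_destination("1.2.3.4:443")
service = WELL_KNOWN_PORTS.get(port, "unknown")
print(address, port, service)
```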

There's other stuff on the outside of that envelope. In fact, just like with IP, there might be fields that look like this. But just to give you a sense of this, which is a TCP packet, you'll see that indeed sequence numbers are actually really big. They use all 32 bits of this part of the picture, which is to say that generally computers are sending way more than one packet or two. They might be sending dozens, hundreds, thousands even, depending on the size of the data in question. 

And there's some other features therein, including source port and destination port. Destination port is the 80 or the 443 that I mentioned earlier. But long story short, Phyllis also gets to pick a source port to uniquely identify this particular request. 
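The first 8 bytes of a TCP header hold exactly the three fields just mentioned: a 16-bit source port, a 16-bit destination port, and a 32-bit sequence number. The sketch below packs and unpacks only those fields. The source port 50000 is an arbitrary value chosen for illustration.

```python
import struct

# Source port (16 bits), destination port (16 bits), sequence number (32 bits),
# all in network byte order
TCP_START = struct.Struct("!HHI")


def pack_start(source_port, destination_port, sequence_number):
    return TCP_START.pack(source_port, destination_port, sequence_number)


segment = pack_start(source_port=50000, destination_port=443, sequence_number=1)
source_port, destination_port, sequence_number = TCP_START.unpack(segment)

# Sequence numbers use all 32 bits, so this is the largest possible value
largest_sequence_number = 2 ** 32 - 1
print(len(segment), source_port, destination_port, sequence_number)
```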

But more on that another time. For now, just know that TCP/IP is the pair of protocols that the internet uses to get data from point A to point B. IP standardizes how the addresses work. And TCP guarantees delivery with those sequence numbers and also helps the servers do more than one thing, helps them multiplex, so to speak, among email, web, video conferencing by using those port numbers. So at the end of the day, everything, even now, weeks into the class, it all boils down somehow to zeros and ones, or in turn, numbers, as we might think of them in this case. 

Questions on any of these building blocks thus far? Questions on any of these? No? All right. 

Well, on the outside of this envelope are just some arbitrary numbers, 1, 2, 3, 4, 5, 6, 7, 8. That's obviously not what you and I are in the habit of typing. When we actually visit websites, for instance, you and I are generally in the habit of typing harvard.edu, or yale.edu, or google.com, or the like, otherwise known as domain names. 

But your Mac, your PC has to, at the end of the day, address those virtual envelopes, AKA packets, with actual IP addresses. There is no room for words, letters of the English alphabet in those pictures that we showed on the screen. It's just 32 bits here, 32 bits here. 

So it turns out, on the internet, there's another type of server. Unlike routers, which route information from point A to point B, these servers are all over the place, frankly, in your home, on campus, in a company, on the internet more broadly, and they're known as DNS servers, domain name system servers. So what do these things do? 

This is just a type of server on the internet whose purpose in life is to answer questions of the form, what is the IP address for this domain name. So for instance, if you do pull up your browser on your Mac or PC or your phone, you type in harvard.edu and hit Enter, what your device is designed to do is to ask some local DNS server on campus, on your mobile carrier's network, on your apartment or dorm's network, what is the IP address of harvard.edu, or yale.edu? Whatever you actually typed in, hopefully there is a nearby DNS server that will respond with a numeric address of the form something dot something dot something dot something. And that's the number that your computer, your device will actually use on the outside of that virtual envelope. 

So you can think of DNS servers, honestly, as fitting the model that we keep coming back to, this notion of a dictionary, or a hash table, more specifically, whereby inside of a DNS server is essentially a dictionary, a two column spreadsheet or database table, if you will. And in one column are domain names, harvard.edu, yale.edu, google.com. On the right-hand side, right-hand column are just the corresponding IP addresses. 
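The two-column picture above maps directly onto a Python dictionary. In this sketch the IP addresses are placeholders from a range reserved for documentation, not the real addresses of these sites, and the `lookup` function is hypothetical.

```python
# Addresses below come from 192.0.2.0/24, a range reserved for documentation,
# so they are placeholders and not the real addresses of these sites
dns_table = {
    "harvard.edu": "192.0.2.10",
    "yale.edu": "192.0.2.20",
    "google.com": "192.0.2.30",
}


def lookup(domain_name):
    """Answer 'what is the IP address for this domain name?', or None."""
    return dns_table.get(domain_name.lower())


print(lookup("harvard.edu"))
print(lookup("example.org"))
```

On a real machine, Python's `socket.gethostbyname("harvard.edu")` asks the system's configured DNS resolver the same question and returns the actual answer.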

And that's it. To be technical, they're not generally called just domain names. Technically, each one is a fully qualified domain name. More on that another time. But domain names, as we know them, generally have different parts. And we'll soon see how to tease them apart beyond the usual. 

Questions though, on what DNS server's purpose in life is or how this might work? No. All right. 

So how does your Mac, how does your PC, how does your phone know what these IP addresses are? Well, they don't come from the manufacturer this way. And there's this whole hierarchy in the world of DNS servers such that your phone, your Mac, your PC, will generally ask the nearest DNS server, which is usually owned by your internet service provider at home, in your apartment, or by your university or by your company. 

But it's a hierarchical system. And it's kind of a recursive design. In that if that local DNS server does not have the answer, it's going to ask someone bigger, more important than it. If that one doesn't know, it might ask someone, again, recursively for it. 

And throughout the world, there's a finite number of what are called root servers that essentially know about all the dot coms in the world, all of the dot edus in the world, all of the dot whatevers in the world. And so someone, at the end of the day, knows about those systems. 
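The "ask someone bigger" design can be sketched as a chain of servers, each of which defers upstream when it lacks an answer. This is a simplification under stated assumptions: the `DNSServer` class, the three-level chain, the placeholder addresses, and the choice to remember answers locally are all illustrative. In the real system, root servers point toward other servers rather than holding every record themselves.

```python
class DNSServer:
    """A toy DNS server that asks a bigger server when it lacks an answer."""

    def __init__(self, name, records, upstream=None):
        self.name = name
        self.records = dict(records)
        self.upstream = upstream

    def resolve(self, domain_name):
        if domain_name in self.records:
            return self.records[domain_name]
        if self.upstream is None:
            return None  # nobody up the chain knows
        answer = self.upstream.resolve(domain_name)  # the recursive step
        if answer is not None:
            self.records[domain_name] = answer  # remember it for next time
        return answer


# Placeholder addresses from the documentation-only range 192.0.2.0/24
root = DNSServer("root", {"harvard.edu": "192.0.2.10", "yale.edu": "192.0.2.20"})
isp = DNSServer("isp", {}, upstream=root)
local = DNSServer("local", {}, upstream=isp)

first = local.resolve("yale.edu")
remembered = "yale.edu" in local.records
print(first, remembered)
```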

And in fact, if you've ever bought, or in the future might buy a domain name, part of that process is paying someone to associate an IP address for you with the actual server that you're going to actually be using. So your final projects, for instance, in CS50, it's sometimes common for folks to actually buy for personal use their own domain name for a few dollars a year, typically. So you're sort of renting it more than you're buying it. But among the steps you'll go through if you ever do that is to essentially inform the world what will be the IP address or IP addresses of your particular domain name that you've bought for, say, that calendar year. All right. 

So how does all this get started? Well, back in the day, when you arrived on campus here, at Yale, or anywhere in the world, you would actually configure your Mac or PC to know the IP addresses of your nearest router and of your nearest DNS server. So literally, someone would come to your home back in the day when signing up for internet service and configure your Mac or PC for you. 

Of course, nowadays I don't remember anyone really touching my computer recently to configure it for me. It all seems to happen automatically. And indeed, there's this other type of server now in the world, another solution to a human made problem known as DHCP. And I think this is among the remaining acronyms for today, dynamic host configuration protocol. And it's not that intellectually interesting to memorize that. 

But what DHCP servers do is answer questions of the form "what should be my DNS server and router," quote, unquote. So nowadays, when you turn on your phone in the morning, if you actually powered it off, if you open your laptop lid for the first day of classes or the like, your Mac, your PC, your phone is essentially broadcasting a Hello, World message, unbeknownst to you, that's just asking the local network, hey, what IP address should I use for my DNS server and for my router. And hopefully, Harvard or Yale or your apartment or your home more generally has a DHCP server nearby whose purpose in life is just to hand out answers to that question. And what these DHCP servers also do is they tell your Mac, your PC, your phone, what IP address your device should use because that too is no longer manually configured. 
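Here is a toy model of what a DHCP server hands out: an IP address for the device, plus the addresses of the router and the DNS server. The `DHCPServer` class, the small private network, and the device names are assumptions for illustration. The real protocol involves a multi-step exchange and leases that expire.

```python
import ipaddress


class DHCPServer:
    """A toy DHCP server that hands out addresses from a pool."""

    def __init__(self, network, router, dns_server):
        self.router = router
        self.dns_server = dns_server
        reserved = {router, dns_server}
        # Every usable host address in the network except those already taken
        self.available = [
            str(host)
            for host in ipaddress.ip_network(network).hosts()
            if str(host) not in reserved
        ]
        self.leases = {}

    def request(self, device_id):
        # A returning device gets the same address it had before
        if device_id not in self.leases:
            self.leases[device_id] = self.available.pop(0)
        return {
            "ip_address": self.leases[device_id],
            "router": self.router,
            "dns_server": self.dns_server,
        }


server = DHCPServer("192.168.1.0/29", router="192.168.1.1", dns_server="192.168.1.2")
laptop = server.request("laptop")
phone = server.request("phone")
laptop_again = server.request("laptop")
print(laptop, phone)
```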

So this all just nowadays happens automatically. And in the case of a campus like this or at Yale, it's because, at the very beginning of your visit to campus, you did register somehow. You probably logged in. You authenticated against your Harvard account or your Yale account. And that is what enabled the DHCP servers henceforth and forever to recognize your particular computer and answer those questions for you. 

All right. So that's it for how the internet works, at least so far as we are concerned today. We're going to now start building on top of it. And undoubtedly, the most popular form of the internet nowadays is something called HTTP. That is the World Wide Web, though most people don't really say it in long form anymore. 

But HTTP is just another protocol that governs how web browsers and how web servers speak, just like IP is a protocol that governs how computers address each other on the internet, and TCP is a protocol that governs how computers keep track of sequences of packets from point A to point B and also multiplex among different services using those port numbers. And to be clear, what's a protocol-- well, in the human world, it's a very common protocol. And I can't reach any of you. 

But if I were to reach over and say hi, nice to meet you. You presumably, if we weren't five feet apart, would extend your hand. We would sort of acknowledge, in this strange cultural convention. 

But that's a protocol. I know how to do it. You know how to do it. I'm initiating. You're responding. 

And that's exactly what's happening all the time on the internet. You have a client, like me in this case, that's initiating a request. You have a server, like you in this case, that's responding to that request. 

Or analogously, if you're in a restaurant, you might be the client sitting down at the table. You want to order food. And there's a server that serves you that food after you have requested it. 

So computers, really, on the internet are implementing that same paradigm. So when it comes very specifically to the web, which is different, of course, from email and video conferencing and all of these other services on the internet, the world wide web uses this protocol, HTTP, which standardizes what goes inside of those envelopes in order to allow a web browser to request and receive information from a web server. So we've talked about really the lower level details up until now, the outside of the envelope. Let's now look inside of the envelope when it comes to actual web pages that you might visit or soon today, you yourselves might design. 

So HTTP stands for Hypertext Transfer Protocol, which is another mouthful, but, again, just standardizes how we're going to get web traffic from point A to point B, from browser to server and back. HTTPS is literally the secure version of that. And what that means for today's purposes is that the connection is somehow encrypted, scrambled using very fancy mathematics so that it is very, very, very unlikely that anyone who intercepts your traffic, your packets between point A and point B will have any idea what is inside of those envelopes. 

They might intercept the packet itself digitally. They might try to open it up. But it's going to look metaphorically like random zeros and ones on the inside when using HTTPS because of what's called encryption. 

But let's look at some canonical URLs. All of us are in the habit of seeing these and typing these all the time. Well, let's actually tease apart some of the jargon here. 

So here is an example URL with all of the usual components. So here, for instance, with the yellow slash, this generally means, even though you rarely type it and you rarely see it nowadays, this means the default page for the website. Give me the root of the website, so to speak. 

So this is to say this represents a folder, like, the default folder inside of which is presumably the default web page. And we'll see what that means more concretely in just a bit. If, though, you're visiting a more specific URL, we're going to henceforth call this a path. So slash something is representative of a path, maybe a file, maybe a folder, just like in the world of Macs, PCs, and cloud services. 

Specifically, you might sometimes be in the habit of visiting an actual file, something like /file.html. Nowadays, this is kind of very '90s, early 2000s. Nowadays most web servers hide the file extension, the dot HTML, even if it's there on the server. 

It just looks a little messy nowadays. It sort of reveals information that's not necessary. So very often you won't see dot HTML, even if there is actually a file ending in that suffix. You might instead see /folder, with a slash. Maybe not a slash, maybe a slash, but that generally represents a folder on the server. And sometimes there are, of course, files in folders. 

So all of this stuff you're probably familiar with on Macs and PCs and even Google Drive and the like. Those same semantics exist in the context of URLs. So there's a mapping between this URL and something on a hard drive somewhere on some server. 

All right. What about the other parts? So this is the fully qualified domain name, so the full domain name. Even though you and I, when we say domain name, we typically just mean this example.com, for instance. 

So technically, the W-W-W is what we would typically call a host name. A host name is like the name of a specific server that lives somewhere in that domain. And this is just a human convention. 

Even though most URLs still probably start with W-W-W dot something, that's not strictly required. That's just a configuration detail. And historically, this was just to kind of signal to less technical people in particular, when you would see a URL in print, that oh, this is a web address. This is an address on that new world wide web. W-W-W just kind of connotes that. 

But decreasingly do you see websites using this. I mean, for some of CS50's own tools, it's just cs50.dev. It's just cs50.ai. Because most of us are now conditioned to know that, oh, OK, that's probably a URL, even though there's no explicit W-W-W. And in fact, even if you type the W-W-W, using tricks that we'll soon see, the server can redirect the user from one to another, essentially removing the W-W-W from, or adding it to, the address in the address bar of their browser. 

This thing here is called the top level domain. And many of the domain names that you and I are in the habit of visiting, certainly in the US nowadays, end in .com, which stands for commercial. .edu stands for educational. .gov stands for US government. But of course, there's hundreds of country codes, too, that by convention are two letters. So .uk for the United Kingdom, .jp for Japan, and two characters for every other country in the world. 

But even those have kind of been used in clever ways. So .tv, for instance, is actually a country code that's been used by a lot of the English-speaking world to represent television, for TV shows and the like. .ai, similarly, does not actually mean artificial intelligence. It's a two character country code that has been used by the world nowadays to represent AI. .ly for bitly and CS50.ly, too, that's a country code that allows people like us to essentially buy domain names in that subdomain. 

But long story short, back in the day there only used to be a few of these top level domains. Now there are hundreds of them. So I do think, over time, it's going to become a lot less regimented than it seems to be now as to what URLs actually look like. 

Lastly, before the :// here is the scheme, or the protocol. And this just means that this URL is going to be securely accessing the server thanks to the HTTPS instead of HTTP. Mouthful. But just to get some vocabulary out there. Questions on these here URLs that we've probably been taking for granted for years? 

AUDIENCE: Who approves .edu? 

DAVID MALAN: Really good question, who approves .edu. So you have to be in an accredited educational institution to use .edu. I don't recall the name of the organization that does this. But it can't be anyone on the internet. You actually have to apply and be a seemingly legitimate educational institution. 

That is not true of a lot of domain names. Anyone can buy a .com. Anyone can buy a .org, a .net, not a .gov, for instance. And then different countries might have their own policies over who can be in what domain or subdomain as well. 
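The URL vocabulary above (scheme, host name, domain name, top level domain, path) can be teased apart with Python's standard `urllib.parse` module. This is a sketch with a hypothetical `describe` helper. It naively treats the last two labels as the domain name, which is an assumption that does not hold for every suffix, such as those with a second-level country code.

```python
from urllib.parse import urlsplit


def describe(url):
    """Break a URL into the parts named in the lecture."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "scheme": parts.scheme,  # what comes before the ://
        "fully_qualified_domain_name": parts.hostname,
        "host_name": labels[0] if len(labels) > 2 else None,
        "domain_name": ".".join(labels[-2:]),
        "top_level_domain": labels[-1],
        "path": parts.path or "/",  # no path means the root of the website
    }


info = describe("https://www.example.com/folder/file.html")
bare = describe("https://cs50.dev")
print(info)
print(bare)
```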

All right. So now that we have URLs so defined, there's a couple of verbs with which to be familiar in the context of the web, namely GET and POST. And that is to say, there's two different ways to request information from a server. That is, there's two different ways to format requests that go inside of this envelope. 

And the default, daresay, and the most common one is just what's called GET, literally the verb, the English verb get. And we'll see in a moment what this means exactly concretely. But just know that there's an alternative that we'll play with over time known as POST. 

And whereas GET, as the verb suggests, is all about just getting information, POST, as the verb kind of suggests, is more about sending information. So POST is used when you submit a credit card. Because you're sort of sending potentially sensitive information. 

POST is used when you upload an image to a website or the like, but GET is used when you're just clicking on links and visiting web pages and not really pushing any information to the server. So for today, we'll focus primarily on GET. So what does this mean? 
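To make the GET versus POST distinction a bit more concrete, here is a minimal Python sketch that builds one request of each kind without sending anything over the network. The host `example.com`, the paths, and the form field are placeholders invented for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# A request with no body: the standard library treats it as a GET.
get_request = Request("https://example.com/search?q=cats")

# A request that carries data in its body: treated as a POST.
# The form field below is hypothetical.
form = urlencode({"name": "David"}).encode("ascii")
post_request = Request("https://example.com/register", data=form)

print(get_request.get_method())
print(post_request.get_method())
```

Nothing is transmitted here; the objects only describe what would go inside the envelope.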

Inside of this envelope, probably unbeknownst to you up until now, are messages that look like this. These are HTTP messages that are being put automatically in these virtual envelopes for you by your Mac, your PC, or your phone. So for instance, if you were to visit HTTPS://www.harvard.edu, you would hit Enter. What your Mac, PC, or phone is going to do is put a textual message that looks literally like this inside of a virtual envelope, address it on the outside to the appropriate IP address for harvard.edu using your own IP address as the source address, and then hand it off to some nearest router. 

But inside of this envelope is enough information to the server to know what it is you want. So for instance, GET is the verb. So you just want to get some information. 

The information you want to get is /, which I defined earlier as just the default page on the website. HTTP/2 just means what version of HTTP we're talking about. You'll see nowadays in the wild, 1.1, you'll see 2, you'll start to see version 3 over time. But I'll use 2 for all of my examples here. 

And you'll see inside of this envelope, too, what we're going to start calling an HTTP header, a single line of text that literally tells the server what fully qualified domain name it's looking for. And this is important only insofar as nowadays, generally on a server, you might have multiple websites being hosted. This is not going to be true probably of Google or of Microsoft or Meta or massive companies like that. But it's definitely going to be true of smaller enterprises, even places like Harvard that don't need thousands of web servers, but maybe just a couple, or maybe just a few. 

So in this case, this ensures that when the server receives this packet, it knows to serve up harvard.edu and not yale.edu or some other website that, by coincidence, might just be hosted on the same server because both Harvard and Yale are maybe paying the same cloud provider to host their websites. So dot dot dot just means there's other HTTP headers. But notice the colon here is just giving us yet another one of those key value pairs. The key is Host. The value is www.harvard.edu. There, again, are those dictionaries that I claimed we would continue to see all over the place. 
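The request described above can be sketched as plain text in Python. This is only an approximation of what a browser sends: the helper `build_request` is hypothetical, and it writes the message in the textual HTTP/1.1 style, with headers supplied as a dictionary of key value pairs.

```python
def build_request(path, headers):
    """Assemble a textual GET request: a request line, then one line per header."""
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{key}: {value}" for key, value in headers.items()]
    # A blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_request("/", {"Host": "www.harvard.edu"})
print(message)
```

A real browser would add many more headers, which is what the dot dot dot on the slide stands in for.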

What then comes back from the server? If this is what's inside the message from browser to server, what does the server send back? Ideally, the server sends back a message that looks like this, an acknowledgment of what version is being used, a status code, which is going to be an arcane looking number, like 200. 

It's going to then have another HTTP header of its own saying what type of content is in this envelope, ideally, something called text/html. That is hypertext markup language, which we're about to see. And then some other stuff. That's what's coming back from the server to the browser. 
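Going the other direction, here is a hedged sketch of how the top of a response might be read back into a status code and a dictionary of headers. The sample text is hand-written to resemble the slide, not captured from a real server, and `parse_response_head` is a name invented for this example.

```python
def parse_response_head(text):
    """Split a response's first lines into version, status code, reason, and headers."""
    status_line, *header_lines = text.strip().splitlines()
    version, code, *reason = status_line.split()
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return version, int(code), " ".join(reason), headers


# Hand-written sample, modeled on the lecture's slide.
sample = """HTTP/2 200
content-type: text/html
"""

version, code, reason, headers = parse_response_head(sample)
print(version, code, headers["content-type"])
```

Header names are lowercased here so that lookups do not depend on how a given server capitalizes them.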

And we can actually now see this. Let me actually go over to VS Code here. Let me maximize my terminal window just so we can see more at once. And let me go ahead and type in this command, curl -I https://www.harvard.edu/, so a complete URL that's secure, that's got the host name of W-W-W. 

And curl just means connect to a URL. It's a command line program that comes with Linux and comes with Mac OS. On Windows, you might have to install it individually. And it just lets me simulate being a browser. It's going to let me simulate sending a packet like this without caring what the website actually looks like, so no pictures, no images, no text, no nothing, just what's inside of the envelope in terms of the server's response. 

And here's mostly dot dot dot, the ellipses I raised my hand at earlier. There's a lot of these key value pairs. But if I scroll up to the top, you'll see that 200 is the status code that came back. And you'll see that the content type is indeed text/html. 

And there's a whole lot of other stuff here, clearly. A lot of this is diagnostic. It reveals information about the server that might be useful generally to more technical people than me at this point in the conversation, or maybe my Mac or my PC or my phone. For now, we can focus really on just the essence of this response, which is this here. 

But here's where even these arcane numbers might start to get a little more familiar, in fact. Suppose that I want to see this in my browser. Actually, let me do this. Let me go back to VS Code here. Let me open up incognito mode here, which generally is to give you private browsing, so to speak. 

And we'll talk more about this next week. In incognito mode or private mode, you have no history, you have no cookies, you have no sessions, terms we'll define next week. I'm going to use it again and again today to make sure that my browser is essentially starting from scratch, freshly, so that I don't have anything in my history from previous examples. And what I'm going to do, actually, first is open up, via my browser's menu, so-called developer tools. These are going to look a little different in Chrome versus Edge versus Firefox versus Safari versus other browsers as well. 

But almost any modern browser, whatever your favorite is nowadays, has built into it developer tools. And you might have to click a different button to access it. But these are tools for developers, like, web developers that want to not just use the browser to go places, but use the browser to develop their own websites and web applications. 

Now there's a whole bunch of tabs here. And I'm going to focus on the Network tab initially. Essentially, this is like diagnostic information, kind of like debug50, like a debugger. But it's specific to the web and the web browser here. 

So with my developer tools open and with the Network tab open, I'm going to go up to the URL bar and type in https://www.harvard.edu/. So the exact same thing that I typed in curl a moment ago in my terminal, I'm just typing in my browser like I would normally do. 

And if I hit Enter, what's interesting about developer tools, and let me go ahead and drag them to the top and maximize the window, is you see all of the HTTP requests, all of the virtual envelopes that just went instantaneously it would seem back and forth between my Mac here and Harvard's own web server. And notice it's way more than a single envelope. It's way more than a single request. 

Why? For now, assume that each of those rows of output represents maybe a sound that was downloaded, a video, an image, some text. There's all sorts of media in web pages nowadays. And they might actually be spread across multiple files. Browsers are designed, if you will, to recursively get all of the media for a single web page and download it automatically with we humans only typing the URL itself once. 

But watch this. At the very top of this output, I scrolled all the way to the top of my network tab. I'll see a request, a row that represents my original request for the website. And if I zoom in here, we'll see that 200 means apparently OK. So all is well. Here's the contents of the website. 

But there's a lot. In fact, if I look at the very bottom of the window, harvard.edu is composed of 91 separate files it would seem. And that's just the landing page itself, not to mention everything else we might click on ultimately. 

But 200, OK, is a good thing. And odds are you've never actually seen that, because it's, indeed, OK. So let's consider actually what else could happen when you make these requests. 

Well, here, for instance, is a shorter request. Suppose that I omit the W-W-W just because it's faster to type. And honestly, you and I are almost always, nowadays I bet, in the habit of just typing something.com, or something.edu. We don't bother typing the HTTPS, the so-called scheme or protocol. We probably don't bother typing the W-W-W. 

You can probably think of someone in your life who's very pedantic like that, typing it out in its full. But you don't need to do that typically for a couple of reasons. If I, in fact, go back to VS Code here, let me use curl again to connect to another URL that's similar, https://harvard.edu. 

Now notice before I went to W-W-W. And that's indeed Harvard's preferred URL, if you will. But harvard.edu will still work. 

But watch what happens when I hit Enter. I'm going to get back the contents of the virtual envelope that Harvard just sent back to me. But it's not OK. It's not 200 anymore. 

It's actually this number here, 301, which actually means something specific. 301 actually means that Harvard's website moved permanently, so to speak. In other words, Harvard, Yale, any server can configure itself to redirect the user to another place if they prefer to canonicalize on some other URL. So by default for branding purposes, most websites still probably use www.something.something. So Harvard is, in fact, doing this. And for reasons we'll talk about next week, there's technical motivations to do so related to something called cookies and sessions. 

But for now, that just seems to be a different status code. But if I now open up another browser window and I'll do this again in, let's say, how about incognito mode, just to start fresh with a brand new window. Let me open my developer tools again. 

Let me go to the URL bar and only type https://harvard.edu/ Enter. I'm still in my network tab here. And if I scroll to the very top of this, notice, ah, the top row looks a little different now. It's not 200 anymore. 

And I can click on that here. And what I'm now seeing in yellow is that 301, AKA, moved permanently. So this is to say you've been able to do this all this time in your browser if you care to. You can see what's going on underneath the hood, if you will. Check that off, I think. 

Underneath the hood so as to just understand what's going on. Now for users, this is not that useful or intellectually interesting. But for developers, this can be very useful for understanding things and also diagnosing problems ultimately. 

So that's just a couple of the status codes that can come back, not just 200, but perhaps 301. There's also this one now, with which humans generally are familiar, 404. Well, it turns out 404 is what happens when a file is not found. 

So I can simulate that here. Let me go back to VS Code and my terminal window. Let me do curl -I https://www-- because Harvard prefers that-- harvard.edu/cats. Let's see if there's a page about cats within Harvard's website. I'm pretty sure there's not. 

And so, indeed, when I hit Enter, a whole lot of output, a lot of HTTP headers. But notice at the top, 404. It's File Not Found. 

Now what you see in the browser is going to completely depend on the website. Some websites just display an error message or a status code number. And that's why you and I have seen probably in the world 404 messages. 

Sometimes they're much more user friendly. Sometimes there are links back to the home page to help you out. It's entirely up to the server. But that status code indicates that something has gone wrong. 

And in fact, there's a whole bunch of these status codes. Some of which you'll now start to see in the class. 200's OK. And it's a good thing if you never see that, because it means everything's working. 404 is not found. 301 is moved permanently. 

Any of these that start with 3 relate to redirects. Long story short, there's different ways to redirect the user from one place to another, as we saw from that location header a moment ago. 400s are generally bad. It means that the user, the browser somehow did something wrong. Like, 403 forbidden probably means you're not logged in. 

500 you're going to start doing next week most likely. 500 is, like, the segfault of the web, if you will. So there's no pointers or anything like that. But 500 means that you wrote some buggy code, as invariably we all will next week. 

418 is an April Fool's joke from years ago. Some servers honor this. But someone wrote up literally this long technical document proposing a response that says I'm a teapot for a server, even though it was just a joke on April 1 some years ago, so sort of geek humor, if you will. 
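The families of status codes just listed can be looked up with Python's standard `http.HTTPStatus`. The grouping table and the function name `describe_status` below are assumptions made for this sketch; the family labels paraphrase the lecture.

```python
from http import HTTPStatus

# Family labels keyed by the first digit of the code.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return the code, its standard phrase, and the family it belongs to."""
    phrase = HTTPStatus(code).phrase
    family = FAMILIES.get(code // 100, "other")
    return f"{code} {phrase} ({family})"


for code in (200, 301, 404, 500):
    print(describe_status(code))
```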

So those are then the status codes that are available to us. Let me show you one other. Has anyone been to this URL here? So you have? 

All right. So without spoiling here, let me actually-- well, let me go into incognito mode here. Someone's pulling it up on their phone, clearly. 

Safetyschool.org/ Enter. Oh, my goodness. Another box gets crossed out today, too, I think. So how is that working? 

Well, if we actually diagnose this with curl-- let me go into VS Code, curl -I HTTP-- and it doesn't support HTTPS because this is an old website-- ://safetyschool.org/ Enter, all this server does is return an HTTP 301 response with a location that literally refers us back to yale.edu. And this is amazing. Someone has been paying for this domain name for decades. And all it does is literally this. 
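A redirect like that one takes very little code on the server side. Below is a self-contained Python sketch, not the actual configuration of any real site: it starts a throwaway server on the local machine that answers every request with a 301 and a Location header, then asks it for its headers the way `curl -I` would. The destination URL is a placeholder.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "https://www.example.edu/"  # hypothetical destination


class RedirectHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        # Answer with "moved permanently" and say where to go instead.
        self.send_response(301)
        self.send_header("Location", TARGET)
        self.end_headers()

    do_GET = do_HEAD  # treat GET the same way

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    conn.request("HEAD", "/")  # headers only, like curl -I
    response = conn.getresponse()
    status = response.status
    location = response.getheader("Location")
    conn.close()
finally:
    server.shutdown()
    server.server_close()

print(status, location)
```

Note that `http.client` does not follow redirects on its own, which is why the 301 itself is visible here; a browser would immediately request the new location.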

Now I know for our friends at Yale who are watching this, it's not quite fair to poke fun. It turns out Yale got us even better. So later today, we'll turn the tables a little bit. All right. 

So let's go ahead and take a look now at what it is that composes this web page when it is indeed 200 OK. Let's introduce another language here, or an actual language called HTML, which is not a programming language but is a markup language. Which is to say it's all about aesthetics, like, mocking up a web page so that you can see the information you care about. But HTML is not going to have functions and loops and conditionals and all of that stuff we talked about in Week 0. It's just about presenting information. 

So here are some of the building blocks of HTML. You're about to see really only two vocabulary words. HTML honestly is the kind of language that you learn in, like, 30 minutes and then you're just kind of off and running with online tutorials, documentation, and the like. I still remember years ago just learning it from a teaching fellow who kind of gave me a crash course and then you kind of fill in the blanks yourself because it has relatively few concepts associated with it. Even though, in fairness, it can take years to get good at making pretty websites, today we can get good very quickly at making functional websites, so that artistic disclaimer. 

So in the world of HTML, there's really two concepts, tags and attributes. And those of you who have played with websites growing up might be familiar with some of these already. So here is some sample HTML. 

HTML is just a text-based language. You type it out with your keyboard. Again, it's not a programming language. So you can't call functions or write logic. But you can mock up a web page. 

And this web page, for instance, is quite simply going to say hello, title in its title bar, or the tab. And then the body of it, the big white box, it's going to say hello comma body, just to distill this really into its essence before we make more interesting pages. So what's going on in this HTML is enough detail that the browser can display the information for it. 

So in fact, let me go ahead and reveal this as follows. I'm going to go over to VS Code here. I'm going to create a new file here called, for instance-- let's just call it hello.html. 

And I'm going to really quickly whip up that same web page from memory. So DOCTYPE html html lang equals quote, unquote "en" close bracket. Open head, open title, hello comma title. And then down here, open body. 

And you'll notice I'm actually not quite as fast as I might seem to be. VS Code is configured to automatically finish half of my thought for me. So when I open one of these things that we're about to call tags, VS Code is doing some of the heavy lifting for me. And in here, I'm going to do hello comma body. 

But I think this is the entirety of the file that I just proposed in the slide version thereof. So this is clearly now a text file in my code space within VS Code. How do I actually view it with a web browser? 
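For reference, the page dictated above can be written out as follows. This Python sketch just stores the HTML in a string and saves it as hello.html in the current folder; the exact indentation is a guess at what was on screen.

```python
from pathlib import Path

# The page as dictated: a doctype, an html element with lang="en",
# a head containing a title, and a body containing some text.
HELLO_HTML = """<!DOCTYPE html>

<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""

Path("hello.html").write_text(HELLO_HTML, encoding="utf-8")
```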

So if this file were created on my Mac or PC, I could literally double click it and Chrome or my default browser would open up and show me this web page. But this file, technically, is not on my Mac or PC. It's in the cloud. It's in your code space. 

So all that we need to do is actually turn on a web server to serve this file to me or to anyone else in the world, in fact. And the command we're going to run now is literally called http-server. This is a piece of software that someone else wrote that we pre-installed in everyone's code space. And by running this, it starts a server whose purpose in life is to listen for HTTP requests. And as soon as it receives one from a browser, be it mine or anyone else's, it will respond with the contents of that file. 
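The idea of a server whose only job is to hand back files can be sketched with Python's built-in `http.server`. This is a stand-in for the `http-server` program used in class, not that program itself: it serves a temporary folder containing a small hello.html, requests the file once, and shuts down.

```python
import http.client
import tempfile
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path

PAGE = '<!DOCTYPE html>\n<html lang="en"><head><title>hello, title</title></head><body>hello, body</body></html>\n'


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress the per-request log lines for this demo


with tempfile.TemporaryDirectory() as folder:
    Path(folder, "hello.html").write_text(PAGE, encoding="utf-8")

    # Port 0 lets the operating system pick a free port.
    handler = partial(QuietHandler, directory=folder)
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/hello.html")
        response = conn.getresponse()
        status = response.status
        content_type = response.getheader("Content-Type")
        body = response.read().decode("utf-8")
        conn.close()
    finally:
        server.shutdown()
        server.server_close()

print(status, content_type)
```

The server here listens only on the local machine; the class tool additionally exposes a public URL through Codespaces.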

So let me go into VS Code here. Let me reopen my terminal window. And I'm going to go ahead and literally run http-server Enter. 

And now you'll see a whole bunch of output, most of which isn't germane to our discussion yet. But here is this URL here. And if I hover over it, I'll see a little Open URL pop up that I can click on, or on my Mac, I can Command click on the URL itself, and that will open up in a new tab this folder. 

So this is going to look a little esoteric at first glance. But this is what's called a directory listing. It's just literally the contents of the folder that I'm in. 

So I'm in my Codespaces default folder. I deleted everything from last week and weeks prior. Your folder will, of course, have many other things that you've created and kept. I have a Source 8 directory that I downloaded in advance because it's got all of today's examples made in advance. 

But there's the file I just created. And there's some other information here, like the date and time at which I created this file, and so forth. But you'll see that this is just a web page that lives at this URL here. And this is actually somewhat specific to Codespaces, the infrastructure we're using. 

But if I zoom in up here, you'll see that I am effectively running my own web server at this weird looking URL that GitHub dynamically generated for us, for me. And you'll have a different unique one as well. You'll see that baked into this URL is actually a port number. 

And they're doing some trickery. Normally, I would have to access this web server at Port 80, or 443, or even 8080. And the reason for this is because cs50.dev, that is to say Codespaces, the tool that we're using in the cloud, is obviously itself already a web server. And it's GitHub's web server that's listening already on Port 80 and 443. 

So if I want to run my own web server on their web server, I just have to pick another port number. And so what you're seeing in the URL here is a hint of that. By convention, the program I just ran, http-server, does not try to use 80. It does not try to use 443. 

It uses 8080 by default. And that's why you see it in the URL here. And underneath the hood, that virtual envelope actually contains Port 8080. Because this is not an official web server. This is not CS50.dev or GitHub.dev. This is little old me trying to serve up my brand new hello.html file. 

But the point here is this. When I click on this file, I should see the results of my hard work. And there is a big white box, otherwise known as the view port, inside of which are the only words in the body of my page, hello, body. And if I scroll up further, you'll see in my tab here hello comma title. 

So this now maps back to the code we just saw. Here is the HTML that I just pulled up in my browser. And it is what told the browser what to do visually. 

So let's walk through this top to bottom. This first line here is what's called the document type declaration. Honestly, you just copy paste this nowadays. And it means hey, browser, I'm using version 5 of HTML. 

Odds are in some number of years, this line might change over time to indicate different versions. But for now, this just means I'm using the latest version of this HTML language. That's kind of an anomaly, because you're not going to see this exclamation point again. Everywhere else, you're going to see a lot of less than signs and greater than signs or angled brackets, so to speak. 

But they're almost always going to be symmetric, as follows. This tag here, this is an HTML tag, says, hey browser, here comes my HTML. And this is what's known as an attribute. So anything after the name of a tag is what we'd call an attribute. 

And attributes can have values. Those values are associated with the attributes with an equal sign and typically quotation marks, single quotes or double quotes, as in this case. So here we, again, have that paradigm of a dictionary, key, value, pairs. They're everywhere in computing, even though the syntax obviously keeps changing, whether we're in SQL, or Python, or now HTML. 

This tag at the very bottom now means hey, browser, that's it for my HTML. So when you see a tag that looks like another, but starts with a forward slash-- and you do not need to repeat the attributes, that would just be very annoying to have to type it here and here, we keep it succinct-- this is what's known as a close tag, or an end tag that conceptually corresponds to this start tag or open tag. So they're sort of symmetric. 

Inside of that are two children, so to speak. So there's actually a notion of a family tree-type hierarchy here, or a tree, as we've discussed in data structures. The HTML tag, per the indentation here, has one child called head and another child called body. Everything between the start tag and an end tag here is what's also generically known as an element. 

So this is the head element. This is the body element. A bunch of new vocab, but it's not that intellectually interesting. It's just jargon that we'll use. 

Here means hey, browser, here comes the head of my page. So like the very top of it, which generally for now means just the title bar. In fact, this means, hey, browser, here comes the title of my page. 

And then here, notice there's no more angle brackets. This is literally raw text. And this is why we saw in the actual gray tab of my browser hello comma title. This means, hey, browser, that's it for the title. This means, hey, browser, that's it for the head. 

Meanwhile, down here, hey, browser, here comes the body of my main page. Like, 90-plus percent of the page inside of the so-called viewport, the big rectangular region, hey, browser, here comes the body. Hey, browser, that's it for the body. 

What is in the body? In this super simple case, literally just hello comma body. That's it. 

So HTML really is that pedantic. It just tells the browser start doing this. Stop doing this. Start. Stop. Start. Stop. And that's how it knows what to do, top to bottom, left to right when actually reading the code therein. 
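That start, stop, start, stop rhythm can be observed with Python's built-in `html.parser`, which reports each start tag, each end tag, and each run of text as it reads the page top to bottom. The class name `TagLogger` is invented for this sketch, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each start tag (with attributes), end tag, and piece of text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (key, value) pairs, so it converts neatly to a dict.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text
            self.events.append(("text", data.strip(), {}))


page = (
    '<!DOCTYPE html><html lang="en"><head><title>hello, title</title></head>'
    "<body>hello, body</body></html>"
)

logger = TagLogger()
logger.feed(page)
logger.close()

for kind, name, attributes in logger.events:
    print(kind, name, attributes)
```

The first event should be the html start tag carrying its `lang` attribute as a key value pair, and the last should be the matching html end tag.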

All right. Questions about any of this here HTML code. And yeah, in front? 

AUDIENCE: Would browsers be considered a HTML interpreter? 

DAVID MALAN: Say that again? 

AUDIENCE: Would browsers be considered a HTML code interpreter? 

DAVID MALAN: Oh, yes. I think that's fair. The question is, can browsers be considered HTML interpreters? 

Yes, I don't think people tend to call it that. Interpretation generally implies that you're parsing something that's logical in nature, your functions, loops, conditionals, and so forth. Parser is a term you might indeed hear much more often. A parser is a piece of software that analyzes code, analyzes text top to bottom, left to right, breaks it down into chunks that have semantic meaning, like the tags, like the attributes, like the elements that we're talking about and then it displays them, in this case. There's not as much to interpret in quite the same way. But that's reasonable, nonetheless. Yeah? 

AUDIENCE: With all the frameworks, do you think it is worth learning HTML from scratch or just use a Bootstrap [INAUDIBLE]? 

DAVID MALAN: Really good question. With all the frameworks out there, should you bother learning HTML and writing it from scratch or using frameworks, like something called Bootstrap. Well, we'll spend a few minutes today talking about that very framework. 

But even frameworks like Bootstrap absolutely assume that you know something about HTML, something also about something called CSS, more on that in a bit, and better still, something about JavaScript. If you really don't want to know and understand these things, that's when you reach for like a third party service, like Squarespace nowadays or Wix, where you really just click and drag and drop and create websites that are, at the end of the day, still HTML. But the developers at Wix and Squarespace have automated the process with a graphical user interface or GUI of letting you create it. 

But even then, most web developers, or even just business people who want to create their own website and they're not programmers themselves or technical folks, they might still like to know a little something about HTML, CSS, and JavaScript because then you can open like the advanced settings and configure things. And indeed, that's a frustration that you'll tend to feel if you can't quite drop down conceptually to that level. 

All right. So just to make this a little more-- to give you more of a mental model for this, this indentation is not strictly necessary. Kind of, like, in C, where we care, where style50 cares, but not Clang, about what your code looks like. 

Similarly, browsers are pretty tolerant. You can have all of this white space, all of this pretty indentation. Or you cannot. It's not going to care, generally, one way or the other. 

However, this is certainly much more readable. But we'll see next week as we start to generate HTML automatically, it's not always important that the code you generate be pretty printed. But when you're writing in this format, it absolutely should be when you're collaborating or submitting to other people. 

So this, though, is what we would call a tree representation of this. So here is that hierarchy. So if we think of the whole web page as what's generally known as a document, that document has a root element called the HTML element, with its open HTML tag and its close tag. 

It has, as I claim, two children. The head tag has one child title. And then both of those leaf nodes, or leaves to borrow the family tree vernacular, have text nodes of hello, title and hello, body, respectively. 
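The tree just described can be rebuilt in a few lines of Python. This is a simplified sketch, assuming every tag on the page has a matching close tag, as in hello.html; the names `TreeBuilder` and `outline` are made up for the example.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested tree of elements and text nodes using a stack."""

    def __init__(self):
        super().__init__()
        self.root = {"name": "document", "children": []}
        self.stack = [self.root]  # the element currently being filled is on top

    def handle_starttag(self, tag, attrs):
        node = {"name": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append({"name": repr(text), "children": []})


def outline(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node["name"]]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed(
    '<!DOCTYPE html><html lang="en"><head><title>hello, title</title></head>'
    "<body>hello, body</body></html>"
)
builder.close()
print("\n".join(outline(builder.root)))
```

The printed outline places html under the document, head and body under html, title under head, and the two text nodes at the leaves.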

So this is going to be useful later today because it turns out, with JavaScript, an actual programming language, we can start to modify this tree in the computer's memory or RAM and make the page dynamically change by essentially creating new HTML on the fly, even if that didn't come from the server. Case in point, many of you use Gmail or maybe Outlook. Generally speaking, you don't have to reload the page to see if you've got new mail. It just magically appears at the top of the page in kind of a stack. And it just keeps pushing old mail down, down, down. 

Well, that's going to be the result of some JavaScript code updating this tree in memory. And it has the effect of just dynamically generating more and more HTML that represents your email's inbox, for instance. All right. 

So with that said, why don't we go ahead and actually try this out in a couple of ways. So let me go back to VS Code here. Let me propose to actually tweak my code here a little bit. 

So let me go into, let's see, my VS Code editor here. Let me zoom out. And notice down below, actually, all this time as I was clicking on hello.html, my HTTP server program actually is outputting sort of the logs from a server. It turns out any time you request a page with a browser from a server, that server is probably logging a little something about you. 

One, it's probably logging your IP address. Two, it's probably logging the type of browser you're using, Chrome, or Safari, or Edge, or something like that. It's probably logging the operating system version you're using, be it Windows, or Mac OS, or Android, or iOS, or the like, and maybe some other information as well. 

We won't dwell on this today. But there's a lot of information that will be logged about you, even if you are in incognito mode or private mode. So more on that next week. 

And today, unlike all past lectures, even though by default you see this in your own code space, you see here a ports tab, which for the most part is not that useful for us today. But you will see that this row here mentions HTTP server. Why? 

Because in my terminal, that command is still running. It is a server. And it's just there living to serve now by waiting and waiting and waiting for me to click on more of those links. And every time I do click on a link, I'll see another line of output here. 

But it turns out that all this time in your ports tab of VS Code, you can see all of the TCP ports, for instance, that are in use. Now generally, you haven't needed any of those, at least for your own work. But notice that HTTP server is indeed listening on Port 8080. 

CS50 has some of its own customizations. And this is a bit of a geek Easter egg. But we presume to use Port 1337, which perhaps those more comfortable will know what it means. This is like leetspeak. So it actually spells Leet if you're cool and use a 1, two 3s, and a 7. OK. 

So anyhow, we chose that port number. But there are some conventions. Next week we're going to actually start using Port 5000, which isn't in use at the moment. But long story short, you can see this stuff underneath the hood. And indeed, we're just sort of peeling back some of these layers that have been there now for some time. 

Well, let's go ahead and do this. I'm going to go ahead and create another terminal window using my plus icon down here in the console. Notice that at the right-hand side of my screen, I now have two bash shells. Bash is the name of your prompt, so to speak, where the dollar sign is. 

If I click on the first one, there's HTTP server. It's still running. And I want it to keep running today. 

But I'd also like to be able to run more commands in my code space. So I've simply created a second terminal. And I can go back and forth by clicking at right. Let me go ahead now and copy hello.html like that and create a brand new file called-- how about paragraphs.html? 

And in paragraphs.html, I'm going to first paste all of that. I'm going to hide my terminal window now without stopping HTTP server. And I'm going to go ahead and just create some paragraphs of text. 

And in fact, let me go ahead and cheat here real quick. I'm going to go ahead and, in my other window here secretly, open up a whole bunch of text so that I can grab some Latin-like text to copy paste. So now I'm back. And all I did was secretly copy paste a whole bunch of text. 

I'm going to make a couple of changes to this file, where I currently have just a title and a body. One, I'm going to rename this to paragraphs, just so I can keep straight which file is which. And down here, I'm going to go ahead and paste in a big paragraph of text. 

This is not actually Latin. It's sort of lorem ipsum text, which is Latin-like random words that are meant to look like Latin. And typographers historically used this as sort of placeholders for actual text. But notice this is a pretty decently long paragraph. And so it's going to make my web page a bit bigger. 

So let me go back to my other tab, where I have hello.html open from before. Let me click back. And now notice, in my directory listing, I have a new file, paragraphs.html. Let me go ahead and open that up. 

And voila, there is a big paragraph of text. Just for fun, let me create three such paragraphs. So I'm going to cheat temporarily and just copy and paste two more times. 

But I'm going to separate it with a blank line, as you would in, like, Google Docs or Microsoft Word for paragraphs in English or any language. And I'm going to go back to my paragraphs. Nothing has changed yet because HTTP, just like the exercise with Phyllis and Brian, requires that we send the packets back and forth if we want to get updated content. 

So I have to click my browser's reload button, or hit Control R or Command R, depending on your browser or OS, and notice that when I do that, I definitely get more text. But it just looks like one big blob, not three separate paragraphs. What might your intuition be for why that is, even though I've clearly indented this and given blank lines between? Yeah? 

AUDIENCE: HTML doesn't care about the whitespace. 

DAVID MALAN: Yeah. So HTML doesn't care about the whitespace or technically, more than one whitespace. I can hit as many Enters as I want. All of them are going to be ignored except for a single space. It's going to be normalized to just a single space. 

In general, this is useful. Because it means I can pretty print my HTML and indent things visually, even if I don't want the browser to indent anything manually for me. But here's where we're going to need some more tags. 

And it turns out the simplest fix for this problem is to use the paragraph tag. And for short, it's just open bracket p close bracket. I'm going to be a little pedantic, and even though VS Code is being a little annoying because it's trying to autocomplete my thoughts but I don't want it to autocomplete just yet, sometimes you have to fight with the text editor. So these autocomplete features have upsides and downsides. 

But I'm going to go ahead and put a paragraph tag, open and close around each of these paragraphs. And I'm going to maintain my indentation, just to keep it visually clean on the screen. And now I'm going to put this one last close tag on this line here. 

And so it's a lot more verbose. But notice that it's effectively telling the browser start a paragraph, end a paragraph, start a paragraph, end a paragraph, and so forth. So if I go back to my other tab and I click Reload, now we have some semblance of what I expected, which is three separate paragraphs in this case. 
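As a rough sketch of what paragraphs.html might look like at this point: the lorem ipsum text is shortened here, and the exact wording of each paragraph is an assumption rather than the text used in lecture.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- Each p element starts and ends one paragraph. The blank
             lines and indentation are only for the human reader. -->
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>

        <p>
            Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        </p>

        <p>
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.
        </p>
    </body>
</html>
```

Without the p tags, the browser would collapse the blank lines between these blocks into single spaces and render one blob of text.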

All right? So that's the p tag, the paragraph tag. P for short, because as you'll see, many of these tags are abbreviated just because they're slightly faster to type. Let's do another example. 

Let me go back to VS Code here. I'm going to copy this text. I'm going to create a new file called-- how about headings.html? And I'm going to paste this, close my terminal just to give me more room. I'm going to rename the title to be Headings, just to keep straight which is which. 

I'm going to delete all of these paragraphs to make it-- actually, no, let's not do that. Let's keep the paragraphs. But like an academic paper or a textbook, let's give these chapter headings or section headings or the like. 

Well, I could just do something like this. How about 1? And then down here I could put 2, and then down here I could put 3. 

But of course, if I reload this, it's not really going to look as I-- whoops, if I go back to this directory listing, open up headings.html, it's fine. It's not super pretty. 

But it would be nice to give a little more prominence to these headings. And in fact, there's a bunch of tags for this. I can use H1, for instance, for one really big heading. And then let me close the tag over here and indent. 

Then down here-- and, again, whitespace doesn't matter, so I'm going to give myself a little bit of breathing room just so it's clear which of these is which. For this, maybe it's not Chapter 2, but Section 2. So let me do H2. 

And then inside of this, I'm going to go ahead and do 2. And just to be clear, I don't have to put these on their own lines. I'm just doing that to be a bit pedantic. You can technically just do this and keep everything on one line. But I'll be consistent, at least. But either approach is fine. 

And then, down here, I'm going to use maybe a sub subsection. So let me delete this and do h3. I'm just going to write the word three. And then just to be neat, I'm going to indent everything like this here. 

So now if I go back to headings and I reload, I'm going to get some default formatting. It might not be the formatting you want, but it looks like it's big and bold, but in decreasing order. H1 is the biggest. H2 is smaller. H3 is even smaller. 

And you can go down to h6. And it gets smaller and smaller. And at that point, if you've got, like, sub sub sub sub subsections of your book or paper, you're probably organizing it poorly. So they stop at some point there. 
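A minimal sketch of the headings described here, meant to sit inside the body of headings.html. The heading text (One, Two, Three) and the placeholder paragraphs are assumptions for illustration.

```html
<!-- h1 is the most prominent heading; h2 and h3 render progressively smaller -->
<h1>One</h1>
<p>Lorem ipsum dolor sit amet.</p>

<h2>Two</h2>
<p>Lorem ipsum dolor sit amet.</p>

<!-- Keeping a heading on one line is equally valid -->
<h3>Three</h3>
<p>Lorem ipsum dolor sit amet.</p>

<!-- h4, h5, and h6 also exist, each smaller by default -->
```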

All right. Well, what else can we do in HTML? These things are omnipresent. Let me copy this HTML and close that tab, open my terminal, and create a new file, like, code list.html. And let's make a list of information. 

Let me just paste that HTML, just to save some time today, and change my title to list. Let me get rid of all of these paragraphs, just to simplify things. So now I'm sort of back to where I began. 

And then inside of the body of this page, let me go ahead and make a list, like foo, bar, baz. If you've never heard these words, these are, like, computer scientists' go-to words. A mathematician might choose x, y, and z by default. CS people tend to go with foo, bar, and baz for historical reasons. So here's a list of three arbitrary words. 

If I go over to my other tab, go back to my directory listing, there's my new file. Let's click on list.html, same problem. It's a list. 

But it's not one after the other. Last time, of course, we fixed this with paragraphs. But you know what'd be nice? To make it a little prettier, like a bulleted list, which are kind of everywhere these days. 

So I could try to simulate this. And you might be in the habit of doing this in some programs. But of course, if I go back to my other tab, Reload, I'm just sort of making the problem worse visually. 

But it turns out-- let me undo that-- there is an unordered list tag, otherwise known as ul for short, that I can put all three of these words in an unordered list. Let me go ahead and indent everything consistently. But to have three items in this list, I actually need another tag, a list item tag. And I'm going to go ahead and add that tag there, list item here and there, and then another list item tag here. 

And here's where it's a stylistic choice. I could move foo and bar and baz onto their own lines. But this is going to start to get excessively tall, like, too much white space. So reasonable people will disagree. But this feels a little more readable to me. 

So I'm going to leave it as such. Go back to my other tab. And now when I reload, you get a nice bulleted list by default. And you see these all over the web. 

What if I want to have a numbered list, that is to say, ordered list? Any instincts for changing these bullets to numbers? So ol is a good instinct. And, indeed, sometimes HTML makes perfect sense. 

As in this case, if I change ul to ol, I don't have to manually number anything. Because when I reload, it's going to use Arabic numerals automatically for me like this, top to bottom. And what's nice about this is, if I go in and I insert things in the middle, I don't have to manually renumber things. The browser is going to do the counting for me. 

And if you're doing an outline, you can actually specify whether you want 1, 2, 3, or A, B, C, or I, double I, triple I, or so forth. There's different numbering systems you can use. But by default, we get our decimal numbers here. 
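The two kinds of list can be sketched side by side as follows, as a fragment for the body of list.html. The third list uses the ol element's type attribute to pick a different numbering style; the lecture does not name that attribute, so treat its use here as one possible approach.

```html
<!-- Unordered list: rendered with bullets by default -->
<ul>
    <li>foo</li>
    <li>bar</li>
    <li>baz</li>
</ul>

<!-- Ordered list: the browser does the numbering (1, 2, 3) -->
<ol>
    <li>foo</li>
    <li>bar</li>
    <li>baz</li>
</ol>

<!-- type="A" asks for A, B, C; "a", "i", and "I" are also accepted -->
<ol type="A">
    <li>foo</li>
    <li>bar</li>
    <li>baz</li>
</ol>
```

Inserting a new li in the middle of either ordered list requires no renumbering by hand.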

I'm going quickly. But it's hard to get too excited about bulleted lists and such. But any questions on these tags thus far? 

We'll by design try to escalate quickly momentarily. All right. So how about just a few other tags to make things more visually interesting? Let me go ahead here and cheat by opening up a file that I made in advance that's going to demonstrate what a table looks like. 

So here let me open a file that I brought with me called table.html. And because I brought it with me, I actually included a comment at the top. And in fact, if you download today's files from the website, you'll see that they're generally commented, like our C code and Python code was. 

It's a little weird. But here is the syntax for a comment in HTML. It's a less than sign, or open bracket, then an exclamation point, then dash dash, two hyphens. Then at the very end of the comment, it's almost the opposite but not quite. It's hyphen hyphen close bracket instead. 

Why these symbols? Humans probably decided years ago that there's no way someone's going to accidentally type or rather, intentionally type those characters visually. So let's use them for comment symbols as well. If you really want to type them, there is a way around that. 

But here is my table title. And here is just kind of a little, maybe, guessing game. Here is a table tag with a tr child. And here's the closed child. 

And then there's a bunch of td tags. So I'll give you tr stands for table row. td stands for table data, AKA cell, to borrow language from, like, spreadsheet software. Does anyone want to guess what this file is going to look like if I open table.html in my browser? What is this reminiscent of? Yeah? 

AUDIENCE: Num pad. 

DAVID MALAN: Yeah. So it's like a numeric keypad from a phone, for instance, if you're dialing someone's number manually. So let me actually go to my other tab. Go back to my directory index. 

There's table.html. And it's not going to look very pretty. But it is structured in the way I might expect. 

And in my browser, I'm going to go ahead and just zoom in. Command plus or Control plus will generally do this. It does look like it's laid out tabular in rows and columns with everything very nicely aligned. 
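A sketch of what a file like table.html could contain, including the comment syntax described earlier. The cell contents are an assumption based on the phone keypad that the audience guessed.

```html
<!DOCTYPE html>

<!-- Demonstrates a table: tr is a table row, td is a table data cell -->

<html lang="en">
    <head>
        <title>table</title>
    </head>
    <body>
        <table>
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```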

So that might be useful as we get to larger and larger data sets. Let me go back to VS Code here. Let me create one more program, for instance. 

And how about code image.html? And just to save time, let me paste that code. And also, let me secretly copy over a file that I brought with me that we've seen in the past. 

Let me close my terminal. I'm going to delete everything about tables from this file because I'm just saving time by copying and pasting. But I'm going to rename the top to image. I'm going to get rid of the comment because it's no longer applicable. 

But in the body of this page, I'm going to link to maybe an image of the Weeks bridge by the river. So I'm going to use an image tag, img for short. 

And now, huh, it's obviously not going to be sufficient to just say image tag. Because what image? So here is where attributes, again, get useful. 

This attribute earlier, though I didn't quite highlight it, seems to indicate that this page is largely in English, as have been my past ones, the Latin one aside. That attribute on the HTML tag is useful for browsers that have Google Translate or something similar built in. And also, it's useful for SEO, search engine optimization. Because when Google and Bing sort of automatically crawl my web page in the future, they'll know what language I intend for most of the content to be in, which might help them index it and keep track of it for search results. 

So here, for the image tag, I'm similarly going to need an attribute. And that attribute is called source, src for short. And what you put inside of its quotes for its value, double quotes or single quotes, is the name of the image that you want to include. And I included in advance in my codespace a file called bridge.png from Week 4, when we played around with images. 

And if I go ahead now and go to my other tab, go back to my directory index and zoom out, you'll see now not only bridge.png, portable network graphic, which I manually copied in, but also image.html, which I just created. And voila, here is that same Weeks bridge. It's a little too big for my browser window. We'll see in a little bit how we can fix things like that. But indeed, that's an image that's now embedded into the page. 

But notice, if this image were ever broken, or if I had visual difficulties such that I might have screen reader software for accessibility installed, it's generally good practice to also make sure that pages are as accessible as possible. And so some tags have additional attributes you can include. Like, for an image here, there's actually an alt attribute that specifies alternative text to describe this image. 

This is what a human would see if they have a very slow internet connection. And before the image downloads, they can see this alternative text. Or if I'm blind and need a screen reader, I could have these words recited to me verbally by providing this clue. So it's best practice to include this, like, photo of bridge so that all users can know what they're looking at, clicking on, or the like, so keeping that in mind, too. 
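Put together, the image tag with both attributes might look like this in the body of image.html. The file name bridge.png and the alternative text come from the lecture; the order of the attributes is arbitrary.

```html
<!-- src names the image file to embed; alt is shown or read aloud
     when the image cannot be seen. img has no closing tag. -->
<img alt="photo of bridge" src="bridge.png">
```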

All right. Let's do one other piece of multimedia. Let me close these two tabs. Let me open my terminal and open up a file called video.html. Let me go ahead and copy, secretly, a file called video.mp4, which is a common video file format, and close my terminal window and go ahead and paste in here some HTML from before. 

But let's now embed a video file, as you might if making a video-based website. Let me rename this one, too, to video. Let me get rid of the old comment, which is not applicable. 

And it turns out videos are almost as simple. There is a video tag. There is a bunch of different attributes we can put on that. But I'll come back to that in a moment. 

But videos, because you might want to have high resolution, low resolution, depending on people's bandwidth, because these things can be big, they actually have source children. And confusingly, it's actually S-O-U-R-C-E, not S-R-C, in this case. And even more annoyingly, it takes an attribute called source, spelled src. 

This is not good design. But this is what we're stuck with, video.mp4. And then the type of this video, which you could generally look up if the browser doesn't recognize it, video/mp4. This is what's known as a content type or mime type. 

And then, I can actually configure this. And you would only know this by taking a class, reading a book, looking at an online reference. I can actually add some video controls to the website, like a Play button, a Pause button, and all of that. I can mute the video by default. And so this is just going to modify the behavior of this video tag. But this is anomalous. For some attributes, it just doesn't make sense to have values. Because muted, it sort of says all the information we need. We could do, quote, unquote, "true." But humans decided years ago not to bother with that. 

So some attributes do not need a value. So you do not need equal signs or quotation marks. And you would only know this from, say, documentation. All right. 
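A minimal sketch of the video markup being described, for the body of video.html. It assumes a file named video.mp4 sits next to the page, as in the lecture.

```html
<!-- controls and muted are attributes without values: their presence
     alone turns the behavior on -->
<video controls muted>
    <!-- The child tag is spelled source, but its attribute is src.
         type gives the content type (MIME type) of the file. -->
    <source src="video.mp4" type="video/mp4">
</video>
```

Several source children can be listed, for instance one per resolution or format, and the browser plays the first one it supports.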

Let me go back to my directory listing. Let me go back here to this here. You'll see that there's now not only video.mp4, but also video.html. 

I hope you'll forgive me for this. There's at least no sound. But when I click on this page, it embeds a video here, which I can then click on the controls for. And you see some short video file playing here, albeit without sound. 

All right. Enough of that, let me go back here to my VS Code. And let's play around now with what the web is really known for, which is hyperlinks. So hypertext markup language, HTML, is all about linking one site to another, one page to another. And nothing we've done thus far is interactive beyond this video's own controls. 

So let me go ahead and do this. Let me go into VS Code here. And let me go ahead and create the simplest of files that just allows me to click on a link. 

So let me go ahead and copy this to save time, open up VS Code's terminal window. Code a file called link.html. I'll close my terminal. Paste this code. Rename video to link. Get rid of the actual video tag. 

And in the body of this page, let's do something simple like invite people to visit, for instance, Harvard. All right. If I now go to my directory index and reload, we'll now see link.html. 

And of course, this doesn't really do anything useful, because I literally just used English text. All right. Well, what if I do what you're in the habit of doing on social media and various websites, visit harvard.edu. 

Let me go back to the web page, reload. The text changes. But it's clearly not automatically linking. I still can't click on this. 

All right. Well, maybe it needs to be www.harvard.edu. Let me go back, reload. All right. Still not auto linking. 

Let me go over here. And maybe it needs https:// and a slash at the end, like a full URL. Let's go over here, reload. And it's still not working. I can highlight and copy it, but that's not very user friendly. 

So what's going on? Well, all of today's social media sites, when you copy paste a URL, someone at the server side wrote code, be it in Python or JavaScript or anything else, to automatically notice and detect URLs and then wrap them with HTML tags that actually hyperlink them. So what I actually need to do here is this. 

I'm going to introduce an anchor tag, a for short. The hyper reference attribute of which is the URL that I want to send the user to, so href for short. I'm going to close the tag. 

But then, in between the start tag and close tag, I'm now going to put the text that I want the human to see. So it's a lot more verbose. But this is what websites like social media sites are generating automatically for you when they just detect with a pattern that you have typed something that indeed looks like a URL. 
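As a sketch, the working link in the body of link.html could be written like this. The visible word Harvard is an assumption about the exact link text.

```html
<!-- href is where the browser goes on click; the text between the
     start tag and the close tag is what the human sees -->
Visit <a href="https://www.harvard.edu/">Harvard</a>.
```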

Let me go back to VS Code. Let me go back to this tab here and reload. And now we actually see a working link. And this is going to be super small. You're not going to be able to see this quite well. 

But if you hover over this link, you'll generally see in most browsers a little clue at the bottom as to where you're going to be directed before you click there. This can help if you're a little suspicious and might not want to click on the actual link. It's small on my screen, but hopefully more visible on yours. That's not generally the case on mobile, though, in quite the same way. 

But notice that this very simple primitive of anchor tags like this can pretty quickly be abused, unfortunately. In fact, let me go ahead here and go back to VS Code. And I could do something malicious like this, like, actually trick someone into applying to Harvard instead of Yale by just changing the href to not match the text that the human is seeing. And if I reload the page here, you'll see that it looks like I'm going to Yale. But notice, super small, bottom left-hand of my screen, it still says the real URL. 

But you can get even more malicious. You can not just say Yale. You could literally say https://www.yale.edu/. You can make it look like a real URL, reload it. And now it's really quite malicious. 

And this is representative of what you all probably know already as phishing attacks, P-H-I-S-H-I-N-G, whereby you're being socially engineered. People are trying to dupe you into clicking something that leads you to your PayPal account, typically, so that you log into some bogus website. Now you've given them access to your account and you're out some money. It's this simple because of, unfortunately, these building blocks of HTML. 
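To make the mismatch concrete, here is a sketch of the two misleading variants described above. Both still lead to the URL in href, whatever the visible text claims.

```html
<!-- The visible text says Yale, but href points at Harvard -->
Visit <a href="https://www.harvard.edu/">Yale</a>.

<!-- Visible text that looks like a full URL is more convincing still,
     yet the destination is unchanged -->
Visit <a href="https://www.harvard.edu/">https://www.yale.edu/</a>.
```

Hovering over either link in a desktop browser reveals the real destination, which is why that small status text is worth a glance before clicking.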

All right. With that said, any questions on this? No? All right. How about just for one final flourish before snacks will be served, let me propose to introduce some final features herein. 

It turns out, and I'll open some of these premade already. Let me open up VS Code and open up a file called meta0.html. This has nothing to do with Meta, the social media company. It has to do with metadata, or specifically, meta tag. 

It turns out that in the head of the web pages that we've written thus far, we've only had titles. But it turns out there's actually literally a tag called meta that has a couple of attributes like name and content. And this one here, it's a little arcane, but it's very common to copy paste these into the source code for websites nowadays because essentially, this makes them mobile friendly. 

Instead of making the font some default small size, it will take into account the width of the phone or the tablet and sort of scale the font proportionally. So there's some useful accessibility and user friendly tips like this. There's other use cases for meta tags like this. 
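The transcript does not spell out the tag in meta0.html, so the following is an assumption: it shows the widely used viewport meta tag, which matches the mobile-friendly behavior described.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <!-- Asks mobile browsers to lay the page out at the device's
             own width instead of a zoomed-out desktop width -->
        <meta name="viewport" content="initial-scale=1, width=device-width">
        <title>meta</title>
    </head>
    <body>
        Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    </body>
</html>
```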

Let me open a file called meta1.html that I made in advance. Here are three meta tags inside of this file. They're using a property attribute with a content attribute as well. And this is a little more specific. 

But nowadays, too, on social media, when you copy and paste a URL into a message online and hit Enter, you very often see a preview of that link. It's sort of automatically generated. It makes a nice pretty image and some nice fonts. 

Where does that image come from? Where does that information come from? From these meta tags, any web page can have meta tags like this so that when this page's URL is copy pasted into social media sites or others, those sites know what preview to show to humans. It comes literally from the values of these tags. 

So for instance, this would create some user friendly preview that says CS50, Introduction to The Intellectual Enterprises of Computer Science and The Art of Programming. And in this case, it would show a picture of a cat as the default image for that particular page. You have full control as a web developer over those kinds of things. 
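A sketch of three such tags, which belong inside the head of the page. The og: property names follow the Open Graph convention that many sites read for link previews; the specific values, including the image file name cat.jpg, are assumptions based on the description above.

```html
<!-- Title shown in the link preview -->
<meta property="og:title" content="CS50">

<!-- Short description shown under the title -->
<meta property="og:description" content="Introduction to the intellectual enterprises of computer science and the art of programming.">

<!-- Image shown alongside the preview -->
<meta property="og:image" content="cat.jpg">
```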

Lastly, when it comes to features of HTML, let's go ahead and quickly reimplement Google, if we may. So let me go ahead and create a new file here called search.html. Let me copy paste some code to save time. 

Let me go ahead and get rid of all of these meta tags to make a different point with this one. Let me get rid of that comment. Change this title to be, say, search instead. 

And inside of the body here, let's do this. I'm going to introduce a form tag. And now in the form tag, I'm going to create an input, a text input. And let's go ahead and let's just say that. 

And now I'm going to have a button that has, let's say, button, that has a value of search, so super simple and not yet complete. But let me go to my directory index and back. Let me open up search.html. 

And I actually have the beginnings of a search form, an interactive form for the web. But it doesn't actually do anything useful yet. But let me do this. Let me go to the actual google.com. Let me search for something like cats, C-A-T-S. 

And of course, we're going to see a whole bunch of cats here. And we're going to see that the search box was automatically populated at the very top of the page. Now the URL that Google led me to, even though I started at the very simple google.com, is actually pretty long. And I'm going to frankly just delete anything I don't understand. Because I'm going to distill this URL to just this one here. 

It turns out that in URLs you can also put user input in the form of key value pairs. So in any URL, you can actually have not only a path like we saw earlier, you can have a path with a key and a value prefixed with a single question mark. And in fact, if you want to have two keys and values, you just interpose them with an ampersand instead. 

So this is to say there is a standard way in HTML and really HTTP for sending input from a browser to a server. And it's generally formatted like this. What this means is actually this. Let me zoom out, close that tab, and open a brand new one. And let me manually go to-- and I'll zoom in-- https://www.google.com/search?q=dogs. 

Now it has to be q, because that's what Larry and Sergey of Google fame decided two decades ago when they made Google itself. Q stands for query. But they could have called that key anything they want. 

I'm going to hit Enter after zooming out. And what you'll see is that I don't need google.com to search for me. I can literally go to a URL of all of the dog search results manually. 

Now no one's normally going to do that. That makes no sense. But it does suggest how simple the mechanics of the web are. If you want to pass input to a server, you suffix the URL with a question mark, key equals value. Key equals value, maybe separated by these ampersands, as I proposed. 

So what does this mean? Well, Google really did the hard part, the back end, the database. They crawled the internet and found all of these cats and dogs. 

But I can make the front end, that is the user interface that still works for it. And I'm going to do this. I'm going to add an attribute to my form tag that specifies an action attribute of https://www.google.com/search. And I'm going to specify that the method I want the browser to use is indeed get. 

This is inconsistent. I capitalized it as all caps before. In HTML, you actually do it as lowercase. But that's also the default. So strictly speaking, I don't even need to specify that. But I will, just to be pedantic. 

Inside of my input, my text box, which used to look like this, just a big white rectangle, I'm going to actually give it a name of q, because I know that's what Google servers expect. And I'm also going to specify-- eh, just that for now. 

Let me go back now and reload. And it's going to still look very simple. But notice this. 

If I type in cats and click Search, in just a moment, I'm going to be whisked away from my own Codespaces URL ending in search.html to, after zooming out and clicking Search, the actual google.com. Which prepopulates the URL with q equals cats up top, prepopulates this text box with the user's input, which is to say, like, the front end of google.com is trivial, as is most every website. It's as simple as these key value pairs and things like web forms like that. 
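A minimal sketch of the form at this stage, for the body of search.html. Whether the button is written as a button element or as an input of type submit is an assumption; either submits the form.

```html
<!-- On submit, the browser requests
     https://www.google.com/search?q=... using the GET method,
     where ... is whatever the user typed -->
<form action="https://www.google.com/search" method="get">
    <!-- name="q" becomes the key in the URL's key=value pair -->
    <input name="q" type="text">
    <button type="submit">Search</button>
</form>
```

Typing cats and submitting leads to a URL ending in /search?q=cats.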

Now I can make this a little prettier. And just so you've seen it, if I specified that the type of this input isn't text, which is the default, but is search, I actually get some nice features. Let me reload this now. 

And if I start typing in, like, dogs, now I get this little x to click, which clears it. So a lot of websites have that. It's a little bit of a nicety. 

If you don't know what you want the user to type in, you can actually be kind of explicit for them. And you can add a placeholder attribute that says query or keywords or whatever you want to show them. If I go back to the browser and reload, you'll see a grayed out text that's not actually there. It goes away if I type in bird, for instance. 

But it's explanatory, placeholder text for the user. You'll notice that it wants to autocomplete cats or bird or dog or anything I've typed before. You can disable that. 

There is an attribute called autocomplete whose value can be either on, which is default, or off, which can be explicitly specified. And notice this, too. When I reload the page, it's actually annoying in terms of user experience. 

Before I can search for anything, I have to move my cursor, I have to click in the text box. And now it has focus, so to speak. It gets highlighted in some color, usually blue. 

That's not the best website. Why are you making the users pick up their mouse or their trackpad just to click on the only thing they're going to do anyway? So there's another attribute that's handy, autofocus, which will just move the cursor there for the user. So this is to say, even though a lot of websites don't do this, there's a lot of functionality that you can enable by just knowing the language all the more. 

So with that, we now have a pretty useful feature. In fact, heck, I can say this is Google Search, change the value of that button, reload. And now I'll go ahead and type in birds, Enter, and voila. Now we have a whole bunch of birds as well. 
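With all of those attributes applied, the form might end up looking like the sketch below. The placeholder wording and the use of a button element are assumptions.

```html
<form action="https://www.google.com/search" method="get">
    <!-- autocomplete="off" suppresses suggestions from earlier input;
         autofocus places the cursor in the box on page load;
         placeholder shows grayed-out hint text until the user types;
         type="search" adds niceties such as a clear (x) control -->
    <input autocomplete="off" autofocus name="q" placeholder="Query" type="search">
    <button type="submit">Google Search</button>
</form>
```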

So that's a lot. I think it's definitely time for a snack. So let's take a 10-minute break for a snack. And when we come back, we'll make all of this look prettier. 

All right. So we are back. And it was brought to my attention during break that we were pretty darn close to clearing one of these rows. And I will concede that your classmates, Darwin and Jude, socially engineered me into saying one of the remaining squares that they needed. And so I'm sad to say that bingo was declared during break, which Carter has already confirmed, because I was tricked into giving a long answer to a short question. 

So congratulations to those two. I do dare say, too, that whole bit with safetyschool.org probably isn't going over well in New Haven. So I'm pretty sure we can check off this box here. 

However, as promised, in fairness, since we love them both equally, I thought it only fair to resume now with a look at perhaps one of the best Harvard-Yale pranks that was actually on us, with this 2.5-minute glimpse at how our classmates at Yale pranked Harvard some years back. If we could dim the lights now for this. 

[VIDEO PLAYBACK] 

[MUSIC PLAYING] 

[CHEERING] 

[BAND MUSIC PLAYING] 

- All the way at the top and then you pass it down. 

[CROWD NOISE] 

- [INAUDIBLE] this for you, Yale. We love you, Yale. 

- We're here to cheer for Harvard. 

- Yeah! Go Harvard! 

- Go Harvard! 

- [INAUDIBLE] one and pass it down? 

- Pass them down. 

- Great. 

- It says go Harvard. 

- We're nice. 

- You see that [BLEEP]? 

- Look at them. They have the paper! 

- It's going to happen. 

- It's actually gonna happen! 

- I can't [BLEEP] believe this! 

- What do you think of Yale? 

- They don't think good. 

[LAUGHTER] 

- It may be a complete mess. I don't know. 

- Dude, does everyone have it? Does everyone have their stuff? Does everyone have their stuff? 

- The probability that it's going to be legible it's very small, though. 

- I agree. 

- It's too complicated. 

- [INAUDIBLE]. 

- I know. But it's too complicated. 

- What houses are you guys in? 

- [INAUDIBLE]. 

- That's not a real house. 

- How many extra are there? 

- Ho-fo. 

- Yeah. 

- You guys aren't from Harvard, are you? 

- Fo-ho. 

- Pforzheimer. 

- Yeah, but you said ho-fi. 

- Just make sure everyone has it. 

- Well, she's probably drunk. 

- It looks like they're still passing. Are all the cards distributed? 

- [INAUDIBLE]. 

- All right. Let's do it now. 

[CHEERING] 

- Hold up your signs! 

- [BLEEP]. 

[CHANTING] 

- You suck. You suck. You suck. You suck. You suck. You [BLEEP]. 

- Did it. 

- [BLEEP]. 

- You suck. You suck. You suck. You suck. You suck. You suck. 

- What do you think of Yale, sir? 

- [INAUDIBLE]. 

- One more time! One more time! 

- Oh, and there it goes again! 

[CHANTING] 

- Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! Harvard sucks! 

[END PLAYBACK] 

DAVID MALAN: So fair is fair there. So now back to some HTML. And we will transition momentarily then to this other language, CSS, by which we can style things all the more. 

So there's this feature in HTML that's actually present in Python, even though we didn't use it yet, and that's present in JavaScript and, really, most modern languages known as regular expressions. Which is otherwise known as regexes, which is a way of using patterns to validate input or to extract information from strings. And so by that I mean this. 

Let me go over to VS Code here. Let me go ahead and create a new file called register.html. I'm going to copy paste some code from earlier, just to save some keystrokes. And in here, I'm going to go ahead and change my title to register. And in my code, I'm going to go ahead and create a very simple form representative of a registration form now. 

So in this body, I'm going to do a form tag. I'm not going to bother sending it to Google or to any server in particular here. I'm going to give it an input tag with autocomplete equals off, as before. I'm going to have autofocus on, as before. 

I'm going to give this form field the name of email this time instead of q. I'm going to give it a placeholder of quote, unquote "email," just so that the user knows what they're supposed to type. And it turns out that browsers have not only type text or search, but also type email, whereby you can rely on the browser to ensure that the human has actually typed in an email address. 

Now I'm going to go ahead and have a button that this time will be called Register. And now let's go over to my other tab, reload my directory index. There's register.html. And we'll see a relatively simple form field now. 
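As a rough sketch, register.html at this point might look something like the following. The surrounding boilerplate (doctype, lang attribute) is an assumption, since the transcript only says it was copied from an earlier example.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>register</title>
    </head>
    <body>
        <!-- No action attribute: the form is not sent to any particular server -->
        <form>
            <!-- type="email" asks the browser to check for an email-like value -->
            <input autocomplete="off" autofocus name="email" placeholder="email" type="email">
            <button>Register</button>
        </form>
    </body>
</html>
```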

But it's prompting me to register with some email address. If I go ahead and sort of type in just my name and try Register, you'll notice that the browser sort of yells at me with the built-in error message saying, oh, please include an @ in the email address. And it's pretty good in that if I do malan@, but nothing more, which is also not valid, and try to register, it's telling me still that it's incomplete. 

So built into browsers is some defense against incorrect user input in this way. If I finally do type in malan@harvard.edu and click Register, then the form would be submitted successfully to the server. If, though, I want to tolerate only .edu addresses because I'm making an education-themed website for students in the US, I can actually add another attribute here, which is actually quite useful, too. 

I can add a pattern attribute. And inside of its value, I can put one of these things called a regular expression, or a regex. That is an actual pattern that the browser should match the user's input against and make sure it indeed matches. 

And this is going to look a little cryptic. But I'm going to go ahead and do this. .+@.+ backslash dot edu. Now this looks a little weird. But it turns out I'm using certain building blocks that we'll just scratch the surface of today. But it's an incredibly useful and powerful feature in programming languages more generally. Because in the world of regular expressions, there are certain patterns that mean something. 

And here's a really good URL of some documentation therefor in the world of the web and JavaScript, specifically. And here's kind of a short cheat sheet, some excerpts thereof. It turns out in the world of regular expressions or patterns, a dot represents any single character except line terminators, like backslash n. 

A star or an asterisk represents 0 or more times. A plus means one or more times. A question mark means 0 or one time. A number inside of curly braces means n times, or n occurrences. And then two numbers in curly braces, n comma m, means at least n times, but at most, m times or occurrences. 

And then there's a few others, actually. So what does that mean? Well, let me go over to VS Code again. And let me zoom in on the pattern I used. 

And it would seem that, in this case, a dot represents any character. Plus means one or more. So one or more characters to the left of an at sign, then literally the at sign. Then another dot plus means one or more characters to the right of the at sign. 

But the whole thing has to end in .edu. But there's this additional backslash before the last dot. And why might that be, intuitively? Even though I've not said? Because I want a literal dot, a literal period, not any one character there. 
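The pattern described here can be sketched as an attribute on the input tag. The other attributes are carried over from the form as described; the comment spells out each piece of the regular expression.

```html
<form>
    <!--
        .+    one or more of any character
        @     a literal at sign
        .+    one or more of any character
        \.    a literal period (escaped, so it is not "any character")
        edu   the literal letters e, d, u
    -->
    <input autocomplete="off" autofocus name="email" pattern=".+@.+\.edu" placeholder="email" type="email">
    <button>Register</button>
</form>
```

Browsers match the pattern against the entire value, so there is no need to anchor it explicitly with ^ and $.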

So I escape the period to make it have no special significance per this cheat sheet, but rather be a literal period. So what this means is if I go actually back to VS Code here and I try to claim to work at like malan@harvard.com and click Register, that's a valid-looking email address. But when I click Register now. 

Whoops! Sorry. It went through because I did not reload the page after making the change. So I screwed up. 

Let me go back to the register.html URL. Let me reload the page and type in malan@harvard.com, for instance, and even-- sorry. Let me type in malan@harvard.com. And even though it's a valid-looking email address, it does not in fact end in .edu. So the browser can defend against that in this way. 

But the more important takeaway for now is that as useful as this is, as user friendly as this is, this is not generally the best technique for validating user input and protecting against invalid user input. Why? Browsers can't be trusted. Or more generally, clients can't be trusted. 

Why? Because the way HTML works as we've seen it thus far is that everything is happening on my own Mac, or your own PC, or your own phone locally. Per the envelope story we told earlier, your browser is downloading the HTML, reading it top to bottom, left to right, and then displaying it on your computer. 

But we've already seen that my computer, for instance, has built into it these developer tools. And there among the tabs here, are not just that network tab, let me actually go to the Elements tab, which we haven't seen previously. In the Elements tab, you actually will see a pretty-printed version of the same HTML. 

But what that means is that you can not only see the HTML, you can actually change it. Now you're not going to be able to change it on the server. But I can absolutely change my own copy thereof. 

So suppose I'm now a hacker in the story. And I really want to register for this website, but it's apparently restricted to people with .edu addresses. I don't have a .edu address, let me propose. 

So that's fine. Let me actually go into the developer tools. Let me just double click on the attribute there, highlight it, and boom. Now gone is that pattern entirely. 

The web browser now will let me register with malan@harvard.com because the developer tools give you full-fledged access to the underlying HTML. So if I've changed the HTML, the defense is no longer in place. So the takeaway, then, is that client-side validation is wonderfully user friendly. 

But it's not secure. It's not safe. So next week, we'll spend more time server-side at making sure that even if someone messes with my HTML or my website, they still can't actually get through and do anything bad on the server. And this is true in general, too. 

Let me actually, just for fun, go to, maybe, let's say, harvard.edu. Let me open up my developer tools. And let's see where I might go here. 

Suppose that I want to hack into harvard.edu. Well, notice that I'm on my elements tab and there's a lot of HTML that composes this page. And notice that these triangles indicate that most of it's been collapsed. But if I expanded them, I could see more and more of the tags and attributes. 

But suppose I'm now a hacker. And I want to maybe delete this menu. Notice that you can also right click or Control click on any element in a web page, typically, with these developer tools, click Inspect or some similarly named menu option, and you can actually be whisked away to the actual HTML tags that implement that feature of the web page. 

One, it's wonderfully useful for learning how things work, teaching yourself new tricks, and even fixing problems. Here, though, I'm going to try to use it maliciously. And I'm going to highlight this tag here, div tag, as it's called. I'm going to delete it. 

And watch what happens at top right. Gone is the menu. Now, of course, if you go to harvard.edu right now, the menu is still there. If I reload harvard.edu the menu is back. 

So it's only my own local copy. But this does speak to how you should not trust anything happening client side. Because someone can be mutating that same code. 

Now it turns out there's other patterns that you can use in regular expressions. For instance, these are what are called character classes. You can, for instance, specify in square brackets some number of digits or characters that you want to match against. 

This is a range of characters, 0 through 9. So it's effectively the same thing as that, but easier to type. There are certain shortcuts, backslash lowercase d means any decimal digit. Backslash capital d means anything that's not a decimal digit. And dot dot dot, there's bunches of other patterns. 

You might use these to maybe validate a phone number in a web page, if you want it to be formatted in a certain way, for better or for worse. But long story short, regular expressions will be, someday, your friend as you try to solve certain problems with data. 
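For instance, a phone number field might be sketched like this. The specific format (three digits, a hyphen, three digits, a hyphen, four digits) and the field name are assumptions chosen for illustration, not something shown in the lecture.

```html
<form>
    <!--
        \d{3}   exactly three decimal digits
        -       a literal hyphen
        \d{4}   exactly four decimal digits
        [0-9]{3} would mean the same thing as \d{3}
    -->
    <input name="number" pattern="\d{3}-\d{3}-\d{4}" placeholder="617-555-0100" type="tel">
    <button>Submit</button>
</form>
```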

As an aside, it does escalate quickly. So this is typically the regular expression that browsers nowadays use to validate email addresses. It is way more complicated than .+@.+. 

Why? Because you can't have @@@.edu. There's certain characters you don't want to allow. There are certain characters you do want to allow. So long story short, this is a much larger regular expression that is more correct when it comes to valid email addresses. 

All right. So with that said, there's one tool with which you should be familiar. And that is at this URL here, validator.w3.org. And this is a free web service from the World Wide Web Consortium, which is the group that essentially standardizes this HTML language. 

And if you go to their web page, there's a few different ways to validate your own code. Essentially, check it for correctness by typing in its URL, otherwise known more generally as a URI, by uploading a file or by direct input. So just for kicks, for instance, I'm going to go into VS Code and grab my HTML that I just made. 

I'm going to go back to validator.w3.org and paste it into the direct input box and click Check. And it's just a nice handy website that, if I scroll down, in green, you will hopefully see this, no errors or warnings to show. So it's a handy feature just to make sure that at least syntactically your code is correct, even if it's not behaving the way that you might want. 

All right. With that said, the second of today's three languages, and we'll just scratch the surface ultimately of JavaScript to give you a sense of its capabilities, but CSS is something that's worth understanding some of the basic building blocks thereof. So let me propose that there are some additional terms to know. 

In the world of CSS, we're, again, going to have key value pairs. In this world, they're called properties instead of attributes. Why? It was invented by different people, but it's the same kinds of ideas. 

In the world of CSS, you're going to have ways of specifying different selectors, as they're called. That is to say we're going to be able to specify the font size, the color, the margins and a lot of aesthetics when it relates to tags in our web page. And there's going to be different ways to select those tags, as we'll soon see. 

In an HTML page like this, this is our super simple one with which we began, it turns out that you can also include a style tag in the head of the page that has some of your stylistic decisions, font sizes, colors, margins, and all of those kinds of aesthetics. We'll also see another approach whereby you can relegate all of that stuff to a separate file, like styles.css, or something .css. And you can link to it in the head of the page. 

Link here does not mean A, like, ideally our anchor tag before would have been called a link. But it's not. This just means that these two files are linked in some way conceptually. All right. 

So that is to say we can use these kinds of tags now to enhance our own code. So let me propose that we do this. Let me go into VS Code here. Let me go ahead and create a very, very simple home page for someone like John Harvard by running code of-- how about home.html? 

And in home.html, I'm going to copy paste some of my starter HTML from before. And now in the body of this page, I'm going to do a few things. I'm going to have a web page with a paragraph up here that just says John Harvard as the title thereof. Another paragraph that says something simple like welcome to my home page exclamation point. And then, like, a footer at the bottom and a third paragraph that's a copyright, say, John Harvard, for instance. 

So super simple, but representative of a header, a main part of the page, and a footer thereof. If I go into my other tab and reload my directory listing, I will see now home.html. And it's going to be pretty bare bones, right? 

It's the same text, same font size. It is three separate paragraphs. But let me start to stylize this a little bit differently. Let me make the top bigger and bolder, perhaps, or rather, the top bigger and centered and make this text shrink thereafter. 

So I'm going to go ahead and do this. It turns out that you can have not necessarily a style tag, but even more simply, a style attribute on certain tags, like this. I'm going to add a style attribute that has a font size of maybe large. And how about a style attribute here, a font size medium. And then maybe down here-- oops, close quotes. 

And then down here-- whoops. Thank you. OK. I owe you some cookies. All right, so style here of font size small, so relatively simple ideas. 

And here is just another stupid syntax for key value pairs. Again, left hand is not talking to right hand. In CSS, cascading style sheets, which is the language we're now talking about, it's key colon value. In HTML, it's key equals quote unquote value. It's just different techniques for the exact same dictionary-like idea. 
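A sketch of the body of home.html with these style attributes, based on the three paragraphs as described (the exact wording of the text is taken from the description and may differ slightly from what was typed):

```html
<body>
    <!-- HTML attribute syntax: key="value"; CSS property syntax inside it: key: value -->
    <p style="font-size: large">John Harvard</p>
    <p style="font-size: medium">Welcome to my home page!</p>
    <p style="font-size: small">Copyright (c) John Harvard</p>
</body>
```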

All right. If I go back to my other tab and reload, notice that it's a little subtle, but it is large, medium, and small. I didn't center things yet, so let me do that. It turns out that this thing collectively is what's called a property. And a property is defined by a key value pair. 

If you want to have multiple properties, or key value pairs, in CSS, you separate them with semicolons. So those are back. And if I want to center the text, I can do text-align: center. 

I could now end my thought with the semicolon. It's not strictly necessary. But I'll keep it just so that I'm consistent. But it's only necessary if you have more than one. 

I'm going to go ahead and center everything, though. So I'm going to go down here and add a semicolon after medium, down here and add a semicolon after small. So I align, text-align center, center, center for all three paragraphs. If I go back to this other tab and I reload, voila. Now it is, in fact, centered. 

But here's where we can start to have a conversation about, maybe, design. So I claim this is correct. But is this perhaps the best design? Well, maybe not. 

I mean, these aren't really paragraphs, first of all, semantically. It's not even complete sentences. But there are three different divisions of the page, right, like, the header up there, the main part in the middle, and then the footer. So it turns out, and we saw a glimpse of this in Harvard's source code, there's another tag instead of p for paragraph called div, for division. 

And even though this is actually not going to have much of a functional effect at first, it's maybe semantically a bit better. Because, again, these aren't really paragraphs. So if I really want to nitpick, I do have three divisions of the page. So div is a very common way to give yourself just a rectangular region of the page to style as you see fit. 

If I go back now and reload, notice that it does tighten things up. The paragraph tag gave me some vertical whitespace for free. So I've lost that. But I could add it back if I really wanted to. 

But now, let's come to this question of design. What's redundant about what I've done thus far, even if you've never seen CSS before? Yeah? 

AUDIENCE: [INAUDIBLE]. 

DAVID MALAN: Yeah. I mean, I had to center all three divs, which is just sort of stupid, it would seem. Copy paste has generally not been necessary. Even though I'm doing it to save time today in general, when the results are copied and pasted, ultimately, this has not been good practice in any of our languages. 

So it turns out I can do this. Let me actually delete this one. And I can keep or get rid of the semicolon, but I'll get rid of it for parity with our first version. I'm going to get rid of this one, too. 

And you know what? Here's the C in CSS: cascading. It's more like a waterfall effect. 

And if I go up to a parent tag here, like, the body is the parent of all three divs, I could put the style attribute here and say text-align: center there. And that has the effect of cascading down onto all three of the children that are nested inside of it. 

So now it's sort of better designed because I've only said text-align: center once. If I go back to the web page and reload, it has no functional impact visually. But it's better design. Because if I want to align it left, or right, or center, I can change it in one place and not three independent places. 
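Roughly, the page body at this step might look like the following sketch, with text-align stated once on the parent and inherited by the three child divs:

```html
<!-- text-align: center cascades from body down to each nested div -->
<body style="text-align: center">
    <div style="font-size: large">John Harvard</div>
    <div style="font-size: medium">Welcome to my home page!</div>
    <div style="font-size: small">Copyright (c) John Harvard</div>
</body>
```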

All right. What else might I change after this here? Well, it turns out that I could do something a little clearer as well. This copyright symbol? I mean, it's just sort of homemade with two parentheses and a C. 

It turns out that there are ways to get special symbols in HTML. And you can use what are called HTML entities. You would only know these by looking them up or memorizing the numbers. 

But it turns out that number 169 is the special HTML entity for an actual copyright symbol. So let me zoom in here and then reload. And you'll see that the parenthetical C actually becomes the proper mark for copyright, so marginally useful. Or you could copy paste it from some other website, for instance, if you didn't know how to type it on your own keyboard. So that's an HTML entity, another feature with which to be familiar. 
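In markup, a numeric entity is written as an ampersand, a hash, the number, and a semicolon. A minimal sketch of the footer line using entity 169:

```html
<!-- &#169; renders as the copyright symbol -->
<div style="font-size: small">Copyright &#169; John Harvard</div>
```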

But having three divs on a page isn't necessarily ideal nowadays, especially for search engine optimization, SEO, for screen readers for accessibility. Because at a glance, I don't really know which of these divs is the most important. Arguably the footer is generally for the human reader, like, the least information-bearing piece of content. 

So why don't I try to signal as much to the browser, to the screen reader, to the search engine? So it turns out there are what are called semantic tags nowadays. Indeed, we're up to version 5 of HTML. 

And one of the relatively newer features is, instead of using generic divs, you can actually use actual names of tags, like header and main and even footer. And here, too, the visual effect is not going to be any different if I go here and reload. But there's more semantic information underneath the hood. So that, again, all of those different types of services, the browser, the screen reader, and the like just know a little more about the page. And maybe a screen reader now would focus on the main part of the page before reciting all of the fine print in the footer, for instance, to the human. 
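A sketch of the same body with the generic divs swapped for semantic tags; the inline styles are assumed to still be in place at this point in the demonstration:

```html
<body style="text-align: center">
    <!-- Same visual result as three divs, but each region now says what it is -->
    <header style="font-size: large">John Harvard</header>
    <main style="font-size: medium">Welcome to my home page!</main>
    <footer style="font-size: small">Copyright &#169; John Harvard</footer>
</body>
```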

All right. Well, what else could we do here? Well, it would be nice at some point to be able to reuse these styles. And if I find myself making not one page but two pages or 10 pages or 100 pages, it's kind of annoying to have to type out all of the same styles. So wouldn't it be nice to start to factor this stuff out? 

Well, I can do that, too. Let me actually go ahead and do this. Let me get rid of this attribute and this attribute and this attribute. And honestly, too, as I do this, I would argue that the code looks just a little cleaner now. It's more obvious what is a tag and what the actual data of the page is, metadata and data, if you will. 

But I've lost all of my styling. But wouldn't it be nice to preserve some of the styling by doing what I proposed earlier, which is using not a style attribute, but a style tag. And indeed, you can put a style tag in the head of your web page where you can put all of those same properties. And you need a little more syntax, a few more keystrokes. 

But I can say this. If I want to center the entire body of my page, I can actually do so by specifying text-align: center;. Here the semi-colons are going to be generally necessary, especially if you have multiple properties. Next I'm going to say header. 

And inside of these curly braces, font-size: large. Unlike C, where you could get away with no curly braces if there's a single line, you do need them in CSS. In the main tag, let's go ahead and style with font-size: medium. 

And then in the footer tag, let's go ahead and style with font-size: small. Now this looks a little worse. Because it just kind of blew up and it's a lot longer. 
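Put together, the page with a style tag in its head might look like this sketch. The title and the lang attribute are assumptions; the selectors and properties follow the description.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* Type selectors: each rule applies to every tag with that name */
            body {
                text-align: center;
            }
            header {
                font-size: large;
            }
            main {
                font-size: medium;
            }
            footer {
                font-size: small;
            }
        </style>
        <title>home</title>
    </head>
    <body>
        <header>John Harvard</header>
        <main>Welcome to my home page!</main>
        <footer>Copyright &#169; John Harvard</footer>
    </body>
</html>
```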

But it is a step toward factoring this out. And honestly, when it comes to web pages, I'm not the best artist in the world. I can make the data display. 

But friends of mine are certainly better at making things really pretty and pixel perfect, so to speak. So it's kind of nice if I can isolate all of the style to one part of my file and all of the content to another. Because maybe I could now collaborate with someone else. 

So if I go back to now the other tab and reload, functionally, no different still. It still looks exactly the same. But I'm starting to make it a little better designed. And in fact, there's another way to do this. 

Suppose that I find myself in the habit of very often centering text on a page. And honestly, it's just a little annoying to have to type this out for every tag that I want centered. Well, I could create what are called classes as well in CSS. It turns out you can make up your own words-- but I'm going to choose some reasonably named ones-- by prefixing them with a dot or a period. 

And if I want to call this set of properties, even though there's just one, centered, I can literally write .centered there instead. I can write this .large. I can call this .medium. I can call this .small. 

And what this means now is I have reusable sets of properties, kind of like containers whereby anywhere I use the word "centered," it's going to get that one text-align: center property applied. Anywhere I use quote, unquote "large," it's going to be made large. 

And so if I scroll down now here, I do need to reintroduce another attribute-- but it's a very common one in the world of HTML now-- that of class. So class equals large. Down here I'm going to do class equals medium. Down here I'm going to do class equals small. 

And it's getting a little more verbose, but I'm not polluting all of my HTML with the actual styles. I'm just kind of having this layer of indirection and of abstraction, if you will, on top of those very specific properties. And then for the body, I can do the same idea. Class equals centered. 
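A sketch of the class-based version, using the class names mentioned (centered, large, medium, small); the title is an assumption:

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* Class selectors start with a dot and can be reused on any tag */
            .centered {
                text-align: center;
            }
            .large {
                font-size: large;
            }
            .medium {
                font-size: medium;
            }
            .small {
                font-size: small;
            }
        </style>
        <title>home</title>
    </head>
    <body class="centered">
        <header class="large">John Harvard</header>
        <main class="medium">Welcome to my home page!</main>
        <footer class="small">Copyright &#169; John Harvard</footer>
    </body>
</html>
```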

And if I go back to my web page here and reload, still looks exactly the same. But I've kind of centralized where I can do things. And frankly, I could do something like this, color: red;. I can package up multiple properties, go back to the page here, and reload. 

And now that has applied to everything. So I have a reusable set of properties. Even though centered is maybe not the best name now, because it also makes things red. 

But I can come up with reusable sets of properties. And honestly, one final flourish here would be let's not assume that my buddy, whether it's my project partner or a colleague in the real world, it's kind of stupid to try to edit the same file. Because invariably we're going to break things on each other. 

So I could actually do this. Let me take all of this. And I'll get rid of the red. Let me go ahead and highlight everything I just did and cut it to my clipboard. 

I'm going to get rid of the style tag altogether. But I am going to go into VS Code and create-- how about a file called home.css, just so I know what's what. And in this file, I'm just going to literally paste everything I just made. 

But I'm going to go back to my home page here. And I'm going to add that other tag I proposed earlier, link href="home.css", and I need one weird attribute, too. The relationship of this link is that of quote, unquote "stylesheet." And that's just the way it is according to the tag. 

And now one last time, if I reload this page, the red is going to go away. Because I deleted that. But the font sizes and centering are still there. But what I've done was introduce some basic building blocks in this language I claim is called CSS that's going to allow me to now centralize all of the styling, the aesthetics now of my web page. 
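A sketch of home.html once the rules have moved to a separate file. The contents of home.css are shown only in a comment here; in practice they would live in that file, next to home.html.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!--
            home.css (separate file) would contain the rules cut from the style tag:
            .centered { text-align: center; }
            .large { font-size: large; }
            .medium { font-size: medium; }
            .small { font-size: small; }
        -->
        <link href="home.css" rel="stylesheet">
        <title>home</title>
    </head>
    <body class="centered">
        <header class="large">John Harvard</header>
        <main class="medium">Welcome to my home page!</main>
        <footer class="small">Copyright &#169; John Harvard</footer>
    </body>
</html>
```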

All right. Let me pause here and see if there are any questions on these techniques thus far. It's just more key value pairs. Questions on this? No? 

All right. So here's where things can get prettier quickly. Let me go ahead now and close these two tabs. Let me go into a file we created earlier called link.html, which you'll recall looked a little something like this. 

And now we can make this web page behave a little more like the real world. Let me undo the phishing attack and just literally say Harvard down here. But let me go ahead and start to style the anchor tag as follows. 

Previously, this page looked a little boring like this. The link was blue originally. But because I visited harvard.edu, by default, the browser changes to purple. Which is fine, but maybe you don't want that. Maybe we want something that's a little more crimson, for instance. 

So let me do this. Let me go into the head of this link.html page. Let me add a style tag herein. And in there, let me style the anchor tag as follows. Inside of this anchor tag, I'm going to do color: red. And let's go ahead and leave it as such for now. 

Let me go back to the link page and reload. And it's going to be a little subtle. But right now it's purple. And now it's definitely red. So I've modified that. 

Now underlining links is good for accessibility. But a lot of websites choose to not underline them and instead underline them when you hover over them. So that is an effect we can achieve, even though it might not be ideal. 

But let's at least demonstrate how websites are doing that. I can specify that this link should have text decoration of none. Now I would only know that by having taken a class, read a book, looked at an online reference. The default is underline. But I can override that by saying none. 

So if I now go back to my page, reload, it's still going to be red. But it's now not going to be underlined. But notice if I hover over it, it changes to a little pointer finger if I zoom in here. But it's clearly not underlining, so that's OK. 

Because there's another way of selecting tags here. I can say a:hover. And then inside of this CSS, I can say text-decoration: underline when the anchor tag is being hovered over with the cursor. If I go back to my tab here and reload, still looks the same. 

But watch as my mouse gets close. It now underlines, as a lot of websites do. So it's a relatively simple idea. It's not as compelling on mobile, especially, because it doesn't do anything if you hover your finger over the glass of your phone. But it does work on laptops and desktops in this way, even though it's perhaps a little passé now to do this kind of technique. 
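The hover effect can be sketched as follows. The link text and URL are assumed from the description of link.html rather than copied from it.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* Every anchor: red, and not underlined by default */
            a {
                color: red;
                text-decoration: none;
            }
            /* Only while the cursor is over the anchor: underline it */
            a:hover {
                text-decoration: underline;
            }
        </style>
        <title>link</title>
    </head>
    <body>
        Visit <a href="https://www.harvard.edu/">Harvard</a>.
    </body>
</html>
```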

But there's other ways to select tags on a page. And in fact, let me go back to this one here. And in this page, let me propose that you can go in one of two places. Visit Harvard or a href = https://www.yale.edu/ and then Yale's website. 

So it's getting a little long. So I'm going to hit Enter. Because the browser won't care that there's some whitespace. But at least, now I have two links on the page. 

If I reload this, you'll see that both of them are red or crimson, which isn't quite right. But that's OK. I can actually distinguish these two somehow. One way to do this would actually be to add one more HTML attribute that we haven't needed or used before, that of ID. I can use almost any name for this ID that I want. 

And I'm going to say, quote, unquote, "harvard" is the unique ID of this link. And the unique ID of this link is quote, unquote, "yale," for instance. And what I can now do up here is I'm going to get rid of this color red. Because I don't want all anchor tags to be red, but I do want Harvard tags to be red. 

So I can say #harvard and then color: red;, and then I can do #yale and I can say color: blue;, for instance. The hash symbol here represents an ID. The dot we saw earlier represents a class. And when you don't have a symbol before it, it represents literally the name of the tag. 

So when I mentioned these various selectors earlier, type selector is just the name of the tag. Class selector is the dot. ID selector is the hash. And there's also ways to select attributes specifically. 

So if I go back here in VS Code now, I've added a bunch of CSS here, properties. But if I reload now, one of these should be red and the other is in fact blue. So in short, just by way of these style attributes and these style tags, we have a lot more control over how we can actually stylize our pages. 
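A sketch of the two links distinguished by ID. The lowercase IDs are an assumption; what matters is that the selector after the hash matches the id attribute exactly, since IDs are case-sensitive.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* ID selectors start with a hash and target one specific element */
            #harvard {
                color: red;
            }
            #yale {
                color: blue;
            }
            a {
                text-decoration: none;
            }
            a:hover {
                text-decoration: underline;
            }
        </style>
        <title>link</title>
    </head>
    <body>
        Visit <a href="https://www.harvard.edu/" id="harvard">Harvard</a>
        or <a href="https://www.yale.edu/" id="yale">Yale</a>.
    </body>
</html>
```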

And here's now where this gets interesting. And you asked about Bootstrap, a popular framework or library. There do, indeed, in the real world exist a lot of third party frameworks that a lot of smart people have just figured out what would make our web pages look prettier. 

And they've come up with design patterns for us that make it way easier and way faster to make pretty looking forms, pretty looking tables, and the like. And one of these products is indeed called Bootstrap. It's freely available. And you can see its own documentation at getbootstrap.com. 

And what I've done in advance is I've actually prepared some of our past data to actually be formatted a little more prettily. So let me actually go back to VS Code here. And I'm going to open up a terminal. 

And I'm going to cheat and copy a file I brought with me called phonebook0.html. And if I open this file, you'll see that it looks like this. It's a big table that has two columns now called name and number. And I've added some other tags which are not that interesting, but I didn't need them before. 

But in this table, there's a table head and there's a table body. So there's, like, a special row at the top and then all of the rest of the data in a CSV or a spreadsheet. And you can probably infer from this table row, from this table row, from this table row, it kind of looks like, indeed, a phone book. 
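The structure described might be sketched like this. The names and numbers below are hypothetical placeholders, not the data in phonebook0.html.

```html
<table>
    <!-- Table head: the special row of column titles -->
    <thead>
        <tr>
            <th>Name</th>
            <th>Number</th>
        </tr>
    </thead>
    <!-- Table body: one table row (tr) per person, one cell (td) per column -->
    <tbody>
        <tr>
            <td>Alice</td>
            <td>+1-617-555-0100</td>
        </tr>
        <tr>
            <td>Bob</td>
            <td>+1-617-555-0101</td>
        </tr>
    </tbody>
</table>
```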

So if I go back to my browser here, go into my directory listing and open up phonebook0.html, it's not the prettiest thing. But it is tabular. And notice that the browser has automatically put in bold the name, and the number, and everything's in columns. But it's not very pretty. 

But what if I do this? Let me actually go into VS Code here. And let me borrow another file I came with called phonebook1.html. And that file is going to look a little bit different than the [INAUDIBLE] in that I've included a link tag in the head. 

Now I'm not linking to my own CSS. I actually went to getbootstrap.com. I read some of their documentation. And I'm linking now to Bootstrap's CSS file, which is actually really, really big. 

And in fact, if I open this file here, let me actually open this up in a tab, and visit this URL here, the folks at Bootstrap have written a crazy amount of properties by defining their own classes and other such keywords. And you and I and really anyone on the internet is welcome to use all of this CSS. And the documentation makes clear what all of this does. A normal person would not need to read through any of this in that way. 

But I've included this file called bootstrap.min.css. And min just means they got rid of most of the whitespace. And if I now go back to my other tab and go back to phonebook1.html, it's the exact same data. 

But thanks to that link tag, it now looks much prettier. And I didn't have to figure out how to move things over to the right. I didn't have to figure out how to draw these gray lines. I didn't have to figure out how to format things in precisely this way. Bootstrap, wonderfully, did most of that for me. 
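A sketch of how such a page might pull in Bootstrap. The CDN URL and pinned version are assumptions (getbootstrap.com documents the current link), as is the use of Bootstrap's table class on the table tag and the placeholder data.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Assumed CDN link; check getbootstrap.com for the current version -->
        <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet">
        <title>phonebook</title>
    </head>
    <body>
        <!-- class="table" opts this table in to Bootstrap's table styling -->
        <table class="table">
            <thead>
                <tr>
                    <th>Name</th>
                    <th>Number</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td>Alice</td>
                    <td>+1-617-555-0100</td>
                </tr>
            </tbody>
        </table>
    </body>
</html>
```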

Now this is still a very static table. It's not interactive. I can't sort by names or columns or the like. 

So let's revisit one other program that we made in advance together. And this one is actually a new version of the search program. So if I open up this program, search2.html, and close my terminal window, you'll see that I've borrowed some of the same content before. 

Let me go to the essence of it. Here is the form and the action that I used earlier. But I've added a whole bunch of classes to it. And this is the essence of these third party frameworks. They generally create a whole bunch of classes that you can use and reuse. 

But they figured out all of the relevant properties. So for instance, for my Google search button, I've given it two classes, a class of btn, short for button, and btn-light. These are not standard HTML or CSS things. These are Bootstrap names that they invented, just like I invented center and large and medium and small. I've also specified that there are a whole bunch of other classes associated with pretty much every tag in this file. 

So if I zoom out here and go back to my directory index and open this, the first version of search. It was super, super simple because it only contained the HTML form. Let me go ahead and open up search2.html. And the essence of the form is exactly the same. Therein is the query at the bottom of the page. 

But thanks to CSS, I now have a button that looks a little more interesting. It's gray and it's rounded. I also have an I'm feeling lucky button, which will send a different request and show me by default the very first search result. 

So in short, the file that I just opened, even though I made it in advance, it's only 55 lines. And most of that is whitespace. And it did take me a little bit of time to figure out the classes and read the documentation. 

But most of the work is done by this third party framework or library of CSS classes and properties that someone else made for me. And so as CSS goes, that's kind of it for the basics. It's just a bunch of more key value pairs in the form of these properties, whereby you can select elements of a web page by way of their ID, or classes, or even the names thereof. 

And here's something that's kind of neat, too. Let me go to harvard.edu again. Let me go ahead and open up the inspector, as before, and draw your attention to one final feature of these developer tools under the Elements tab. So under the Elements tab here is all of the HTML that composes harvard.edu as of today. 

But let me go ahead and expand this right-hand portion. It turns out you can also see all of the CSS that is being applied to the website as of now. So for instance, if I go to a page here-- let's go to Give Now. Might as well. Let's give them a plug here. 

Under Give Now, let's see if this is going to go well. Let's go ahead and highlight this part. Suppose they really want to draw attention to give online. And I right click on that. I choose inspect, as before. And here now, notice that the developer tools jumped right to the HTML tag that represents that particular line of text. If I zoom in, it turns out it's an H1 tag. It's big and bold. 

Suppose, though, I want to change its color. Well, if I go over on the right here, you can see all of the CSS properties that currently apply to that specific tag. And most of these we haven't even talked about: line-height, margin-bottom, font-weight, margin-top, and a bunch of other fairly self-explanatory things. 

But if I want to experiment, I can go up here at top and say color: red. And I can literally change that on the web page live to see how it looks. It's not changing the server. It's just changing my copy. 

But I can at least make that change. You can do even fancier things where, if you click computed, you can scroll down and figure out, OK, wait a minute. It's white right now. That's the same thing as this, rgb(255, 255, 255). That's the same thing as ffffff from weeks prior. 

But I can click this little arrow and it will even show me where in Harvard's CSS that white color comes from. So if it's actually my site, I can actually figure things out and make changes as well. So in short, if you find that you like the world of web development, in your own browser that you've had all this time, there's so much darn functionality built in. And it's just up to you now to start experimenting with it, exploring what you can actually do with it. 
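
The equivalence mentioned above, between rgb(255, 255, 255) and ffffff, is just base-10 versus base-16 notation for the same three numbers. Here is a minimal sketch of that conversion in JavaScript, the language introduced next. The function names channelToHex and rgbToHex are made up for illustration; they are not part of the browser or of the developer tools.

```javascript
// Convert one color channel (0 through 255) to two hexadecimal digits.
// (channelToHex is a hypothetical helper name, used only in this sketch.)
function channelToHex(n) {
    // toString(16) writes the number in base 16; padStart guarantees 2 digits.
    return n.toString(16).padStart(2, '0');
}

// Join the red, green, and blue channels into one six-digit hex string.
function rgbToHex(r, g, b) {
    return channelToHex(r) + channelToHex(g) + channelToHex(b);
}

console.log(rgbToHex(255, 255, 255)); // prints "ffffff"
```

Each channel of 255 becomes ff, which is why white shows up as ffffff.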

But let us use our final moments today to introduce you to a final language called JavaScript, which is itself a proper programming language. And you're about to see a bunch of syntax that's kind of new, but kind of familiar. And the goal here is not to teach you JavaScript per se, but to begin to lay the foundation for you yourselves learning a new language on your own. 

By the end of CS50, you will not have learned all that is out there, certainly. And the goal here ultimately is to help you have a sense with a support structure in place, be it the humans or the [INAUDIBLE] involved that you can ask questions of along the way. Let's go ahead and do this. 

In my directory index, I'm going to go into the source 8 directory where I've got all of today's examples ready to go. I'm going to go into VS Code's Explorer, where I can see all of those files. And in my source 8 directory, let me go ahead and open up hello1.html. Recall that the last time we played with hello.html, it was literally just HTML. 

But here's an example of a language called JavaScript. And this page is going to work as follows. If I open hello1.html in my browser, I have a very simple form. Let me zoom in. 

Let me type in my name, for instance, D-A-V-I-D, and hit Enter. And voila! This is not a very good user interface. But you can see that this web page says, quote, unquote, hello, David. 

So how did I get this form to trigger a pop up? Well, if I go into VS Code here, you'll see a web form. But I've added another attribute, namely an onsubmit attribute. And in the world of HTML, onsubmit allows you to write a tiny bit of JavaScript code inside of the quotes that will be executed whenever the user submits this form. 

So what this is saying is call a function called greet and then return false. And what return false means is: don't actually submit this form to the server. Keep the user on this page so we can just see a pop up. So what is this greet function? 

Well, it turns out, in the world of HTML, there's not only a style tag you can put in your head of your page, but also a script tag, inside of which is JavaScript code. The syntax is a little different from Python and from C. But it's maybe a little closer to Python. Instead of def last week or two weeks ago, we'll now use function, literally, to begin the definition of a function. And if I want to call this function greet, so be it. 

JavaScript comes with a function called alert. And so if I literally do alert, quote, unquote, hello, and then plus something else, just like in Python, that's going to concatenate, or join, the two things left and right. But here's some functionality that comes with your browser, too. It turns out, per the notion of this whole page being a document, you can call document.querySelector, which allows you to select any of the tags or elements in the page. Specifically, you can select the tag that has an ID of name. 

So CSS and JavaScript use the same selector syntax. If you see hash something, that is referring to the ID of a tag that you created. If you then, after selecting the element of HTML with that unique ID, want its value, you just do dot value. So we saw dots a lot in Python and in C to go inside of structures. You can go inside of that text box and get its value. 

So notice here if I scroll down, not only am I using autocomplete and autofocus and so forth, I also, for convenience, gave my input box a unique ID of name. So what's effectively happening is, when I click Submit, my JavaScript's greet function is called, it queries for that text box, goes inside of it and gets its value. And then, using this plus operator, just like in Python, concatenates the two together and passes them to this alert function for an underwhelming, but functional alert in the window. 
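
What that greet function might look like can be sketched in a few lines. This is a sketch of the idea described above, not necessarily the exact contents of hello1.html. It assumes a page with an input whose id is name and a form whose onsubmit attribute reads something like greet(); return false;.

```javascript
// Meant to be called from the form's onsubmit attribute, for example:
//   onsubmit="greet(); return false;"
// (the attribute value is an assumption based on the description above).
function greet() {
    // '#name' selects the element whose id is name; .value is what was typed.
    alert('hello, ' + document.querySelector('#name').value);
}
```

Nothing runs when the function is defined. The alert only appears once the browser calls greet in response to the form being submitted.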

All right. How else can we do this? It is generally frowned upon to use onsubmit in this way. Generally speaking, the world does not like mixing JavaScript code with HTML so closely as this. So let me show you another variant of this, even though it's going to look a little bit cryptic. But at least it will be representative of how else you can solve this problem. 

In hello2.html, we have this code. Notice that at the top of my body now is the form. But at the bottom of the body is this script tag. So I've just moved it from head to the body of the page. Because I'm going to then instead do this. If I want to tell the browser to listen for submissions of that form, I can use this fairly cryptic syntax, but you'll see it again and again over time as follows. 

Go into the document. Select with this query the form tag. And then call this special function that comes with the browser called addEventListener. 

So tell the browser to listen for a certain type of event for this form. What event do you want to listen for? The submission of the form, so quote, unquote submit. 

What do you want to have happen whenever that event is heard? You want to call this function here. So this is what's known as an anonymous function. The syntax is a little weird, but I've not given the function a name. It apparently takes an argument as input called event, but that's per the documentation. 

And what these two lines of code do essentially is they still call the alert function. They still output hello comma space. And they still query the HTML for the ID name to get the value that the human typed in. And then just for good measure, we prevent the default behavior for any form with this line of code, just so that it doesn't actually submit anything to the server. It keeps the user actually here. 
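
A rough sketch of that listener follows, under a couple of assumptions. The page is assumed to have a single form and an input with an id of name. And where the lecture passes an anonymous function straight to addEventListener, this sketch gives the handler a name, onSubmit, purely so that it's easier to read and to try out on its own.

```javascript
// Handler for the form's submit event; the browser passes in the event object.
// (The name onSubmit is this sketch's own choice; an anonymous function
// passed directly to addEventListener would behave the same way.)
function onSubmit(event) {
    alert('hello, ' + document.querySelector('#name').value);
    // Stop the browser from actually sending the form to the server.
    event.preventDefault();
}

// Only wire things up when running in a browser, where document exists.
if (typeof document !== 'undefined') {
    document.querySelector('form').addEventListener('submit', onSubmit);
}
```

Calling event.preventDefault() here plays the same role that return false played in the onsubmit attribute.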

This will be a little scarier, too, but just so you've seen it. In hello3.html, this is actually a more common technique, whereby you can listen for one other special event. It turns out when you load a web page, lots of stuff has to happen. It's got to be read top to bottom, left to right. It's got to download other files, the images, the sounds, the videos, and so forth. 

If you want to wait until the whole page has been read into memory essentially, you can use this event as well, DOMContentLoaded. That tree we drew earlier is what's called a DOM, document object model, which is just a fancy way of saying a tree in the computer's memory that represents the web page. So this is the syntax that you'll find that people use to tell the browser once the whole DOM, the whole tree has been loaded, then go ahead and execute this code. And it means that no matter what, the whole web page will be ready in order before this code is actually executed. 

And this ensures, for instance, that even though this script is at the top of my file and my form is at the bottom of my file, none of this code will get executed until the whole DOM is ready, all of the HTML has been read top to bottom, left to right. All right. Well, let's go ahead and make this a little more interesting, just to show you some of the capabilities of JavaScript within a browser nowadays. 
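
The nesting described here can be sketched as one listener inside another. As before, this assumes a page with one form and an input with an id of name. The wrapper function listenForGreeting is a made-up name that exists only so the sketch does nothing when there is no browser.

```javascript
// (listenForGreeting is a hypothetical wrapper, not something from the lecture.)
function listenForGreeting() {
    // Wait until the browser has built the whole DOM before touching the form.
    document.addEventListener('DOMContentLoaded', function () {
        // Only now is it safe to assume the form element exists.
        document.querySelector('form').addEventListener('submit', function (event) {
            alert('hello, ' + document.querySelector('#name').value);
            event.preventDefault();
        });
    });
}

// Only run in a browser, where document exists.
if (typeof document !== 'undefined') {
    listenForGreeting();
}
```

Because the inner code runs only after DOMContentLoaded fires, the script can sit in the head of the page even though the form appears further down.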

So if I open up maybe this one here, background.html. And let me open it up in the browser. And this is going to be super simple in terms of user interface. But here's a big white viewport, big body that's just white in color by default. 

But there's three buttons at top left. And if I click R, it makes the background red. G makes the background green. And B makes the background blue. 

What's interesting about this demo, sort of underwhelming as the user interface is, is it demonstrates that you can modify CSS using JavaScript. And HTML, CSS, and JavaScript are therefore very intertwined in the context of a browser. How? 

Here's the raw HTML. Here are the three buttons. And I've given them three separate IDs red, green, and blue, just so I can refer to the specific button. 

And notice what I've done here. I've declared a variable in JavaScript, which uses slightly different syntax of let as the keyword. Instead of int or char or string, you can use the keyword let, which essentially means let me create this variable called body. And this is just how, using querySelector, I can select the body element from the web page. Because I'm going to use it three separate times. 

What do I want to do three separate times? For instance, this. I want to go into the document and select whatever element has the unique ID of red. I want to tell the browser to listen for this event, click. 

So we saw submit before. You can listen for clicks as well. When the click happens on this button, I want this function to be called. 

What does this function do? Something super, super simple-- all it does is it changes the body's style.backgroundColor to be, quote, unquote, red instead. So what's going on here? 

We didn't see this earlier. But it turns out in CSS there is actually a CSS property called background-color. And I can see it as follows. Let me reload this page. Open the browser's inspector. Open up elements. 

And if I hover over the body here, notice that there's no background color by default. But if I do in, say, lowercase, background-color: yellow, it immediately changes the background to yellow. Unfortunately, in JavaScript, you can't do background dash color. Why might this be? Yeah? 

AUDIENCE: [INAUDIBLE]. 

DAVID MALAN: It thinks it's minus or subtraction. Right? So I would wager there was a human at some point in the room designing JavaScript where they realized like, damn it. We shouldn't have used a hyphen in CSS. Because it's now going to be misinterpreted as a subtraction operator in JavaScript. 

So the way the JavaScript world solved this was: whatever has a hyphen in it, like background dash color, you change in the JavaScript version thereof to be camelCase, so to speak, whereby there's this hump in the middle, a capital C with no hyphen, instead of a lowercase c. And I do this here, and I do this here so as to essentially listen for a click on any of those three buttons so that the end result is that it changes it from red to green to blue based on what I'm clicking. 
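
Put together, the three listeners might look roughly like this. It's a sketch that assumes three buttons with ids of red, green, and blue, as described above. The helper colorOnClick is a made-up name that factors out the part that repeats three times; the lecture's file simply writes the listener out once per button.

```javascript
// When button is clicked, set the given element's background color.
// (colorOnClick is a hypothetical helper introduced for this sketch.)
function colorOnClick(button, element, color) {
    button.addEventListener('click', function () {
        // CSS's background-color is spelled backgroundColor in JavaScript,
        // because a hyphen would be read as subtraction.
        element.style.backgroundColor = color;
    });
}

// Only wire up the buttons in a browser, where document exists.
if (typeof document !== 'undefined') {
    let body = document.querySelector('body');
    colorOnClick(document.querySelector('#red'), body, 'red');
    colorOnClick(document.querySelector('#green'), body, 'green');
    colorOnClick(document.querySelector('#blue'), body, 'blue');
}
```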

And here's where the developer tools get kind of cool. Notice at bottom right here, notice that as I click on this, the CSS of the page at bottom right is changing to match whatever is happening. So you can really see and understand what's going on underneath that hood there. All right. 

We have time for a few other demonstrations. Back in my day when I learned HTML, there was a bunch of hideous tags still in circulation. Among them was a blink tag, which literally, if you used blink and put words in between its open tag and close tag, you would get text on your screen just kind of doing this. 

Even uglier was what was called the marquee tag, which would actually scroll text across the screen like this. And no self-respecting website tends to have blinking text or scrolling text in this way. Because it just tends to be ugly. 

However, even though the blink tag is among the few tags that's ever been removed from the language, you can bring it back with a bit of JavaScript. So here, for instance, is an example in blink.html. Here's a super simple page. The only thing in the body is hello, world. 

But there is a script tag up in my head of my page here. And let's see what's inside of this script tag. Well, I've defined on line 8 downward, a function called blink. What does it do? 

Well, I first declare a variable called body. And I get the body element using querySelector. I then ask this question. 

If the body's style.visibility property, which we haven't talked about yet, is, quote, unquote, hidden, then change the body's style.visibility property to be, quote, unquote, visible. Else, if it's not hidden, that is it's visible, change it to hidden instead. 

And here, too, this is another one of these left-hand, right-hand situations. I do not know why the opposite of visible is not invisible. It is, instead, hidden. So, again, arguably poor design, but this is what we have. 

How is this useful? Well, it turns out that in your browser, there's a JavaScript function called setInterval that's associated not with the document per se, but with the window, which is another global variable that you just get automatic access to in the browser. It allows you to call a function every so many milliseconds, again and again and again. So if I want my text to blink every half a second, or 500 milliseconds, I just use window.setInterval to call blink every 500 milliseconds. 

And notice, it's very important not to call blink here, as with parentheses, like in C or Python. Because I don't want to call blink at this moment in time. I just want to inform the setInterval function of the name of the blink function. So I just pass in the name blink. 
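
Here is a minimal sketch of that blinking logic. The blink function follows the description above. The separate toggle helper is a made-up addition so that the flipping logic can be tried on any element, not just the body.

```javascript
// Flip an element between hidden and visible.
// (toggle is a hypothetical helper name introduced for this sketch.)
function toggle(element) {
    if (element.style.visibility == 'hidden') {
        element.style.visibility = 'visible';
    } else {
        element.style.visibility = 'hidden';
    }
}

// Toggle the visibility of the page's body.
function blink() {
    toggle(document.querySelector('body'));
}

// Only schedule the blinking in a browser, where window exists.
if (typeof window !== 'undefined') {
    // Pass the function itself (blink), not the result of calling it (blink()).
    window.setInterval(blink, 500);
}
```

Writing window.setInterval(blink(), 500) instead would call blink once, immediately, and hand its return value, which is undefined, to setInterval, so nothing would repeat.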

And if I go back to my directory listing and open up blink.html, you'll see what I used to see in the late '90s, when HTML 1 was all the rage, at the beginnings of ugly websites, including my own personal home page at the time. My own personal home page, too, at the time, which is probably findable somewhere online in the archives, was from back in the days where you wouldn't just show people the content of your page. You had to click an Enter button to enter the web page, which was just really ridiculous. There are a lot of things in tech that you can do, but should not do. And the world has learned this, as have I, the hard way. 

All right. Let's do a couple of final examples that are now representative of what modern websites do and what you and I take for granted on web apps and mobile apps alike. For instance, this feature of autocomplete. Case in point, when I went to google.com before and I started searching for cats or dogs or birds, it was trying to finish my thought and populating a dropdown with a bunch of different suggestions. 

I can actually do that myself in JavaScript as follows. Let me open up a file called large.js, which is a file that I made based on speller's own dictionary. Recall that we gave you a big list of words, like 100,000 plus words. I copied those into this JavaScript file. 

But I formatted them in what's called a JavaScript array. So JavaScript has arrays. They're more like Python lists than they are like C arrays. 

The syntax is square brackets. Let is my keyword to say give me a variable called WORDS, which is all caps because I'm going to use it globally. And here are 100,000 words from that dictionary in this file. All right? 

Now let me close this file and open up the actual HTML file, autocomplete.html. Let me scroll down to the bottom. And you'll see that in this page in the body are two things. 

One, an input, so a text box so I can start typing words. And then, two, an unordered list that's empty. So there's no actual list items in that unordered list initially, but there is a lot of JavaScript. Here's how I'm including the large dictionary. And here's how I'm implementing autocomplete. 

So let me first show you what this does. Let me go back to my directory index, click on autocomplete.html. I'll zoom in. And if I type in C, I immediately get an unordered list of all words starting with C. 

If I type CA, it gets filtered further. But we can't see the difference because there's so many words starting with CA. CAT, the list is changing. CATS, the list is changing. 

And notice that if I were to open my developer tools, what gets really interesting is you can see this list being made in real time. Let me delete it. Notice that the UL at bottom left is now empty. 

But if I type in suddenly CATS, notice that the triangle appears and there are all of the list items that my JavaScript code is apparently dynamically creating. And indeed, how do I do this? Well, this one's more of a mouthful, but here's the idea. 

I used the querySelector function to get that input text box. I then add a listener to that input, listening for what's called keyup. It turns out you can listen for the finger going down or the finger going up. So I'm waiting until the user lifts their finger off the keyboard, AKA, keyup. 

When it hears that event, it should do the following. It's going to create a variable, a temp variable called HTML, equal to, quote, unquote, nothing, that is, an empty string. In JavaScript, as an aside, you can use single quotes or double quotes. For whatever reasons, stylistically, JavaScript programmers tend to use single quotes. 

I can then say, if that input has a value, because the human typed in one or more letters, then iterate over all of the words in the dictionary. And we've not seen the keyword of before, but this for ... of loop is JavaScript's equivalent of Python's for ... in loop. If that word starts with whatever the input value is, go ahead and add, that is, concatenate, to the HTML variable an open tag, LI. 

Then, whatever the word is, using this JavaScript-specific syntax, and then close the tag. And then lastly, using querySelector, grab the UL tag, go into its innerHTML, so to speak, inside of it, and change it to be this HTML I just created. And so in this way, using JavaScript, I can dynamically add to and subtract from the HTML in the page. 
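
The steps just described can be sketched as follows. Several things here are assumptions: the four-word WORDS array is a tiny stand-in for the 100,000-word array in large.js, the variable is spelled html in lowercase, and the function suggestionsHTML is a made-up name that separates building the list items from wiring up the keyup listener.

```javascript
// A tiny stand-in for the big array in large.js (assumed, for illustration).
let WORDS = ['cat', 'catalog', 'cats', 'dog'];

// Build the <li> elements for every word that starts with value.
// (suggestionsHTML is a hypothetical helper name, not from the lecture.)
function suggestionsHTML(value, words) {
    let html = '';
    if (value) {
        for (let word of words) {
            if (word.startsWith(value)) {
                // A template literal: ${word} is replaced by the word itself.
                html += `<li>${word}</li>`;
            }
        }
    }
    return html;
}

// Only wire up the listener in a browser, where document exists.
if (typeof document !== 'undefined') {
    let input = document.querySelector('input');
    input.addEventListener('keyup', function () {
        // Replace whatever is inside the <ul> with the new list items.
        document.querySelector('ul').innerHTML = suggestionsHTML(input.value, WORDS);
    });
}
```

Note that startsWith is case-sensitive, so with lowercase words in the array, typing an uppercase C in this sketch would match nothing.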

There are so many other events here, too, clicking, submitting, keyup, dragging, and dropping, and so forth. These are just some of the events that web pages and mobile apps can listen for. But we'll do one final one, which speaks to the power of browsers nowadays and even the implications for privacy. 

If I go into geolocation.html, it turns out you can figure out where in the world a user is with, like, three lines of code nowadays, assuming they've turned on location services and opted in on their device. Here, albeit cryptically, is a final global variable that comes with browsers today called navigator. It has a geolocation object associated with it, which comes with a function called getCurrentPosition. 

You can then specify or figure out the user's latitude and the user's longitude. And all I'm going to do is write these to the screen so I can see this demonstration live. So our very final demonstration here of JavaScript is going to be this one here for geolocation to show you how easy and how invasive even code can be if I click on geolocation and wait. 

There are my GPS coordinates, latitude and longitude. And to confirm as much roughly, let's go ahead and open up a browser, paste in those coordinates, click on the Google Maps result that comes up first. Zoom in, in, turn on satellite mode. And in-- and I'm not quite in that corner of the building. 

But I'm presumably close to an access point that Google has known about and associates with my GPS coordinates. It's that easy when you actually use something like Uber or Lyft or the like to figure out where the user is by just asking their browser via code like this. 
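
Those few lines of code might look roughly like this. It's a sketch of the idea rather than the exact contents of geolocation.html. The helper describePosition is a made-up name, and the extra check for navigator.geolocation is there so that the sketch quietly does nothing where geolocation isn't available.

```javascript
// Format a position as "latitude, longitude".
// (describePosition is a hypothetical helper name, not part of the browser.)
function describePosition(position) {
    return position.coords.latitude + ', ' + position.coords.longitude;
}

// Only ask for a location where the browser's geolocation object exists.
if (typeof navigator !== 'undefined' && navigator.geolocation) {
    // The browser calls this function later, once it knows the position
    // (and only if the user has granted permission).
    navigator.geolocation.getCurrentPosition(function (position) {
        document.write(describePosition(position));
    });
}
```

The coordinates arrive in a callback rather than as a return value because the browser may first have to ask the user for permission.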

So that's it for HTML, CSS, and JavaScript. In the problem set, you'll explore all of these. One more lecture to go in which we'll combine all of these. But until then we'll see you next time. 

[MUSIC PLAYING] Buffering, OK. Josh, nice. [INAUDIBLE], oh! 

[LAUGHING] 

[INAUDIBLE]. No, oh, wait. 

That was amazing, Josh. 

Sophie! 

[LAUGHTER] 

Amazing. That was perfect. [INAUDIBLE]. 

[LAUGHTER] 

I think I-- 

[INAUDIBLE]. 

AUDIENCE: [INAUDIBLE]. 

DAVID MALAN: Guy. That was amazing. Thank you all. 

AUDIENCE: Good. 

[APPLAUSE]