1 00:00:00,000 --> 00:00:03,880 [MUSIC PLAYING] 2 00:00:03,880 --> 00:00:16,302 3 00:00:16,302 --> 00:00:17,260 DAVID MALAN: All right. 4 00:00:17,260 --> 00:00:21,040 This is CS50, and last time where we left off 5 00:00:21,040 --> 00:00:23,250 was here, focusing on data structures. 6 00:00:23,250 --> 00:00:25,860 And indeed, one of the last data structures we looked at 7 00:00:25,860 --> 00:00:27,142 was that of a hash table. 8 00:00:27,142 --> 00:00:29,600 But that was the result of a progression of data structures 9 00:00:29,600 --> 00:00:31,979 that we began with this thing here, an array. 10 00:00:31,979 --> 00:00:35,270 Recall that an array was actually a data structure that was actually introduced 11 00:00:35,270 --> 00:00:39,900 back in week two of CS50, but it was advantageous at the time, 12 00:00:39,900 --> 00:00:42,800 because it allowed us to do things efficiently, like binary search, 13 00:00:42,800 --> 00:00:45,560 and it was very easy to use with its square bracket notation 14 00:00:45,560 --> 00:00:48,030 and adding integers or strings or whatever it is. 15 00:00:48,030 --> 00:00:49,790 But it had limitations, recall. 16 00:00:49,790 --> 00:00:51,950 And among those limitations were its lack 17 00:00:51,950 --> 00:00:54,510 of resizeability, its lack of dynamism. 18 00:00:54,510 --> 00:00:57,750 We had to decide in advance how big we wanted this data structure to be, 19 00:00:57,750 --> 00:01:01,350 and if we wanted it to be any bigger, or for that matter any smaller, 20 00:01:01,350 --> 00:01:04,900 we would have to dynamically ourselves resize it and copy 21 00:01:04,900 --> 00:01:09,010 all of the old elements into a new array, and then go about our business. 
22 00:01:09,010 --> 00:01:12,060 And so we introduced last time this thing here instead, 23 00:01:12,060 --> 00:01:15,830 a linked list that addresses that problem by having us on demand 24 00:01:15,830 --> 00:01:18,070 allocate these things that we called nodes, 25 00:01:18,070 --> 00:01:22,210 storing inside them an integer or really any data type that we want, 26 00:01:22,210 --> 00:01:25,540 but connecting those nodes with these arrows pictured 27 00:01:25,540 --> 00:01:28,910 here, specifically connecting them or threading them together using something 28 00:01:28,910 --> 00:01:30,120 called pointers. 29 00:01:30,120 --> 00:01:33,460 Whereby pointers are just addresses of those nodes in memory. 30 00:01:33,460 --> 00:01:36,830 So while we pay a bit of a price in terms of more memory in order 31 00:01:36,830 --> 00:01:40,610 to link these nodes together, we gain this flexibility, 32 00:01:40,610 --> 00:01:43,790 because now when we want to grow or shrink this kind of data structure, 33 00:01:43,790 --> 00:01:48,610 we simply use our friend malloc or free or similar functions still. 34 00:01:48,610 --> 00:01:52,360 But we then, using linked lists-- and at a lower level, 35 00:01:52,360 --> 00:01:56,520 pointers as a new building block, did we begin to solve other problems. 36 00:01:56,520 --> 00:01:59,680 We considered the problem of a stack of trays in the cafeteria, 37 00:01:59,680 --> 00:02:02,830 and we presented an abstract data type known as a stack. 38 00:02:02,830 --> 00:02:06,280 And a stack supports operations like push and pop. 39 00:02:06,280 --> 00:02:08,620 But what's interesting about a stack for our purposes 40 00:02:08,620 --> 00:02:11,720 recall is that we don't need to commit necessarily 41 00:02:11,720 --> 00:02:14,120 to implementing it one way or another. 
42 00:02:14,120 --> 00:02:17,450 Indeed, we can abstract away the underlying implementation 43 00:02:17,450 --> 00:02:20,280 details of a stack, and implement it using an array if we want, 44 00:02:20,280 --> 00:02:21,990 if we find that easier or convenient. 45 00:02:21,990 --> 00:02:24,580 Or for that matter we can implement it using a linked list, 46 00:02:24,580 --> 00:02:27,670 if we want that additional ability to grow and shrink. 47 00:02:27,670 --> 00:02:33,590 And so the data type itself in a stack has these two operations, push and pop, 48 00:02:33,590 --> 00:02:36,240 but they're independent, ultimately, of how we actually 49 00:02:36,240 --> 00:02:37,880 implement things underneath the hood. 50 00:02:37,880 --> 00:02:40,160 And that holds true as well for this thing here. 51 00:02:40,160 --> 00:02:42,840 A line, or more properly a queue, whereby 52 00:02:42,840 --> 00:02:47,470 instead of having this last in, first out or LIFO property, 53 00:02:47,470 --> 00:02:52,300 we want something more fair in the human world, a first in, first out. 54 00:02:52,300 --> 00:02:54,240 So that when you enqueue or dequeue some piece 55 00:02:54,240 --> 00:02:57,840 of data, whatever was enqueued first is the first thing 56 00:02:57,840 --> 00:03:00,075 to get out of that queue as well. 57 00:03:00,075 --> 00:03:03,200 And here too did we see that we could implement these things using a linked 58 00:03:03,200 --> 00:03:05,950 list or using an array, and I would wager there are yet 59 00:03:05,950 --> 00:03:08,860 other possible implementations as well. 60 00:03:08,860 --> 00:03:11,770 And then we transitioned from these abstract data types 61 00:03:11,770 --> 00:03:15,750 to another sort of paradigm for building a data structure in memory. 
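The point that push and pop (or enqueue and dequeue) are independent of the underlying implementation can be sketched roughly as follows. This is a minimal illustration in Python, not code from the lecture; the class names `ArrayStack`, `LinkedStack`, and `Queue` are made up for this example.

```python
from collections import deque


class ArrayStack:
    """LIFO stack backed by a Python list (an array-like structure)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Removes and returns the most recently pushed item.
        return self._items.pop()


class _Node:
    """One node of a singly linked list: a value plus a pointer to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedStack:
    """The same push/pop interface, backed by a linked list instead."""

    def __init__(self):
        self._head = None

    def push(self, item):
        # The new node becomes the head and points at the old head.
        self._head = _Node(item, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        node = self._head
        self._head = node.next
        return node.value


class Queue:
    """FIFO queue: whatever was enqueued first is dequeued first."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        return self._items.popleft()
```

Code that only calls `push` and `pop` behaves the same with either stack class, which is the sense in which the abstract data type hides its implementation.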
62 00:03:15,750 --> 00:03:19,220 Rather than just linking things together in a unidirectional way, so to speak, 63 00:03:19,220 --> 00:03:22,070 with a linked list, we introduced trees, and we introduced things 64 00:03:22,070 --> 00:03:24,270 like binary search trees, that so long as you 65 00:03:24,270 --> 00:03:28,030 keep these data structures pretty well balanced, such that the height of them 66 00:03:28,030 --> 00:03:31,300 is logarithmic and is not linear like a linked list, 67 00:03:31,300 --> 00:03:34,520 can we achieve the kind of efficiency that we saw back in week zero 68 00:03:34,520 --> 00:03:36,790 when we did binary search on a phone book. 69 00:03:36,790 --> 00:03:40,230 But now, thanks to these pointers and thanks to malloc and free 70 00:03:40,230 --> 00:03:43,890 can we grow and shrink the data structure without committing in advance 71 00:03:43,890 --> 00:03:46,305 to an actual fixed size array. 72 00:03:46,305 --> 00:03:49,286 And similarly did we solve another real world problem. 73 00:03:49,286 --> 00:03:51,410 Recall that a few weeks ago we looked at forensics, 74 00:03:51,410 --> 00:03:54,230 and most recently did we look at compression, both of which 75 00:03:54,230 --> 00:03:56,000 happen to involve files. 76 00:03:56,000 --> 00:03:59,570 And in this case, the goal was to compress information, 77 00:03:59,570 --> 00:04:04,082 ideally losslessly, without throwing away any of the underlying information. 
78 00:04:04,082 --> 00:04:06,290 And thanks to Huffman coding did we see one technique 79 00:04:06,290 --> 00:04:11,210 for doing that, whereby instead of using seven or eight bits for every letter 80 00:04:11,210 --> 00:04:14,890 or punctuation symbol in some text, we can instead come up with our own coding 81 00:04:14,890 --> 00:04:19,339 where we use one bit, like a 1, to represent a super common letter like e, 82 00:04:19,339 --> 00:04:22,610 and two or three or four or more bits for the less 83 00:04:22,610 --> 00:04:26,310 common letters in our world. 84 00:04:26,310 --> 00:04:29,060 And then again we came to hash tables. 85 00:04:29,060 --> 00:04:33,430 And hash tables too is an abstract type that we could implement using an array 86 00:04:33,430 --> 00:04:36,260 or using a linked list or using an array and a linked list. 87 00:04:36,260 --> 00:04:39,724 And indeed, we looked first at a hash table as little more than an array. 88 00:04:39,724 --> 00:04:41,890 But we introduced this idea of a hash function, that 89 00:04:41,890 --> 00:04:44,970 allows you, given some input, to decide on some output, 90 00:04:44,970 --> 00:04:47,520 an index, a numeric value, typically, that allows 91 00:04:47,520 --> 00:04:49,117 you to decide where to put some value. 92 00:04:49,117 --> 00:04:51,200 But if you use something like an array, of course, 93 00:04:51,200 --> 00:04:53,280 you might paint yourself into a corner, such 94 00:04:53,280 --> 00:04:55,930 that you don't have enough room ultimately for everything. 
95 00:04:55,930 --> 00:04:59,660 And so we introduced separate chaining, whereby a hash table in this form 96 00:04:59,660 --> 00:05:03,140 is really just an array, pictured here vertically, and a set of linked 97 00:05:03,140 --> 00:05:06,300 lists hanging off that array, pictured here horizontally, 98 00:05:06,300 --> 00:05:09,090 that allows us to get some pretty good efficiency in terms 99 00:05:09,090 --> 00:05:13,300 of hashing, finding the chain that we want in pretty much constant time, 100 00:05:13,300 --> 00:05:15,490 and then maybe incurring a bit of linear cost 101 00:05:15,490 --> 00:05:18,670 if we actually have a number of collisions. 102 00:05:18,670 --> 00:05:22,250 Now today, after leaving behind these data structures-- among them 103 00:05:22,250 --> 00:05:25,300 a trie, which recall was our last data structure that allowed us 104 00:05:25,300 --> 00:05:30,320 in theory in constant time to look up or insert or even delete words in a data 105 00:05:30,320 --> 00:05:33,460 structure, depending only on the length of the string, 106 00:05:33,460 --> 00:05:37,630 not how many strings were in there-- do we continue to use these ideas, 107 00:05:37,630 --> 00:05:39,680 these building blocks, these data structures. 108 00:05:39,680 --> 00:05:43,230 But now today we literally leave behind the world of C 109 00:05:43,230 --> 00:05:45,960 and start to enter the world of web programming, 110 00:05:45,960 --> 00:05:50,580 or really the world of web pages and dynamic outputs 111 00:05:50,580 --> 00:05:53,580 and databases, ultimately, and all of the things that most of us 112 00:05:53,580 --> 00:05:54,970 are familiar with every day. 113 00:05:54,970 --> 00:05:58,894 But it turns out that this time we don't have to leave behind those ingredients. 
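The separate-chaining hash table recapped earlier (an array of buckets, with a chain hanging off each one) can be sketched roughly like this. It is a toy illustration under stated assumptions: the hash function simply uses the first letter of a non-empty word, each chain is a Python list standing in for a linked list, and the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining."""

    def __init__(self, buckets=26):
        # The "vertical" array: one (initially empty) chain per bucket.
        self._buckets = [[] for _ in range(buckets)]

    def _index(self, word):
        # Toy hash function (an assumption for this sketch):
        # map the first letter to an index, so "a" -> 0, "b" -> 1, and so on.
        return (ord(word[0].lower()) - ord("a")) % len(self._buckets)

    def insert(self, word):
        # Finding the right chain takes roughly constant time.
        chain = self._buckets[self._index(word)]
        # Walking the chain is linear in the number of collisions.
        if word not in chain:
            chain.append(word)

    def contains(self, word):
        return word in self._buckets[self._index(word)]


table = ChainedHashTable()
table.insert("alice")
table.insert("adam")   # collides with "alice": same first letter, same chain
table.insert("bob")
```

With a better hash function the chains stay short, which is what keeps lookups close to constant time in practice.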
114 00:05:58,894 --> 00:06:00,810 Indeed, something like this, which you'll soon 115 00:06:00,810 --> 00:06:04,980 know as HTML-- the language in which web pages are written-- HyperText Markup 116 00:06:04,980 --> 00:06:09,210 Language-- even this textual document, which seems to have a bit of structure 117 00:06:09,210 --> 00:06:12,640 to it, as you might glean here from the indentation, can underneath the hood 118 00:06:12,640 --> 00:06:15,080 be itself represented as a tree. 119 00:06:15,080 --> 00:06:18,220 A DOM, or Document Object Model, but indeed, we'll 120 00:06:18,220 --> 00:06:21,470 see now some real world, very modern applications of the same data 121 00:06:21,470 --> 00:06:23,780 structures in software that we ourselves use. 122 00:06:23,780 --> 00:06:26,260 Because today, we look at how the internet works, 123 00:06:26,260 --> 00:06:29,650 and in turn how we actually build software atop it. 124 00:06:29,650 --> 00:06:31,905 But first, a teaser. 125 00:06:31,905 --> 00:06:32,755 [VIDEO PLAYBACK] 126 00:06:32,755 --> 00:06:35,625 [MUSIC PLAYING] 127 00:06:35,625 --> 00:06:38,876 128 00:06:38,876 --> 00:06:47,420 -He came with a message, with a protocol all his own. 129 00:06:47,420 --> 00:07:01,530 130 00:07:01,530 --> 00:07:06,350 He came to a world of cruel firewalls, uncaring routers, 131 00:07:06,350 --> 00:07:10,850 and dangers far worse than death. 132 00:07:10,850 --> 00:07:11,445 He's fast. 133 00:07:11,445 --> 00:07:13,210 He's strong. 134 00:07:13,210 --> 00:07:18,092 He's TCP/IP, and he's got your address. 135 00:07:18,092 --> 00:07:20,810 136 00:07:20,810 --> 00:07:23,820 Warriors of the Net. 137 00:07:23,820 --> 00:07:24,820 [END PLAYBACK] 138 00:07:24,820 --> 00:07:27,700 DAVID MALAN: All right, so coming soon is how the internet works. 139 00:07:27,700 --> 00:07:28,920 And it's not quite like that. 140 00:07:28,920 --> 00:07:30,070 But we'll see in a bit more detail. 
141 00:07:30,070 --> 00:07:33,111 But let's consider first something a little more familiar, if abstractly, 142 00:07:33,111 --> 00:07:34,020 like our own home. 143 00:07:34,020 --> 00:07:36,330 So odds are, before coming to a place like this, 144 00:07:36,330 --> 00:07:39,870 you had internet access at home or at school or at work or the like. 145 00:07:39,870 --> 00:07:42,364 And inside of that building-- let's call it your home-- 146 00:07:42,364 --> 00:07:43,530 you had a number of devices. 147 00:07:43,530 --> 00:07:46,790 Maybe a laptop, maybe a desktop, maybe both, maybe multiple. 148 00:07:46,790 --> 00:07:50,810 And you had some kind of internet service provider, Comcast or Verizon 149 00:07:50,810 --> 00:07:54,500 or companies like that, that actually run some kind of wired connection, 150 00:07:54,500 --> 00:07:57,060 typically-- though it could be wireless-- into your home, 151 00:07:57,060 --> 00:08:00,850 and via that connection are you on your laptop or desktop able to get out 152 00:08:00,850 --> 00:08:01,940 onto the internet. 153 00:08:01,940 --> 00:08:05,670 Well it turns out that the internet itself is a pretty broad term. 154 00:08:05,670 --> 00:08:08,390 The internet is really just this interconnection 155 00:08:08,390 --> 00:08:09,790 of lots of different networks. 156 00:08:09,790 --> 00:08:11,050 Harvard here has a network. 157 00:08:11,050 --> 00:08:12,790 Yale has a network. 158 00:08:12,790 --> 00:08:13,730 Google has a network. 159 00:08:13,730 --> 00:08:14,710 Facebook has a network. 160 00:08:14,710 --> 00:08:16,280 Your home has a network and the like. 161 00:08:16,280 --> 00:08:18,960 And so the internet really is the interconnection 162 00:08:18,960 --> 00:08:21,020 of all of those physical networks. 163 00:08:21,020 --> 00:08:25,490 And on top of this internet, do there run services, things like the web 164 00:08:25,490 --> 00:08:26,510 or the world wide web. 165 00:08:26,510 --> 00:08:27,590 Things like email. 
166 00:08:27,590 --> 00:08:28,950 Things like Facebook Messenger. 167 00:08:28,950 --> 00:08:29,820 Things like Skype. 168 00:08:29,820 --> 00:08:32,880 And any number of applications that we use every day 169 00:08:32,880 --> 00:08:35,789 run on top of this physical layer known as the internet. 170 00:08:35,789 --> 00:08:38,340 But how does this internet itself work? 171 00:08:38,340 --> 00:08:42,919 Well, when you first plug in your computer to a home modem 172 00:08:42,919 --> 00:08:45,640 that you might get from Verizon or Comcast-- it might be a cable 173 00:08:45,640 --> 00:08:49,470 modem or a DSL modem or another technology still-- or more commonly 174 00:08:49,470 --> 00:08:52,710 these days, you connect wirelessly, such that your Mac or PC 175 00:08:52,710 --> 00:08:56,890 laptop connects somehow wirelessly to this device, what actually happens? 176 00:08:56,890 --> 00:08:59,560 Like the first time you have internet installed on your home, 177 00:08:59,560 --> 00:09:02,340 how does your computer know how to connect to that device, 178 00:09:02,340 --> 00:09:04,900 and how does that device know how to get your laptop's data 179 00:09:04,900 --> 00:09:07,320 to and from the rest of the internet? 180 00:09:07,320 --> 00:09:09,140 Well, odds are you know on your Mac or PC 181 00:09:09,140 --> 00:09:11,520 you at least get to choose the name of your network, 182 00:09:11,520 --> 00:09:16,240 whether it's Harvard University or Yale or LinkSys or Airport Extreme 183 00:09:16,240 --> 00:09:20,300 or whatever it is at home, and then once you're connected to that, 184 00:09:20,300 --> 00:09:22,610 it turns out that there's special software running 185 00:09:22,610 --> 00:09:24,984 on this device in your home called a router. 186 00:09:24,984 --> 00:09:27,150 And actually, it can be called any number of things. 
187 00:09:27,150 --> 00:09:30,080 But one of its primary functions is to route information, 188 00:09:30,080 --> 00:09:33,940 and also to assign certain settings to your computer. 189 00:09:33,940 --> 00:09:36,580 Indeed, running inside of this so-called router in your home 190 00:09:36,580 --> 00:09:42,180 typically is a protocol, a special type of software called DHCP-- Dynamic Host 191 00:09:42,180 --> 00:09:43,377 Configuration Protocol. 192 00:09:43,377 --> 00:09:46,460 And this is just a fancy way of saying that that little device in your home 193 00:09:46,460 --> 00:09:48,370 knows how to get you onto the internet. 194 00:09:48,370 --> 00:09:49,780 And how does it do that? 195 00:09:49,780 --> 00:09:51,970 Well, the first time you turn on your Mac or PC 196 00:09:51,970 --> 00:09:55,340 and connect to your home network-- or Harvard's or Yale's for that matter-- 197 00:09:55,340 --> 00:09:59,730 you are assigned, thanks to this technology DHCP, an IP address, 198 00:09:59,730 --> 00:10:02,570 a numeric address, something of the form something 199 00:10:02,570 --> 00:10:05,490 dot something dot something dot something that uniquely in theory 200 00:10:05,490 --> 00:10:07,390 identifies your computer on the internet, 201 00:10:07,390 --> 00:10:12,160 so long as your computer speaks this protocol IP, or the Internet Protocol. 202 00:10:12,160 --> 00:10:15,370 And we'll see in a bit that IP and TCP-- or more 203 00:10:15,370 --> 00:10:18,530 commonly known as TCP/IP-- is really just a set of conventions 204 00:10:18,530 --> 00:10:21,890 that governs how computers talk to each other on the internet. 205 00:10:21,890 --> 00:10:24,850 And the first way they do that is by agreeing upon in advance 206 00:10:24,850 --> 00:10:26,940 what each of their addresses look like. 207 00:10:26,940 --> 00:10:29,580 Now, these addresses are actually changing in format over time, 208 00:10:29,580 --> 00:10:31,829 because frankly, we're running out of these addresses. 
209 00:10:31,829 --> 00:10:33,800 But the most common address right now still 210 00:10:33,800 --> 00:10:38,690 is an IP version 4, or V4 address, that is literally of the form something dot 211 00:10:38,690 --> 00:10:40,660 something dot something dot something. 212 00:10:40,660 --> 00:10:43,585 And so when your computer first turns on in your home network, 213 00:10:43,585 --> 00:10:46,210 you are given a number that looks a little something like that. 214 00:10:46,210 --> 00:10:50,650 And via that address now can you talk to other computers on the internet, 215 00:10:50,650 --> 00:10:53,680 because this is like your from address in the physical world, 216 00:10:53,680 --> 00:10:56,930 and you can receive responses from computers on the internet, 217 00:10:56,930 --> 00:10:59,480 because they now know you via this address. 218 00:10:59,480 --> 00:11:03,420 So much like the CS building here is at 33 Oxford Street 219 00:11:03,420 --> 00:11:05,710 Cambridge, Massachusetts, or the CS building at Yale 220 00:11:05,710 --> 00:11:10,170 was 51 Prospect Street, New Haven, Connecticut, much as those 221 00:11:10,170 --> 00:11:12,840 addresses uniquely identified those two buildings, 222 00:11:12,840 --> 00:11:17,590 so do IP addresses in the world of computers uniquely identify computers. 223 00:11:17,590 --> 00:11:20,560 So here for instance just happens to be by convention 224 00:11:20,560 --> 00:11:23,444 what most of Harvard's own IP addresses look like. 225 00:11:23,444 --> 00:11:26,110 Now that I'm on this network here, odds are my IP address starts 226 00:11:26,110 --> 00:11:28,530 with 140.247 dot something dot something, 227 00:11:28,530 --> 00:11:31,520 or 128.103 dot something dot something. 228 00:11:31,520 --> 00:11:35,700 Or at New Haven at Yale, it might look like 130.132 dot 229 00:11:35,700 --> 00:11:39,600 something dot something, or 128.36 dot something dot something. 
230 00:11:39,600 --> 00:11:41,560 And it turns out that each of these somethings 231 00:11:41,560 --> 00:11:46,590 simply is by definition a number between 0 and 255. 232 00:11:46,590 --> 00:11:47,810 0 to 255. 233 00:11:47,810 --> 00:11:49,690 I feel like we've heard these numbers before. 234 00:11:49,690 --> 00:11:53,490 And indeed, if you can count from 0 to 255, 235 00:11:53,490 --> 00:11:56,210 that means you're using, what, 8 bits. 236 00:11:56,210 --> 00:11:59,930 And so each of these numbers is 8 bits plus 8 plus 8 plus 8. 237 00:11:59,930 --> 00:12:01,760 So that's 32 bits. 238 00:12:01,760 --> 00:12:05,840 And indeed, an IP address typically these days-- at least version 4-- 239 00:12:05,840 --> 00:12:10,320 is a 32-bit value, which means there can be in total no more than 4 billion 240 00:12:10,320 --> 00:12:11,654 or so computers on the internet. 241 00:12:11,654 --> 00:12:14,611 And we're actually starting to bump up against that, because everything 242 00:12:14,611 --> 00:12:17,630 these days seems to be on the internet, whether it's your phone, laptop, 243 00:12:17,630 --> 00:12:20,070 or even some smart device in your home. 244 00:12:20,070 --> 00:12:22,080 And so there is a way to mitigate that. 245 00:12:22,080 --> 00:12:24,880 It turns out that your computer, even if you're on campus, 246 00:12:24,880 --> 00:12:27,129 might not quite have one of those Harvard or Yale IPs. 247 00:12:27,129 --> 00:12:30,790 You might instead have depending on where you are on campus a private IP 248 00:12:30,790 --> 00:12:33,054 address, or if you're in your home, you similarly 249 00:12:33,054 --> 00:12:34,470 might have one of these addresses. 
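The arithmetic behind "8 plus 8 plus 8 plus 8 is 32 bits" can be sketched as follows. This is an illustrative Python snippet, not code from the lecture; the function name `ipv4_to_int` is made up, and the sample address in the `140.247` range is only a placeholder.

```python
def ipv4_to_int(address):
    """Pack a dotted-quad IPv4 address into a single 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("expected four numbers separated by dots")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError("each number must fit in 8 bits (0 to 255)")
        # Shift the bits collected so far left by 8 and append the next 8 bits.
        value = (value << 8) | octet
    return value


# 32 bits allow 2**32 distinct values: roughly 4 billion addresses.
TOTAL_IPV4_ADDRESSES = 2 ** 32

# Placeholder address, chosen only to illustrate the packing.
packed = ipv4_to_int("140.247.0.1")
```

Here `TOTAL_IPV4_ADDRESSES` is 4,294,967,296, which is where the "4 billion or so" ceiling comes from.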
250 00:12:34,470 --> 00:12:36,420 And these are private in the sense that they 251 00:12:36,420 --> 00:12:39,450 are used to route information within your home or within your school 252 00:12:39,450 --> 00:12:41,870 or within your company, but these addresses are not 253 00:12:41,870 --> 00:12:43,960 meant to be used by the outside world. 254 00:12:43,960 --> 00:12:46,270 Instead, what you get from Harvard or Yale 255 00:12:46,270 --> 00:12:48,640 or Comcast or Verizon when you connect to their network 256 00:12:48,640 --> 00:12:53,160 typically is at least the ability to have one or more public IP 257 00:12:53,160 --> 00:12:56,900 addresses that the rest of the world knows you by. 258 00:12:56,900 --> 00:12:59,260 So what does this actually mean? 259 00:12:59,260 --> 00:13:01,630 Well, sometimes it doesn't really mean anything at all. 260 00:13:01,630 --> 00:13:04,505 And in fact, if you look at popular media today or various television 261 00:13:04,505 --> 00:13:08,660 shows, you'll see that IP is either miscommunicated or outright 262 00:13:08,660 --> 00:13:09,920 misunderstood. 263 00:13:09,920 --> 00:13:11,247 Let's take a look. 264 00:13:11,247 --> 00:13:12,440 [VIDEO PLAYBACK] 265 00:13:12,440 --> 00:13:14,933 -It's a 32-bit IPv4 address. 266 00:13:14,933 --> 00:13:16,852 -IP, as in the internet? 267 00:13:16,852 --> 00:13:17,560 -Private network. 268 00:13:17,560 --> 00:13:18,690 To meet is private network. 269 00:13:18,690 --> 00:13:31,731 270 00:13:31,731 --> 00:13:32,960 It's just so amazing. 271 00:13:32,960 --> 00:13:36,010 272 00:13:36,010 --> 00:13:37,180 It's in their IP address. 273 00:13:37,180 --> 00:13:41,207 She's letting us watch what she's doing in real time. 274 00:13:41,207 --> 00:13:41,790 [END PLAYBACK] 275 00:13:41,790 --> 00:13:44,784 DAVID MALAN: No, no, that is not what a hacker does in real time, 276 00:13:44,784 --> 00:13:46,950 and that is not how you watch a hacker in real time. 
277 00:13:46,950 --> 00:13:49,470 Indeed, if you zoom in on this screen here, 278 00:13:49,470 --> 00:13:51,982 you'll see that what's actually being looked at 279 00:13:51,982 --> 00:13:53,690 has nothing to do with networking per se. 280 00:13:53,690 --> 00:13:55,398 This is actually programming code written 281 00:13:55,398 --> 00:13:57,910 in a language called Objective C, which happens 282 00:13:57,910 --> 00:13:59,950 to be used conventionally for Mac applications 283 00:13:59,950 --> 00:14:01,657 or more recently iOS applications. 284 00:14:01,657 --> 00:14:03,740 And of all the things for them to have pulled out, 285 00:14:03,740 --> 00:14:06,190 they use this code, which has to be something 286 00:14:06,190 --> 00:14:10,020 related to some kind of drawing program insofar as it's talking about crayons. 287 00:14:10,020 --> 00:14:13,280 Moreover, if you actually look at one of the other scenes from this show, 288 00:14:13,280 --> 00:14:15,040 this was the IP address in question. 289 00:14:15,040 --> 00:14:17,430 This too is not technically accurate. 290 00:14:17,430 --> 00:14:22,620 What's wrong with this IP address in this frame here from the show? 291 00:14:22,620 --> 00:14:26,040 Yeah, so if the IP addresses can only be from 0 to 255, 292 00:14:26,040 --> 00:14:28,100 275 is definitely too big. 293 00:14:28,100 --> 00:14:30,680 Now, in their defense, this is probably a good thing, 294 00:14:30,680 --> 00:14:34,130 because now they're not broadcasting some random, unsuspecting person's 295 00:14:34,130 --> 00:14:35,420 actual IP address. 296 00:14:35,420 --> 00:14:38,510 But there too there's a technical limitation. 297 00:14:38,510 --> 00:14:43,300 But of course, we humans, when we visit websites using Safari or Chrome or IE 298 00:14:43,300 --> 00:14:46,400 or Edge or whatever, we rarely if ever type 299 00:14:46,400 --> 00:14:51,230 in the address of websites or servers by these numeric IP addresses. 
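The check that catches an address like the one in that frame can be sketched with Python's standard `ipaddress` module. The specific addresses below are placeholders picked for illustration (the transcript does not spell out the one on screen), and the helper name `describe` is hypothetical.

```python
import ipaddress


def describe(address):
    """Classify a string as an invalid, private, or public IP address."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Raised, for example, when one of the four numbers exceeds 255.
        return "invalid"
    # is_private is true for ranges reserved for use inside a home,
    # school, or company (it also covers a few other reserved ranges).
    return "private" if ip.is_private else "public"


examples = {
    "275.3.6.28": describe("275.3.6.28"),    # 275 does not fit in 8 bits
    "192.168.1.1": describe("192.168.1.1"),  # a commonly used private range
    "140.247.0.1": describe("140.247.0.1"),  # placeholder in the 140.247 range
}
```

Treat "public" here loosely: it only means the address is not in one of the reserved ranges the module knows about.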
300 00:14:51,230 --> 00:14:55,390 Rather, we seem to use more user-friendly words, 301 00:14:55,390 --> 00:15:01,640 like www.google.com, or harvard.edu, or yale.edu, or facebook.com, or the like. 302 00:15:01,640 --> 00:15:03,600 And thankfully, there exists in the world 303 00:15:03,600 --> 00:15:08,890 another system, another technology known as DNS-- Domain Name System. 304 00:15:08,890 --> 00:15:13,400 And what DNS does is it simply converts numeric IP addresses 305 00:15:13,400 --> 00:15:17,520 to more human-friendly host names, or fully qualified domain names. 306 00:15:17,520 --> 00:15:21,870 Which is to say when I first sit down at my Mac or my PC on my home network 307 00:15:21,870 --> 00:15:26,590 or Harvard's or Yale's and I type in something like www.google.com and hit 308 00:15:26,590 --> 00:15:31,080 Enter, the way that my computer actually talks to google.com 309 00:15:31,080 --> 00:15:33,620 is by way of those numeric IP addresses. 310 00:15:33,620 --> 00:15:38,931 But the way my Mac or PC figures out what that IP address is of google.com 311 00:15:38,931 --> 00:15:42,610 is it asks the local operating system-- Mac OS or Windows-- 312 00:15:42,610 --> 00:15:46,380 and if Mac OS or Windows doesn't know, my operating system asks 313 00:15:46,380 --> 00:15:49,750 Harvard's network or Yale's network or Comcast's network, 314 00:15:49,750 --> 00:15:52,340 wherever I physically am, because each of those networks 315 00:15:52,340 --> 00:15:56,440 has their own DNS server, whose purpose in life is to convert IP 316 00:15:56,440 --> 00:15:59,330 addresses to host names and host names to IP addresses. 317 00:15:59,330 --> 00:16:03,100 And in the event that Comcast or Yale or Harvard, wherever I am, 318 00:16:03,100 --> 00:16:06,580 doesn't know the answer to what is the IP address for www.google.com, 319 00:16:06,580 --> 00:16:10,110 there exist root servers in the world. 
320 00:16:10,110 --> 00:16:12,930 Servers that are globally administered at the end of the day 321 00:16:12,930 --> 00:16:17,839 can at least help those DNS servers figure out what the answers are. 322 00:16:17,839 --> 00:16:19,630 And indeed, when you buy or when you really 323 00:16:19,630 --> 00:16:22,090 rent a domain name, among the things you're doing 324 00:16:22,090 --> 00:16:24,440 is informing the world via a set of standards 325 00:16:24,440 --> 00:16:26,500 what your server's IP addresses are. 326 00:16:26,500 --> 00:16:30,380 And so that's exactly what Google and others have done. 327 00:16:30,380 --> 00:16:34,610 But of course, the data at the end of the day still has to get from my laptop 328 00:16:34,610 --> 00:16:35,120 to Google. 329 00:16:35,120 --> 00:16:37,970 And then my search results have to get from Google to me. 330 00:16:37,970 --> 00:16:39,030 And how does that happen? 331 00:16:39,030 --> 00:16:40,905 I mean, most of Google's servers are probably 332 00:16:40,905 --> 00:16:44,630 out in Mountain View, California or maybe here on the East Coast somewhere, 333 00:16:44,630 --> 00:16:45,897 if they have multiple servers. 334 00:16:45,897 --> 00:16:47,230 Or maybe somewhere in the world. 335 00:16:47,230 --> 00:16:50,105 And indeed, big companies these days have servers all over the place. 336 00:16:50,105 --> 00:16:53,070 So how does one little old laptop know how 337 00:16:53,070 --> 00:16:55,310 to request search results from Google or how 338 00:16:55,310 --> 00:17:00,240 to request my news feed from Facebook or how to do any number of other things 339 00:17:00,240 --> 00:17:01,100 on the internet? 340 00:17:01,100 --> 00:17:04,200 Well it does it by way of these things called routers. 
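The DNS lookup chain described earlier (ask the operating system, then the local network's DNS server, then servers further up) can be simulated roughly as below. This is a deliberately simplified sketch: the `Resolver` class is hypothetical, the remembering of answers is an assumption of the sketch, and real root servers refer resolvers onward to other servers rather than holding the final answers themselves.

```python
class Resolver:
    """Toy DNS resolver: answer from local records, else ask upstream."""

    def __init__(self, records, upstream=None):
        self.records = dict(records)
        self.upstream = upstream

    def lookup(self, hostname):
        if hostname in self.records:
            return self.records[hostname]
        if self.upstream is None:
            raise KeyError(hostname)
        # Not known locally: delegate the question one level up.
        address = self.upstream.lookup(hostname)
        # Remember the answer so the next lookup is answered locally.
        self.records[hostname] = address
        return address


# Simplification: the top-level resolver just holds the answer directly.
top_level = Resolver({"www.google.com": "172.217.4.36"})
campus = Resolver({}, upstream=top_level)      # e.g. a school's or ISP's DNS server
operating_system = Resolver({}, upstream=campus)

answer = operating_system.lookup("www.google.com")
```

After the first lookup, `campus.records` holds the answer too, so a second computer on the same network would not need to go any further up the chain.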
341 00:17:04,200 --> 00:17:08,030 It turns out that between me and most any other point on the internet, 342 00:17:08,030 --> 00:17:10,500 there's one or more routers-- special servers 343 00:17:10,500 --> 00:17:13,839 that could be this big, this big, any number of sizes these days. 344 00:17:13,839 --> 00:17:17,599 They're just computers that typically live in data centers of some sort. 345 00:17:17,599 --> 00:17:21,359 And these routers' purpose in life is to quite simply route information. 346 00:17:21,359 --> 00:17:24,260 So when my Mac wants to talk to google.com, 347 00:17:24,260 --> 00:17:27,950 my Mac constructs what we call a packet of information inside of which 348 00:17:27,950 --> 00:17:28,710 is my request. 349 00:17:28,710 --> 00:17:30,772 Give me all of your search results for cats, 350 00:17:30,772 --> 00:17:32,730 for instance, if that's what I'm searching for. 351 00:17:32,730 --> 00:17:36,410 And that packet is handed off to the nearest router. 352 00:17:36,410 --> 00:17:39,830 That router happens to be, at this point in the story, at Harvard here. 353 00:17:39,830 --> 00:17:41,210 Harvard has its own routers. 354 00:17:41,210 --> 00:17:44,960 And Harvard's routers are somehow wired or wirelessly connected 355 00:17:44,960 --> 00:17:46,710 to other routers in the world. 356 00:17:46,710 --> 00:17:50,141 And those routers, typically no more than 30 routers away, 357 00:17:50,141 --> 00:17:53,390 can get my data by routing it, routing it, routing it, routing it, routing it, 358 00:17:53,390 --> 00:17:56,310 until it eventually reaches its correct destination. 359 00:17:56,310 --> 00:17:59,430 In its simplest form, what you can think of these routers 360 00:17:59,430 --> 00:18:02,190 as doing is looking at those IP addresses-- something 361 00:18:02,190 --> 00:18:05,500 dot something dot something dot something-- and deciding, based 362 00:18:05,500 --> 00:18:07,280 on those numbers, which direction to go. 
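A toy version of that routing decision might look like the following. It is a sketch under loud assumptions: the router names and tables are invented, each router decides using only the first number of the destination address, and real routers match on address prefixes of varying length with tables maintained by software.

```python
def route(destination, routers, start, max_hops=30):
    """Follow next-hop decisions from `start` until the packet is delivered.

    Returns the list of routers the packet passed through.
    """
    first_number = int(destination.split(".")[0])
    path = [start]
    current = start
    for _ in range(max_hops):
        table = routers[current]
        # Pick a direction based on the address, else use the default route.
        next_hop = table.get(first_number, table["default"])
        if next_hop == "deliver":
            return path
        path.append(next_hop)
        current = next_hop
    raise RuntimeError("hop limit exceeded")


# Hypothetical routers and routing tables, for illustration only.
routers = {
    "campus": {"default": "regional"},
    "regional": {172: "west-coast", "default": "campus"},
    "west-coast": {172: "deliver", "default": "regional"},
}

path = route("172.217.4.36", routers, start="campus")
```

The `max_hops` limit mirrors the idea that a destination is typically no more than about 30 routers away; a packet that keeps bouncing between routers is eventually given up on rather than forwarded forever.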
363 00:18:07,280 --> 00:18:10,386 So maybe if my IP address starts arbitrarily with 1, 364 00:18:10,386 --> 00:18:12,510 maybe the packet should go that way to that router. 365 00:18:12,510 --> 00:18:14,260 If it starts with 2, it should go that way 366 00:18:14,260 --> 00:18:16,800 and be routed to that router, or that way, or that way. 367 00:18:16,800 --> 00:18:17,960 It doesn't really matter. 368 00:18:17,960 --> 00:18:19,960 This all happens dynamically thanks to software. 369 00:18:19,960 --> 00:18:22,980 But routers just use those IP addresses to decide 370 00:18:22,980 --> 00:18:25,720 which way to route your information. 371 00:18:25,720 --> 00:18:27,190 And we can actually see this. 372 00:18:27,190 --> 00:18:31,320 Let me go ahead into CS50 IDE, and Macs and PCs and other computers 373 00:18:31,320 --> 00:18:32,790 have the same software. 374 00:18:32,790 --> 00:18:35,930 This will allow me to do a number of things at my command line here. 375 00:18:35,930 --> 00:18:39,430 For instance, suppose that I wanted to check 376 00:18:39,430 --> 00:18:41,445 what the IP address is for google.com. 377 00:18:41,445 --> 00:18:44,840 Because if I want to send Google a letter, like a packet of information 378 00:18:44,840 --> 00:18:47,620 requesting a whole bunch of search results about cats, 379 00:18:47,620 --> 00:18:49,570 I need to know their IP address. 380 00:18:49,570 --> 00:18:51,690 So what I can do at the command line here 381 00:18:51,690 --> 00:18:56,210 is run a command that's pretty popular called nslookup-- name server lookup. 382 00:18:56,210 --> 00:19:00,510 And I can type in something like www.google.com Enter, 383 00:19:00,510 --> 00:19:04,960 and voilà, I seem to get the answer here that Google's IP address is apparently 384 00:19:04,960 --> 00:19:08,780 172.217.4.36. 
385 00:19:08,780 --> 00:19:11,410 And I know that answer, because Harvard's server-- 386 00:19:11,410 --> 00:19:15,520 and I know it's Harvard, because it starts with 140.247-- Harvard's DNS 387 00:19:15,520 --> 00:19:18,400 server somewhere here on campus just knew that result. 388 00:19:18,400 --> 00:19:23,066 But it's non-authoritative, in the sense that Harvard does not run google.com. 389 00:19:23,066 --> 00:19:26,060 But Harvard has previously asked Google or someone else 390 00:19:26,060 --> 00:19:27,670 for Google's IP address. 391 00:19:27,670 --> 00:19:30,950 And so Harvard is answering the question for me, but not authoritatively. 392 00:19:30,950 --> 00:19:34,500 It's a delegate who is relaying that information to me. 393 00:19:34,500 --> 00:19:36,710 Now, suppose I want to do this for another site. 394 00:19:36,710 --> 00:19:42,700 Let me go ahead and run nslookup for, say, www.facebook.com. 395 00:19:42,700 --> 00:19:48,590 And you'll see here that Facebook's IP address is apparently 31.13.80.36. 396 00:19:48,590 --> 00:19:50,980 And there's some more cleverness going on here. 397 00:19:50,980 --> 00:19:54,160 It turns out there's other types of DNS records 398 00:19:54,160 --> 00:19:57,761 or entries, starmini.c10r.facebook.com. 399 00:19:57,761 --> 00:19:59,260 I don't really know what that means. 400 00:19:59,260 --> 00:20:00,870 Facebook's a big enough company that there's probably 401 00:20:00,870 --> 00:20:02,560 a lot more complexity going on. 402 00:20:02,560 --> 00:20:08,240 But just out of curiosity, let me go ahead and copy this IP address here. 403 00:20:08,240 --> 00:20:15,110 And in a browser, go to http:// that IP address. 404 00:20:15,110 --> 00:20:16,380 Enter. 405 00:20:16,380 --> 00:20:18,150 And voilà, I make my way to Facebook.com. 406 00:20:18,150 --> 00:20:22,200 But it would be pretty bad for business if everyone in the world 407 00:20:22,200 --> 00:20:24,670 had to know that Facebook's IP address is this.
408 00:20:24,670 --> 00:20:27,240 Back in the day when people still used phone numbers, 409 00:20:27,240 --> 00:20:32,390 you might have services like 1-800-COLLECT, C-O-L-L-E-C-T, 410 00:20:32,390 --> 00:20:36,050 these mnemonics, so that it was easier for humans to remember phone numbers. 411 00:20:36,050 --> 00:20:38,570 Thankfully, DNS does all of this automatically. 412 00:20:38,570 --> 00:20:41,240 We just have to remember facebook.com, and DNS 413 00:20:41,240 --> 00:20:44,680 does that conversion even more dynamically than the old school 414 00:20:44,680 --> 00:20:47,500 1-800-COLLECT tricks that the world adopted. 415 00:20:47,500 --> 00:20:51,217 So that's how my computer would get the TO address. 416 00:20:51,217 --> 00:20:54,300 So at this point in the story, if I want to send a request to google.com-- 417 00:20:54,300 --> 00:20:57,000 and this is just an envelope in which I might send a letter-- 418 00:20:57,000 --> 00:20:59,080 I need to have two pieces of information. 419 00:20:59,080 --> 00:21:03,440 I need to have the TO address here, which for Google recall-- 420 00:21:03,440 --> 00:21:05,190 let me look it up again-- is 172.217.4.36. 421 00:21:05,190 --> 00:21:13,670 And so I'm going to put that in the TO field of this envelope. 422 00:21:13,670 --> 00:21:15,790 And now I need to know my own IP address. 423 00:21:15,790 --> 00:21:18,350 So it turns out my computer has its own IP address. 424 00:21:18,350 --> 00:21:21,560 And so when I send this request over the internet to Google, 425 00:21:21,560 --> 00:21:24,530 I'm going to need to include my own IP address, which Windows or Mac 426 00:21:24,530 --> 00:21:25,590 OS knows for me. 427 00:21:25,590 --> 00:21:27,720 And so in the top corner of this envelope might 428 00:21:27,720 --> 00:21:30,900 I write my actual IP address as well. 429 00:21:30,900 --> 00:21:33,510 So now I have to actually route this information.
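The envelope with its TO and FROM addresses maps naturally onto a small record type. The sketch below is one possible way to model it; the names Envelope and address_envelope are hypothetical, and a real IP packet carries many more header fields than these two addresses.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """A toy packet: where it is going, where it came from, and its contents."""
    to_address: ipaddress.IPv4Address
    from_address: ipaddress.IPv4Address
    payload: bytes = b""


def address_envelope(to_address: str, from_address: str, payload: bytes = b"") -> Envelope:
    """Build an Envelope, rejecting anything that is not a valid IPv4 address.

    ipaddress.IPv4Address raises a ValueError subclass for malformed input.
    """
    return Envelope(
        ipaddress.IPv4Address(to_address),
        ipaddress.IPv4Address(from_address),
        payload,
    )
```

For instance, address_envelope("172.217.4.36", "192.0.2.10") would model an envelope addressed to the Google address from the lecture. The sender's address is never stated in the lecture, so 192.0.2.10 is only a placeholder taken from a range reserved for documentation.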
430 00:21:33,510 --> 00:21:35,580 I first have to write Google a note, and I 431 00:21:35,580 --> 00:21:42,420 might say on this blank sheet of paper, search for cats. 432 00:21:42,420 --> 00:21:43,984 So this might be my search request. 433 00:21:43,984 --> 00:21:45,900 And I'm going to go ahead and just bundle this 434 00:21:45,900 --> 00:21:48,340 up, put this inside of this envelope. 435 00:21:48,340 --> 00:21:52,310 But now I need to send this envelope or this so-called packet of information 436 00:21:52,310 --> 00:21:53,802 to www.google.com. 437 00:21:53,802 --> 00:21:55,010 And who knows where they are? 438 00:21:55,010 --> 00:21:55,860 Maybe they're in California. 439 00:21:55,860 --> 00:21:56,790 Maybe they're here on the East Coast. 440 00:21:56,790 --> 00:21:58,050 Maybe they're somewhere else. 441 00:21:58,050 --> 00:21:59,740 How do I route this information? 442 00:21:59,740 --> 00:22:01,930 Well, turns out that Harvard has a router, again, 443 00:22:01,930 --> 00:22:04,030 and Harvard's routers know of other routers. 444 00:22:04,030 --> 00:22:06,920 And in turn, and we using the same command prompt 445 00:22:06,920 --> 00:22:09,170 can we actually see the path that my data should 446 00:22:09,170 --> 00:22:20,570 take if I trace the route one query at a time from here to www.google.com. 447 00:22:20,570 --> 00:22:24,670 And now what you see, one row at a time, is the following. 448 00:22:24,670 --> 00:22:29,250 The first hop between me and Google is apparently this router here. 449 00:22:29,250 --> 00:22:35,292 Row number, mr-sc-1-gw-vl427.fas.net.harvard.edu. 450 00:22:35,292 --> 00:22:37,000 Don't quite understand all of that, but I 451 00:22:37,000 --> 00:22:40,180 do know just from knowing the people there, MR is the machine room. 452 00:22:40,180 --> 00:22:43,099 So here at Harvard Science Center, there is a room with machines. 453 00:22:43,099 --> 00:22:44,890 And that's where this server apparently is. 
454 00:22:44,890 --> 00:22:46,370 SC means Science Center. 455 00:22:46,370 --> 00:22:48,840 GW by convention means gateway, which is just 456 00:22:48,840 --> 00:22:51,320 a synonym for router, this kind of device. 457 00:22:51,320 --> 00:22:53,840 And then I don't know what VL427 means. 458 00:22:53,840 --> 00:22:56,970 But I do know that if we continue to the next hop here, 459 00:22:56,970 --> 00:23:01,520 row two, Core Science Center gateway, or Core Science Center router. 460 00:23:01,520 --> 00:23:03,640 So one router is connected to another router. 461 00:23:03,640 --> 00:23:06,010 The third hop to which my data is delivered 462 00:23:06,010 --> 00:23:10,400 is bdrgw2, which I know by convention means border gateway. 463 00:23:10,400 --> 00:23:14,380 And so this data is being passed from hop one to two to three. 464 00:23:14,380 --> 00:23:17,280 And once it goes there, it goes to hop four or router 465 00:23:17,280 --> 00:23:20,430 number four, which is nox1sumgw. 466 00:23:20,430 --> 00:23:24,292 So nox is the Northern Crossroads, which is a common peering point here 467 00:23:24,292 --> 00:23:27,250 in the Northeast of the US, which just means lots of different internet 468 00:23:27,250 --> 00:23:30,410 service providers interconnect their cabling and their technology 469 00:23:30,410 --> 00:23:32,710 so as to route data to and from locations. 470 00:23:32,710 --> 00:23:34,950 That's apparently where we're connected here. 471 00:23:34,950 --> 00:23:39,470 Then I don't know where row five is, but it 472 00:23:39,470 --> 00:23:41,620 looks like it's owned by Internet2, which 473 00:23:41,620 --> 00:23:45,860 is a fast level of internet service that a lot of universities use. 474 00:23:45,860 --> 00:23:50,950 Then routers 6, 7, 8, 9, 10, and 11 don't even disclose that they have names. 475 00:23:50,950 --> 00:23:51,760 And they might not.
476 00:23:51,760 --> 00:23:54,400 Routers don't and computers don't need to have 477 00:23:54,400 --> 00:23:57,990 domain names or human-friendly terms, it's just useful for us humans. 478 00:23:57,990 --> 00:24:02,660 But then lastly in hop 12, we finally make our way to whatever this is, 479 00:24:02,660 --> 00:24:07,660 which seems to be some kind of synonym or alias for one of Google's servers. 480 00:24:07,660 --> 00:24:11,870 So it seems that in just 12 hops, I can get data from here to Google. 481 00:24:11,870 --> 00:24:15,800 And you know how long it takes to get from here to Google, wherever they are? 482 00:24:15,800 --> 00:24:17,970 9 milliseconds in total. 483 00:24:17,970 --> 00:24:20,974 That's pretty darn fast to make a request from my computer 484 00:24:20,974 --> 00:24:24,140 to some other computer, especially when that computer could be most anywhere 485 00:24:24,140 --> 00:24:25,610 in the world or in the country. 486 00:24:25,610 --> 00:24:27,026 Now, there's a lot of variability. 487 00:24:27,026 --> 00:24:33,151 If you look at each of these rows-- 1.5 milliseconds, 1.9, 2.9, 25, 25, 25. 488 00:24:33,151 --> 00:24:34,150 These aren't cumulative. 489 00:24:34,150 --> 00:24:37,814 What my computer is doing is sending a packet to the first router, 490 00:24:37,814 --> 00:24:39,980 then to the second router, then to the third router, 491 00:24:39,980 --> 00:24:42,120 and measuring each time how long it takes. 492 00:24:42,120 --> 00:24:45,350 So you really just get a rough sense, an average of sorts, 493 00:24:45,350 --> 00:24:47,040 based on running this command like this. 494 00:24:47,040 --> 00:24:49,610 So it seems to take between 10 and 30 milliseconds 495 00:24:49,610 --> 00:24:51,770 to get my data from me to Google.
496 00:24:51,770 --> 00:24:53,680 Now, I don't know where Google's servers are, 497 00:24:53,680 --> 00:24:56,580 but I do know that UC Berkeley is in California, 498 00:24:56,580 --> 00:24:59,250 and their servers I do think are in California. 499 00:24:59,250 --> 00:25:03,675 So let's do another by tracing the route to www.berkeley.edu 500 00:25:03,675 --> 00:25:05,590 where some of our friends there are. 501 00:25:05,590 --> 00:25:09,080 That was super fast, even though it still took some 93 milliseconds. 502 00:25:09,080 --> 00:25:11,670 So I'm going to infer that the server of Google's 503 00:25:11,670 --> 00:25:13,990 that I'm talking to isn't all the way in California, 504 00:25:13,990 --> 00:25:17,820 because to get to California in reality seems to take a good 100 or 90 505 00:25:17,820 --> 00:25:18,530 milliseconds. 506 00:25:18,530 --> 00:25:20,180 But let's see what we can glean here. 507 00:25:20,180 --> 00:25:21,530 So Machine Room Science Center. 508 00:25:21,530 --> 00:25:22,500 It's a core gateway. 509 00:25:22,500 --> 00:25:26,330 It's a border gateway to Northern Crossroads, to an unnamed server. 510 00:25:26,330 --> 00:25:28,150 Don't know what this one is. 511 00:25:28,150 --> 00:25:30,380 But I can guess maybe what this is. 512 00:25:30,380 --> 00:25:33,970 And notice in particular, router number six jumps from seven 513 00:25:33,970 --> 00:25:36,940 milliseconds to like 49. 514 00:25:36,940 --> 00:25:38,690 That's a pretty good distance. 515 00:25:38,690 --> 00:25:42,120 And indeed, if you look at the name here, Hous, this I'm guessing 516 00:25:42,120 --> 00:25:45,520 is a router that's in Houston, Texas, halfway across the country. 517 00:25:45,520 --> 00:25:50,220 After that, maybe Los Angeles here in step 8. 518 00:25:50,220 --> 00:25:52,220 And that, indeed takes a little more time. 519 00:25:52,220 --> 00:25:55,210 So you can probably infer that it's farther away. 520 00:25:55,210 --> 00:25:56,660 No name, no name. 
521 00:25:56,660 --> 00:25:59,560 This one here, I'm not really sure. 522 00:25:59,560 --> 00:26:04,360 But now we seem to be in Berkeley's campus and CalWeb-- California web, 523 00:26:04,360 --> 00:26:06,820 their server farm production. 524 00:26:06,820 --> 00:26:10,500 Indeed, it takes some 90 milliseconds in total to get to Berkeley. 525 00:26:10,500 --> 00:26:11,670 What about MIT? 526 00:26:11,670 --> 00:26:13,090 MIT should be pretty close. 527 00:26:13,090 --> 00:26:15,540 Let's do a traceroute to MIT.edu. 528 00:26:15,540 --> 00:26:22,640 And it takes-- all right, so it seems that two routers between us and MIT 529 00:26:22,640 --> 00:26:25,000 aren't even cooperating, and that's their prerogative. 530 00:26:25,000 --> 00:26:26,880 Not actually responding to our requests. 531 00:26:26,880 --> 00:26:30,070 And so in about 10 milliseconds, we get to MIT's server, 532 00:26:30,070 --> 00:26:32,010 which seems to be hosted by a third party 533 00:26:32,010 --> 00:26:35,109 company called Akamai, which is a content delivery network, 534 00:26:35,109 --> 00:26:35,900 among other things. 535 00:26:35,900 --> 00:26:38,540 Which means MIT has outsourced to some third party 536 00:26:38,540 --> 00:26:41,620 the physical hosting of their servers, which is not uncommon. 537 00:26:41,620 --> 00:26:43,260 But let's do one more. 538 00:26:43,260 --> 00:26:46,200 Let's do one like for CNN, but not here in the US. 539 00:26:46,200 --> 00:26:52,260 But maybe .co.jp for the Japanese version of CNN's website. 540 00:26:52,260 --> 00:26:54,660 Let's go ahead and run this. 541 00:26:54,660 --> 00:26:58,430 Initially following the same route, Machine Room, Core Gateway, border. 542 00:26:58,430 --> 00:27:04,180 And then voilà, 189 milliseconds later, we seem to have gotten to Japan. 543 00:27:04,180 --> 00:27:06,330 But what can we glean from these numbers? 544 00:27:06,330 --> 00:27:09,170 I'm not quite sure where all of these hops are.
545 00:27:09,170 --> 00:27:14,150 But what is interesting to me is this one here between routers 8 and 9, what 546 00:27:14,150 --> 00:27:16,470 do you notice? 547 00:27:16,470 --> 00:27:18,650 That's a sizable jump in time. 548 00:27:18,650 --> 00:27:19,680 And it's not a fluke. 549 00:27:19,680 --> 00:27:22,690 It's not an anomaly, because indeed, it seems to persist. 550 00:27:22,690 --> 00:27:25,680 So if we go farther and farther into this trace, then 551 00:27:25,680 --> 00:27:30,610 indeed it's staying at 170 plus milliseconds. 552 00:27:30,610 --> 00:27:36,715 So what do you think is in between routers number 8 and 9? 553 00:27:36,715 --> 00:27:38,690 What would be between these? 554 00:27:38,690 --> 00:27:41,950 I dare say there's an entire ocean between them. 555 00:27:41,950 --> 00:27:44,400 And we can see that thanks to this animation here, 556 00:27:44,400 --> 00:27:47,270 there's a whole lot going on between points A and B, 557 00:27:47,270 --> 00:27:51,460 including sometimes some pretty big cables and some pretty big oceans. 558 00:27:51,460 --> 00:27:54,650 Let's take a look. 559 00:27:54,650 --> 00:27:58,618 [MUSIC PLAYING] 560 00:27:58,618 --> 00:28:54,694 561 00:28:54,694 --> 00:28:56,860 All right, there's something about really cool music 562 00:28:56,860 --> 00:28:58,510 that makes lines cool. 563 00:28:58,510 --> 00:29:01,112 But indeed, those pictures capture the complexity 564 00:29:01,112 --> 00:29:04,070 of all the wiring that's actually interconnecting all of the continents 565 00:29:04,070 --> 00:29:07,820 and countries of the world that actually explains more technically some 566 00:29:07,820 --> 00:29:10,020 of those differences in timings. 567 00:29:10,020 --> 00:29:12,520 But at the end of the day, this packet has to get somewhere. 
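The reasoning about the jump between routers 8 and 9 amounts to scanning the per-hop times for the largest increase from one hop to the next. A minimal sketch of that scan follows. The function name is hypothetical, and any list of times passed to it would have to be read off an actual trace; the lecture only quotes a few of the numbers from its own trace.

```python
def largest_jump(hop_times_ms):
    """Find the biggest increase in round-trip time between consecutive hops.

    hop_times_ms[0] is the time to hop 1, hop_times_ms[1] to hop 2, and so on.
    Returns (hop_before, hop_after, increase_ms), or None when there are
    fewer than two hops to compare.
    """
    best = None
    for i in range(1, len(hop_times_ms)):
        increase = hop_times_ms[i] - hop_times_ms[i - 1]
        if best is None or increase > best[2]:
            best = (i, i + 1, increase)
    return best
```

A persistent jump, where every later hop stays near the higher figure, is what suggests a long physical link such as an undersea cable. A single slow hop followed by faster ones is more likely a busy router than a long distance.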
568 00:29:12,520 --> 00:29:15,450 And suppose it does make its way over to Google servers, 569 00:29:15,450 --> 00:29:17,930 and Google receives this packet of information, 570 00:29:17,930 --> 00:29:21,650 realizes, oh, someone is searching for cats again. 571 00:29:21,650 --> 00:29:25,890 What does Google actually do in order to respond to that request? 572 00:29:25,890 --> 00:29:29,960 Well, it turns out that Google too is going to use a whole bunch of packets. 573 00:29:29,960 --> 00:29:33,920 And whereas previously, it was their address in the TO field 574 00:29:33,920 --> 00:29:36,030 and my address in the FROM field, now they're 575 00:29:36,030 --> 00:29:40,440 just going to simply reverse this so that the TO field now is to me, 576 00:29:40,440 --> 00:29:44,010 the FROM field is from Google. 577 00:29:44,010 --> 00:29:47,700 And inside of this envelope is going to be their various search results. 578 00:29:47,700 --> 00:29:51,150 Now turns out we found one such search result here. 579 00:29:51,150 --> 00:29:53,887 So if Google has decided to send me back this search result. 580 00:29:53,887 --> 00:29:55,970 Maybe I was feeling lucky and clicked that button. 581 00:29:55,970 --> 00:30:00,580 So I just get back one result. They're going to put the cat into the envelope. 582 00:30:00,580 --> 00:30:02,730 But sometimes, the data is pretty big. 583 00:30:02,730 --> 00:30:06,400 Sometimes this image might be kilobytes, megabytes, or if it's a video file, 584 00:30:06,400 --> 00:30:07,770 could be gigabytes large. 
585 00:30:07,770 --> 00:30:10,150 And it would be kind of rude if Google, in order 586 00:30:10,150 --> 00:30:13,850 to send me a really big response, shoved a really big piece of information 587 00:30:13,850 --> 00:30:18,550 in its packet and then clogged the internet so-called tubes on their way 588 00:30:18,550 --> 00:30:21,550 back to my laptop, thereby preventing anyone else from talking to Google 589 00:30:21,550 --> 00:30:24,710 or nearby websites at that same moment in time. 590 00:30:24,710 --> 00:30:28,020 So indeed, what Google and what many websites do 591 00:30:28,020 --> 00:30:34,145 is they leverage a feature of IP, and its sister protocol 592 00:30:34,145 --> 00:30:37,280 TCP that lets us fragment this. 593 00:30:37,280 --> 00:30:40,890 And indeed, they will take this perfectly nice picture of a cat, 594 00:30:40,890 --> 00:30:44,900 and they will fragment it, thanks to IP, into maybe four different pieces, each 595 00:30:44,900 --> 00:30:47,740 of which is smaller than the original. 596 00:30:47,740 --> 00:30:51,330 And inside of this envelope then goes one piece at a time. 597 00:30:51,330 --> 00:30:56,820 And so if I put one such piece in this first envelope. 598 00:30:56,820 --> 00:31:01,830 I can then much more efficiently clearly proceed to transmit this. 599 00:31:01,830 --> 00:31:04,290 And then if I do the same with a second and a third 600 00:31:04,290 --> 00:31:09,407 and maybe a fourth envelope, now Google can respond with one, two, three 601 00:31:09,407 --> 00:31:12,490 and maybe more packets of information that make their way on the internet, 602 00:31:12,490 --> 00:31:14,850 not even necessarily following the same path. 603 00:31:14,850 --> 00:31:16,760 In fact, there's no guarantee that A to B 604 00:31:16,760 --> 00:31:21,120 is going to be the same route as B to A. Things change dynamically over time. 
605 00:31:21,120 --> 00:31:23,910 But Google's going to have to include a little bit 606 00:31:23,910 --> 00:31:26,750 more information on this envelope. 607 00:31:26,750 --> 00:31:30,500 It's not sufficient anymore just to send me four envelopes. 608 00:31:30,500 --> 00:31:33,980 What else had they probably best do so that I can actually 609 00:31:33,980 --> 00:31:37,370 see my cat when it gets back to me? 610 00:31:37,370 --> 00:31:40,170 I've got to know how many packets they sent me, 611 00:31:40,170 --> 00:31:41,897 and I need to know in what order. 612 00:31:41,897 --> 00:31:44,230 So it turns out that what Google is probably going to do 613 00:31:44,230 --> 00:31:49,060 is something like this, write on this envelope the number of the packet 614 00:31:49,060 --> 00:31:50,630 and really how many there are. 615 00:31:50,630 --> 00:31:52,588 And this is a bit of a white lie, it's actually 616 00:31:52,588 --> 00:31:55,806 done a little differently thanks to some other fields 617 00:31:55,806 --> 00:31:57,180 that are inside of this envelope. 618 00:31:57,180 --> 00:32:00,760 But we can think of it really as 1/4, 2/4, 3/4, 4/4, 619 00:32:00,760 --> 00:32:03,050 so that if I only get two of these envelopes 620 00:32:03,050 --> 00:32:06,930 or three of these envelopes or four, I now know definitively, wait a minute, 621 00:32:06,930 --> 00:32:08,760 I only got 3/4 of my cat. 622 00:32:08,760 --> 00:32:11,660 And moreover, the ones I did get, I know the order in which 623 00:32:11,660 --> 00:32:13,600 I can reassemble those packets. 624 00:32:13,600 --> 00:32:16,090 Now, I mentioned this other protocol, TCP, 625 00:32:16,090 --> 00:32:18,760 that indeed often works in conjunction with IP. 626 00:32:18,760 --> 00:32:22,600 And you can think of IP as giving you features like addressing, assigning 627 00:32:22,600 --> 00:32:25,322 every computer in the world a unique address, and fragmentation, 628 00:32:25,322 --> 00:32:26,530 being able to chop things up.
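The 1/4, 2/4, 3/4, 4/4 picture can be tried out directly. The sketch below follows the lecture's simplified numbering rather than the real mechanism: actual IP fragments carry an identification value, a byte offset, and a more-fragments flag instead of an "n of total" label. All function names here are hypothetical.

```python
def fragment(data: bytes, pieces: int):
    """Split data into at most `pieces` numbered fragments.

    Each fragment is (index, total, chunk), with index counting from 1,
    mirroring the "1/4, 2/4, ..." labels written on the envelopes.
    """
    if pieces < 1 or not data:
        raise ValueError("need at least one piece and some data")
    size = -(-len(data) // pieces)  # ceiling division: bytes per fragment
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    total = len(chunks)
    return [(index, total, chunk) for index, chunk in enumerate(chunks, start=1)]


def missing(fragments):
    """Return the fragment numbers that have not arrived, in ascending order."""
    if not fragments:
        return []  # nothing received, so the total is unknown
    total = fragments[0][1]
    received = {index for index, _total, _chunk in fragments}
    return [n for n in range(1, total + 1) if n not in received]


def reassemble(fragments):
    """Rebuild the original bytes from fragments that arrived in any order."""
    gaps = missing(fragments)
    if gaps:
        raise ValueError(f"cannot reassemble, missing fragments: {gaps}")
    by_index = {index: chunk for index, _total, chunk in fragments}
    return b"".join(by_index[n] for n in sorted(by_index))
```

Keying the received chunks by their index means arrival order does not matter and a duplicated fragment is harmless, while missing() gives the receiver exactly the list it would need in order to say "I only got 3/4 of my cat."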
629 00:32:26,530 --> 00:32:32,090 But TCP further allows us to associate sequence numbers with packets 630 00:32:32,090 --> 00:32:35,490 that allows me the receiver to know, wait a minute, 631 00:32:35,490 --> 00:32:37,740 I'm missing one or more packets. 632 00:32:37,740 --> 00:32:42,480 So TCP is often said to guarantee delivery, and it is this protocol. 633 00:32:42,480 --> 00:32:44,650 So long as your Mac or your PC or your computer 634 00:32:44,650 --> 00:32:47,190 supports it, which they all do these days. 635 00:32:47,190 --> 00:32:50,060 If it determines, hey, wait a minute, I'm missing this packet, 636 00:32:50,060 --> 00:32:53,520 TCP is the protocol, the set of conventions, that say Google, 637 00:32:53,520 --> 00:32:56,320 I need this packet again or these packets again, 638 00:32:56,320 --> 00:32:58,580 and they will be retransmitted. 639 00:32:58,580 --> 00:33:00,792 Now, you pay a price in terms of performance, 640 00:33:00,792 --> 00:33:03,250 because now you might have to wait for the rest of the cat. 641 00:33:03,250 --> 00:33:06,540 So there might be a bit of a latency in order to get back that response. 642 00:33:06,540 --> 00:33:08,660 And that might not always be desirable. 643 00:33:08,660 --> 00:33:10,540 And indeed, I can think of some scenarios, 644 00:33:10,540 --> 00:33:16,540 like if you're watching a baseball game on TV or soccer or football 645 00:33:16,540 --> 00:33:20,000 where you're watching a live stream-- or maybe it's the Oscars or the Emmys, 646 00:33:20,000 --> 00:33:24,775 or something live, where you really want to stay in sync with that broadcast, 647 00:33:24,775 --> 00:33:26,900 even if sometimes there's network issues or there's 648 00:33:26,900 --> 00:33:29,160 buffering-- you don't necessarily want it to buffer. 649 00:33:29,160 --> 00:33:32,380 You don't necessarily want lost information to be retransmitted. 
650 00:33:32,380 --> 00:33:34,810 You'd rather just lose a few seconds of the show 651 00:33:34,810 --> 00:33:37,130 so that at least you're staying current, especially if you're there 652 00:33:37,130 --> 00:33:39,350 with a bunch of other people and it would be just silly if you 653 00:33:39,350 --> 00:33:41,002 gradually over time drift out of date. 654 00:33:41,002 --> 00:33:43,960 And so the rest of the world is finished watching the show or the game, 655 00:33:43,960 --> 00:33:45,790 and you're still chugging along. 656 00:33:45,790 --> 00:33:48,940 So as an alternative to TCP, there's other protocols, one of which 657 00:33:48,940 --> 00:33:53,120 is called UDP that's very often used for live streaming 658 00:33:53,120 --> 00:33:56,300 and for video and applications like that, where you really just want 659 00:33:56,300 --> 00:34:00,510 the software to forge ahead, rather than wait for some new data 660 00:34:00,510 --> 00:34:02,460 to get transmitted. 661 00:34:02,460 --> 00:34:05,669 But there's other things we can do with the internet. 662 00:34:05,669 --> 00:34:08,210 And indeed, there's lots of things we ourselves do every day. 663 00:34:08,210 --> 00:34:11,320 It's not just the web, like in downloading cats from Google. 664 00:34:11,320 --> 00:34:14,050 But there's email, and there's Skype, and Facebook Messenger, 665 00:34:14,050 --> 00:34:15,850 and any number of other services. 666 00:34:15,850 --> 00:34:20,679 So how in the world does a computer upon receiving a packet of information 667 00:34:20,679 --> 00:34:25,239 know if it is an email or if it is a web page, or put more concretely, 668 00:34:25,239 --> 00:34:30,159 how do I know if I should show this user this cat in his or her email program 669 00:34:30,159 --> 00:34:33,219 or in his or her browser, which might be the same? 670 00:34:33,219 --> 00:34:36,310 In other words, how do I distinguish between one type of program 671 00:34:36,310 --> 00:34:38,060 running on the internet from another? 
672 00:34:38,060 --> 00:34:43,469 Well, turns out that TCP also provides a standardization of services. 673 00:34:43,469 --> 00:34:46,770 And that is just a fancy way of saying that in addition to saying 674 00:34:46,770 --> 00:34:52,340 on this envelope to who it is and what number it is and from whom it is, 675 00:34:52,340 --> 00:34:56,989 I also need to uniquely identify the type of service 676 00:34:56,989 --> 00:34:58,660 whose information is in that packet. 677 00:34:58,660 --> 00:35:00,300 And I do this just by writing a number. 678 00:35:00,300 --> 00:35:02,470 And I typically write one of these numbers. 679 00:35:02,470 --> 00:35:06,200 80 if that packet is meant to be web information. 680 00:35:06,200 --> 00:35:09,639 So HTTP is the string that most of us type most every day-- or at least 681 00:35:09,639 --> 00:35:12,180 see these days, even though our browsers generally fill it in 682 00:35:12,180 --> 00:35:13,690 if we don't explicitly type it. 683 00:35:13,690 --> 00:35:15,920 It turns out that the world decided years ago 684 00:35:15,920 --> 00:35:19,230 that if you want to send information from yourself 685 00:35:19,230 --> 00:35:22,920 to a web server like Google to request cats, 686 00:35:22,920 --> 00:35:27,150 you had better write the number 80 in the TO field in addition 687 00:35:27,150 --> 00:35:28,940 to Google's IP address. 688 00:35:28,940 --> 00:35:31,890 This way, Google knows it's not an email destined for Gmail, 689 00:35:31,890 --> 00:35:35,870 knows it's not a message destined for Google Hangouts or the like. 690 00:35:35,870 --> 00:35:37,840 Google servers can actually distinguish this 691 00:35:37,840 --> 00:35:41,990 as an HTTP request or web request from any number of other services. 692 00:35:41,990 --> 00:35:45,620 If you're using encryption, HTTPS, that special number 693 00:35:45,620 --> 00:35:47,730 that the world standardized on is 443. 
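Writing a port number next to the destination address is easy to picture in code. The sketch below uses the common "address:port" text notation, which is an assumption here rather than something the lecture shows, and its table holds only the two port numbers named so far. The function names are hypothetical.

```python
# Only the two port numbers named so far: 80 for HTTP and 443 for HTTPS.
SERVICES_BY_PORT = {80: "HTTP", 443: "HTTPS"}


def parse_destination(text: str):
    """Split "address:port" into (address, port), e.g. "172.217.4.36:80"."""
    address, _separator, port = text.rpartition(":")
    return address, int(port)


def service_for(port: int) -> str:
    """Name the service a packet is meant for, judging only by its port."""
    return SERVICES_BY_PORT.get(port, "unknown")
```

A receiving machine makes the same kind of decision: the port number, not the contents of the envelope, tells it which program should be handed the data.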
694 00:35:47,730 --> 00:35:49,920 You rarely see this, but it's on the envelopes 695 00:35:49,920 --> 00:35:52,850 that your Macs or PCs are actually sending to Google servers. 696 00:35:52,850 --> 00:35:55,820 Meanwhile, there's other port numbers, so to speak. 697 00:35:55,820 --> 00:35:58,300 If you've ever heard of FTP, file transfer protocol. 698 00:35:58,300 --> 00:36:00,620 This is software that's not recommended anymore, 699 00:36:00,620 --> 00:36:02,760 because it's completely unencrypted. 700 00:36:02,760 --> 00:36:05,880 But it's still unfortunately popular in some applications 701 00:36:05,880 --> 00:36:08,370 or with some less expensive web services. 702 00:36:08,370 --> 00:36:10,670 21 is the number that identifies that service. 703 00:36:10,670 --> 00:36:12,560 And that just means inside of this packet 704 00:36:12,560 --> 00:36:16,270 is information related to transferring files, not a web page per se. 705 00:36:16,270 --> 00:36:18,529 22, SSH, Secure Shell. 706 00:36:18,529 --> 00:36:21,320 This is a very popular protocol, at least among computer scientists 707 00:36:21,320 --> 00:36:25,600 and others, that allows you to run commands from your Mac or PC 708 00:36:25,600 --> 00:36:29,100 on a remote server, but in an encrypted way. 709 00:36:29,100 --> 00:36:31,630 And those kinds of packets contain the number 22. 710 00:36:31,630 --> 00:36:35,290 SMTP-- Simple Mail Transfer Protocol-- is what is generally 711 00:36:35,290 --> 00:36:36,630 used for outbound email. 712 00:36:36,630 --> 00:36:39,420 So if you send an email, your envelopes have 25 on them. 713 00:36:39,420 --> 00:36:42,110 And then lastly, DNS is again that service 714 00:36:42,110 --> 00:36:44,980 that converts host names to IP addresses and vice versa. 715 00:36:44,980 --> 00:36:47,670 So when your Mac or PC asks the world, hey, wait a minute, 716 00:36:47,670 --> 00:36:50,790 what is the IP address for www.google.com?
717 00:36:50,790 --> 00:36:54,390 That envelope has the number 53 on the outside. 718 00:36:54,390 --> 00:36:58,210 And dot dot dot, there's dozens or even hundreds of these other things, 719 00:36:58,210 --> 00:37:00,280 for Skype and for Google Hangouts and the like. 720 00:37:00,280 --> 00:37:02,500 But these here are just some of the most common. 721 00:37:02,500 --> 00:37:03,980 So the envelope, at the end of the day, has 722 00:37:03,980 --> 00:37:05,521 a decent amount of information on it. 723 00:37:05,521 --> 00:37:08,340 The TO address, the FROM address, and that TO address furthermore 724 00:37:08,340 --> 00:37:10,380 has a port number associated with it. 725 00:37:10,380 --> 00:37:12,730 And then, if it's been fragmented especially, 726 00:37:12,730 --> 00:37:15,310 there's got to be some kind of number that 727 00:37:15,310 --> 00:37:21,170 identifies the packet itself so that you can detect if something is missing. 728 00:37:21,170 --> 00:37:23,760 But there's kind of a side effect, or really a feature 729 00:37:23,760 --> 00:37:27,860 of having this level of detail on each of these envelopes. 730 00:37:27,860 --> 00:37:31,584 You've probably heard of a firewall. 731 00:37:31,584 --> 00:37:32,750 Maybe not in the real world. 732 00:37:32,750 --> 00:37:34,500 In the real world, a firewall is literally 733 00:37:34,500 --> 00:37:36,680 a wall that's meant to block fire, typically 734 00:37:36,680 --> 00:37:38,944 in like strip malls and offices or stores that 735 00:37:38,944 --> 00:37:40,360 are next to each other physically. 736 00:37:40,360 --> 00:37:42,560 A firewall is meant to keep a fire that breaks out 737 00:37:42,560 --> 00:37:46,340 in one store from traveling into another store, creating even more damage. 
738 00:37:46,340 --> 00:37:50,370 But in the software world, a firewall is a piece of software 739 00:37:50,370 --> 00:37:54,330 that really keeps packets out that you don't want coming in, 740 00:37:54,330 --> 00:37:57,290 or keeps packets in that you don't want going out. 741 00:37:57,290 --> 00:38:02,980 So a firewall might be used by parents to prevent kids 742 00:38:02,980 --> 00:38:05,320 from accessing Facebook or Google, or silly things 743 00:38:05,320 --> 00:38:08,320 during the day for instance, if they want them focusing on other things. 744 00:38:08,320 --> 00:38:10,990 It might be used by universities or corporations 745 00:38:10,990 --> 00:38:13,130 to block access to certain websites that you simply 746 00:38:13,130 --> 00:38:16,790 don't want your students or your staff actually accessing. 747 00:38:16,790 --> 00:38:19,920 It might be used to keep corporate data inside, 748 00:38:19,920 --> 00:38:23,190 so that nothing accidentally leaks out-- financial information, or emails, 749 00:38:23,190 --> 00:38:23,690 or the like. 750 00:38:23,690 --> 00:38:26,830 You can use a firewall to block outbound access as well. 751 00:38:26,830 --> 00:38:30,470 But this invites the question then, how is a firewall implemented? 752 00:38:30,470 --> 00:38:32,860 Well, it's not all that hard, really. 753 00:38:32,860 --> 00:38:36,910 Because if the internet is just a whole bunch of these packets flying 754 00:38:36,910 --> 00:38:41,660 back and forth between computers, between routers, leaving and entering 755 00:38:41,660 --> 00:38:46,000 our own network, whether that's my home or my campus or my company, 756 00:38:46,000 --> 00:38:49,039 I could just have my routers, for instance, 757 00:38:49,039 --> 00:38:51,580 look at every one of those envelopes, look at the TO address, 758 00:38:51,580 --> 00:38:54,770 maybe look at the FROM address, and just blacklist certain addresses. 
759 00:38:54,770 --> 00:38:59,780 Indeed, if I know that I don't want my employees accessing Facebook, 760 00:38:59,780 --> 00:39:03,090 I could, for instance, just say to my routers, configure my routers, 761 00:39:03,090 --> 00:39:10,592 do not allow any data going to or from IP address 31.13.80.36. 762 00:39:10,592 --> 00:39:13,800 Now, it might be easier said than done, because in reality, Facebook probably 763 00:39:13,800 --> 00:39:15,260 has multiple IP addresses. 764 00:39:15,260 --> 00:39:18,060 So we might have to grow this list or dig a little deeper in order 765 00:39:18,060 --> 00:39:18,890 to block them. 766 00:39:18,890 --> 00:39:22,650 And better yet, we could potentially look inside of the envelopes themselves 767 00:39:22,650 --> 00:39:25,226 to see, is this a Facebook packet? 768 00:39:25,226 --> 00:39:27,600 But if they're using encryption, which they do by default 769 00:39:27,600 --> 00:39:30,350 these days, that might not really be feasible. 770 00:39:30,350 --> 00:39:32,480 So we can have kind of a heavy-handed solution 771 00:39:32,480 --> 00:39:35,932 there, and just block everything we think is Facebook.com. 772 00:39:35,932 --> 00:39:40,520 But certainly, things might leak out potentially over time if things change. 773 00:39:40,520 --> 00:39:41,650 But what else could we do? 774 00:39:41,650 --> 00:39:45,100 Suppose that I really don't want people Skyping during the day, 775 00:39:45,100 --> 00:39:47,200 or I don't want people using Facebook Messenger, 776 00:39:47,200 --> 00:39:51,620 or some software that has its own unique TCP port number 777 00:39:51,620 --> 00:39:53,930 that some company or the world has standardized on. 778 00:39:53,930 --> 00:39:59,190 You could block all outbound email by just blocking port 25, it would seem, 779 00:39:59,190 --> 00:40:01,020 or a few other ports that are popular. 780 00:40:01,020 --> 00:40:04,850 You could block all web access by blocking 80 and 443. 
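The address and port checks just described can be sketched in a few lines of Python. This is only an illustration, not how any real router is configured: the `Envelope` class, the `allow` function, and the two blocklists are hypothetical names, and the only value taken from the lecture is the IP address 31.13.80.36 and the port number 25.

```python
from dataclasses import dataclass

# Hypothetical blocklists for illustration only.
# The address is the one quoted in the lecture; real services use many more.
BLOCKED_ADDRESSES = {"31.13.80.36"}
BLOCKED_PORTS = {25}  # port 25 carries outbound email (SMTP)


@dataclass(frozen=True)
class Envelope:
    """What is written on the outside of a packet: addresses and a port."""
    from_addr: str
    to_addr: str
    to_port: int


def allow(envelope: Envelope) -> bool:
    """Return True if the packet may pass, False if the firewall drops it."""
    # Drop anything going to or coming from a blocked address.
    if envelope.to_addr in BLOCKED_ADDRESSES:
        return False
    if envelope.from_addr in BLOCKED_ADDRESSES:
        return False
    # Drop anything destined for a blocked port, whatever the address.
    if envelope.to_port in BLOCKED_PORTS:
        return False
    return True
```

Note that this sketch only ever reads the outside of the envelope, which is why it keeps working even when the contents are encrypted, and also why it is heavy-handed: it cannot tell one packet to a given address from another.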
781 00:40:04,850 --> 00:40:08,080 You could block all DNS traffic, if you really want. 782 00:40:08,080 --> 00:40:10,980 And indeed, a lot of companies do this, especially 783 00:40:10,980 --> 00:40:15,140 like Starbucks kind of places, internet cafes in airports and the like. 784 00:40:15,140 --> 00:40:18,310 Sometimes they only want you using their DNS server, 785 00:40:18,310 --> 00:40:20,410 not your own company's or your own home's. 786 00:40:20,410 --> 00:40:23,750 And so they can block access to any DNS server other than their own. 787 00:40:23,750 --> 00:40:26,650 This is unfortunately often or sometimes for advertising 788 00:40:26,650 --> 00:40:30,060 reasons, so that they can actually keep track of what you're accessing 789 00:40:30,060 --> 00:40:34,100 and where and why-- or where, at least. 790 00:40:34,100 --> 00:40:37,580 But it's all possible technologically with this underneath the hood. 791 00:40:37,580 --> 00:40:39,490 So what are some of the defenses in place, 792 00:40:39,490 --> 00:40:43,090 especially when you want to visit some site that isn't necessarily encrypted? 793 00:40:43,090 --> 00:40:45,644 Or maybe you want to visit some site that is blocked, 794 00:40:45,644 --> 00:40:48,810 and you want to simply be able to work around this, because you're traveling 795 00:40:48,810 --> 00:40:50,690 or you need to be able to access something privately 796 00:40:50,690 --> 00:40:51,990 at your home or your work. 797 00:40:51,990 --> 00:40:56,130 Well it turns out, that there are services called VPNs or Virtual Private 798 00:40:56,130 --> 00:40:56,660 Networks. 799 00:40:56,660 --> 00:40:59,670 And Harvard has one VPN at vpn.harvard.edu. 800 00:40:59,670 --> 00:41:03,565 And Yale has one as well at access.yale.edu. 
801 00:41:03,565 --> 00:41:06,440 And this is simply software that you generally download to your phone 802 00:41:06,440 --> 00:41:10,880 or your computer that allows you to connect via some protocol and some port 803 00:41:10,880 --> 00:41:16,080 to your company or to your home's network, but in an encrypted way. 804 00:41:16,080 --> 00:41:19,330 So a VPN gives you an encrypted tunnel, so to speak, 805 00:41:19,330 --> 00:41:21,254 so that you are connected to the internet. 806 00:41:21,254 --> 00:41:22,170 That's a precondition. 807 00:41:22,170 --> 00:41:23,650 You have to get on the internet itself. 808 00:41:23,650 --> 00:41:25,600 But then you configure your Mac or PC to route 809 00:41:25,600 --> 00:41:29,030 all-- in theory-- of your internet traffic through the VPN. 810 00:41:29,030 --> 00:41:32,500 So even if I'm just visiting Gmail or Facebook or whatever on my Mac, 811 00:41:32,500 --> 00:41:35,910 if I'm connected to Harvard's VPN, all of that traffic by design 812 00:41:35,910 --> 00:41:39,570 is going through Harvard.edu first, and then it's 813 00:41:39,570 --> 00:41:42,550 going out to Facebook or Google or wherever it's destined. 814 00:41:42,550 --> 00:41:44,880 Similarly, if I'm traveling in a foreign country that 815 00:41:44,880 --> 00:41:48,790 happens to block a lot of internet access, if they do allow VPN access, 816 00:41:48,790 --> 00:41:53,740 I can, in my hotel room or wherever, connect to Harvard or to Yale, route 817 00:41:53,740 --> 00:41:57,410 all of my internet traffic through Harvard or Yale, and then from Yale 818 00:41:57,410 --> 00:42:00,710 to Harvard to wherever I'm going on the internet. 819 00:42:00,710 --> 00:42:03,390 And the upside of this is that it's entirely encrypted, 820 00:42:03,390 --> 00:42:06,340 which means no one at that company or that country in theory 821 00:42:06,340 --> 00:42:09,530 knows what data is going through the tunnel. 
822 00:42:09,530 --> 00:42:13,460 But it also potentially costs me a good amount of time. 823 00:42:13,460 --> 00:42:15,890 We've seen that we're really only talking milliseconds, 824 00:42:15,890 --> 00:42:18,030 but hundreds of milliseconds can certainly add up. 825 00:42:18,030 --> 00:42:21,110 So if I'm abroad, for instance, trying to connect to some website that's 826 00:42:21,110 --> 00:42:26,260 going from that country to Harvard, to the destination, back to Harvard, 827 00:42:26,260 --> 00:42:30,360 back to the country I'm in, your internet connectivity might be slower, 828 00:42:30,360 --> 00:42:33,090 but at least it's not actually permanently blocked. 829 00:42:33,090 --> 00:42:36,120 So if you've ever heard of friends of yours actually accessing services 830 00:42:36,120 --> 00:42:39,234 like Netflix or Hulu, that for licensing reasons, 831 00:42:39,234 --> 00:42:41,400 do restrict you typically to being in this country-- 832 00:42:41,400 --> 00:42:44,270 this is why you might have read that Hulu and Netflix and others are 833 00:42:44,270 --> 00:42:47,130 cracking down on people using VPNs, whether it's 834 00:42:47,130 --> 00:42:49,550 Harvard's or Yale's or a third-party company's, 835 00:42:49,550 --> 00:42:52,670 so as to circumvent those licensing restrictions. 836 00:42:52,670 --> 00:42:55,260 But technologically, all it's doing is giving you 837 00:42:55,260 --> 00:42:57,840 an encrypted tunnel between you and someone 838 00:42:57,840 --> 00:43:00,500 you have an affiliation with, like Harvard or Yale, 839 00:43:00,500 --> 00:43:03,060 and encrypting all of your traffic in between there, 840 00:43:03,060 --> 00:43:07,040 and routing all of your traffic through it. 841 00:43:07,040 --> 00:43:11,330 So with that said, we've looked at DNS, and we've looked at DHCP, 842 00:43:11,330 --> 00:43:12,670 and we've looked at routers.
843 00:43:12,670 --> 00:43:14,390 And there's other hardware still, whether it's 844 00:43:14,390 --> 00:43:16,098 in your home or campus or office, there's 845 00:43:16,098 --> 00:43:19,850 things like switches, which are fairly simple devices that just have lots 846 00:43:19,850 --> 00:43:23,020 of Ethernet jacks, so to speak, that you can plug physical cables into, 847 00:43:23,020 --> 00:43:25,390 and those cables can then intercommunicate, 848 00:43:25,390 --> 00:43:28,020 so that you can wire computers together en masse. 849 00:43:28,020 --> 00:43:29,895 There are things called access points or APs. 850 00:43:29,895 --> 00:43:32,603 Those are the things around campus that have the little bunny ear 851 00:43:32,603 --> 00:43:34,050 antennas that are often blinking. 852 00:43:34,050 --> 00:43:36,110 Those are the wireless access points. 853 00:43:36,110 --> 00:43:40,030 And access points often have firewalls, often have routing software built in. 854 00:43:40,030 --> 00:43:44,600 So the line is increasingly blurry these days as to what these small devices do. 855 00:43:44,600 --> 00:43:46,919 So it really is the services that matter. 856 00:43:46,919 --> 00:43:48,710 And indeed, while a little dated, I thought 857 00:43:48,710 --> 00:43:52,050 it would be fun to take a look now at a longer form version of the 60 858 00:43:52,050 --> 00:43:54,187 second trailer of Warriors of the Net that 859 00:43:54,187 --> 00:43:56,770 was made a few years ago to paint a more visual picture of how 860 00:43:56,770 --> 00:43:57,561 the internet works. 861 00:43:57,561 --> 00:44:01,580 It definitely takes some liberties with shall we say accuracy. 862 00:44:01,580 --> 00:44:03,470 But it also helps paint a picture of what 863 00:44:03,470 --> 00:44:05,930 really is going on underneath the hood. 864 00:44:05,930 --> 00:44:08,060 So let's take a look at the internet.
865 00:44:08,060 --> 00:44:13,549 866 00:44:13,549 --> 00:44:15,046 [MUSIC PLAYING] 867 00:44:15,046 --> 00:44:16,543 [VIDEO PLAYBACK] 868 00:44:16,543 --> 00:45:13,340 869 00:45:13,340 --> 00:45:18,450 -For the first time in history, people and machinery 870 00:45:18,450 --> 00:45:22,210 are working together, realizing a dream. 871 00:45:22,210 --> 00:45:26,190 A uniting force that knows no geographical boundaries, without 872 00:45:26,190 --> 00:45:29,670 regard to race, creed, or color. 873 00:45:29,670 --> 00:45:34,780 A new era, where communication truly brings people together. 874 00:45:34,780 --> 00:45:38,170 This is the Dawn of the Net. 875 00:45:38,170 --> 00:45:41,850 876 00:45:41,850 --> 00:45:43,710 Want to know how it works? 877 00:45:43,710 --> 00:45:48,305 Click here to begin your journey into the net. 878 00:45:48,305 --> 00:45:51,490 879 00:45:51,490 --> 00:45:54,710 Now exactly what happened when you clicked on that link? 880 00:45:54,710 --> 00:45:56,930 You started a flow of information. 881 00:45:56,930 --> 00:45:59,630 This information travels down into your own personal mail 882 00:45:59,630 --> 00:46:05,340 room, where Mr. IP packages it, labels it, and sends it on its way. 883 00:46:05,340 --> 00:46:07,750 Each packet is limited in its size. 884 00:46:07,750 --> 00:46:12,970 The mailroom must decide how to divide the information, and how to package it. 885 00:46:12,970 --> 00:46:16,810 Now the package needs a label, containing important information, 886 00:46:16,810 --> 00:46:21,655 such as sender's address, receiver's address, and the type of packet it is. 887 00:46:21,655 --> 00:46:38,510 888 00:46:38,510 --> 00:46:41,900 Because this particular packet is going out on to the internet, 889 00:46:41,900 --> 00:46:45,040 it also gets an address for the proxy server, which has 890 00:46:45,040 --> 00:46:48,400 a special function, as we'll see later.
891 00:46:48,400 --> 00:46:53,710 The packet is now launched onto your Local Area Network, or LAN. 892 00:46:53,710 --> 00:46:57,620 This network is used to connect all the local computers, routers, 893 00:46:57,620 --> 00:47:00,210 printers, et cetera for information exchange 894 00:47:00,210 --> 00:47:02,880 within the physical walls of the building. 895 00:47:02,880 --> 00:47:07,740 The LAN is a pretty uncontrolled place, and unfortunately, accidents 896 00:47:07,740 --> 00:47:08,470 can happen. 897 00:47:08,470 --> 00:47:18,110 898 00:47:18,110 --> 00:47:20,470 The highway of the LAN is packed with all types of 899 00:47:20,470 --> 00:47:25,730 information. These are IP packets, Novell packets, AppleTalk packets. 900 00:47:25,730 --> 00:47:28,120 They're going against traffic, as usual. 901 00:47:28,120 --> 00:47:31,310 The local router reads the address, and if necessary, 902 00:47:31,310 --> 00:47:35,105 lifts the packet onto another network. 903 00:47:35,105 --> 00:47:37,200 Ah, the router. 904 00:47:37,200 --> 00:47:41,770 A symbol of control in a seemingly disorganized world. 905 00:47:41,770 --> 00:47:43,700 [METHODICAL MUTTERING] 906 00:47:43,700 --> 00:47:52,280 907 00:47:52,280 --> 00:47:57,690 There he is, systematic, uncaring, methodical, conservative, 908 00:47:57,690 --> 00:48:00,910 and sometimes not quite up to speed. 909 00:48:00,910 --> 00:48:03,520 But at least he is exact, for the most part. 910 00:48:03,520 --> 00:48:18,900 911 00:48:18,900 --> 00:48:21,890 As the packets leave the router, they make their way 912 00:48:21,890 --> 00:48:27,700 into the corporate intranet and head for the router switch. 913 00:48:27,700 --> 00:48:30,410 A bit more efficient than the router, the router switch 914 00:48:30,410 --> 00:48:35,570 plays fast and loose with IP packets, deftly routing them along the way. 915 00:48:35,570 --> 00:48:40,830 A digital pinball wizard, if you will.
916 00:48:40,830 --> 00:48:44,686 [ERRATIC MUTTERING] 917 00:48:44,686 --> 00:48:56,550 918 00:48:56,550 --> 00:48:59,210 As packets arrive at their destination, they're 919 00:48:59,210 --> 00:49:05,000 picked up by the network interface, ready to be sent to the next level. 920 00:49:05,000 --> 00:49:08,060 In this case, the proxy. 921 00:49:08,060 --> 00:49:11,870 The proxy is used by many companies as sort of a middleman 922 00:49:11,870 --> 00:49:14,850 in order to lessen the load on their internet connection, 923 00:49:14,850 --> 00:49:18,970 and for security reasons as well. 924 00:49:18,970 --> 00:49:23,030 As you can see, the packets are all of various sizes, 925 00:49:23,030 --> 00:49:24,415 depending on their content. 926 00:49:24,415 --> 00:49:42,010 927 00:49:42,010 --> 00:49:48,750 The proxy opens the packet and looks for the web address or URL. 928 00:49:48,750 --> 00:49:51,790 Depending upon whether the address is acceptable, 929 00:49:51,790 --> 00:49:53,820 the packet is sent on to the internet. 930 00:49:53,820 --> 00:50:00,660 931 00:50:00,660 --> 00:50:04,060 There are, however, some addresses which do not 932 00:50:04,060 --> 00:50:06,770 meet with the approval of the proxy. 933 00:50:06,770 --> 00:50:09,583 That is to say, corporate or management guidelines. 934 00:50:09,583 --> 00:50:12,830 935 00:50:12,830 --> 00:50:17,380 These are summarily dealt with. 936 00:50:17,380 --> 00:50:19,260 We'll have none of that. 937 00:50:19,260 --> 00:50:22,240 For those who make it, it's on the road again. 938 00:50:22,240 --> 00:50:33,580 939 00:50:33,580 --> 00:50:35,920 Next up, the firewall. 940 00:50:35,920 --> 00:50:39,867 941 00:50:39,867 --> 00:50:44,250 The corporate firewall serves two purposes.
942 00:50:44,250 --> 00:50:46,910 It prevents some rather nasty things from the internet 943 00:50:46,910 --> 00:50:50,090 from coming into the intranet, and it can also 944 00:50:50,090 --> 00:50:54,650 prevent sensitive corporate information from being sent out onto the internet. 945 00:50:54,650 --> 00:50:57,240 946 00:50:57,240 --> 00:50:59,980 Once through the firewall, a router picks up the packet, 947 00:50:59,980 --> 00:51:04,980 and places it onto a much narrower road, or bandwidth, as we say. 948 00:51:04,980 --> 00:51:10,470 Obviously, the road is not broad enough to take them all. 949 00:51:10,470 --> 00:51:13,310 Now, you might wonder what happens to all those packets 950 00:51:13,310 --> 00:51:16,110 which don't make it along the way. 951 00:51:16,110 --> 00:51:19,200 Well, when Mr. IP doesn't receive an acknowledgement 952 00:51:19,200 --> 00:51:22,040 that a packet has been received in due time, 953 00:51:22,040 --> 00:51:24,200 he simply sends a replacement packet. 954 00:51:24,200 --> 00:51:26,810 955 00:51:26,810 --> 00:51:32,260 We are now ready to enter the world of the internet, a spider 956 00:51:32,260 --> 00:51:37,400 web of interconnected networks which span our entire globe. 957 00:51:37,400 --> 00:51:42,990 Here, routers and switches establish links between networks. 958 00:51:42,990 --> 00:51:45,540 Now, the net is an entirely different environment 959 00:51:45,540 --> 00:51:48,520 than you'll find within the protective walls of your LAN. 960 00:51:48,520 --> 00:51:51,100 Out here, it's the Wild West. 961 00:51:51,100 --> 00:51:53,350 Plenty of space, plenty of opportunities, 962 00:51:53,350 --> 00:51:56,760 plenty of things to explore and places to go. 963 00:51:56,760 --> 00:51:59,650 Thanks to very little control and regulation, 964 00:51:59,650 --> 00:52:05,270 new ideas find fertile soil to push the envelope of their possibilities. 965 00:52:05,270 --> 00:52:09,270 But because of this freedom, certain dangers also lurk. 
966 00:52:09,270 --> 00:52:14,020 You'll never know when you'll meet the dreaded ping of death. 967 00:52:14,020 --> 00:52:18,330 A special version of a normal request ping, which some idiot thought up 968 00:52:18,330 --> 00:52:22,660 to mess up unsuspecting hosts. 969 00:52:22,660 --> 00:52:27,420 The path our packets take may be via satellite, telephone lines, wireless, 970 00:52:27,420 --> 00:52:29,790 or even transoceanic cable. 971 00:52:29,790 --> 00:52:33,450 They don't always take the fastest or shortest routes possible, 972 00:52:33,450 --> 00:52:37,170 but they will get there eventually. 973 00:52:37,170 --> 00:52:42,110 Maybe that's why it's sometimes called the world wide wait. 974 00:52:42,110 --> 00:52:44,920 But when everything is working smoothly, you 975 00:52:44,920 --> 00:52:50,750 can circumvent the globe five times over at the drop of a hat, literally. 976 00:52:50,750 --> 00:52:55,150 And all for the cost of a local call or less. 977 00:52:55,150 --> 00:52:58,971 Near the end of our destination, we'll find another firewall. 978 00:52:58,971 --> 00:53:02,200 979 00:53:02,200 --> 00:53:05,320 Depending upon your perspective as a data packet, 980 00:53:05,320 --> 00:53:10,570 the firewall could be a bastion of security or a dreaded adversary. 981 00:53:10,570 --> 00:53:15,480 It all depends on which side you're on and what your intentions are. 982 00:53:15,480 --> 00:53:21,710 The firewall is designed to let in only those packets that meet its criteria. 983 00:53:21,710 --> 00:53:26,200 This firewall is operating on ports 80 and 25. 984 00:53:26,200 --> 00:53:29,865 All attempts to enter through other ports are closed for business. 985 00:53:29,865 --> 00:53:44,310 986 00:53:44,310 --> 00:53:52,580 Port 25 is used for mail packets, while port 80 is the entrance for packets 987 00:53:52,580 --> 00:53:54,285 from the internet to the web server. 
988 00:53:54,285 --> 00:53:57,710 989 00:53:57,710 --> 00:54:02,170 Inside the firewall, packets are screened more thoroughly. 990 00:54:02,170 --> 00:54:04,830 Some packets make it easily through customs, 991 00:54:04,830 --> 00:54:08,310 while others look just a bit dubious. 992 00:54:08,310 --> 00:54:10,910 The firewall officer is not easily fooled, 993 00:54:10,910 --> 00:54:16,580 such as when this ping of death packet tries to disguise itself 994 00:54:16,580 --> 00:54:18,230 as a normal ping packet. 995 00:54:18,230 --> 00:54:24,240 996 00:54:24,240 --> 00:54:27,320 For those packets lucky enough to make it this far, 997 00:54:27,320 --> 00:54:29,548 the journey is almost over. 998 00:54:29,548 --> 00:54:32,620 999 00:54:32,620 --> 00:54:39,120 It's just a line up on the interface to be taken up into the web server. 1000 00:54:39,120 --> 00:54:42,350 Nowadays, a web server can run on many things, 1001 00:54:42,350 --> 00:54:46,380 from a mainframe to a webcam to the computer on your desk. 1002 00:54:46,380 --> 00:54:48,530 Why not your refrigerator? 1003 00:54:48,530 --> 00:54:51,070 With a proper set up, you can find out if you 1004 00:54:51,070 --> 00:54:55,220 have the makings for chicken cacciatore or if you have to go shopping. 1005 00:54:55,220 --> 00:54:58,510 Remember, this is the dawn of the net. 1006 00:54:58,510 --> 00:55:00,163 Almost anything's possible. 1007 00:55:00,163 --> 00:55:04,300 1008 00:55:04,300 --> 00:55:09,160 One by one, the packets are received, opened, and unpacked. 1009 00:55:09,160 --> 00:55:13,230 1010 00:55:13,230 --> 00:55:18,050 The information they contain, that is, your request for information, 1011 00:55:18,050 --> 00:55:20,145 is sent on to the web server application. 
1012 00:55:20,145 --> 00:55:28,420 1013 00:55:28,420 --> 00:55:37,000 The packet itself is recycled, ready to be used again, and filled 1014 00:55:37,000 --> 00:55:47,300 with your requested information, addressed, and sent out on its way 1015 00:55:47,300 --> 00:55:53,440 back to you, back past the firewall, routers, 1016 00:55:53,440 --> 00:56:08,050 and on through to the internet, back through your corporate firewall, 1017 00:56:08,050 --> 00:56:14,330 and onto your interface, ready to supply your web browser with the information 1018 00:56:14,330 --> 00:56:19,860 you requested, that is, this film. 1019 00:56:19,860 --> 00:56:26,550 1020 00:56:26,550 --> 00:56:30,460 Pleased with their efforts, and trusting in a better world, 1021 00:56:30,460 --> 00:56:33,830 our trusty data packets ride off blissfully 1022 00:56:33,830 --> 00:56:38,620 into the sunset of another day, knowing fully they 1023 00:56:38,620 --> 00:56:40,690 have served their masters well. 1024 00:56:40,690 --> 00:56:43,730 1025 00:56:43,730 --> 00:56:47,010 Now isn't that a happy ending? 1026 00:56:47,010 --> 00:56:55,454 1027 00:56:55,454 --> 00:56:56,933 [END PLAYBACK] 1028 00:56:56,933 --> 00:56:59,652 DAVID MALAN: All right, so that is how the internet works. 1029 00:56:59,652 --> 00:57:01,860 And as has been our tendency over the past few weeks, 1030 00:57:01,860 --> 00:57:05,000 now that we know how we can get data from point A to point B, 1031 00:57:05,000 --> 00:57:07,290 we can abstract above that, and just take 1032 00:57:07,290 --> 00:57:09,850 for granted now that we can move data from point A to point B 1033 00:57:09,850 --> 00:57:12,160 and start moving the actual data. 1034 00:57:12,160 --> 00:57:16,280 So that invites the question now of what is inside this envelope. 1035 00:57:16,280 --> 00:57:19,700 When I get a response back from Google containing a whole bunch of cats, 1036 00:57:19,700 --> 00:57:23,770 or when I get back my news feed from Facebook, or my inbox from Google. 
1037 00:57:23,770 --> 00:57:26,730 Well, inside of these packets quite often 1038 00:57:26,730 --> 00:57:32,942 is messages that conform to HTTP, the Hypertext Transfer Protocol. 1039 00:57:32,942 --> 00:57:35,650 So this is just one of those services that we alluded to earlier. 1040 00:57:35,650 --> 00:57:40,410 Among them also were SSH, and DNS, and SMTP, and yet others. 1041 00:57:40,410 --> 00:57:43,420 But HTTP is perhaps by far the most common one in so far 1042 00:57:43,420 --> 00:57:46,140 as we use the web so much these days. 1043 00:57:46,140 --> 00:57:51,080 So inside of HTTP, there are certain types 1044 00:57:51,080 --> 00:57:55,190 of messages, messages that conform to certain patterns by which we 1045 00:57:55,190 --> 00:57:56,530 get information. 1046 00:57:56,530 --> 00:57:58,780 Now, what is the P in HTTP? 1047 00:57:58,780 --> 00:58:00,330 HTTP, Hypertext Transfer Protocol. 1048 00:58:00,330 --> 00:58:02,769 Well, let me borrow Arthuro over here. 1049 00:58:02,769 --> 00:58:04,810 And we have this silly human convention of course 1050 00:58:04,810 --> 00:58:07,935 that when you meet someone for the first time or the first time in a while, 1051 00:58:07,935 --> 00:58:09,930 you say, oh, hi, my name is David. 1052 00:58:09,930 --> 00:58:11,500 Nice to meet you, Arthuro. 1053 00:58:11,500 --> 00:58:13,000 And we exchange hands. 1054 00:58:13,000 --> 00:58:16,570 And when I put out my hand, Arthuro knows to put out his hand. 1055 00:58:16,570 --> 00:58:18,240 And then we do this silly handshake. 1056 00:58:18,240 --> 00:58:19,005 Why is that? 1057 00:58:19,005 --> 00:58:20,130 Well, it's just a protocol. 1058 00:58:20,130 --> 00:58:20,820 It's a convention. 1059 00:58:20,820 --> 00:58:23,486 It's a set of conventions that we humans for better or for worse 1060 00:58:23,486 --> 00:58:25,550 have adopted by which we greet each other. 
1061 00:58:25,550 --> 00:58:30,820 Similarly do computers have protocols via which they communicate, 1062 00:58:30,820 --> 00:58:34,430 and sets of conventions that govern how you start to communicate 1063 00:58:34,430 --> 00:58:36,626 and how you finish communicating. 1064 00:58:36,626 --> 00:58:38,500 So what do those messages actually look like? 1065 00:58:38,500 --> 00:58:40,830 The simplest of them is quite literally this verb 1066 00:58:40,830 --> 00:58:44,010 here, get, whereby inside of this envelope, when 1067 00:58:44,010 --> 00:58:46,430 I'm requesting information of Google for the first time-- 1068 00:58:46,430 --> 00:58:48,460 and indeed, I put that message before, search 1069 00:58:48,460 --> 00:58:53,680 for cats-- that actually has a certain message at the top of it, really, 1070 00:58:53,680 --> 00:58:55,222 that is literally get. 1071 00:58:55,222 --> 00:58:58,430 There's a little more information, but at the end of the day, it just is get. 1072 00:58:58,430 --> 00:59:01,180 Specifically, these are the first couple of lines 1073 00:59:01,180 --> 00:59:04,640 inside of any request that my browser makes of a web server, 1074 00:59:04,640 --> 00:59:06,480 like in this case, harvard.edu. 1075 00:59:06,480 --> 00:59:10,230 If I want to get the default home page of Harvard, I literally, 1076 00:59:10,230 --> 00:59:15,570 inside of my envelope, write this message-- GET slash space HTTP/1.1, 1077 00:59:15,570 --> 00:59:17,800 which is the latest version of HTTP that people use. 1078 00:59:17,800 --> 00:59:21,030 Then below that, I specify the host that I want to talk to, just in case 1079 00:59:21,030 --> 00:59:24,850 Harvard or Google or whoever has multiple domain names physically 1080 00:59:24,850 --> 00:59:27,140 running on the same servers, which is possible. 1081 00:59:27,140 --> 00:59:29,256 So I say Host: www.harvard.edu. 1082 00:59:29,256 --> 00:59:30,880 And then maybe there's some other text.
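Written out exactly, the message inside the envelope is two lines of text followed by a blank line. Here is a minimal sketch that builds it in Python; the function name `build_get_request` is made up for illustration, and the only assumptions beyond the lecture are that each line ends with a carriage return and line feed, as HTTP/1.1 expects, and that the header is spelled `Host:` with a colon.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text a browser sends to ask for a page."""
    lines = [
        f"GET {path} HTTP/1.1",  # the verb, the path, and the protocol version
        f"Host: {host}",         # which site we want, in case the server hosts several
        "",                      # a blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with carriage return + line feed.
    return "\r\n".join(lines).encode("ascii")
```

Calling `build_get_request("www.harvard.edu")` produces the same text the lecture describes. A real browser adds more headers after `Host:`, which is the "other text" mentioned above.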
1083 00:59:30,880 --> 00:59:34,390 But this first line or two is really the most important. 1084 00:59:34,390 --> 00:59:36,110 And then what comes back from the server, 1085 00:59:36,110 --> 00:59:38,660 whether it's being sent to Harvard or being sent to Yale, 1086 00:59:38,660 --> 00:59:43,820 is a response that hopefully says is literally, OK, inside 1087 00:59:43,820 --> 00:59:47,880 of which is the cat or inside of which is the inbox for Gmail 1088 00:59:47,880 --> 00:59:50,390 or inside of which is my news feed from Facebook. 1089 00:59:50,390 --> 00:59:53,930 All of which typically are in this language here, 1090 00:59:53,930 --> 00:59:56,580 HTML-- HyperText Markup Language. 1091 00:59:56,580 --> 01:00:01,352 So whereas HTTP is a protocol, like a sort of handshake agreement 1092 01:00:01,352 --> 01:00:04,060 that governs that when I want to request information of a server, 1093 01:00:04,060 --> 01:00:06,200 I should say GET and then a few other words, 1094 01:00:06,200 --> 01:00:09,880 and then the server should respond with OK and a few other words, 1095 01:00:09,880 --> 01:00:13,540 HTML is the language in which the actual web 1096 01:00:13,540 --> 01:00:17,360 pages that are coming back from Google or Facebook or Harvard or Yale 1097 01:00:17,360 --> 01:00:18,360 are actually written in. 1098 01:00:18,360 --> 01:00:20,990 It's not a programming language like C or Scratch. 1099 01:00:20,990 --> 01:00:23,350 It's a markup language, as we'll see, that 1100 01:00:23,350 --> 01:00:25,390 really controls formatting and layout. 1101 01:00:25,390 --> 01:00:28,730 There aren't ifs and loops and other such constructs instead. 1102 01:00:28,730 --> 01:00:32,120 But that's what's below the dot dot dot when the response comes back 1103 01:00:32,120 --> 01:00:35,990 from Harvard or Yale or Google is this language HTML. 
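To make the shape of that response concrete, here is a rough sketch in Python that takes a response apart: a status line, then headers, then a blank line, then the HTML. The `SAMPLE` text is invented for illustration and is not what Harvard, Yale, or Google actually send, and `parse_response` is a hypothetical helper, not part of any library.

```python
# An invented response for illustration; real ones carry many more headers.
SAMPLE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<!DOCTYPE html><html><body>hello</body></html>"
)


def parse_response(raw: str):
    """Split a raw HTTP response into status code, reason, headers, and body."""
    # Everything before the first blank line is the head; the rest is the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK".
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body
```

The body that comes back is the part "below the dot dot dot": the markup itself, which the browser then lays out on the screen.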
1104 01:00:35,990 --> 01:00:40,960 Now, 200 is a status code, so to speak, that we almost never actually see 1105 01:00:40,960 --> 01:00:41,970 from a server. 1106 01:00:41,970 --> 01:00:46,710 But odds are, some of you have seen at least one of these status codes before. 1107 01:00:46,710 --> 01:00:49,780 And perhaps the most obvious or the most familiar 1108 01:00:49,780 --> 01:00:53,140 is probably this one here, when you've requested some web page, 1109 01:00:53,140 --> 01:00:57,070 and either it doesn't exist anymore or you have a typo more commonly 1110 01:00:57,070 --> 01:00:58,830 or the URL is broken for some reason. 1111 01:00:58,830 --> 01:01:01,950 Odds are you have literally seen the status code 404, 1112 01:01:01,950 --> 01:01:03,960 because the server is just showing it to you. 1113 01:01:03,960 --> 01:01:06,420 But at a lower level, these numbers are actually 1114 01:01:06,420 --> 01:01:08,580 typically sent in these packets of information 1115 01:01:08,580 --> 01:01:10,910 back and forth from the server to me. 1116 01:01:10,910 --> 01:01:15,540 But we'll see before long that you can use status codes like 301 and 302 1117 01:01:15,540 --> 01:01:17,680 to induce redirects, so to speak. 1118 01:01:17,680 --> 01:01:20,250 If you want to send the user from one URL to another-- maybe 1119 01:01:20,250 --> 01:01:22,620 the domain name is changed-- you can do that there. 1120 01:01:22,620 --> 01:01:25,870 For efficiency, a server can say 304, not modified. 1121 01:01:25,870 --> 01:01:27,920 As in, you already asked me for this page. 1122 01:01:27,920 --> 01:01:29,795 It hasn't modified since you asked me for it, 1123 01:01:29,795 --> 01:01:31,711 I'm not going to send it to you again, thereby 1124 01:01:31,711 --> 01:01:33,390 saving a bit of time and bandwidth. 1125 01:01:33,390 --> 01:01:35,610 Unauthorized or forbidden generally means 1126 01:01:35,610 --> 01:01:38,090 that you don't have access to the file for some reason.
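The codes come in families, grouped by their first digit. As a small sketch, Python's standard library already knows the official phrase for each code through `http.HTTPStatus`; the `describe` function and the family labels below are my own illustration. The numbers 401 and 403 are the codes behind "unauthorized" and "forbidden".

```python
from http import HTTPStatus

# Families of status codes, keyed by the first digit.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard phrase, and the family it belongs to."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    return f"{code} {status.phrase} ({FAMILIES[code // 100]})"


def summarize(codes) -> list:
    """Describe several codes at once, in the order given."""
    return [describe(code) for code in codes]
```

For example, `describe(404)` returns `"404 Not Found (client error)"`, and `summarize([200, 301, 302, 304, 401, 403, 404])` covers every code mentioned so far.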
1127 01:01:38,090 --> 01:01:39,570 And 500's actually pretty bad. 1128 01:01:39,570 --> 01:01:42,653 So we'll probably induce this ourselves before long when we actually write 1129 01:01:42,653 --> 01:01:44,550 programs that run on a web server. 1130 01:01:44,550 --> 01:01:47,600 But 500 means there's generally a problem in your code 1131 01:01:47,600 --> 01:01:51,680 that's supposed to be serving up web content to browsers. 1132 01:01:51,680 --> 01:01:54,180 So let's actually see these kinds of things too. 1133 01:01:54,180 --> 01:01:58,300 It turns out that I can pretend to be a browser at my command line here. 1134 01:01:58,300 --> 01:02:00,540 In fact, I can use a program called Telnet, 1135 01:02:00,540 --> 01:02:03,540 which is an older program, similar in spirit to something called SSH, 1136 01:02:03,540 --> 01:02:05,770 which I mentioned earlier, but it's not encrypted. 1137 01:02:05,770 --> 01:02:10,690 But it allows me to connect to a remote server specifically on a certain port. 1138 01:02:10,690 --> 01:02:16,160 So I for instance, can connect to harvard.edu and on port 80 1139 01:02:16,160 --> 01:02:17,220 specifically. 1140 01:02:17,220 --> 01:02:21,389 I could actually with textual commands send emails to Harvard in this way, 1141 01:02:21,389 --> 01:02:23,180 or send chat messages if they support that. 1142 01:02:23,180 --> 01:02:27,080 But for now, we're focusing only on HTTP, the unencrypted version. 1143 01:02:27,080 --> 01:02:28,990 And if I go ahead and hit enter, you'll see 1144 01:02:28,990 --> 01:02:33,970 that I'm connected to www.harvard.edu.cdn.cloudflare.net, 1145 01:02:33,970 --> 01:02:35,130 which is curious. 1146 01:02:35,130 --> 01:02:39,280 But it turns out-- and we could see this if we poked around with nslookup again. 
1147 01:02:39,280 --> 01:02:42,360 It turns out that Harvard is also outsourcing its home 1148 01:02:42,360 --> 01:02:47,290 page to a third party CDN-- Content Delivery Network-- called Cloudflare, 1149 01:02:47,290 --> 01:02:49,550 so Harvard's servers really live elsewhere. 1150 01:02:49,550 --> 01:02:52,904 And now I talked too long and the connection got automatically closed. 1151 01:02:52,904 --> 01:02:56,070 So let me go ahead and redo this, and just pretend to be a browser by typing 1152 01:02:56,070 --> 01:03:04,660 GET / HTTP/1.1, Host: www.harvard.edu, and then Enter twice. 1153 01:03:04,660 --> 01:03:09,190 And it flew across the screen, but let me scroll back up to the top. 1154 01:03:09,190 --> 01:03:13,600 This is-- even though it might look cryptic to you at the moment 1155 01:03:13,600 --> 01:03:17,690 if you've never made web pages before-- this is this language called HTML. 1156 01:03:17,690 --> 01:03:21,570 And it's quite a lot of HTML, so let me keep scrolling up and up and up and up. 1157 01:03:21,570 --> 01:03:27,630 Until hopefully if we go up high enough-- oh, I've exceeded my buffer. 1158 01:03:27,630 --> 01:03:29,130 So I'm going to do this differently. 1159 01:03:29,130 --> 01:03:30,963 I'm going to go ahead and-- you might recall 1160 01:03:30,963 --> 01:03:37,479 from a past problem, where you can actually redirect the output to a file. 1161 01:03:37,479 --> 01:03:40,270 So I'm going to go ahead and save this in a file called output.txt. 1162 01:03:40,270 --> 01:03:48,330 GET / HTTP/1.1, Host: www.harvard.edu, Enter, Enter. 1163 01:03:48,330 --> 01:03:53,740 And now I'm going to go ahead and open this file, which is here. 1164 01:03:53,740 --> 01:03:56,640 And you can see that what just happened was this. 1165 01:03:56,640 --> 01:03:59,860 The server responded with 200 OK, which is great. 1166 01:03:59,860 --> 01:04:03,227 And then the date of the server in Greenwich Mean Time.
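The text typed into Telnet above can be sketched as a plain string. This is a minimal illustration, assuming only what was typed (a request line, a Host header, and a blank line); the function name `build_request` is hypothetical, and nothing here actually opens a network connection.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build the bytes a browser-like client would send for a simple GET.

    Each line ends with CRLF, and an empty line marks the end of the
    headers, which is why Enter is pressed twice at the prompt.
    """
    lines = [
        f"GET {path} HTTP/1.1",  # method, path, protocol version
        f"Host: {host}",         # which site on that server we want
        "",                      # blank line: end of headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_request("www.harvard.edu").decode("ascii"), end="")
```

The spaces around the `/` and the colon after `Host` matter to the server, even though they are easy to lose when the command is read aloud.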
1167 01:04:03,227 --> 01:04:04,560 And then a bunch of information. 1168 01:04:04,560 --> 01:04:06,480 Cookies, we'll come back to these before long. 1169 01:04:06,480 --> 01:04:08,601 But those will be germane to when we actually 1170 01:04:08,601 --> 01:04:10,100 write our own software for the web. 1171 01:04:10,100 --> 01:04:12,030 Drupal, seems that Harvard's website is using 1172 01:04:12,030 --> 01:04:16,562 Drupal, a popular content management software for websites. 1173 01:04:16,562 --> 01:04:18,520 And then there's some other stuff about caching 1174 01:04:18,520 --> 01:04:20,330 and when the site expires and so forth. 1175 01:04:20,330 --> 01:04:21,371 This is a little strange. 1176 01:04:21,371 --> 01:04:23,300 Harvard's website apparently expired in 1978. 1177 01:04:23,300 --> 01:04:24,710 But more on that another time. 1178 01:04:24,710 --> 01:04:28,640 And so there's some interesting HTTP headers 1179 01:04:28,640 --> 01:04:34,090 besides things like the host field that we sent and the GET and the OK 1180 01:04:34,090 --> 01:04:36,369 that I mentioned earlier as well. 1181 01:04:36,369 --> 01:04:38,660 Now, Telnet is not a very user-friendly way to do this. 1182 01:04:38,660 --> 01:04:41,590 I'm going to actually redo this with a different command, curl, 1183 01:04:41,590 --> 01:04:46,070 whereby I can do a curl -I, and I'm going to then do 1184 01:04:46,070 --> 01:04:50,370 the full URL-- www.harvard.edu, Enter. 1185 01:04:50,370 --> 01:04:52,310 And now what's nice with curl 1186 01:04:52,310 --> 01:04:54,080 is that I don't actually see the HTML. 1187 01:04:54,080 --> 01:04:58,180 I only see in this case the HTTP headers, which are still quite a few, 1188 01:04:58,180 --> 01:05:00,960 but we can now at least see them a little more readily. 1189 01:05:00,960 --> 01:05:03,260 In fact, let me go and do the same now for yale.edu, 1190 01:05:03,260 --> 01:05:06,860 and see if we can glean any differences in their servers.
1191 01:05:06,860 --> 01:05:08,040 There we go here. 1192 01:05:08,040 --> 01:05:12,267 So the headers that are coming back for Yale are these that I've highlighted. 1193 01:05:12,267 --> 01:05:14,850 And it looks too that there's some interesting stuff going on. 1194 01:05:14,850 --> 01:05:17,540 It seems that Yale also uses Drupal. 1195 01:05:17,540 --> 01:05:21,940 So it seems that both universities are doing something rather familiar. 1196 01:05:21,940 --> 01:05:24,680 But most of this information is not all that useful. 1197 01:05:24,680 --> 01:05:27,920 But it is useful if maybe we do this. 1198 01:05:27,920 --> 01:05:31,730 What if we visit, for instance-- why don't we 1199 01:05:31,730 --> 01:05:41,410 go to HTTP-- how about we go to reference.cs50.net, which you might 1200 01:05:41,410 --> 01:05:43,670 use as an alternative to man pages. 1201 01:05:43,670 --> 01:05:45,610 And this is a little curious. 1202 01:05:45,610 --> 01:05:47,340 It moved permanently. 1203 01:05:47,340 --> 01:05:49,120 This is not 200 OK. 1204 01:05:49,120 --> 01:05:50,110 Moved permanently. 1205 01:05:50,110 --> 01:05:51,100 Where did it go? 1206 01:05:51,100 --> 01:05:54,752 Well, wait a minute, let me go ahead and highlight that URL. 1207 01:05:54,752 --> 01:05:56,960 And let me go ahead in another tab and just go there. 1208 01:05:56,960 --> 01:05:58,280 OK, it's there. 1209 01:05:58,280 --> 01:06:00,100 So where did it move to? 1210 01:06:00,100 --> 01:06:02,010 And in fact, if I look at the domain again, 1211 01:06:02,010 --> 01:06:04,560 it is indeed there, but notice this. 1212 01:06:04,560 --> 01:06:08,290 Almost all of CS50's websites actually run not over HTTP per se 1213 01:06:08,290 --> 01:06:12,020 but HTTPS, where the S means secure, whereby 1214 01:06:12,020 --> 01:06:15,550 all of our websites for the most part are encrypted. 1215 01:06:15,550 --> 01:06:16,870 But that's not what I typed. 1216 01:06:16,870 --> 01:06:20,957 I just went to http://reference.cs50.net.
1217 01:06:20,957 --> 01:06:23,540 And yet when I do that with this command line interface, which 1218 01:06:23,540 --> 01:06:29,510 mimics the behavior of a browser, if I visit HTTP, I'm told by CS50's server, 1219 01:06:29,510 --> 01:06:31,990 moved permanently, status code 301. 1220 01:06:31,990 --> 01:06:35,860 But notice this one other header that's kind of interesting-- location. 1221 01:06:35,860 --> 01:06:38,560 This location header-- and a header to be clear 1222 01:06:38,560 --> 01:06:41,160 is just a word, a colon, and then a value. 1223 01:06:41,160 --> 01:06:44,950 This header specifies where we moved to. 1224 01:06:44,950 --> 01:06:49,600 So this seems to be a mechanism whereby using HTTP 1225 01:06:49,600 --> 01:06:52,360 headers-- sort of messages inside the envelope 1226 01:06:52,360 --> 01:06:55,960 that the human doesn't really see, but that the browser does understand. 1227 01:06:55,960 --> 01:06:57,970 This seems to be a way that we can forcibly 1228 01:06:57,970 --> 01:07:01,380 redirect all users from the insecure version of our website 1229 01:07:01,380 --> 01:07:04,600 to the secure version, so that thereafter, all of the information 1230 01:07:04,600 --> 01:07:05,100 is secure. 1231 01:07:05,100 --> 01:07:08,183 And frankly, there's not all that much private information going on there. 1232 01:07:08,183 --> 01:07:10,860 But if you don't really want the whole world or the NSA 1233 01:07:10,860 --> 01:07:14,890 or Harvard or Yale knowing what pages, what functions you 1234 01:07:14,890 --> 01:07:19,340 need to look up on reference.cs50.net, by forcing everything 1235 01:07:19,340 --> 01:07:23,770 to HTTPS, in theory, everything is perfectly secure now so 1236 01:07:23,770 --> 01:07:25,920 that only you know what pages you're visiting. 1237 01:07:25,920 --> 01:07:27,500 And we, since we run the server. 1238 01:07:27,500 --> 01:07:28,710 But no one in between.
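To make the "word, colon, value" shape of a header concrete, here is a rough sketch that pulls the status code and the Location header out of a response. The `SAMPLE` text is made up for illustration (the exact headers CS50's server sends are not reproduced here), and the helper names `parse_head` and `redirect_target` are hypothetical.

```python
# Illustrative response only; a real server sends more headers than this.
SAMPLE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://reference.cs50.net/\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_head(raw: str):
    """Split a raw response into (status code, reason phrase, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]            # everything before the body
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")      # a word, a colon, a value
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


def redirect_target(raw: str):
    """Return the Location value for a 301 or 302 response, else None."""
    code, _reason, headers = parse_head(raw)
    if code in (301, 302):
        return headers.get("location")
    return None


if __name__ == "__main__":
    print(redirect_target(SAMPLE))
```

A browser does essentially this step on the user's behalf: it reads the 301, finds Location, and requests that URL instead.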
1239 01:07:28,710 --> 01:07:33,140 And indeed, that's one of the biggest values of using HTTPS-based URLs, 1240 01:07:33,140 --> 01:07:35,550 so that even if there is some man in the middle, 1241 01:07:35,550 --> 01:07:39,800 so to speak, a bad guy, an adversary between you and that remote server, 1242 01:07:39,800 --> 01:07:42,540 whether it's here on campus or in Starbucks or the airport 1243 01:07:42,540 --> 01:07:45,780 or some random adversary on the internet, he or she in theory 1244 01:07:45,780 --> 01:07:48,980 should not be able to see anything between points A and B 1245 01:07:48,980 --> 01:07:53,860 if you are, as before, using a VPN between those points or, two, 1246 01:07:53,860 --> 01:07:59,350 using a protocol like HTTPS that by design is encrypting information. 1247 01:07:59,350 --> 01:08:03,370 And suffice it to say the encryption is far fancier than Caesar or Vigenère. 1248 01:08:03,370 --> 01:08:05,670 But it is indeed similar in spirit, where 1249 01:08:05,670 --> 01:08:10,040 those zeros and ones going back and forth are scrambled in some way 1250 01:08:10,040 --> 01:08:18,350 that only you and the point B server can actually decode them or decrypt them. 1251 01:08:18,350 --> 01:08:20,424 So let's visit an actual website now, Google. 1252 01:08:20,424 --> 01:08:23,340 But before we do that, let's turn off some of the more modern features 1253 01:08:23,340 --> 01:08:25,880 by going to Settings, going to Search Settings, 1254 01:08:25,880 --> 01:08:28,359 and turn off so-called instant results. 1255 01:08:28,359 --> 01:08:30,810 Because for our purposes today, instant results 1256 01:08:30,810 --> 01:08:32,785 use a technology or language called JavaScript, 1257 01:08:32,785 --> 01:08:35,160 which we'll get to in a few weeks' time, but for now it's 1258 01:08:35,160 --> 01:08:38,460 just going to be a distraction from the underlying HTTP feature. 1259 01:08:38,460 --> 01:08:41,970 So I'm going to go ahead and indeed never show instant results.
1260 01:08:41,970 --> 01:08:47,340 So that now when I search for something like cats on google.com and hit Enter, 1261 01:08:47,340 --> 01:08:52,420 I'm going to find myself at a fairly long URL, indeed this URL here. 1262 01:08:52,420 --> 01:08:55,630 And I have no idea what most of this URL means, not knowing 1263 01:08:55,630 --> 01:08:57,399 how Google works underneath the hood. 1264 01:08:57,399 --> 01:08:59,779 But I'm looking for some familiar patterns. 1265 01:08:59,779 --> 01:09:04,609 And indeed, if I pretty much a little ignorantly but hopefully cleverly just 1266 01:09:04,609 --> 01:09:08,010 delete anything I don't understand, I'm going to deliberately leave myself 1267 01:09:08,010 --> 01:09:11,286 with just the essence of this URL. 1268 01:09:11,286 --> 01:09:13,529 So notice, I didn't type this URL. 1269 01:09:13,529 --> 01:09:18,120 I ended up at this URL after I typed in cats to that search box and hit Enter. 1270 01:09:18,120 --> 01:09:20,120 Now I found myself in a really long URL and then 1271 01:09:20,120 --> 01:09:22,899 I just started deleting things I didn't understand to distill 1272 01:09:22,899 --> 01:09:25,279 this URL into quite simply this. 1273 01:09:25,279 --> 01:09:33,279 https://www.google.com/search?q=cats. 1274 01:09:33,279 --> 01:09:36,020 Well, it turns out that much like in the world of C, 1275 01:09:36,020 --> 01:09:39,200 you have functions from CS50 like getString and getInt, 1276 01:09:39,200 --> 01:09:42,020 or if you implement them yourself, scanf or other such functions 1277 01:09:42,020 --> 01:09:43,740 whereby you can get user input. 1278 01:09:43,740 --> 01:09:49,479 It's less obvious at first glance how a web server can get input from a user. 1279 01:09:49,479 --> 01:09:53,510 Because there is no-- well, rather, you can see the search 1280 01:09:53,510 --> 01:09:57,320 box that I typed into, but until I hit Enter, 1281 01:09:57,320 --> 01:09:59,914 the server doesn't see that information necessarily.
1282 01:09:59,914 --> 01:10:02,830 And that's a bit of a white lie, because nowadays thanks to JavaScript 1283 01:10:02,830 --> 01:10:04,705 and thanks to autocomplete, Google's actually 1284 01:10:04,705 --> 01:10:06,420 seeing every keystroke you type. 1285 01:10:06,420 --> 01:10:09,280 But in theory, when I hit Enter, only when I hit Enter, 1286 01:10:09,280 --> 01:10:11,820 do they see the full word cats. 1287 01:10:11,820 --> 01:10:16,550 And how do they get access to it not having physical access to my keyboard? 1288 01:10:16,550 --> 01:10:19,450 They see it in the URL here. 1289 01:10:19,450 --> 01:10:23,500 And so indeed HTTP, beyond supporting status codes and the sort 1290 01:10:23,500 --> 01:10:25,930 of digital equivalent of my handshake with Arthuro, 1291 01:10:25,930 --> 01:10:30,690 also supports input, specifically input parameters that in this case 1292 01:10:30,690 --> 01:10:33,640 is arbitrarily but reasonably called q, because back in the day, 1293 01:10:33,640 --> 01:10:38,980 Google decided that the default input to its search page would be q for query. 1294 01:10:38,980 --> 01:10:43,230 And indeed, if I hit Enter now, the results seem no different. 1295 01:10:43,230 --> 01:10:46,810 So for whatever reason, Google uses by default a lot more parameters, 1296 01:10:46,810 --> 01:10:47,890 all of which I deleted. 1297 01:10:47,890 --> 01:10:50,670 But the only necessary one is cats. 1298 01:10:50,670 --> 01:10:54,620 And notice even without changing the page, I can go up in here 1299 01:10:54,620 --> 01:10:57,540 and change my cats to dogs and hit Enter. 1300 01:10:57,540 --> 01:11:02,130 And now notice I've searched for dogs just as though I had typed this myself. 1301 01:11:02,130 --> 01:11:05,410 But indeed, the only thing I've been changing up here is the keyword. 1302 01:11:05,410 --> 01:11:09,270 And if I search for mice now, I'm changing the search result. 
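The idea that input travels in the URL can be sketched with Python's standard `urllib.parse` module. The helper names `search_url` and `read_query` are hypothetical; only the `q` parameter and the Google search URL come from the demonstration above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(query: str) -> str:
    """Build the distilled search URL, encoding the query as the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def read_query(url: str) -> dict:
    """Do what a server does with the part after '?': parse name=value pairs."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    for word in ["cats", "dogs", "mice"]:
        url = search_url(word)
        print(url, read_query(url))
```

Changing cats to dogs in the address bar is the same as calling `search_url("dogs")`: only the value after `q=` changes.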
1303 01:11:09,270 --> 01:11:12,280 So it seems that the essence of an HTTP request 1304 01:11:12,280 --> 01:11:15,530 boils down to what is sent here. 1305 01:11:15,530 --> 01:11:16,620 So let's try this as well. 1306 01:11:16,620 --> 01:11:19,690 Let me go ahead and copy that URL. 1307 01:11:19,690 --> 01:11:23,680 And just for good measure, I can go ahead and do something like curl 1308 01:11:23,680 --> 01:11:25,164 and then paste this URL. 1309 01:11:25,164 --> 01:11:27,830 And let me go ahead and quote it, just because it has a question 1310 01:11:27,830 --> 01:11:29,340 mark that could break things. 1311 01:11:29,340 --> 01:11:30,450 And hit Enter. 1312 01:11:30,450 --> 01:11:34,310 It's pretty overwhelming here, but this is all of the HTML 1313 01:11:34,310 --> 01:11:36,540 that's coming back from google.com. 1314 01:11:36,540 --> 01:11:39,590 So when I see these search results in google.com, 1315 01:11:39,590 --> 01:11:43,960 this web page is written in this language called HTML. 1316 01:11:43,960 --> 01:11:47,590 And HTML, as we'll see, is a little overwhelming perhaps at first glance, 1317 01:11:47,590 --> 01:11:49,550 but follows some very simple patterns. 1318 01:11:49,550 --> 01:11:52,710 And we can see them better in browsers like Chrome as follows. 1319 01:11:52,710 --> 01:11:56,423 If you Control-Click or right click on your web page, most any web page 1320 01:11:56,423 --> 01:11:58,432 if you're using Chrome, you can choose Inspect. 1321 01:11:58,432 --> 01:12:00,640 And there's keyboard shortcuts and other menu options 1322 01:12:00,640 --> 01:12:02,010 by which you can access this. 1323 01:12:02,010 --> 01:12:05,400 And notice among the elements tab here that just popped up. 1324 01:12:05,400 --> 01:12:08,680 And notice now, again a little overwhelming. 1325 01:12:08,680 --> 01:12:13,400 But what's nice about Chrome-- and Edge can do this and Firefox and Safari 1326 01:12:13,400 --> 01:12:16,870 and others-- it can pretty print your HTML. 
1327 01:12:16,870 --> 01:12:19,880 Sort of like style50 can sort of see through any messiness, 1328 01:12:19,880 --> 01:12:22,410 similarly, can the browser kind of look at the mess that 1329 01:12:22,410 --> 01:12:25,720 just came across the wire from Google and format it as follows. 1330 01:12:25,720 --> 01:12:30,170 And indeed, it looks like this language HTML follows a certain pattern. 1331 01:12:30,170 --> 01:12:33,855 There's always this at the top, open bracket, exclamation point, DOCTYPE, 1332 01:12:33,855 --> 01:12:35,390 html, close bracket. 1333 01:12:35,390 --> 01:12:39,999 Then there's open bracket html in lower case, then some other words and quotes 1334 01:12:39,999 --> 01:12:41,040 and equals signs perhaps. 1335 01:12:41,040 --> 01:12:43,000 Then a head, then a body. 1336 01:12:43,000 --> 01:12:45,214 Maybe some divs for divisions of the page. 1337 01:12:45,214 --> 01:12:47,880 And even though this is quite a lot, let's look at a simpler one 1338 01:12:47,880 --> 01:12:49,040 just for kicks real fast. 1339 01:12:49,040 --> 01:12:53,045 Let's go to harvard.edu and hit Enter. 1340 01:12:53,045 --> 01:12:56,920 And indeed-- well, actually, it looks just about as complicated. 1341 01:12:56,920 --> 01:12:59,480 Here's the HTML that composes harvard.edu. 1342 01:12:59,480 --> 01:13:02,180 So let's try to distill this into its essence. 1343 01:13:02,180 --> 01:13:04,460 I showed a web page earlier. 1344 01:13:04,460 --> 01:13:07,740 Let's go back to that to point out-- to be clear, 1345 01:13:07,740 --> 01:13:11,420 these were called query strings. 1346 01:13:11,420 --> 01:13:15,200 Let's come back to HTML. 1347 01:13:15,200 --> 01:13:17,560 So HTML is up to version 5 these days. 1348 01:13:17,560 --> 01:13:21,350 And this governs what syntax you should use when writing HTML. 1349 01:13:21,350 --> 01:13:23,770 And here per the earlier slide is perhaps 1350 01:13:23,770 --> 01:13:25,640 the simplest web page we can make.
1351 01:13:25,640 --> 01:13:28,100 So the key components-- and there's others we can add 1352 01:13:28,100 --> 01:13:30,529 and others we will soon add-- boil down to this. 1353 01:13:30,529 --> 01:13:33,070 This first line, this is so-called document type declaration. 1354 01:13:33,070 --> 01:13:36,210 This is just a fancy way of saying, you have to type this line first 1355 01:13:36,210 --> 01:13:38,490 in your file in order to tell the browser that's 1356 01:13:38,490 --> 01:13:41,000 reading this file top to bottom, left to right this web 1357 01:13:41,000 --> 01:13:43,090 page is written in version 5 of HTML. 1358 01:13:43,090 --> 01:13:46,860 Previous versions either didn't have this or had longer versions of this. 1359 01:13:46,860 --> 01:13:51,650 Is just a globally-understood symbol that means version 5. 1360 01:13:51,650 --> 01:13:55,790 Then below that is your actual HTML tags. 1361 01:13:55,790 --> 01:14:00,000 So web pages are composed of HTML tags, or more properly, elements. 1362 01:14:00,000 --> 01:14:04,880 And most elements have an open tag and a closed tag-- a start tag 1363 01:14:04,880 --> 01:14:10,350 and an end tag-- that are identical, except for typically the slash. 1364 01:14:10,350 --> 01:14:11,750 So indeed, notice the symmetry. 1365 01:14:11,750 --> 01:14:15,270 This tag here, and so far it's what we'll call an open tag or start tag, 1366 01:14:15,270 --> 01:14:18,680 means hey browser, here comes a web page written in HTML. 1367 01:14:18,680 --> 01:14:21,480 Hey browser, here comes the head of the web page. 1368 01:14:21,480 --> 01:14:23,740 Hey browser, here comes the title of the web page. 1369 01:14:23,740 --> 01:14:26,120 And there's no technical reason I wrote this all on one 1370 01:14:26,120 --> 01:14:29,864 line instead of putting hello world on its own line and this other tag 1371 01:14:29,864 --> 01:14:30,530 on its own line. 
1372 01:14:30,530 --> 01:14:34,450 It just felt short enough to just write in one line, so I went with it. 1373 01:14:34,450 --> 01:14:37,140 But notice that title is opened here. 1374 01:14:37,140 --> 01:14:40,940 Then there's literally some hard coded text, hello world. 1375 01:14:40,940 --> 01:14:43,370 And then there is the opposite so to speak, of the tag. 1376 01:14:43,370 --> 01:14:45,900 It's the same word for the tag, but this forward 1377 01:14:45,900 --> 01:14:49,460 slash inside of the tag, which closes or ends the tag 1378 01:14:49,460 --> 01:14:52,511 and sort of ends the whole title element. 1379 01:14:52,511 --> 01:14:55,010 Meanwhile, that's it for the head, at least in this example. 1380 01:14:55,010 --> 01:14:56,480 So hey browser, that's it for the head. 1381 01:14:56,480 --> 01:14:58,021 Oh hey, browser, here comes the body. 1382 01:14:58,021 --> 01:15:00,080 Hey browser, here's some actual text. 1383 01:15:00,080 --> 01:15:02,120 Hey browser, that's it for the body. 1384 01:15:02,120 --> 01:15:04,310 Hey browser, that's it for the web page. 1385 01:15:04,310 --> 01:15:07,890 So I've also by convention-- and for stylistic purposes like in C-- 1386 01:15:07,890 --> 01:15:11,936 indented things to be very pretty printed, very readable to humans. 1387 01:15:11,936 --> 01:15:13,560 But the browser certainly doesn't care. 1388 01:15:13,560 --> 01:15:16,600 And indeed, we saw when we looked at the mess that is Google's website, 1389 01:15:16,600 --> 01:15:20,120 it's just a big mess of tags and markup so to speak. 1390 01:15:20,120 --> 01:15:22,530 But for Google, that makes sense, because you 1391 01:15:22,530 --> 01:15:25,600 don't want to have to transmit any characters unnecessarily. 1392 01:15:25,600 --> 01:15:28,380 Indeed, if you think about it, if Google's website gets 1393 01:15:28,380 --> 01:15:30,660 visited by a billion people per day, which 1394 01:15:30,660 --> 01:15:32,190 actually feels kind of reasonable.
1395 01:15:32,190 --> 01:15:37,610 And suppose that a programmer at Google hits the space bar just one extra time 1396 01:15:37,610 --> 01:15:39,712 and saves Google's home page. 1397 01:15:39,712 --> 01:15:41,920 Well what's the implication of Google having just one 1398 01:15:41,920 --> 01:15:44,330 additional space in their web page? 1399 01:15:44,330 --> 01:15:46,490 If that web page is downloaded a billion times, 1400 01:15:46,490 --> 01:15:50,730 that's a billion extra ASCII characters that gets downloaded per day. 1401 01:15:50,730 --> 01:15:54,370 And a billion ASCII characters is a billion bytes, which is one gigabyte. 1402 01:15:54,370 --> 01:15:58,100 So just by hitting the spacebar can really big players like Google 1403 01:15:58,100 --> 01:16:01,820 cost themselves a huge amount of space and maybe cost or time. 1404 01:16:01,820 --> 01:16:06,640 So that's why a lot of big websites minify or compress their information, 1405 01:16:06,640 --> 01:16:08,506 whereas we will be a little more lax here, 1406 01:16:08,506 --> 01:16:10,880 because it's more important for now certainly that things 1407 01:16:10,880 --> 01:16:12,980 be readable and understandable. 1408 01:16:12,980 --> 01:16:16,010 But the white space does not matter to the browser. 1409 01:16:16,010 --> 01:16:18,480 So let's actually do something with this. 1410 01:16:18,480 --> 01:16:23,020 Keeping in mind the following, just as this indentation kind of implies, 1411 01:16:23,020 --> 01:16:26,367 this really if you think about it is a tree structure. 1412 01:16:26,367 --> 01:16:29,450 There's some document on the screen, which I will literally call document, 1413 01:16:29,450 --> 01:16:31,760 because that's what browsers do. 1414 01:16:31,760 --> 01:16:34,300 The top element of which-- I'll draw with a rectangle, 1415 01:16:34,300 --> 01:16:39,030 distinguish it from the document itself-- is the HTML element that 1416 01:16:39,030 --> 01:16:40,980 starts here and ends here. 
1417 01:16:40,980 --> 01:16:45,110 And in so far as it starts here and ends here, everything that's inside of it, 1418 01:16:45,110 --> 01:16:48,120 you can think of as children in a family tree. 1419 01:16:48,120 --> 01:16:50,240 And the first child is head, the second child 1420 01:16:50,240 --> 01:16:53,430 is body, left and right respectively. 1421 01:16:53,430 --> 01:16:56,130 The head tag meanwhile has the title child, 1422 01:16:56,130 --> 01:16:57,770 and so that's why we see title here. 1423 01:16:57,770 --> 01:16:59,890 And then I'll draw it with an ellipse, just different shape 1424 01:16:59,890 --> 01:17:00,920 because it's raw text. 1425 01:17:00,920 --> 01:17:02,280 It's not an actual tag. 1426 01:17:02,280 --> 01:17:05,080 And similarly does body have some text below it. 1427 01:17:05,080 --> 01:17:06,410 So this is just a tree. 1428 01:17:06,410 --> 01:17:10,060 It's not a binary tree, although it might be by coincidence here, 1429 01:17:10,060 --> 01:17:11,810 because there aren't many children. 1430 01:17:11,810 --> 01:17:15,420 But it's some kind of tree structure, each of whose nodes has zero 1431 01:17:15,420 --> 01:17:17,140 or more children. 1432 01:17:17,140 --> 01:17:20,120 And indeed, underneath the hood what is IE, 1433 01:17:20,120 --> 01:17:23,680 what is Edge or Firefox or Chrome or Safari actually doing 1434 01:17:23,680 --> 01:17:26,660 when it downloads a web page like this? 1435 01:17:26,660 --> 01:17:30,680 Some programmer or programmers have after taking classes like CS50 1436 01:17:30,680 --> 01:17:35,570 and knowing what these data structures are implemented in code a tree that 1437 01:17:35,570 --> 01:17:37,150 represents that web page. 
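The tree described above can be sketched in a few lines. This is a simplified illustration, not how any real browser is implemented: it uses Python's standard `html.parser`, the `Node` and `TreeBuilder` classes are hypothetical, and it assumes every start tag has a matching end tag (so it would need extra care for tags that have no closing tag).

```python
from html.parser import HTMLParser

# The simple page from the slide, as described in the lecture.
PAGE = """<!DOCTYPE html>
<html>
    <head>
        <title>hello, world</title>
    </head>
    <body>
        hello, world
    </body>
</html>"""


class Node:
    """One node of the tree: an element, a piece of text, or the document."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []  # zero or more children, so not a binary tree


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # An open tag adds a child and descends into it.
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # A close tag climbs back up to the parent.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        # Raw text becomes a leaf; whitespace-only text is ignored.
        text = data.strip()
        if text:
            self.current.children.append(Node(f'"{text}"', self.current))


def build_tree(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node: Node, depth: int = 0) -> list:
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(build_tree(PAGE))))
```

The printed outline has the same shape as the drawing: document at the top, html beneath it, then head and body as siblings, with title and the two pieces of text at the leaves.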
1438 01:17:37,150 --> 01:17:39,500 And indeed, once in a few weeks we get to JavaScript 1439 01:17:39,500 --> 01:17:42,220 using yet another language will you be able to manipulate 1440 01:17:42,220 --> 01:17:46,107 that tree in real time to change the contents of a web page 1441 01:17:46,107 --> 01:17:47,190 and what a user is seeing. 1442 01:17:47,190 --> 01:17:50,280 Indeed, if you kind of fast forward in your mind, 1443 01:17:50,280 --> 01:17:54,960 suppose that you do use something like Facebook and Messenger 1444 01:17:54,960 --> 01:17:57,750 built into it for sending messages to people or Gmail, 1445 01:17:57,750 --> 01:18:00,970 where you suddenly get new rows of emails and your web page, 1446 01:18:00,970 --> 01:18:02,000 what's really happening? 1447 01:18:02,000 --> 01:18:03,850 Every time you get a message in Facebook, 1448 01:18:03,850 --> 01:18:08,120 it's just as though this tree is getting modified with like another child 1449 01:18:08,120 --> 01:18:08,970 somewhere in here. 1450 01:18:08,970 --> 01:18:10,845 Every time you get a new email in Gmail, it's 1451 01:18:10,845 --> 01:18:13,590 like another node is appearing in this tree. 1452 01:18:13,590 --> 01:18:17,990 So there really is this equivalence to this markup language HTML and the tree 1453 01:18:17,990 --> 01:18:22,140 structures that we've just come from in recent weeks. 1454 01:18:22,140 --> 01:18:24,310 So let's actually now do something with this. 1455 01:18:24,310 --> 01:18:27,340 I'm going to go over to CS50 IDE, and I'm 1456 01:18:27,340 --> 01:18:32,710 going to go ahead and make if you will the simplest of web pages as follows. 1457 01:18:32,710 --> 01:18:37,590 I'm going to go ahead and create a new file, a text file. 1458 01:18:37,590 --> 01:18:40,579 I'm going to call it hello.html. 1459 01:18:40,579 --> 01:18:42,370 And I'm going to go ahead and populate this 1460 01:18:42,370 --> 01:18:44,850 with exactly what we saw a moment ago. 
1461 01:18:44,850 --> 01:18:47,020 Doc type, HTML. 1462 01:18:47,020 --> 01:18:48,290 Open bracket, HTML. 1463 01:18:48,290 --> 01:18:50,680 And notice that CS50 IDE is trying to be helpful here, 1464 01:18:50,680 --> 01:18:52,789 and when it notices you typing something familiar, 1465 01:18:52,789 --> 01:18:54,830 it's going to try to finish your thought for you. 1466 01:18:54,830 --> 01:18:55,670 So indeed, it did. 1467 01:18:55,670 --> 01:18:57,890 I'm going to go ahead and open now the head of the page. 1468 01:18:57,890 --> 01:18:59,390 It's going to complete that thought. 1469 01:18:59,390 --> 01:19:03,347 I'm going to open the title of the page, hello world. 1470 01:19:03,347 --> 01:19:05,680 And now I'm going to move my cursor down here physically 1471 01:19:05,680 --> 01:19:11,070 to do body, close bracket, hello comma world, save. 1472 01:19:11,070 --> 01:19:12,980 So I have written code. 1473 01:19:12,980 --> 01:19:16,670 It's source code, but it's code written in HTML-- HyperText Markup Language. 1474 01:19:16,670 --> 01:19:19,380 And indeed, you see no loops or conditions or functions. 1475 01:19:19,380 --> 01:19:20,680 There's no logic. 1476 01:19:20,680 --> 01:19:22,620 This is just markup. 1477 01:19:22,620 --> 01:19:24,110 Do this, stop doing this. 1478 01:19:24,110 --> 01:19:25,470 Do this, stop doing this. 1479 01:19:25,470 --> 01:19:27,730 It's fairly mundane. 1480 01:19:27,730 --> 01:19:31,430 But it's going to allow us to actually visit this file in a browser. 1481 01:19:31,430 --> 01:19:37,030 Indeed, let me go into a browser now and visit this page hello.html. 1482 01:19:37,030 --> 01:19:38,260 Incredibly underwhelming. 1483 01:19:38,260 --> 01:19:39,830 Indeed, this is a huge screen. 1484 01:19:39,830 --> 01:19:42,830 And all I've created is a web page that says hello world up here. 1485 01:19:42,830 --> 01:19:45,010 And if I scrolled up, I could actually see the tab 1486 01:19:45,010 --> 01:19:47,720 whose title is also hello world. 
1487 01:19:47,720 --> 01:19:49,120 But that's my first web page. 1488 01:19:49,120 --> 01:19:52,490 And if I now apply a lesson learned, if I go ahead and right click 1489 01:19:52,490 --> 01:19:56,930 or Control-Click Chrome's backdrop and choose inspect, 1490 01:19:56,930 --> 01:19:59,995 now you'll notice finally here's a simple web page, 1491 01:19:59,995 --> 01:20:02,370 and not all the messiness that was Harvard's or Google's. 1492 01:20:02,370 --> 01:20:04,170 You can actually see your HTML. 1493 01:20:04,170 --> 01:20:06,440 You can't permanently change the files here, 1494 01:20:06,440 --> 01:20:09,620 because you need to do that in CS50 IDE and change the files. 1495 01:20:09,620 --> 01:20:12,340 And so here's where there's a potential point of confusion. 1496 01:20:12,340 --> 01:20:15,690 CS50 IDE is of course a cloud-based service, 1497 01:20:15,690 --> 01:20:18,430 and it's where I'm writing and saving my files. 1498 01:20:18,430 --> 01:20:21,600 And it just so happens that built into CS50 IDE 1499 01:20:21,600 --> 01:20:26,510 is its own web server just for serving students' work. 1500 01:20:26,510 --> 01:20:31,940 So when I visit this web page here in another tab, I'm visiting not CS50 IDE per se, 1501 01:20:31,940 --> 01:20:37,860 but the web server running on a certain port on CS50 IDE 1502 01:20:37,860 --> 01:20:39,810 so I can serve up these web pages. 1503 01:20:39,810 --> 01:20:43,320 So let's go ahead and do something a little more interesting than that. 1504 01:20:43,320 --> 01:20:47,865 Let me go ahead now and create another file say as follows. 1505 01:20:47,865 --> 01:20:49,990 Let me go ahead and copy this just for good measure 1506 01:20:49,990 --> 01:20:52,600 so I don't have to recreate the whole thing. 1507 01:20:52,600 --> 01:20:58,640 And let me go ahead and create a new file called image.html. 1508 01:20:58,640 --> 01:21:00,320 Paste this in here.
1509 01:21:00,320 --> 01:21:04,830 And instead of hello world, I'm just going to write say image up here. 1510 01:21:04,830 --> 01:21:06,312 And how do I embed an image? 1511 01:21:06,312 --> 01:21:08,520 Well, turns out that there is literally an image 1512 01:21:08,520 --> 01:21:10,702 tag-- img to be succinct. 1513 01:21:10,702 --> 01:21:12,410 Indeed, you might want to write out this. 1514 01:21:12,410 --> 01:21:16,500 But nope, back in the day people decided that img is sufficient. 1515 01:21:16,500 --> 01:21:19,044 I'm going to go ahead and give it a source. 1516 01:21:19,044 --> 01:21:20,460 What should the source of this be? 1517 01:21:20,460 --> 01:21:25,540 Well, let me just do a quick search for like a grumpy cat. 1518 01:21:25,540 --> 01:21:26,996 And there's a good one. 1519 01:21:26,996 --> 01:21:29,120 So I'm going to go ahead and Control-Click or Right 1520 01:21:29,120 --> 01:21:32,416 Click for our purposes now just the image address here. 1521 01:21:32,416 --> 01:21:35,040 We'll assume this is my image and I'm grabbing the address here 1522 01:21:35,040 --> 01:21:35,665 for the moment. 1523 01:21:35,665 --> 01:21:39,970 I'm going to paste it in here, in that there is the URL of a JPEG that 1524 01:21:39,970 --> 01:21:41,660 is of a grumpy cat. 1525 01:21:41,660 --> 01:21:44,830 Now with an image, there isn't really the same concept 1526 01:21:44,830 --> 01:21:47,720 of like starting an image and stopping an image like there 1527 01:21:47,720 --> 01:21:53,010 is start the title, stop the title, start the body, stop the body. 1528 01:21:53,010 --> 01:21:55,900 And so there are so-called empty elements in HTML 1529 01:21:55,900 --> 01:22:00,470 that you can express either by doing this, which feels a little silly. 1530 01:22:00,470 --> 01:22:03,650 Like you're opening the image tag and then immediately closing it, 1531 01:22:03,650 --> 01:22:05,316 which feels a little ridiculous.
1532 01:22:05,316 --> 01:22:07,690 And so there's shorthand syntax where you can actually 1533 01:22:07,690 --> 01:22:11,410 put the slash inside of the open tag like this so 1534 01:22:11,410 --> 01:22:14,130 that the element is empty so to speak. 1535 01:22:14,130 --> 01:22:14,900 Open and closed. 1536 01:22:14,900 --> 01:22:17,020 It's not strictly required, but at least this way 1537 01:22:17,020 --> 01:22:20,530 we're making clear our intent is to open and close the thing all at once. 1538 01:22:20,530 --> 01:22:24,700 Now for accessibility purposes, for someone who has trouble with vision, 1539 01:22:24,700 --> 01:22:28,570 you might want to provide some alternative text like grumpy cat 1540 01:22:28,570 --> 01:22:31,570 so that if they're using a screen reader or some other device, there 1541 01:22:31,570 --> 01:22:33,980 it can actually have a system support explaining what it 1542 01:22:33,980 --> 01:22:36,660 is that you might otherwise be seeing. 1543 01:22:36,660 --> 01:22:41,430 So let me go ahead now and open this file image.html. 1544 01:22:41,430 --> 01:22:42,960 And it's pretty darn simple. 1545 01:22:42,960 --> 01:22:46,050 But there is my own web page with this big white background, 1546 01:22:46,050 --> 01:22:48,780 and nothing else yet and this grumpy cat. 1547 01:22:48,780 --> 01:22:51,300 All right, but of course this web page doesn't do anything. 1548 01:22:51,300 --> 01:22:53,990 It would be nice if I could click on something and go somewhere. 1549 01:22:53,990 --> 01:22:55,050 So let's do that. 1550 01:22:55,050 --> 01:23:01,730 Let's do another example whereby-- I'll call this link.html. 1551 01:23:01,730 --> 01:23:05,340 And in here-- let me get started just by copying and pasting 1552 01:23:05,340 --> 01:23:09,864 that-- instead of the cat, let me go ahead and do an anchor. 1553 01:23:09,864 --> 01:23:11,280 So it's a little counterintuitive. 1554 01:23:11,280 --> 01:23:13,140 It's not link, it's anchor.
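The image page described above might look roughly like the sketch below. The `src` URL is a placeholder (the lecture pastes in the address of a real JPEG that is not given in the transcript), and the title text is an assumption based on what is said aloud.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- Placeholder URL: substitute the address of an image you may use. -->
        <!-- The trailing slash marks the element as empty; it is optional. -->
        <img alt="grumpy cat" src="https://example.com/grumpy-cat.jpg"/>
    </body>
</html>
```

The `alt` text is what a screen reader would announce in place of the picture.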
1555 01:23:13,140 --> 01:23:16,360 And then anchor, confusingly, has a hyperreference, 1556 01:23:16,360 --> 01:23:17,889 which is the link to which it goes. 1557 01:23:17,889 --> 01:23:19,930 And I'm going to go ahead and do something clever 1558 01:23:19,930 --> 01:23:27,080 like https://www.google.com/search?q=cats. 1559 01:23:27,080 --> 01:23:28,850 And then close bracket. 1560 01:23:28,850 --> 01:23:31,130 And now notice CS50 IDE is trying to be helpful. 1561 01:23:31,130 --> 01:23:34,770 It closes the tag for me, and I can just write the word cats. 1562 01:23:34,770 --> 01:23:36,120 But let me finish this thought. 1563 01:23:36,120 --> 01:23:40,660 Let me say search for cats period. 1564 01:23:40,660 --> 01:23:44,010 And so now, even though we've seen only some simple tags so far, 1565 01:23:44,010 --> 01:23:47,766 you can use HTML inline, so to speak, sort of 1566 01:23:47,766 --> 01:23:49,140 in the middle of another thought. 1567 01:23:49,140 --> 01:23:51,770 If I want to convey the sentence search for cats, 1568 01:23:51,770 --> 01:23:55,160 but I want cats to be clickable so that when you click on the word cats 1569 01:23:55,160 --> 01:23:57,460 it actually goes to Google and searches for cats, 1570 01:23:57,460 --> 01:24:00,420 I can borrow the idea from earlier-- and I just 1571 01:24:00,420 --> 01:24:03,720 happen to remember that q is the query that I have to pass in. 1572 01:24:03,720 --> 01:24:07,540 And notice that I surround cats with the open tag and the close tags. 1573 01:24:07,540 --> 01:24:11,480 So that now if I open a browser with this file, 1574 01:24:11,480 --> 01:24:13,520 I see again, a very simple web page. 1575 01:24:13,520 --> 01:24:16,260 And I can even zoom in to make this more clear. 1576 01:24:16,260 --> 01:24:18,700 All it says is search for cats period. 1577 01:24:18,700 --> 01:24:21,474 But notice, it's the link alone that's underlined.
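As a minimal sketch of link.html as just dictated (the title text is an assumption; the URL and the sentence are the ones read aloud):

```html
<!DOCTYPE html>

<html>
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- Only the word between <a> and </a> becomes clickable. -->
        Search for <a href="https://www.google.com/search?q=cats">cats</a>.
    </body>
</html>
```

The `q=cats` part of the URL is the query parameter that Google's search page reads.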
1578 01:24:21,474 --> 01:24:23,890 And it happens to be purple by default, because we already 1579 01:24:23,890 --> 01:24:27,280 searched for cats earlier, and browsers typically remember URLs you visited. 1580 01:24:27,280 --> 01:24:30,920 So that's why it's purple and not say blue, which tends to be the default. 1581 01:24:30,920 --> 01:24:34,950 But if I click on this, indeed, I get a page full of cats. 1582 01:24:34,950 --> 01:24:36,490 I can combine these ideas. 1583 01:24:36,490 --> 01:24:40,440 Let me actually go into the IDE, and instead of the word cats, 1584 01:24:40,440 --> 01:24:42,927 let me go ahead and paste the image tag. 1585 01:24:42,927 --> 01:24:44,760 So it's a little hard to see all on one line 1586 01:24:44,760 --> 01:24:49,710 here, but notice I can search for a href, close this tag. 1587 01:24:49,710 --> 01:24:54,290 And then immediately open the image tag with its same value as before. 1588 01:24:54,290 --> 01:24:55,600 And then close that. 1589 01:24:55,600 --> 01:24:56,970 And then close the anchor tag. 1590 01:24:56,970 --> 01:24:59,360 Save that, reload. 1591 01:24:59,360 --> 01:25:01,120 Now it's a little stupid grammatically. 1592 01:25:01,120 --> 01:25:03,000 Search for cat picture. 1593 01:25:03,000 --> 01:25:07,820 But notice if I hover over the cat, my cursor becomes a little pointer. 1594 01:25:07,820 --> 01:25:11,270 And indeed, if I look in Chrome's bottom left corner, I'll see that if I click, 1595 01:25:11,270 --> 01:25:12,800 it's going to lead me to a URL. 1596 01:25:12,800 --> 01:25:16,340 And indeed, if I click on the cat, anywhere on the cat, 1597 01:25:16,340 --> 01:25:17,470 now I've made a hyperlink. 1598 01:25:17,470 --> 01:25:21,720 So now the world wide web so to speak is getting more interesting. 1599 01:25:21,720 --> 01:25:24,930 It's getting pretty ugly, but at least it's getting more interesting. 1600 01:25:24,930 --> 01:25:26,660 So what are these things? 
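The combined version, with the image standing in for the word cats, could be sketched like this. The image URL is again a placeholder, not the one used in the lecture.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- The anchor wraps the image, so the whole picture is the hyperlink. -->
        Search for <a href="https://www.google.com/search?q=cats"><img alt="grumpy cat" src="https://example.com/grumpy-cat.jpg"/></a>.
    </body>
</html>
```

Here `href`, `alt`, and `src` are the pieces written as a name, an equals sign, and a quoted value.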
1601 01:25:26,660 --> 01:25:28,430 They're not tags, per se. 1602 01:25:28,430 --> 01:25:30,080 These are what we'll call attributes. 1603 01:25:30,080 --> 01:25:32,520 So indeed, it seems that based on these simple examples 1604 01:25:32,520 --> 01:25:37,150 alone certain tags like image can have their behavior modified 1605 01:25:37,150 --> 01:25:38,110 with these attributes. 1606 01:25:38,110 --> 01:25:41,600 And the format for those is a keyword like alt for alternative 1607 01:25:41,600 --> 01:25:44,170 equals and then quote unquote some value, 1608 01:25:44,170 --> 01:25:46,500 and source-- src-- which is by design. 1609 01:25:46,500 --> 01:25:51,330 You can't write out source S-O-U-R-C-E. You'd have to do src per 1610 01:25:51,330 --> 01:25:54,462 the documentation equals quote unquote some URL. 1611 01:25:54,462 --> 01:25:56,170 And you would only know that these things 1612 01:25:56,170 --> 01:25:59,810 exist by googling around, reading some online documentation, taking a class. 1613 01:25:59,810 --> 01:26:02,390 But thankfully, there's not terribly, terribly many of them. 1614 01:26:02,390 --> 01:26:04,660 And most every one can be looked up on demand when 1615 01:26:04,660 --> 01:26:07,154 you're curious how to do something. 1616 01:26:07,154 --> 01:26:09,070 In fact, let's take a look at a few other tags 1617 01:26:09,070 --> 01:26:12,590 some this time that I've put together in advance. 1618 01:26:12,590 --> 01:26:16,680 We have a whole bunch of online examples that you're welcome to look for online. 1619 01:26:16,680 --> 01:26:19,150 Here's one that has a whole bunch of paragraphs. 1620 01:26:19,150 --> 01:26:22,500 So in this page here, notice that I've done a couple of things. 1621 01:26:22,500 --> 01:26:26,350 Inside of my body, I have a bunch of Latin paragraphs. 
1622 01:26:26,350 --> 01:26:29,050 Sort of nonsensical Latin, but I've wrapped each of them 1623 01:26:29,050 --> 01:26:32,960 in an open p tag and a closed p tag, simply because I want these 1624 01:26:32,960 --> 01:26:35,590 to be three separate blocks of text. 1625 01:26:35,590 --> 01:26:41,590 And let me go ahead into my browser now and open this file in today's directory 1626 01:26:41,590 --> 01:26:42,930 as paragraphs.html. 1627 01:26:42,930 --> 01:26:43,720 And that's it. 1628 01:26:43,720 --> 01:26:46,300 It's a little more interesting now that it fills the screen. 1629 01:26:46,300 --> 01:26:48,730 But indeed, there are distinct paragraphs. 1630 01:26:48,730 --> 01:26:52,880 There's one other tag that I proactively included here, which 1631 01:26:52,880 --> 01:26:54,780 is a little cryptic at first glance. 1632 01:26:54,780 --> 01:26:57,981 But this is a meta tag that has to go in the head of the web page. 1633 01:26:57,981 --> 01:27:00,480 And here too you would know this from some online reference. 1634 01:27:00,480 --> 01:27:03,520 And it's cryptic only insofar as there's a lot of words here. 1635 01:27:03,520 --> 01:27:06,780 But the effect of this essentially is that if this same web 1636 01:27:06,780 --> 01:27:11,400 page is viewed not on my browser but on my phone, which might otherwise 1637 01:27:11,400 --> 01:27:14,920 be pretty small to look at, and I'd have to squint to see the text, 1638 01:27:14,920 --> 01:27:18,190 this tag is one technique for actually telling the web 1639 01:27:18,190 --> 01:27:23,620 page to sort of resize itself and the text for whatever the device width is. 1640 01:27:23,620 --> 01:27:25,520 So without this tag, these three paragraphs 1641 01:27:25,520 --> 01:27:28,200 you might have to squint to actually read them pretty well on an Android 1642 01:27:28,200 --> 01:27:29,100 phone or an iPhone.
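A sketch of what paragraphs.html may contain. The transcript does not read the meta tag aloud, so the viewport line below is an assumption (it is the commonly used form of that tag), and the Latin text is shortened filler.

```html
<!DOCTYPE html>

<html>
    <head>
        <!-- Assumed form of the tag: asks mobile browsers to lay the page out at the device's width. -->
        <meta name="viewport" content="width=device-width, initial-scale=1"/>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- Each p element renders as its own block of text. -->
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
        <p>Mauris ut dui in eros semper hendrerit.</p>
        <p>Aenean venenatis convallis ante a rhoncus.</p>
    </body>
</html>
```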
1643 01:27:29,100 --> 01:27:31,130 With that tag, the font size will sort of 1644 01:27:31,130 --> 01:27:35,310 grow to take into account the fact that this is a smaller device 1645 01:27:35,310 --> 01:27:37,940 and everything should not just be squeezed in on there. 1646 01:27:37,940 --> 01:27:40,880 But otherwise, syntactically, everything else there is the same. 1647 01:27:40,880 --> 01:27:42,210 Let's look at another example. 1648 01:27:42,210 --> 01:27:46,310 If I go into headings.html, this one doesn't do all that much. 1649 01:27:46,310 --> 01:27:50,590 But it seems to demonstrate tags called H1 through H6, literally saying 1650 01:27:50,590 --> 01:27:52,620 one, two, three, four, five, six. 1651 01:27:52,620 --> 01:27:56,240 And by convention, though this differs ever so slightly by browsers, 1652 01:27:56,240 --> 01:27:58,420 H1 is big bold text. 1653 01:27:58,420 --> 01:28:01,550 H2 is not quite as big, but still bold text. 1654 01:28:01,550 --> 01:28:03,290 H3 is not quite as big. 1655 01:28:03,290 --> 01:28:04,380 H4 not quite as big. 1656 01:28:04,380 --> 01:28:06,650 Headings that you might see in a research paper 1657 01:28:06,650 --> 01:28:09,570 or in the chapters and sections or subsections of a book. 1658 01:28:09,570 --> 01:28:14,000 It's a way of adding sort of semantic headings to a web page that in our case 1659 01:28:14,000 --> 01:28:16,830 might look ultimately like this. 1660 01:28:16,830 --> 01:28:18,420 From bigger to smaller. 1661 01:28:18,420 --> 01:28:20,970 And so these might just be the section headings in some book 1662 01:28:20,970 --> 01:28:23,270 or some kind of reference like that. 1663 01:28:23,270 --> 01:28:25,280 What about lists, which are pretty common? 1664 01:28:25,280 --> 01:28:29,040 Well, if we go into list.html, it's pretty common on the web 1665 01:28:29,040 --> 01:28:32,250 or in various applications to have bulleted lists or ordered lists. 
1666 01:28:32,250 --> 01:28:35,470 This is an unordered list of bullets, foo, bar, and baz, which 1667 01:28:35,470 --> 01:28:37,910 are just silly variable names in computer science. 1668 01:28:37,910 --> 01:28:43,350 And if we want to see what this one is, if I go into list.html, 1669 01:28:43,350 --> 01:28:48,050 you'll see quite simply that we just have a little more nesting. 1670 01:28:48,050 --> 01:28:50,800 Body, UL, and LI. 1671 01:28:50,800 --> 01:28:57,606 So UL is Unordered List, LI is List Item, and foo, bar, and baz 1672 01:28:57,606 --> 01:28:58,980 are each of the three list items. 1673 01:28:58,980 --> 01:29:03,070 If I change this ever so slightly to OL, Ordered List, 1674 01:29:03,070 --> 01:29:05,780 and then go back to that web page and reload, 1675 01:29:05,780 --> 01:29:07,482 now it's an automatically numbered list. 1676 01:29:07,482 --> 01:29:09,940 So there's a lot of features you sort of get for free here, 1677 01:29:09,940 --> 01:29:12,260 not unlike a typical word processor. 1678 01:29:12,260 --> 01:29:15,980 If we want to go really all out and see a lot of nesting, 1679 01:29:15,980 --> 01:29:18,120 you can see a table here, which might be useful 1680 01:29:18,120 --> 01:29:21,210 if you want to show a whole bunch of tabular data for research purposes 1681 01:29:21,210 --> 01:29:25,380 or maybe sports scores and data on an ESPN site or the like. 1682 01:29:25,380 --> 01:29:28,540 It's a little more involved, but if you just read it top to bottom, 1683 01:29:28,540 --> 01:29:30,090 it all becomes pretty intuitive. 1684 01:29:30,090 --> 01:29:33,420 Inside of this page's body there's an HTML table. 1685 01:29:33,420 --> 01:29:36,130 This table has a TR, Table Row. 1686 01:29:36,130 --> 01:29:39,070 And that table row has table data, table data, table data. 1687 01:29:39,070 --> 01:29:41,900 So three columns, left to right.
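For reference, a single-page sketch that combines the list and the start of the table. In the lecture these live in separate files (list.html and table.html); putting them together here, and the keypad values in the cells, are assumptions for illustration.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>list and table</title>
    </head>
    <body>
        <!-- Change ul to ol (both tags) to get automatic numbering instead of bullets. -->
        <ul>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ul>

        <!-- Each tr is a row; each td is one cell (column) within that row. -->
        <table>
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```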
1688 01:29:41,900 --> 01:29:44,020 And another row with another three columns, 1689 01:29:44,020 --> 01:29:47,430 another row with another three columns, another row with another three columns. 1690 01:29:47,430 --> 01:29:49,150 And I chose these values arbitrarily just 1691 01:29:49,150 --> 01:29:52,910 to kind of mark up an old school telephone keypad, because indeed, 1692 01:29:52,910 --> 01:29:57,530 if we go into this with table.html, you see this. 1693 01:29:57,530 --> 01:30:01,240 You can add borders, and we'll see ways you can actually tweak the aesthetics. 1694 01:30:01,240 --> 01:30:03,600 But it's just laying things out in a grid here, 1695 01:30:03,600 --> 01:30:06,620 like you might tabular style data. 1696 01:30:06,620 --> 01:30:09,710 But none of these have been all that pretty thus far. 1697 01:30:09,710 --> 01:30:13,690 Indeed, I'm just using the default fonts and sizes, which apparently are just 1698 01:30:13,690 --> 01:30:16,780 black text, white background, Times New Roman 1699 01:30:16,780 --> 01:30:19,230 font, and pretty small text at that. 1700 01:30:19,230 --> 01:30:22,070 The web of course these days is much prettier than this. 1701 01:30:22,070 --> 01:30:24,940 So how do you actually start to stylize things? 1702 01:30:24,940 --> 01:30:28,010 Well, as we often do, let's take a progression of ideas. 1703 01:30:28,010 --> 01:30:30,990 Let me go into version zero of this file. 1704 01:30:30,990 --> 01:30:33,470 css0.html. 1705 01:30:33,470 --> 01:30:35,029 That does something terribly simple. 1706 01:30:35,029 --> 01:30:36,820 It's more interesting than any of the pages 1707 01:30:36,820 --> 01:30:40,240 we've seen thus far, if only because we have some slightly differing 1708 01:30:40,240 --> 01:30:43,790 font sizes and some actual content, but it's still pretty simple. 1709 01:30:43,790 --> 01:30:44,680 So what am I doing? 1710 01:30:44,680 --> 01:30:46,235 This is big and bold and centered.
1711 01:30:46,235 --> 01:30:48,110 This is kind of medium and bold and centered. 1712 01:30:48,110 --> 01:30:51,000 And this is kind of small, this copyright holder there. 1713 01:30:51,000 --> 01:30:54,760 So let's solve this in one way, but then iteratively improve upon this 1714 01:30:54,760 --> 01:30:55,920 as follows. 1715 01:30:55,920 --> 01:31:02,910 Let me go into css0.html, and we'll see that I've introduced amazingly already 1716 01:31:02,910 --> 01:31:04,090 another language. 1717 01:31:04,090 --> 01:31:08,160 CSS-- Cascading Style Sheets-- is another language 1718 01:31:08,160 --> 01:31:12,110 that is almost always used in conjunction with HTML these days. 1719 01:31:12,110 --> 01:31:16,260 And whereas HTML is all about formatting-- rather, 1720 01:31:16,260 --> 01:31:19,000 all about markup and all about layouts and sort 1721 01:31:19,000 --> 01:31:22,240 of semantically tagging things in a way that makes sense, 1722 01:31:22,240 --> 01:31:25,650 CSS is used to kind of take things the last mile 1723 01:31:25,650 --> 01:31:30,510 and stylize things so that they look and appear in exactly the way 1724 01:31:30,510 --> 01:31:31,700 that you intend. 1725 01:31:31,700 --> 01:31:33,850 So this is a little messy at the moment, because I 1726 01:31:33,850 --> 01:31:38,610 seem to be co-mingling my HTML and CSS literally as follows. 1727 01:31:38,610 --> 01:31:41,762 Turns out that in HTML there's a generic tag 1728 01:31:41,762 --> 01:31:43,720 called the div for just a division of the page. 1729 01:31:43,720 --> 01:31:46,400 If you want to think of the page as having rectangular regions, 1730 01:31:46,400 --> 01:31:48,110 div would be one way of doing that. 1731 01:31:48,110 --> 01:31:50,590 Or you could use a p tag or paragraph. 1732 01:31:50,590 --> 01:31:54,250 And I can add a style attribute here that's a style font 1733 01:31:54,250 --> 01:31:59,500 size colon 36 pixels semi-colon font weight colon bold semi-colon. 
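A sketch of css0.html with the styling written inline. The first div's properties are the ones just read aloud; the outer centering div and the 24 and 12 pixel sizes follow the description that comes next. The visible text in each div is invented, since the transcript does not say what the page displays.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>css0</title>
    </head>
    <body>
        <!-- text-align on the outer div centers all of its children. -->
        <div style="text-align: center;">
            <div style="font-size: 36px; font-weight: bold;">
                Placeholder Name
            </div>
            <div style="font-size: 24px;">
                Placeholder welcome message
            </div>
            <div style="font-size: 12px;">
                Placeholder copyright line
            </div>
        </div>
    </body>
</html>
```

Each `style` value is a list of `property: value` pairs separated by semicolons.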
1734 01:31:59,500 --> 01:32:02,810 And not all of the semi-colons, at least on the end there, are necessary. 1735 01:32:02,810 --> 01:32:04,980 But this is two CSS properties. 1736 01:32:04,980 --> 01:32:08,070 A property called font size with a value of 36 pixels, 1737 01:32:08,070 --> 01:32:11,950 and a property of font weight with a value of bold. 1738 01:32:11,950 --> 01:32:16,930 And then similarly, notice what I've done in a div of tag outside of this 1739 01:32:16,930 --> 01:32:19,954 have I wrapped it with text align center. 1740 01:32:19,954 --> 01:32:21,620 And that's a property called text align. 1741 01:32:21,620 --> 01:32:24,870 Its value is center, and it's going to center all of its children so to speak. 1742 01:32:24,870 --> 01:32:27,786 So we can use the same language from our discussion of data structures 1743 01:32:27,786 --> 01:32:28,400 and trees. 1744 01:32:28,400 --> 01:32:33,590 Meanwhile, you'll notice that my middle div is slightly smaller at 24 pixels 1745 01:32:33,590 --> 01:32:36,907 and not bold, and my last one is 12 pixels. 1746 01:32:36,907 --> 01:32:38,740 But this is a little messy now, because I've 1747 01:32:38,740 --> 01:32:41,830 co-mingled my HTML markup with my CSS. 1748 01:32:41,830 --> 01:32:45,120 It would be kind of nice if we could factor out the aesthetics, 1749 01:32:45,120 --> 01:32:49,340 put them in one central spot to make it easier to edit. 1750 01:32:49,340 --> 01:32:51,300 And so let me propose this instead. 1751 01:32:51,300 --> 01:32:56,880 I've now simplified the body of my page to just have three divs, each of which 1752 01:32:56,880 --> 01:32:57,980 has a unique ID. 1753 01:32:57,980 --> 01:33:00,390 Turns out there's an attribute in HTML called ID that 1754 01:33:00,390 --> 01:33:02,220 allows you to have a unique identifier. 
1755 01:33:02,220 --> 01:33:04,080 You can use almost any word you want, 1756 01:33:04,080 --> 01:33:06,663 though there are some restrictions on the letters you can use, 1757 01:33:06,663 --> 01:33:09,510 or where you can have numbers, and so forth. 1758 01:33:09,510 --> 01:33:11,770 But I'm just going to sort of conveniently call 1759 01:33:11,770 --> 01:33:14,370 the top div top, middle, and bottom. 1760 01:33:14,370 --> 01:33:15,950 And those are unique. 1761 01:33:15,950 --> 01:33:20,910 And now that I have the ability to identify those divs uniquely, 1762 01:33:20,910 --> 01:33:23,160 let's look at another tag up here. 1763 01:33:23,160 --> 01:33:28,880 Inside of the head of my web page now, notice I have a style tag. 1764 01:33:28,880 --> 01:33:32,260 Not a style attribute, an actual style tag. 1765 01:33:32,260 --> 01:33:35,190 And the syntax here is a little different from before, 1766 01:33:35,190 --> 01:33:37,960 but it's kind of reminiscent of C. But none of this 1767 01:33:37,960 --> 01:33:41,390 has to do with programming per se, this is just aesthetics now. 1768 01:33:41,390 --> 01:33:47,030 This syntax here says, hey, browser, apply to the body tag 1769 01:33:47,030 --> 01:33:50,460 the following CSS properties in between curly braces. 1770 01:33:50,460 --> 01:33:52,940 Text align center for the entire body. 1771 01:33:52,940 --> 01:33:55,910 Hey, browser, apply the following properties 1772 01:33:55,910 --> 01:33:59,510 to whatever HTML tag has a unique ID of top. 1773 01:33:59,510 --> 01:34:01,270 So the hashtag here means ID. 1774 01:34:01,270 --> 01:34:03,760 It's just a symbol that the world has adopted. 1775 01:34:03,760 --> 01:34:07,660 So this means whatever HTML tag has a unique ID of top, 1776 01:34:07,660 --> 01:34:09,164 apply these two properties to it. 1777 01:34:09,164 --> 01:34:11,830 Notice the semi-colons on the end, and I've indented everything 1778 01:34:11,830 --> 01:34:13,300 to keep things nice and pretty.
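The second version, with the rules factored into a style tag in the head, might be sketched as follows. The IDs top, middle, and bottom come from the lecture; the visible text remains invented, and the exact rules for middle and bottom are inferred from the sizes mentioned earlier.

```html
<!DOCTYPE html>

<html>
    <head>
        <style>

            /* Applies to the body element and is inherited by everything inside it. */
            body
            {
                text-align: center;
            }

            /* The # prefix selects the one element whose id is "top". */
            #top
            {
                font-size: 36px;
                font-weight: bold;
            }

            #middle
            {
                font-size: 24px;
            }

            #bottom
            {
                font-size: 12px;
            }

        </style>
        <title>css1</title>
    </head>
    <body>
        <div id="top">
            Placeholder Name
        </div>
        <div id="middle">
            Placeholder welcome message
        </div>
        <div id="bottom">
            Placeholder copyright line
        </div>
    </body>
</html>
```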
1779 01:34:13,300 --> 01:34:16,610 Middle will have this property, bottom will have that property. 1780 01:34:16,610 --> 01:34:20,420 So now it's cleaner in that I've relegated to the top 1781 01:34:20,420 --> 01:34:24,020 to one central spot all of the aesthetics of my web page. 1782 01:34:24,020 --> 01:34:27,770 I've left all of the lower level markup down here. 1783 01:34:27,770 --> 01:34:29,830 So that if on a whim tomorrow I want to change 1784 01:34:29,830 --> 01:34:32,020 the font size or the color or the layout, 1785 01:34:32,020 --> 01:34:35,790 I can do that very simply without actually changing the data. 1786 01:34:35,790 --> 01:34:38,630 So the data is things like these white words here. 1787 01:34:38,630 --> 01:34:42,310 And I've got some metadata, these red tags and green attributes, 1788 01:34:42,310 --> 01:34:46,360 here, so that I can uniquely identify things in the page. 1789 01:34:46,360 --> 01:34:50,610 But the aesthetics are now fundamentally separated. 1790 01:34:50,610 --> 01:34:54,230 But it's still a little messy, because they're still in the same file. 1791 01:34:54,230 --> 01:34:58,380 So let me open a third version of this, css2.html, 1792 01:34:58,380 --> 01:35:01,770 which makes the file even smaller. 1793 01:35:01,770 --> 01:35:05,910 What do I seem to have done here? 1794 01:35:05,910 --> 01:35:09,290 So in this case, I seem to have similarly given 1795 01:35:09,290 --> 01:35:11,350 IDs to these three divs. 1796 01:35:11,350 --> 01:35:14,860 But I've introduced into the head of the page not a style tag, 1797 01:35:14,860 --> 01:35:18,360 but a link tag, confusingly named, because it's not an anchor tag, 1798 01:35:18,360 --> 01:35:20,190 it's link with an href. 1799 01:35:20,190 --> 01:35:21,410 So even more confusing. 1800 01:35:21,410 --> 01:35:27,760 But all this means is hey, browser, grab the contents of this file-- css2.css-- 1801 01:35:27,760 --> 01:35:33,780 the relation to this file is that of style sheet. 
1802 01:35:33,780 --> 01:35:35,480 So it's stylisation. 1803 01:35:35,480 --> 01:35:40,000 And then apply it to this web page. 1804 01:35:40,000 --> 01:35:41,040 What is in css2.css? 1805 01:35:41,040 --> 01:35:46,670 It's just those same tags as before, but in their own file. 1806 01:35:46,670 --> 01:35:48,270 So what's the purpose of this? 1807 01:35:48,270 --> 01:35:51,720 At the end of the day, the result in each of these three cases 1808 01:35:51,720 --> 01:35:53,420 is an identical web page. 1809 01:35:53,420 --> 01:35:57,630 All three of these things look exactly like this, so they are no prettier. 1810 01:35:57,630 --> 01:36:00,240 But from a design perspective underneath the hood, 1811 01:36:00,240 --> 01:36:03,860 these things are fundamentally better designed, 1812 01:36:03,860 --> 01:36:08,520 because now this CSS file in theory could be shared across multiple pages. 1813 01:36:08,520 --> 01:36:12,550 Multiple pages of mine could now have this one link tag up top, 1814 01:36:12,550 --> 01:36:16,330 so that once a browser downloads css2.css or whatever the file is, 1815 01:36:16,330 --> 01:36:19,760 it can reuse and cache the file for my entire website 1816 01:36:19,760 --> 01:36:22,310 so that as the user clicks around to my website, 1817 01:36:22,310 --> 01:36:24,340 they don't have to re-download the CSS file. 1818 01:36:24,340 --> 01:36:26,050 And indeed, even if the browser tries, it 1819 01:36:26,050 --> 01:36:32,030 can get that HTTP 304 not modified message so that it doesn't waste time 1820 01:36:32,030 --> 01:36:35,700 or bandwidth redownloading the file. 1821 01:36:35,700 --> 01:36:39,460 So this also allows me to use, as we'll eventually 1822 01:36:39,460 --> 01:36:43,820 see in future problems, third party libraries. 
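A sketch of the third version, where the head only points at an external file. The file name css2.css is from the lecture; the visible text is invented, and the comment about the file's contents reflects the description above rather than the actual file.

```html
<!DOCTYPE html>

<html>
    <head>
        <!-- rel says how the linked file relates to this page; href says where it is. -->
        <!-- css2.css would hold the same body, #top, #middle, and #bottom rules, -->
        <!-- written exactly as they would appear between <style> and </style>. -->
        <link href="css2.css" rel="stylesheet"/>
        <title>css2</title>
    </head>
    <body>
        <div id="top">
            Placeholder Name
        </div>
        <div id="middle">
            Placeholder welcome message
        </div>
        <div id="bottom">
            Placeholder copyright line
        </div>
    </body>
</html>
```

Any other page on the same site could carry the same link tag and share the one cached style sheet.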
1823 01:36:43,820 --> 01:36:47,355 It turns out that a lot of people in the world who are better than little old me 1824 01:36:47,355 --> 01:36:52,400 at design certainly have created files ending in .css that have some really 1825 01:36:52,400 --> 01:36:56,600 beautiful stylizations that you can apply to your own web pages so that you 1826 01:36:56,600 --> 01:36:58,830 don't have to worry about as much the aesthetics. 1827 01:36:58,830 --> 01:37:02,800 Bootstrap is one such tool formerly from Twitter, 1828 01:37:02,800 --> 01:37:06,020 and other such libraries exist that allow you to stylize your site just 1829 01:37:06,020 --> 01:37:09,870 by using themes or skins, so to speak, that other people have created. 1830 01:37:09,870 --> 01:37:12,860 There is one last piece of syntax here I should draw attention 1831 01:37:12,860 --> 01:37:15,360 to is this thing here. 1832 01:37:15,360 --> 01:37:19,807 So this cryptic sequence of characters is what's known as an HTML entity. 1833 01:37:19,807 --> 01:37:22,140 It turns out there are some symbols that to my knowledge 1834 01:37:22,140 --> 01:37:26,070 I can't type on my Mac's keyboard, like the copyright symbol. 1835 01:37:26,070 --> 01:37:30,130 You can maybe do it on iOS these days via special software support. 1836 01:37:30,130 --> 01:37:35,669 But this is the canonical way of putting certain special characters inside 1837 01:37:35,669 --> 01:37:38,210 of a web page that you might not be able to express or easily 1838 01:37:38,210 --> 01:37:39,337 express on your keyboard. 1839 01:37:39,337 --> 01:37:40,670 And these are standardized, too. 1840 01:37:40,670 --> 01:37:42,730 So if I actually googled HTML entities, I 1841 01:37:42,730 --> 01:37:44,810 could actually see whole charts telling me 1842 01:37:44,810 --> 01:37:49,364 that ampersand hashtag 169 semi-colon will give me the copyright symbol. 
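As a small illustration of the entity, with placeholder text around it:

```html
<!-- &#169; is the numeric entity for the copyright sign. -->
<div id="bottom">
    Copyright &#169; Placeholder Name
</div>
```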
1843 01:37:49,364 --> 01:37:51,530 And just to be clear, when that's actually rendered, 1844 01:37:51,530 --> 01:37:53,080 you don't see that in the page. 1845 01:37:53,080 --> 01:37:58,230 You instead see the more familiar copyright symbol there. 1846 01:37:58,230 --> 01:38:02,130 So let's now finally try to tie some of these things together. 1847 01:38:02,130 --> 01:38:07,450 I know that Google supports search queries via GET. 1848 01:38:07,450 --> 01:38:12,280 And this is in contrast just to be clear with one other thing. 1849 01:38:12,280 --> 01:38:13,650 That is POST. 1850 01:38:13,650 --> 01:38:16,850 It would be a little worrisome if every time you 1851 01:38:16,850 --> 01:38:19,340 logged into Facebook or Google or any website, 1852 01:38:19,340 --> 01:38:22,310 or any time you bought something on Amazon or any website, 1853 01:38:22,310 --> 01:38:25,530 if your credit card and your password and all your sort 1854 01:38:25,530 --> 01:38:29,190 of semi-private information appeared in the URL 1855 01:38:29,190 --> 01:38:31,370 just like these Google search queries. 1856 01:38:31,370 --> 01:38:34,590 So it turns out that HTTP supports another verb. 1857 01:38:34,590 --> 01:38:37,720 And there's a few others, but the two we'll focus on are GET and POST. 1858 01:38:37,720 --> 01:38:41,080 And POST is inside the envelope's initial message, 1859 01:38:41,080 --> 01:38:44,750 just like my handshake to AJ, almost identically. 1860 01:38:44,750 --> 01:38:46,240 But instead of GET, it's POST. 1861 01:38:46,240 --> 01:38:50,700 What do you want to post information to and what protocol do you want to use? 1862 01:38:50,700 --> 01:38:54,210 This is an example of a snippet of how I might log into Facebook. 
1863 01:38:54,210 --> 01:38:58,100 When I log in to Facebook, I don't want my friends or my siblings or my family 1864 01:38:58,100 --> 01:39:01,940 members being able to see in my browser's history or the search box 1865 01:39:01,940 --> 01:39:05,340 what my user name or really what my password is. 1866 01:39:05,340 --> 01:39:09,010 And that's exactly what HTTP GET does by design. 1867 01:39:09,010 --> 01:39:12,680 POST is just another way of submitting information to a server, 1868 01:39:12,680 --> 01:39:17,356 still using the same conventions of HTTP parameter equals some value. 1869 01:39:17,356 --> 01:39:19,730 And indeed, you can send multiple ones by separating them 1870 01:39:19,730 --> 01:39:21,021 in this case with an ampersand. 1871 01:39:21,021 --> 01:39:24,310 No relationship to the ampersand we just saw in an HTML entity. 1872 01:39:24,310 --> 01:39:27,120 But notice that this email and password are deliberately 1873 01:39:27,120 --> 01:39:28,580 below the HTTP headers. 1874 01:39:28,580 --> 01:39:32,000 So they're not in the URL bar, they're instead deeper 1875 01:39:32,000 --> 01:39:34,200 inside the envelope, if you will. 1876 01:39:34,200 --> 01:39:37,750 But I need to know this because when I make my own web pages, 1877 01:39:37,750 --> 01:39:39,250 this becomes relevant. 1878 01:39:39,250 --> 01:39:44,960 Let me go ahead and create a super simple web page called search.html 1879 01:39:44,960 --> 01:39:50,660 that again has the doc type declaration at the top, that then has my HTML tags, 1880 01:39:50,660 --> 01:39:55,590 my head tags, my title tags-- and I'll call this search. 1881 01:39:55,590 --> 01:40:00,340 And then over here I will have the body of the page. 1882 01:40:00,340 --> 01:40:03,760 And then I'm just going to do an H1 for CS50 search, which 1883 01:40:03,760 --> 01:40:06,160 is just a big bold heading on the page. 1884 01:40:06,160 --> 01:40:08,960 And now I'm going to have a form.
1885 01:40:08,960 --> 01:40:17,070 And I'm going to have action equals https://www.google.com/search. 1886 01:40:17,070 --> 01:40:21,050 The method I want to use is necessarily GET, not POST. 1887 01:40:21,050 --> 01:40:23,490 Though in different contexts, I might want to use POST. 1888 01:40:23,490 --> 01:40:25,490 But I'm not doing logons or something like that. 1889 01:40:25,490 --> 01:40:26,990 I'm using Google search engine. 1890 01:40:26,990 --> 01:40:30,500 So now I have the HTML form element, which we've not yet seen. 1891 01:40:30,500 --> 01:40:33,710 But it turns out there's another tag called input 1892 01:40:33,710 --> 01:40:38,770 that you can give a name to like q, that can be a type like text, 1893 01:40:38,770 --> 01:40:40,480 and it's empty. 1894 01:40:40,480 --> 01:40:43,770 And then we can have another input whose type 1895 01:40:43,770 --> 01:40:48,060 might be quote unquote submit and close that tag. 1896 01:40:48,060 --> 01:40:50,270 And then save the page. 1897 01:40:50,270 --> 01:40:56,660 If I now go back into this file and go to search.html, if I zoom in, 1898 01:40:56,660 --> 01:41:01,740 we see if you will, version one of Google, without any aesthetics. 1899 01:41:01,740 --> 01:41:03,610 And indeed, the actual version one of Google 1900 01:41:03,610 --> 01:41:05,370 wasn't all that much more complicated. 1901 01:41:05,370 --> 01:41:11,980 But if I now type in cats, submit this query, I go to actual Google, 1902 01:41:11,980 --> 01:41:14,360 typing in effectively cats, because of the URL 1903 01:41:14,360 --> 01:41:17,650 I was redirected to-- which is to say that using HTML, 1904 01:41:17,650 --> 01:41:20,850 we can reconstruct exactly what Google's been doing all this time. 1905 01:41:20,850 --> 01:41:25,330 Because if you distill the essence of Google into just a few lines of code, 1906 01:41:25,330 --> 01:41:26,570 this is it. 1907 01:41:26,570 --> 01:41:29,610 And indeed, this is essentially what Google looked like a few years ago. 
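Putting the dictated pieces together, search.html might look like this sketch. The action URL, the method, and the input name q are as stated; the exact capitalization of the heading text is a guess.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>search</title>
    </head>
    <body>
        <h1>CS50 Search</h1>
        <!-- With method get, the browser appends ?q=... to the action URL on submit. -->
        <form action="https://www.google.com/search" method="get">
            <input name="q" type="text"/>
            <input type="submit"/>
        </form>
    </body>
</html>
```

Typing cats and submitting sends the browser to the action URL with `?q=cats` appended, the same URL used for the hyperlink earlier.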
1908 01:41:29,610 --> 01:41:31,740 Although, to be fair, they also had this. 1909 01:41:31,740 --> 01:41:35,860 They had another input whose type was submit, 1910 01:41:35,860 --> 01:41:41,200 and whose value even early on was I'm Feeling Lucky. 1911 01:41:41,200 --> 01:41:43,645 And if we save this, it's not going to actually do anything, 1912 01:41:43,645 --> 01:41:46,270 because we need a little more logic in order to make that work. 1913 01:41:46,270 --> 01:41:50,050 But if I reload, now we get the second Google button as well. 1914 01:41:50,050 --> 01:41:54,250 And so all we've implemented for now is the front end of Google, so to speak. 1915 01:41:54,250 --> 01:41:58,440 We have completely punted to Google's back end, their own databases, 1916 01:41:58,440 --> 01:42:01,760 their own software, the actual searching of things, 1917 01:42:01,760 --> 01:42:04,170 because we don't really have a language yet, 1918 01:42:04,170 --> 01:42:07,990 a way of expressing searches ourselves. 1919 01:42:07,990 --> 01:42:12,550 Indeed, we could using C and using HTML and using 1920 01:42:12,550 --> 01:42:15,220 CSS start to build our own server, and we could actually 1921 01:42:15,220 --> 01:42:20,096 write code in C that receives something like q equals cats, parses the cats, 1922 01:42:20,096 --> 01:42:21,970 as in reads it, extracts it from that string, 1923 01:42:21,970 --> 01:42:24,730 then figures out in our own database where can I find some cats. 1924 01:42:24,730 --> 01:42:28,250 But it's going to be incredibly, incredibly tedious to do that in C. 1925 01:42:28,250 --> 01:42:31,380 In fact, if you think back to the problems Vigenere and Caesar 1926 01:42:31,380 --> 01:42:35,550 and the like, even just manipulating strings in C is really non-trivial 1927 01:42:35,550 --> 01:42:37,760 and gets quickly tedious. 1928 01:42:37,760 --> 01:42:40,200 And so we really need a better language.
1929 01:42:40,200 --> 01:42:43,100 And that language is going to be in the coming weeks Python, which 1930 01:42:43,100 --> 01:42:48,130 is a higher level language than C. In fact, the Python interpreter 1931 01:42:48,130 --> 01:42:51,100 so to speak itself is written in C. So the world some years ago 1932 01:42:51,100 --> 01:42:55,140 used C to write support for really what many would call a better language 1933 01:42:55,140 --> 01:42:56,570 for solving problems like this. 1934 01:42:56,570 --> 01:43:00,240 And so not only can you use Python for command line applications 1935 01:43:00,240 --> 01:43:04,310 and processing and analyzing data like a data scientist might use it for, 1936 01:43:04,310 --> 01:43:09,220 we can also use Python to actually write the back end of google.com, 1937 01:43:09,220 --> 01:43:11,790 or the back end of Facebook, or the back end of any web server 1938 01:43:11,790 --> 01:43:15,210 that has to read the parameters, understand them, maybe look up 1939 01:43:15,210 --> 01:43:17,350 some data or store some data in a database, 1940 01:43:17,350 --> 01:43:20,510 and respond to the user with dynamic output. 1941 01:43:20,510 --> 01:43:23,040 So all that and more in the weeks ahead. 1942 01:43:23,040 --> 01:43:27,032 1943 01:43:27,032 --> 01:43:32,274 [MUSIC PLAYING] 1944 01:43:32,274 --> 01:44:36,724
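To make the back-end steps described above a bit more concrete (read the parameters, look something up, respond with dynamic output), here is a minimal sketch in Python. It is only an illustration under stated assumptions: FAKE_DATABASE and handle_search are hypothetical names, the "database" is a made-up in-memory dictionary, and a real server would also need code to accept HTTP requests.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for a real database: page titles keyed by search term.
FAKE_DATABASE = {"cats": ["All About Cats", "Cat Videos"]}


def handle_search(query_string):
    """Turn a query string such as "q=cats" into a small HTML response."""
    # parse_qs returns a dict mapping each name to a list of values,
    # e.g. {"q": ["cats"]}; a missing q falls back to an empty string.
    params = parse_qs(query_string)
    q = params.get("q", [""])[0]

    # Look up matching titles; unknown terms simply yield no results.
    results = FAKE_DATABASE.get(q.lower(), [])

    # Escape user input and data before placing them in HTML.
    items = "".join(f"<li>{escape(title)}</li>" for title in results)
    return f"<h1>Results for {escape(q)}</h1><ul>{items}</ul>"


print(handle_search("q=cats"))
```

The string handling that would take many lines of C here collapses into a single call to parse_qs, which is the kind of convenience a higher-level language offers.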