1 00:00:00,000 --> 00:00:03,395 >> [MUSIC PLAYING] 2 00:00:03,395 --> 00:00:13,100 3 00:00:13,100 --> 00:00:15,570 >> DAVID J. MALAN: So I just wanted to assuage to. 4 00:00:15,570 --> 00:00:18,260 I would echo exactly what Scaz said about institutional memory. 5 00:00:18,260 --> 00:00:20,350 CS50 has been around for some 20 years at Harvard. 6 00:00:20,350 --> 00:00:22,280 And the reality is, from the seniors on down, 7 00:00:22,280 --> 00:00:25,045 there is annually reassurance that the freshmen, the sophomores, 8 00:00:25,045 --> 00:00:26,870 and the juniors and also the seniors taking 9 00:00:26,870 --> 00:00:30,360 CS50, that you end up doing fine. 10 00:00:30,360 --> 00:00:32,680 >> The reality is, students do not fail CS50. 11 00:00:32,680 --> 00:00:35,740 In fact, in the rare instances where we've had Es or Fs, 12 00:00:35,740 --> 00:00:37,990 it's really been because of extenuating circumstances, 13 00:00:37,990 --> 00:00:39,840 whether it's medical or personal. 14 00:00:39,840 --> 00:00:41,830 Ds are incredibly uncommon as well. 15 00:00:41,830 --> 00:00:45,270 And I can say comfortably, though we typically don't disclose statistics, 16 00:00:45,270 --> 00:00:48,450 but given that there is no institutional memory here whatsoever, 17 00:00:48,450 --> 00:00:51,810 a majority of students in CS50 do end up getting A range grades. 18 00:00:51,810 --> 00:00:54,720 A significant chunk end up ending up in the B range too. 19 00:00:54,720 --> 00:00:57,490 >> So even though you might be equating in your mind threes 20 00:00:57,490 --> 00:01:00,690 with 60% and therefore Ds, or Cs, or the like, 21 00:01:00,690 --> 00:01:02,530 it really does not line up with the reality. 22 00:01:02,530 --> 00:01:05,238 In fact, we mean exactly what we say at the beginning of the term 23 00:01:05,238 --> 00:01:08,380 that so many students in CS50, both in Cambridge and here in New Haven, 24 00:01:08,380 --> 00:01:10,220 have never taken a CS course before. 
25 00:01:10,220 --> 00:01:13,090 And what indeed ultimately matters is where you end up in week 12 26 00:01:13,090 --> 00:01:15,882 relative to yourself in week zero. 27 00:01:15,882 --> 00:01:17,590 Now we have multiple tracks in the course 28 00:01:17,590 --> 00:01:20,548 as you know-- less comfortable, more comfortable, somewhere in between. 29 00:01:20,548 --> 00:01:23,790 And indeed, when you get statistics on this week's quiz, 30 00:01:23,790 --> 00:01:27,460 don't be discouraged if, especially if you feel that you're around the mean 31 00:01:27,460 --> 00:01:30,780 or below the mean or the median, especially since we don't necessarily 32 00:01:30,780 --> 00:01:33,560 take all those demographics into account mid-semester 33 00:01:33,560 --> 00:01:35,000 with the grading statistics. 34 00:01:35,000 --> 00:01:37,250 >> In other words, we know statistically every year 35 00:01:37,250 --> 00:01:39,570 that students who are less comfortable do a little worse on the quiz. 36 00:01:39,570 --> 00:01:42,050 And students who are more comfortable do a little better on the quiz. 37 00:01:42,050 --> 00:01:45,430 But per that promise in the syllabus and also in the first week of lectures, 38 00:01:45,430 --> 00:01:46,880 we take all of that into account. 39 00:01:46,880 --> 00:01:48,900 >> Indeed, at year's end, what we end up doing 40 00:01:48,900 --> 00:01:52,154 is normalizing all scores across sections, both in Cambridge 41 00:01:52,154 --> 00:01:54,570 and now here in New Haven, which means taking into account 42 00:01:54,570 --> 00:01:57,310 the disparate styles, the disparate harshness, the different sort 43 00:01:57,310 --> 00:02:00,722 of personalities that the individual TAs have here and in Cambridge 44 00:02:00,722 --> 00:02:02,930 so that you're not at a disadvantage even if you just 45 00:02:02,930 --> 00:02:06,120 happen to have had a TF or a TA who's been a little tougher on you 46 00:02:06,120 --> 00:02:07,170 in your mind. 
47 00:02:07,170 --> 00:02:10,139 >> Two, we take into account comfort level and actual background, or lack 48 00:02:10,139 --> 00:02:13,310 thereof, when taking quiz scores into account. 49 00:02:13,310 --> 00:02:14,830 So those two are factored in. 50 00:02:14,830 --> 00:02:17,142 And at the end of the day, because it's always the case 51 00:02:17,142 --> 00:02:19,100 that a student ended up in a less comfy section 52 00:02:19,100 --> 00:02:24,250 when he or she really belonged in an in-between or vice versa, 53 00:02:24,250 --> 00:02:26,230 everything is so incredibly individualized. 54 00:02:26,230 --> 00:02:29,560 Indeed, you will get annoyed at us at the end of the term when we are late 55 00:02:29,560 --> 00:02:32,630 submitting your grades because what Scaz, and Jason, and Andy, and I, 56 00:02:32,630 --> 00:02:35,430 and the team will have done in Cambridge is literally 57 00:02:35,430 --> 00:02:38,480 have hundreds of emails back and forth with all hundred of the course's 58 00:02:38,480 --> 00:02:41,279 TAs, here and in Cambridge, asking them what 59 00:02:41,279 --> 00:02:44,070 they think of all of their students based on a draft of the grades. 60 00:02:44,070 --> 00:02:46,230 And everything thereafter is incredibly individualized. 61 00:02:46,230 --> 00:02:49,230 So to the extent we get to know you in office hours, sections, and more, 62 00:02:49,230 --> 00:02:51,350 all of that too is taken into account. 63 00:02:51,350 --> 00:02:55,210 >> So though we tend to use this five point scale, please, detach yourself 64 00:02:55,210 --> 00:02:57,492 from the assumption that a three is indeed a 60%. 65 00:02:57,492 --> 00:02:58,450 It is meant to be good. 66 00:02:58,450 --> 00:03:01,360 And the teaching assistants are charged at term start 67 00:03:01,360 --> 00:03:04,050 to try to keep scores in the twos, and threes, 68 00:03:04,050 --> 00:03:06,590 and fours range so that we actually have room to grow. 
69 00:03:06,590 --> 00:03:08,610 And we actually have a yardstick by which 70 00:03:08,610 --> 00:03:11,086 we can give you useful feedback as to how you're doing 71 00:03:11,086 --> 00:03:12,210 and how you're progressing. 72 00:03:12,210 --> 00:03:15,130 So please do take that to heart. 73 00:03:15,130 --> 00:03:20,565 >> Are there any questions I can help address or concerns I can help assuage? 74 00:03:20,565 --> 00:03:23,800 Or promises I can try to keep? 75 00:03:23,800 --> 00:03:24,690 No? 76 00:03:24,690 --> 00:03:25,330 OK. 77 00:03:25,330 --> 00:03:26,010 >> All right. 78 00:03:26,010 --> 00:03:28,970 So with that said, this is CS50. 79 00:03:28,970 --> 00:03:31,670 This is the start of week six here in New Haven. 80 00:03:31,670 --> 00:03:33,820 Let's begin with a brief dimming of the lights 81 00:03:33,820 --> 00:03:37,439 to set the stage for today's content. 82 00:03:37,439 --> 00:03:38,105 [VIDEO PLAYBACK] 83 00:03:38,105 --> 00:03:44,045 [MUSIC PLAYING] 84 00:03:44,045 --> 00:03:46,600 -He came with a message. 85 00:03:46,600 --> 00:03:50,300 86 00:03:50,300 --> 00:03:52,900 With a protocol all his own. 87 00:03:52,900 --> 00:04:06,417 88 00:04:06,417 --> 00:04:13,090 He came to a world of cool firewalls, uncaring routers, and dangers far worse 89 00:04:13,090 --> 00:04:16,079 than death. 90 00:04:16,079 --> 00:04:21,079 He's fast, he's strong, he's TCP/IP. 91 00:04:21,079 --> 00:04:23,120 And he's got your address. 92 00:04:23,120 --> 00:04:25,820 93 00:04:25,820 --> 00:04:29,268 Warriors of the net. 94 00:04:29,268 --> 00:04:29,917 >> [END PLAYBACK] 95 00:04:29,917 --> 00:04:31,000 DAVID J. MALAN: All right. 96 00:04:31,000 --> 00:04:32,030 This is CS50. 97 00:04:32,030 --> 00:04:33,820 This is the start of week six. 98 00:04:33,820 --> 00:04:37,270 And this is the start of our look at the internet and web programming. 
99 00:04:37,270 --> 00:04:41,220 And, perhaps most excitingly, today marks the transition for us 100 00:04:41,220 --> 00:04:43,780 from our command line world of C to the web 101 00:04:43,780 --> 00:04:47,020 based world of PHP, and HTML, and CSS, and SQL, and JavaScript, 102 00:04:47,020 --> 00:04:49,800 and so much more that is on the horizon. 103 00:04:49,800 --> 00:04:53,390 >> But first, it has come to our attention in walking across campus that there 104 00:04:53,390 --> 00:04:57,914 is a certain bathroom here in New Haven called the Harvard room, which 105 00:04:57,914 --> 00:04:59,080 is a little greyed out here. 106 00:04:59,080 --> 00:05:03,830 But indeed, someone went to the time and expense of etching in Harvard room 107 00:05:03,830 --> 00:05:05,700 on this here room. 108 00:05:05,700 --> 00:05:07,790 Thank you for that. 109 00:05:07,790 --> 00:05:11,020 I can't say we have an analogue in Cambridge yet, 110 00:05:11,020 --> 00:05:14,060 but I think we have a little project for ourselves now when we go back. 111 00:05:14,060 --> 00:05:15,890 So thank you for that. 112 00:05:15,890 --> 00:05:18,340 >> So a quick look back at where we left off last week 113 00:05:18,340 --> 00:05:21,010 and where you're going this coming week with problem set five. 114 00:05:21,010 --> 00:05:24,350 So in problem set five, you'll be challenged to implement a spellchecker. 115 00:05:24,350 --> 00:05:26,630 And to do that, you'll be handed a pretty big text 116 00:05:26,630 --> 00:05:29,160 file with like 140,000 English words. 117 00:05:29,160 --> 00:05:32,610 And you'll be challenged to decide on a data structure with which you 118 00:05:32,610 --> 00:05:35,340 want to load all of those words into memory, and into RAM, 119 00:05:35,340 --> 00:05:38,470 and then implement a few functions, one of which is going to be check. 
120 00:05:38,470 --> 00:05:41,555 Whereby when passed an argument, a word, your function check 121 00:05:41,555 --> 00:05:43,430 simply is going to have to say true or false, 122 00:05:43,430 --> 00:05:44,990 this is a word in the dictionary. 123 00:05:44,990 --> 00:05:47,110 >> But you're going to have some design discretion and challenges 124 00:05:47,110 --> 00:05:48,568 when it comes to implementing that. 125 00:05:48,568 --> 00:05:51,250 In the simplest implementation, you could certainly 126 00:05:51,250 --> 00:05:53,960 implement a spellchecker in the underlying dictionary 127 00:05:53,960 --> 00:05:55,380 with what kind of data structure? 128 00:05:55,380 --> 00:05:57,796 You just need to store a whole bunch of strings in memory? 129 00:05:57,796 --> 00:06:00,074 What's the go to answer from week two perhaps? 130 00:06:00,074 --> 00:06:00,740 AUDIENCE: Array. 131 00:06:00,740 --> 00:06:01,500 DAVID J. MALAN: You can use an array. 132 00:06:01,500 --> 00:06:02,750 And that's not all that bad. 133 00:06:02,750 --> 00:06:05,631 But you don't necessarily know in advance how big of an array 134 00:06:05,631 --> 00:06:08,630 you're going to need, if you don't know the file necessarily in advance. 135 00:06:08,630 --> 00:06:10,110 So you're going to have to use a little bit of trickery 136 00:06:10,110 --> 00:06:11,970 like malloc, like we started using. 137 00:06:11,970 --> 00:06:13,977 Or we could address that concern by using 138 00:06:13,977 --> 00:06:16,810 what other data structure that's been sort of a marginal enhancement 139 00:06:16,810 --> 00:06:17,894 on an array? 140 00:06:17,894 --> 00:06:18,810 AUDIENCE: Linked list. 141 00:06:18,810 --> 00:06:21,270 DAVID J. MALAN: Like a linked list, wherein we get some dynamism. 142 00:06:21,270 --> 00:06:22,686 But there's a little more expense. 143 00:06:22,686 --> 00:06:24,150 We have pointers to maintain. 
144 00:06:24,150 --> 00:06:25,890 And you've not yet coded this up, but there's definitely 145 00:06:25,890 --> 00:06:28,473 going to be a little more complexity than just using square brackets 146 00:06:28,473 --> 00:06:30,080 and jumping around an array. 147 00:06:30,080 --> 00:06:33,340 >> But an array's running time, if you're searching for a word, 148 00:06:33,340 --> 00:06:34,179 might be log of n. 149 00:06:34,179 --> 00:06:35,970 But again, it might be a little non-trivial 150 00:06:35,970 --> 00:06:38,734 to build up that array not knowing the size in advance. 151 00:06:38,734 --> 00:06:41,150 A linked list though, if you just store a bunch of strings 152 00:06:41,150 --> 00:06:43,300 in a linked list, what's your upper bound 153 00:06:43,300 --> 00:06:46,920 on running time going to be to search for or check a word in that list? 154 00:06:46,920 --> 00:06:47,700 >> AUDIENCE: n. 155 00:06:47,700 --> 00:06:50,575 >> DAVID J. MALAN: Yeah, big O of n or linear because in the worst case, 156 00:06:50,575 --> 00:06:52,640 the word is like a Z word all the way at the end. 157 00:06:52,640 --> 00:06:55,350 And because of a linked list, because those arrows by default, 158 00:06:55,350 --> 00:06:58,280 in a singly linked list, only go from one direction to the other, 159 00:06:58,280 --> 00:06:59,590 you can't jump around. 160 00:06:59,590 --> 00:07:01,160 You have to follow all of them. 161 00:07:01,160 --> 00:07:05,505 >> So we proposed at the end of last week, week five, that there are better ways. 162 00:07:05,505 --> 00:07:08,727 And in fact, the holy grail would really be constant time 163 00:07:08,727 --> 00:07:10,560 whereby when you want to look up a word, you 164 00:07:10,560 --> 00:07:13,370 get an instant answer irrespective of how many words are already 165 00:07:13,370 --> 00:07:14,350 in your dictionary. 166 00:07:14,350 --> 00:07:17,680 >> This is an artist's rendition of what you might call a hash table. 
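The array and linked-list search costs described above can be sketched as follows. This is a minimal illustration in Python rather than the C used in the problem set, and it assumes the array is already sorted; the names `check_sorted_array`, `check_linked_list`, and `Node`, and the four sample words, are made up for this example.

```python
from bisect import bisect_left


def check_sorted_array(words, target):
    """Binary search a sorted list: roughly log2(n) comparisons."""
    i = bisect_left(words, target)
    return i < len(words) and words[i] == target


class Node:
    """One link in a singly linked list: a word plus a reference to the next node."""

    def __init__(self, word, next_node=None):
        self.word = word
        self.next = next_node


def check_linked_list(head, target):
    """Follow the links one by one: up to n comparisons in the worst case."""
    node = head
    while node is not None:
        if node.word == target:
            return True
        node = node.next
    return False


# Hypothetical sample data, already in sorted order.
words = ["apple", "banana", "cherry", "zebra"]

# Build the linked list back to front so it holds the same words in the same order.
head = None
for w in reversed(words):
    head = Node(w, head)

print(check_sorted_array(words, "zebra"))
print(check_linked_list(head, "zebra"))
```

Searching for a word near the end, such as "zebra", is the worst case for the linked list because every earlier node has to be visited first.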
167 00:07:17,680 --> 00:07:21,900 And a hash table is kind of a nice amalgam of an array-- drawn vertically 168 00:07:21,900 --> 00:07:26,416 here, just because-- and then a linked list-- drawn horizontally here. 169 00:07:26,416 --> 00:07:28,790 And the hash table can be implemented in bunches of ways. 170 00:07:28,790 --> 00:07:34,110 This excerpt from a textbook happens to use these people's birth dates 171 00:07:34,110 --> 00:07:38,940 as the means by which it's deciding where to put someone's name. 172 00:07:38,940 --> 00:07:41,230 So this is a dictionary if you will of names. 173 00:07:41,230 --> 00:07:45,240 And in order to expedite putting names into this data structure, 174 00:07:45,240 --> 00:07:49,280 they look at, apparently, these people's birth dates with respect to a month. 175 00:07:49,280 --> 00:07:50,570 >> So it's 1 to 31. 176 00:07:50,570 --> 00:07:52,910 And forget about February and corner cases like that. 177 00:07:52,910 --> 00:07:57,050 And if your birthday is on January 1, or February 1, or December 1, 178 00:07:57,050 --> 00:07:59,890 you're going to end up at the very first chain up top. 179 00:07:59,890 --> 00:08:02,150 If your birth date is like the 25th of a month, 180 00:08:02,150 --> 00:08:04,567 you're going to end up at bucket number 25. 181 00:08:04,567 --> 00:08:07,400 And if there's already someone there in any of those locations, what 182 00:08:07,400 --> 00:08:10,470 you start doing with these linked lists is stitching them together 183 00:08:10,470 --> 00:08:14,320 so that you can have an arbitrary number of people, or anything, 184 00:08:14,320 --> 00:08:15,580 at that location. 185 00:08:15,580 --> 00:08:18,400 >> So you have kind of a mix of constant time for hashing. 
186 00:08:18,400 --> 00:08:21,160 And to hash something means to take as input like a person, 187 00:08:21,160 --> 00:08:25,360 or his or her name, or his or her birth date, and then decide on some output 188 00:08:25,360 --> 00:08:29,780 based on that, like looking at their birthday and outputting one through 31. 189 00:08:29,780 --> 00:08:31,900 >> So then you might have a bit of linear time, 190 00:08:31,900 --> 00:08:34,429 but in reality, and as in the case of problem set five, 191 00:08:34,429 --> 00:08:36,220 we're not going to be worrying in P set five 192 00:08:36,220 --> 00:08:40,059 so much about asymptotic running time, like the theoretical slowness 193 00:08:40,059 --> 00:08:41,809 with which an algorithm might run. 194 00:08:41,809 --> 00:08:44,330 We're going to care about the actual number of seconds 195 00:08:44,330 --> 00:08:47,350 and the actual amount of memory, the actual number of bytes of memory 196 00:08:47,350 --> 00:08:48,140 you're using. 197 00:08:48,140 --> 00:08:52,710 So frankly, having one huge chain of like a million people 198 00:08:52,710 --> 00:08:56,710 is pretty damn slow if you're searching for a name in a list of size million. 199 00:08:56,710 --> 00:08:59,830 >> But what if you divide that list up into 31 parts? 200 00:08:59,830 --> 00:09:04,400 Searching 1/31 of that super long list, in reality, 201 00:09:04,400 --> 00:09:05,741 is certainly going to be faster. 202 00:09:05,741 --> 00:09:07,240 Asymptotically, it's the same thing. 203 00:09:07,240 --> 00:09:08,860 You're just dividing by a constant factor. 204 00:09:08,860 --> 00:09:10,651 And recall that we throw those things away. 205 00:09:10,651 --> 00:09:13,486 But in reality, it's going to be 31 times faster. 206 00:09:13,486 --> 00:09:16,110 And that's what we're going to start to leverage in P set five. 
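The birthday-bucket hash table described above might look roughly like this. It is a sketch under stated assumptions, not the problem set's required design: it is written in Python rather than C, a Python list stands in for each linked-list chain, and the names `HashTable`, `hash_birthday`, and the sample people are hypothetical.

```python
NUM_BUCKETS = 31  # one bucket per possible day of the month


def hash_birthday(day_of_month):
    """Hash function: map a day of the month (1-31) to a bucket index (0-30)."""
    return (day_of_month - 1) % NUM_BUCKETS


class HashTable:
    def __init__(self):
        # An array of 31 chains; each chain starts out empty.
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, name, day_of_month):
        # Hashing picks the bucket in constant time; the name joins that chain.
        self.buckets[hash_birthday(day_of_month)].append(name)

    def check(self, name, day_of_month):
        # Only one chain is searched, not every name in the table.
        return name in self.buckets[hash_birthday(day_of_month)]


table = HashTable()
table.insert("Alice", 1)
table.insert("Bob", 25)
table.insert("Carol", 25)  # collision: chained in the same bucket as Bob

print(table.check("Carol", 25))
print(table.check("Carol", 1))
```

If names were spread evenly, each lookup would scan about 1/31 of the names, which is the constant-factor speedup described above rather than a change in asymptotic running time.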
207 00:09:16,110 --> 00:09:18,750 >> So P set five too also proposes that you consider 208 00:09:18,750 --> 00:09:21,810 a slightly more sophisticated data structure called a trie. 209 00:09:21,810 --> 00:09:24,420 And a trie is just a tree-like data structure. 210 00:09:24,420 --> 00:09:26,672 But instead of having little circles or rectangles 211 00:09:26,672 --> 00:09:28,380 as we keep drawing for nodes, it actually 212 00:09:28,380 --> 00:09:30,840 has entire arrays for its nodes. 213 00:09:30,840 --> 00:09:33,430 And even though this is a bit abstract here to look at, 214 00:09:33,430 --> 00:09:35,450 Zamyla in the P set walk through will walk you 215 00:09:35,450 --> 00:09:37,580 through in more detail on this. 216 00:09:37,580 --> 00:09:39,980 This is a data structure that rather cleverly 217 00:09:39,980 --> 00:09:44,130 might have each node being an array of size 26, A through Z or zero 218 00:09:44,130 --> 00:09:45,320 through 25. 219 00:09:45,320 --> 00:09:49,260 And when you want to insert a person's name into this data structure or find 220 00:09:49,260 --> 00:09:53,990 him or her, what you do, if the name is like Maxwell, M-A-X-W-E-L-L, 221 00:09:53,990 --> 00:09:57,900 you first look at M. And then you jump to the corresponding M location 222 00:09:57,900 --> 00:09:59,100 in the first array. 223 00:09:59,100 --> 00:10:02,400 You then jump to A, the first location in the next array, 224 00:10:02,400 --> 00:10:03,610 following the arrows. 225 00:10:03,610 --> 00:10:08,300 Then X, then W, then E, then L, then L, and then maybe some special end 226 00:10:08,300 --> 00:10:11,850 character, some sentinel that says a word stops here. 227 00:10:11,850 --> 00:10:14,780 >> And what's nice about this-- and keep in mind that the picture here, 228 00:10:14,780 --> 00:10:16,797 notice how edges of every array are cut off. 
229 00:10:16,797 --> 00:10:19,630 That's just because this thing would be massive and horrific to look 230 00:10:19,630 --> 00:10:20,338 at on the screen. 231 00:10:20,338 --> 00:10:21,820 So it's excerpted. 232 00:10:21,820 --> 00:10:25,920 What's nice about this approach is that if there's a million names already 233 00:10:25,920 --> 00:10:30,890 in this data structure, how many steps does it take me to insert Maxwell? 234 00:10:30,890 --> 00:10:36,450 M-A-X-W-E-L-L-- like seven-ish steps to insert or look for Maxwell. 235 00:10:36,450 --> 00:10:39,320 >> Suppose there's a trillion names in this data structure. 236 00:10:39,320 --> 00:10:41,900 How many steps does it take me to look for Maxwell? 237 00:10:41,900 --> 00:10:43,450 M-A-X-- still seven. 238 00:10:43,450 --> 00:10:45,770 >> And therein lies the so-called constant time. 239 00:10:45,770 --> 00:10:47,960 If we assume that words are certainly bounded 240 00:10:47,960 --> 00:10:52,150 by 20 characters, or 46 characters, or some reasonably small integer, 241 00:10:52,150 --> 00:10:53,790 then it's effectively a constant. 242 00:10:53,790 --> 00:10:57,790 And so insertion and searching a trie is super fast. 243 00:10:57,790 --> 00:10:59,540 Of course, we never get anything for free. 244 00:10:59,540 --> 00:11:02,740 And even though you probably haven't dived into P set five yet, 245 00:11:02,740 --> 00:11:06,596 what price are we probably paying to get that greater efficiency time wise? 246 00:11:06,596 --> 00:11:07,470 >> AUDIENCE: Memory. 247 00:11:07,470 --> 00:11:08,390 >> DAVID J. MALAN: Memory, right? 248 00:11:08,390 --> 00:11:10,240 I mean, we've not drawn the whole picture here. 249 00:11:10,240 --> 00:11:12,823 This excerpt from the textbook hasn't drawn all of the arrays. 250 00:11:12,823 --> 00:11:16,687 There's a huge amount of memory and just null pointers that aren't being used. 251 00:11:16,687 --> 00:11:17,520 So it's a trade off. 
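The trie walk described above, one array of 26 slots per node with a marker where a word ends, can be sketched as follows. This is an illustrative Python version, not the problem set's C solution; it assumes words contain only the letters A through Z, and the names `TrieNode`, `insert`, `check`, and `index_of` are made up for this example.

```python
ALPHABET_SIZE = 26  # slots for A through Z, indexed zero through 25


class TrieNode:
    def __init__(self):
        # Mostly empty slots: this is the memory cost of the trie.
        self.children = [None] * ALPHABET_SIZE
        # Plays the role of the sentinel that says "a word stops here".
        self.is_word = False


def index_of(letter):
    """Map a letter to its slot: 'a' or 'A' is 0, 'z' or 'Z' is 25."""
    return ord(letter.lower()) - ord("a")


def insert(root, word):
    """Follow or create one node per letter, then mark the end of the word."""
    node = root
    for letter in word:
        i = index_of(letter)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_word = True


def check(root, word):
    """Take one step per letter, no matter how many words are stored."""
    node = root
    for letter in word:
        node = node.children[index_of(letter)]
        if node is None:
            return False
    return node.is_word


root = TrieNode()
insert(root, "Maxwell")

print(check(root, "maxwell"))
print(check(root, "max"))
```

Looking up "Maxwell" takes seven steps whether the trie holds one word or a trillion, while every node carries 26 slots that are mostly unused, which is the time-for-memory trade-off described above.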
252 00:11:17,520 --> 00:11:21,050 And it'll be left to you in P set five to decide on which way you want to go. 253 00:11:21,050 --> 00:11:24,460 >> Now this idea of hashing, as an aside, is actually super prevalent. 254 00:11:24,460 --> 00:11:27,980 So to hash a value means, quite simply, to take something as input 255 00:11:27,980 --> 00:11:29,220 and produce an output. 256 00:11:29,220 --> 00:11:31,270 So a hash function is just an algorithm. 257 00:11:31,270 --> 00:11:34,990 >> And generally, a hash function's purpose in life is to take something as input 258 00:11:34,990 --> 00:11:39,655 and produce a number as output, like the number one through 31 or A through Z, 259 00:11:39,655 --> 00:11:40,960 zero through 25. 260 00:11:40,960 --> 00:11:44,250 So it takes a complex input and shrinks it down to something 261 00:11:44,250 --> 00:11:46,310 that's a little more useful and manageable. 262 00:11:46,310 --> 00:11:49,250 >> And so it turns out a very popular function 263 00:11:49,250 --> 00:11:51,160 that the security world and the human world's 264 00:11:51,160 --> 00:11:53,060 been using for years is called SHA1. 265 00:11:53,060 --> 00:11:56,260 This is a pretty fancy mathematical formula that does essentially that. 266 00:11:56,260 --> 00:11:58,870 >> You take a really big chunk of zeros and ones-- 267 00:11:58,870 --> 00:12:01,530 that could be a megabyte long, a gigabyte long-- 268 00:12:01,530 --> 00:12:05,930 and it shrinks it down to just a few bits, a few bits, 269 00:12:05,930 --> 00:12:09,082 so that you have a number like one through 31, or A through Z. 270 00:12:09,082 --> 00:12:11,540 But in reality, it's a little bigger than just A through Z. 
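The shrinking behavior described above can be seen with Python's standard `hashlib` module. This is only a demonstration of output size: SHA-1 produces a 160-bit digest (40 hexadecimal characters) regardless of input length. The two inputs here are arbitrary examples chosen for illustration.

```python
import hashlib

short_input = b"a"
long_input = b"a" * 1_000_000  # roughly a megabyte of data

short_digest = hashlib.sha1(short_input).hexdigest()
long_digest = hashlib.sha1(long_input).hexdigest()

# Each hexadecimal character encodes 4 bits.
print(len(short_digest) * 4, "bits for a 1-byte input")
print(len(long_digest) * 4, "bits for a 1,000,000-byte input")
print(short_digest != long_digest)
```

Both digests have the same length even though the inputs differ in size by a factor of a million.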
271 00:12:11,540 --> 00:12:16,640 >> Unfortunately, we're on the cusp of what someone playfully called the SHAppening 272 00:12:16,640 --> 00:12:19,840 whereby the world is about to end in probably a few months 273 00:12:19,840 --> 00:12:22,617 time because researchers, just this past week, 274 00:12:22,617 --> 00:12:25,700 published a report that contrary to what security researchers have thought 275 00:12:25,700 --> 00:12:29,810 for some time, by just spending about, what was it, 276 00:12:29,810 --> 00:12:33,420 I think it was $175,000-- a lot of money, 277 00:12:33,420 --> 00:12:36,540 but not beyond the reach of particularly bad bad guys, 278 00:12:36,540 --> 00:12:41,560 or particularly bad countries-- $175,000 could buy you a lot of rented server 279 00:12:41,560 --> 00:12:42,690 space in the cloud. 280 00:12:42,690 --> 00:12:44,619 And we'll come back to the cloud before long. 281 00:12:44,619 --> 00:12:47,410 But it just means renting server space on like Microsoft's servers, 282 00:12:47,410 --> 00:12:50,285 or Google's, or Amazon's, or the like where you can pay by the minute 283 00:12:50,285 --> 00:12:51,670 to use someone else's computers. 284 00:12:51,670 --> 00:12:54,250 >> And it turns out if you can pay someone else to borrow their computers 285 00:12:54,250 --> 00:12:56,730 and run code that you've written on it and use pretty fancy 286 00:12:56,730 --> 00:13:01,580 mathematics, you can essentially figure out how someone's hash function is 287 00:13:01,580 --> 00:13:05,320 working, and given its output, reverse engineer what its input is. 288 00:13:05,320 --> 00:13:08,590 And for today's purposes, suffice it to say, this is bad. 289 00:13:08,590 --> 00:13:12,540 Because SHA1 and hash functions like it are super commonly 290 00:13:12,540 --> 00:13:17,050 used in security applications, encrypted connections on the web, 291 00:13:17,050 --> 00:13:21,890 bank transactions, cellular encryption for your cell phones, and the like. 
292 00:13:21,890 --> 00:13:24,880 And so any time someone finds a way to reverse 293 00:13:24,880 --> 00:13:28,510 engineer one of these technologies or break it, bad things can happen. 294 00:13:28,510 --> 00:13:30,300 >> Now the world already knew this. 295 00:13:30,300 --> 00:13:31,310 This was foreseeable. 296 00:13:31,310 --> 00:13:34,670 And the world has since moved from SHA1 to SHA256, 297 00:13:34,670 --> 00:13:37,320 which is just a fancy way of saying they use bigger bits. 298 00:13:37,320 --> 00:13:40,570 And in fact, even CS50's own website upgraded last year to-- 299 00:13:40,570 --> 00:13:43,290 not that we face all this many threats trying to get at the PDFs 300 00:13:43,290 --> 00:13:46,520 and whatnot-- but CS50's website uses the bigger hash function, 301 00:13:46,520 --> 00:13:47,980 which means that we will be safe. 302 00:13:47,980 --> 00:13:50,020 So all of your PDFs will be safe, but not 303 00:13:50,020 --> 00:13:52,880 necessarily your money or anything particularly private or personal 304 00:13:52,880 --> 00:13:53,380 to you. 305 00:13:53,380 --> 00:13:56,550 So check out that URL if you'd like some additional details. 306 00:13:56,550 --> 00:13:59,840 >> So problem set five is indeed on the horizon. 307 00:13:59,840 --> 00:14:01,560 Quiz one is this coming Wednesday. 308 00:14:01,560 --> 00:14:04,479 But do take advantage of office hours, both tonight and tomorrow. 309 00:14:04,479 --> 00:14:07,770 And also take advantage of office hours, if you're available, right after this. 310 00:14:07,770 --> 00:14:11,550 The staff and I'll stick around and do more casual Q&A in addition to tonight. 311 00:14:11,550 --> 00:14:16,610 And let me strongly note here, for those of us here in New Haven-- 312 00:14:16,610 --> 00:14:19,360 so it's absolutely per Scaz's remarks felt, I'm sure, 313 00:14:19,360 --> 00:14:21,140 like a bit of an uphill struggle. 
314 00:14:21,140 --> 00:14:24,627 And by reputation, if you haven't learned already or heard 315 00:14:24,627 --> 00:14:27,710 from some friends at Harvard, now here is some new institutional memory. 316 00:14:27,710 --> 00:14:30,790 P set five kind of sort of tends to be the hardest in CS50, 317 00:14:30,790 --> 00:14:32,590 or the most challenging for most students. 318 00:14:32,590 --> 00:14:37,180 >> But what that means is that we're almost at the top of this hill. 319 00:14:37,180 --> 00:14:38,270 And I really do mean this. 320 00:14:38,270 --> 00:14:40,728 It's the most challenging, but it's also the most rewarding 321 00:14:40,728 --> 00:14:43,560 in that unlike most every other introductory computer science 322 00:14:43,560 --> 00:14:47,980 course in the US that we know of, most students do not finish an intro 323 00:14:47,980 --> 00:14:51,400 course having already implemented things like trees, and tries, 324 00:14:51,400 --> 00:14:52,880 and hash tables, and the like. 325 00:14:52,880 --> 00:14:54,770 >> And so I do hope, and we do hope that you'll 326 00:14:54,770 --> 00:14:57,280 have an enormous sense of satisfaction even 327 00:14:57,280 --> 00:15:00,760 if the week or two via which you get to that satisfaction 328 00:15:00,760 --> 00:15:02,490 does feel a little bit like this. 329 00:15:02,490 --> 00:15:05,250 But let me reassure, we only have four P sets left. 330 00:15:05,250 --> 00:15:07,380 So sort of that top is in sight. 331 00:15:07,380 --> 00:15:12,370 >> On the other side of it, trust us, it's just rolling hills and clouds. 332 00:15:12,370 --> 00:15:16,000 And shall we say, puppies are on the other side. 333 00:15:16,000 --> 00:15:18,340 So you just have to hang in there a little longer. 334 00:15:18,340 --> 00:15:22,050 I mean, indeed as we start to transition into the world of web programming, 335 00:15:22,050 --> 00:15:26,060 you'll find that things become-- this is adorable actually. 
336 00:15:26,060 --> 00:15:29,680 337 00:15:29,680 --> 00:15:33,490 OK, we'll post this URL later. 338 00:15:33,490 --> 00:15:36,140 You'll find too that we're reaching sort of a plateau 339 00:15:36,140 --> 00:15:38,150 where everything is indeed still sophisticated 340 00:15:38,150 --> 00:15:40,170 and challenging by design, but you're not 341 00:15:40,170 --> 00:15:42,590 going to feel like we are perpetually going up this hill. 342 00:15:42,590 --> 00:15:44,390 So take some comfort in that. 343 00:15:44,390 --> 00:15:47,409 >> So without further ado, let's start to make this marked transition 344 00:15:47,409 --> 00:15:49,950 in the semester to the world of the web, and really the world 345 00:15:49,950 --> 00:15:51,420 with which all of us are more familiar. 346 00:15:51,420 --> 00:15:53,753 We've got internet devices in our pockets, on our desks, 347 00:15:53,753 --> 00:15:55,127 in our backpacks, and the like. 348 00:15:55,127 --> 00:15:56,210 How does all of this work? 349 00:15:56,210 --> 00:15:59,077 And how can we start writing code that's not super arcane 350 00:15:59,077 --> 00:16:01,910 and in some blinking text prompt that none of your friends or family 351 00:16:01,910 --> 00:16:04,659 are ever going to want to interact with, but something you can put 352 00:16:04,659 --> 00:16:06,660 on their phones, or on their web browsers, 353 00:16:06,660 --> 00:16:09,010 or on any devices with which they interact. 354 00:16:09,010 --> 00:16:10,430 >> So here is someone's home. 355 00:16:10,430 --> 00:16:14,040 And inside of this home is a couple of laptops, a couple of old school desktop 356 00:16:14,040 --> 00:16:17,470 computers, something called a router or hub in the middle, 357 00:16:17,470 --> 00:16:20,310 and then some kind of cable modem or DSL modem. 358 00:16:20,310 --> 00:16:24,560 And then there's the internet, generally drawn as a cloud up there in the sky. 
359 00:16:24,560 --> 00:16:29,230 >> So this picture, though a little sort of dated, 360 00:16:29,230 --> 00:16:32,604 certainly captures what most of you probably have in your homes, 361 00:16:32,604 --> 00:16:35,520 or effectively what all of you have in your dorm rooms, or apartments, 362 00:16:35,520 --> 00:16:36,480 or the like. 363 00:16:36,480 --> 00:16:40,010 >> So what is actually going on when you try to use the internet today? 364 00:16:40,010 --> 00:16:42,010 So every computer on the internet, it turns out, 365 00:16:42,010 --> 00:16:46,860 needs to have a unique address, much like we in the real world need 366 00:16:46,860 --> 00:16:52,050 a postal address, like 51 Prospect Street, New Haven, Connecticut, or 33 367 00:16:52,050 --> 00:16:54,170 Oxford Street, Cambridge, Massachusetts. 368 00:16:54,170 --> 00:16:58,520 So do computers on the internet need a way of uniquely addressing themselves. 369 00:16:58,520 --> 00:17:01,180 >> That is so that when one computer wants to talk to another, 370 00:17:01,180 --> 00:17:04,525 it can send a message and inform the recipient to whom 371 00:17:04,525 --> 00:17:05,900 it should send the response back. 372 00:17:05,900 --> 00:17:07,900 So it just makes sort of intuitive sense perhaps 373 00:17:07,900 --> 00:17:09,980 that everything have an address of some sort. 374 00:17:09,980 --> 00:17:11,240 >> But how do you get an address? 375 00:17:11,240 --> 00:17:13,589 Well, if you get here on campus, or you go home 376 00:17:13,589 --> 00:17:15,760 and you turn on your laptop or desktop computer, 377 00:17:15,760 --> 00:17:17,770 and either plug it in or connect to Wi-Fi, 378 00:17:17,770 --> 00:17:19,569 it turns out that there's a special server 379 00:17:19,569 --> 00:17:22,089 on most networks called a DHCP server. 
380 00:17:22,089 --> 00:17:23,880 Doesn't really matter what this stands for, 381 00:17:23,880 --> 00:17:26,660 but it's dynamic host configuration protocol, which is just 382 00:17:26,660 --> 00:17:29,760 a fancy way of saying, this is a computer that either Yale has, 383 00:17:29,760 --> 00:17:32,600 or Harvard has, or Comcast has, or Verizon has, 384 00:17:32,600 --> 00:17:35,100 or your company has, whose purpose in life, 385 00:17:35,100 --> 00:17:38,810 when it hears someone newly added to the network, is to say here, 386 00:17:38,810 --> 00:17:40,010 use this address. 387 00:17:40,010 --> 00:17:42,790 >> So we humans don't have to hard code into our computers 388 00:17:42,790 --> 00:17:44,040 what our unique address is. 389 00:17:44,040 --> 00:17:47,070 We just turn it on, open the lid, and somehow this server 390 00:17:47,070 --> 00:17:52,210 on the local network just tells me that my address is 51 Prospect Street, or 33 391 00:17:52,210 --> 00:17:53,940 Oxford Street, or the like. 392 00:17:53,940 --> 00:17:56,000 >> Now it's not going to be so verbose as that. 393 00:17:56,000 --> 00:18:00,210 Rather what I'm going to get is a numeric address called an IP address. 394 00:18:00,210 --> 00:18:01,960 IP meaning internet protocol. 395 00:18:01,960 --> 00:18:06,025 And odds are by this time in your life, you probably heard or seen the word IP, 396 00:18:06,025 --> 00:18:08,140 or generally thrown it around perhaps. 397 00:18:08,140 --> 00:18:10,720 But in fact, it's pretty straight forward a thing. 398 00:18:10,720 --> 00:18:13,610 >> An IP address is just a dotted decimal number, 399 00:18:13,610 --> 00:18:17,150 which means it's something dot something dot something dot something. 400 00:18:17,150 --> 00:18:21,980 And each of those somethings happens to be a number between 0 and 255. 
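As a rough illustration of the dotted-decimal format just described, here is a minimal Python sketch that checks whether a string has the shape of an IPv4 address: four numbers, each between 0 and 255, separated by dots. The function name `is_valid_ipv4` and the sample addresses are made up for this example and do not come from the lecture.

```python
def is_valid_ipv4(text):
    """Return True if text is four decimal numbers, each 0-255, joined by dots."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each "something" must be plain ASCII digits and fit in the range 0..255.
        if not (part.isascii() and part.isdigit()):
            return False
        if int(part) > 255:
            return False
    return True


# Sample addresses are hypothetical (192.0.2.x is reserved for documentation).
print(is_valid_ipv4("192.0.2.1"))    # True
print(is_valid_ipv4("192.0.2.256"))  # False, because 256 is above 255
```

This only checks the shape of the text. It says nothing about whether a computer actually answers at that address.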
401 00:18:21,980 --> 00:18:26,710 >> So based on five plus weeks of CS50, if these numbers each range from 0 to 255, 402 00:18:26,710 --> 00:18:28,713 how many bits is each of those number signs? 403 00:18:28,713 --> 00:18:29,420 >> AUDIENCE: Eight. 404 00:18:29,420 --> 00:18:30,100 >> DAVID J. MALAN: It's got to be eight. 405 00:18:30,100 --> 00:18:31,933 So in total, how many bits is an IP address? 406 00:18:31,933 --> 00:18:32,710 AUDIENCE: 32. 407 00:18:32,710 --> 00:18:33,820 >> DAVID J. MALAN: So 32. 408 00:18:33,820 --> 00:18:35,830 8 plus 8 plus 8 plus 8 is 32. 409 00:18:35,830 --> 00:18:38,767 How many total IP addresses can there be in the world? 410 00:18:38,767 --> 00:18:39,600 AUDIENCE: 4 billion. 411 00:18:39,600 --> 00:18:42,410 DAVID J. MALAN: So roughly four billion because that's 2 to the 32nd power. 412 00:18:42,410 --> 00:18:44,410 And if you can't sort of grok that in your mind, 413 00:18:44,410 --> 00:18:47,470 just know that 32-bit values can be as big as 4 billion 414 00:18:47,470 --> 00:18:49,140 if it's all positive values. 415 00:18:49,140 --> 00:18:52,500 So that means there's 4 billion possible IP addresses in the world. 416 00:18:52,500 --> 00:18:55,090 >> And funny story, we're kind of running out of them. 417 00:18:55,090 --> 00:18:59,720 And in fact it's a huge problem in that the world also saw this problem coming, 418 00:18:59,720 --> 00:19:03,639 but hasn't necessarily responded to it in the most rapid way possible. 419 00:19:03,639 --> 00:19:05,680 And indeed, once you've finished CS50 and started 420 00:19:05,680 --> 00:19:06,950 paying attention in the tech world, you'll 421 00:19:06,950 --> 00:19:08,520 see this is very commonly thematic. 422 00:19:08,520 --> 00:19:12,260 >> For instance, if we go really old school nowadays, Y2K. 423 00:19:12,260 --> 00:19:13,570 That wasn't really a surprise.
424 00:19:13,570 --> 00:19:15,700 Like everyone knew for 1,000 years that that 425 00:19:15,700 --> 00:19:20,250 was-- more than a thousand years-- that that was eventually going to happen. 426 00:19:20,250 --> 00:19:23,295 And yet, we responded to it very much at the last minute. 427 00:19:23,295 --> 00:19:24,420 And that's happening again. 428 00:19:24,420 --> 00:19:26,740 So today we'll talk about IP version 4. 429 00:19:26,740 --> 00:19:29,250 But know that the world is finally getting 430 00:19:29,250 --> 00:19:31,590 around to upgrading to something called IPv6, 431 00:19:31,590 --> 00:19:34,540 which instead of 32-bit addresses, uses-- anyone 432 00:19:34,540 --> 00:19:36,443 want to take a guess, how many bits? 433 00:19:36,443 --> 00:19:37,310 >> AUDIENCE: 64? 434 00:19:37,310 --> 00:19:38,380 >> DAVID J. MALAN: Good guess, but no. 435 00:19:38,380 --> 00:19:39,975 We're finally trying to get ahead of the curve. 436 00:19:39,975 --> 00:19:40,558 >> AUDIENCE: 128. 437 00:19:40,558 --> 00:19:43,490 DAVID J. MALAN: 128, which is a freaking huge number 438 00:19:43,490 --> 00:19:46,250 of IP addresses, because that's like times 2, times 2, 439 00:19:46,250 --> 00:19:49,620 times 2, a lot of times twos up from 4 billion. 440 00:19:49,620 --> 00:19:50,530 >> So if curious. 441 00:19:50,530 --> 00:19:54,110 It turns out-- and I just googled this to find this out-- Yale computers, 442 00:19:54,110 --> 00:19:58,910 here at Yale, tend to start with these numbers-- 130.132 dot something, 443 00:19:58,910 --> 00:20:00,669 and 128.36 dot something. 444 00:20:00,669 --> 00:20:02,710 But there's certainly exceptions across the board 445 00:20:02,710 --> 00:20:05,334 depending on what department and building and campus you're on. 446 00:20:05,334 --> 00:20:09,310 Harvard tends to have 140.247, or 128.103. 447 00:20:09,310 --> 00:20:11,530 And generally this is useless information, 448 00:20:11,530 --> 00:20:13,260 but it's something you might notice now. 
449 00:20:13,260 --> 00:20:15,593 When you start poking around settings on your computers, 450 00:20:15,593 --> 00:20:18,540 you might start to notice these kinds of patterns before long. 451 00:20:18,540 --> 00:20:23,470 >> But when you're at home and have an Apple AirPort, or a Linksys device, 452 00:20:23,470 --> 00:20:26,560 or a D-Link, or whatever it is your parents or siblings installed 453 00:20:26,560 --> 00:20:28,890 in your house, well what you probably have 454 00:20:28,890 --> 00:20:30,800 is what's called a private IP address. 455 00:20:30,800 --> 00:20:34,850 And these were actually a nice, temporary solution 456 00:20:34,850 --> 00:20:38,050 to the problem of running short on IP addresses. 457 00:20:38,050 --> 00:20:40,382 >> And what you can do with home networks, typically-- 458 00:20:40,382 --> 00:20:42,340 and frankly, even Yale and Harvard are starting 459 00:20:42,340 --> 00:20:46,840 to do this in different areas-- is you can give a whole bunch of computers 460 00:20:46,840 --> 00:20:50,360 one IP address so long as you put a special device in front of them, 461 00:20:50,360 --> 00:20:52,410 something called a router, or it can be called 462 00:20:52,410 --> 00:20:54,060 a proxy or any number of other things. 463 00:20:54,060 --> 00:20:56,710 But a certain device that has that one IP address. 464 00:20:56,710 --> 00:20:59,450 And then behind that device, within a building, 465 00:20:59,450 --> 00:21:03,030 within a house or an apartment, can be any number of computers, all of which 466 00:21:03,030 --> 00:21:06,460 have an IP address that start with one of these digits here. 467 00:21:06,460 --> 00:21:08,590 And so long as that computer knows how to convert 468 00:21:08,590 --> 00:21:10,900 the public address to the private address, 469 00:21:10,900 --> 00:21:13,340 everything can sort of work as expected. 
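The specific prefixes shown on the slide are not in this transcript, but the address ranges reserved for private use are 10.x.x.x, 172.16.x.x through 172.31.x.x, and 192.168.x.x. As a small sketch, Python's standard `ipaddress` module can report whether an address is in a private range (its `is_private` check covers those three ranges among others). The helper name `is_private` and the sample addresses below are chosen only for illustration.

```python
import ipaddress


def is_private(address):
    """Return True if the address is in a range reserved for private use."""
    return ipaddress.ip_address(address).is_private


# Typical home-network style addresses (hypothetical examples).
print(is_private("192.168.1.10"))  # True
print(is_private("10.0.0.5"))      # True
# A well-known public address (Google's public DNS server).
print(is_private("8.8.8.8"))       # False
```

Many homes can reuse the same private addresses at the same time, because only the router's single public address is visible to the rest of the internet.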
470 00:21:13,340 --> 00:21:17,290 >> But the converse of this is that if you're at home and you have a sibling, 471 00:21:17,290 --> 00:21:19,650 and both of you are visiting some website, 472 00:21:19,650 --> 00:21:23,520 that website does not know if it's you or your sibling visiting the website, 473 00:21:23,520 --> 00:21:26,780 because you appear to be the same person because all of your data 474 00:21:26,780 --> 00:21:30,050 is going through that router or that central point. 475 00:21:30,050 --> 00:21:32,220 >> But enough on these lower level details. 476 00:21:32,220 --> 00:21:37,770 Let's take a look at how IP addresses sometimes come up perhaps in the media 477 00:21:37,770 --> 00:21:41,370 and how we can now start to ruin, frankly, even more shows for you. 478 00:21:41,370 --> 00:21:45,010 If we could dim the lights for a few seconds. 479 00:21:45,010 --> 00:21:46,550 >> [VIDEO PLAYBACK] 480 00:21:46,550 --> 00:21:49,310 >> -It's a 32-bit IPv4 address. 481 00:21:49,310 --> 00:21:50,650 >> -IP as in internet-- 482 00:21:50,650 --> 00:21:52,800 >> -Private network, Tamia's private network. 483 00:21:52,800 --> 00:22:05,670 484 00:22:05,670 --> 00:22:08,145 She's so amazing. 485 00:22:08,145 --> 00:22:09,630 -Come on Charlie. 486 00:22:09,630 --> 00:22:11,750 -It's a mirror IP address. 487 00:22:11,750 --> 00:22:15,093 She's letting us watch what she's doing in real time. 488 00:22:15,093 --> 00:22:16,285 >> [END PLAYBACK] 489 00:22:16,285 --> 00:22:17,130 >> DAVID J. MALAN: OK. 490 00:22:17,130 --> 00:22:18,480 So a few problems with this. 491 00:22:18,480 --> 00:22:22,740 So one, what we're looking at here on the screen 492 00:22:22,740 --> 00:22:25,290 is code written in a language called Objective-C, 493 00:22:25,290 --> 00:22:28,020 which is kind of a successor to the C language that we're doing. 494 00:22:28,020 --> 00:22:30,150 This has absolutely nothing to do with programming.
495 00:22:30,150 --> 00:22:32,399 In fact, as best I can tell, this is a drawing program 496 00:22:32,399 --> 00:22:38,360 that someone downloaded from the internet somehow involving crayons. 497 00:22:38,360 --> 00:22:43,660 >> Perhaps less egregious is that this IP address, valid or invalid? 498 00:22:43,660 --> 00:22:44,520 >> AUDIENCE: Invalid. 499 00:22:44,520 --> 00:22:48,479 >> DAVID J. MALAN: Invalid, because 275 is, of course, not between 0 and 255. 500 00:22:48,479 --> 00:22:51,770 That too is probably OK though, because you don't want a bunch of crazy people 501 00:22:51,770 --> 00:22:54,790 who are like pausing TV on their TiVos and then visiting the IP to see 502 00:22:54,790 --> 00:22:56,290 if there's actually something there. 503 00:22:56,290 --> 00:22:58,200 So that one's a little less egregious. 504 00:22:58,200 --> 00:23:00,990 But realize that too is sort of all around us. 505 00:23:00,990 --> 00:23:04,980 >> So of course, none of us ever really type numeric addresses 506 00:23:04,980 --> 00:23:06,410 into our browsers. 507 00:23:06,410 --> 00:23:09,580 It would be kind of a bad thing if Google, to visit Google, 508 00:23:09,580 --> 00:23:15,060 you had to go to 123.46.57.89. 509 00:23:15,060 --> 00:23:17,007 And the whole world had to just remember that. 510 00:23:17,007 --> 00:23:19,090 And frankly, we've kind of seen this issue before. 511 00:23:19,090 --> 00:23:24,130 Back in the day when people didn't have cell phones and contact lists, 512 00:23:24,130 --> 00:23:27,640 and companies actually still-- actually, I guess companies still have 800 513 00:23:27,640 --> 00:23:30,600 numbers and the like-- but you generally see numbers advertised 514 00:23:30,600 --> 00:23:36,480 as 1-800-COLLECT, C-O-L-L-E-C-T.
Because no one can really remember, 515 00:23:36,480 --> 00:23:39,754 when seeing an advertisement on a bus or billboard, what someone's number is, 516 00:23:39,754 --> 00:23:42,420 but they can probably, with higher probability, remember a word. 517 00:23:42,420 --> 00:23:44,530 >> So we adopted the same kind of system in the world 518 00:23:44,530 --> 00:23:48,290 of the internet whereby there's a domain name system so that we humans can 519 00:23:48,290 --> 00:23:52,340 type google.com, facebook.com, yale.edu, harvard.edu, 520 00:23:52,340 --> 00:23:56,140 and let the computers figure out what the corresponding IP 521 00:23:56,140 --> 00:23:58,480 address is for a given name. 522 00:23:58,480 --> 00:24:01,620 >> And the way you do this in the real world is that for $10 a year, 523 00:24:01,620 --> 00:24:05,900 maybe $50 a year, you can buy a domain name, or really rent a domain name. 524 00:24:05,900 --> 00:24:09,860 And then whoever you're paying to rent that domain name, 525 00:24:09,860 --> 00:24:14,069 you tell them who in the world knows what your IP address is. 526 00:24:14,069 --> 00:24:16,360 And we won't go into these particulars, but many of you 527 00:24:16,360 --> 00:24:18,750 might want, for final projects, to actually sign up 528 00:24:18,750 --> 00:24:22,120 for your own web hosting company, either for free 529 00:24:22,120 --> 00:24:23,500 or for a few dollars per month. 530 00:24:23,500 --> 00:24:26,100 Some of you might want to buy, for a few dollars, your own domain name, 531 00:24:26,100 --> 00:24:28,900 just for fun or to start a business or a personal site or the like. 532 00:24:28,900 --> 00:24:30,880 >> And realize that all of that will ultimately 533 00:24:30,880 --> 00:24:35,260 boil down to you telling the world what your server's IP address is. 534 00:24:35,260 --> 00:24:37,740 And then these DNS servers actually take care 535 00:24:37,740 --> 00:24:39,910 of informing the rest of the world. 
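The job of turning a name into its corresponding IP address can be pictured as a simple lookup table. The following is only a toy sketch: the names, the addresses (taken from a range reserved for documentation), and the `lookup` function are all hypothetical, and a real DNS server is considerably more involved.

```python
# Hypothetical name-to-address table. The addresses come from 192.0.2.0/24,
# a range reserved for documentation, so they do not point at real servers.
DNS_TABLE = {
    "example.com": ["192.0.2.10"],
    "example.org": ["192.0.2.20", "192.0.2.21"],  # one name, two addresses
}


def lookup(name):
    """Return the list of addresses recorded for a name, or [] if unknown."""
    # Domain names are case-insensitive, so normalize before looking up.
    return DNS_TABLE.get(name.lower(), [])


print(lookup("example.com"))  # ['192.0.2.10']
print(lookup("EXAMPLE.ORG"))  # ['192.0.2.20', '192.0.2.21']
print(lookup("unknown.test"))  # []
```

For a real lookup from Python, the standard library's `socket.gethostbyname` asks the operating system's resolver to do this translation, though it needs a working network connection.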
536 00:24:39,910 --> 00:24:41,830 So all a DNS server has, in short, inside 537 00:24:41,830 --> 00:24:44,780 of its memory is like the equivalent of a Google spreadsheet 538 00:24:44,780 --> 00:24:49,420 or an Excel spreadsheet with at least two columns, one of which has names, 539 00:24:49,420 --> 00:24:52,100 like harvard.edu, and yale.edu, and google.com. 540 00:24:52,100 --> 00:24:55,870 And the other column has the corresponding IP address or IP 541 00:24:55,870 --> 00:24:56,382 addresses. 542 00:24:56,382 --> 00:24:57,590 And we can actually see this. 543 00:24:57,590 --> 00:24:59,881 So on my Mac-- and you can do this on Windows computers 544 00:24:59,881 --> 00:25:04,330 as well-- if I open up a terminal window here, quite like the one in CS50 IDE, 545 00:25:04,330 --> 00:25:08,190 most computers have a command called nslookup, name server lookup. 546 00:25:08,190 --> 00:25:12,380 And if I type something in like yale.edu and hit Enter, what 547 00:25:12,380 --> 00:25:19,250 I should see if my network cooperates as it did for multiple tests before class 548 00:25:19,250 --> 00:25:24,584 began-- let's try google.com. 549 00:25:24,584 --> 00:25:26,390 Of course now nothing's working. 550 00:25:26,390 --> 00:25:27,190 That's great. 551 00:25:27,190 --> 00:25:30,660 All right, stand by for one moment. 552 00:25:30,660 --> 00:25:32,100 nslookup google.com. 553 00:25:32,100 --> 00:25:35,570 554 00:25:35,570 --> 00:25:38,660 >> Well, let's see if the actual internet-- no. 555 00:25:38,660 --> 00:25:39,700 That's what happened. 556 00:25:39,700 --> 00:25:43,090 Oh my god, all right. 557 00:25:43,090 --> 00:25:45,490 The Wi-Fi broke. 558 00:25:45,490 --> 00:25:47,410 >> Hey, want to know what my IP address is? 559 00:25:47,410 --> 00:25:49,722 All right. 560 00:25:49,722 --> 00:25:50,820 YaleSecure. 561 00:25:50,820 --> 00:25:53,330 This is how you troubleshoot things as a computer scientist. 562 00:25:53,330 --> 00:25:54,617 We turn the Wi-Fi off.
563 00:25:54,617 --> 00:25:57,480 564 00:25:57,480 --> 00:25:59,450 OK. 565 00:25:59,450 --> 00:26:02,610 >> And actually, Scaz, do you mind logging us into the secure one? 566 00:26:02,610 --> 00:26:06,246 Otherwise more tests are-- OK, thank you Yale-- or is about to break. 567 00:26:06,246 --> 00:26:07,370 I want to go on YaleSecure. 568 00:26:07,370 --> 00:26:09,880 569 00:26:09,880 --> 00:26:11,870 Oh, and maybe we'll be OK. 570 00:26:11,870 --> 00:26:12,686 Maybe we're back. 571 00:26:12,686 --> 00:26:16,810 572 00:26:16,810 --> 00:26:19,733 And that's how, as a computer scientist, you fix a computer. 573 00:26:19,733 --> 00:26:22,000 [APPLAUSE] 574 00:26:22,000 --> 00:26:23,220 All right. 575 00:26:23,220 --> 00:26:27,160 So where I was within this so-called terminal window, 576 00:26:27,160 --> 00:26:32,270 and if I do nslookup yale.edu, there we go. 577 00:26:32,270 --> 00:26:38,350 So I get back first the IP address of the DNS server that my laptop is using. 578 00:26:38,350 --> 00:26:41,610 So in addition to a DHCP server that we talked about a moment ago telling 579 00:26:41,610 --> 00:26:44,720 my laptop what my IP address is, that DHCP server 580 00:26:44,720 --> 00:26:46,860 also tells me what DNS server to use. 581 00:26:46,860 --> 00:26:48,860 Otherwise I would have to manually type this in. 582 00:26:48,860 --> 00:26:50,359 >> But that's not all that interesting. 583 00:26:50,359 --> 00:26:54,310 What I care about is that this is the IP address of Yale's website apparently. 584 00:26:54,310 --> 00:26:55,470 So in fact, let's try this. 585 00:26:55,470 --> 00:27:02,710 Let me go up into a browser and go to http://, and then that IP address, 586 00:27:02,710 --> 00:27:04,220 and hit Enter. 587 00:27:04,220 --> 00:27:06,700 And let us see. 588 00:27:06,700 --> 00:27:09,587 That is how else you can visit Yale's websites. 589 00:27:09,587 --> 00:27:10,920 Now it's not all that memorable.
590 00:27:10,920 --> 00:27:12,220 Like, the pre-frosh probably aren't going 591 00:27:12,220 --> 00:27:15,310 to remember this particular address if told to visit there after visiting. 592 00:27:15,310 --> 00:27:16,580 But it does seem to work. 593 00:27:16,580 --> 00:27:21,179 And so DNS really just allows us to have much more human friendly addresses. 594 00:27:21,179 --> 00:27:23,220 But they don't necessarily just yield one answer. 595 00:27:23,220 --> 00:27:25,640 >> In fact, when you're a really big tech company, 596 00:27:25,640 --> 00:27:27,620 you probably want to have lots of servers. 597 00:27:27,620 --> 00:27:29,027 And even this is misleading. 598 00:27:29,027 --> 00:27:31,110 So Yale probably doesn't have just one web server. 599 00:27:31,110 --> 00:27:34,150 Google probably doesn't have just 10 or so web servers. 600 00:27:34,150 --> 00:27:36,960 Google especially probably has thousands of web servers 601 00:27:36,960 --> 00:27:40,030 around the world that can respond to requests from people like us. 602 00:27:40,030 --> 00:27:43,870 >> But they also use a technology called load balancing, which long story short, 603 00:27:43,870 --> 00:27:48,810 has just a few devices in the world spreading the load across more servers. 604 00:27:48,810 --> 00:27:52,320 So it's kind of like a spider web if you will dispatching the requests. 605 00:27:52,320 --> 00:27:54,380 But for now, all that's interesting for today 606 00:27:54,380 --> 00:27:56,870 is that a domain name like google.com even can 607 00:27:56,870 --> 00:28:00,100 have multiple IP addresses like that. 608 00:28:00,100 --> 00:28:04,610 >> But how does all of our data actually get back and forth then in the end? 609 00:28:04,610 --> 00:28:08,320 Well, it turns out that there's these things called routers on the internet. 610 00:28:08,320 --> 00:28:10,980 And what is a router to the extent that you know already? 
611 00:28:10,980 --> 00:28:13,730 And I've used the word a couple times in the context of a home, 612 00:28:13,730 --> 00:28:17,155 but in simple terms, what does a router do? 613 00:28:17,155 --> 00:28:18,780 Give me just a guess based on its name? 614 00:28:18,780 --> 00:28:20,082 >> AUDIENCE: So a road or a path? 615 00:28:20,082 --> 00:28:21,790 DAVID J. MALAN: So it's a road or a path. 616 00:28:21,790 --> 00:28:23,980 So a route is a road or path, absolutely. 617 00:28:23,980 --> 00:28:27,000 And a router, so a device that actually routes information, 618 00:28:27,000 --> 00:28:29,690 would move data between points A and B. 619 00:28:29,690 --> 00:28:31,920 >> And so in fact-- and this is perhaps when 620 00:28:31,920 --> 00:28:34,510 you Google depictions of routers around the world, all you get 621 00:28:34,510 --> 00:28:35,900 are cheesy marketing diagrams. 622 00:28:35,900 --> 00:28:38,550 And so this is sort of the most representative one I could find 623 00:28:38,550 --> 00:28:39,841 that looked mildly interesting. 624 00:28:39,841 --> 00:28:44,170 Each of these dots or glimmers of hope around the world represents a router. 625 00:28:44,170 --> 00:28:47,210 And each of them has a line between some other router. 626 00:28:47,210 --> 00:28:49,090 >> Because indeed, there are thousands, probably 627 00:28:49,090 --> 00:28:52,560 millions of routers around the world, some of which are in our homes 628 00:28:52,560 --> 00:28:56,070 and on our campuses, but a lot of which are owned by big companies 629 00:28:56,070 --> 00:29:00,250 and are interconnected so that if I want to send some data from here at Yale 630 00:29:00,250 --> 00:29:04,430 back home to Cambridge, Yale probably doesn't have a single cable, certainly, 631 00:29:04,430 --> 00:29:05,650 going directly to Harvard.
632 00:29:05,650 --> 00:29:07,399 And Yale doesn't have a single cable going 633 00:29:07,399 --> 00:29:10,010 to MIT, or to Stanford, or to Berkeley, or to Google, 634 00:29:10,010 --> 00:29:11,820 or any number of destinations. 635 00:29:11,820 --> 00:29:14,760 >> Rather, Yale, and Harvard, and everyone else on the internet 636 00:29:14,760 --> 00:29:17,610 does have one or more routers connected to it, maybe 637 00:29:17,610 --> 00:29:18,810 on the periphery of campus. 638 00:29:18,810 --> 00:29:21,690 So that when my data wants to leave Yale's campus, 639 00:29:21,690 --> 00:29:24,770 it goes to that nearest router, as depicted by one of these dots. 640 00:29:24,770 --> 00:29:27,940 And then that router figures out whether to send it this way, or this way, 641 00:29:27,940 --> 00:29:33,440 or this way, or this way based on another table in its memory, 642 00:29:33,440 --> 00:29:36,870 another Excel file or Google spreadsheet that in one column 643 00:29:36,870 --> 00:29:41,315 says, if your IP address starts with the number one, go this way. 644 00:29:41,315 --> 00:29:43,690 If your IP address starts with a number two, go that way. 645 00:29:43,690 --> 00:29:47,040 And so you can break it down numerically to have the router sending 646 00:29:47,040 --> 00:29:49,040 data every which way. 647 00:29:49,040 --> 00:29:51,419 >> And we can kind of see this as well. 648 00:29:51,419 --> 00:29:54,210 Let's go ahead into this terminal window again, and let me go ahead 649 00:29:54,210 --> 00:30:00,700 and trace the route to, let's say, www.mit.edu, 650 00:30:00,700 --> 00:30:02,970 which is a couple hundred miles away. 651 00:30:02,970 --> 00:30:04,500 That was really damn fast. 652 00:30:04,500 --> 00:30:06,290 >> So what just happened? 653 00:30:06,290 --> 00:30:10,360 So in just seven steps, and in just four milliseconds, 654 00:30:10,360 --> 00:30:14,660 I sent data over the internet from here at Yale to MIT. 
655 00:30:14,660 --> 00:30:18,240 Each of these rows, you can perhaps guess now represents what? 656 00:30:18,240 --> 00:30:19,060 >> AUDIENCE: A router. 657 00:30:19,060 --> 00:30:20,101 >> DAVID J. MALAN: A router. 658 00:30:20,101 --> 00:30:24,090 So indeed, it looks like there's about seven or so routers, 659 00:30:24,090 --> 00:30:29,350 or six routers in between me physically in Yale's law school here 660 00:30:29,350 --> 00:30:31,612 and MIT's website over there. 661 00:30:31,612 --> 00:30:34,570 And what we can glean from this is as follows-- and let me clean it up. 662 00:30:34,570 --> 00:30:38,180 I'm going to rerun it with a command line argument of -q 1 to just say, 663 00:30:38,180 --> 00:30:39,300 just give me one query. 664 00:30:39,300 --> 00:30:40,800 By default, traceroute does three. 665 00:30:40,800 --> 00:30:42,350 And that's why we saw bunches of numbers. 666 00:30:42,350 --> 00:30:44,850 I want to see fewer numbers just to keep the output cleaner. 667 00:30:44,850 --> 00:30:46,280 And let's see what happens. 668 00:30:46,280 --> 00:30:49,220 >> So for whatever reason, someone at Yale thought 669 00:30:49,220 --> 00:30:54,130 it would be funny to call your default router arubacentral, which 670 00:30:54,130 --> 00:30:57,920 is on vlan or virtual LAN, virtual local area 671 00:30:57,920 --> 00:30:59,810 network 30-- so you probably have at least 672 00:30:59,810 --> 00:31:03,050 29 others-- router.net.yale.internal. 673 00:31:03,050 --> 00:31:06,660 And .internal here is kind of a fake top level domain meant to be used just 674 00:31:06,660 --> 00:31:07,210 on campus. 675 00:31:07,210 --> 00:31:10,335 And notice the corresponding IP address of that router, wherever it is here 676 00:31:10,335 --> 00:31:13,920 on campus, is 172.28.204.129. 677 00:31:13,920 --> 00:31:17,470 And it took 36 milliseconds to go from here to there. 678 00:31:17,470 --> 00:31:18,050 >> Funny story.
679 00:31:18,050 --> 00:31:19,716 We'll get back to that in just a moment. 680 00:31:19,716 --> 00:31:22,920 But now the second router-- to which arubacentral apparently 681 00:31:22,920 --> 00:31:25,790 has some kind of physical connection most likely-- 682 00:31:25,790 --> 00:31:27,249 the humans didn't bother naming it. 683 00:31:27,249 --> 00:31:30,373 The Yale humans didn't bother naming it because it's inside of your network 684 00:31:30,373 --> 00:31:30,940 it seems. 685 00:31:30,940 --> 00:31:32,520 And so it just has an IP address. 686 00:31:32,520 --> 00:31:34,660 >> But then a third router here on Yale's network 687 00:31:34,660 --> 00:31:36,700 that's probably a little farther away still 688 00:31:36,700 --> 00:31:41,330 is called cen10g whatever that is asr.net.yale.internal. 689 00:31:41,330 --> 00:31:43,040 And it too has an IP address. 690 00:31:43,040 --> 00:31:44,990 >> Now why are these numbers kind of fluctuating? 691 00:31:44,990 --> 00:31:47,890 2.9, 1.4, 36? 692 00:31:47,890 --> 00:31:48,840 Routers get busy. 693 00:31:48,840 --> 00:31:50,420 And they get congested and backed up. 694 00:31:50,420 --> 00:31:53,200 There's thousands of people on this campus using the internet right now. 695 00:31:53,200 --> 00:31:56,050 There's a hundred people in this room using the internet right now. 696 00:31:56,050 --> 00:31:59,030 >> And so what's happening is that the routers might get congested. 697 00:31:59,030 --> 00:32:01,350 And so those times might fluctuate a little bit. 698 00:32:01,350 --> 00:32:04,620 So that's why they don't necessarily increase straightforwardly. 699 00:32:04,620 --> 00:32:07,510 >> But things get kind of interesting in step four. 700 00:32:07,510 --> 00:32:13,040 Apparently between Yale and step four is another hop. 701 00:32:13,040 --> 00:32:16,480 And where is the router in step four probably? 702 00:32:16,480 --> 00:32:17,410 >> AUDIENCE: [INAUDIBLE] 703 00:32:17,410 --> 00:32:19,410 >> DAVID J. 
MALAN: JFK maybe, maybe at the airport. 704 00:32:19,410 --> 00:32:21,950 But for whatever reason, system administrators, so 705 00:32:21,950 --> 00:32:24,140 geeks that run servers for years have named routers 706 00:32:24,140 --> 00:32:25,600 after the nearest airport code. 707 00:32:25,600 --> 00:32:27,420 So JFK probably means it's just somewhere 708 00:32:27,420 --> 00:32:29,970 in New York, maybe in Manhattan or one of the boroughs. 709 00:32:29,970 --> 00:32:34,460 nyc2 denotes, presumably, another router that's somewhere in New York. 710 00:32:34,460 --> 00:32:37,140 >> I don't quite know where row six is here, router number six. 711 00:32:37,140 --> 00:32:40,110 qwest.net, a big ISP, internet service provider, 712 00:32:40,110 --> 00:32:43,240 that provides internet connectivity to big places like Yale and others. 713 00:32:43,240 --> 00:32:47,110 And then this last one, it looks like MIT doesn't even 714 00:32:47,110 --> 00:32:50,180 have their own website in Cambridge necessarily, 715 00:32:50,180 --> 00:32:52,090 but rather they've outsourced their website, 716 00:32:52,090 --> 00:32:55,150 or at least the physical servers, to a company called Akamai. 717 00:32:55,150 --> 00:32:57,940 And Akamai actually is right down the road from MIT in Cambridge 718 00:32:57,940 --> 00:32:58,790 it turns out. 719 00:32:58,790 --> 00:33:02,360 >> But realize too that even though you're going to www.mit.edu, 720 00:33:02,360 --> 00:33:05,200 we could really be sent anywhere in the world. 721 00:33:05,200 --> 00:33:06,960 >> And let's see somewhere else in the world. 722 00:33:06,960 --> 00:33:09,240 Let me go ahead and clear this screen and instead 723 00:33:09,240 --> 00:33:15,240 trace the route, just once, so query one, to www.cnn.co.jp, 724 00:33:15,240 --> 00:33:18,390 the Japanese home page for CNN, the news site. 725 00:33:18,390 --> 00:33:20,660 And if I hit Enter now, let's see what happens.
726 00:33:20,660 --> 00:33:23,610 We're again starting at arubacentral. 727 00:33:23,610 --> 00:33:26,510 We're then going to the nameless router, a few more. 728 00:33:26,510 --> 00:33:29,527 So it took 12 hops to get to Japan this time. 729 00:33:29,527 --> 00:33:30,860 And let's see what we can glean. 730 00:33:30,860 --> 00:33:32,450 >> So same hop, same hop. 731 00:33:32,450 --> 00:33:35,170 Slightly different now. 732 00:33:35,170 --> 00:33:36,380 This one's interesting. 733 00:33:36,380 --> 00:33:40,870 So I'm guessing here, stamford1 is a few towns away in Connecticut also. 734 00:33:40,870 --> 00:33:43,810 These routers in row six and seven don't have names. 735 00:33:43,810 --> 00:33:46,370 But this is kind of amazing. 736 00:33:46,370 --> 00:33:53,310 >> So what seems to be between the routers in step seven and eight? 737 00:33:53,310 --> 00:33:54,760 And why do you say as much? 738 00:33:54,760 --> 00:33:55,260 Yeah? 739 00:33:55,260 --> 00:33:56,060 >> AUDIENCE: Ocean. 740 00:33:56,060 --> 00:33:57,640 >> DAVID J. MALAN: Probably an ocean. 741 00:33:57,640 --> 00:34:01,366 We know that's true like, intuitively, right? 742 00:34:01,366 --> 00:34:04,790 But we can confirm as much kind of sort of empirically why? 743 00:34:04,790 --> 00:34:06,860 What has changed between rows seven and eight? 744 00:34:06,860 --> 00:34:09,429 745 00:34:09,429 --> 00:34:14,739 >> It took a lot more time to go to whatever this nameless router seven is, 746 00:34:14,739 --> 00:34:18,670 probably somewhere in the continental US, to step eight, 747 00:34:18,670 --> 00:34:22,639 which is probably somewhere in Japan based on the domain name of .jp there. 748 00:34:22,639 --> 00:34:25,719 And so those additional hundred something milliseconds 749 00:34:25,719 --> 00:34:28,960 or 90 or so milliseconds is the result of our data going 750 00:34:28,960 --> 00:34:31,100 over a pretty large body of water. 
751 00:34:31,100 --> 00:34:34,570 >> Now curiously, it seems that maybe that cable goes across the whole US. 752 00:34:34,570 --> 00:34:37,070 If we're actually going over the West Coast to get to Japan, 753 00:34:37,070 --> 00:34:39,111 it's kind of the long way if we go the other way. 754 00:34:39,111 --> 00:34:41,400 So it's not entirely clear what's going on physically. 755 00:34:41,400 --> 00:34:43,830 But the fact that every additional hop indeed 756 00:34:43,830 --> 00:34:46,020 took markedly longer than every other, it's 757 00:34:46,020 --> 00:34:50,440 pretty good confirmation that CNN's Japanese web server is probably indeed 758 00:34:50,440 --> 00:34:51,310 in Japan. 759 00:34:51,310 --> 00:34:54,089 And it's certainly farther away than MIT has been. 760 00:34:54,089 --> 00:34:56,380 And it's worth noting too, your data is not necessarily 761 00:34:56,380 --> 00:34:58,794 going to travel the shortest possible distance. 762 00:34:58,794 --> 00:35:00,960 In fact, if you play around with traceroute at home 763 00:35:00,960 --> 00:35:04,170 just picking random websites, you might find that just to send an email 764 00:35:04,170 --> 00:35:06,490 or to visit a website that's here in New Haven, 765 00:35:06,490 --> 00:35:09,200 sometimes your data might first take a detour, go down to DC, 766 00:35:09,200 --> 00:35:10,450 and then come back up. 767 00:35:10,450 --> 00:35:12,860 And that's just because of the dynamic routing decisions 768 00:35:12,860 --> 00:35:14,650 that these computers are making. 769 00:35:14,650 --> 00:35:18,930 >> Now just for fun, the production team trimmed one of these videos for us 770 00:35:18,930 --> 00:35:20,807 to just be a little more succinct. 771 00:35:20,807 --> 00:35:23,640 But to give us a quick sense here-- and we can leave the lights on-- 772 00:35:23,640 --> 00:35:32,363 as to just how much cabling is actually carrying all of our data.
773 00:35:32,363 --> 00:35:33,029 [VIDEO PLAYBACK] 774 00:35:33,029 --> 00:35:36,023 [MUSIC PLAYING] 775 00:35:36,023 --> 00:36:31,911 776 00:36:31,911 --> 00:36:32,777 [END PLAYBACK] 777 00:36:32,777 --> 00:36:35,860 DAVID J. MALAN: All networking videos have cool sounding music apparently. 778 00:36:35,860 --> 00:36:38,084 So that's to get just a sense of just how much has 779 00:36:38,084 --> 00:36:39,500 been going on underneath the hood. 780 00:36:39,500 --> 00:36:41,355 >> But let's look at a slightly lower level now 781 00:36:41,355 --> 00:36:44,150 at what data is actually traversing those lines, 782 00:36:44,150 --> 00:36:46,720 and even going wirelessly in a room like this. 783 00:36:46,720 --> 00:36:49,580 >> So it turns out when you request a web page, or send an e-mail, 784 00:36:49,580 --> 00:36:53,670 or receive a web page, or an e-mail, or a Gchat message, or a Facebook message, 785 00:36:53,670 --> 00:36:57,800 or the like, that is not just one big chunk of bits flowing wirelessly 786 00:36:57,800 --> 00:37:00,600 through the air or electronically on a wire. 787 00:37:00,600 --> 00:37:03,680 Rather, that request or response is generally 788 00:37:03,680 --> 00:37:05,810 chunked up into separate pieces. 789 00:37:05,810 --> 00:37:08,880 >> So in other words, when you have a request to make of another computer, 790 00:37:08,880 --> 00:37:10,980 or you get back a response from another computer-- 791 00:37:10,980 --> 00:37:15,800 like suppose, for instance, if unfamiliar-- as too many people 792 00:37:15,800 --> 00:37:18,400 seem to be these days-- if unfamiliar with this-- 793 00:37:18,400 --> 00:37:20,200 not this fellow-- this fellow. 794 00:37:20,200 --> 00:37:23,950 So suppose this is a message that I want to send to someone in back. 795 00:37:23,950 --> 00:37:28,930 Who in the very back would like to receive a picture of Rick Astley today? 796 00:37:28,930 --> 00:37:29,805 OK, what's your name? 
797 00:37:29,805 --> 00:37:30,590 >> AUDIENCE: Cole. 798 00:37:30,590 --> 00:37:31,306 >> DAVID J. MALAN: What is it? 799 00:37:31,306 --> 00:37:31,672 >> AUDIENCE: Cole. 800 00:37:31,672 --> 00:37:32,040 >> DAVID J. MALAN: Holt? 801 00:37:32,040 --> 00:37:32,540 H-O? 802 00:37:32,540 --> 00:37:33,711 AUDIENCE: C-O-L-E. 803 00:37:33,711 --> 00:37:34,960 DAVID J. MALAN: C-O-L-E, Cole. 804 00:37:34,960 --> 00:37:35,520 Sorry. 805 00:37:35,520 --> 00:37:36,430 C-O-L-E. 806 00:37:36,430 --> 00:37:36,930 All right. 807 00:37:36,930 --> 00:37:40,990 So if I want to send Cole this picture here, you know this 808 00:37:40,990 --> 00:37:42,410 is kind of a big picture, right? 809 00:37:42,410 --> 00:37:44,472 This could be a few kilobytes, a few megabytes, 810 00:37:44,472 --> 00:37:45,930 especially if it's high resolution. 811 00:37:45,930 --> 00:37:48,660 And I don't really want to stop everyone else from using the internet 812 00:37:48,660 --> 00:37:50,680 just while I send this really big, high quality picture 813 00:37:50,680 --> 00:37:52,138 of Rick Astley throughout the room. 814 00:37:52,138 --> 00:37:55,310 I'd like your data to continue to traverse the network and the Wi-Fi 815 00:37:55,310 --> 00:37:56,100 as well. 816 00:37:56,100 --> 00:38:00,100 >> And so it makes sense-- and this is recoverable electronically, 817 00:38:00,100 --> 00:38:01,780 not so much in the real world. 818 00:38:01,780 --> 00:38:04,904 Actually, this is going to have multiple meanings if you take my audio out. 819 00:38:04,904 --> 00:38:08,360 So if I tear this in half like this here, 820 00:38:08,360 --> 00:38:11,912 this now can travel the internet more efficiently, 821 00:38:11,912 --> 00:38:13,120 because it's a smaller piece. 822 00:38:13,120 --> 00:38:16,780 So with lower probability is it going to collide with someone else's traffic 823 00:38:16,780 --> 00:38:17,650 on the internet. 
824 00:38:17,650 --> 00:38:21,240 >> And so what your computer indeed does when you want to send a message to Cole 825 00:38:21,240 --> 00:38:24,917 is it chunks up a message like this into smaller pieces, fragments so to speak. 826 00:38:24,917 --> 00:38:28,000 And then it puts them inside of what we'll call sort of virtual envelopes. 827 00:38:28,000 --> 00:38:29,620 >> So I have four paper envelopes here. 828 00:38:29,620 --> 00:38:32,690 And I've pre-numbered them, one, two, three, and four. 829 00:38:32,690 --> 00:38:35,800 And what I'm going to do on the front of this, just like a normal mailing, 830 00:38:35,800 --> 00:38:38,000 is I'm going to put Cole's name there. 831 00:38:38,000 --> 00:38:41,270 And then at the top, I'm going to put my name there, 832 00:38:41,270 --> 00:38:44,995 David, so that the first such packet I'm sending out there on the internet 833 00:38:44,995 --> 00:38:47,620 looks a little something like this, the salient characteristics 834 00:38:47,620 --> 00:38:50,830 of which are that it has a to address, a from address, 835 00:38:50,830 --> 00:38:52,670 and also a number, so that that hopefully 836 00:38:52,670 --> 00:38:55,680 is sufficient information for Cole to reconstruct this message. 837 00:38:55,680 --> 00:38:58,820 >> So let me do the same here, the same here, and the same here, 838 00:38:58,820 --> 00:39:01,310 writing his name in the To field on all of them. 839 00:39:01,310 --> 00:39:04,240 And then let's go ahead and put these pictures inside. 840 00:39:04,240 --> 00:39:06,540 >> So here is one packet that's ready to go. 841 00:39:06,540 --> 00:39:09,780 Here is another packet that's ready to go. 842 00:39:09,780 --> 00:39:14,100 Here is a third packet that's ready to go. 843 00:39:14,100 --> 00:39:16,870 And here is a fourth packet that's ready to go. 
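The envelope demonstration above can be sketched in Python as a rough illustration: a message is cut into fragments, and each fragment is wrapped in a small record carrying a to address, a from address, and a number. The `Packet` class, the `fragment` function, and the placeholder picture bytes are all hypothetical names invented for this sketch; real IP packets carry numeric addresses and many more header fields.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """A hypothetical 'virtual envelope' for one fragment of a message."""
    to_addr: str
    from_addr: str
    seq: int       # which piece this is (1-based)
    total: int     # how many pieces there are in all
    payload: bytes


def fragment(message: bytes, to_addr: str, from_addr: str, pieces: int) -> list[Packet]:
    """Split message into at most `pieces` roughly equal chunks, one per packet."""
    size = max(1, -(-len(message) // pieces))  # ceiling division, at least 1 byte
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        Packet(to_addr, from_addr, n, len(chunks), chunk)
        for n, chunk in enumerate(chunks, start=1)
    ]


picture = b"RICK-ASTLEY-PICTURE-DATA"  # stand-in bytes, not a real image
packets = fragment(picture, to_addr="Cole", from_addr="David", pieces=4)
for p in packets:
    print(p.seq, "of", p.total, p.payload)
```

The number on each envelope is what lets the receiver put the pieces back in order, since nothing guarantees they arrive in the order they were sent.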
844 00:39:16,870 --> 00:39:19,849 >> And now what's interesting about how the internet in reality works 845 00:39:19,849 --> 00:39:22,140 is that even though I've got four packets, all of which 846 00:39:22,140 --> 00:39:24,730 are destined for the same location, they're not necessarily 847 00:39:24,730 --> 00:39:26,870 going to traverse the same route. 848 00:39:26,870 --> 00:39:32,070 And so even though I might hand these packets off to the nearest router 849 00:39:32,070 --> 00:39:36,660 let's say, if you would like to send them every which way, let's see 850 00:39:36,660 --> 00:39:40,706 what actually happens, the goal of which is to get them ultimately to Cole. 851 00:39:40,706 --> 00:39:44,850 852 00:39:44,850 --> 00:39:48,770 And indeed, they're already not necessarily taking the same direction. 853 00:39:48,770 --> 00:39:50,510 And that's fine. 854 00:39:50,510 --> 00:39:52,480 This is a little awkward and Oprah style today. 855 00:39:52,480 --> 00:39:56,540 >> And now let me deliberately take that one back. 856 00:39:56,540 --> 00:40:00,196 And now Cole, if you'd like to reassemble it as best you can. 857 00:40:00,196 --> 00:40:06,610 858 00:40:06,610 --> 00:40:10,180 Of course, we can all guess what the conclusion here is going to be. 859 00:40:10,180 --> 00:40:12,600 You're going to have 3/4 of Rick Astley in just a moment. 860 00:40:12,600 --> 00:40:15,920 861 00:40:15,920 --> 00:40:19,291 And what though is the implication of that? 862 00:40:19,291 --> 00:40:20,540 You want to try to hold it up? 863 00:40:20,540 --> 00:40:23,540 We do have one camera pointed at you if you'd like 864 00:40:23,540 --> 00:40:26,226 to pose with Rick Astley over here. 865 00:40:26,226 --> 00:40:28,610 866 00:40:28,610 --> 00:40:29,510 There we go. 867 00:40:29,510 --> 00:40:30,410 Lovely. 868 00:40:30,410 --> 00:40:33,230 >> But you seem to be missing a fragment of Rick Astley. 
869 00:40:33,230 --> 00:40:37,015 So it turns out that the internet is generally driven by not just IP, 870 00:40:37,015 --> 00:40:39,890 but in fact we heard at the very beginning of lecture in that video-- 871 00:40:39,890 --> 00:40:42,473 and you've probably seen this acronym more often-- what really 872 00:40:42,473 --> 00:40:44,360 is the protocol you tend to hear about? 873 00:40:44,360 --> 00:40:45,120 >> AUDIENCE: TCP/IP. 874 00:40:45,120 --> 00:40:48,090 >> DAVID J. MALAN: TCP/IP, which is just a combination 875 00:40:48,090 --> 00:40:49,940 of two protocols, one called IP. 876 00:40:49,940 --> 00:40:52,640 Which again, is just the set of conventions via which we 877 00:40:52,640 --> 00:40:54,740 address every computer in the internet. 878 00:40:54,740 --> 00:40:56,930 And then TCP, which serves another purpose. 879 00:40:56,930 --> 00:41:00,110 >> TCP is a protocol that you typically use in conjunction 880 00:41:00,110 --> 00:41:04,410 with IP, that among other things, guarantees delivery. 881 00:41:04,410 --> 00:41:08,860 In fact, TCP is the protocol that would notice that one of the packets 882 00:41:08,860 --> 00:41:10,930 apparently didn't get to Cole, because he seems 883 00:41:10,930 --> 00:41:12,830 to be missing number four out of four. 884 00:41:12,830 --> 00:41:16,530 And so what TCP, as a protocol, does is it tells Cole, 885 00:41:16,530 --> 00:41:19,850 hey Cole, if you receive only three out of four packets, 886 00:41:19,850 --> 00:41:22,600 tell me which one you are missing, essentially, 887 00:41:22,600 --> 00:41:25,570 and then my purpose in life should be to retransmit that. 888 00:41:25,570 --> 00:41:28,580 >> And so if I too, the sender, am using TCP, 889 00:41:28,580 --> 00:41:31,810 I should then create a new packet-- not this wrinkled one here-- 890 00:41:31,810 --> 00:41:35,980 retransmit just this piece of it, so that ultimately Cole has 891 00:41:35,980 --> 00:41:38,280 a complete souvenir, if nothing else. 
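The "tell me which one you are missing" idea can be sketched as follows. This is a deliberately simplified model that numbers whole packets one through four, as in the envelope demonstration; real TCP numbers bytes rather than envelopes and relies on acknowledgments and timers, none of which is modeled here. The function names and the message text are hypothetical.

```python
def missing_sequence_numbers(received, total):
    """Return the sequence numbers (1..total) that never arrived."""
    return sorted(set(range(1, total + 1)) - set(received))


def reassemble(received, total):
    """Join payloads in sequence order, or return None if a piece is missing."""
    if missing_sequence_numbers(received, total):
        return None
    return b"".join(received[n] for n in range(1, total + 1))


# What the sender put in the four envelopes (illustrative content).
sent = {1: b"Never ", 2: b"gonna ", 3: b"give ", 4: b"you up"}

# Packets may arrive out of order, and here packet 4 was lost along the way.
received = {3: sent[3], 1: sent[1], 2: sent[2]}

missing = missing_sequence_numbers(received, total=4)
print("missing:", missing)
print("before retransmit:", reassemble(received, total=4))

for n in missing:
    received[n] = sent[n]  # the sender retransmits only the missing piece

print("after retransmit:", reassemble(received, total=4))
```

Only the lost piece is sent again; the three envelopes that already arrived are kept as they are.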
892 00:41:38,280 --> 00:41:43,000 But so that ultimately the data actually gets to its correct destination. 893 00:41:43,000 --> 00:41:48,020 >> But unfortunately, writing Cole's name on the front isn't sufficient, per se. 894 00:41:48,020 --> 00:41:50,270 And really, I wouldn't write Cole's name, but probably 895 00:41:50,270 --> 00:41:51,655 his IP address on the envelope. 896 00:41:51,655 --> 00:41:52,780 And I wouldn't write David. 897 00:41:52,780 --> 00:41:56,550 I'd write my IP address on the envelope so that the computers can actually 898 00:41:56,550 --> 00:41:57,999 communicate back and forth. 899 00:41:57,999 --> 00:42:00,540 But it turns out that computers can do way more than serve up 900 00:42:00,540 --> 00:42:01,900 pictures of Rick Astley. 901 00:42:01,900 --> 00:42:05,340 They can also send and receive emails, chat messages. 902 00:42:05,340 --> 00:42:09,780 They can do things like file transfers, and any number of other tools 903 00:42:09,780 --> 00:42:12,330 you use on the internet, servers can do these days. 904 00:42:12,330 --> 00:42:15,300 >> And just because a company, or a school, or a person 905 00:42:15,300 --> 00:42:19,420 wants to have a web server, and an email server, and a chat server, 906 00:42:19,420 --> 00:42:21,420 does not mean you need three computers. 907 00:42:21,420 --> 00:42:26,200 You can have just one computer running multiple services, so to speak. 908 00:42:26,200 --> 00:42:29,190 >> And so when Cole receives a message like that, how 909 00:42:29,190 --> 00:42:32,940 does his computer know whether to show that picture in his browser, 910 00:42:32,940 --> 00:42:37,730 or in Gchat, or in Facebook Messenger, or in any number of other tools? 911 00:42:37,730 --> 00:42:40,430 >> So it turns out also on that envelope is an additional piece 912 00:42:40,430 --> 00:42:43,070 of information known as a port number. 
913 00:42:43,070 --> 00:42:45,240 And a port number is just a number indeed, 914 00:42:45,240 --> 00:42:48,342 but it uniquely identifies not the computer, but the service. 915 00:42:48,342 --> 00:42:49,550 And there's bunches of these. 916 00:42:49,550 --> 00:42:51,258 So it turns out that in the world, humans 917 00:42:51,258 --> 00:42:57,095 have decided on a few such conventions, some of which are these. 918 00:42:57,095 --> 00:42:59,220 So there's something called File Transfer Protocol. 919 00:42:59,220 --> 00:42:59,870 It's pretty dated. 920 00:42:59,870 --> 00:43:00,970 It's completely insecure. 921 00:43:00,970 --> 00:43:02,320 A lot of people still use it. 922 00:43:02,320 --> 00:43:04,240 And it uses port number 21. 923 00:43:04,240 --> 00:43:07,250 In other words, if sending a file via FTP, 924 00:43:07,250 --> 00:43:10,570 the envelope would have not only the sender and the receiver's IP address, 925 00:43:10,570 --> 00:43:14,020 it would also have the number 21 so that the receiving computer knows oh, this 926 00:43:14,020 --> 00:43:17,280 is a file, not an email or a chat message. 927 00:43:17,280 --> 00:43:19,016 >> 25 is SMTP. 928 00:43:19,016 --> 00:43:20,516 How many of you have ever used SMTP? 929 00:43:20,516 --> 00:43:22,850 930 00:43:22,850 --> 00:43:23,380 Wrong. 931 00:43:23,380 --> 00:43:24,490 Almost all of you have. 932 00:43:24,490 --> 00:43:28,730 If you've ever used email, you've used SMTP, simple mail transfer protocol, 933 00:43:28,730 --> 00:43:32,300 which is just a fancy way of saying, this is the type of computer or service 934 00:43:32,300 --> 00:43:34,600 that sends your email outbound. 935 00:43:34,600 --> 00:43:38,780 >> And if you've ever seen acronyms like POP, or IMAP, and there's a few others, 936 00:43:38,780 --> 00:43:40,670 those are for receiving email, typically. 937 00:43:40,670 --> 00:43:42,650 That just means it's a different service. 
938 00:43:42,650 --> 00:43:45,850 It's software that someone wrote that sends to or listens 939 00:43:45,850 --> 00:43:48,880 on a specific port number so that it doesn't confuse emails 940 00:43:48,880 --> 00:43:50,360 with some other type of data. 941 00:43:50,360 --> 00:43:55,500 >> Now the web is HTTP, which is number 80, and also port 443. 942 00:43:55,500 --> 00:43:57,590 And in fact, even though we humans fortunately 943 00:43:57,590 --> 00:43:59,610 don't have to do this, any time you visit 944 00:43:59,610 --> 00:44:05,810 a website like http://www.yale.edu, the browser 945 00:44:05,810 --> 00:44:07,790 is just being kind of helpful in that it's 946 00:44:07,790 --> 00:44:10,970 assuming that you want numeric port 80. 947 00:44:10,970 --> 00:44:15,710 We already know that DNS can figure out what the IP address is of www.yale.edu. 948 00:44:15,710 --> 00:44:17,970 But the computer is just going to infer that you 949 00:44:17,970 --> 00:44:21,560 want port 80 because you're using Chrome, or IE, or some other browser. 950 00:44:21,560 --> 00:44:24,930 But I could technically do colon 80. 951 00:44:24,930 --> 00:44:28,520 And then I can explicitly tell my browser, send a packet or more 952 00:44:28,520 --> 00:44:32,080 of information to www.yale.edu requesting today's home page. 953 00:44:32,080 --> 00:44:36,070 But specifically, address it to Yale's IP at port 80 954 00:44:36,070 --> 00:44:39,190 so that I actually get back Yale's web server. 955 00:44:39,190 --> 00:44:42,229 >> Now it immediately disappears because browsers just 956 00:44:42,229 --> 00:44:44,020 decide that we don't need to confuse humans 957 00:44:44,020 --> 00:44:46,810 by having yet more arcane information like colon 80. 
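The browser's habit of filling in the port can be sketched with Python's standard `urllib.parse` module. The table below holds the port conventions mentioned in the lecture, with 443 paired with `https`, the encrypted form of HTTP; the `effective_port` helper is a hypothetical name invented for this sketch, not something a browser exposes.

```python
from urllib.parse import urlsplit

# Conventional port numbers for a few services.
WELL_KNOWN_PORTS = {"ftp": 21, "smtp": 25, "http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port written in the URL, or the conventional default
    for its scheme when no port is written."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return WELL_KNOWN_PORTS[parts.scheme]


print(effective_port("http://www.yale.edu/"))       # no port written, so 80 is inferred
print(effective_port("http://www.yale.edu:80/"))    # port written explicitly
print(effective_port("https://www.yale.edu/"))      # inferred from the https scheme
```

Writing `:80` after the host name and leaving it off therefore address the same service, which is why the browser can safely hide it.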
958 00:44:46,810 --> 00:44:50,640 And frankly, browsers like Chrome don't even 959 00:44:50,640 --> 00:44:55,464 show you HTTP anymore, or the colon, or the slash slash, or the trailing slash, 960 00:44:55,464 --> 00:44:58,380 in some sense because they're trying to make things simpler for users. 961 00:44:58,380 --> 00:45:01,080 In another sense, it's just kind of a user experience thing-- 962 00:45:01,080 --> 00:45:02,720 let's get rid of some of the clutter. 963 00:45:02,720 --> 00:45:05,405 But it's hiding some of these underlying details. 964 00:45:05,405 --> 00:45:09,360 >> And in fact, none of us probably ever type http anymore. 965 00:45:09,360 --> 00:45:12,060 You just type in something like www.harvard.edu. 966 00:45:12,060 --> 00:45:15,310 And again, Chrome infers that you want HTTP. 967 00:45:15,310 --> 00:45:18,970 But there are other protocols that we could certainly be using. 968 00:45:18,970 --> 00:45:24,480 >> So given all of this, if you now sort of put on the so-called engineering hat, 969 00:45:24,480 --> 00:45:27,417 how do things called firewalls work? 970 00:45:27,417 --> 00:45:29,750 So you're probably generally familiar with the firewall, 971 00:45:29,750 --> 00:45:30,990 not so much in the physical sense. 972 00:45:30,990 --> 00:45:32,470 So back in the day, and still to this day, 973 00:45:32,470 --> 00:45:35,430 if you've got like strip malls for instance that have a lot of stores, 974 00:45:35,430 --> 00:45:38,500 generally the walls in between individual stores or shops 975 00:45:38,500 --> 00:45:43,180 are firewalls in the sense that they have special insulation 976 00:45:43,180 --> 00:45:45,310 so that if a fire breaks out in one shop, 977 00:45:45,310 --> 00:45:48,210 it doesn't necessarily spread to the shop next door. 978 00:45:48,210 --> 00:45:51,710 >> The computer world also has firewalls that do something different. 979 00:45:51,710 --> 00:45:52,798 What does a firewall do? 980 00:45:52,798 --> 00:45:53,298 Yeah? 
981 00:45:53,298 --> 00:45:55,290 >> AUDIENCE: Basically they cut off connection 982 00:45:55,290 --> 00:45:59,493 if they encounter something like, for example, 983 00:45:59,493 --> 00:46:01,361 they have number of id statements. 984 00:46:01,361 --> 00:46:04,340 And if something happens, they cut the connection. 985 00:46:04,340 --> 00:46:07,570 Like if this malicious attack [INAUDIBLE] your computer, or-- 986 00:46:07,570 --> 00:46:08,630 >> DAVID J. MALAN: OK good. 987 00:46:08,630 --> 00:46:11,220 Yeah, and in fact you're even going a little farther 988 00:46:11,220 --> 00:46:14,590 in describing something that might be called an intrusion detection system, 989 00:46:14,590 --> 00:46:18,305 or IDS for short, whereby you actually have rules defined. 990 00:46:18,305 --> 00:46:22,140 And if you do start to see suspicious behavior, you try to put an end to it. 991 00:46:22,140 --> 00:46:24,250 >> And a firewall, frankly, at a networking level, 992 00:46:24,250 --> 00:46:26,821 is even dumber and simpler than that, generally. 993 00:46:26,821 --> 00:46:29,070 And there's different types of firewalls in the world. 994 00:46:29,070 --> 00:46:31,569 But the ones that operate at the level we're talking today-- 995 00:46:31,569 --> 00:46:35,330 IP and TCP-- work even more straightforwardly. 996 00:46:35,330 --> 00:46:39,180 >> For instance, if you were Yale system administrators, or Harvard system 997 00:46:39,180 --> 00:46:42,020 administrators, or some Big Brother at some company, 998 00:46:42,020 --> 00:46:45,040 and you wanted to prevent all of your students or all of your employees 999 00:46:45,040 --> 00:46:47,619 from going to facebook.com, all you have to do 1000 00:46:47,619 --> 00:46:50,160 is make sure that all of their network traffic, first of all, 1001 00:46:50,160 --> 00:46:51,850 goes through a special device. 1002 00:46:51,850 --> 00:46:53,030 Let's call it a firewall. 
1003 00:46:53,030 --> 00:46:54,910 >> And that's fine, because you can make your router 1004 00:46:54,910 --> 00:46:57,618 the same thing as a firewall if you put the same kind of software 1005 00:46:57,618 --> 00:46:58,940 on the same machine. 1006 00:46:58,940 --> 00:47:01,780 So if all of your students or employees traffic 1007 00:47:01,780 --> 00:47:04,450 is going through this central firewall, how 1008 00:47:04,450 --> 00:47:08,540 would we block people from going to facebook.com, for instance? 1009 00:47:08,540 --> 00:47:10,780 What would the system administrator have to do? 1010 00:47:10,780 --> 00:47:11,370 Anyone else? 1011 00:47:11,370 --> 00:47:12,911 Let's try to go around. 1012 00:47:12,911 --> 00:47:15,074 >> AUDIENCE: [INAUDIBLE] 1013 00:47:15,074 --> 00:47:16,365 DAVID J. MALAN: Say that again? 1014 00:47:16,365 --> 00:47:19,215 AUDIENCE: It should just get caught up inside the system. 1015 00:47:19,215 --> 00:47:22,487 So just put Facebook into 127.0.0-- 1016 00:47:22,487 --> 00:47:23,820 DAVID J. MALAN: Oh, interesting. 1017 00:47:23,820 --> 00:47:27,290 So you can actually then hack your DNS system. 1018 00:47:27,290 --> 00:47:31,180 This is indeed a way you could do this whereby any time a Yale student pulls 1019 00:47:31,180 --> 00:47:34,670 up www.facebook.com, all of us here today on campus 1020 00:47:34,670 --> 00:47:38,590 are using Yale's DNS server, because Yale's DHCP server gave us 1021 00:47:38,590 --> 00:47:39,580 that address. 1022 00:47:39,580 --> 00:47:42,490 So yeah, you could kind of break things or break convention 1023 00:47:42,490 --> 00:47:49,190 by just saying, yeah, facebook.com's address is fake, 1024 00:47:49,190 --> 00:47:52,530 is 1.2.3.4, which is not actually legitimate. 1025 00:47:52,530 --> 00:47:53,930 Or maybe it's 278. 1026 00:47:53,930 --> 00:47:57,460 whatever was in the TV show a moment ago so that none of us 1027 00:47:57,460 --> 00:47:59,176 can actually visit facebook.com. 
1028 00:47:59,176 --> 00:48:00,590 >> So suppose Yale did that. 1029 00:48:00,590 --> 00:48:03,336 Suppose Yale wanted to keep you out of facebook.com. 1030 00:48:03,336 --> 00:48:05,700 And therefore, they changed the DNS settings 1031 00:48:05,700 --> 00:48:08,812 to give you a bogus IP address for facebook.com. 1032 00:48:08,812 --> 00:48:10,616 How do you respond? 1033 00:48:10,616 --> 00:48:12,990 Technically, not-- oh, now everyone wants to participate. 1034 00:48:12,990 --> 00:48:13,490 OK, yeah. 1035 00:48:13,490 --> 00:48:16,190 AUDIENCE: You just type in the actual IP address of Facebook. 1036 00:48:16,190 --> 00:48:16,710 >> DAVID J. MALAN: OK, good. 1037 00:48:16,710 --> 00:48:19,350 So we could just type in the actual IP address of Facebook, 1038 00:48:19,350 --> 00:48:21,090 much like I did with Yale's website. 1039 00:48:21,090 --> 00:48:24,636 And if the Facebook server is configured to support that, it should indeed work. 1040 00:48:24,636 --> 00:48:26,510 It's a minor pain in the neck, because now we 1041 00:48:26,510 --> 00:48:30,220 have to remember some random 32-bit value, but that could work. 1042 00:48:30,220 --> 00:48:31,622 What else could you do? 1043 00:48:31,622 --> 00:48:32,121 Yeah. 1044 00:48:32,121 --> 00:48:35,117 >> AUDIENCE: You could change those settings [INAUDIBLE]. 1045 00:48:35,117 --> 00:48:37,700 DAVID J. MALAN: Yeah, you could even change your DNS settings. 1046 00:48:37,700 --> 00:48:40,480 So in fact this is actually pretty useful, frankly, 1047 00:48:40,480 --> 00:48:45,590 if you're in an airport, or if you're in a cafe, or something that 1048 00:48:45,590 --> 00:48:48,834 has flaky internet whereby sometimes the DNS server just stops working. 1049 00:48:48,834 --> 00:48:51,000 So even I occasionally do this, not for malicious, I 1050 00:48:51,000 --> 00:48:52,750 want to use Facebook purposes, but really 1051 00:48:52,750 --> 00:48:56,344 because I seem to have a network connection, but nothing is working. 
1052 00:48:56,344 --> 00:48:58,260 And so one of the first things I try-- and you 1053 00:48:58,260 --> 00:49:01,710 can do this on Windows too-- but on my Mac, if I go to Network. 1054 00:49:01,710 --> 00:49:03,330 And I choose my Wi-Fi connection. 1055 00:49:03,330 --> 00:49:04,750 And I go to Advanced. 1056 00:49:04,750 --> 00:49:06,080 And I go to DNS. 1057 00:49:06,080 --> 00:49:08,260 These are the three IP addresses that Yale 1058 00:49:08,260 --> 00:49:10,290 is giving me for three DNS servers. 1059 00:49:10,290 --> 00:49:14,110 The purpose then is for me to try any one of these to resolve addresses. 1060 00:49:14,110 --> 00:49:15,946 >> But I can override these by doing a plus. 1061 00:49:15,946 --> 00:49:18,333 And anyone want to propose a DNS server? 1062 00:49:18,333 --> 00:49:19,120 >> AUDIENCE: 8.8.8.8? 1063 00:49:19,120 --> 00:49:20,578 >> DAVID J. MALAN: Oh, you're amazing. 1064 00:49:20,578 --> 00:49:21,650 Yes, 8.8.8.8. 1065 00:49:21,650 --> 00:49:26,170 So Google, bless their hearts, bought the IP address 8.8.8.8, 1066 00:49:26,170 --> 00:49:29,560 because it kind of looks like Gs probably, and it's easy to remember. 1067 00:49:29,560 --> 00:49:34,820 But indeed, now I have configured my computer to use Google's DNS server. 1068 00:49:34,820 --> 00:49:38,050 >> So now if I go to yale.edu, it's still going to work. 1069 00:49:38,050 --> 00:49:40,100 But I'm not using Yale's DNS servers anymore. 1070 00:49:40,100 --> 00:49:44,430 And if I go to facebook.com, all of those look ups 1071 00:49:44,430 --> 00:49:45,960 are going to go through Google. 1072 00:49:45,960 --> 00:49:49,120 >> So on the one hand, I've cleverly circumvented the local system 1073 00:49:49,120 --> 00:49:51,810 administrators just by understanding how networking works. 1074 00:49:51,810 --> 00:49:53,360 But I'm paying a price. 1075 00:49:53,360 --> 00:49:54,740 Nothing is free. 1076 00:49:54,740 --> 00:49:56,096 What have I just given up? 
1077 00:49:56,096 --> 00:50:00,180 1078 00:50:00,180 --> 00:50:02,250 What have I just given up? 1079 00:50:02,250 --> 00:50:06,090 All of you clever people who have been using 8.8.8.8, because it's cool 1080 00:50:06,090 --> 00:50:10,440 or solves problems, what have you been doing all this time? 1081 00:50:10,440 --> 00:50:11,910 >> AUDIENCE: Traveling farther? 1082 00:50:11,910 --> 00:50:15,201 >> DAVID J. MALAN: Maybe traveling farther, because Google's probably not quite as 1083 00:50:15,201 --> 00:50:16,770 close as the server down the street. 1084 00:50:16,770 --> 00:50:18,420 But more worrisomely. 1085 00:50:18,420 --> 00:50:18,920 Yeah? 1086 00:50:18,920 --> 00:50:20,940 >> AUDIENCE: So now Google knows where you're going. 1087 00:50:20,940 --> 00:50:23,856 >> DAVID J. MALAN: Google knows literally every website you are visiting, 1088 00:50:23,856 --> 00:50:26,080 because you are literally asking them, hey Google, 1089 00:50:26,080 --> 00:50:28,360 can you translate yale.edu for me? 1090 00:50:28,360 --> 00:50:32,430 Or hey Google, can you translate this other website address for me 1091 00:50:32,430 --> 00:50:33,334 into an IP address. 1092 00:50:33,334 --> 00:50:35,750 And so they're-- I have no idea what you're talking about. 1093 00:50:35,750 --> 00:50:38,080 And so they know everything about you. 1094 00:50:38,080 --> 00:50:42,180 So realize that this is a free service with a purpose from their perspective 1095 00:50:42,180 --> 00:50:42,680 as well. 1096 00:50:42,680 --> 00:50:44,420 But it can certainly get you out of a bind. 
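The DNS tricks discussed above can be modeled with two toy lookup tables. Every address here is a placeholder: the two "real" records come from ranges reserved for documentation, not from Facebook's or Yale's actual servers, and `campus_resolve` and `public_resolve` are hypothetical stand-ins for a local DNS server and a public one such as 8.8.8.8.

```python
# Placeholder records using documentation-only address ranges.
REAL_RECORDS = {
    "www.facebook.com": "203.0.113.10",
    "www.yale.edu": "198.51.100.20",
}

# A local administrator "hacks" DNS by answering with a bogus address.
OVERRIDES = {"www.facebook.com": "127.0.0.1"}

# Names the public resolver has been asked about (the privacy cost).
public_query_log = []


def campus_resolve(name):
    """Toy local DNS server: overrides win over the real records."""
    return OVERRIDES.get(name, REAL_RECORDS.get(name))


def public_resolve(name):
    """Toy public DNS server: answers honestly but sees every name you ask for."""
    public_query_log.append(name)
    return REAL_RECORDS.get(name)


print(campus_resolve("www.facebook.com"))  # bogus answer from the local server
print(public_resolve("www.facebook.com"))  # real answer, but the query is logged
print(public_query_log)
```

Switching resolvers sidesteps the override, and the log shows the trade-off from the lecture: whoever answers your lookups learns every name you ask about.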
1097 00:50:44,420 --> 00:50:46,380 >> Now just to address one other issue that often comes up 1098 00:50:46,380 --> 00:50:48,640 among students, especially when traveling internationally 1099 00:50:48,640 --> 00:50:50,765 in certain countries like China, where there indeed 1100 00:50:50,765 --> 00:50:54,559 is a Great Firewall of China whereby the government there blocks quite 1101 00:50:54,559 --> 00:50:56,100 a bit of traffic at different levels. 1102 00:50:56,100 --> 00:50:58,141 You don't have to just block traffic at the level 1103 00:50:58,141 --> 00:51:01,240 we're talking here, DNS or otherwise, you can block it at other levels. 1104 00:51:01,240 --> 00:51:04,030 >> And in fact, just to be clear, a firewall 1105 00:51:04,030 --> 00:51:08,400 can operate even more simply than just having the system administrators change 1106 00:51:08,400 --> 00:51:09,500 DNS settings. 1107 00:51:09,500 --> 00:51:12,920 A firewall, a device in between us and the rest of the world, 1108 00:51:12,920 --> 00:51:16,850 could just block any outgoing requests to the IP address 1109 00:51:16,850 --> 00:51:21,240 for Facebook on port 80, or the IP address for harvard.edu, 1110 00:51:21,240 --> 00:51:22,580 or the IP address of anything. 1111 00:51:22,580 --> 00:51:26,280 So a firewall can look at your envelopes' IP addresses and even port 1112 00:51:26,280 --> 00:51:29,384 numbers, and if Yale wanted to, it could just stop all of us 1113 00:51:29,384 --> 00:51:32,550 from even using FTP anymore, which would probably be a good thing because it 1114 00:51:32,550 --> 00:51:34,320 is indeed an insecure protocol. 1115 00:51:34,320 --> 00:51:37,620 Yale could even stop us from visiting the entirety of the web 1116 00:51:37,620 --> 00:51:42,632 just by blocking all port traffic on number 80 as well. 1117 00:51:42,632 --> 00:51:43,840 So that might be another way. 1118 00:51:43,840 --> 00:51:45,740 And there's even fancier ways as well. 
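A firewall of the simple kind just described, one that only looks at the address and port number on each envelope, can be sketched as a short list of rules. The `Rule` class, the `allowed` function, and the blocked address are hypothetical; the address comes from a range reserved for documentation and does not belong to any real site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """A block rule; None in a field means 'match anything'."""
    dst_ip: Optional[str]
    dst_port: Optional[int]


BLOCKED = [
    Rule("203.0.113.10", 80),  # one specific server, web traffic only
    Rule(None, 21),            # FTP to any destination
]


def allowed(dst_ip: str, dst_port: int) -> bool:
    """Return False if any block rule matches the outgoing packet's envelope."""
    for rule in BLOCKED:
        ip_matches = rule.dst_ip is None or rule.dst_ip == dst_ip
        port_matches = rule.dst_port is None or rule.dst_port == dst_port
        if ip_matches and port_matches:
            return False
    return True


print(allowed("203.0.113.10", 80))   # blocked by the first rule
print(allowed("198.51.100.20", 21))  # blocked by the FTP rule
print(allowed("198.51.100.20", 80))  # no rule matches
```

Notice that the filter never opens the envelope: it decides purely from the destination address and port, which is what makes this kind of firewall so simple.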
1119 00:51:45,740 --> 00:51:47,770 >> But when you're traveling abroad for instance, 1120 00:51:47,770 --> 00:51:50,740 or if you're in an internet cafe, or if you're anywhere where there's 1121 00:51:50,740 --> 00:51:53,179 blockages or threats, what can you do? 1122 00:51:53,179 --> 00:51:56,220 Well, if you go down the street to Starbucks or you travel in an airport, 1123 00:51:56,220 --> 00:51:58,780 generally you can just hop on the Wi-Fi by choosing 1124 00:51:58,780 --> 00:52:04,631 like, JFK Wi-Fi or LaGuardia Wi-Fi, or Logan Airport Wi-Fi, or what not. 1125 00:52:04,631 --> 00:52:05,880 And it's not encrypted, right? 1126 00:52:05,880 --> 00:52:06,949 There's no padlock icon. 1127 00:52:06,949 --> 00:52:09,490 And you're probably not prompted for a username and password. 1128 00:52:09,490 --> 00:52:11,240 You're just prompted with some stupid form 1129 00:52:11,240 --> 00:52:15,260 to say like, I agree to use this only for 30 minutes, or something like that. 1130 00:52:15,260 --> 00:52:18,761 >> But there's no encryption between you and Starbucks Wi-Fi access 1131 00:52:18,761 --> 00:52:20,760 point, the things with the antennas on the wall. 1132 00:52:20,760 --> 00:52:24,840 There's no encryption between you and the airport's Wi-Fi signals. 1133 00:52:24,840 --> 00:52:29,060 >> And so technically, that creepy person sitting a few seats down from you 1134 00:52:29,060 --> 00:52:31,970 in Starbucks or at the airport could be, with the right software, 1135 00:52:31,970 --> 00:52:35,164 watching all of your wireless traffic on his or her laptop. 1136 00:52:35,164 --> 00:52:37,080 It's not that hard to put a laptop into what's 1137 00:52:37,080 --> 00:52:39,880 called promiscuous mode, which as the name suggests, 1138 00:52:39,880 --> 00:52:41,760 means you're kind of loose with the rules. 
1139 00:52:41,760 --> 00:52:44,740 And it just listens not only for traffic meant for it, 1140 00:52:44,740 --> 00:52:47,700 but also to everyone else's traffic within range. 1141 00:52:47,700 --> 00:52:50,550 >> And by that logic, it can see all of the packets of information 1142 00:52:50,550 --> 00:52:51,360 you're receiving. 1143 00:52:51,360 --> 00:52:53,510 And if those packets aren't encrypted, you 1144 00:52:53,510 --> 00:52:56,680 are putting yourself at risk of your emails, or your messages, 1145 00:52:56,680 --> 00:52:58,620 or anything else getting exposed. 1146 00:52:58,620 --> 00:53:01,220 >> So even if you're not abroad but you're just in Starbucks, 1147 00:53:01,220 --> 00:53:03,800 or you're on some random person's Wi-Fi that's not encrypted, 1148 00:53:03,800 --> 00:53:05,410 a VPN is a good thing. 1149 00:53:05,410 --> 00:53:07,410 A VPN is a virtual private network. 1150 00:53:07,410 --> 00:53:09,480 And it's a technology that allows you to have 1151 00:53:09,480 --> 00:53:14,560 an encrypted, a scrambled connection-- fancier than Caesar or Vigenere-- 1152 00:53:14,560 --> 00:53:17,420 between your laptop, or your phone, or your desktop, 1153 00:53:17,420 --> 00:53:22,460 and a server elsewhere, like a server on Yale's campus. 1154 00:53:22,460 --> 00:53:25,840 >> And if you're traveling abroad-- and in fact, you find this in hotels 1155 00:53:25,840 --> 00:53:26,560 all the time. 
1156 00:53:26,560 --> 00:53:28,580 And especially as aspiring computer scientists 1157 00:53:28,580 --> 00:53:32,090 where you guys might, as geeks, want to use ports other than 80, 1158 00:53:32,090 --> 00:53:35,770 and ports other than 443-- and in fact for problem set six, 1159 00:53:35,770 --> 00:53:39,280 we are going to play with multiple TCP ports just by choice-- a lot 1160 00:53:39,280 --> 00:53:42,940 of hotels, and shops, and networks just block that kind of stuff 1161 00:53:42,940 --> 00:53:45,970 because they somewhat naively, or ignorantly, just think 1162 00:53:45,970 --> 00:53:48,010 that no one needs those other ports. 1163 00:53:48,010 --> 00:53:51,150 >> And so by using a VPN can you circumvent those kinds of restrictions, 1164 00:53:51,150 --> 00:53:54,050 because what a VPN does is it allows you at Starbucks, 1165 00:53:54,050 --> 00:53:58,630 or the airport, or anywhere in the world to connect encryptedly to yale.edu, 1166 00:53:58,630 --> 00:54:02,950 to some server here on campus, and then tunnel, so to speak, 1167 00:54:02,950 --> 00:54:06,570 all of your traffic from wherever you are through Yale, at which point 1168 00:54:06,570 --> 00:54:08,720 it then goes to its final destination. 1169 00:54:08,720 --> 00:54:11,150 >> But by encrypting it, you avoid any of these kinds 1170 00:54:11,150 --> 00:54:15,380 of filters or the imposition that some local network has imposed. 1171 00:54:15,380 --> 00:54:17,980 And plus, you have a much more robust defense 1172 00:54:17,980 --> 00:54:19,730 against creepy people around you who might 1173 00:54:19,730 --> 00:54:21,300 be trying to listen in on your traffic. 1174 00:54:21,300 --> 00:54:24,591 There could still be creepy people here back home at Yale watching your traffic 1175 00:54:24,591 --> 00:54:28,440 as it comes out of the VPN, but at least you've pushed the threat farther away. 1176 00:54:28,440 --> 00:54:30,490 And it's here too, a trade off. 
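The tunneling idea can be illustrated with a toy model: the real envelope, with its true destination and port, is scrambled and placed inside an outer envelope addressed only to the VPN server. The scrambling function below is a placeholder and NOT real cryptography; actual VPNs use vetted encryption protocols. The names, the addresses (taken from documentation-only ranges), the shared key, and the choice of port 443 for the outer envelope are all assumptions made for this sketch.

```python
import hashlib
import json


def toy_keystream(key: bytes, length: int) -> bytes:
    """Stretch a key into `length` pseudo-random bytes. Toy only, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_scramble(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the original."""
    return bytes(a ^ b for a, b in zip(data, toy_keystream(key, len(data))))


def wrap(inner: dict, key: bytes, vpn_ip: str) -> dict:
    """Put the scrambled inner envelope inside an outer one addressed to the VPN."""
    body = json.dumps(inner).encode()
    return {"dst_ip": vpn_ip, "dst_port": 443, "payload": toy_scramble(key, body)}


def unwrap(outer: dict, key: bytes) -> dict:
    """What the VPN server does: unscramble and recover the inner envelope."""
    return json.loads(toy_scramble(key, outer["payload"]).decode())


key = b"shared-secret"                      # hypothetical key known to both ends
vpn_ip = "198.51.100.99"                    # hypothetical VPN server address
inner = {"dst_ip": "203.0.113.10", "dst_port": 21, "data": "hello"}

outer = wrap(inner, key, vpn_ip)
print(outer["dst_ip"], outer["dst_port"])   # all a local observer or firewall sees
print(unwrap(outer, key))                   # recovered at the VPN server
```

A local network that blocks port 21 sees only a packet headed to the VPN server, and the extra wrapping, unwrapping, and detour are where the cost in speed comes from.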
1177 00:54:30,490 --> 00:54:33,645 >> Now of course, if you are in China or even in the cafe, 1178 00:54:33,645 --> 00:54:35,770 and you're tunneling all your traffic through Yale, 1179 00:54:35,770 --> 00:54:37,590 what price are we paying perhaps? 1180 00:54:37,590 --> 00:54:38,272 >> AUDIENCE: Speed. 1181 00:54:38,272 --> 00:54:39,480 DAVID J. MALAN: Speed, right? 1182 00:54:39,480 --> 00:54:41,430 There's got to be some math or some fanciness involved 1183 00:54:41,430 --> 00:54:42,574 in the actual encryption. 1184 00:54:42,574 --> 00:54:44,990 There could be thousands of miles of distance or thousands 1185 00:54:44,990 --> 00:54:47,250 of miles of cables between you and Yale. 1186 00:54:47,250 --> 00:54:49,800 And it's really bad if you're in China, for instance, 1187 00:54:49,800 --> 00:54:51,650 and you want to visit a website in China. 1188 00:54:51,650 --> 00:54:54,230 And so your data is going to the US, and then back to China 1189 00:54:54,230 --> 00:54:56,620 just because you're encrypting it through this tunnel. 1190 00:54:56,620 --> 00:54:59,960 >> But it solves technical and work problems alike. 1191 00:54:59,960 --> 00:55:02,050 But it all boils down to these very simple ideas. 1192 00:55:02,050 --> 00:55:06,530 And Harvard, for those curious, has one here as well, at vpn.harvard.edu, 1193 00:55:06,530 --> 00:55:09,150 which operates just like Yale's. 1194 00:55:09,150 --> 00:55:12,580 >> So with all that said, why is this whole network useful? 1195 00:55:12,580 --> 00:55:14,080 And what can we start doing with it? 1196 00:55:14,080 --> 00:55:15,630 Well, let's make this now more real. 1197 00:55:15,630 --> 00:55:17,610 This is the acronym with which most of us 1198 00:55:17,610 --> 00:55:22,140 are probably super familiar-- HTTP-- which stands for hypertext transfer 1199 00:55:22,140 --> 00:55:22,950 protocol.
1200 00:55:22,950 --> 00:55:26,460 And this just means this is the language, the protocol 1201 00:55:26,460 --> 00:55:29,140 that web browsers and web servers speak. 1202 00:55:29,140 --> 00:55:31,437 >> The P in HTTP is indeed a protocol. 1203 00:55:31,437 --> 00:55:33,270 And a protocol is just a set of conventions. 1204 00:55:33,270 --> 00:55:36,690 We've seen IP-- internet protocol-- TCP-- transmission control 1205 00:55:36,690 --> 00:55:38,290 protocol-- and HTTP. 1206 00:55:38,290 --> 00:55:40,570 But what is this stupid thing of a protocol? 1207 00:55:40,570 --> 00:55:41,930 It's just a set of conventions. 1208 00:55:41,930 --> 00:55:43,760 >> So if I sort of come down here, and I want to greet you. 1209 00:55:43,760 --> 00:55:44,930 I would say hi, my name is David. 1210 00:55:44,930 --> 00:55:45,600 >> AUDIENCE: Luis. 1211 00:55:45,600 --> 00:55:46,475 >> DAVID J. MALAN: Luis. 1212 00:55:46,475 --> 00:55:49,360 We have this stupid human convention of shaking hands here. 1213 00:55:49,360 --> 00:55:50,570 But that's a protocol, right? 1214 00:55:50,570 --> 00:55:51,470 I extended my hand. 1215 00:55:51,470 --> 00:55:52,530 Luis extended his hand. 1216 00:55:52,530 --> 00:55:53,070 We did this. 1217 00:55:53,070 --> 00:55:54,790 And then complete, done. 1218 00:55:54,790 --> 00:55:58,100 >> And that's exactly the same spirit of a computer protocol 1219 00:55:58,100 --> 00:56:02,770 whereas in HTTP, what happens is this. 1220 00:56:02,770 --> 00:56:05,520 If you are the computer on the left here, and there is some web 1221 00:56:05,520 --> 00:56:07,230 server there on the right. 1222 00:56:07,230 --> 00:56:11,130 And the computer on the left wants to request information from that server. 1223 00:56:11,130 --> 00:56:13,140 It's kind of a bi-directional operation. 1224 00:56:13,140 --> 00:56:15,800 The browser on the left asks for some web page. 1225 00:56:15,800 --> 00:56:18,404 The server on the right responds with some web page.
1226 00:56:18,404 --> 00:56:20,570 And we'll see what form those take in just a moment. 1227 00:56:20,570 --> 00:56:24,311 >> And it turns out that those computers-- that browser and server, or client 1228 00:56:24,311 --> 00:56:25,310 and server, so to speak. 1229 00:56:25,310 --> 00:56:28,120 Much like a restaurant where the client is asking for something, 1230 00:56:28,120 --> 00:56:31,670 and the server is bringing him or her something-- GET 1231 00:56:31,670 --> 00:56:33,170 is kind of the operative word. 1232 00:56:33,170 --> 00:56:38,560 Literally inside of the envelope that my browser sends from here to a web 1233 00:56:38,560 --> 00:56:40,880 server is the word GET. 1234 00:56:40,880 --> 00:56:42,700 Like I want to get today's news. 1235 00:56:42,700 --> 00:56:45,370 I want to get my Facebook news feed, or I 1236 00:56:45,370 --> 00:56:47,330 want to get some page from the server. 1237 00:56:47,330 --> 00:56:50,760 >> Specifically, this is what's going on inside of that envelope. 1238 00:56:50,760 --> 00:56:53,810 So I, with Cole, essentially sent Cole a response. 1239 00:56:53,810 --> 00:56:56,750 If you imagine that Cole actually wanted a picture of Rick Astley, 1240 00:56:56,750 --> 00:57:00,700 he might have sent me a request similar in spirit to this. 1241 00:57:00,700 --> 00:57:04,670 Inside of his envelope to me, where I'm now playing the role of Google, 1242 00:57:04,670 --> 00:57:08,270 would be a request that literally says, GET, and then a forward slash-- 1243 00:57:08,270 --> 00:57:10,636 and you've probably seen forward slashes in URLs before. 1244 00:57:10,636 --> 00:57:13,260 It just means give me the default page, the default Rick Astley 1245 00:57:13,260 --> 00:57:14,560 picture in this case. 1246 00:57:14,560 --> 00:57:20,100 >> And by the way, Cole speaks the language HTTP version 1.1, or the protocol 1.1. 1247 00:57:20,100 --> 00:57:22,090 And it turns out there's an older version 1.0.
1248 00:57:22,090 --> 00:57:23,910 But computers tend to use 1.1. 1249 00:57:23,910 --> 00:57:27,840 >> The second line is a useful thing that we'll come back to perhaps before long. 1250 00:57:27,840 --> 00:57:31,900 But it's just a specification to me, the recipient, that the thing I want 1251 00:57:31,900 --> 00:57:33,586 is www.google.com. 1252 00:57:33,586 --> 00:57:36,340 Because it's very possible these days for dozens, 1253 00:57:36,340 --> 00:57:39,510 hundreds of websites with different domain names to all 1254 00:57:39,510 --> 00:57:40,735 live on the same server. 1255 00:57:40,735 --> 00:57:42,860 It's not going to be true so much in Google's case. 1256 00:57:42,860 --> 00:57:45,261 But in a smaller company's case, could absolutely be. 1257 00:57:45,261 --> 00:57:47,260 So Cole is just kind of putting in the envelope, 1258 00:57:47,260 --> 00:57:50,840 by the way, when this reaches your IP address on port 80, 1259 00:57:50,840 --> 00:57:54,450 just be sure that you know I want www.google.com, not 1260 00:57:54,450 --> 00:57:56,740 some other random website on the same server. 1261 00:57:56,740 --> 00:58:00,360 >> What I then respond to Cole with, at the end of the day, is a picture. 1262 00:58:00,360 --> 00:58:02,920 But atop that picture inside of the envelope 1263 00:58:02,920 --> 00:58:05,600 is actually some text, where I say, OK. 1264 00:58:05,600 --> 00:58:07,970 I speak HTTP version 1.1 also. 1265 00:58:07,970 --> 00:58:09,200 200. 1266 00:58:09,200 --> 00:58:11,730 Which is a status code that most of us have probably 1267 00:58:11,730 --> 00:58:14,185 never seen, because it means OK. 1268 00:58:14,185 --> 00:58:16,810 And this is good, because it means I am responding successfully 1269 00:58:16,810 --> 00:58:18,040 to Cole's request. 1270 00:58:18,040 --> 00:58:21,930 >> What numbers have you probably seen on the web that are not OK? 1271 00:58:21,930 --> 00:58:22,780 >> AUDIENCE: 404. 1272 00:58:22,780 --> 00:58:24,830 >> DAVID J.
MALAN: 404-- file not found. 1273 00:58:24,830 --> 00:58:27,520 So indeed, any time you've seen one of those annoying file not 1274 00:58:27,520 --> 00:58:31,010 found errors, because the web page is dead, 1275 00:58:31,010 --> 00:58:34,190 or because you mistyped a URL, that just means 1276 00:58:34,190 --> 00:58:37,600 that the little envelope that your computer received from the server 1277 00:58:37,600 --> 00:58:42,670 contained a message HTTP 1.1, 404-- not found. 1278 00:58:42,670 --> 00:58:44,930 That file or that request you made is not found. 1279 00:58:44,930 --> 00:58:48,660 >> Moreover, inside of the envelope typically is this line, content type. 1280 00:58:48,660 --> 00:58:51,080 Sometimes it's HTML, something we'll soon see. 1281 00:58:51,080 --> 00:58:52,225 Sometimes it's a JPEG. 1282 00:58:52,225 --> 00:58:53,100 Sometimes it's a GIF. 1283 00:58:53,100 --> 00:58:56,060 Sometimes it's a movie file, an audio file, any number of things. 1284 00:58:56,060 --> 00:59:00,059 So inside of the envelope is just a little hint as to what I am receiving. 1285 00:59:00,059 --> 00:59:03,100 There's other status codes too, some of which we'll explore in P set six, 1286 00:59:03,100 --> 00:59:05,890 and you'll stumble across in P set seven and/or eight. 1287 00:59:05,890 --> 00:59:08,580 But some here, like 404 we've seen. 1288 00:59:08,580 --> 00:59:11,700 Forbidden, 403, means like the permissions are wrong, 1289 00:59:11,700 --> 00:59:14,740 like you haven't kind of configured it correctly. 1290 00:59:14,740 --> 00:59:17,830 301 and 302, we rarely see visually. 1291 00:59:17,830 --> 00:59:19,150 But they mean redirect. 
1292 00:59:19,150 --> 00:59:21,650 Any time you've gone to one URL and you've been magically 1293 00:59:21,650 --> 00:59:24,410 sent somewhere else, that's because the server has sent back 1294 00:59:24,410 --> 00:59:27,210 an envelope containing the number 301 or 302, 1295 00:59:27,210 --> 00:59:30,790 and the URL that it wants your browser to go to instead. 1296 00:59:30,790 --> 00:59:32,010 >> 500 is horrible. 1297 00:59:32,010 --> 00:59:34,842 You'll see it before long, probably in P set six or P set seven. 1298 00:59:34,842 --> 00:59:37,050 And it generally means there's some bug in your code, 1299 00:59:37,050 --> 00:59:40,000 because indeed we'll be writing code that responds to web requests. 1300 00:59:40,000 --> 00:59:42,110 And you've just got some error in logic or syntax, 1301 00:59:42,110 --> 00:59:43,820 and the server can't handle it. 1302 00:59:43,820 --> 00:59:47,460 >> So let's see how we can now leverage and understand these requests as follows. 1303 00:59:47,460 --> 00:59:50,716 If I go to, let's say, google.com. 1304 00:59:50,716 --> 00:59:55,240 Let me go to www.google.com. 1305 00:59:55,240 --> 01:00:01,220 And for demonstration's sake, let's see, I need to go to Settings here. 1306 01:00:01,220 --> 01:00:03,180 I'm going to go to Search Settings. 1307 01:00:03,180 --> 01:00:08,236 And Google has increasingly annoying features, but useful features. 1308 01:00:08,236 --> 01:00:11,110 So Google has this thing like instant results where you start typing, 1309 01:00:11,110 --> 01:00:12,889 and automatically things start appearing. 1310 01:00:12,889 --> 01:00:14,680 And that's all fine and technically useful, 1311 01:00:14,680 --> 01:00:16,420 and we'll understand before long how this works. 1312 01:00:16,420 --> 01:00:18,429 But for now, I'm turning off instant results, 1313 01:00:18,429 --> 01:00:20,720 because I want my browser to sort of work old school so 1314 01:00:20,720 --> 01:00:22,940 that I can see what's going on.
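The request and response envelopes described above can be sketched as plain text. What follows is a minimal illustration in Python rather than the course's C. It assumes only the pieces named in the lecture: a GET line, a Host line, and a status line carrying codes such as 200, 301, 302, 403, 404, and 500. The helper names build_get_request and parse_status_line are hypothetical, and nothing here opens a network connection.

```python
# Status codes named in the lecture, with their standard reason phrases.
REASONS = {
    200: "OK",
    301: "Moved Permanently",
    302: "Found",
    403: "Forbidden",
    404: "Not Found",
    500: "Internal Server Error",
}


def build_get_request(host, path="/"):
    """Return the text a browser puts inside the envelope for a GET."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, requested path, protocol version
        f"Host: {host}",         # which website on this server is wanted
        "",                      # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response):
    """Split the first line of a response into (version, code, reason)."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_get_request("www.google.com"))
    # A hand-written response, standing in for what a server might send back.
    fake_response = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n"
    version, code, reason = parse_status_line(fake_response)
    print(version, code, REASONS.get(code, reason))
```

The status line is the first thing inside the response envelope, which is why a browser can decide to show an error page or follow a redirect before looking at any of the content.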
1315 01:00:22,940 --> 01:00:23,840 >> So now I'm back here. 1316 01:00:23,840 --> 01:00:25,090 And I want to search for cats. 1317 01:00:25,090 --> 01:00:30,351 And notice I'm seeing some suggestions, some very benign suggestions 1318 01:00:30,351 --> 01:00:30,850 thankfully. 1319 01:00:30,850 --> 01:00:34,730 And now if I hit Enter, let's see what happens. 1320 01:00:34,730 --> 01:00:35,850 >> So there are some cats. 1321 01:00:35,850 --> 01:00:37,540 And the top hit is on Wikipedia. 1322 01:00:37,540 --> 01:00:39,820 But today we care about the technology up here. 1323 01:00:39,820 --> 01:00:42,479 So the URL to which I've been sent is this here. 1324 01:00:42,479 --> 01:00:44,520 And there's some stuff I don't really understand. 1325 01:00:44,520 --> 01:00:47,430 >> So I'm going to go ahead, because I kind of know how Google works, 1326 01:00:47,430 --> 01:00:50,700 and I'm going to distill this URL into its simplest form. 1327 01:00:50,700 --> 01:00:52,510 And now I'm going to hit Enter again. 1328 01:00:52,510 --> 01:00:53,360 And it still works. 1329 01:00:53,360 --> 01:00:55,800 I have a page of results all about cats. 1330 01:00:55,800 --> 01:00:58,460 >> But notice the simplicity of my URL. 1331 01:00:58,460 --> 01:01:00,820 It turns out this is how much of the web works. 1332 01:01:00,820 --> 01:01:03,500 The web is just a whole bunch of computers 1333 01:01:03,500 --> 01:01:05,320 running software that take input. 1334 01:01:05,320 --> 01:01:07,480 It's not GetString-style input. 1335 01:01:07,480 --> 01:01:09,670 It's not command line arguments like we're used to. 1336 01:01:09,670 --> 01:01:13,449 They take input, these web servers, by way of the URLs quite often.
1337 01:01:13,449 --> 01:01:15,240 And any time you've searched for something, 1338 01:01:15,240 --> 01:01:17,448 any time you've logged into Facebook, any time you've 1339 01:01:17,448 --> 01:01:20,090 done anything interactive with a web page, what you're doing 1340 01:01:20,090 --> 01:01:24,340 is effectively submitting a form, so to speak-- text boxes, check boxes, 1341 01:01:24,340 --> 01:01:27,880 little circles, and whatnot that send information from you to the server. 1342 01:01:27,880 --> 01:01:31,960 >> And it turns out that the web server knows to look at that URL 1343 01:01:31,960 --> 01:01:36,040 and parse it, like look at it character by character looking for anything 1344 01:01:36,040 --> 01:01:38,000 interesting after a question mark. 1345 01:01:38,000 --> 01:01:40,910 Because after a question mark, it turns out, is going to come 1346 01:01:40,910 --> 01:01:42,730 a bunch of key value pairs. 1347 01:01:42,730 --> 01:01:44,570 I mean key=value. 1348 01:01:44,570 --> 01:01:48,130 And then if there's multiple-- maybe an ampersand, some other key=value, 1349 01:01:48,130 --> 01:01:50,200 ampersand, key=value. 1350 01:01:50,200 --> 01:01:54,560 >> So we've kind of seen this idea before where something has a value. 1351 01:01:54,560 --> 01:01:55,880 It's just a new format here. 1352 01:01:55,880 --> 01:01:59,040 And I just know, by convention, Google uses q for query. 1353 01:01:59,040 --> 01:02:02,670 And then if I want to search for dogs, I can manually search for dogs like that. 1354 01:02:02,670 --> 01:02:06,360 And then I'm apparently getting some search results involving dogs. 1355 01:02:06,360 --> 01:02:07,727 >> So that seems to be interesting. 1356 01:02:07,727 --> 01:02:10,060 And indeed, what's going on underneath the hood is this. 1357 01:02:10,060 --> 01:02:11,460 Let me do this. 1358 01:02:11,460 --> 01:02:13,620 This is a-- let's see. 1359 01:02:13,620 --> 01:02:16,320 Let me go back over here for just a moment. 
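The key=value pairs after the question mark can be pulled apart with a few lines of code. This is a minimal sketch in Python using the standard library's urllib.parse module. The /search path and the second parameter named lang are assumptions made for illustration and are not taken from the lecture; only q for query is.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_params(url):
    """Return the key=value pairs that follow the question mark in a URL."""
    query = urlsplit(url).query  # e.g. "q=cats"
    # parse_qs maps each key to a list, because a key may appear more than
    # once; this sketch keeps only the first value for each key.
    return {key: values[0] for key, values in parse_qs(query).items()}


def search_url(term):
    """Build a search URL by hand, the way the lecture edits the address bar."""
    # The /search path is an assumption for illustration.
    return "https://www.google.com/search?" + urlencode({"q": term})


if __name__ == "__main__":
    print(query_params("https://www.google.com/search?q=cats"))
    # "lang" is a made-up second key, showing how an ampersand separates pairs.
    print(query_params("https://www.google.com/search?q=dogs&lang=en"))
    print(search_url("dogs"))
```

A web server does essentially the same thing on its side: it splits the URL at the question mark, then at each ampersand, then at each equals sign.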
1360 01:02:16,320 --> 01:02:19,810 >> We'll see that there's other ways to submit information. 1361 01:02:19,810 --> 01:02:25,340 So if I'm logging into Facebook, or Gmail, or any other popular website, 1362 01:02:25,340 --> 01:02:30,720 it seems kind of bad if whatever I typed into the search box ends up in my URL, 1363 01:02:30,720 --> 01:02:32,850 in my browser's address bar. 1364 01:02:32,850 --> 01:02:33,690 Why? 1365 01:02:33,690 --> 01:02:35,395 Why is that mildly worrisome? 1366 01:02:35,395 --> 01:02:39,567 1367 01:02:39,567 --> 01:02:40,067 Yeah? 1368 01:02:40,067 --> 01:02:41,380 AUDIENCE: Type in a password. 1369 01:02:41,380 --> 01:02:41,720 DAVID J. MALAN: Yeah. 1370 01:02:41,720 --> 01:02:44,000 So what if what I've typed in is my password? 1371 01:02:44,000 --> 01:02:47,881 I kind of don't want it so obviously visible in my browser's address bar. 1372 01:02:47,881 --> 01:02:50,630 One, because my annoying roommate tends to watch over my shoulder, 1373 01:02:50,630 --> 01:02:53,980 and he or she can now see, even though it was bullets when I'm typing it in, 1374 01:02:53,980 --> 01:02:54,810 little circles. 1375 01:02:54,810 --> 01:02:56,010 Now it's in my address bar. 1376 01:02:56,010 --> 01:02:59,270 >> Moreover, what's true about stuff you tend to type in the address bar. 1377 01:02:59,270 --> 01:03:00,230 >> AUDIENCE: [INAUDIBLE] 1378 01:03:00,230 --> 01:03:01,505 >> DAVID J. MALAN: What's that? 1379 01:03:01,505 --> 01:03:02,630 AUDIENCE: It gets sent out. 1380 01:03:02,630 --> 01:03:03,510 DAVID J. MALAN: It gets sent out. 1381 01:03:03,510 --> 01:03:05,102 And also, it gets remembered. 1382 01:03:05,102 --> 01:03:07,060 Because the next time you type things up there, 1383 01:03:07,060 --> 01:03:10,440 often it autocompletes and it remembers what you've typed before. 
1384 01:03:10,440 --> 01:03:13,850 And so there's this veritable history that your sibling, or your roommate, 1385 01:03:13,850 --> 01:03:16,830 or whoever can walk through to pretty much see every website 1386 01:03:16,830 --> 01:03:19,760 you visited because it's logged in that address bar. 1387 01:03:19,760 --> 01:03:22,790 >> Moreover, suppose you want to upload a photo to Facebook. 1388 01:03:22,790 --> 01:03:26,520 How in the world are you going to put a photo in a URL? 1389 01:03:26,520 --> 01:03:30,217 >> Well it turns out you can do it in some way, but it's certainly non-obvious. 1390 01:03:30,217 --> 01:03:33,050 And so there's this other way of sending information in an envelope, 1391 01:03:33,050 --> 01:03:35,680 not via a GET, but via something called POST. 1392 01:03:35,680 --> 01:03:38,060 And in theory, it looks pretty much the same. 1393 01:03:38,060 --> 01:03:41,270 Instead of the word GET, we say POST, and then the same kind of format. 1394 01:03:41,270 --> 01:03:43,310 >> For instance, this is a screenshot of what 1395 01:03:43,310 --> 01:03:46,920 it might look like if I try logging into Facebook, which sends me to a file 1396 01:03:46,920 --> 01:03:51,230 called login.php, which is actually still to this day named as such. 1397 01:03:51,230 --> 01:03:53,910 It's the same filename Mark gave to it many years ago. 1398 01:03:53,910 --> 01:03:58,520 It is the program he wrote in PHP via which users can login to the website. 1399 01:03:58,520 --> 01:04:00,370 >> But you need to send some additional input. 
1400 01:04:00,370 --> 01:04:05,170 And rather than it going after the file name as it did before with cats-- 1401 01:04:05,170 --> 01:04:09,720 q=cats-- it can go lower in the request, deeper inside of the envelope if you 1402 01:04:09,720 --> 01:04:12,440 will where no one can see it, and where it does not end up 1403 01:04:12,440 --> 01:04:15,670 in the user's browser bar, and is therefore not remembered for people to snoop 1404 01:04:15,670 --> 01:04:16,290 around. 1405 01:04:16,290 --> 01:04:21,260 >> And so here my email address and my fake password actually go. 1406 01:04:21,260 --> 01:04:27,400 And if Facebook is using not HTTP, but HTTPS, 1407 01:04:27,400 --> 01:04:30,710 this will all be encrypted, scrambled, a la Caesar or Vigenere, 1408 01:04:30,710 --> 01:04:34,960 but more fancily so that no one can actually see this request. 1409 01:04:34,960 --> 01:04:38,120 >> And so indeed, any time you have a URL that starts with HTTPS, 1410 01:04:38,120 --> 01:04:39,560 it just means it's encrypted. 1411 01:04:39,560 --> 01:04:42,710 But at the end of the day, what's actually inside of these envelopes? 1412 01:04:42,710 --> 01:04:44,070 This was super low level. 1413 01:04:44,070 --> 01:04:46,240 And fortunately, we're not going to necessarily have 1414 01:04:46,240 --> 01:04:49,310 to go so low level every time to start writing interesting software. 1415 01:04:49,310 --> 01:04:51,060 We can start to take the ideas of weeks one 1416 01:04:51,060 --> 01:04:54,020 through five, assume that there is now this infrastructure that 1417 01:04:54,020 --> 01:04:57,160 lets us write software that operates on the web, 1418 01:04:57,160 --> 01:05:00,120 and it's going to allow us this coming week to start 1419 01:05:00,120 --> 01:05:01,840 looking at something called HTML. 1420 01:05:01,840 --> 01:05:04,750 This is the stuff that is even deeper inside of the envelope, 1421 01:05:04,750 --> 01:05:06,150 but it's the stuff we're going to start writing.
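Stepping back to the POST request described a moment ago, here is a rough sketch in Python of how the same key=value pairs move out of the URL and into the body of the envelope. The field names email and password, the example address, and the helper name build_post_request are all hypothetical; only the login.php file name comes from the lecture. Nothing is actually sent anywhere.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Return the text of a POST whose form fields sit below the headers."""
    body = urlencode(fields)  # the same key=value&key=value format as a URL
    headers = [
        f"POST {path} HTTP/1.1",  # note: no ?key=value after the path
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


if __name__ == "__main__":
    # Made-up credentials, for illustration only.
    request = build_post_request(
        "www.facebook.com",
        "/login.php",
        {"email": "malan@example.com", "password": "12345"},
    )
    print(request)
```

Because the pairs are in the body rather than on the first line, they never appear in the address bar or the browser's history. They are still readable by anyone watching the network unless the connection uses HTTPS.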
1422 01:05:06,150 --> 01:05:08,020 And it's the stuff more interestingly, we're 1423 01:05:08,020 --> 01:05:11,420 going to write programs that start generating automatically 1424 01:05:11,420 --> 01:05:15,410 so that our websites are not hard coded, but take input and produce output. 1425 01:05:15,410 --> 01:05:18,810 >> This is perhaps the simplest web page you can make in the world. 1426 01:05:18,810 --> 01:05:23,000 I can indeed open up something stupid like TextEdit 1427 01:05:23,000 --> 01:05:26,160 on my Mac, which just gives me a simple text window like this. 1428 01:05:26,160 --> 01:05:29,510 PC users have Notepad.exe, which is very similar in spirit. 1429 01:05:29,510 --> 01:05:33,212 >> And I can literally type out this-- DOCTYPE HTML, 1430 01:05:33,212 --> 01:05:34,420 which looks a little cryptic. 1431 01:05:34,420 --> 01:05:35,850 But we'll come back to that. 1432 01:05:35,850 --> 01:05:38,730 HTML, with these weird angled brackets and slashes, 1433 01:05:38,730 --> 01:05:42,240 inside of which now I'm going to say here comes the head of my web page. 1434 01:05:42,240 --> 01:05:45,220 Inside of that, I just know, and you'll soon know, 1435 01:05:45,220 --> 01:05:47,850 that I can put the title of my web page. 1436 01:05:47,850 --> 01:05:49,720 And then below the head of the web page is 1437 01:05:49,720 --> 01:05:51,972 going to go the so-called body of the web page. 1438 01:05:51,972 --> 01:05:54,180 And I'm just indenting just like in C to kind of keep 1439 01:05:54,180 --> 01:05:57,620 things nicely readable stylistically. 1440 01:05:57,620 --> 01:06:04,745 And now I'm going to save this as a file on my desktop, called hello.html. 1441 01:06:04,745 --> 01:06:06,770 >> And I'm going to tell it yes, use HTML. 1442 01:06:06,770 --> 01:06:09,690 Don't change it to .txt, even though all this is a text file, 1443 01:06:09,690 --> 01:06:12,130 just like a C program written with a text editor.
1444 01:06:12,130 --> 01:06:15,080 Although not in CS50 IDE at the moment, just here on my Mac. 1445 01:06:15,080 --> 01:06:18,490 >> And if I now go to my desktop, you'll see hello.html. 1446 01:06:18,490 --> 01:06:20,720 If I double click this, it will open Chrome. 1447 01:06:20,720 --> 01:06:23,260 And even though this file happens to live on my desktop, 1448 01:06:23,260 --> 01:06:26,550 that is perhaps the simplest web page I could make. 1449 01:06:26,550 --> 01:06:30,080 >> Notice that the title of the tab way up top is hello world. 1450 01:06:30,080 --> 01:06:32,470 The body of the web page is indeed hello world. 1451 01:06:32,470 --> 01:06:35,830 And all I've done to get to this point is implement, 1452 01:06:35,830 --> 01:06:38,342 or is write a new language, called HTML. 1453 01:06:38,342 --> 01:06:40,300 It's not a programming language like C. There's 1454 01:06:40,300 --> 01:06:42,508 not going to be conditions, and loops, and functions. 1455 01:06:42,508 --> 01:06:46,560 It's a markup language, in which case you just tell the receiving 1456 01:06:46,560 --> 01:06:48,410 program what you want to do. 1457 01:06:48,410 --> 01:06:51,195 This means hey browser, here comes an HTML page. 1458 01:06:51,195 --> 01:06:53,040 Hey browser, here comes the head of my page. 1459 01:06:53,040 --> 01:06:55,130 Hey browser, here comes the body of my page. 1460 01:06:55,130 --> 01:06:57,100 Hey browser, that's it for the body. 1461 01:06:57,100 --> 01:06:59,350 That's it for the HTML page. 1462 01:06:59,350 --> 01:07:03,560 >> And with those simple definitions alone, we'll soon see that one, 1463 01:07:03,560 --> 01:07:05,122 we can represent this as a tree. 1464 01:07:05,122 --> 01:07:06,080 But more on that later. 1465 01:07:06,080 --> 01:07:08,788 So this will all interconnect to our most recent data structures. 1466 01:07:08,788 --> 01:07:12,460 Two, we'll introduce this stupid joke. 
1467 01:07:12,460 --> 01:07:15,680 This is an actual tattoo that this guy had on his neck. 1468 01:07:15,680 --> 01:07:19,660 It's probably funny the first week or two, and thereafter, maybe not so much. 1469 01:07:19,660 --> 01:07:22,960 >> But HTML, and even the web page I just made, super mind 1470 01:07:22,960 --> 01:07:25,670 numbingly disappointing-- just saying hello world 1471 01:07:25,670 --> 01:07:27,210 in black text on a white background. 1472 01:07:27,210 --> 01:07:28,680 Surely we can do much better. 1473 01:07:28,680 --> 01:07:31,552 And we'll do so by introducing another language called CSS. 1474 01:07:31,552 --> 01:07:34,760 This too not a programming language-- no loops, and conditions, or for loops, 1475 01:07:34,760 --> 01:07:38,470 but really, just syntax by which we can say, make this text big. 1476 01:07:38,470 --> 01:07:39,415 Make this text small. 1477 01:07:39,415 --> 01:07:40,040 Right align it. 1478 01:07:40,040 --> 01:07:40,650 Left align it. 1479 01:07:40,650 --> 01:07:41,195 Make it pink. 1480 01:07:41,195 --> 01:07:41,820 Make it purple. 1481 01:07:41,820 --> 01:07:42,650 Make it blue. 1482 01:07:42,650 --> 01:07:44,860 Or do any number of other visual effects. 1483 01:07:44,860 --> 01:07:48,590 And so we'll see how to start stylizing web pages so that they look in a manner 1484 01:07:48,590 --> 01:07:50,480 closer to what we want. 1485 01:07:50,480 --> 01:07:56,930 >> And lastly, we have indeed ruined perhaps much of TV and film for you. 1486 01:07:56,930 --> 01:07:58,930 I thought we'd end here with our final seconds 1487 01:07:58,930 --> 01:08:03,700 on a final clip that shows you how hacking on the internet works. 1488 01:08:03,700 --> 01:08:06,250 If we could dim the lights one final time. 1489 01:08:06,250 --> 01:08:07,250 >> [VIDEO PLAYBACK] 1490 01:08:07,250 --> 01:08:09,520 >> -No way. 1491 01:08:09,520 --> 01:08:10,650 I'm getting hacked. 1492 01:08:10,650 --> 01:08:11,770 >> -Okorsky? 
1493 01:08:11,770 --> 01:08:14,230 >> -No-- no, this is major. 1494 01:08:14,230 --> 01:08:17,074 They've already burned through the NCIS public firewall. 1495 01:08:17,074 --> 01:08:19,990 -Well, isolate the node and dump them on the other side of the router. 1496 01:08:19,990 --> 01:08:20,990 -I'm trying. 1497 01:08:20,990 --> 01:08:23,990 It's moving too fast. 1498 01:08:23,990 --> 01:08:25,179 >> -Oh, this is not good. 1499 01:08:25,179 --> 01:08:27,470 They're using our connection [INAUDIBLE] this database. 1500 01:08:27,470 --> 01:08:28,458 Sever it. 1501 01:08:28,458 --> 01:08:28,958 -I can't. 1502 01:08:28,958 --> 01:08:29,454 It's a point attack. 1503 01:08:29,454 --> 01:08:31,438 He or she is only going after my machine. 1504 01:08:31,438 --> 01:08:32,430 >> -It's not possible. 1505 01:08:32,430 --> 01:08:33,847 There's DOD level nine encryption. 1506 01:08:33,847 --> 01:08:35,055 It would take months to get-- 1507 01:08:35,055 --> 01:08:35,857 -Hey, what is that? 1508 01:08:35,857 --> 01:08:36,398 A video game? 1509 01:08:36,398 --> 01:08:37,886 >> -No Tony, we're getting hacked. 1510 01:08:37,886 --> 01:08:40,795 >> -If they get in Abby's computer, the entire NCIS network is next. 1511 01:08:40,795 --> 01:08:42,050 >> -I can't stop him. 1512 01:08:42,050 --> 01:08:43,050 Do something McGee. 1513 01:08:43,050 --> 01:08:44,550 >> -I've never seen code like this. 1514 01:08:44,550 --> 01:08:47,432 1515 01:08:47,432 --> 01:08:48,571 -Oh. 1516 01:08:48,571 --> 01:08:49,196 -Where's it go? 1517 01:08:49,196 --> 01:08:50,604 Abby? 1518 01:08:50,604 --> 01:08:51,520 -I didn't do anything. 1519 01:08:51,520 --> 01:08:53,020 I thought you did. 1520 01:08:53,020 --> 01:08:54,520 >> -No. 1521 01:08:54,520 --> 01:08:57,232 >> -I did. 1522 01:08:57,232 --> 01:08:58,307 >> [END PLAYBACK] 1523 01:08:58,307 --> 01:09:00,390 DAVID J. MALAN: The best part is two people typing 1524 01:09:00,390 --> 01:09:02,170 on the keyboard at the same time.
1525 01:09:02,170 --> 01:09:03,200 >> So that's it for CS50. 1526 01:09:03,200 --> 01:09:04,700 We'll stick around for office hours. 1527 01:09:04,700 --> 01:09:06,344 And we'll see you next time. 1528 01:09:06,344 --> 01:09:07,760 [MUSIC PLAYING - "SEINFELD THEME"] 1529 01:09:07,760 --> 01:09:11,139 1530 01:09:11,139 --> 01:09:11,680 This is CS50. 1531 01:09:11,680 --> 01:09:17,960 1532 01:09:17,960 --> 01:09:20,854 I don't want to be a pirate. 1533 01:09:20,854 --> 01:09:21,770 SPEAKER 2: Yarr David. 1534 01:09:21,770 --> 01:09:23,700 It is a fine doublet you be wearing. 1535 01:09:23,700 --> 01:09:26,450 Lot of luff in that puff. 1536 01:09:26,450 --> 01:09:29,327