1 00:00:00,000 --> 00:00:11,214 >> [MUSIC] 2 00:00:11,214 --> 00:00:11,661 >> DAVID J. MALAN: All right. 3 00:00:11,661 --> 00:00:15,400 So this is CS50 and this is the end of week 10. 4 00:00:15,400 --> 00:00:20,420 So some of you might have seen this already, but being circulated of late 5 00:00:20,420 --> 00:00:25,800 is an article that I thought I'd read an excerpt from and then show you a 6 00:00:25,800 --> 00:00:27,800 three minute video that paints the same picture. 7 00:00:27,800 --> 00:00:30,950 It was really a touching story, I thought, of this intersection of the 8 00:00:30,950 --> 00:00:35,210 real world with genuinely compelling uses of technology. 9 00:00:35,210 --> 00:00:39,785 >> So the article was entitled, "A boy oversleeps on train, uses Google Maps 10 00:00:39,785 --> 00:00:44,930 to find family 25 years later." And the first couple of paragraphs were, 11 00:00:44,930 --> 00:00:48,820 "When Saroo was five years old he went with his older brother to scrounge for 12 00:00:48,820 --> 00:00:51,830 change on a passenger train in a town about two hours 13 00:00:51,830 --> 00:00:53,510 from his small hometown. 14 00:00:53,510 --> 00:00:56,790 Saroo became tired and hopped on a nearby train where he thought his 15 00:00:56,790 --> 00:00:58,880 brother was, then fell asleep. 16 00:00:58,880 --> 00:01:03,360 When he woke up he was in Calcutta, nearly 900 miles away. 17 00:01:03,360 --> 00:01:05,770 Saroo tried to find his way back, but he didn't know 18 00:01:05,770 --> 00:01:07,260 the name of his hometown. 19 00:01:07,260 --> 00:01:11,430 And as a tiny illiterate boy in a vast city full of forgotten children he had 20 00:01:11,430 --> 00:01:13,520 virtually no chance of getting home. 21 00:01:13,520 --> 00:01:16,760 >> He was a street child for a while until a local adoption agency hooked 22 00:01:16,760 --> 00:01:18,840 him up with an Australian couple who brought him to 23 00:01:18,840 --> 00:01:20,600 live in Hobart, Tasmania. 
24 00:01:20,600 --> 00:01:23,130 Saroo moved there, learned English, and grew up. 25 00:01:23,130 --> 00:01:27,450 But he never stopped looking for his family and his hometown. 26 00:01:27,450 --> 00:01:32,380 >> Decades later, he discovered Google Earth and followed rail tracks. 27 00:01:32,380 --> 00:01:36,140 And giving himself a prescribed radius based on how long he thought he was 28 00:01:36,140 --> 00:01:40,020 asleep and how fast he thought the train was going, he knew he'd grown up 29 00:01:40,020 --> 00:01:43,930 in a warm climate, he knew he spoke Hindi as a child, and he'd been told 30 00:01:43,930 --> 00:01:46,160 that he looked like he was from East India. 31 00:01:46,160 --> 00:01:49,650 >> Finally, after years of scouring the satellite photos, he 32 00:01:49,650 --> 00:01:51,340 recognized a few landmarks. 33 00:01:51,340 --> 00:01:54,180 And after chatting with an administrator of a nearby town's 34 00:01:54,180 --> 00:01:57,740 Facebook page, he realized he'd found home." 35 00:01:57,740 --> 00:02:03,770 >> So here then is the video telling that tale from his perspective. 36 00:02:03,770 --> 00:02:04,025 >> [VIDEO PLAYBACK] 37 00:02:04,025 --> 00:02:07,480 >> -It was 26 years ago and I was just about to turn five. 38 00:02:07,480 --> 00:02:10,539 We got to the train station and we boarded a train together. 39 00:02:10,539 --> 00:02:13,390 My brother just said I'll stay here and I'll come back. 40 00:02:13,390 --> 00:02:16,363 And I just thought, well, you know, I might as well just go to sleep and 41 00:02:16,363 --> 00:02:17,950 then he'll just wake me up. 42 00:02:17,950 --> 00:02:21,740 And when I wake up the next day, the whole carriage was empty on a runaway 43 00:02:21,740 --> 00:02:24,305 train, a ghost train taking me I don't know where. 44 00:02:24,305 --> 00:02:27,120 45 00:02:27,120 --> 00:02:31,660 >> I was adopted out to Australia to a Australian family. 
46 00:02:31,660 --> 00:02:35,360 And Mom had decorated my room with the map of India, which she 47 00:02:35,360 --> 00:02:37,090 put next to my bedside. 48 00:02:37,090 --> 00:02:42,170 I woke up every morning seeing that map, and hence, it sort of kept the 49 00:02:42,170 --> 00:02:43,740 memories alive. 50 00:02:43,740 --> 00:02:46,475 >> People would say, you're trying to find a needle in a haystack. 51 00:02:46,475 --> 00:02:49,060 Saroo, you'll never find it. 52 00:02:49,060 --> 00:02:52,510 I'd have flashes of the places that I used to go, the flashes 53 00:02:52,510 --> 00:02:55,050 of my family's faces. 54 00:02:55,050 --> 00:02:59,200 There was the image of my mother sitting down with her legs crossed 55 00:02:59,200 --> 00:03:00,610 just watching her cry. 56 00:03:00,610 --> 00:03:03,340 Life is just so hard. 57 00:03:03,340 --> 00:03:06,002 That was my treasure. 58 00:03:06,002 --> 00:03:09,390 >> And I was looking in Google Map and realized there's Google Earth as well. 59 00:03:09,390 --> 00:03:13,560 In a world where you could zoom into I started to have all these thoughts and 60 00:03:13,560 --> 00:03:16,650 what possibilities that this could do for me. 61 00:03:16,650 --> 00:03:19,520 I said to myself, well, you know, you've got all the photographic 62 00:03:19,520 --> 00:03:22,340 memories and landmarks where you're from and you know what 63 00:03:22,340 --> 00:03:23,460 the town looks like. 64 00:03:23,460 --> 00:03:27,910 This could be an application that you can use to find your way back. 65 00:03:27,910 --> 00:03:32,750 >> I thought, well, I'll put a dot on Calcutta Train Station in a radius 66 00:03:32,750 --> 00:03:36,350 line that you should be searching around this area. 67 00:03:36,350 --> 00:03:38,850 I came across these train tracks. 68 00:03:38,850 --> 00:03:44,490 And I started following it and I came to a train station which reflected the 69 00:03:44,490 --> 00:03:48,260 same image that was in my memories. 
70 00:03:48,260 --> 00:03:49,730 >> Everything matched. 71 00:03:49,730 --> 00:03:50,800 I just thought, yep. 72 00:03:50,800 --> 00:03:51,545 I know where I'm going. 73 00:03:51,545 --> 00:03:55,387 I'm just going to let the map that I have in my head to lead me and take me 74 00:03:55,387 --> 00:03:58,230 back to my hometown. 75 00:03:58,230 --> 00:04:02,290 >> I came to the doorstep of the house that I was born and walked around 76 00:04:02,290 --> 00:04:04,270 about fifteen meters around the corner. 77 00:04:04,270 --> 00:04:08,140 There was three ladies standing outside adjacent to each other. 78 00:04:08,140 --> 00:04:10,230 And the middle one stepped forward. 79 00:04:10,230 --> 00:04:12,910 And I just thought, this is your mother. 80 00:04:12,910 --> 00:04:18,590 She came forward, she hugged me, and we were there for about five minutes. 81 00:04:18,590 --> 00:04:21,670 82 00:04:21,670 --> 00:04:25,787 >> She grabbed my hand and she took me to the house and got on the phone and she 83 00:04:25,787 --> 00:04:31,110 rang my sister and my brother to say that your brother has just all of the 84 00:04:31,110 --> 00:04:34,480 sudden appeared like a ghost. 85 00:04:34,480 --> 00:04:37,590 >> And then the family was reunited again. 86 00:04:37,590 --> 00:04:38,570 Everything's all good. 87 00:04:38,570 --> 00:04:40,250 I help my mother out. 88 00:04:40,250 --> 00:04:42,240 She doesn't have to be slaving away. 89 00:04:42,240 --> 00:04:45,040 She can lead the rest of her life in peace. 90 00:04:45,040 --> 00:04:48,590 >> It was a needle in a haystack, but the needle was there. 91 00:04:48,590 --> 00:04:49,530 Everything's there. 92 00:04:49,530 --> 00:04:53,410 Everything we have in the world is the tap of a button. 93 00:04:53,410 --> 00:04:57,375 But you've got to have the will and the determination to wanting it. 94 00:04:57,375 --> 00:05:02,310 95 00:05:02,310 --> 00:05:02,780 >> [END VIDEO PLAYBACK] 96 00:05:02,780 --> 00:05:04,220 >> So a really sweet story. 
97 00:05:04,220 --> 00:05:08,430 And it actually reminds me of quite a topic that's been getting quite a bit 98 00:05:08,430 --> 00:05:11,200 of attention of late in The Crimson, more nationally in general. 99 00:05:11,200 --> 00:05:13,620 Especially as MOOCs are taking the stage of late. 100 00:05:13,620 --> 00:05:17,370 MOOCs being these massive and open online courses of which CS50 is one. 101 00:05:17,370 --> 00:05:20,680 >> And people talking about how, for instance, the humanities aren't really 102 00:05:20,680 --> 00:05:23,900 catching up or aren't nearly as in vogue as they once were. 103 00:05:23,900 --> 00:05:26,680 And I would encourage you guys, much like Jonathan did on Monday, to think 104 00:05:26,680 --> 00:05:29,900 about as you exit 50, and we know already about 50% of you will not 105 00:05:29,900 --> 00:05:32,480 continue on to take another computer science course, and that's totally 106 00:05:32,480 --> 00:05:33,770 fine and expected. 107 00:05:33,770 --> 00:05:36,620 Because one of the overarching goals of a class like this is really to 108 00:05:36,620 --> 00:05:39,790 empower you guys with just an understanding of how all of this stuff 109 00:05:39,790 --> 00:05:41,760 works and how this world of technology works. 110 00:05:41,760 --> 00:05:45,400 >> So that when you are back in your own worlds, whether it's pre-med or 111 00:05:45,400 --> 00:05:48,270 whether it's the humanities or the social sciences or some other field 112 00:05:48,270 --> 00:05:51,830 altogether, that you guys are bringing some technical savvy to the table and 113 00:05:51,830 --> 00:05:54,770 helping to make smart decisions when it comes to the use of and 114 00:05:54,770 --> 00:05:57,530 introduction of technology into your world. 
115 00:05:57,530 --> 00:06:00,410 >> For instance I was reminded of late too of two of the undergraduate 116 00:06:00,410 --> 00:06:04,410 classes I took two years ago, which were such simple uses of technology 117 00:06:04,410 --> 00:06:06,180 but ever so compelling. 118 00:06:06,180 --> 00:06:08,845 First Nights with Professor Tom Kelly if you've taken the class. 119 00:06:08,845 --> 00:06:11,640 It's a class on classical music on this stage here where you learn a 120 00:06:11,640 --> 00:06:13,190 little something about music. 121 00:06:13,190 --> 00:06:17,770 It's actually First Nights that CS50 borrowed the idea of tracks for those 122 00:06:17,770 --> 00:06:20,630 less comfortable in between and more comfortable. 123 00:06:20,630 --> 00:06:24,410 >> In my time they had different tracks for kids with absolutely no music 124 00:06:24,410 --> 00:06:27,300 experience like me, and then kids who had been performing since they were 125 00:06:27,300 --> 00:06:28,240 five years old. 126 00:06:28,240 --> 00:06:31,200 And that class, for instance, just had a website like most any other, but it 127 00:06:31,200 --> 00:06:34,210 was a website that allowed you to explore music on it and play back 128 00:06:34,210 --> 00:06:39,120 musical clips from class, from the web, and just use technology in a very 129 00:06:39,120 --> 00:06:40,210 seamless way. 130 00:06:40,210 --> 00:06:44,460 >> Another class years later that I audited, essentially, in grad school, 131 00:06:44,460 --> 00:06:47,430 Anthro 1010, Introduction to Archaeology here. 132 00:06:47,430 --> 00:06:48,190 It was amazing. 133 00:06:48,190 --> 00:06:52,715 And one of the most compelling yet super obvious, in retrospect, uses of 134 00:06:52,715 --> 00:06:56,000 software was that the professors in that class used Google Earth. 135 00:06:56,000 --> 00:06:58,250 We were sitting across the street in some lecture hall. 
136 00:06:58,250 --> 00:07:01,240 And you couldn't travel, for instance, to the Middle East to the dig that one 137 00:07:01,240 --> 00:07:04,530 of the professors had just come back on, but we could do that virtually by 138 00:07:04,530 --> 00:07:07,870 flying around in Google Earth and looking at a bird's eye view at the 139 00:07:07,870 --> 00:07:10,360 dig site he had just returned from a week ago. 140 00:07:10,360 --> 00:07:12,630 >> So I would encourage you guys, especially in the humanities, to go 141 00:07:12,630 --> 00:07:16,260 back to those departments after this class bringing your final projects 142 00:07:16,260 --> 00:07:19,960 with you or ideas of your own, and see just what you can do to infuse your 143 00:07:19,960 --> 00:07:23,570 own fields in humanities or beyond with a little bit of this sort of 144 00:07:23,570 --> 00:07:26,770 thing that we've explored here in CS50. 145 00:07:26,770 --> 00:07:31,790 >> So with that picture painted, thought we'd try to tackle two things today. 146 00:07:31,790 --> 00:07:35,040 One, try to give you a sense of where you can go after 50. 147 00:07:35,040 --> 00:07:37,950 And in particular, if you choose to tackle a web based project as is 148 00:07:37,950 --> 00:07:42,580 incredibly common, how you can go about taking off all of CS50's 149 00:07:42,580 --> 00:07:45,810 training wheels and going out there on your own and not having to rely on a 150 00:07:45,810 --> 00:07:48,000 PDF or a specification of a pset? 151 00:07:48,000 --> 00:07:50,510 Not having to rely on a CS50 appliance anymore. 152 00:07:50,510 --> 00:07:52,780 But you can really pull yourself up by your bootstraps. 153 00:07:52,780 --> 00:07:55,790 >> With that said, C-based final projects are welcome. 154 00:07:55,790 --> 00:07:58,020 Things that use the Stanford Portable Library and 155 00:07:58,020 --> 00:07:59,510 graphics are welcome.
156 00:07:59,510 --> 00:08:03,240 We just know that statistically a lot of people bite off projects in PHP and 157 00:08:03,240 --> 00:08:07,860 Python and Ruby and MySQL and other environments, so we'll bias some of 158 00:08:07,860 --> 00:08:09,570 our remarks toward that. 159 00:08:09,570 --> 00:08:10,650 >> But a quick look back. 160 00:08:10,650 --> 00:08:15,940 So we took for granted in pset7 the fact that $_SESSION existed. 161 00:08:15,940 --> 00:08:19,400 This was a superglobal, a global associative array. 162 00:08:19,400 --> 00:08:23,040 And what does this let you do? 163 00:08:23,040 --> 00:08:27,130 Functionally, what's the feature this gives us? 164 00:08:27,130 --> 00:08:28,590 Yeah? 165 00:08:28,590 --> 00:08:30,270 To track the user's ID. 166 00:08:30,270 --> 00:08:31,660 And why is this useful? 167 00:08:31,660 --> 00:08:36,059 To be able to store inside of this superglobal JHarvard or [? Scroobs ?] 168 00:08:36,059 --> 00:08:41,880 or Malan's user ID when he or she visits a site. 169 00:08:41,880 --> 00:08:42,380 >> Exactly. 170 00:08:42,380 --> 00:08:44,049 So you don't have to log in again and again. 171 00:08:44,049 --> 00:08:47,170 It would be a really lame world wide web if every time you clicked a link 172 00:08:47,170 --> 00:08:50,780 on a site like Facebook or every time you clicked on an email in Gmail you 173 00:08:50,780 --> 00:08:54,060 had to re-authenticate to prove that it's still you and not your roommate 174 00:08:54,060 --> 00:08:56,700 who might have walked up to your computer in your absence. 175 00:08:56,700 --> 00:08:59,640 >> So we use SESSION to just remember who you are. 176 00:08:59,640 --> 00:09:01,830 And how is this implemented underneath the hood? 177 00:09:01,830 --> 00:09:07,720 How does a website that uses HTTP, the protocol that web browsers and servers 178 00:09:07,720 --> 00:09:12,060 speak, how does HTTP, which is a stateless protocol, let's say.
179 00:09:12,060 --> 00:09:15,510 >> And by stateless I mean, once you connect to a website, download some 180 00:09:15,510 --> 00:09:19,650 HTML, some JavaScript, some CSS, your browser's icon stops spinning. 181 00:09:19,650 --> 00:09:23,420 You don't have a constant connection to the server typically. 182 00:09:23,420 --> 00:09:24,170 That's it. 183 00:09:24,170 --> 00:09:26,290 There's no state maintained constantly. 184 00:09:26,290 --> 00:09:30,510 So how is SESSION implemented in such a way that every time you do visit a 185 00:09:30,510 --> 00:09:32,860 new page, the website remembers who you are? 186 00:09:32,860 --> 00:09:36,150 187 00:09:36,150 --> 00:09:38,195 What's the underlying implementation detail? 188 00:09:38,195 --> 00:09:40,810 189 00:09:40,810 --> 00:09:41,490 Shout it out. 190 00:09:41,490 --> 00:09:43,270 It's one word. 191 00:09:43,270 --> 00:09:43,640 >> Cookies. 192 00:09:43,640 --> 00:09:44,190 All right. 193 00:09:44,190 --> 00:09:44,800 So cookies. 194 00:09:44,800 --> 00:09:45,900 Well, how are cookies used? 195 00:09:45,900 --> 00:09:48,870 Well, recall that a cookie is generally just a piece of information. 196 00:09:48,870 --> 00:09:51,590 And it's often a big random number, but not always. 197 00:09:51,590 --> 00:09:55,420 And a cookie is planted on your hard drive or in your computer's RAM so 198 00:09:55,420 --> 00:09:59,070 that every time you revisit that same website, your browser reminds the 199 00:09:59,070 --> 00:10:01,650 server, I am user 1234567. 200 00:10:01,650 --> 00:10:03,570 I am user 1234567. 201 00:10:03,570 --> 00:10:07,590 >> And so long as the server has remembered that user 1234567 is 202 00:10:07,590 --> 00:10:11,300 JHarvard, the website will just assume that you are who you say you are. 203 00:10:11,300 --> 00:10:14,230 And recall that we present these cookies sort of in the form of a 204 00:10:14,230 --> 00:10:15,510 virtual hand stamp.
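To make the hand stamp concrete, here is a minimal sketch of the idea in Python rather than PHP, and it is not how PHP's $_SESSION is actually implemented. The names `sessions`, `log_in`, and `identify` are made up for illustration, and the "server" is just an in-memory dictionary. The server hands out a big random value in a Set-Cookie header, and on each later request it looks up whatever value the browser echoes back.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side table: opaque session ID -> user ID (in memory only).
sessions = {}


def log_in(user_id):
    """Create a session and return the Set-Cookie header value for the browser."""
    session_id = secrets.token_hex(16)  # big random value, hard to guess
    sessions[session_id] = user_id
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True  # hide the cookie from page scripts
    return cookie["session_id"].OutputString()


def identify(cookie_header):
    """Given the Cookie header of a later request, return the remembered user or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return sessions.get(morsel.value)


set_cookie = log_in("jharvard")
# The browser keeps the name=value pair and sends it back on every request.
echoed = set_cookie.split(";")[0]
print(identify(echoed))
```

The server never stores the user's name in the cookie itself. The cookie only carries the random stamp, and the mapping from stamp to user lives on the server.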
205 00:10:15,510 --> 00:10:20,530 It's sent in the HTTP headers just to remind the server that you are who it 206 00:10:20,530 --> 00:10:21,620 thinks you are. 207 00:10:21,620 --> 00:10:23,320 >> Of course, there's a threat. 208 00:10:23,320 --> 00:10:27,530 What threat does this open us up to if we're essentially using sort of a club 209 00:10:27,530 --> 00:10:30,110 or an amusement park mechanism for remembering who we are? 210 00:10:30,110 --> 00:10:32,630 211 00:10:32,630 --> 00:10:36,170 >> If you copy someone's cookie and hijack their session, so to speak, you 212 00:10:36,170 --> 00:10:39,670 can pretend to be someone else and the website most likely is just going to 213 00:10:39,670 --> 00:10:40,150 believe you. 214 00:10:40,150 --> 00:10:41,030 So we'll come back to that. 215 00:10:41,030 --> 00:10:44,240 Because the other theme for today beyond empowerment is also talking 216 00:10:44,240 --> 00:10:48,170 about the very scary world we live in and just how much of what you do on 217 00:10:48,170 --> 00:10:51,480 the web, how much of what you do even on your cell phones today can be 218 00:10:51,480 --> 00:10:55,170 tracked really by anyone between you and point B. 219 00:10:55,170 --> 00:10:56,240 >> And Ajax, recall. 220 00:10:56,240 --> 00:10:58,740 We looked only briefly at this, although you've been using it 221 00:10:58,740 --> 00:11:02,660 indirectly in pset8 because you're using Google Maps and because you're 222 00:11:02,660 --> 00:11:03,830 using Google Earth. 223 00:11:03,830 --> 00:11:07,780 Google Maps and Google Earth don't download the entire world to your 224 00:11:07,780 --> 00:11:10,490 desktop, obviously, the moment you load pset8. 225 00:11:10,490 --> 00:11:15,020 It only downloads a square of the world or a bigger square of the earth. 
226 00:11:15,020 --> 00:11:18,910 And then every time you sort of steer out of range you might notice-- 227 00:11:18,910 --> 00:11:21,790 especially if on a slow connection-- you might see some gray for a moment 228 00:11:21,790 --> 00:11:26,440 or a bit of fuzzy imagery as the computer downloads more such tiles, 229 00:11:26,440 --> 00:11:29,190 more such imagery from the world or the earth. 230 00:11:29,190 --> 00:11:34,620 >> And Ajax is generally the technique by which websites are doing that. 231 00:11:34,620 --> 00:11:39,250 Once you need more of the map, your browser is going to use Ajax, which is 232 00:11:39,250 --> 00:11:42,240 not itself a language or technology, it's just a technique. 233 00:11:42,240 --> 00:11:47,390 It's the use of JavaScript to go get more information from a server that 234 00:11:47,390 --> 00:11:52,320 allows your browser to go get what's to the east or what's to the west of 235 00:11:52,320 --> 00:11:55,110 what's otherwise currently being shown in that map. 236 00:11:55,110 --> 00:11:58,520 So this is a topic that many of you will encounter either directly or 237 00:11:58,520 --> 00:12:01,180 indirectly via final projects if you choose to make something that's 238 00:12:01,180 --> 00:12:05,020 similarly dynamic that's pulling data from some third party website. 239 00:12:05,020 --> 00:12:07,390 >> So we've got a really exciting next Wednesday ahead. 240 00:12:07,390 --> 00:12:12,280 Quiz one, the information for which is on CS50.net already. 241 00:12:12,280 --> 00:12:17,530 Know that there'll be a review session this coming Monday at 5:30. 242 00:12:17,530 --> 00:12:21,010 The date and time is already posted on CS50.net in that About sheet. 243 00:12:21,010 --> 00:12:22,940 And do let us know if you have any questions. 244 00:12:22,940 --> 00:12:25,230 Pset8 meanwhile is already in your hands. 245 00:12:25,230 --> 00:12:29,210 >> And let me just address one FAQ to save folks some stress.
246 00:12:29,210 --> 00:12:32,530 For the most part a lot of the chatter we see at office hours and a lot of 247 00:12:32,530 --> 00:12:36,950 the bugs we see reported on Discuss are indeed bugs in a student's code. 248 00:12:36,950 --> 00:12:41,360 But when you've encountered something like the Google Earth plug-in crashing 249 00:12:41,360 --> 00:12:44,310 or not even working and you are confident it's not you, it's not a 250 00:12:44,310 --> 00:12:48,530 chmod issue, it is not a bug you introduced into the 251 00:12:48,530 --> 00:12:49,820 distribution code. 252 00:12:49,820 --> 00:12:51,250 >> Realize just FYI-- 253 00:12:51,250 --> 00:12:53,130 this is sort of plan Z-- 254 00:12:53,130 --> 00:12:57,100 that the last time we used this problem set and we ran into similar 255 00:12:57,100 --> 00:13:01,520 issues, there's a line of code in service.js that essentially is this, 256 00:13:01,520 --> 00:13:03,580 that says, turn buildings on. 257 00:13:03,580 --> 00:13:07,100 And the workaround the last time we did this in, again, corner cases where 258 00:13:07,100 --> 00:13:11,660 students just couldn't get the darn thing to work is change true to false 259 00:13:11,660 --> 00:13:12,940 in that one line of code. 260 00:13:12,940 --> 00:13:15,520 And you'll find it if you search through service.js. 261 00:13:15,520 --> 00:13:19,990 >> I don't recommend this because you will create the most barren landscape 262 00:13:19,990 --> 00:13:21,720 of Cambridge, Massachusetts. 263 00:13:21,720 --> 00:13:24,930 This will literally flatten your world so that all you see are the teaching 264 00:13:24,930 --> 00:13:28,610 fellows and course assistants on the horizon and no buildings. 265 00:13:28,610 --> 00:13:31,980 But realize for whatever reason the Google Earth plug-in seems still to be 266 00:13:31,980 --> 00:13:35,290 buggy a year later, so this might be your fail-safe.
267 00:13:35,290 --> 00:13:38,915 So rather than resort to tears, resort to turning buildings off if you know 268 00:13:38,915 --> 00:13:41,980 it's the plug-in that's not cooperating on your Mac or PC. 269 00:13:41,980 --> 00:13:46,060 But, this is again last resort if you're sure it's not a bug. 270 00:13:46,060 --> 00:13:46,890 >> So the Hackathon. 271 00:13:46,890 --> 00:13:48,950 A couple of teasers just to get you excited. 272 00:13:48,950 --> 00:13:50,640 We had quite a few RSVPs. 273 00:13:50,640 --> 00:13:54,230 And just to paint a picture of what awaits, I thought I'd give you a few 274 00:13:54,230 --> 00:13:56,858 seconds recall of this imagery from last year. 275 00:13:56,858 --> 00:14:00,850 >> [MUSIC] 276 00:14:00,850 --> 00:14:02,240 >> DAVID J. MALAN: Wait, oh. 277 00:14:02,240 --> 00:14:05,410 We even have our literal CS50 shuttles. 278 00:14:05,410 --> 00:14:17,920 >> [MUSIC] 279 00:14:17,920 --> 00:14:20,620 >> DAVID J. MALAN: So that's what awaits you in terms of the Hackathon. 280 00:14:20,620 --> 00:14:24,180 And this will be an opportunity, to be clear, not to start your final 281 00:14:24,180 --> 00:14:27,730 projects but to continue working on your final projects alongside 282 00:14:27,730 --> 00:14:30,210 classmates and staff and lots of food. 283 00:14:30,210 --> 00:14:34,340 And again, if you're awake at 5:00 AM we'll take you down the road to IHOP. 284 00:14:34,340 --> 00:14:37,075 >> The CS50 fair, meanwhile, is the climax for the entire class where 285 00:14:37,075 --> 00:14:41,160 you'll bring your laptops and friends, maybe even family to a room on campus 286 00:14:41,160 --> 00:14:44,530 down the street to exhibit your projects on laptops, on tall tables 287 00:14:44,530 --> 00:14:47,570 like this with lots of food and friends and music in the background, 288 00:14:47,570 --> 00:14:49,250 as well as our friends from industry. 
289 00:14:49,250 --> 00:14:52,760 Companies like Facebook and Microsoft and Google and Amazon and bunches of 290 00:14:52,760 --> 00:14:55,750 others so that if interested in just hearing about the real world or 291 00:14:55,750 --> 00:14:59,570 chatting with folks about real world internship or full time opportunities, 292 00:14:59,570 --> 00:15:01,950 know that some of our friends from industry will be there. 293 00:15:01,950 --> 00:15:04,970 And a couple of pictures we can paint here are as follows. 294 00:15:04,970 --> 00:15:24,400 >> [MUSIC] 295 00:15:24,400 --> 00:15:24,920 >> DAVID J. MALAN: All right. 296 00:15:24,920 --> 00:15:27,060 So that then is the CS50 fair. 297 00:15:27,060 --> 00:15:31,780 So let's now proceed to tell a story that really will empower you hopefully 298 00:15:31,780 --> 00:15:33,230 for things like final projects. 299 00:15:33,230 --> 00:15:36,940 So a few little things to seed your mind, either for final projects 300 00:15:36,940 --> 00:15:40,470 or just more generally for projects that you might decide to tackle after 301 00:15:40,470 --> 00:15:45,720 the course, these are all documented on manual.cs50.net, the CS50 302 00:15:45,720 --> 00:15:48,010 manual, where we have lots of techniques documented. 303 00:15:48,010 --> 00:15:51,080 >> And this is just shorthand notation for saying that there exists in the 304 00:15:51,080 --> 00:15:55,190 world things called SMS to email gateways, which is a fancy way of 305 00:15:55,190 --> 00:15:58,180 saying, there's servers in the world that know how to convert emails to 306 00:15:58,180 --> 00:15:59,230 text messages.
307 00:15:59,230 --> 00:16:02,450 So if for your final project you want to create some sort of mobile themed 308 00:16:02,450 --> 00:16:06,650 service that allows you to alert friends or users to events on campus 309 00:16:06,650 --> 00:16:10,290 or what's being served in the D Hall that night or any such alert feature, 310 00:16:10,290 --> 00:16:15,150 know that it's as simple as sending an email as with PHPMailer which you 311 00:16:15,150 --> 00:16:18,735 might have used for pset7 or we saw briefly a week or so ago, to 312 00:16:18,735 --> 00:16:20,440 addresses like this. 313 00:16:20,440 --> 00:16:26,040 >> And in fact you can test this assuming your friend has an unlimited texting 314 00:16:26,040 --> 00:16:28,310 plan and you don't want to charge them $0.10. 315 00:16:28,310 --> 00:16:31,920 But if you send an email to your friend who you know to have Verizon or 316 00:16:31,920 --> 00:16:35,870 AT&T using Gmail and just sending it to their phone number at whatever the 317 00:16:35,870 --> 00:16:38,980 sub domain there is, realize you will send a text message. 318 00:16:38,980 --> 00:16:41,570 >> But this is one of those things to be careful of. 319 00:16:41,570 --> 00:16:47,430 If you troll through last year's CS50 videos I think it was, a horrific, 320 00:16:47,430 --> 00:16:51,660 horrific, horrific bug I wrote in code ended up sending about 20,000 text 321 00:16:51,660 --> 00:16:55,410 messages live to our students in class. 322 00:16:55,410 --> 00:16:57,970 And only because someone noticed that they were getting multiple text 323 00:16:57,970 --> 00:17:01,860 messages from me did I have the wherewithal to hit Control C quickly 324 00:17:01,860 --> 00:17:03,210 and stop that process. 325 00:17:03,210 --> 00:17:06,200 Control C, you recall, is your friend in instances of infinite loops.
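As a rough sketch of the gateway idea, in Python rather than PHPMailer: a text message here is just an email whose recipient is the phone number at the carrier's gateway domain. The addresses on the slide are not in this transcript, so `sms.example.com` and `alerts@example.com` below are placeholders, not real carrier or sender addresses, and `build_text`, `build_batch`, and `MAX_MESSAGES` are hypothetical names. The cap is there with the 20,000-message story in mind.

```python
from email.message import EmailMessage

# Placeholder gateway domain (an assumption); each carrier publishes its own.
GATEWAY_DOMAIN = "sms.example.com"
# Hard cap so a looping bug cannot queue thousands of texts.
MAX_MESSAGES = 100


def build_text(phone_number, body, sender="alerts@example.com"):
    """Build an email that a carrier's gateway would deliver as a text message."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{digits}@{GATEWAY_DOMAIN}"
    msg.set_content(body)
    return msg


def build_batch(phone_numbers, body):
    """Build one message per number, refusing suspiciously large batches."""
    if len(phone_numbers) > MAX_MESSAGES:
        raise ValueError(f"refusing to build more than {MAX_MESSAGES} messages")
    return [build_text(number, body) for number in phone_numbers]


batch = build_batch(["617-555-0100"], "Shuttle leaves in 5 minutes")
print(batch[0]["To"])
```

Nothing is sent here. Handing each message to a mail server, for example with `smtplib.SMTP.send_message`, would be the step that actually delivers it, and that is the step worth guarding with a cap like the one above.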
326 00:17:06,200 --> 00:17:10,900 So beware the power we have just given to you rather irresponsibly, most 327 00:17:10,900 --> 00:17:12,950 likely, based on my own experience. 328 00:17:12,950 --> 00:17:15,400 But that's on the web and has been there for some time. 329 00:17:15,400 --> 00:17:15,810 >> All right. 330 00:17:15,810 --> 00:17:17,064 So textmarks.com. 331 00:17:17,064 --> 00:17:18,040 So this is a website. 332 00:17:18,040 --> 00:17:20,829 And there's bunches of others out there as well that we've actually used 333 00:17:20,829 --> 00:17:24,050 as a class for years to be able to receive text messages. 334 00:17:24,050 --> 00:17:27,869 Unfortunately, sending text messages is easy as sending emails like that. 335 00:17:27,869 --> 00:17:30,730 Receiving's a little harder, especially if you want to have one of 336 00:17:30,730 --> 00:17:34,610 those sexy short codes that's only five or six digits long. 337 00:17:34,610 --> 00:17:37,720 >> So for instance, for years you've been able to send a text message-- and you 338 00:17:37,720 --> 00:17:39,200 can try this as well-- 339 00:17:39,200 --> 00:17:41,900 to 41411. 340 00:17:41,900 --> 00:17:44,300 And that's the phone number for this particular startup. 341 00:17:44,300 --> 00:17:48,130 And if you send a message to 41411-- 342 00:17:48,130 --> 00:17:51,190 I'll just write it up here, so 41411-- 343 00:17:51,190 --> 00:17:54,290 and then send them a message like SBOY for Shuttle Boy. 344 00:17:54,290 --> 00:17:56,370 And then type in something like mather quad. 345 00:17:56,370 --> 00:17:59,360 So you send that text message to that phone number. 346 00:17:59,360 --> 00:18:02,630 Within a few seconds you should get back a response from the CS50 Shuttle 347 00:18:02,630 --> 00:18:06,210 Boy service, which is the shuttle scheduling software that we've had out 348 00:18:06,210 --> 00:18:07,290 there on the web for some time. 349 00:18:07,290 --> 00:18:09,450 And it will respond to you via text message. 
350 00:18:09,450 --> 00:18:13,410 >> Because what we have done as a class, as a programmer, is to write software and 351 00:18:13,410 --> 00:18:18,760 configure our free account with TextMarks to listen for text messages sent 352 00:18:18,760 --> 00:18:20,770 to SBOY at that number. 353 00:18:20,770 --> 00:18:25,210 And what they do is forward those text messages to our PHP-based website as 354 00:18:25,210 --> 00:18:27,420 HTTP parameters saying, here. 355 00:18:27,420 --> 00:18:30,380 This user with this phone number sent you this text message. 356 00:18:30,380 --> 00:18:31,850 Do with it what you want. 357 00:18:31,850 --> 00:18:35,180 >> So we wrote some software that upon receiving a string like SBOY mather 358 00:18:35,180 --> 00:18:38,420 quad, we parse it. 359 00:18:38,420 --> 00:18:41,210 We figure out where the spaces are between words. 360 00:18:41,210 --> 00:18:44,220 And we as a class decide how to respond to that. 361 00:18:44,220 --> 00:18:47,335 And if you try that now, for instance, you should see, via response within a 362 00:18:47,335 --> 00:18:51,470 few seconds, the next few shuttles going from Mather to the Quad if any. 363 00:18:51,470 --> 00:18:52,260 And there's other stops. 364 00:18:52,260 --> 00:18:56,060 You can type in Boylston or other such stops on campus, and it should 365 00:18:56,060 --> 00:18:57,760 recognize those words. 366 00:18:57,760 --> 00:18:58,590 >> So parse.com. 367 00:18:58,590 --> 00:19:01,630 This is another service that we've been pointing some students at for 368 00:19:01,630 --> 00:19:04,390 final projects that's wonderful in that it's free for a 369 00:19:04,390 --> 00:19:05,660 reasonable amount of usage. 370 00:19:05,660 --> 00:19:08,820 And if I go to parse.com you'll see that this is an alternative to 371 00:19:08,820 --> 00:19:13,230 actually having something like your own MySQL database. 372 00:19:13,230 --> 00:19:14,490 And frankly, it's just kind of mesmerizing.
373 00:19:14,490 --> 00:19:17,450 This is what's inside of the cloud even on a cloudy day. 374 00:19:17,450 --> 00:19:21,580 >> So parse.com allows you to do a bunch of interesting things. 375 00:19:21,580 --> 00:19:23,610 And there's other alternatives to this out there. 376 00:19:23,610 --> 00:19:26,870 For instance, you can use them as your back end database. 377 00:19:26,870 --> 00:19:28,980 So you don't need to have a web hosting company. 378 00:19:28,980 --> 00:19:31,180 You don't need to have a MySQL database. 379 00:19:31,180 --> 00:19:32,850 You can instead use their back end. 380 00:19:32,850 --> 00:19:36,350 >> If you're doing a mobile project for Android or iOS or the like, know that 381 00:19:36,350 --> 00:19:39,776 there exist things like push services so you can push alerts to your friends' 382 00:19:39,776 --> 00:19:41,390 or your users' home screens. 383 00:19:41,390 --> 00:19:43,600 And then a bunch of other features as well. 384 00:19:43,600 --> 00:19:47,200 >> So if you have interest, check out these websites and websites like them 385 00:19:47,200 --> 00:19:50,720 to just see how many other people's shoulders you can stand on to make 386 00:19:50,720 --> 00:19:53,350 really cool software of your own. 387 00:19:53,350 --> 00:19:56,690 >> Now in terms of authentication, an FAQ is, how do you actually guarantee 388 00:19:56,690 --> 00:20:01,220 that your users are people on campus, Harvard students or faculty or staff? 389 00:20:01,220 --> 00:20:05,350 So CS50 has its own authentication service called CS50 ID. 390 00:20:05,350 --> 00:20:09,940 Go to that URL and you can restrict your website to anyone with a Harvard 391 00:20:09,940 --> 00:20:11,340 ID, for instance. 392 00:20:11,340 --> 00:20:12,550 So know that we can handle that. 393 00:20:12,550 --> 00:20:15,280 You guys should not be in the business of saying, what's your Harvard ID? 394 00:20:15,280 --> 00:20:16,160 What's your Harvard PIN?
395 00:20:16,160 --> 00:20:17,550 Let me now do something with it. 396 00:20:17,550 --> 00:20:18,740 We'll do all of that. 397 00:20:18,740 --> 00:20:21,710 And what we'll give you back is someone's name and email address, but 398 00:20:21,710 --> 00:20:23,010 not anything sensitive. 399 00:20:23,010 --> 00:20:26,240 400 00:20:26,240 --> 00:20:30,380 >> An app on a mobile device, it can be made to work on a mobile device, but 401 00:20:30,380 --> 00:20:32,630 it's not quite designed for that. 402 00:20:32,630 --> 00:20:35,640 So you'll end up spending a nontrivial amount of time doing so. 403 00:20:35,640 --> 00:20:38,040 So I would discourage that route for now. 404 00:20:38,040 --> 00:20:41,570 This is really intended for web-based applications. 405 00:20:41,570 --> 00:20:42,650 >> So web hosting. 406 00:20:42,650 --> 00:20:44,450 So if you haven't seen on the course's homepage-- 407 00:20:44,450 --> 00:20:46,610 and here's where we'll begin a story-- 408 00:20:46,610 --> 00:20:50,900 web hosting is all about paying, usually, for a service: a host, a server owned 409 00:20:50,900 --> 00:20:54,800 by someone else on the web that has an IP address, and you then put your 410 00:20:54,800 --> 00:20:55,880 website on it. 411 00:20:55,880 --> 00:20:58,620 And they usually give you email accounts and databases 412 00:20:58,620 --> 00:21:00,160 and other such features. 413 00:21:00,160 --> 00:21:02,930 >> Know that if you don't want to actually pay for such, go to that URL 414 00:21:02,930 --> 00:21:06,280 there and CS50 actually has a non-profit account that you can use to 415 00:21:06,280 --> 00:21:11,490 actually have not http://project inside of the appliance 416 00:21:11,490 --> 00:21:12,470 for your final project.
417 00:21:12,470 --> 00:21:16,465 If you actually want it to be something like, isawyouharvard.com, 418 00:21:16,465 --> 00:21:19,730 you can buy that domain name-- although not that particular one-- and 419 00:21:19,730 --> 00:21:24,070 then you can go about hosting it on a public web server like we can offer 420 00:21:24,070 --> 00:21:25,170 you guys through here. 421 00:21:25,170 --> 00:21:27,240 >> And in fact if unfamiliar, if you've never been to 422 00:21:27,240 --> 00:21:30,590 isawyouharvard.com, one, go there. 423 00:21:30,590 --> 00:21:37,310 But two, know that that was made by a young woman named Tej Toor two 424 00:21:37,310 --> 00:21:41,550 years ago, three years ago, who was a CS50 alumna who happened, a day or two 425 00:21:41,550 --> 00:21:46,280 before the CS50 fair, to send out an email to her house mailing list and voila. 426 00:21:46,280 --> 00:21:49,770 Two days later by the CS50 fair, she had hundreds of users all creeping on 427 00:21:49,770 --> 00:21:53,240 each other on her website and saying how they had seen 428 00:21:53,240 --> 00:21:55,250 her or him on campus. 429 00:21:55,250 --> 00:21:57,600 So that's one of CS50's favorite success stories from 430 00:21:57,600 --> 00:21:59,650 a CS50 final project. 431 00:21:59,650 --> 00:22:04,090 >> So how do you go about putting a website like that on the internet? 432 00:22:04,090 --> 00:22:07,140 Well, there's a few such ingredients here. 433 00:22:07,140 --> 00:22:09,310 So one, you have to buy a domain name. 434 00:22:09,310 --> 00:22:12,440 There are bunches of places in the world from which you can 435 00:22:12,440 --> 00:22:13,940 buy a domain name. 436 00:22:13,940 --> 00:22:16,660 And for instance, one that we recommend only because it's popular 437 00:22:16,660 --> 00:22:18,855 and it's cheap is called namecheap.com. 438 00:22:18,855 --> 00:22:22,860 But you can go to godaddy.com and dozens of others out there. 439 00:22:22,860 --> 00:22:24,420 You can read up on reviews.
440 00:22:24,420 --> 00:22:26,250 >> But for the most part it doesn't matter from whom you 441 00:22:26,250 --> 00:22:27,720 buy a domain name. 442 00:22:27,720 --> 00:22:30,780 And they vary in price and they vary in suffix. 443 00:22:30,780 --> 00:22:37,140 The suffixes like .com, .net, .org, .io, .tv, those 444 00:22:37,140 --> 00:22:38,650 actually vary in price. 445 00:22:38,650 --> 00:22:43,630 But if we wanted to do something like cats.com we can go to this website, 446 00:22:43,630 --> 00:22:44,280 click Search. 447 00:22:44,280 --> 00:22:46,370 Presumably this one is taken. 448 00:22:46,370 --> 00:22:50,170 But apparently, catsagainst.com is available. 449 00:22:50,170 --> 00:22:52,100 pluscats.com is available. 450 00:22:52,100 --> 00:22:53,780 Lovecats, catscorner, dampcats.net. 451 00:22:53,780 --> 00:22:56,320 452 00:22:56,320 --> 00:22:59,135 All of this hopefully pseudo randomly generated. 453 00:22:59,135 --> 00:23:04,670 If you want cats.pw, $1,500 only, which is a bit insane. 454 00:23:04,670 --> 00:23:08,100 So someone has really snatched up all the cat related domain names here for 455 00:23:08,100 --> 00:23:09,840 varying prices. 456 00:23:09,840 --> 00:23:12,360 >> As an aside, let's see. 457 00:23:12,360 --> 00:23:13,710 Who has cats.com? 458 00:23:13,710 --> 00:23:16,290 Know that you guys have at your disposal fairly 459 00:23:16,290 --> 00:23:17,540 sophisticated commands now. 460 00:23:17,540 --> 00:23:20,592 Like I can type literally whois cats.com. 461 00:23:20,592 --> 00:23:23,730 And because of the way the internet is structured you can actually see who 462 00:23:23,730 --> 00:23:25,440 has registered this. 463 00:23:25,440 --> 00:23:30,240 Apparently this person is [INAUDIBLE] using a proxy service. 464 00:23:30,240 --> 00:23:33,900 So whoever owns cats.com doesn't want the world to know who they are. 465 00:23:33,900 --> 00:23:36,610 So they've registered it through some random privacy service.
466 00:23:36,610 --> 00:23:39,100 But sometimes you actually get actual owners. 467 00:23:39,100 --> 00:23:41,420 >> And this is to say, especially if you're pursuing some startup and you 468 00:23:41,420 --> 00:23:44,640 really want some domain name and you're willing to pay someone else for 469 00:23:44,640 --> 00:23:48,050 it, you can figure out contact information in that way. 470 00:23:48,050 --> 00:23:49,940 >> But also interesting is this. 471 00:23:49,940 --> 00:23:53,380 Let me scroll up to this portion. 472 00:23:53,380 --> 00:23:55,330 So this is that same output. 473 00:23:55,330 --> 00:23:56,990 And this is just tacky. 474 00:23:56,990 --> 00:24:00,740 So apparently cats.com can be yours for the right price. 475 00:24:00,740 --> 00:24:03,170 But what's interesting here is that the name servers-- 476 00:24:03,170 --> 00:24:06,040 this is total abuse of what a name server's supposed to be-- your name 477 00:24:06,040 --> 00:24:08,876 server is not supposed to be thisdomainforsale.com. 478 00:24:08,876 --> 00:24:11,050 If we actually choose something like-- 479 00:24:11,050 --> 00:24:15,181 let's choose something a little more legitimate like, whois google.com, 480 00:24:15,181 --> 00:24:17,030 and scroll up here. 481 00:24:17,030 --> 00:24:18,280 So here-- 482 00:24:18,280 --> 00:24:20,600 483 00:24:20,600 --> 00:24:21,740 what happened there? 484 00:24:21,740 --> 00:24:22,480 Interesting. 485 00:24:22,480 --> 00:24:25,290 Beyond whois-- 486 00:24:25,290 --> 00:24:26,610 let's keep it more low key. 487 00:24:26,610 --> 00:24:28,370 >> whois mit.edu. 488 00:24:28,370 --> 00:24:28,810 OK. 489 00:24:28,810 --> 00:24:29,900 This is helpful. 490 00:24:29,900 --> 00:24:31,400 So this is what I was hoping for. 491 00:24:31,400 --> 00:24:33,930 Legitimate use of the DNS service. 492 00:24:33,930 --> 00:24:36,750 Name servers here indicate the following.
493 00:24:36,750 --> 00:24:40,880 This is MIT's way of saying, whenever someone in the world, wherever they 494 00:24:40,880 --> 00:24:46,950 are, types in mit.edu and hits Enter, your laptop, whether Mac or PC, will 495 00:24:46,950 --> 00:24:51,830 somehow eventually figure out that the people in the world that know what the 496 00:24:51,830 --> 00:24:58,130 IP address is for mit.edu or any of the subdomains at mit.edu are any of 497 00:24:58,130 --> 00:25:01,660 these servers here-- and it actually looks like MIT's infrastructure is 498 00:25:01,660 --> 00:25:03,370 pretty robust as you would expect. 499 00:25:03,370 --> 00:25:07,050 They have multiple name servers which is good for redundancy. 500 00:25:07,050 --> 00:25:09,840 And in fact, they seem to be globally distributed across the world. 501 00:25:09,840 --> 00:25:13,250 A bunch of those seem to be in the US, a couple in Asia, one in Europe, two 502 00:25:13,250 --> 00:25:14,540 in somewhere else. 503 00:25:14,540 --> 00:25:18,000 >> But the point here is that DNS that we've been taking for granted and 504 00:25:18,000 --> 00:25:21,990 generally described as a big Excel table that has IP addresses and domain 505 00:25:21,990 --> 00:25:25,890 names is actually a fairly sophisticated hierarchical service so that in the 506 00:25:25,890 --> 00:25:29,170 world there's actually a finite number of servers that essentially know where 507 00:25:29,170 --> 00:25:32,880 all of the .coms are or all of the .nets are, all of the 508 00:25:32,880 --> 00:25:34,650 .orgs are, and so forth.
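The hierarchy just described can be modeled in a few lines. This is a toy Python sketch, not a real resolver: the server names, the example.edu domain, and the address are made-up placeholders (the address comes from the documentation-only range 192.0.2.0/24), and real DNS also involves caching, several record types, and actual network queries.

```python
# Toy model of hierarchical DNS delegation; all data below is hypothetical.

# The root only knows which servers are responsible for each suffix.
ROOT = {"edu": "edu-tld-server", "com": "com-tld-server"}

# Each suffix-level server only knows which name server speaks for each domain.
TLD_SERVERS = {
    "edu-tld-server": {"example.edu": "ns1.example.edu"},
    "com-tld-server": {},
}

# Only the domain's own name server knows the actual IP addresses.
AUTHORITATIVE = {
    "ns1.example.edu": {
        "example.edu": "192.0.2.10",
        "www.example.edu": "192.0.2.10",
    },
}


def resolve(hostname):
    """Walk root -> suffix server -> name server; return an IP string or None."""
    hostname = hostname.lower()
    labels = hostname.split(".")
    if len(labels) < 2:
        return None
    tld_server = ROOT.get(labels[-1])
    if tld_server is None:
        return None
    domain = ".".join(labels[-2:])
    name_server = TLD_SERVERS[tld_server].get(domain)
    if name_server is None:
        return None
    return AUTHORITATIVE[name_server].get(hostname)


print(resolve("www.example.edu"))
```

The design point is that no single table holds every answer: each level only knows whom to ask next, which is why the registrar has to be told who a domain's name servers are.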
509 00:25:34,650 --> 00:25:37,820 >> So when you go ahead and buy a domain name from a place like Namecheap or 510 00:25:37,820 --> 00:25:41,450 GoDaddy or any other website, one of the key steps that you'll have to do, 511 00:25:41,450 --> 00:25:45,180 if you do this even for your final project, is tell the registrar 512 00:25:45,180 --> 00:25:49,020 from whom you're buying the domain name, who in the world knows your 513 00:25:49,020 --> 00:25:52,310 website's IP addresses, who your name servers are. 514 00:25:52,310 --> 00:25:55,750 >> So if you use, for instance CS50's hosting account-- we happen to have 515 00:25:55,750 --> 00:25:57,760 this account through dreamhost.com which is a 516 00:25:57,760 --> 00:25:59,560 popular web hosting company-- 517 00:25:59,560 --> 00:26:03,530 they will tell you that you should buy your domain and tell the world that 518 00:26:03,530 --> 00:26:09,410 your domain's name servers are ns1.dreamhost.com, ns2.dreamhost.com, 519 00:26:09,410 --> 00:26:11,470 and ns3.dreamhost.com. 520 00:26:11,470 --> 00:26:12,600 >> But that's it. 521 00:26:12,600 --> 00:26:15,480 Buying a domain name means giving them the money and getting ownership of the 522 00:26:15,480 --> 00:26:17,190 domain, but it's more like a rental though. 523 00:26:17,190 --> 00:26:20,060 You get it for a year and then they bill you recurringly for the rest of 524 00:26:20,060 --> 00:26:22,130 your life until you cancel the domain name. 525 00:26:22,130 --> 00:26:24,510 And then you tell them who the name servers are. 526 00:26:24,510 --> 00:26:26,190 But then you're done with your registrar. 527 00:26:26,190 --> 00:26:30,130 And from there you'll interact only with your web hosting company, which 528 00:26:30,130 --> 00:26:32,030 in CS50's case will be DreamHost. 529 00:26:32,030 --> 00:26:36,080 But again, more documentation will be provided to you if you decide to go 530 00:26:36,080 --> 00:26:37,170 that route.
531 00:26:37,170 --> 00:26:40,750 >> So if you do this after the course's end, simply googling web hosting 532 00:26:40,750 --> 00:26:42,830 company will turn up thousands of options. 533 00:26:42,830 --> 00:26:45,720 And I would generally encourage you to ask friends who might have used a 534 00:26:45,720 --> 00:26:49,350 company before if they recommend them and had a good experience. 535 00:26:49,350 --> 00:26:52,680 >> Because there's a lot of fly by night web hosting companies, like a guy in 536 00:26:52,680 --> 00:26:55,220 his basement with a server that has an IP address. 537 00:26:55,220 --> 00:26:58,980 He has some extra RAM and hard disk space and just sells web hosting 538 00:26:58,980 --> 00:27:02,380 accounts even though there's no way that server could handle hundreds of 539 00:27:02,380 --> 00:27:04,050 users or thousands of users. 540 00:27:04,050 --> 00:27:06,260 So realize you will get what you pay for. 541 00:27:06,260 --> 00:27:09,510 >> For quite a while for my personal home page-- and this was totally acceptable 542 00:27:09,510 --> 00:27:11,830 because I had, like, two visitors a month-- 543 00:27:11,830 --> 00:27:14,990 I was paying, like, $2.95 a month. 544 00:27:14,990 --> 00:27:17,230 And I'm pretty sure it was in someone's basement. 545 00:27:17,230 --> 00:27:20,800 But again, you don't get necessarily any guarantees of uptime or 546 00:27:20,800 --> 00:27:21,840 scalability. 547 00:27:21,840 --> 00:27:24,560 So again, you're typically looking at something more than that. 548 00:27:24,560 --> 00:27:26,220 >> Well, what about SSL? 549 00:27:26,220 --> 00:27:27,690 So what's SSL used for? 550 00:27:27,690 --> 00:27:30,320 Let's now start to steer in the directions of security and things that 551 00:27:30,320 --> 00:27:32,330 can harm us. 552 00:27:32,330 --> 00:27:36,890 Especially as you venture out on your own. 553 00:27:36,890 --> 00:27:41,650 >> What's SSL, or what's SSL used for? 554 00:27:41,650 --> 00:27:42,660 Security, OK. 
555 00:27:42,660 --> 00:27:44,000 So it's used for security. 556 00:27:44,000 --> 00:27:44,640 What does that mean? 557 00:27:44,640 --> 00:27:47,170 So it stands for Secure Sockets Layer. 558 00:27:47,170 --> 00:27:52,330 And it is indicated by a URL that starts with https://. 559 00:27:52,330 --> 00:27:58,410 Many of us have probably never typed https://, but you'll often find that 560 00:27:58,410 --> 00:28:03,000 your browser is redirected from HTTP to HTTPS so that everything is 561 00:28:03,000 --> 00:28:04,260 thereafter encrypted. 562 00:28:04,260 --> 00:28:10,810 >> FYI, using SSL requires typically that you have a unique IP address. 563 00:28:10,810 --> 00:28:13,940 And typically to get a unique IP address you need to pay a web hosting 564 00:28:13,940 --> 00:28:15,850 company a few dollars more per month. 565 00:28:15,850 --> 00:28:19,850 So realize this is very easily implemented these days by buying an IP 566 00:28:19,850 --> 00:28:22,930 address and by buying what's called an SSL certificate. 567 00:28:22,930 --> 00:28:26,520 But realize that it does come at some additional cost. 568 00:28:26,520 --> 00:28:30,880 And, as we'll try to scare in just a bit, it's not even necessarily 100% 569 00:28:30,880 --> 00:28:34,040 protective of whatever it is you're trying to protect. 570 00:28:34,040 --> 00:28:38,620 >> So for security, I thought I'd do sort of a random segue here. 571 00:28:38,620 --> 00:28:42,820 As you might know from CS50's lecture videos, our production team has been a 572 00:28:42,820 --> 00:28:46,770 fan as I have of taking really nice photography of campus, and aerial 573 00:28:46,770 --> 00:28:48,370 photography most recently. 574 00:28:48,370 --> 00:28:51,450 If you ever look up and you see something flying with a little camera, 575 00:28:51,450 --> 00:28:53,410 it may actually be CS50.
576 00:28:53,410 --> 00:28:55,830 And I just thought I'd share a minute of some of the footage the team has 577 00:28:55,830 --> 00:28:59,450 gathered, particularly as we look to the spring semester and next fall. 578 00:28:59,450 --> 00:29:03,320 If any of you have a knack for photography, videography, we would 579 00:29:03,320 --> 00:29:05,570 love to get you involved behind the scenes. 580 00:29:05,570 --> 00:29:07,595 But more on those details in a week. 581 00:29:07,595 --> 00:29:18,560 >> [MUSIC] 582 00:29:18,560 --> 00:29:20,750 >> DAVID J. MALAN: Turns out there's a miniature golf course on the top of 583 00:29:20,750 --> 00:29:22,754 the stadium that we never knew about. 584 00:29:22,754 --> 00:30:06,150 >> [MUSIC] 585 00:30:06,150 --> 00:30:08,440 >> DAVID J. MALAN: You can see the outline of the drone there. 586 00:30:08,440 --> 00:30:24,160 >> [MUSIC] 587 00:30:24,160 --> 00:30:26,280 >> DAVID J. MALAN: The best part here is, watch the jogger on the left. 588 00:30:26,280 --> 00:30:52,900 >> [MUSIC] 589 00:30:52,900 --> 00:30:56,920 >> DAVID J. MALAN: Another example of what you can do with technology that's 590 00:30:56,920 --> 00:30:58,900 only tangentially, frankly, related to security. 591 00:30:58,900 --> 00:31:01,710 But I thought that would be a more fun way of just saying, security.
592 00:31:01,710 --> 00:31:07,780 So let's see if we can't scare you guys now with not only a bit of a few 593 00:31:07,780 --> 00:31:10,590 threats, but also an underlying understanding of what these threats 594 00:31:10,590 --> 00:31:13,830 are so that moving forward you can decide how and whether to defend 595 00:31:13,830 --> 00:31:17,290 yourself against these things and at least to be mindful of them as you 596 00:31:17,290 --> 00:31:20,530 make decisions as to whether or not to send that email, whether or not to log 597 00:31:20,530 --> 00:31:24,920 into that website, whether or not to use that cyber cafe's Wi-Fi access 598 00:31:24,920 --> 00:31:28,210 point so that you know what the threats are indeed around you. 599 00:31:28,210 --> 00:31:30,990 >> So Jonathan referred to something like this on Monday. 600 00:31:30,990 --> 00:31:32,220 He had a Windows screenshot. 601 00:31:32,220 --> 00:31:33,630 This one is of a Mac. 602 00:31:33,630 --> 00:31:36,850 How many of you have ever installed software on your Mac or PC? 603 00:31:36,850 --> 00:31:38,420 Obviously everyone. 604 00:31:38,420 --> 00:31:41,590 How many of you have given much thought to typing in your password 605 00:31:41,590 --> 00:31:43,030 when prompted? 606 00:31:43,030 --> 00:31:44,740 I mean, even I don't, frankly. 607 00:31:44,740 --> 00:31:48,730 So a couple of us are good at being paranoid. 608 00:31:48,730 --> 00:31:50,490 But consider what you're actually doing here. 609 00:31:50,490 --> 00:31:53,280 >> On a typical Mac or PC you have an administrator account. 610 00:31:53,280 --> 00:31:56,450 And typically you're the only one using a laptop at least these days. 611 00:31:56,450 --> 00:31:59,780 So your account, Malan or JHarvard or whatever it is, is the 612 00:31:59,780 --> 00:32:00,830 administrator account. 613 00:32:00,830 --> 00:32:03,530 And what that means is you have root access to your computer.
614 00:32:03,530 --> 00:32:06,180 You can install anything you want, delete anything you want. 615 00:32:06,180 --> 00:32:10,800 >> And typically these days, because of dated design decisions from years ago, 616 00:32:10,800 --> 00:32:14,560 the way most software gets installed is as an administrator. 617 00:32:14,560 --> 00:32:18,180 And even if your Mac or PC has at least gotten smart enough over the 618 00:32:18,180 --> 00:32:22,010 years with the latest incarnations of Mac OS and Windows to not run your 619 00:32:22,010 --> 00:32:26,130 username by default as the administrator, when you download some 620 00:32:26,130 --> 00:32:29,160 new program off the internet and try to install it, you're probably going 621 00:32:29,160 --> 00:32:30,880 to be prompted for your password. 622 00:32:30,880 --> 00:32:34,790 But the catch is at that point, you're literally handing the keys of your 623 00:32:34,790 --> 00:32:38,620 computer over to whatever random program you just downloaded and 624 00:32:38,620 --> 00:32:41,590 allowing it to install whatever it wants. 625 00:32:41,590 --> 00:32:45,050 >> And as Jonathan alluded to, realize that it might say that it wants to 626 00:32:45,050 --> 00:32:49,350 install your software that you care about, Spotify or iTunes or whatever 627 00:32:49,350 --> 00:32:50,900 it is you're trying to install. 628 00:32:50,900 --> 00:32:54,710 But you're literally trusting the author or authors of the software to 629 00:32:54,710 --> 00:32:57,570 only do what the program is supposed to do. 630 00:32:57,570 --> 00:33:02,320 >> But there is absolutely nothing stopping most programs on most 631 00:33:02,320 --> 00:33:06,910 operating systems from deleting files, from uploading them to some company's 632 00:33:06,910 --> 00:33:10,040 website, from trolling around, for encrypting things. 633 00:33:10,040 --> 00:33:12,970 And again, we've sort of built an entire infrastructure over 634 00:33:12,970 --> 00:33:14,930 the years on trust. 
635 00:33:14,930 --> 00:33:18,690 And so realize that you've just been trusting random people and random 636 00:33:18,690 --> 00:33:20,050 companies for the most part. 637 00:33:20,050 --> 00:33:24,860 >> And as Jonathan alluded to too, sometimes those companies themselves are sort of 638 00:33:24,860 --> 00:33:26,410 knowingly malicious, all right? 639 00:33:26,410 --> 00:33:30,200 Sony caught a lot of flak a few years ago for installing what was called a 640 00:33:30,200 --> 00:33:33,220 rootkit on people's computers without their knowledge. 641 00:33:33,220 --> 00:33:36,570 And the gist of this was that when you bought a CD for instance that they 642 00:33:36,570 --> 00:33:40,050 didn't want you to be able to copy or rip the music off of, the CD would 643 00:33:40,050 --> 00:33:42,600 install, without your knowing, a rootkit on your computer. 644 00:33:42,600 --> 00:33:46,020 Rootkit just meaning software that runs as administrator that potentially 645 00:33:46,020 --> 00:33:47,260 does bad things. 646 00:33:47,260 --> 00:33:50,780 >> But among the things this thing did was it hid itself. 647 00:33:50,780 --> 00:33:53,660 So some of you might be pretty savvy with your computer and know, well, I 648 00:33:53,660 --> 00:33:57,310 can just open the Task Manager or the Activity Monitor and I can look at all 649 00:33:57,310 --> 00:33:59,150 of the arcanely named programs that are running. 650 00:33:59,150 --> 00:34:01,760 And if anything looks suspicious I'll just kill it or delete it. 651 00:34:01,760 --> 00:34:02,980 But that's what the rootkit did. 652 00:34:02,980 --> 00:34:07,070 It essentially said, if running Task Manager, don't show yourself. 653 00:34:07,070 --> 00:34:08,500 >> So the software was there. 654 00:34:08,500 --> 00:34:12,710 And only if you really, really looked hard could you even find it. 655 00:34:12,710 --> 00:34:15,670 And this was done in the name of copy protection.
656 00:34:15,670 --> 00:34:18,230 But just imagine what could have been done otherwise. 657 00:34:18,230 --> 00:34:19,699 >> Now in terms of protecting yourself. 658 00:34:19,699 --> 00:34:22,190 A lot of websites are wonderfully gracious in that they put these 659 00:34:22,190 --> 00:34:26,480 padlock icons on their homepage which means that the website is secure. 660 00:34:26,480 --> 00:34:28,870 This is from bankofamerica.com this morning. 661 00:34:28,870 --> 00:34:32,239 So what does that little padlock icon there mean next to the Sign In button? 662 00:34:32,239 --> 00:34:35,699 663 00:34:35,699 --> 00:34:36,790 >> Absolutely nothing. 664 00:34:36,790 --> 00:34:39,560 It means someone knows how to use Photoshop to make a picture of a 665 00:34:39,560 --> 00:34:40,590 padlock icon. 666 00:34:40,590 --> 00:34:44,449 Like quite literally, the fact that it's there is meant to be a positive 667 00:34:44,449 --> 00:34:46,880 signal to the user like, ooh, secure website. 668 00:34:46,880 --> 00:34:50,449 I should trust this website and now type in my username and password. 669 00:34:50,449 --> 00:34:53,870 And this has been conventional for years, as recently as this morning. 670 00:34:53,870 --> 00:34:56,949 >> But consider the habits that this is getting us into. 671 00:34:56,949 --> 00:35:00,600 Consider the implicit message that all of these banks in this case have been 672 00:35:00,600 --> 00:35:01,830 sending us for years. 673 00:35:01,830 --> 00:35:05,160 If you see padlock, then secure. 674 00:35:05,160 --> 00:35:05,340 All right? 675 00:35:05,340 --> 00:35:10,520 >> So how can you abuse that system of trust if you're the bad guy? 676 00:35:10,520 --> 00:35:14,100 Put a padlock on your website, and logically, the users have been 677 00:35:14,100 --> 00:35:17,260 conditioned for years to assume padlock means secure. 678 00:35:17,260 --> 00:35:19,310 And it might actually be secure. 
679 00:35:19,310 --> 00:35:24,810 You might have a wonderfully secure SSL HTTPS connection to 680 00:35:24,810 --> 00:35:26,452 fakewebsite.com. 681 00:35:26,452 --> 00:35:30,150 And no one else in the world can see that you're about to hand him or her 682 00:35:30,150 --> 00:35:32,790 your username and password to your account. 683 00:35:32,790 --> 00:35:35,110 >> This though, perhaps, is a little more reassuring. 684 00:35:35,110 --> 00:35:38,600 So this is a screenshot of the top of my browser this morning at 685 00:35:38,600 --> 00:35:39,910 bankofamerica.com. 686 00:35:39,910 --> 00:35:43,270 And notice here too we have a padlock icon. 687 00:35:43,270 --> 00:35:48,040 What does it mean in this context in Chrome at least? 688 00:35:48,040 --> 00:35:49,520 >> So this is now using SSL. 689 00:35:49,520 --> 00:35:51,220 So this is actually a better thing. 690 00:35:51,220 --> 00:35:54,250 And the fact that Chrome is making it green is meant to draw our attention 691 00:35:54,250 --> 00:35:56,750 to the fact that this is not only over SSL. 692 00:35:56,750 --> 00:36:01,400 This is a company that someone out there has verified is actually 693 00:36:01,400 --> 00:36:02,520 bankofamerica.com. 694 00:36:02,520 --> 00:36:05,970 And that means that Bank of America, when buying their so-called SSL 695 00:36:05,970 --> 00:36:09,680 certificate, essentially big random, somewhat random numbers that implement 696 00:36:09,680 --> 00:36:14,710 security for them, they have been verified by some independent third 697 00:36:14,710 --> 00:36:15,570 party that says, yep. 698 00:36:15,570 --> 00:36:19,240 This is actually the CEO of Bank of America trying to buy the certificate. 699 00:36:19,240 --> 00:36:23,290 Chrome will therefore trust that certification authority and say in 700 00:36:23,290 --> 00:36:25,265 green, this is bankofamerica.com.
701 00:36:25,265 --> 00:36:27,997 And Bank of America just pays a few hundred dollars for that or a few 702 00:36:27,997 --> 00:36:30,800 thousand as opposed to a few tens of dollars. 703 00:36:30,800 --> 00:36:34,940 >> But here too, how many of you have ever behaved any differently because 704 00:36:34,940 --> 00:36:38,576 the URL in your browser is green instead of black? 705 00:36:38,576 --> 00:36:39,900 Right? 706 00:36:39,900 --> 00:36:40,600 So a couple of us. 707 00:36:40,600 --> 00:36:42,115 And that's good to be paranoid. 708 00:36:42,115 --> 00:36:45,910 But even then, those of you who even notice these things, do you actually 709 00:36:45,910 --> 00:36:50,720 stop logging into an otherwise secure website if the URL is not green? 710 00:36:50,720 --> 00:36:53,380 All right, so probably not, right? 711 00:36:53,380 --> 00:36:56,740 At least most of us, if it's not green, most likely you're just going 712 00:36:56,740 --> 00:36:57,440 to be like, whatever. 713 00:36:57,440 --> 00:36:58,950 Like, I want to log into this website. 714 00:36:58,950 --> 00:37:00,200 That's why I'm here. 715 00:37:00,200 --> 00:37:02,390 I'm going to log in nonetheless. 716 00:37:02,390 --> 00:37:04,500 >> As an aside, Chrome is a little better about this. 717 00:37:04,500 --> 00:37:07,990 But there's a lot of browsers like Firefox for instance, at least for 718 00:37:07,990 --> 00:37:12,190 some time, where that padlock icon is, you can actually put any 719 00:37:12,190 --> 00:37:13,250 icon of your own. 720 00:37:13,250 --> 00:37:17,480 Let me see what the latest version of Firefox looks like. 721 00:37:17,480 --> 00:37:20,040 So if we go to CS50.net. 722 00:37:20,040 --> 00:37:21,580 >> OK, so they've gotten better as well. 723 00:37:21,580 --> 00:37:24,970 What the browsers used to do is like, here's for instance [? SAAS's ?] 724 00:37:24,970 --> 00:37:25,790 crest up here. 725 00:37:25,790 --> 00:37:29,240 That's the so-called favorite icon for a website. 
726 00:37:29,240 --> 00:37:30,190 Years ago-- 727 00:37:30,190 --> 00:37:34,720 actually not that long ago-- that little shield would have been right 728 00:37:34,720 --> 00:37:36,560 here next to the URL. 729 00:37:36,560 --> 00:37:40,300 Because some genius decided that it would just look pretty classy to have 730 00:37:40,300 --> 00:37:43,150 your graphical logo right next to your URL. 731 00:37:43,150 --> 00:37:45,310 And design wise, that actually is pretty compelling. 732 00:37:45,310 --> 00:37:47,240 >> So what did bad guys start doing? 733 00:37:47,240 --> 00:37:50,500 They started changing their favorite icons, or their default icon for a 734 00:37:50,500 --> 00:37:55,250 homepage to be not a crest but a padlock, which had 735 00:37:55,250 --> 00:37:56,600 absolutely no meaning. 736 00:37:56,600 --> 00:37:59,760 Other than that their favorite icon was a padlock, it had no 737 00:37:59,760 --> 00:38:01,250 indication of security. 738 00:38:01,250 --> 00:38:04,040 >> So the lessons here are a couple I think. 739 00:38:04,040 --> 00:38:07,820 One is that there are actually some well intentioned mechanisms for 740 00:38:07,820 --> 00:38:12,850 teaching us users about security even if you weren't even aware what green 741 00:38:12,850 --> 00:38:15,110 meant or what even HTTPS meant. 742 00:38:15,110 --> 00:38:19,130 But if those mechanisms get us into the bad habit of trusting websites 743 00:38:19,130 --> 00:38:23,390 when we see those positive signals, they're very easily abused as we saw 744 00:38:23,390 --> 00:38:26,480 just a moment ago with something silly like this. 745 00:38:26,480 --> 00:38:29,100 >> So session hijacking comes into play, as we said before, 746 00:38:29,100 --> 00:38:30,510 with cookies for instance. 747 00:38:30,510 --> 00:38:32,130 And what does this actually mean? 748 00:38:32,130 --> 00:38:35,930 Well with session hijacking this is all about stealing someone's cookies.
749 00:38:35,930 --> 00:38:39,860 So if I open up Chrome here, for instance, and I open up the Inspector 750 00:38:39,860 --> 00:38:41,550 down here and I go to the Network Tab-- 751 00:38:41,550 --> 00:38:42,830 and we've done this before-- 752 00:38:42,830 --> 00:38:48,900 and I go to something like http://facebook.com Enter, a whole 753 00:38:48,900 --> 00:38:52,280 bunch of stuff goes across the screen because of all the images and CSS and 754 00:38:52,280 --> 00:38:53,490 JavaScript files. 755 00:38:53,490 --> 00:38:59,420 >> But if I look at this one here notice that Facebook is indeed planting one 756 00:38:59,420 --> 00:39:02,310 or more cookies on my browser right here. 757 00:39:02,310 --> 00:39:05,610 So these are essentially the hand stamps that represent me. 758 00:39:05,610 --> 00:39:08,580 And now hopefully my browser will present this again and again when 759 00:39:08,580 --> 00:39:10,560 revisiting that website. 760 00:39:10,560 --> 00:39:15,810 But that only is secure, we said a couple weeks ago, if you're using SSL. 761 00:39:15,810 --> 00:39:18,860 >> But even SSL itself can be compromised. 762 00:39:18,860 --> 00:39:21,800 Consider after all the way SSL works. 763 00:39:21,800 --> 00:39:28,860 When your browser connects to a remote server via https://, long story short, 764 00:39:28,860 --> 00:39:30,110 cryptography is involved. 765 00:39:30,110 --> 00:39:34,750 It's not as simple as Caesar or Vigenère or even DES, DES from a 766 00:39:34,750 --> 00:39:36,110 while back in pset2. 767 00:39:36,110 --> 00:39:37,410 It's more sophisticated than that. 768 00:39:37,410 --> 00:39:39,110 It's called public key cryptography. 769 00:39:39,110 --> 00:39:42,845 But really big and really random numbers are used to scramble 770 00:39:42,845 --> 00:39:47,125 information between point A, you, and point B, like facebook.com.
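For a feel of what "public key" means, here is a toy Python sketch of textbook RSA, one well-known public key scheme. It is an illustration under loud assumptions: the primes are tiny so the arithmetic is readable, the helper names are made up, and this is not how SSL itself is implemented, which layers key exchange, certificates, and symmetric ciphers on top of ideas like this.

```python
# Toy textbook RSA with tiny primes, for illustration only.
# Real keys use primes hundreds of digits long, plus padding schemes omitted here.
p, q = 61, 53
n = p * q                 # public modulus, shared with everyone
phi = (p - 1) * (q - 1)
e = 17                    # public exponent, chosen coprime to phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)


def encrypt(message, public_key):
    """Scramble an integer smaller than the modulus using the public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, private_key):
    """Recover the original integer using the private key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, (e, n))
recovered = decrypt(ciphertext, (d, n))
print(ciphertext, recovered)
```

Anyone may know (e, n) and encrypt with it, but only the holder of d can decrypt, which is what lets two parties who have never met set up a scrambled channel.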
771 00:39:47,125 --> 00:39:52,570 >> But the problem is, how many of us again ever type in https:// to start 772 00:39:52,570 --> 00:39:55,790 our website connection in that secure mode? 773 00:39:55,790 --> 00:40:00,900 I mean, how many of you even type http://facebook.com? 774 00:40:00,900 --> 00:40:02,290 All right, if you do, like, hello. 775 00:40:02,290 --> 00:40:03,510 You don't need to do that anymore, right? 776 00:40:03,510 --> 00:40:05,190 The browser will figure it out. 777 00:40:05,190 --> 00:40:08,070 >> But most of us do indeed just type facebook.com. 778 00:40:08,070 --> 00:40:10,960 Because if we're using a browser, the browsers have gotten smart enough by 779 00:40:10,960 --> 00:40:14,920 2013 to assume if you're using a browser, you type in an address, you 780 00:40:14,920 --> 00:40:18,550 probably want to access it not via email or instant message. 781 00:40:18,550 --> 00:40:21,250 You mean HTTP and Port 80. 782 00:40:21,250 --> 00:40:22,970 Those conventions have been adopted. 783 00:40:22,970 --> 00:40:24,830 >> But how does redirection work? 784 00:40:24,830 --> 00:40:26,170 Well, notice what happens here. 785 00:40:26,170 --> 00:40:27,590 If I go back to Chrome-- 786 00:40:27,590 --> 00:40:31,920 and let's do this in incognito mode so that all of my 787 00:40:31,920 --> 00:40:33,620 cookies are thrown away. 788 00:40:33,620 --> 00:40:38,130 And let me go here to, again, facebook.com. 789 00:40:38,130 --> 00:40:39,490 And let's see what happens. 790 00:40:39,490 --> 00:40:43,372 >> Recall that the first request was indeed just for facebook.com. 791 00:40:43,372 --> 00:40:46,580 But what was the response that I got? 792 00:40:46,580 --> 00:40:48,520 It wasn't a 200 OK. 793 00:40:48,520 --> 00:40:53,550 It was 300, or 301, which is a redirect telling me to go to 794 00:40:53,550 --> 00:40:59,050 http://www.facebook.com, which is where Facebook wants me to go. 
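To make the redirect just described concrete, here is a minimal sketch in Python of how a client might read a 301 response and pull out where it is being sent. The raw response text and the helper name `parse_redirect` are assumptions for illustration; a real response from Facebook would carry many more headers.

```python
# Hypothetical raw response, modeled on the 301 described in the lecture.
RAW_RESPONSE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.facebook.com/\r\n"
    "\r\n"
)

def parse_redirect(raw):
    """Return (status, target); target is None unless the status is 3xx."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split()[1])

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if 300 <= status < 400:
        return status, headers.get("location")
    return status, None

print(parse_redirect(RAW_RESPONSE))
```

Note that the target in this sketch still begins with http://, so this hop travels unencrypted.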
795 00:40:59,050 --> 00:41:01,900 But then if we look at the next request, and we've seen this before, 796 00:41:01,900 --> 00:41:04,370 notice what their second response is. 797 00:41:04,370 --> 00:41:10,280 Specifically that they want me now to go to the SSL version of Facebook. 798 00:41:10,280 --> 00:41:11,800 >> So here is an opportunity. 799 00:41:11,800 --> 00:41:15,440 This is a wonderfully useful feature of just the web and HTTP. 800 00:41:15,440 --> 00:41:19,570 If a website like Facebook wants me to stay on the secure version of their 801 00:41:19,570 --> 00:41:20,850 website, great. 802 00:41:20,850 --> 00:41:23,130 They will redirect me themselves. 803 00:41:23,130 --> 00:41:25,250 And so I don't have to even think about that. 804 00:41:25,250 --> 00:41:29,200 >> But what if between point A and B, between you and Facebook, there's some 805 00:41:29,200 --> 00:41:32,220 bad guy, there's some system administrator at Harvard who's curious 806 00:41:32,220 --> 00:41:34,240 to see who your friends are. 807 00:41:34,240 --> 00:41:36,760 Or there's some-- 808 00:41:36,760 --> 00:41:38,340 years ago, this used to sound crazy-- 809 00:41:38,340 --> 00:41:41,950 but there's some government entity like the NSA who's actually interested 810 00:41:41,950 --> 00:41:44,390 in who you're poking on Facebook. 811 00:41:44,390 --> 00:41:45,910 Where's the opportunity there? 812 00:41:45,910 --> 00:41:49,305 Well, so long as someone has enough technical savvy and they have access 813 00:41:49,305 --> 00:41:53,350 to your actual network over Wi-Fi or some physical wire, 814 00:41:53,350 --> 00:41:54,570 what could they do?
815 00:41:54,570 --> 00:41:57,520 >> Well, if they're on the same network as you and they know something about 816 00:41:57,520 --> 00:42:02,050 TCP/IP and IP addresses and DNS and how all of that works, what if that 817 00:42:02,050 --> 00:42:05,970 man in the middle, what if that National Security Agency, whatever it 818 00:42:05,970 --> 00:42:11,480 may be, but what if that entity simply responds more quickly than Facebook to 819 00:42:11,480 --> 00:42:15,820 your HTTP request and says, oh, I am Facebook. 820 00:42:15,820 --> 00:42:19,300 Go ahead, and here's the HTML for facebook.com. 821 00:42:19,300 --> 00:42:20,720 >> Computers are pretty darn fast. 822 00:42:20,720 --> 00:42:25,990 So you could write a program running on a server like nsa.gov that when it 823 00:42:25,990 --> 00:42:29,790 hears a request from you for facebook.com, very quickly behind the 824 00:42:29,790 --> 00:42:34,000 scenes gets the real facebook.com making a perfectly [? esque ?] secure 825 00:42:34,000 --> 00:42:38,290 SSL connection between NSA and between Facebook, getting that HTML very 826 00:42:38,290 --> 00:42:42,670 securely for the login page, and then the NSA server just responds to you 827 00:42:42,670 --> 00:42:44,942 with a login page for facebook.com. 828 00:42:44,942 --> 00:42:49,120 >> Now how many of you would even notice that you're using Facebook over HTTP 829 00:42:49,120 --> 00:42:53,375 still at that point because you've accidentally connected to nsa.gov and 830 00:42:53,375 --> 00:42:53,870 not Facebook? 831 00:42:53,870 --> 00:42:54,980 The URL's not changing. 832 00:42:54,980 --> 00:42:57,040 All of this is being done behind the scenes. 833 00:42:57,040 --> 00:42:59,470 But most of us, myself included, probably wouldn't notice 834 00:42:59,470 --> 00:43:00,800 such a minor detail.
835 00:43:00,800 --> 00:43:05,510 >> So you might have a perfectly workable connection between you and what you 836 00:43:05,510 --> 00:43:08,660 think is Facebook, but there's a so-called man in the middle. 837 00:43:08,660 --> 00:43:12,480 And this is a general term for man in the middle attack where you have some 838 00:43:12,480 --> 00:43:17,670 entity between you and point B that's somehow manipulating, stealing, or 839 00:43:17,670 --> 00:43:18,960 watching your data. 840 00:43:18,960 --> 00:43:22,750 So even SSL is not surefire, especially if you've been tricked into 841 00:43:22,750 --> 00:43:26,790 not turning it on because of how these underlying mechanisms actually work. 842 00:43:26,790 --> 00:43:30,670 >> So a lesson today then too is if you really want to be paranoid-- 843 00:43:30,670 --> 00:43:32,110 and even here there are threats-- 844 00:43:32,110 --> 00:43:37,112 you should really start getting into the habit of typing in https://www 845 00:43:37,112 --> 00:43:39,850 whatever domain name you actually care about. 846 00:43:39,850 --> 00:43:41,820 >> And as an aside too there's yet another threat with 847 00:43:41,820 --> 00:43:43,410 regard to session hijacking. 848 00:43:43,410 --> 00:43:47,440 Very often when you first visit a website like facebook.com, unless the 849 00:43:47,440 --> 00:43:51,050 server has been configured to say that that hand stamp it put on you 850 00:43:51,050 --> 00:43:56,140 yesterday should be secure itself, your browser might very well, upon 851 00:43:56,140 --> 00:44:00,620 visiting things like facebook.com, google.com, twitter.com, your browser 852 00:44:00,620 --> 00:44:04,280 might be presenting that hand stamp only to be slapped down and told, no. 853 00:44:04,280 --> 00:44:05,660 Use SSL. 854 00:44:05,660 --> 00:44:07,030 >> But it's too late at that point.
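The idea of a server marking a hand stamp as "secure" corresponds to the Secure attribute on a cookie, which tells the browser to send that cookie only over HTTPS. As a minimal sketch in Python, using the standard library's http.cookies module: the cookie name `session`, the value `abc123`, and the helper `make_session_cookie` are all assumptions for illustration, not what any of the sites named above actually send.

```python
from http.cookies import SimpleCookie

def make_session_cookie(session_id):
    """Build the value of a Set-Cookie header for a hypothetical session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    # Secure: the browser should present this cookie only over HTTPS.
    cookie["session"]["secure"] = True
    # HttpOnly: page JavaScript cannot read the cookie.
    cookie["session"]["httponly"] = True
    return cookie["session"].OutputString()

print(make_session_cookie("abc123"))
```

Without the Secure attribute, the browser is free to attach the cookie to that very first plain-HTTP request, which is the split-second exposure at issue here.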
855 00:44:07,030 --> 00:44:10,940 If you have already sent your hand stamp, your cookie, in the clear with 856 00:44:10,940 --> 00:44:15,180 no SSL, you have a split second vulnerability where someone sniffing 857 00:44:15,180 --> 00:44:19,530 your traffic, whether roommate or NSA, can then use that same cookie, and 858 00:44:19,530 --> 00:44:23,860 with a bit of technical savvy, present it as his or her own. 859 00:44:23,860 --> 00:44:25,930 >> Another attack you might not have thought about. 860 00:44:25,930 --> 00:44:30,120 This one is really on you if you screw this up in writing some website that 861 00:44:30,120 --> 00:44:31,580 somehow uses SQL. 862 00:44:31,580 --> 00:44:34,610 So here, for instance, is a screen shot of Harvard's login. 863 00:44:34,610 --> 00:44:36,380 And this is a general example of something with a 864 00:44:36,380 --> 00:44:37,480 username and password. 865 00:44:37,480 --> 00:44:38,440 Super common. 866 00:44:38,440 --> 00:44:41,310 So let's assume that SSL exists and there's no man in the middle or 867 00:44:41,310 --> 00:44:41,920 anything like that. 868 00:44:41,920 --> 00:44:45,660 Now we're focusing on the server's code that you might write. 869 00:44:45,660 --> 00:44:49,830 >> Well, when I type in a username and password, suppose that the PIN service 870 00:44:49,830 --> 00:44:51,740 is implemented in PHP. 871 00:44:51,740 --> 00:44:53,990 And you might have some code on that server like this. 872 00:44:53,990 --> 00:44:57,740 Get the user name from the post super global and get the password, and then 873 00:44:57,740 --> 00:45:01,130 if they're using some pset7 like code there's a query function 874 00:45:01,130 --> 00:45:01,820 that might do this. 875 00:45:01,820 --> 00:45:06,320 Select Star from users where username equals that and password equals that. 876 00:45:06,320 --> 00:45:08,120 >> That looks, at first glance, totally reasonable. 877 00:45:08,120 --> 00:45:11,090 This is syntactically valid PHP code. 
878 00:45:11,090 --> 00:45:13,160 Logically there's nothing wrong with this. 879 00:45:13,160 --> 00:45:15,710 Presumably there's some more lines that actually do something with the 880 00:45:15,710 --> 00:45:18,150 result that comes back from the database. 881 00:45:18,150 --> 00:45:20,580 But this is vulnerable for the following reason. 882 00:45:20,580 --> 00:45:23,760 >> Notice that, like a good citizen, I have put in quotes, single 883 00:45:23,760 --> 00:45:25,380 quotes, the user name. 884 00:45:25,380 --> 00:45:26,980 And I put in single quotes the password. 885 00:45:26,980 --> 00:45:28,830 And that's a good thing because they're not supposed to be numbers. 886 00:45:28,830 --> 00:45:30,660 Typically they're going to be text. 887 00:45:30,660 --> 00:45:32,290 So I'm quoting them like strings. 888 00:45:32,290 --> 00:45:37,470 >> And if I now advance further what if-- and I've removed the bullets from the 889 00:45:37,470 --> 00:45:38,870 PIN service temporarily-- 890 00:45:38,870 --> 00:45:41,650 what if I try to log in as President [? Scroob ?] 891 00:45:41,650 --> 00:45:52,540 but I claim that my password is 12345' OR '1'='1, and notice 892 00:45:52,540 --> 00:45:53,830 what I haven't done. 893 00:45:53,830 --> 00:45:56,140 I did not close the other single quote. 894 00:45:56,140 --> 00:45:58,500 Because I'm pretty sharp here as the bad guy. 895 00:45:58,500 --> 00:46:01,870 And I'm assuming that you're not very good with your 896 00:46:01,870 --> 00:46:03,450 PHP and MySQL code. 897 00:46:03,450 --> 00:46:06,740 I'm guessing that you're not checking for the presence of quotes. 898 00:46:06,740 --> 00:46:11,190 >> So what just happened is that when your user has typed in that string, 899 00:46:11,190 --> 00:46:15,060 the query you're about to create looks like this.
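To see that query spelled out, here is a minimal sketch in Python, standing in for the PHP on the slide and using the standard library's sqlite3 module rather than MySQL. The `users` table, the username `president`, and the stored password `hunter2` are made-up assumptions; only the attack string comes from the lecture.

```python
import sqlite3

def build_login_query(username, password):
    # Vulnerable on purpose: user input is pasted straight into the SQL text.
    return (
        "SELECT * FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )

def vulnerable_login(conn, username, password):
    return conn.execute(build_login_query(username, password)).fetchall()

# Hypothetical in-memory database with a single account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('president', 'hunter2')")

attack = "12345' OR '1'='1"
print(build_login_query("president", attack))
print(vulnerable_login(conn, "president", attack))
```

The attacker's unclosed quote pairs up with the closing quote the code supplies, so the database sees a well-formed condition ending in OR '1'='1' and returns the row even though 12345 is not the stored password.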
900 00:46:15,060 --> 00:46:18,180 And long story short, if you AND something together or you OR something 901 00:46:18,180 --> 00:46:21,740 together this is going to return a row from the database. 902 00:46:21,740 --> 00:46:26,570 Because it is always the case that 1 equals 1. 903 00:46:26,570 --> 00:46:30,400 >> And just because you didn't anticipate that your users, good or bad, might 904 00:46:30,400 --> 00:46:35,340 have an apostrophe in their name you have created a SQL query that's still 905 00:46:35,340 --> 00:46:39,040 valid, and will return now more results than you might have intended. 906 00:46:39,040 --> 00:46:42,340 And so this bad guy now has potentially logged in to your server 907 00:46:42,340 --> 00:46:47,060 because your database is returning a row even if he or she has no idea what 908 00:46:47,060 --> 00:46:49,410 [? Scroob's ?] actual password is. 909 00:46:49,410 --> 00:46:50,640 >> Oh, I realized a typo here. 910 00:46:50,640 --> 00:46:53,260 I should've said password equals 12345 like the previous 911 00:46:53,260 --> 00:46:54,990 example or 1 equals 1. 912 00:46:54,990 --> 00:46:56,400 I'll fix that online. 913 00:46:56,400 --> 00:46:59,960 >> So why did we have you using the query function with question marks? 914 00:46:59,960 --> 00:47:04,000 One of the things the query function does for you is it makes sure that 915 00:47:04,000 --> 00:47:07,660 when you pass in arguments after the commas here like this that the query 916 00:47:07,660 --> 00:47:10,330 that's actually sent to the database looks like this. 917 00:47:10,330 --> 00:47:13,830 A lot uglier to look at, but backslashes have been automatically 918 00:47:13,830 --> 00:47:19,030 inserted to avoid precisely that injection attack that I showed a 919 00:47:19,030 --> 00:47:20,270 moment ago. 920 00:47:20,270 --> 00:47:24,930 >> Now a fun XKCD that I thought I'd pull up here that hopefully should now be a 921 00:47:24,930 --> 00:47:28,546 little more understandable is this one here.
922 00:47:28,546 --> 00:47:39,460 923 00:47:39,460 --> 00:47:40,265 >> A little bit? 924 00:47:40,265 --> 00:47:42,370 Maybe we need a little more discussion on that. 925 00:47:42,370 --> 00:47:47,810 So this is alluding to a little kid named Bobby who has somehow taken 926 00:47:47,810 --> 00:47:52,250 advantage of a website that is just trusting that what the user has typed 927 00:47:52,250 --> 00:47:55,100 in is not, in fact, SQL code, but is in fact a string. 928 00:47:55,100 --> 00:47:56,830 >> Now you may recall that drop-- 929 00:47:56,830 --> 00:48:00,190 you might have seen this-- drop means delete a table, delete a database. 930 00:48:00,190 --> 00:48:02,235 So if you essentially claim that your name is Robert"; DROP TABLE 931 00:48:02,235 --> 00:48:03,485 students, something, 932 00:48:03,485 --> 00:48:06,340 933 00:48:06,340 --> 00:48:09,370 you might very well trick the database not only into checking that you're 934 00:48:09,370 --> 00:48:13,530 indeed Robert, but semicolon also proceed to drop the table. 935 00:48:13,530 --> 00:48:17,560 >> And so SQL injection attacks can actually be as threatening as this 936 00:48:17,560 --> 00:48:20,740 whereby you can delete someone's data, you can select more data than 937 00:48:20,740 --> 00:48:23,440 intended, you can insert or update data. 938 00:48:23,440 --> 00:48:26,520 And you can actually see this. An at-home exercise, not for malicious 939 00:48:26,520 --> 00:48:29,730 purposes but just for instructional ones, is any time you're prompted to log 940 00:48:29,730 --> 00:48:35,180 into a website, especially some sort of not very public, not very popular website, 941 00:48:35,180 --> 00:48:38,630 try logging in as John O'Reilly or someone with an 942 00:48:38,630 --> 00:48:39,740 apostrophe in their name. 943 00:48:39,740 --> 00:48:42,990 Or literally just type apostrophe, hit Enter, and see what happens.
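For contrast, here is a minimal sketch in Python of a login check that copes with an apostrophe, again using the standard library's sqlite3 module with a made-up `users` table and password. One caveat: this sketch binds values to `?` placeholders, which keeps user input out of the SQL text altogether. That is a different mechanism from the backslash escaping described for the pset7 query function, though the goal is the same.

```python
import sqlite3

def safe_login(conn, username, password):
    # The ? placeholders are filled in by the database driver, so the
    # values are always treated as data and never parsed as SQL.
    return conn.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchall()

# Hypothetical in-memory database with one account whose name has an apostrophe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("John O'Reilly", "hunter2"))

print(safe_login(conn, "John O'Reilly", "hunter2"))
print(safe_login(conn, "John O'Reilly", "12345' OR '1'='1"))
```

In this sketch the legitimate login finds its row despite the apostrophe, while the injection string is compared literally against the stored password and matches nothing.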
944 00:48:42,990 --> 00:48:47,990 >> And all too often, tragically, people have not sanitized their inputs and 945 00:48:47,990 --> 00:48:51,690 made sure that things like quotes or semicolons are escaped. 946 00:48:51,690 --> 00:48:54,430 Which is why in pset7 we give you this query function. 947 00:48:54,430 --> 00:48:59,510 But do not underappreciate exactly what it is doing for you. 948 00:48:59,510 --> 00:49:01,800 >> So with that said, enjoy using the web this week. 949 00:49:01,800 --> 00:49:04,660 And we will see you on Monday. 950 00:49:04,660 --> 00:49:06,180 >> At the next CS50. 951 00:49:06,180 --> 00:49:18,614 >> [MUSIC]