1 00:00:00,000 --> 00:00:00,804 2 00:00:00,804 --> 00:00:02,550 DAVID MALAN: All right. 3 00:00:02,550 --> 00:00:04,350 So we are back. 4 00:00:04,350 --> 00:00:06,150 Now it's time for the cloud. 5 00:00:06,150 --> 00:00:09,950 What the heck is the cloud? 6 00:00:09,950 --> 00:00:12,600 Who's in the cloud? 7 00:00:12,600 --> 00:00:15,970 Who uses the cloud? 8 00:00:15,970 --> 00:00:16,730 Yeah, OK. 9 00:00:16,730 --> 00:00:18,170 Everyone over here and over here. 10 00:00:18,170 --> 00:00:18,800 OK. 11 00:00:18,800 --> 00:00:19,820 So what does this mean? 12 00:00:19,820 --> 00:00:21,930 We kind of take it for granted now. 13 00:00:21,930 --> 00:00:25,210 But what does cloud computing mean? 14 00:00:25,210 --> 00:00:25,710 Yeah? 15 00:00:25,710 --> 00:00:26,530 AUDIENCE: Off premise? 16 00:00:26,530 --> 00:00:27,340 DAVID MALAN: Off premise. 17 00:00:27,340 --> 00:00:27,850 OK, good. 18 00:00:27,850 --> 00:00:29,996 So what's off premise? 19 00:00:29,996 --> 00:00:36,070 AUDIENCE: You're hosting all your data off of your physical premises. 20 00:00:36,070 --> 00:00:37,600 [INAUDIBLE]. 21 00:00:37,600 --> 00:00:38,350 DAVID MALAN: Good. 22 00:00:38,350 --> 00:00:39,842 AUDIENCE: [INAUDIBLE]. 23 00:00:39,842 --> 00:00:41,300 DAVID MALAN: OK, applications, too. 24 00:00:41,300 --> 00:00:41,799 David? 25 00:00:41,799 --> 00:00:44,849 AUDIENCE: [INAUDIBLE] off-site storage. 26 00:00:44,849 --> 00:00:47,890 DAVID MALAN: Yeah, and this holds true for both businesses and consumers. 27 00:00:47,890 --> 00:00:50,931 Like, for us consumers in the room, we might be in the cloud in the sense 28 00:00:50,931 --> 00:00:53,070 that we're backing up our photos to iCloud 29 00:00:53,070 --> 00:00:57,320 or you're using Dropbox or some such service 30 00:00:57,320 --> 00:00:58,790 to store your data in the cloud. 31 00:00:58,790 --> 00:01:00,120 But a company could certainly do it, too. 
32 00:01:00,120 --> 00:01:02,369 And that's where we'll focus today-- what you actually 33 00:01:02,369 --> 00:01:04,220 do if you're doing something for a business 34 00:01:04,220 --> 00:01:06,750 and you don't want to run your own on-premise servers. 35 00:01:06,750 --> 00:01:10,440 You don't want to have to worry about power and cooling and electricity 36 00:01:10,440 --> 00:01:13,846 and physical security and hiring someone to run all of that 37 00:01:13,846 --> 00:01:14,970 and paying for all of that. 38 00:01:14,970 --> 00:01:17,910 Rather, you'd much rather leverage someone else's expertise 39 00:01:17,910 --> 00:01:20,350 in and infrastructure for that kind of stuff 40 00:01:20,350 --> 00:01:22,709 and actually use their off-premise servers. 41 00:01:22,709 --> 00:01:25,500 And so cloud computing kind of came onto the scene a few years ago, 42 00:01:25,500 --> 00:01:28,050 when really it was just a nice new rebranding 43 00:01:28,050 --> 00:01:32,360 of what it means to outsource or to rent someone else's server space. 44 00:01:32,360 --> 00:01:35,540 But it's been driven, in part, by technological trends. 45 00:01:35,540 --> 00:01:38,080 Does anyone have a sense of what it is that's been happening 46 00:01:38,080 --> 00:01:42,640 in industry technologically that's made cloud computing all the more the rage 47 00:01:42,640 --> 00:01:46,282 and all the more technologically possible? 48 00:01:46,282 --> 00:01:47,360 AUDIENCE: It's faster? 49 00:01:47,360 --> 00:01:48,193 DAVID MALAN: Faster? 50 00:01:48,193 --> 00:01:48,900 What's faster? 51 00:01:48,900 --> 00:01:54,290 AUDIENCE: The give and take of the data between [INAUDIBLE]. 52 00:01:54,290 --> 00:01:57,370 DAVID MALAN: OK, so transfer rates, bandwidth between points A and B. 53 00:01:57,370 --> 00:01:59,600 It's possible to move data around faster. 
54 00:01:59,600 --> 00:02:02,380 And now that that's the case, it's not such a big deal, maybe, 55 00:02:02,380 --> 00:02:05,600 if your photos are stored in the cloud because you can see them almost as 56 00:02:05,600 --> 00:02:06,970 quickly on your device, anyway. 57 00:02:06,970 --> 00:02:07,470 Sean? 58 00:02:07,470 --> 00:02:08,820 AUDIENCE: Cellular technology. 59 00:02:08,820 --> 00:02:09,690 DAVID MALAN: Cellular technology. 60 00:02:09,690 --> 00:02:10,240 How so? 61 00:02:10,240 --> 00:02:12,390 AUDIENCE: As far as speed. [INAUDIBLE]. 62 00:02:12,390 --> 00:02:13,140 DAVID MALAN: Yeah. 63 00:02:13,140 --> 00:02:15,710 64 00:02:15,710 --> 00:02:18,220 Yeah, this is definitely kind of the bottleneck these days. 65 00:02:18,220 --> 00:02:19,220 But it's getting better. 66 00:02:19,220 --> 00:02:23,720 There was, like, EDGE, and then 3G, and now LTE and other such variants 67 00:02:23,720 --> 00:02:24,220 thereof. 68 00:02:24,220 --> 00:02:26,261 And so it's becoming a little more seamless, such 69 00:02:26,261 --> 00:02:29,640 that it doesn't matter where the data is because if you see it pretty much 70 00:02:29,640 --> 00:02:33,680 instantaneously, it doesn't matter how close or how far the data is. 71 00:02:33,680 --> 00:02:34,320 Other trends? 72 00:02:34,320 --> 00:02:35,154 AUDIENCE: Web hosts. 73 00:02:35,154 --> 00:02:36,195 DAVID MALAN: What's that? 74 00:02:36,195 --> 00:02:37,166 AUDIENCE: Web hosts. 75 00:02:37,166 --> 00:02:37,916 DAVID MALAN: Web-- 76 00:02:37,916 --> 00:02:39,997 AUDIENCE: Like, AWS, Google [INAUDIBLE]. 77 00:02:39,997 --> 00:02:41,080 DAVID MALAN: Oh, OK, sure. 78 00:02:41,080 --> 00:02:44,390 So these providers, these big players especially 79 00:02:44,390 --> 00:02:45,910 that have really popularized this.
80 00:02:45,910 --> 00:02:47,970 Amazon, in particular, was one of the first 81 00:02:47,970 --> 00:02:51,870 some years ago to start building out this fairly generic infrastructure 82 00:02:51,870 --> 00:02:57,940 as a service, IaaS, which is kind of the silly buzzword or buzz acronym for it-- 83 00:02:57,940 --> 00:03:02,150 Infrastructure As a Service, which describes the sort of virtualization 84 00:03:02,150 --> 00:03:03,674 of low-level services. 85 00:03:03,674 --> 00:03:06,840 And we'll come back to this in just a bit as to what that menu of options is 86 00:03:06,840 --> 00:03:09,190 and how they are representative of other offerings 87 00:03:09,190 --> 00:03:11,720 from, like, Google and Microsoft and others. 88 00:03:11,720 --> 00:03:13,310 What-- Grace? 89 00:03:13,310 --> 00:03:16,304 AUDIENCE: [INAUDIBLE] scale much faster? 90 00:03:16,304 --> 00:03:18,854 Like, the volumes of data [INAUDIBLE]. 91 00:03:18,854 --> 00:03:19,520 DAVID MALAN: OK. 92 00:03:19,520 --> 00:03:21,610 And what do you mean by the ability to scale faster? 93 00:03:21,610 --> 00:03:22,693 Where does that come from? 94 00:03:22,693 --> 00:03:26,970 AUDIENCE: Cloud server could scale up additional memory 95 00:03:26,970 --> 00:03:29,844 that you couldn't do if you were in a physical location. 96 00:03:29,844 --> 00:03:30,510 DAVID MALAN: OK. 97 00:03:30,510 --> 00:03:33,301 AUDIENCE: Like, having to buy and build out more servers at Harvard 98 00:03:33,301 --> 00:03:37,080 is much harder to do, to get more space. 99 00:03:37,080 --> 00:03:38,420 DAVID MALAN: Yeah. 100 00:03:38,420 --> 00:03:39,120 Absolutely.
101 00:03:39,120 --> 00:03:42,080 And let me toss in the word spikiness or spiky traffic, 102 00:03:42,080 --> 00:03:46,000 especially when websites have gotten mentioned 103 00:03:46,000 --> 00:03:49,370 on Buzzfeed or Slashdot or other such websites, where all of a sudden 104 00:03:49,370 --> 00:03:52,737 your baseline users might be some number of hundreds or thousands of people per day 105 00:03:52,737 --> 00:03:53,820 or per second or whatever. 106 00:03:53,820 --> 00:03:55,903 And then all of a sudden, there's a massive spike. 107 00:03:55,903 --> 00:03:59,065 And in yesteryear, the results of that would be your website goes offline. 108 00:03:59,065 --> 00:04:00,200 Well, why is that? 109 00:04:00,200 --> 00:04:02,240 Well, let's actually focus on that for a moment. 110 00:04:02,240 --> 00:04:07,370 Why would your website or web server physically go 111 00:04:07,370 --> 00:04:11,210 offline or break or crash just because you have a lot of users? 112 00:04:11,210 --> 00:04:12,934 What's going on? 113 00:04:12,934 --> 00:04:13,434 Sean? 114 00:04:13,434 --> 00:04:16,190 AUDIENCE: Kind of [INAUDIBLE] over here. 115 00:04:16,190 --> 00:04:18,660 The computer only has so many connections, I guess? 116 00:04:18,660 --> 00:04:19,659 DAVID MALAN: Uh-huh, OK. 117 00:04:19,659 --> 00:04:20,529 AUDIENCE: Maxed out? 118 00:04:20,529 --> 00:04:21,279 DAVID MALAN: Yeah. 119 00:04:21,279 --> 00:04:23,520 So if you-- a computer, of course, can only 120 00:04:23,520 --> 00:04:26,030 do a finite amount of work per unit of time, right? 121 00:04:26,030 --> 00:04:27,330 Because it's a finite device. 122 00:04:27,330 --> 00:04:31,370 There's some ceiling on how much disk space it has, CPU cycles, so to speak, 123 00:04:31,370 --> 00:04:34,960 how much it can do per second, how much RAM it actually has.
124 00:04:34,960 --> 00:04:38,530 So if you try to exceed that amount, sometimes the behavior 125 00:04:38,530 --> 00:04:41,890 is undefined, especially if the programmers didn't really 126 00:04:41,890 --> 00:04:44,640 worry about that upper bound scenario. 127 00:04:44,640 --> 00:04:47,580 And at the end of the day, there's really nothing you can do. 128 00:04:47,580 --> 00:04:51,625 If you are just getting request after request after request, at some point, 129 00:04:51,625 --> 00:04:53,250 something's got to break along the way. 130 00:04:53,250 --> 00:04:55,124 And maybe it's the routers in between you 131 00:04:55,124 --> 00:04:57,040 and point B that just start dropping the data. 132 00:04:57,040 --> 00:05:00,020 Maybe it's your own server that can't handle the load. 133 00:05:00,020 --> 00:05:03,150 And it just gets so consumed with handling request, request, 134 00:05:03,150 --> 00:05:06,520 request, even if it runs out of RAM, it might use virtual memory, 135 00:05:06,520 --> 00:05:07,420 as we discussed. 136 00:05:07,420 --> 00:05:10,827 But then it's spending all of its time temporarily moving those requests 137 00:05:10,827 --> 00:05:14,160 until you get locked in this cycle where now you've used all of your disk space. 138 00:05:14,160 --> 00:05:16,700 And frankly, computers do not like it when they run out of disk space. 139 00:05:16,700 --> 00:05:18,949 Bad things happen, mostly because the software doesn't 140 00:05:18,949 --> 00:05:20,456 anticipate that actually happening. 141 00:05:20,456 --> 00:05:22,330 And so things slow to a crawl, and the server 142 00:05:22,330 --> 00:05:25,977 effectively freezes, crashes, or some other ill-defined behavior. 143 00:05:25,977 --> 00:05:27,310 And so your server goes offline. 144 00:05:27,310 --> 00:05:33,910 So what do you do in cases of that spiky traffic, at least 145 00:05:33,910 --> 00:05:35,792 before cloud computing? 146 00:05:35,792 --> 00:05:36,751 AUDIENCE: More servers. 
147 00:05:36,751 --> 00:05:38,125 DAVID MALAN: Right, more servers. 148 00:05:38,125 --> 00:05:40,180 So you sort of hope that the customers will still 149 00:05:40,180 --> 00:05:43,340 be there tomorrow or next week or next month when you've actually 150 00:05:43,340 --> 00:05:47,420 bought the equipment and plugged it in and installed it and configured it. 151 00:05:47,420 --> 00:05:49,810 And the thing is, there's a lot of complexity when 152 00:05:49,810 --> 00:05:52,622 it comes to wiring things up, both virtually and physically, 153 00:05:52,622 --> 00:05:54,080 as to how you design your software. 154 00:05:54,080 --> 00:05:55,540 So we'll come to that in just a moment. 155 00:05:55,540 --> 00:05:57,623 So cloud computing's gotten super alluring insofar 156 00:05:57,623 --> 00:06:02,905 as you can amortize your costs over all of the other hardware there-- 157 00:06:02,905 --> 00:06:04,280 you and other people can do this. 158 00:06:04,280 --> 00:06:08,220 And so when you do get spiky behavior, assuming that not every other company 159 00:06:08,220 --> 00:06:11,500 and website on the internet is also getting a spike of behavior, which 160 00:06:11,500 --> 00:06:14,170 stands to reason that's not possible because there's only 161 00:06:14,170 --> 00:06:15,290 a finite number of users. 162 00:06:15,290 --> 00:06:17,140 So they have to go one way or the other. 163 00:06:17,140 --> 00:06:19,140 And there's many different providers out there. 164 00:06:19,140 --> 00:06:23,600 You can consume all the more of Amazon or Microsoft's or Google's services, 165 00:06:23,600 --> 00:06:25,990 and then as soon as people lose interest in your site 166 00:06:25,990 --> 00:06:28,180 or the article that got reblogged or whatnot, 167 00:06:28,180 --> 00:06:31,490 then you sort of turn off those rented servers that you 168 00:06:31,490 --> 00:06:32,960 were borrowing from someone else. 
169 00:06:32,960 --> 00:06:35,180 Now, ideally, all of this is automatic, and you 170 00:06:35,180 --> 00:06:37,860 yourself don't have to log in anywhere or make a phone call 171 00:06:37,860 --> 00:06:40,079 and actually scale up your services by saying, hey, 172 00:06:40,079 --> 00:06:42,120 we'd like to place an order for two more servers. 173 00:06:42,120 --> 00:06:47,360 And indeed, all the rage these days would be something like autoscaling, 174 00:06:47,360 --> 00:06:52,510 where you actually configure the service or write software that monitors, well, 175 00:06:52,510 --> 00:06:54,130 how much of my RAM am I using? 176 00:06:54,130 --> 00:06:55,790 How much disk space am I using? 177 00:06:55,790 --> 00:06:58,210 How many users are currently on my website right now? 178 00:06:58,210 --> 00:07:01,350 And you yourself define some threshold such 179 00:07:01,350 --> 00:07:04,610 that if your servers are at, like, 80% of capacity, 180 00:07:04,610 --> 00:07:05,926 you still have 20%, of course. 181 00:07:05,926 --> 00:07:07,800 But maybe you should get ahead of that curve, 182 00:07:07,800 --> 00:07:11,360 turn on automatically some more servers configured identically 183 00:07:11,360 --> 00:07:13,160 so that your overall utilization is maybe 184 00:07:13,160 --> 00:07:15,460 in the more comfortable zone of 50% or whatever 185 00:07:15,460 --> 00:07:19,720 it is you want to be comfortable with so that you can sort of grow 186 00:07:19,720 --> 00:07:22,580 and contract based on actual load. 187 00:07:22,580 --> 00:07:23,080 Yeah? 188 00:07:23,080 --> 00:07:24,755 AUDIENCE: That's like an elastic cloud. 189 00:07:24,755 --> 00:07:25,630 DAVID MALAN: Exactly. 190 00:07:25,630 --> 00:07:31,170 So elastic cloud is-- that's using two different Amazon terms. 
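The threshold idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `desired_server_count`, the 20% scale-down threshold, and the idea that load spreads evenly across identical servers are all hypothetical and not taken from any provider's actual autoscaling service. The 80% and 50% figures are the ones mentioned in the lecture.

```python
import math


def desired_server_count(current_servers: int,
                         utilization: float,
                         scale_up_at: float = 0.80,
                         scale_down_at: float = 0.20,
                         target: float = 0.50) -> int:
    """Suggest how many identical servers to run (hypothetical policy).

    utilization is the average fraction of capacity in use, from 0.0 to 1.0.
    Nothing changes while utilization stays between the two thresholds.
    """
    if scale_down_at < utilization < scale_up_at:
        return current_servers  # comfortable zone: leave things alone

    # Total work, measured in "fully busy servers".
    total_load = current_servers * utilization

    # Enough servers that each one sits at or below the target utilization.
    # round() guards against tiny floating-point errors before ceil().
    needed = math.ceil(round(total_load / target, 9))
    return max(1, needed)
```

For instance, ten servers running at 80% carry eight servers' worth of work, so reaching 50% utilization would take sixteen servers. A real service would also have to decide how often to measure and how long to wait before shrinking again, which this sketch ignores.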
191 00:07:31,170 --> 00:07:35,490 But yes, anything elastic means exactly this-- having the software 192 00:07:35,490 --> 00:07:40,600 automatically add or subtract resources based on actual load or thresholds 193 00:07:40,600 --> 00:07:41,290 that you set. 194 00:07:41,290 --> 00:07:42,800 Absolutely. 195 00:07:42,800 --> 00:07:46,662 So what's the downside of this? 196 00:07:46,662 --> 00:07:47,550 AUDIENCE: Security? 197 00:07:47,550 --> 00:07:48,466 DAVID MALAN: Security? 198 00:07:48,466 --> 00:07:49,190 How so? 199 00:07:49,190 --> 00:07:51,274 AUDIENCE: Getting all of your data types. 200 00:07:51,274 --> 00:07:51,940 DAVID MALAN: OK. 201 00:07:51,940 --> 00:07:52,640 So yeah. 202 00:07:52,640 --> 00:07:55,180 I mean, especially for particularly sensitive data, 203 00:07:55,180 --> 00:07:59,010 whether it's HR or financial or intellectual property or otherwise. 204 00:07:59,010 --> 00:08:02,710 Cloud computing literally means moving your data off 205 00:08:02,710 --> 00:08:05,810 of your own, what might have been internal servers, to someone else's 206 00:08:05,810 --> 00:08:06,650 servers. 207 00:08:06,650 --> 00:08:09,520 Now, there are private clouds, which is a way of mitigating this. 208 00:08:09,520 --> 00:08:11,580 And this is sort of a sillier marketing term. 209 00:08:11,580 --> 00:08:15,660 Private cloud means just having your own servers, like you used to. 210 00:08:15,660 --> 00:08:19,930 But maybe more technically it will often mean running certain software on it 211 00:08:19,930 --> 00:08:23,340 so that you abstract away the detail that those are your own servers so 212 00:08:23,340 --> 00:08:26,790 that functionally, they're configured to behave identically 213 00:08:26,790 --> 00:08:28,760 to the third-party servers. 214 00:08:28,760 --> 00:08:31,950 And all it is is like a line in a configuration file that says, 215 00:08:31,950 --> 00:08:35,150 send users to our private cloud or send users to our public cloud. 
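The "line in a configuration file" idea can be illustrated with a small Python sketch. Everything named here is an assumption made for illustration: the `BACKENDS` table, the `CLOUD_TARGET` setting, and both example URLs are hypothetical and do not come from the lecture or from any real product.

```python
# Hypothetical endpoints for the two kinds of cloud; neither is a real service.
BACKENDS = {
    "private": "https://cloud.internal.example.com",
    "public": "https://cloud.provider.example.com",
}

# The single configuration line: change this value to switch clouds.
CLOUD_TARGET = "private"


def backend_url(target: str = CLOUD_TARGET) -> str:
    """Return the base URL the application should send users to."""
    return BACKENDS[target]
```

The rest of the application would only ever call `backend_url()`, so it would not need to know whether the servers behind that address are your own or a third party's.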
216 00:08:35,150 --> 00:08:37,500 So abstraction comes into play here, where 217 00:08:37,500 --> 00:08:39,584 it doesn't matter if it's a Dell computer 218 00:08:39,584 --> 00:08:41,820 or IBM computer or anything else. 219 00:08:41,820 --> 00:08:43,929 You're running software that creates the illusion 220 00:08:43,929 --> 00:08:47,050 to your own code, your own programs, that it could just 221 00:08:47,050 --> 00:08:48,920 be third-party servers or your own. 222 00:08:48,920 --> 00:08:50,520 It doesn't matter. 223 00:08:50,520 --> 00:08:53,114 But putting your data out there might not be acceptable. 224 00:08:53,114 --> 00:08:55,530 In fact, there are so many popular web services out there. 225 00:08:55,530 --> 00:08:58,370 GitHub, if you're familiar, for instance, 226 00:08:58,370 --> 00:09:01,670 is a very popular service for hosting your programming code. 227 00:09:01,670 --> 00:09:05,860 And your connection to GitHub will be encrypted between point A and B. 228 00:09:05,860 --> 00:09:07,640 But your code on their servers isn't going 229 00:09:07,640 --> 00:09:10,610 to be encrypted because the whole purpose of that site 230 00:09:10,610 --> 00:09:14,540 is to be able to share your code, either publicly or even internally, 231 00:09:14,540 --> 00:09:19,360 privately, to other people without jumping through hoops 232 00:09:19,360 --> 00:09:21,300 with encryption and whatnot. 233 00:09:21,300 --> 00:09:22,770 So it could exist, but it doesn't. 234 00:09:22,770 --> 00:09:25,420 But so many companies are putting really all of their software 235 00:09:25,420 --> 00:09:29,250 in the cloud because it's kind of trendy to do or it's cheaper to do 236 00:09:29,250 --> 00:09:31,870 or they didn't really think it through, any number of reasons. 237 00:09:31,870 --> 00:09:34,500 But it's very much in vogue, certainly, these days. 238 00:09:34,500 --> 00:09:38,502 What's another downside of using the cloud or enabling autoscaling?
239 00:09:38,502 --> 00:09:40,960 AUDIENCE: If you don't have internet, you don't have cloud. 240 00:09:40,960 --> 00:09:41,710 DAVID MALAN: Yeah. 241 00:09:41,710 --> 00:09:45,999 So if you don't have internet access, you don't have, in turn, the cloud. 242 00:09:45,999 --> 00:09:48,040 And this has actually happened in some ways, too. 243 00:09:48,040 --> 00:09:50,800 It's very much in vogue these days for software developers 244 00:09:50,800 --> 00:09:55,530 to just assume constant internet access and that third-party services will just 245 00:09:55,530 --> 00:09:59,057 be alive so much so that-- and we ourselves here on campus 246 00:09:59,057 --> 00:10:01,640 do this because it's cheaper and easier at the end of the day. 247 00:10:01,640 --> 00:10:03,250 But we increase these risks. 248 00:10:03,250 --> 00:10:06,330 If this is little old me on my laptop, and we're 249 00:10:06,330 --> 00:10:09,890 using some third-party service like GitHub here-- 250 00:10:09,890 --> 00:10:12,400 and there's equivalents of this-- and then maybe 251 00:10:12,400 --> 00:10:15,170 this is Amazon Web Services over here. 252 00:10:15,170 --> 00:10:18,170 And this is our cloud provider, this is the middleman that's 253 00:10:18,170 --> 00:10:20,630 storing our programming code just because it's convenient 254 00:10:20,630 --> 00:10:24,420 and it's an easy way for us to share so that I can use it, 255 00:10:24,420 --> 00:10:28,110 my buddies here on their laptops can use it. 256 00:10:28,110 --> 00:10:34,470 And so all of us might-- let's just put one arrow. 257 00:10:34,470 --> 00:10:37,480 So if these lines represent our laptops' connections to GitHub, 258 00:10:37,480 --> 00:10:40,380 where we're constantly sharing code and using it in a cloud 259 00:10:40,380 --> 00:10:41,630 sense to kind of distribute. 260 00:10:41,630 --> 00:10:45,530 If I make changes, I can push it here, then persons B and C 261 00:10:45,530 --> 00:10:47,510 can also access the same.
262 00:10:47,510 --> 00:10:49,580 It's very common these days with cloud services 263 00:10:49,580 --> 00:10:52,190 to have what are called hooks, so to speak, 264 00:10:52,190 --> 00:10:55,910 whereby a hook is a reaction to some event. 265 00:10:55,910 --> 00:10:59,590 And by that I mean if I, for instance, make some change to our website 266 00:10:59,590 --> 00:11:03,170 and I push it, so to speak, to GitHub or whatever third-party service, 267 00:11:03,170 --> 00:11:05,910 you can define a hook on GitHub's server that 268 00:11:05,910 --> 00:11:09,930 says any time you hear a push from one of our customers, 269 00:11:09,930 --> 00:11:13,220 go ahead and deploy that customer's code to some servers that 270 00:11:13,220 --> 00:11:16,390 have been preconfigured to receive code from GitHub. 271 00:11:16,390 --> 00:11:20,670 So you use this as sort of a middleman so that everyone can push changes here. 272 00:11:20,670 --> 00:11:23,090 Then that code automatically gets pushed to Amazon Web 273 00:11:23,090 --> 00:11:26,370 Services, where our customers can actually then see those changes. 274 00:11:26,370 --> 00:11:30,350 And this is an example of something called Continuous Deployment or CD, 275 00:11:30,350 --> 00:11:33,640 whereby whereas in yesteryear or yester-yesteryear, 276 00:11:33,640 --> 00:11:36,860 companies would update their software once a year, every two years. 277 00:11:36,860 --> 00:11:39,370 You would literally receive in the mail a shrink-wrapped box 278 00:11:39,370 --> 00:11:40,420 before the internet. 279 00:11:40,420 --> 00:11:42,280 And then even when there was the internet and things 280 00:11:42,280 --> 00:11:45,113 like Microsoft Office, they might update themselves every few years. 281 00:11:45,113 --> 00:11:52,230 Microsoft Office 2008, Microsoft Office 2013, or whatever the milestones were. 
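The notion of a hook as "a reaction to some event" can be sketched in Python as a tiny event dispatcher. This is a hedged illustration only: the names `on`, `fire`, `deploy`, and `deployed`, as well as the `"commit"` key in the payload, are hypothetical and do not reflect how GitHub or any other hosting service actually implements hooks.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Maps an event name (for example "push") to the functions that react to it.
_hooks: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)


def on(event: str):
    """Decorator that registers a function as a hook for the given event."""
    def register(handler: Callable[[dict], None]):
        _hooks[event].append(handler)
        return handler
    return register


# Stand-in for the preconfigured servers: records what has been deployed.
deployed: List[str] = []


@on("push")
def deploy(payload: dict) -> None:
    """React to a push by 'deploying' the commit named in the payload."""
    deployed.append(payload["commit"])


def fire(event: str, payload: dict) -> int:
    """Run every hook registered for the event; return how many ran."""
    handlers = _hooks.get(event, [])
    for handler in handlers:
        handler(payload)
    return len(handlers)
```

Calling `fire("push", {"commit": "abc123"})` would run `deploy` once, while an event with no registered hooks would simply do nothing. In continuous deployment the "deploy" step would copy code to production servers rather than append to a list.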
282 00:11:52,230 --> 00:11:54,690 Much more in vogue these days, certainly among startups, 283 00:11:54,690 --> 00:11:59,410 is continuous deployment, whereby you might update your website's code 284 00:11:59,410 --> 00:12:02,220 five times a day, 20 times a day, 30 times a day. 285 00:12:02,220 --> 00:12:05,220 Anytime someone makes even the smallest change to the code, 286 00:12:05,220 --> 00:12:07,510 it gets pushed to some central repository, 287 00:12:07,510 --> 00:12:10,000 you maybe have some tests run-- also known 288 00:12:10,000 --> 00:12:12,790 as Continuous Integration, whereby automatically 289 00:12:12,790 --> 00:12:15,870 are certain tests run to make sure is the code working as expected? 290 00:12:15,870 --> 00:12:18,620 And if so, it gets pushed to someone like Amazon. 291 00:12:18,620 --> 00:12:21,310 So among the upsides here is just the ease of use. 292 00:12:21,310 --> 00:12:25,770 We as users over here, we don't have to worry about how to run our servers. 293 00:12:25,770 --> 00:12:28,780 We don't have to worry about how to share our code among collaborators. 294 00:12:28,780 --> 00:12:31,370 We can just pay each of those folks a few dollars per month, 295 00:12:31,370 --> 00:12:34,000 and they just make all this happen for us. 296 00:12:34,000 --> 00:12:37,490 But beyond money, what other prices must we be paying? 297 00:12:37,490 --> 00:12:39,354 What are the risks? 298 00:12:39,354 --> 00:12:40,270 AUDIENCE: [INAUDIBLE]. 299 00:12:40,270 --> 00:12:43,110 300 00:12:43,110 --> 00:12:44,510 DAVID MALAN: What's that? 301 00:12:44,510 --> 00:12:45,426 AUDIENCE: [INAUDIBLE]. 302 00:12:45,426 --> 00:12:51,760 303 00:12:51,760 --> 00:12:52,510 DAVID MALAN: Sure. 304 00:12:52,510 --> 00:12:55,880 So we're consuming a lot more bandwidth, which at least for stuff 305 00:12:55,880 --> 00:12:57,910 like code, not too worrisome, since it's small. 306 00:12:57,910 --> 00:13:01,270 Video sites, absolutely, would consume an order of magnitude more. 
307 00:13:01,270 --> 00:13:03,520 Netflix has run up against that. 308 00:13:03,520 --> 00:13:04,596 Yeah, Victor? 309 00:13:04,596 --> 00:13:05,490 AUDIENCE: Latency? 310 00:13:05,490 --> 00:13:05,920 DAVID MALAN: Latency? 311 00:13:05,920 --> 00:13:06,478 How so? 312 00:13:06,478 --> 00:13:09,830 AUDIENCE: Servers are on different sides of the country. 313 00:13:09,830 --> 00:13:11,650 DAVID MALAN: Yeah. 314 00:13:11,650 --> 00:13:16,240 So whereas if you had a central server at your company, in this building, 315 00:13:16,240 --> 00:13:19,660 for instance, we could save our code centrally within a few milliseconds, 316 00:13:19,660 --> 00:13:21,040 let's say, or, like, a second. 317 00:13:21,040 --> 00:13:24,390 But if we have to push it to California or Virginia 318 00:13:24,390 --> 00:13:26,836 or whatever GitHub's main servers are, that 319 00:13:26,836 --> 00:13:28,960 could take longer, more milliseconds, more seconds. 320 00:13:28,960 --> 00:13:31,830 So you might have latency, the time between when you start an action 321 00:13:31,830 --> 00:13:33,720 and it actually completes. 322 00:13:33,720 --> 00:13:34,380 Other thoughts? 323 00:13:34,380 --> 00:13:34,880 Yeah. 324 00:13:34,880 --> 00:13:39,050 AUDIENCE: What about trust, security issue, [INAUDIBLE]. 325 00:13:39,050 --> 00:13:43,490 DAVID MALAN: Yeah, this is kind of one of the underappreciated details. 326 00:13:43,490 --> 00:13:45,610 I mean, fortunately for our academic uses, 327 00:13:45,610 --> 00:13:49,180 we're not really too worried about people stealing our code or the code 328 00:13:49,180 --> 00:13:51,940 that we use to administer assignments and such. 329 00:13:51,940 --> 00:13:53,840 But we're fairly unique in that sense. 330 00:13:53,840 --> 00:13:57,020 Any number of companies that actually use services like GitHub 331 00:13:57,020 --> 00:14:00,700 are putting all of their intellectual property in a third party's hands 332 00:14:00,700 --> 00:14:02,270 because it's convenient. 
333 00:14:02,270 --> 00:14:04,846 But there's a massive potential downside there. 334 00:14:04,846 --> 00:14:06,720 Now thankfully, as an aside, just because I'm 335 00:14:06,720 --> 00:14:08,590 picking on GitHub as the most popular, they 336 00:14:08,590 --> 00:14:11,920 have something called GitHub Enterprise edition, which is the same software, 337 00:14:11,920 --> 00:14:15,130 but you get to run it on your own computer or your own servers 338 00:14:15,130 --> 00:14:16,740 or in the cloud. 339 00:14:16,740 --> 00:14:19,690 But even then, Amazon, in theory, has access to it 340 00:14:19,690 --> 00:14:23,310 because Amazon has physical humans as employees 341 00:14:23,310 --> 00:14:26,069 who could certainly physically access those devices, as well. 342 00:14:26,069 --> 00:14:28,110 And really all that's protecting you in that case 343 00:14:28,110 --> 00:14:33,371 there are SLAs or policy agreements between you and the provider. 344 00:14:33,371 --> 00:14:35,370 But this is, I think, an underappreciated thing. 345 00:14:35,370 --> 00:14:37,570 I mean, most any internet startup certainly just 346 00:14:37,570 --> 00:14:39,486 jumps on the latest and greatest technologies, 347 00:14:39,486 --> 00:14:42,207 I dare say without really thinking through the implications. 348 00:14:42,207 --> 00:14:44,790 But all it takes is for GitHub or whatever third-party service 349 00:14:44,790 --> 00:14:45,540 to be compromised. 350 00:14:45,540 --> 00:14:48,380 You could lose all of your intellectual property. 351 00:14:48,380 --> 00:14:51,970 But even more mundanely but significantly, what else 352 00:14:51,970 --> 00:14:54,832 could go wrong here? 353 00:14:54,832 --> 00:14:56,730 AUDIENCE: If the server went down-- 354 00:14:56,730 --> 00:14:57,480 DAVID MALAN: Yeah. 
355 00:14:57,480 --> 00:14:59,490 When GitHub goes down, half of the internet 356 00:14:59,490 --> 00:15:02,060 seems to break these days, at least among trendy startups 357 00:15:02,060 --> 00:15:04,370 and so forth who are using this third-party service. 358 00:15:04,370 --> 00:15:08,850 Because if your whole system is built to deploy your code through this pipeline, 359 00:15:08,850 --> 00:15:12,670 so to speak, if this piece breaks, you're kind of dead in the water. 360 00:15:12,670 --> 00:15:14,900 Now, your servers are probably still running. 361 00:15:14,900 --> 00:15:16,990 But they're running older code because you haven't 362 00:15:16,990 --> 00:15:18,160 been able to push those changes. 363 00:15:18,160 --> 00:15:19,868 Now, you could certainly circumvent this. 364 00:15:19,868 --> 00:15:22,555 There's no fixed requirement of using this middleman. 365 00:15:22,555 --> 00:15:25,430 But it's going to take time, and then you have to re-engineer things. 366 00:15:25,430 --> 00:15:26,804 And it's just kind of a headache. 367 00:15:26,804 --> 00:15:30,450 So it's probably better just to kind of ride it out or wait until it resolves. 368 00:15:30,450 --> 00:15:31,920 But there's that issue, too. 369 00:15:31,920 --> 00:15:36,350 Very common too is for software development-- 370 00:15:36,350 --> 00:15:39,190 more on this tomorrow-- to rely on third-party libraries. 371 00:15:39,190 --> 00:15:41,820 A library is a bunch of code that someone else has written, 372 00:15:41,820 --> 00:15:45,010 often open source, freely available, that you can use in your own project. 373 00:15:45,010 --> 00:15:51,990 But it's very much in vogue these days to have deployment time-- to resolve 374 00:15:51,990 --> 00:15:53,720 your dependencies at deployment time. 375 00:15:53,720 --> 00:15:54,810 What do I mean by this? 376 00:15:54,810 --> 00:15:57,770 Suppose that I'm writing software that uses some third-party library. 
377 00:15:57,770 --> 00:16:00,007 Like, I have no idea how to send emails, but I 378 00:16:00,007 --> 00:16:02,340 know that someone else wrote a library, a bunch of code, 379 00:16:02,340 --> 00:16:03,750 that knows how to send emails. 380 00:16:03,750 --> 00:16:07,780 So I'm using his or her library in my website to send my emails out. 381 00:16:07,780 --> 00:16:13,130 It's very common these days not to save copies of your libraries 382 00:16:13,130 --> 00:16:17,030 you're using in your own code repository, partly 'cause of principle. 383 00:16:17,030 --> 00:16:21,780 It's just redundant, and if someone else is already saving and archiving 384 00:16:21,780 --> 00:16:25,480 different versions of their email library, why should you do the same? 385 00:16:25,480 --> 00:16:26,510 It's wasting space. 386 00:16:26,510 --> 00:16:27,930 Things might get out of sync. 387 00:16:27,930 --> 00:16:31,600 And so what people will sometimes do is you store only your website's code 388 00:16:31,600 --> 00:16:32,420 here. 389 00:16:32,420 --> 00:16:34,060 You push it to some central source. 390 00:16:34,060 --> 00:16:36,500 And the moment it gets deployed to Amazon Web Services 391 00:16:36,500 --> 00:16:40,430 or wherever is when automatically, some program grabs 392 00:16:40,430 --> 00:16:43,310 all these other third-party services that you might have 393 00:16:43,310 --> 00:16:45,770 been using that get linked in as well. 394 00:16:45,770 --> 00:16:48,320 And we'll call these libraries. 395 00:16:48,320 --> 00:16:51,630 Of course, the problem there is exactly the same as if GitHub goes down. 396 00:16:51,630 --> 00:16:54,480 And it's funny, it's the stupidest thing-- 397 00:16:54,480 --> 00:17:00,201 let me see-- node.js left shift, left pad. 398 00:17:00,201 --> 00:17:00,700 OK. 399 00:17:00,700 --> 00:17:03,170 So this was all the rage. 
400 00:17:03,170 --> 00:17:05,690 Here you go-- how one developer just broke Node, Babel, 401 00:17:05,690 --> 00:17:07,319 and thousands of projects. 402 00:17:07,319 --> 00:17:15,202 So this was delightful to read because in recent years, 403 00:17:15,202 --> 00:17:17,410 it's just become very common, this kind of paradigm-- 404 00:17:17,410 --> 00:17:20,380 to not only use libraries, which has been happening for decades, 405 00:17:20,380 --> 00:17:23,609 but to use third-party libraries that are hosted elsewhere 406 00:17:23,609 --> 00:17:26,609 and are pulled in dynamically for your own project, which has 407 00:17:26,609 --> 00:17:28,900 some upsides but also some downsides. 408 00:17:28,900 --> 00:17:31,280 And essentially, for reasons I'll defer to this article 409 00:17:31,280 --> 00:17:33,580 or can send the URL around later, someone 410 00:17:33,580 --> 00:17:36,640 who was hosting a library called left-pad whose purpose in life 411 00:17:36,640 --> 00:17:39,776 is just to add, I think, white space-- so space characters-- 412 00:17:39,776 --> 00:17:41,400 to the left of a sentence, if you want. 413 00:17:41,400 --> 00:17:43,700 If you want to kind of shift a sentence over this way, 414 00:17:43,700 --> 00:17:44,980 it's not hard to do in code. 415 00:17:44,980 --> 00:17:47,590 But someone wrote this, and it's very popular to make it open source. 416 00:17:47,590 --> 00:17:50,210 And so a lot of people were relying on this very small library. 417 00:17:50,210 --> 00:17:51,850 And for whatever reason-- some of them, I think, 418 00:17:51,850 --> 00:17:55,030 personal-- this fellow removed his library from public distribution. 
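To give a sense of how small such a library is, here is a Python sketch of the same idea. It is not the original left-pad code, which was written in JavaScript; the function name `left_pad` and its parameters are chosen here for illustration.

```python
def left_pad(text: str, length: int, fill: str = " ") -> str:
    """Pad text on the left with a fill character until it is length long.

    Text that is already long enough is returned unchanged. Only the first
    character of fill is used, and an empty fill falls back to a space.
    """
    text = str(text)
    pad_char = fill[0] if fill else " "
    missing = length - len(text)
    if missing <= 0:
        return text
    return pad_char * missing + text
```

Python's built-in `str.rjust` already does this, so `left_pad("7", 3, "0")` and `"7".rjust(3, "0")` give the same result.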
419 00:17:55,030 --> 00:17:57,520 And to this article's headline, all of these projects 420 00:17:57,520 --> 00:18:00,130 suddenly broke because these companies and persons 421 00:18:00,130 --> 00:18:02,870 are trying to deploy their code or update their code 422 00:18:02,870 --> 00:18:05,880 and no longer can this dependency be resolved. 423 00:18:05,880 --> 00:18:12,199 And I think-- I mean, what's amazing is how simple this is. 424 00:18:12,199 --> 00:18:13,740 So let me see if I can find the code. 425 00:18:13,740 --> 00:18:17,590 426 00:18:17,590 --> 00:18:21,260 OK, so even if you're unfamiliar with programming-- well, 427 00:18:21,260 --> 00:18:22,520 this is not that much code. 428 00:18:22,520 --> 00:18:24,937 It looks a little scary because it has so many lines here. 429 00:18:24,937 --> 00:18:27,353 But half of these lines are what are called comments, just 430 00:18:27,353 --> 00:18:28,550 human-readable strings. 431 00:18:28,550 --> 00:18:31,207 This is not, like, a huge amount of intellectual property. 432 00:18:31,207 --> 00:18:34,290 Someone could whip this up in probably a few minutes and a bit of testing. 433 00:18:34,290 --> 00:18:36,360 But thousands of projects were apparently 434 00:18:36,360 --> 00:18:38,960 using this tiny, tiny piece of software. 435 00:18:38,960 --> 00:18:43,610 And the unavailability of it suddenly broke all of these projects. 436 00:18:43,610 --> 00:18:48,010 So these are the kinds of decisions, too, that might come as a surprise, 437 00:18:48,010 --> 00:18:51,200 certainly, to managers and folks who, why is the website down? 438 00:18:51,200 --> 00:18:53,802 Well, someone took down their third-party library. 439 00:18:53,802 --> 00:18:56,510 This is not, like, a great threat to software development per se. 
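To give a sense of how little code is involved, here is a minimal Python sketch of what a left-pad function does. The original library is JavaScript; this is not its code, only an illustration of the same idea, and the name left_pad and its parameters are assumptions made for this example.

```python
def left_pad(text, length, fill=" "):
    """Pad `text` on the left with `fill` until it is `length` characters long."""
    if len(fill) != 1:
        # Assumption for this sketch: the fill is exactly one character.
        raise ValueError("fill must be a single character")
    text = str(text)
    missing = length - len(text)
    if missing <= 0:
        return text  # already long enough; nothing to add
    return fill * missing + text


print(repr(left_pad("hello", 8)))  # '   hello'
print(left_pad(42, 5, "0"))        # 00042
```

Python's built-in str.rjust does the same job, which underlines the point: the dependency was tiny, but its sudden absence still broke builds.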
440 00:18:56,510 --> 00:18:59,500 But it is sort of a side effect of very popular trends 441 00:18:59,500 --> 00:19:03,580 and paradigms in engineering-- having very distributed approaches 442 00:19:03,580 --> 00:19:07,380 to building your software, but you introduce a lot of what we would call, 443 00:19:07,380 --> 00:19:09,110 more generally, single points of failure. 444 00:19:09,110 --> 00:19:11,050 Like if GitHub goes down, you go down. 445 00:19:11,050 --> 00:19:14,100 If Amazon Web Services go down, if you haven't engineered around this, 446 00:19:14,100 --> 00:19:15,700 you go down, as well. 447 00:19:15,700 --> 00:19:19,860 And so that's what's both exciting and sort of risky about the cloud, 448 00:19:19,860 --> 00:19:23,410 is if you don't necessarily understand what building blocks exist 449 00:19:23,410 --> 00:19:27,220 and how you can assemble all of those together. 450 00:19:27,220 --> 00:19:30,200 So let's come back to one final question, but first Vanessa. 451 00:19:30,200 --> 00:19:32,164 AUDIENCE: So what would be a best practice? 452 00:19:32,164 --> 00:19:35,110 'Cause I know engineers I've worked with don't want 453 00:19:35,110 --> 00:19:36,583 to create dependency in their code. 454 00:19:36,583 --> 00:19:39,038 So they would do exactly [INAUDIBLE]. 455 00:19:39,038 --> 00:19:42,730 456 00:19:42,730 --> 00:19:44,320 DAVID MALAN: Yeah. 457 00:19:44,320 --> 00:19:46,490 It kind of depends on cost and convenience. 458 00:19:46,490 --> 00:19:49,560 Like the reality is it is just-- especially 459 00:19:49,560 --> 00:19:54,360 for a young startup, where you really want 460 00:19:54,360 --> 00:19:57,570 to have high returns quickly from the limited resources and labor 461 00:19:57,570 --> 00:19:58,390 that you have. 
462 00:19:58,390 --> 00:20:02,830 You don't necessarily want your humans spending a day, a weekend, a week 463 00:20:02,830 --> 00:20:05,290 sort of setting up a centralized code repository 464 00:20:05,290 --> 00:20:07,600 and all of the sort of configuration required for that. 465 00:20:07,600 --> 00:20:11,960 You don't necessarily want them to have to set up their own servers locally 466 00:20:11,960 --> 00:20:16,200 because that could take, like, a week or a month to buy and a week to configure. 467 00:20:16,200 --> 00:20:19,140 And so it's kind of this judgment call, whereby, yes, 468 00:20:19,140 --> 00:20:22,570 those would be better in terms of security and robustness and uptime. 469 00:20:22,570 --> 00:20:26,410 But it's going to cost us a month or two of labor or effort by that person. 470 00:20:26,410 --> 00:20:29,300 And so now we're two months behind where we want to be. 471 00:20:29,300 --> 00:20:32,540 So I would say it is not uncommon to do this. 472 00:20:32,540 --> 00:20:37,020 For startups, it's probably fine because you're young enough and small enough 473 00:20:37,020 --> 00:20:39,790 that if you go offline, it's not great, but you're not 474 00:20:39,790 --> 00:20:44,850 going to be losing millions of dollars a day as a big fish like Amazon might. 475 00:20:44,850 --> 00:20:48,980 So it kind of depends on what the cost benefit ratio is. 476 00:20:48,980 --> 00:20:51,004 And only you and they could determine that. 477 00:20:51,004 --> 00:20:52,670 I would say it's very common to do this. 478 00:20:52,670 --> 00:20:55,720 It is not hard to add your dependencies to your own repository. 479 00:20:55,720 --> 00:20:57,690 And this is perhaps a stupid trend. 480 00:20:57,690 --> 00:21:00,177 So I would just do that because it's really no cost. 
481 00:21:00,177 --> 00:21:03,260 But then there's other issues here that we'll start to explore in a moment 482 00:21:03,260 --> 00:21:06,520 because you can really go overboard when it comes to redundancy and planning 483 00:21:06,520 --> 00:21:07,380 for the worst. 484 00:21:07,380 --> 00:21:11,440 And if there's only a, like, 0.001% chance of your website going offline, 485 00:21:11,440 --> 00:21:15,660 do you really want to spend 10,000 times more to avoid that threat? 486 00:21:15,660 --> 00:21:18,680 So it depends on what the expected cost is, as well. 487 00:21:18,680 --> 00:21:21,800 So we'll come to those decisions in just a moment. 488 00:21:21,800 --> 00:21:24,480 So one final question-- what else has spurred forward 489 00:21:24,480 --> 00:21:30,200 the popularity of cloud computing besides the sort of benefits 490 00:21:30,200 --> 00:21:31,810 to users and companies? 491 00:21:31,810 --> 00:21:34,920 What technologically has made this, perhaps, all the more of a thing? 492 00:21:34,920 --> 00:21:35,580 Ave? 493 00:21:35,580 --> 00:21:38,935 AUDIENCE: We're so reliant on [INAUDIBLE]. 494 00:21:38,935 --> 00:21:41,420 495 00:21:41,420 --> 00:21:42,170 DAVID MALAN: Yeah. 496 00:21:42,170 --> 00:21:43,050 So this is a biggie. 497 00:21:43,050 --> 00:21:46,010 I mean, I alluded earlier to this verbal list 498 00:21:46,010 --> 00:21:49,580 of, like, power and cooling and physical space, 499 00:21:49,580 --> 00:21:52,050 not to mention the money required to procure servers. 500 00:21:52,050 --> 00:21:54,737 And back in the day-- it was only, like, 10 or so years ago-- 501 00:21:54,737 --> 00:21:56,695 I still remember doing this consulting gig once 502 00:21:56,695 --> 00:21:59,350 where we bought a whole lot of hardware because we wanted 503 00:21:59,350 --> 00:22:01,570 to run our own servers and run them in a data center 504 00:22:01,570 --> 00:22:03,270 where we were renting space. 
505 00:22:03,270 --> 00:22:05,077 And maybe the first time around, it was fun 506 00:22:05,077 --> 00:22:07,410 to kind of crawl around on the floor and wire everything 507 00:22:07,410 --> 00:22:10,230 together and make sure that all of the individual servers 508 00:22:10,230 --> 00:22:13,450 had multiple hard drives for redundancy and multiple power supplies 509 00:22:13,450 --> 00:22:15,740 for redundancy and think through all of this. 510 00:22:15,740 --> 00:22:19,510 But once you've kind of done that once and spent for that much redundancy only 511 00:22:19,510 --> 00:22:22,110 to find that, well, occasionally your usage is here. 512 00:22:22,110 --> 00:22:23,230 Maybe it's over here. 513 00:22:23,230 --> 00:22:25,240 But you sort of have to pay for up here. 514 00:22:25,240 --> 00:22:26,500 It's not all that compelling. 515 00:22:26,500 --> 00:22:28,660 And it's also a huge amount of work that doesn't 516 00:22:28,660 --> 00:22:31,916 need to be done by you and your more limited team. 517 00:22:31,916 --> 00:22:33,040 So that's certainly driven. 518 00:22:33,040 --> 00:22:35,600 519 00:22:35,600 --> 00:22:36,960 Not to mention lack of space. 520 00:22:36,960 --> 00:22:38,793 Like at Harvard, we started using the cloud, 521 00:22:38,793 --> 00:22:41,240 in part, because we, for our team-- we had no space. 522 00:22:41,240 --> 00:22:42,480 We had no cooling. 523 00:22:42,480 --> 00:22:44,630 We kind of didn't really have power. 524 00:22:44,630 --> 00:22:49,056 So we really had no other options other than putting it under someone's desk. 525 00:22:49,056 --> 00:22:53,030 AUDIENCE: I was gonna say one other [INAUDIBLE] server 526 00:22:53,030 --> 00:22:55,100 and storage technology that makes it actually 527 00:22:55,100 --> 00:22:58,036 cost effective for these companies to do this. 528 00:22:58,036 --> 00:23:00,576 Where before, they could only do it for themselves. 529 00:23:00,576 --> 00:23:02,950 DAVID MALAN: That's what's really helped technologically. 
530 00:23:02,950 --> 00:23:05,491 If you've heard of Moore's law, whose definition kind of gets 531 00:23:05,491 --> 00:23:07,610 tweaked every few years-- but it generally 532 00:23:07,610 --> 00:23:11,120 says that the number of transistors on a CPU 533 00:23:11,120 --> 00:23:13,870 doubles every 18 months or 12 months or 24 months, 534 00:23:13,870 --> 00:23:16,640 depending on when you've looked at the definition. 535 00:23:16,640 --> 00:23:18,890 But it essentially says that technological trends 536 00:23:18,890 --> 00:23:21,024 double every year, give or take, which means 537 00:23:21,024 --> 00:23:23,940 you have twice as many transistors inside of your computer every year. 538 00:23:23,940 --> 00:23:27,380 You have twice as much storage space for the same amount of money every year. 539 00:23:27,380 --> 00:23:31,470 You have twice as much CPU speed or cores, so 540 00:23:31,470 --> 00:23:33,859 to speak, inside of your computer every year. 541 00:23:33,859 --> 00:23:34,900 So there's this doubling. 542 00:23:34,900 --> 00:23:37,066 And if you think about a doubling, it's the opposite 543 00:23:37,066 --> 00:23:40,150 of the logarithmic curve we saw earlier, which still rises, 544 00:23:40,150 --> 00:23:41,960 but ever more slowly. 545 00:23:41,960 --> 00:23:45,640 Something like Moore's law is more like a hockey stick, where 546 00:23:45,640 --> 00:23:48,277 we're kind of more on this side nowadays, 547 00:23:48,277 --> 00:23:51,360 where the returns of having things double and double and double and double 548 00:23:51,360 --> 00:23:54,710 have really started to yield some exciting returns, so much 549 00:23:54,710 --> 00:24:00,070 so that this Mac here-- let's see, About This Mac. 550 00:24:00,070 --> 00:24:03,790 This is three gigahertz, running an Intel Core i7, 551 00:24:03,790 --> 00:24:06,570 which is a type of CPU, 16 gigabytes of RAM. 
552 00:24:06,570 --> 00:24:09,740 So this means in terms of processor, CPU, speed, my computer 553 00:24:09,740 --> 00:24:13,610 can do 3.1 billion things per second. 554 00:24:13,610 --> 00:24:16,890 What is the limiting factor, then, in using a computer? 555 00:24:16,890 --> 00:24:21,230 I can only check my email so fast or reload Facebook so quickly. 556 00:24:21,230 --> 00:24:24,970 The human is by far the slowest piece of equipment standing 557 00:24:24,970 --> 00:24:26,190 in front of this laptop. 558 00:24:26,190 --> 00:24:29,520 And so we're at the point even with desktop or laptop computers 559 00:24:29,520 --> 00:24:33,410 that we have far more resources than even we humans know what to do with. 560 00:24:33,410 --> 00:24:37,140 And servers, by contrast, will have not just one or two CPUs. 561 00:24:37,140 --> 00:24:41,120 They might have 16 or 32 or 64 brains inside of them. 562 00:24:41,120 --> 00:24:44,520 They might have tens of gigabytes or hundreds of gigabytes of RAM 563 00:24:44,520 --> 00:24:45,780 that those CPUs can use. 564 00:24:45,780 --> 00:24:48,420 They might have terabytes and terabytes of space to use. 565 00:24:48,420 --> 00:24:51,964 And it's sort of more than individuals might necessarily need. 566 00:24:51,964 --> 00:24:53,880 You have so much more hardware and performance 567 00:24:53,880 --> 00:24:56,120 packed into such a small package that it would 568 00:24:56,120 --> 00:24:59,540 be nice to amortize the costs over multiple users. 569 00:24:59,540 --> 00:25:03,050 But at the same time, I don't want my intellectual property and my code 570 00:25:03,050 --> 00:25:08,650 and my data sitting alongside Nicholas's data and Ave's data and Sarah's data. 571 00:25:08,650 --> 00:25:14,660 I want at least my own user accounts and administrative privileges. 572 00:25:14,660 --> 00:25:18,100 I want some kind of barrier between my data and their data.
573 00:25:18,100 --> 00:25:22,100 And so the way that has-- what's really popularized this of late 574 00:25:22,100 --> 00:25:24,990 has been virtualization or virtual machines. 575 00:25:24,990 --> 00:25:28,010 And this is a diagram drawn from Docker's website, which 576 00:25:28,010 --> 00:25:30,470 is an even newer incarnation of this general idea. 577 00:25:30,470 --> 00:25:32,470 But if you're unfamiliar with virtualization, 578 00:25:32,470 --> 00:25:36,880 the user-facing feature that it provides is 579 00:25:36,880 --> 00:25:40,080 it allows you to run one operating system on top of another. 580 00:25:40,080 --> 00:25:42,130 So if you're running Mac OS, you can have Windows 581 00:25:42,130 --> 00:25:44,450 running in a window on your computer. 582 00:25:44,450 --> 00:25:48,044 And conversely, if you run Windows, you can, in theory, 583 00:25:48,044 --> 00:25:49,710 run Mac OS in a window on your computer. 584 00:25:49,710 --> 00:25:52,290 But Apple doesn't like to let people do this, so it's hard. 585 00:25:52,290 --> 00:25:54,360 But you can run Linux or Unix-- these are 586 00:25:54,360 --> 00:25:56,980 other operating systems-- on top of Mac OS or on top 587 00:25:56,980 --> 00:25:59,710 of Windows, again, sort of visually within a window. 588 00:25:59,710 --> 00:26:05,010 But what that means is you are virtualizing one operating system 589 00:26:05,010 --> 00:26:08,240 and one computer, and using one computer to pretend that it can actually 590 00:26:08,240 --> 00:26:09,310 support multiple ones. 591 00:26:09,310 --> 00:26:11,035 So pictorially, you might have this. 592 00:26:11,035 --> 00:26:14,240 So infrastructure is just referring to your Mac or PC in this story. 593 00:26:14,240 --> 00:26:16,410 Host operating system's going to be Mac OS or Windows 594 00:26:16,410 --> 00:26:17,576 for most people in the room. 595 00:26:17,576 --> 00:26:21,620 Hypervisor is the fancy name given to a virtual machine monitor.
596 00:26:21,620 --> 00:26:26,170 It's virtualization software, like VMware Fusion, VMware Workstation, 597 00:26:26,170 --> 00:26:28,730 VMware Player-- suffice it to say, VMware 598 00:26:28,730 --> 00:26:34,650 is a big company in this space-- Oracle VirtualBox, Microsoft Virtual PC. 599 00:26:34,650 --> 00:26:38,350 And there's a few other-- something, the company's name might be Parallels. 600 00:26:38,350 --> 00:26:39,242 Parallels for Mac OS. 601 00:26:39,242 --> 00:26:41,450 There's a lot of different software that can do this. 602 00:26:41,450 --> 00:26:44,350 And as this picture suggests, it runs on top of the operating system. 603 00:26:44,350 --> 00:26:46,516 So it's just a program running on Mac OS or Windows. 604 00:26:46,516 --> 00:26:50,240 But then as these three towers suggest, what hypervisor 605 00:26:50,240 --> 00:26:53,930 does for you is it lets you run as many as three different operating 606 00:26:53,930 --> 00:26:56,690 systems, even more, on top of your own. 607 00:26:56,690 --> 00:26:59,110 And you can think of it as being in separate windows. 608 00:26:59,110 --> 00:27:03,580 So now that this is possible, if I might go out and rent, effectively, 609 00:27:03,580 --> 00:27:07,640 in the cloud a really big server with way more disk space, way more RAM, 610 00:27:07,640 --> 00:27:11,800 way more CPU cycles than I need for my little business, well, you know what? 611 00:27:11,800 --> 00:27:15,070 I could chop this up, effectively, for Nicholas, Ave, Sarah, 612 00:27:15,070 --> 00:27:19,069 and myself so that each of us can run our own operating system-- 613 00:27:19,069 --> 00:27:20,610 different operating systems, no less. 614 00:27:20,610 --> 00:27:22,651 We can each have our own usernames and passwords. 615 00:27:22,651 --> 00:27:25,850 All of our data and code can be isolated from everyone else's. 
616 00:27:25,850 --> 00:27:31,200 Now, whoever owns that machine, in theory, could access all of our work 617 00:27:31,200 --> 00:27:33,420 by having physical access. 618 00:27:33,420 --> 00:27:38,067 But at least Nicholas, Sarah, and Ave are compartmentalized, as am I, 619 00:27:38,067 --> 00:27:39,900 so that no one else can get at our own data. 620 00:27:39,900 --> 00:27:42,840 And so one of the reasons that we have virtualization so trendy 621 00:27:42,840 --> 00:27:46,884 these days is we just have almost more CPUs and more space and more memory 622 00:27:46,884 --> 00:27:49,550 than we even know what to do with, at least within the footprint 623 00:27:49,550 --> 00:27:51,024 of a single machine. 624 00:27:51,024 --> 00:27:52,690 So that too, has spurred things forward. 625 00:27:52,690 --> 00:27:55,820 Now, as an aside, there's another technology-- no break yet. 626 00:27:55,820 --> 00:27:59,463 There's another technology that I alluded to a moment ago called containerization 627 00:27:59,463 --> 00:28:05,350 which is, if you've not heard the term, an even lighter-weight version of this, 628 00:28:05,350 --> 00:28:09,890 whereby containers are similar in spirit to virtual machines but can be started 629 00:28:09,890 --> 00:28:15,050 and can be booted much faster than full-fledged virtual machines. 630 00:28:15,050 --> 00:28:16,860 We'll have more on those another time. 631 00:28:16,860 --> 00:28:17,640 Yeah, Anessa. 632 00:28:17,640 --> 00:28:20,416 AUDIENCE: So I know at least the team that I 633 00:28:20,416 --> 00:28:24,010 worked with [INAUDIBLE] containerization is the thing right now. 634 00:28:24,010 --> 00:28:25,970 And they're even building [INAUDIBLE]. 635 00:28:25,970 --> 00:28:29,890 636 00:28:29,890 --> 00:28:34,136 What are some of the-- I just want to get 637 00:28:34,136 --> 00:28:38,750 a better understanding of the values and the risks of containerization. 638 00:28:38,750 --> 00:28:39,570 DAVID MALAN: Sure.
639 00:28:39,570 --> 00:28:40,580 So big fan. 640 00:28:40,580 --> 00:28:47,160 In fact, I and our team are in the process of containerizing everything 641 00:28:47,160 --> 00:28:48,540 that we do right now. 642 00:28:48,540 --> 00:28:49,890 So big fan. 643 00:28:49,890 --> 00:28:52,570 Let me see, what is Docker? 644 00:28:52,570 --> 00:28:55,230 So Docker is sort of the de facto standard right 645 00:28:55,230 --> 00:28:57,510 now, though there's variations of this idea. 646 00:28:57,510 --> 00:29:00,545 And the picture I showed is actually from their own comparison. 647 00:29:00,545 --> 00:29:01,920 Oh, they seem to have changed it. 648 00:29:01,920 --> 00:29:03,450 Now they've changed it to blue. 649 00:29:03,450 --> 00:29:06,400 But here is kind of a side-by-side comparison of the two ideas. 650 00:29:06,400 --> 00:29:10,450 So on the left is virtualization, a sort of two-dimensional version 651 00:29:10,450 --> 00:29:11,780 of what we just saw in blue. 652 00:29:11,780 --> 00:29:14,342 And on the right is containerization. 653 00:29:14,342 --> 00:29:16,980 So one of the takeaways the picture is meant to convey 654 00:29:16,980 --> 00:29:19,927 is look how much lighter-weight Docker is on the right-hand side. 655 00:29:19,927 --> 00:29:21,260 There's just less clutter there. 656 00:29:21,260 --> 00:29:23,010 But that's kind of true. 657 00:29:23,010 --> 00:29:27,160 Containerization does the following-- or rather, virtualization 658 00:29:27,160 --> 00:29:30,390 has you running one base operating system and hypervisor on top of it, 659 00:29:30,390 --> 00:29:36,480 and then multiple copies of some other OS or OSes on top of those. 660 00:29:36,480 --> 00:29:40,800 Containerization has you run one operating system 661 00:29:40,800 --> 00:29:44,670 that all of your so-called containers share access to. 662 00:29:44,670 --> 00:29:49,330 So you install one operating system underneath it all. 
663 00:29:49,330 --> 00:29:52,450 And then all of your containers share some other operating system 664 00:29:52,450 --> 00:29:53,850 of your choice. 665 00:29:53,850 --> 00:29:56,820 So that's already reducing from three down to just one operating 666 00:29:56,820 --> 00:29:57,930 system, for instance. 667 00:29:57,930 --> 00:30:03,260 Moreover, containerization tends to use a technique called union file system. 668 00:30:03,260 --> 00:30:07,187 A file system is just the fancy term for the way in which you store data 669 00:30:07,187 --> 00:30:09,520 on your hard drives and solid state drives and so forth. 670 00:30:09,520 --> 00:30:11,230 A union file system gives you the ability 671 00:30:11,230 --> 00:30:15,350 to layer things so that, for instance, the owner of this machine 672 00:30:15,350 --> 00:30:17,620 would install some base layer of software-- 673 00:30:17,620 --> 00:30:19,330 like, only the minimal amount of software 674 00:30:19,330 --> 00:30:21,060 necessary to boot the computer. 675 00:30:21,060 --> 00:30:24,040 But then Anessa, you and your team might need-- 676 00:30:24,040 --> 00:30:26,220 you might be writing your product in Python. 677 00:30:26,220 --> 00:30:28,880 So you need certain Python software and certain libraries. 678 00:30:28,880 --> 00:30:31,110 I, by contrast, might be writing my site in PHP. 679 00:30:31,110 --> 00:30:32,370 I don't need that layer. 680 00:30:32,370 --> 00:30:34,470 I need this layer of software. 681 00:30:34,470 --> 00:30:36,660 And what containerization allows you to do 682 00:30:36,660 --> 00:30:39,390 is all share everything that's down here, 683 00:30:39,390 --> 00:30:43,430 but only optionally add these layers such that only you see your layer, 684 00:30:43,430 --> 00:30:44,800 only I see my layer. 
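The layering idea just described can be imitated in a few lines of Python. This is only an analogy, not how Docker or any real union file system is implemented: it uses collections.ChainMap from the standard library to stack dictionaries, and the file paths and layer names are made up for illustration.

```python
from collections import ChainMap

# Shared, read-only base layer: the minimal software needed to boot.
base_layer = {"/bin/sh": "shell", "/etc/hosts": "hosts file"}

# Each team adds only the layer it needs on top of the shared base.
python_layer = {"/usr/bin/python": "Python interpreter"}
php_layer = {"/usr/bin/php": "PHP interpreter"}

# A container's view is its own layers stacked over the base.
# Lookups search from the top layer down; the first {} is a private,
# writable layer for that container.
python_container = ChainMap({}, python_layer, base_layer)
php_container = ChainMap({}, php_layer, base_layer)

# Writes land in the topmost layer only, so the shared base is untouched.
python_container["/etc/hosts"] = "customized hosts file"

print("/usr/bin/python" in python_container)  # True
print("/usr/bin/python" in php_container)     # False
print(python_container["/etc/hosts"])         # customized hosts file
print(php_container["/etc/hosts"])            # hosts file
```

Both containers read the same base entries, yet neither sees the other's layer or the other's changes, which is the property that makes sharing one machine tolerable.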
685 00:30:44,800 --> 00:30:46,870 But we share enough of the common resources 686 00:30:46,870 --> 00:30:50,170 that we can do more work on that machine per unit of time 687 00:30:50,170 --> 00:30:53,530 because we're not trying to run one, two three separate operating systems. 688 00:30:53,530 --> 00:30:55,959 We're really just running one at that layer. 689 00:30:55,959 --> 00:30:57,000 So that's the gist of it. 690 00:30:57,000 --> 00:30:59,260 And I would say the risks and the downsides 691 00:30:59,260 --> 00:31:02,320 are it's just so bleeding edge, still. 692 00:31:02,320 --> 00:31:03,545 I mean, it's very popular. 693 00:31:03,545 --> 00:31:07,040 I just came back from the Dockercon, the Docker conference in Seattle 694 00:31:07,040 --> 00:31:08,010 a few weeks ago. 695 00:31:08,010 --> 00:31:10,109 And there were a couple thousand people there. 696 00:31:10,109 --> 00:31:12,150 It was apparently doubled in size from last year. 697 00:31:12,150 --> 00:31:14,200 So containerization is all the rage. 698 00:31:14,200 --> 00:31:19,230 But the result of which is even on my own computer-- you can see Docker 699 00:31:19,230 --> 00:31:21,510 is installed on my computer. 700 00:31:21,510 --> 00:31:23,990 Actually, you can see the version number there. 701 00:31:23,990 --> 00:31:28,740 I am running version 1.12.0, release candidate three, beta 18, 702 00:31:28,740 --> 00:31:31,830 which means this is the 18th beta version or test 703 00:31:31,830 --> 00:31:33,040 version of the software. 704 00:31:33,040 --> 00:31:34,200 So stuff breaks. 705 00:31:34,200 --> 00:31:36,060 And bleeding edge can be painful. 706 00:31:36,060 --> 00:31:38,570 So the upside, though, on the other hand, 707 00:31:38,570 --> 00:31:41,220 is that Amazon, Google, Microsoft and others are all 708 00:31:41,220 --> 00:31:42,940 starting to support this. 
709 00:31:42,940 --> 00:31:46,050 And what's nice is that it's a nice commoditization 710 00:31:46,050 --> 00:31:47,640 of what have been cloud providers. 711 00:31:47,640 --> 00:31:49,699 For many years, you would have to write your code 712 00:31:49,699 --> 00:31:51,490 and build your product in a way that's very 713 00:31:51,490 --> 00:31:54,240 specific to Google or Microsoft or Amazon 714 00:31:54,240 --> 00:31:56,050 or any number of third-party companies. 715 00:31:56,050 --> 00:31:57,750 And it's great for them. 716 00:31:57,750 --> 00:31:58,877 You're kind of bought in. 717 00:31:58,877 --> 00:32:01,710 But it's not great for you if you want to jump ship or change or use 718 00:32:01,710 --> 00:32:03,150 multiple cloud providers. 719 00:32:03,150 --> 00:32:05,650 So containerization is nice, popularized by Docker, 720 00:32:05,650 --> 00:32:07,880 the sort of leading player in this, in that it 721 00:32:07,880 --> 00:32:11,720 allows you to abstract away-- perfect tie in to earlier-- 722 00:32:11,720 --> 00:32:13,230 what it means to be the cloud. 723 00:32:13,230 --> 00:32:15,260 And you write your software for Docker, and you 724 00:32:15,260 --> 00:32:17,176 don't have to care if it's ending up on Google 725 00:32:17,176 --> 00:32:19,780 or Amazon or Microsoft or the like. 726 00:32:19,780 --> 00:32:21,081 So it's great in that regard. 727 00:32:21,081 --> 00:32:22,830 So you shouldn't have any regrets, but you 728 00:32:22,830 --> 00:32:25,200 should realize that maybe with higher probability, 729 00:32:25,200 --> 00:32:30,220 you'll run into technical headaches versus other technologies. 730 00:32:30,220 --> 00:32:31,460 Really good question. 731 00:32:31,460 --> 00:32:32,040 All right. 
732 00:32:32,040 --> 00:32:38,621 So if we now have the ability to have all of these various-- 733 00:32:38,621 --> 00:32:40,870 if we have the ability to run so many different things 734 00:32:40,870 --> 00:32:43,660 all on the same hardware, that means that no longer 735 00:32:43,660 --> 00:32:46,040 do we have to have just one server for our website. 736 00:32:46,040 --> 00:32:48,140 And indeed, this was inevitable because if you 737 00:32:48,140 --> 00:32:51,430 have a server that can only handle so many users per second or per day, 738 00:32:51,430 --> 00:32:53,270 surely once you're popular enough, you're 739 00:32:53,270 --> 00:32:56,450 going to need more hardware to support more users. 740 00:32:56,450 --> 00:32:59,620 So let's consider what starts to happen when we do that. 741 00:32:59,620 --> 00:33:07,430 So if I have just one little server here, called my www web server, 742 00:33:07,430 --> 00:33:13,520 per our conversation before lunch, what does that web server 743 00:33:13,520 --> 00:33:15,871 need to have in order to work on the internet? 744 00:33:15,871 --> 00:33:16,870 AUDIENCE: An IP address. 745 00:33:16,870 --> 00:33:17,995 DAVID MALAN: An IP address. 746 00:33:17,995 --> 00:33:20,210 So it has to have an IP address. 747 00:33:20,210 --> 00:33:25,436 And it has to have a DNS entry so that if I type in www.something.com, 748 00:33:25,436 --> 00:33:28,200 the servers in the world can convert that to an IP address, 749 00:33:28,200 --> 00:33:31,760 and Macs and PCs and everyone can find this on the internet. 750 00:33:31,760 --> 00:33:34,360 And we'll just abstract away the internet as a cloud 751 00:33:34,360 --> 00:33:37,104 so that the server is somehow connected to the internet. 752 00:33:37,104 --> 00:33:37,770 So that's great. 753 00:33:37,770 --> 00:33:40,228 The world is nice and simple when you just have one server. 754 00:33:40,228 --> 00:33:43,000 Now, suppose I want to store data for my website. 
755 00:33:43,000 --> 00:33:44,120 Users are registering. 756 00:33:44,120 --> 00:33:45,406 Users are buying things. 757 00:33:45,406 --> 00:33:46,780 I want to store that information. 758 00:33:46,780 --> 00:33:52,060 And where, of course, is data like that usually stored, if generally familiar? 759 00:33:52,060 --> 00:33:54,297 What kind of technology do you use to store data? 760 00:33:54,297 --> 00:33:55,130 AUDIENCE: A database. 761 00:33:55,130 --> 00:33:56,130 DAVID MALAN: A database. 762 00:33:56,130 --> 00:33:57,570 Yeah, so a database. 763 00:33:57,570 --> 00:33:59,692 You could just store it in files. 764 00:33:59,692 --> 00:34:02,400 You could just save text files every time someone buys something, 765 00:34:02,400 --> 00:34:03,210 and that works. 766 00:34:03,210 --> 00:34:04,440 But it's not very scalable. 767 00:34:04,440 --> 00:34:05,481 It's not very searchable. 768 00:34:05,481 --> 00:34:12,270 So databases are products like Microsoft Access is a tiny version. 769 00:34:12,270 --> 00:34:15,230 Microsoft's SQL Server is a bigger one. 770 00:34:15,230 --> 00:34:17,009 Oracle is a behemoth. 771 00:34:17,009 --> 00:34:17,800 There's PostgreSQL. 772 00:34:17,800 --> 00:34:20,860 773 00:34:20,860 --> 00:34:25,362 There's MySQL and bunches of others. 774 00:34:25,362 --> 00:34:27,070 But at the end of the day-- and actually, 775 00:34:27,070 --> 00:34:29,580 these are only the relational databases. 776 00:34:29,580 --> 00:34:31,055 There's also things like MongoDB. 777 00:34:31,055 --> 00:34:34,170 778 00:34:34,170 --> 00:34:37,814 There's Redis for certain applications, though not necessarily as persistent, 779 00:34:37,814 --> 00:34:38,980 and bunches of others still. 780 00:34:38,980 --> 00:34:42,400 And those are object-oriented databases or document stores. 781 00:34:42,400 --> 00:34:45,280 But there's just a long list of ways of storing your data.
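To make the contrast with plain text files concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. SQLite is not one of the products named in the lecture; it is used here only because it needs no setup. The table, column names, and sample purchases are all invented for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "id INTEGER PRIMARY KEY, user TEXT, item TEXT, price REAL)"
)

# Save information.
conn.executemany(
    "INSERT INTO purchases (user, item, price) VALUES (?, ?, ?)",
    [
        ("david", "laptop", 999.0),
        ("sarah", "book", 20.0),
        ("sarah", "socks", 5.0),
        ("sarah", "pen", 2.0),
    ],
)

# Update information.
conn.execute(
    "UPDATE purchases SET price = ? WHERE user = ? AND item = ?",
    (15.0, "sarah", "book"),
)

# Delete information.
conn.execute("DELETE FROM purchases WHERE item = ?", ("socks",))

# Search for information: everything one user bought, sorted by item.
rows = conn.execute(
    "SELECT item, price FROM purchases WHERE user = ? ORDER BY item",
    ("sarah",),
).fetchall()
print(rows)  # [('book', 15.0), ('pen', 2.0)]
```

With a pile of text files, that last query would mean opening and scanning every file by hand; the database does the searching for you.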
782 00:34:45,280 --> 00:34:47,989 And generally what all of these things provide, a database, 783 00:34:47,989 --> 00:34:52,429 is a way to save information, delete information, update information, 784 00:34:52,429 --> 00:34:55,489 and search for information. 785 00:34:55,489 --> 00:34:59,140 And the last one is the really juicy detail because especially as you're big 786 00:34:59,140 --> 00:35:01,840 and you're popular-- and to your analytics comment earlier, 787 00:35:01,840 --> 00:35:04,874 it'd be nice if you could actually select and search over data quickly 788 00:35:04,874 --> 00:35:06,290 so as to get answers more quickly. 789 00:35:06,290 --> 00:35:07,620 And that's what databases do. 790 00:35:07,620 --> 00:35:10,360 Oracle's intellectual property is sort of the secret sauce that 791 00:35:10,360 --> 00:35:12,360 helps you find your data fast, and same with all 792 00:35:12,360 --> 00:35:15,910 of these products, doing it better, for instance, than the competitor. 793 00:35:15,910 --> 00:35:21,610 So with that said, you could run not only web server software 794 00:35:21,610 --> 00:35:23,827 and a database on one physical server. 795 00:35:23,827 --> 00:35:26,660 In fact, super common, especially for startups or someone who's just 796 00:35:26,660 --> 00:35:28,950 got a test server under his or her desk. 797 00:35:28,950 --> 00:35:34,850 You just run all of these same servers on the same device. 798 00:35:34,850 --> 00:35:39,810 And among the servers you might run, you might have-- so these are databases. 799 00:35:39,810 --> 00:35:43,390 Let me keep this all together. 800 00:35:43,390 --> 00:35:45,790 These are database technologies. 801 00:35:45,790 --> 00:35:50,930 And on the other hand, we might have web servers like Microsoft IIS-- Internet 802 00:35:50,930 --> 00:35:52,990 Information Server. 803 00:35:52,990 --> 00:35:57,810 Apache is a very popular web server for Linux and other operating systems. 
804 00:35:57,810 --> 00:36:01,910 There's NGINX, which is also very popular, and bunches of others. 805 00:36:01,910 --> 00:36:04,140 So this is web server software. 806 00:36:04,140 --> 00:36:09,080 This is the server software that knows what to do when it receives a request 807 00:36:09,080 --> 00:36:12,280 like GET / HTTP/1.1. 808 00:36:12,280 --> 00:36:16,336 So when we did that quick example earlier when I visited google.com, 809 00:36:16,336 --> 00:36:19,550 they are running something like this on their server. 810 00:36:19,550 --> 00:36:22,230 But if they want to store data because people are buying things 811 00:36:22,230 --> 00:36:24,170 or they're logging information, they probably 812 00:36:24,170 --> 00:36:26,464 need to also run one of these servers. 813 00:36:26,464 --> 00:36:29,630 And a server, even though almost all of us think of it as a physical device, 814 00:36:29,630 --> 00:36:32,120 a server is really just a piece of software. 815 00:36:32,120 --> 00:36:34,250 And you can have multiple servers running 816 00:36:34,250 --> 00:36:37,480 on one physical device, one server. 817 00:36:37,480 --> 00:36:38,230 So it's confusing. 818 00:36:38,230 --> 00:36:40,480 The term means different things in different contexts. 819 00:36:40,480 --> 00:36:43,990 But you can certainly run multiple things on the same server. 820 00:36:43,990 --> 00:36:47,100 In fact, if this server is supposed to send email confirmations when 821 00:36:47,100 --> 00:36:50,100 people check out, this could be an email server, as well. 822 00:36:50,100 --> 00:36:52,780 If they've got built-in chat software for customer service, 823 00:36:52,780 --> 00:36:54,270 it could also be running there. 824 00:36:54,270 --> 00:36:57,228 But at the end of the day, no matter how much work this thing is doing, 825 00:36:57,228 --> 00:36:59,300 it can only do a finite amount of work. 826 00:36:59,300 --> 00:37:04,080 So what starts to break as soon as we need a second server?
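As a rough illustration of the earlier point that a web server is just software that knows what to do with a request line, here is a toy Python function that reads the first line of a request and picks a response. It is nowhere near what Apache, NGINX, or IIS actually do; the function name handle_request and the single hard-coded page are assumptions made for this sketch.

```python
def handle_request(raw_request):
    """Return a minimal HTTP response string for a raw request string."""
    # The request line is the first line, e.g. "GET / HTTP/1.1".
    request_line = raw_request.split("\r\n", 1)[0]
    parts = request_line.split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return "HTTP/1.1 400 Bad Request\r\n\r\n"

    method, path, _version = parts
    if method != "GET":
        return "HTTP/1.1 405 Method Not Allowed\r\n\r\n"

    if path == "/":
        body = "<h1>Hello</h1>"  # the only page this toy server knows
        return (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n" + body
        )
    return "HTTP/1.1 404 Not Found\r\n\r\n"


response = handle_request("GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n")
print(response.split("\r\n")[0])  # HTTP/1.1 200 OK
```

Nothing in it depends on particular hardware, which is why several such programs (web, database, email) can share one physical machine.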
827 00:37:04,080 --> 00:37:06,850 So suppose we need to invest in a second server. 828 00:37:06,850 --> 00:37:07,980 We have the money. 829 00:37:07,980 --> 00:37:10,280 We can do so. 830 00:37:10,280 --> 00:37:16,010 What do you do now if this now becomes www1, and this becomes www2? 831 00:37:16,010 --> 00:37:18,760 832 00:37:18,760 --> 00:37:20,680 What kinds of questions do you need to ask? 833 00:37:20,680 --> 00:37:23,690 Or what might the engineers need to do to make this work? 834 00:37:23,690 --> 00:37:27,420 And I've deliberately removed the line because now, what gets wired to what? 835 00:37:27,420 --> 00:37:29,440 How does it all work? 836 00:37:29,440 --> 00:37:29,940 Yeah? 837 00:37:29,940 --> 00:37:33,916 AUDIENCE: Could you update www1 and have www2 get updated, as well? 838 00:37:33,916 --> 00:37:35,040 DAVID MALAN: Good question. 839 00:37:35,040 --> 00:37:35,840 Updated in what sense? 840 00:37:35,840 --> 00:37:36,506 Like, your code? 841 00:37:36,506 --> 00:37:39,142 AUDIENCE: Yeah, [INAUDIBLE] any server aspect. 842 00:37:39,142 --> 00:37:40,350 DAVID MALAN: Yeah, hopefully. 843 00:37:40,350 --> 00:37:41,430 So there's this wrinkle, right? 844 00:37:41,430 --> 00:37:43,263 If you want to update the servers, you could 845 00:37:43,263 --> 00:37:45,214 try to push the updates simultaneously. 846 00:37:45,214 --> 00:37:46,630 But there could be a slight delay. 847 00:37:46,630 --> 00:37:48,590 So one user might see the old software. 848 00:37:48,590 --> 00:37:52,890 One user might see the new, which doesn't feel great, but is a reality. 849 00:37:52,890 --> 00:37:54,142 What else comes to mind? 850 00:37:54,142 --> 00:37:58,300 851 00:37:58,300 --> 00:37:59,223 Yeah? 852 00:37:59,223 --> 00:38:00,139 AUDIENCE: [INAUDIBLE]. 853 00:38:00,139 --> 00:38:02,040 854 00:38:02,040 --> 00:38:02,790 DAVID MALAN: Yeah. 855 00:38:02,790 --> 00:38:04,480 It's more worrisome with the database.
856 00:38:04,480 --> 00:38:07,490 If I now continue my super simple world where I have a web 857 00:38:07,490 --> 00:38:09,490 server and an email server and a database server 858 00:38:09,490 --> 00:38:15,090 all on the same physical box, what if I happen to log in here, 859 00:38:15,090 --> 00:38:16,420 but Sarah-- sorry. 860 00:38:16,420 --> 00:38:18,140 I happen to end up here. 861 00:38:18,140 --> 00:38:19,380 Sarah ends up here. 862 00:38:19,380 --> 00:38:21,404 Now our data is on separate servers. 863 00:38:21,404 --> 00:38:23,570 And then maybe tomorrow, we visit the website again, 864 00:38:23,570 --> 00:38:26,410 and somehow, Sarah ends up over here, doesn't see her data. 865 00:38:26,410 --> 00:38:27,557 I end up over here. 866 00:38:27,557 --> 00:38:28,390 I don't see my data. 867 00:38:28,390 --> 00:38:30,430 This does not feel like a great design. 868 00:38:30,430 --> 00:38:33,770 So already, our super simple initial design of 869 00:38:33,770 --> 00:38:38,310 assume one server, everything running on it, breaks. 870 00:38:38,310 --> 00:38:39,330 What else might break? 871 00:38:39,330 --> 00:38:41,918 Or what else might we want to consider before we start fixing? 872 00:38:41,918 --> 00:38:48,390 873 00:38:48,390 --> 00:38:51,467 Or if you put on the head of the manager-- so I'm the engineering guy. 874 00:38:51,467 --> 00:38:52,800 I can answer all your questions. 875 00:38:52,800 --> 00:38:57,380 But you have to ask me the technical questions 876 00:38:57,380 --> 00:39:00,300 to get to a place of comfort yourself that this will actually work. 877 00:39:00,300 --> 00:39:02,530 What other questions should spring to mind? 878 00:39:02,530 --> 00:39:03,570 This is your business. 879 00:39:03,570 --> 00:39:06,810 880 00:39:06,810 --> 00:39:07,990 No questions? 881 00:39:07,990 --> 00:39:10,406 Because I will just as soon leave everything disconnected. 
882 00:39:10,406 --> 00:39:14,255 AUDIENCE: Yeah, I was gonna say, how do you have two databases-- no matter how 883 00:39:14,255 --> 00:39:18,062 a person logs into our website, how do we make sure their data is intact 884 00:39:18,062 --> 00:39:19,312 no matter where they log into? 885 00:39:19,312 --> 00:39:20,190 DAVID MALAN: Ah. 886 00:39:20,190 --> 00:39:20,690 OK. 887 00:39:20,690 --> 00:39:21,660 Well, I've thought about that. 888 00:39:21,660 --> 00:39:22,160 Don't worry. 889 00:39:22,160 --> 00:39:25,330 We're actually going to have a third server. 890 00:39:25,330 --> 00:39:27,670 It's often drawn as a cylinder like this here. 891 00:39:27,670 --> 00:39:29,490 This will be database. 892 00:39:29,490 --> 00:39:32,190 And these guys are both going to be connected to it. 893 00:39:32,190 --> 00:39:33,880 So I'm now going to have two tiers. 894 00:39:33,880 --> 00:39:35,490 And let me introduce some new jargon. 895 00:39:35,490 --> 00:39:43,150 I would typically call this my front end, here, or my front end. 896 00:39:43,150 --> 00:39:45,430 And this I shall call the back end. 897 00:39:45,430 --> 00:39:47,330 And generally, front end means anything user 898 00:39:47,330 --> 00:39:50,480 facing, that the user's laptop or desktop might somehow talk to. 899 00:39:50,480 --> 00:39:53,260 Back end is something that the servers might only talk to. 900 00:39:53,260 --> 00:39:55,260 The user's not going to be allowed to talk here. 901 00:39:55,260 --> 00:39:55,480 All right? 902 00:39:55,480 --> 00:39:56,130 So I've answered that. 903 00:39:56,130 --> 00:39:58,504 We're going to centralize the database here so that there 904 00:39:58,504 --> 00:40:00,400 is no more data on individual servers. 905 00:40:00,400 --> 00:40:02,480 It's now centralized here. 906 00:40:02,480 --> 00:40:04,320 What other questions have you now? 907 00:40:04,320 --> 00:40:06,540 AUDIENCE: Do we need another DNS entry? 
908 00:40:06,540 --> 00:40:09,770 Or how does it-- we have one IP address? 909 00:40:09,770 --> 00:40:13,520 DAVID MALAN: Well, we'll just tell our customers to go to www1.something.com. 910 00:40:13,520 --> 00:40:14,940 Or if it seems busy, go to www2. 911 00:40:14,940 --> 00:40:17,940 912 00:40:17,940 --> 00:40:19,272 So how do we fix that? 913 00:40:19,272 --> 00:40:23,120 914 00:40:23,120 --> 00:40:26,010 AUDIENCE: Both those should go to the same IP address. 915 00:40:26,010 --> 00:40:29,230 DAVID MALAN: Both of those should go to the same-- ideally, yes. 916 00:40:29,230 --> 00:40:31,720 OK, so I-- oh, Anessa, do you want to comment? 917 00:40:31,720 --> 00:40:34,440 AUDIENCE: I mean, you somehow need to be able to run [INAUDIBLE]. 918 00:40:34,440 --> 00:40:39,036 919 00:40:39,036 --> 00:40:40,910 DAVID MALAN: And though, to be clear, I claim 920 00:40:40,910 --> 00:40:44,840 now there is no right server because the database is central. 921 00:40:44,840 --> 00:40:46,720 So now these are commodity. 922 00:40:46,720 --> 00:40:51,064 It doesn't matter which one you end up on so long as it has capacity for you. 923 00:40:51,064 --> 00:40:51,730 AUDIENCE: Right. 924 00:40:51,730 --> 00:40:56,574 So you need to do something to make sure that you're going to one [INAUDIBLE]. 925 00:40:56,574 --> 00:40:57,240 DAVID MALAN: OK. 926 00:40:57,240 --> 00:41:01,780 So what's the simplest way we've seen a company do this so far? 927 00:41:01,780 --> 00:41:03,070 We've only seen one. 928 00:41:03,070 --> 00:41:06,220 What did Yahoo do to balance load across their servers? 929 00:41:06,220 --> 00:41:07,630 AUDIENCE: [INAUDIBLE]. 930 00:41:07,630 --> 00:41:09,130 DAVID MALAN: Yeah, the round robin. 931 00:41:09,130 --> 00:41:10,880 Rotating their traffic via DNS. 
932 00:41:10,880 --> 00:41:15,760 So, for instance, if someone's laptop out there on the internet requests 933 00:41:15,760 --> 00:41:20,240 www.something.com or Yahoo, the DNS server, which is not pictured here 934 00:41:20,240 --> 00:41:22,080 but is somewhere-- let's just say, yeah. 935 00:41:22,080 --> 00:41:23,140 We have a DNS server. 936 00:41:23,140 --> 00:41:25,130 It's over here. 937 00:41:25,130 --> 00:41:28,150 And I won't bother drawing the lines because it's kind of-- we'll 938 00:41:28,150 --> 00:41:30,330 just assume it exists. 939 00:41:30,330 --> 00:41:32,510 The first time someone asks for something.com, 940 00:41:32,510 --> 00:41:34,827 I'm going to give them the IP address of this server. 941 00:41:34,827 --> 00:41:37,660 The second time someone asks, I'm going to give them the IP address. 942 00:41:37,660 --> 00:41:40,630 And then this one, and then this, and then da, da, da, and back and forth. 943 00:41:40,630 --> 00:41:41,588 What's good about this? 944 00:41:41,588 --> 00:41:44,520 945 00:41:44,520 --> 00:41:46,850 AUDIENCE: [INAUDIBLE]. 946 00:41:46,850 --> 00:41:48,850 DAVID MALAN: Now one doesn't get too busy, and-- 947 00:41:48,850 --> 00:41:49,580 AUDIENCE: It's pretty simple. 948 00:41:49,580 --> 00:41:51,060 DAVID MALAN: Simple is good, right? 949 00:41:51,060 --> 00:41:53,730 And this is underappreciated, but the easier you 950 00:41:53,730 --> 00:41:57,440 can architect your systems, in theory, the fewer things that might go wrong. 951 00:41:57,440 --> 00:41:59,030 So simple is good. 952 00:41:59,030 --> 00:42:01,750 It's really like one line in a configuration file 953 00:42:01,750 --> 00:42:04,960 on Yahoo's end to implement load balancing in apparently that way, 954 00:42:04,960 --> 00:42:07,210 though, to be fair, they have more than three servers. 955 00:42:07,210 --> 00:42:10,510 So there's a whole other layer of load balancing they're absolutely doing. 956 00:42:10,510 --> 00:42:12,230 So I'm oversimplifying. 
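The round-robin rotation just described can be sketched in a few lines of Python. This is only an illustration of the idea, not how Yahoo or any real DNS server is configured: the class name `RoundRobinResolver` and the two addresses are made up for the example, and the hostname is ignored entirely.

```python
from itertools import cycle


class RoundRobinResolver:
    """Toy stand-in for a DNS server that rotates through a fixed list of IP addresses."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("need at least one address")
        # cycle() hands the addresses back one at a time, starting over after the last.
        self._rotation = cycle(addresses)

    def resolve(self, hostname):
        # The hostname is ignored in this sketch: each lookup simply gets the next address.
        return next(self._rotation)


# Hypothetical addresses for the two web servers on the board.
resolver = RoundRobinResolver(["10.0.0.1", "10.0.0.2"])
answers = [resolver.resolve("www.something.com") for _ in range(4)]
print(answers)
```

Note that nothing in `resolve` looks at how busy either server is, which is the weakness the discussion turns to next.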
957 00:42:12,230 --> 00:42:16,170 What's bad about this approach? 958 00:42:16,170 --> 00:42:17,410 Yeah? 959 00:42:17,410 --> 00:42:21,500 AUDIENCE: It depends what you're having people do when they get those servers. 960 00:42:21,500 --> 00:42:23,916 If people are doing things that are radically out of scale 961 00:42:23,916 --> 00:42:26,776 with one another, [INAUDIBLE]. 962 00:42:26,776 --> 00:42:27,650 DAVID MALAN: Exactly. 963 00:42:27,650 --> 00:42:30,950 Even though with 50% odds, you're going one place or the other, 964 00:42:30,950 --> 00:42:34,790 what if Sarah is spending way more time on the website than I am? 965 00:42:34,790 --> 00:42:38,260 So she's consuming disproportionately more resources. 966 00:42:38,260 --> 00:42:41,030 I might want to send more users to the server I'm on 967 00:42:41,030 --> 00:42:44,480 and avoid sending anyone to her server for some amount of time. 968 00:42:44,480 --> 00:42:45,610 So 50-50. 969 00:42:45,610 --> 00:42:47,319 Overall, given an infinite amount of time 970 00:42:47,319 --> 00:42:50,318 and an infinite number of resources, we'll all just kind of average out. 971 00:42:50,318 --> 00:42:52,190 But in reality, there might be spikiness. 972 00:42:52,190 --> 00:42:54,569 So that might not necessarily be the best way. 973 00:42:54,569 --> 00:42:55,610 So what else could we do? 974 00:42:55,610 --> 00:42:57,840 DNS feels overly simplistic. 975 00:42:57,840 --> 00:42:58,700 Let's not go there. 976 00:42:58,700 --> 00:43:01,540 977 00:43:01,540 --> 00:43:03,874 Some companies, as an aside-- and look for this in life. 978 00:43:03,874 --> 00:43:05,998 It doesn't seem to happen too often, but it usually 979 00:43:05,998 --> 00:43:07,870 happens with kind of bad, bigger companies. 980 00:43:07,870 --> 00:43:11,860 Sometimes, you will see in the URL that you are at, 981 00:43:11,860 --> 00:43:16,570 literally www1.something.com and www2.something.com.
982 00:43:16,570 --> 00:43:19,160 And this is, frankly, because of moronic technical design 983 00:43:19,160 --> 00:43:23,310 decisions where they are somehow redirecting the user to a different 984 00:43:23,310 --> 00:43:26,130 named server simply to balance load. 985 00:43:26,130 --> 00:43:29,930 And I say moronic partly to be judgmental, technologically, but also 986 00:43:29,930 --> 00:43:32,460 because it's technologically unnecessary, 987 00:43:32,460 --> 00:43:34,660 and it actually has downsides. 988 00:43:34,660 --> 00:43:42,110 Why might you not want to send a user to a name like www1, www2, and so forth? 989 00:43:42,110 --> 00:43:45,074 Why might you regret that decision? 990 00:43:45,074 --> 00:43:47,287 AUDIENCE: Then they might go there themselves. 991 00:43:47,287 --> 00:43:49,120 DAVID MALAN: They might go there themselves. 992 00:43:49,120 --> 00:43:51,490 So I go to www2.something.com. 993 00:43:51,490 --> 00:43:53,620 Why is that bad? 994 00:43:53,620 --> 00:43:55,280 Won't it be there? 995 00:43:55,280 --> 00:43:56,349 AUDIENCE: Maybe. 996 00:43:56,349 --> 00:43:57,140 DAVID MALAN: Maybe. 997 00:43:57,140 --> 00:43:58,454 AUDIENCE: Maybe next time, they want you to go to one. 998 00:43:58,454 --> 00:43:59,880 And now [INAUDIBLE]. 999 00:43:59,880 --> 00:44:00,630 DAVID MALAN: Yeah. 1000 00:44:00,630 --> 00:44:03,900 So if your users maybe bookmark a specific URL, 1001 00:44:03,900 --> 00:44:06,820 and they just kind of out of habit always go back to that bookmark, 1002 00:44:06,820 --> 00:44:09,442 now your whole 50-50 fantasy is kind of out the window, 1003 00:44:09,442 --> 00:44:12,150 unless people bookmark the websites with equal probability, which 1004 00:44:12,150 --> 00:44:12,990 might be the case. 1005 00:44:12,990 --> 00:44:16,730 But in either case, you're sort of losing a bit of control 1006 00:44:16,730 --> 00:44:17,830 over the process. 
1007 00:44:17,830 --> 00:44:23,100 What if-- we're only talking about two servers, but what if it was www20? 1008 00:44:23,100 --> 00:44:23,850 And you know what? 1009 00:44:23,850 --> 00:44:27,410 You only need 19 servers nowadays, so you turned off number 20 1010 00:44:27,410 --> 00:44:29,660 or you stopped renting or paying for those resources. 1011 00:44:29,660 --> 00:44:32,709 Now they've bookmarked a dead end, which isn't good. 1012 00:44:32,709 --> 00:44:35,250 And frankly, most users won't have the wherewithal to realize 1013 00:44:35,250 --> 00:44:37,940 when they click on that bookmark or whatnot, why isn't it working? 1014 00:44:37,940 --> 00:44:40,398 They're just going to assume your whole business is offline 1015 00:44:40,398 --> 00:44:41,790 and maybe shop elsewhere. 1016 00:44:41,790 --> 00:44:43,660 So that's not good, either. 1017 00:44:43,660 --> 00:44:45,330 So what could we do to solve this? 1018 00:44:45,330 --> 00:44:48,540 And, in fact, let's not even give these things names 1019 00:44:48,540 --> 00:44:51,690 because if we don't want users knowing about them or seeing them, 1020 00:44:51,690 --> 00:44:55,230 they might as well just have an IP address number one. 1021 00:44:55,230 --> 00:44:56,440 We'll just abstract it away. 1022 00:44:56,440 --> 00:44:58,230 This is IP address number 2. 1023 00:44:58,230 --> 00:45:01,010 But the user doesn't need to know or care what those are. 1024 00:45:01,010 --> 00:45:01,530 Yeah, Daria? 1025 00:45:01,530 --> 00:45:04,400 AUDIENCE: Are those both running the same amount of services? 1026 00:45:04,400 --> 00:45:09,090 Like, you've got an email server on one and, like, the web server software 1027 00:45:09,090 --> 00:45:09,590 on one? 1028 00:45:09,590 --> 00:45:12,940 Each of those IP one and two have everything except for the database? 1029 00:45:12,940 --> 00:45:14,950 DAVID MALAN: At the moment, I was assuming that. 
1030 00:45:14,950 --> 00:45:19,190 But as an aside, even if they weren't, turns out with most of the strategies 1031 00:45:19,190 --> 00:45:22,769 we'll figure out, you can weight your percentages a little differently. 1032 00:45:22,769 --> 00:45:24,060 So it doesn't have to be 50-50. 1033 00:45:24,060 --> 00:45:26,310 Could be 75-25, or you can take into account how much. 1034 00:45:26,310 --> 00:45:28,726 AUDIENCE: It's like, can you pull more things out and just 1035 00:45:28,726 --> 00:45:30,356 run them like a database? 1036 00:45:30,356 --> 00:45:31,500 DAVID MALAN: Ah, OK. 1037 00:45:31,500 --> 00:45:33,540 So let's say there is an email server. 1038 00:45:33,540 --> 00:45:36,300 So let's call this the email server. 1039 00:45:36,300 --> 00:45:40,090 Let's factor that out because it was consuming some resources unnecessarily. 1040 00:45:40,090 --> 00:45:41,130 So I like the instincts. 1041 00:45:41,130 --> 00:45:43,730 Unfortunately, you can only do this finitely many times 1042 00:45:43,730 --> 00:45:46,400 until all that's left is the web server on both. 1043 00:45:46,400 --> 00:45:49,099 And even then, if we get more users than we can handle, 1044 00:45:49,099 --> 00:45:50,640 we're just talking about two servers. 1045 00:45:50,640 --> 00:45:52,500 We might need a third or a fourth. 1046 00:45:52,500 --> 00:45:54,590 So we can never quite escape this problem. 1047 00:45:54,590 --> 00:45:58,040 We can just postpone it, which is reasonable. 1048 00:45:58,040 --> 00:45:58,540 Katie? 1049 00:45:58,540 --> 00:46:03,555 AUDIENCE: Is there a way to put a cap on once one server has a certain amount 1050 00:46:03,555 --> 00:46:05,014 of traffic, go to the other server? 1051 00:46:05,014 --> 00:46:06,013 DAVID MALAN: Absolutely. 1052 00:46:06,013 --> 00:46:06,980 We could impose caps. 1053 00:46:06,980 --> 00:46:10,490 But to my same comment here, that still breaks 1054 00:46:10,490 --> 00:46:13,722 as soon as we overload server number one and server number 2. 
1055 00:46:13,722 --> 00:46:15,430 So we're still going to need to add more. 1056 00:46:15,430 --> 00:46:19,580 But even then, how do we decide how to route the traffic? 1057 00:46:19,580 --> 00:46:22,330 One idea that doesn't seem to have come yet-- a buzzword, 1058 00:46:22,330 --> 00:46:26,402 too, that we can toss up here is what's called vertical scaling. 1059 00:46:26,402 --> 00:46:29,670 You can throw money at the problem, so to speak. 1060 00:46:29,670 --> 00:46:31,540 So we kind of skipped a step, right? 1061 00:46:31,540 --> 00:46:34,240 Instead of going from-- we went from one server to two 1062 00:46:34,240 --> 00:46:35,990 servers, which was nice because it created 1063 00:46:35,990 --> 00:46:39,090 a lot of possibilities but problems. 1064 00:46:39,090 --> 00:46:41,740 But why don't we just sell the old server, 1065 00:46:41,740 --> 00:46:46,820 buy a bigger, better server, more RAM, more CPUs, more disk space, 1066 00:46:46,820 --> 00:46:50,712 and just literally throw money at the problem, an upside of which 1067 00:46:50,712 --> 00:46:53,420 is this whole conversation we're having now, let's just avoid it. 1068 00:46:53,420 --> 00:46:53,920 Right? 1069 00:46:53,920 --> 00:46:55,452 Let's just get rid of this. 1070 00:46:55,452 --> 00:46:57,070 This is just too hard. 1071 00:46:57,070 --> 00:46:58,220 Too many problems arise. 1072 00:46:58,220 --> 00:47:02,390 Let's just put this as our web server. 1073 00:47:02,390 --> 00:47:03,320 What's the upside? 1074 00:47:03,320 --> 00:47:06,510 1075 00:47:06,510 --> 00:47:07,654 What's an upside? 1076 00:47:07,654 --> 00:47:08,460 AUDIENCE: Simplest. 1077 00:47:08,460 --> 00:47:09,190 DAVID MALAN: Simplest, right? 1078 00:47:09,190 --> 00:47:11,023 I literally didn't have to think about this. 1079 00:47:11,023 --> 00:47:13,210 All I had to do was buy a server, configure it, 1080 00:47:13,210 --> 00:47:14,610 but it's configured identically. 1081 00:47:14,610 --> 00:47:16,330 I just spent more money on it. 
1082 00:47:16,330 --> 00:47:18,811 What's the downside, of course? 1083 00:47:18,811 --> 00:47:19,310 Same thing. 1084 00:47:19,310 --> 00:47:20,570 I spent a lot of money on it. 1085 00:47:20,570 --> 00:47:24,076 And, more fundamentally, what's the problem here? 1086 00:47:24,076 --> 00:47:25,829 AUDIENCE: It's not a long-term solution. 1087 00:47:25,829 --> 00:47:27,620 DAVID MALAN: It's not a long-term solution. 1088 00:47:27,620 --> 00:47:30,110 I've postponed the issue, which is reasonable, 1089 00:47:30,110 --> 00:47:32,260 if I've just got to get through some sales cycle 1090 00:47:32,260 --> 00:47:37,790 or somehow get through the holiday season or something like that. 1091 00:47:37,790 --> 00:47:40,740 But there's going to be this ceiling on just how many resources 1092 00:47:40,740 --> 00:47:42,329 you can fit into one machine. 1093 00:47:42,329 --> 00:47:45,370 Typically, especially from companies like Apple and even Dell and others, 1094 00:47:45,370 --> 00:47:48,161 you're going to pay a premium for getting the very top of the line. 1095 00:47:48,161 --> 00:47:49,970 So you're overspending on the hardware. 1096 00:47:49,970 --> 00:47:52,030 And so companies like Google years ago began 1097 00:47:52,030 --> 00:47:55,390 to popularize what has been called horizontal scaling, 1098 00:47:55,390 --> 00:47:59,260 where instead of getting one big, souped-up version of something, 1099 00:47:59,260 --> 00:48:00,880 you get the cheapest version, perhaps. 1100 00:48:00,880 --> 00:48:04,550 You go the other extreme and just get lots and lots of cheaper or medium spec 1101 00:48:04,550 --> 00:48:05,570 devices. 1102 00:48:05,570 --> 00:48:09,390 But unfortunately-- well, fortunately, that's 1103 00:48:09,390 --> 00:48:12,190 great because in theory, it allows you to scale infinitely, 1104 00:48:12,190 --> 00:48:15,280 so long as you have the money and the space and so forth for it. 1105 00:48:15,280 --> 00:48:18,970 But it creates a whole slew of problems. 
1106 00:48:18,970 --> 00:48:22,340 So we're kind of back to where we were before. 1107 00:48:22,340 --> 00:48:23,465 So DNS we proposed. 1108 00:48:23,465 --> 00:48:24,790 Eh, it doesn't really cut it. 1109 00:48:24,790 --> 00:48:28,460 It's not smart enough because DNS has no notion of weights or load. 1110 00:48:28,460 --> 00:48:29,610 It has no feedback loop. 1111 00:48:29,610 --> 00:48:34,370 All it does is translate domain names to IP addresses and vice versa. 1112 00:48:34,370 --> 00:48:36,808 So what could we introduce to help solve this problem? 1113 00:48:36,808 --> 00:48:43,660 1114 00:48:43,660 --> 00:48:45,540 The answer's kind of implicit on the board 1115 00:48:45,540 --> 00:48:49,430 because we used a technique twice already now 1116 00:48:49,430 --> 00:48:51,140 that could help us balance load. 1117 00:48:51,140 --> 00:48:53,600 1118 00:48:53,600 --> 00:48:54,100 Yeah. 1119 00:48:54,100 --> 00:48:55,522 AUDIENCE: Could you just have a feedback loop 1120 00:48:55,522 --> 00:48:57,418 so that when you need more server space, you scale up, 1121 00:48:57,418 --> 00:48:59,320 and when you need less, you scale down? 1122 00:48:59,320 --> 00:48:59,660 DAVID MALAN: OK. 1123 00:48:59,660 --> 00:49:01,618 So that'll get us to the point of auto scaling. 1124 00:49:01,618 --> 00:49:04,790 And that'll allow us to add IP address number three and four and five. 1125 00:49:04,790 --> 00:49:06,980 But fundamentally, two is interesting because it's 1126 00:49:06,980 --> 00:49:09,410 representative of an infinite supply of problems now, 1127 00:49:09,410 --> 00:49:11,990 which is what if you have more than one server? 1128 00:49:11,990 --> 00:49:18,310 Question at hand is how do we decide or what pieces of hardware or features 1129 00:49:18,310 --> 00:49:23,150 do we need to add to this story in order to get data from users to server one 1130 00:49:23,150 --> 00:49:25,360 or two or three or four or five or six. 1131 00:49:25,360 --> 00:49:25,860 Yeah?
1132 00:49:25,860 --> 00:49:30,360 AUDIENCE: Could you put-- I don't know-- another server or something on top 1133 00:49:30,360 --> 00:49:32,680 of it that's just directing? 1134 00:49:32,680 --> 00:49:33,430 DAVID MALAN: Yeah. 1135 00:49:33,430 --> 00:49:34,325 In fact it has a wonderful-- 1136 00:49:34,325 --> 00:49:35,130 AUDIENCE: Like a router? 1137 00:49:35,130 --> 00:49:35,980 DAVID MALAN: --word. 1138 00:49:35,980 --> 00:49:38,021 Yeah, it wouldn't technically be called a router, 1139 00:49:38,021 --> 00:49:39,970 though it's similar in spirit. 1140 00:49:39,970 --> 00:49:42,580 Load balancer, which is, in a sense, a router. 1141 00:49:42,580 --> 00:49:44,400 So a lot of these terms are interchangeable 1142 00:49:44,400 --> 00:49:46,275 and more just conventions than anything else. 1143 00:49:46,275 --> 00:49:48,409 I'll call this LB for load balancer. 1144 00:49:48,409 --> 00:49:49,450 And that's exactly right. 1145 00:49:49,450 --> 00:49:50,580 Now let me connect some lines. 1146 00:49:50,580 --> 00:49:51,371 This looks like CB. 1147 00:49:51,371 --> 00:49:54,100 That's LB, load balancer. 1148 00:49:54,100 --> 00:49:58,230 So now, it is on the internet somehow with a public IP address. 1149 00:49:58,230 --> 00:50:00,970 And these two servers have an IP address. 1150 00:50:00,970 --> 00:50:01,720 But you know what? 1151 00:50:01,720 --> 00:50:05,330 I'm going to call this private. 1152 00:50:05,330 --> 00:50:07,920 And this one too will be private. 1153 00:50:07,920 --> 00:50:14,840 This guy needs an IP that's public, which is not unlike our home router. 1154 00:50:14,840 --> 00:50:17,477 So calling it a router is not unreasonable in that regard. 1155 00:50:17,477 --> 00:50:19,310 And what does this load balancer need to do? 1156 00:50:19,310 --> 00:50:22,770 Well, he's got to decide whether to route data to the left or to the right. 1157 00:50:22,770 --> 00:50:25,925 And just to be clear, what might feed into that decision? 
1158 00:50:25,925 --> 00:50:30,446 1159 00:50:30,446 --> 00:50:31,649 AUDIENCE: Usage. 1160 00:50:31,649 --> 00:50:32,440 DAVID MALAN: Usage. 1161 00:50:32,440 --> 00:50:36,870 So I'm going to specifically draw these lines as bi-directional arrows. 1162 00:50:36,870 --> 00:50:40,730 So there's some kind of feedback loop, or constantly these servers are saying, 1163 00:50:40,730 --> 00:50:42,110 I'm at 10% capacity. 1164 00:50:42,110 --> 00:50:43,670 I'm at 20% capacity. 1165 00:50:43,670 --> 00:50:45,970 Or I have 1,000 users, or I have no users. 1166 00:50:45,970 --> 00:50:48,140 Whatever the metric is that you care about, 1167 00:50:48,140 --> 00:50:49,624 there could be that feedback loop. 1168 00:50:49,624 --> 00:50:52,040 And then the load balancer could indeed route the traffic. 1169 00:50:52,040 --> 00:50:55,230 And so long as the response that goes back also 1170 00:50:55,230 --> 00:50:57,510 knows to go through the load balancer, it'll 1171 00:50:57,510 --> 00:51:00,670 just kind of work seamlessly, much like our home network. 1172 00:51:00,670 --> 00:51:01,990 So we've fixed that problem. 1173 00:51:01,990 --> 00:51:03,480 I like that. 1174 00:51:03,480 --> 00:51:07,680 What new problems have we created at this point in the whole story? 1175 00:51:07,680 --> 00:51:10,977 1176 00:51:10,977 --> 00:51:12,040 AUDIENCE: Bottleneck? 1177 00:51:12,040 --> 00:51:13,040 DAVID MALAN: Bottleneck? 1178 00:51:13,040 --> 00:51:14,000 Where at? 1179 00:51:14,000 --> 00:51:15,430 AUDIENCE: In the load balancer? 1180 00:51:15,430 --> 00:51:16,320 DAVID MALAN: Yeah. 1181 00:51:16,320 --> 00:51:19,970 This is kind of besides the point, right? 1182 00:51:19,970 --> 00:51:23,530 Like, Grace, haven't you kind of broke-- it's a regression. 1183 00:51:23,530 --> 00:51:25,740 Like we solved our problem of load earlier 1184 00:51:25,740 --> 00:51:27,410 by doubling the number of servers. 
1185 00:51:27,410 --> 00:51:31,200 But to get that to work, you've proposed that we go back to one server 1186 00:51:31,200 --> 00:51:35,400 because then it all just kind of works and we somehow flail the traffic 1187 00:51:35,400 --> 00:51:37,900 to the left or to the right. 1188 00:51:37,900 --> 00:51:39,400 So it's not wrong. 1189 00:51:39,400 --> 00:51:40,430 So what's a pushback? 1190 00:51:40,430 --> 00:51:41,890 This is OK, in some sense. 1191 00:51:41,890 --> 00:51:46,160 Why is this OK, even though before it was not OK to just confine ourselves 1192 00:51:46,160 --> 00:51:47,334 to one server? 1193 00:51:47,334 --> 00:51:49,250 AUDIENCE: Because the load balancer's only job 1194 00:51:49,250 --> 00:51:52,070 is to push people to [INAUDIBLE]. 1195 00:51:52,070 --> 00:51:53,580 DAVID MALAN: Exactly. 1196 00:51:53,580 --> 00:51:55,480 That's its sole purpose in life. 1197 00:51:55,480 --> 00:51:57,880 And if it's reasonable to assume, which it 1198 00:51:57,880 --> 00:51:59,827 kind of is, that the web servers probably 1199 00:51:59,827 --> 00:52:01,410 have to do a little more work-- right? 1200 00:52:01,410 --> 00:52:02,700 They have to talk to a database. 1201 00:52:02,700 --> 00:52:03,970 They have to check a user out. 1202 00:52:03,970 --> 00:52:05,470 They might have to trigger emails to be sent. 1203 00:52:05,470 --> 00:52:07,990 It just feels like there's a bunch of work they need to do. 1204 00:52:07,990 --> 00:52:10,490 Load balancer literally, in the dumbest sense, 1205 00:52:10,490 --> 00:52:13,120 needs to just send 50% of traffic here, 50% here. 1206 00:52:13,120 --> 00:52:14,380 But we know we can do better. 1207 00:52:14,380 --> 00:52:16,270 So it needs to have a little bit of sense of metrics. 1208 00:52:16,270 --> 00:52:18,561 But at the end of the day, it's just like a traffic cop 1209 00:52:18,561 --> 00:52:20,000 going this way or that way. 1210 00:52:20,000 --> 00:52:22,390 Intuitively, that feels like a little bit less work. 
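The traffic-cop idea, with the feedback loop from the bi-directional arrows, might look roughly like the Python sketch below. It is a simplified illustration under stated assumptions: the `LoadBalancer` class, its `report` and `route` methods, and the private addresses are all hypothetical, and "load" is reduced to a single fraction that each web server reports about itself.

```python
class LoadBalancer:
    """Toy load balancer that routes to whichever backend last reported the least load."""

    def __init__(self, backends):
        # Map each private backend address to its most recently reported load (0.0 to 1.0).
        self.load = {address: 0.0 for address in backends}

    def report(self, address, fraction_busy):
        # Feedback loop: a web server tells the balancer how busy it currently is.
        if address not in self.load:
            raise KeyError(address)
        self.load[address] = fraction_busy

    def route(self):
        # Pick the backend reporting the lowest load; ties go to the one listed first.
        return min(self.load, key=self.load.get)


# Hypothetical private addresses for the two web servers behind the balancer.
lb = LoadBalancer(["192.168.0.1", "192.168.0.2"])
lb.report("192.168.0.1", 0.20)  # "I'm at 20% capacity."
lb.report("192.168.0.2", 0.10)  # "I'm at 10% capacity."
chosen = lb.route()
print(chosen)
```

The routing decision here is a single lookup over a small table, which is one way to see why the balancer plausibly does less work per request than the web servers behind it.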
1211 00:52:22,390 --> 00:52:26,530 And so indeed, you could throw maybe more resources at this one server 1212 00:52:26,530 --> 00:52:28,820 and then get really good economy of scale 1213 00:52:28,820 --> 00:52:31,220 by horizontally scaling your front end tier, 1214 00:52:31,220 --> 00:52:34,490 so to speak, up until some actual threshold. 1215 00:52:34,490 --> 00:52:38,080 And the thresholds are going to be twofold-- whether this is software 1216 00:52:38,080 --> 00:52:41,674 or hardware that you either download or you buy physically, 1217 00:52:41,674 --> 00:52:44,340 either you're going to have one, licensing restrictions, whereby 1218 00:52:44,340 --> 00:52:46,381 whatever company you buy it from is going to say, 1219 00:52:46,381 --> 00:52:49,839 this can handle 10,000 concurrent connections at a time. 1220 00:52:49,839 --> 00:52:52,880 After that, you need to upgrade to our more expensive device or something 1221 00:52:52,880 --> 00:52:53,380 like that. 1222 00:52:53,380 --> 00:52:57,160 Or it could just be technological, like this device only has so much capacity. 1223 00:52:57,160 --> 00:52:59,196 It can only physically handle 10,000 connections 1224 00:52:59,196 --> 00:53:02,320 at a time, after which you're going to need to upgrade to some other device 1225 00:53:02,320 --> 00:53:02,960 altogether. 1226 00:53:02,960 --> 00:53:04,940 So it can be a mix of those. 1227 00:53:04,940 --> 00:53:09,540 So that actually is a nice revelation of the next problem. 1228 00:53:09,540 --> 00:53:13,790 OK, so I can easily spread this load out here to three. 1229 00:53:13,790 --> 00:53:16,380 And I can add in another one over here. 1230 00:53:16,380 --> 00:53:22,760 But what's going to break next, if not my front end web tier, so to speak? 1231 00:53:22,760 --> 00:53:24,210 AUDIENCE: [INAUDIBLE]. 1232 00:53:24,210 --> 00:53:25,710 DAVID MALAN: Database? 1233 00:53:25,710 --> 00:53:28,296 And what makes you think that? 
1234 00:53:28,296 --> 00:53:32,330 AUDIENCE: [INAUDIBLE] simultaneous queries. 1235 00:53:32,330 --> 00:53:33,080 DAVID MALAN: Yeah. 1236 00:53:33,080 --> 00:53:37,090 Like, to my concern earlier, we're horizontally 1237 00:53:37,090 --> 00:53:38,810 scaling this to handle more users. 1238 00:53:38,810 --> 00:53:41,400 But we just still are sending all the data to one place. 1239 00:53:41,400 --> 00:53:45,040 And at some point, if we're successful-- and it's a good problem to have-- 1240 00:53:45,040 --> 00:53:47,130 we're going to overwhelm our one database. 1241 00:53:47,130 --> 00:53:49,190 So what might we do there? 1242 00:53:49,190 --> 00:53:52,120 How do we fix that? 1243 00:53:52,120 --> 00:53:53,120 More, all right. 1244 00:53:53,120 --> 00:53:56,700 But now, the problem with doing this to your database 1245 00:53:56,700 --> 00:53:59,190 is that the database, of course, is stateful. 1246 00:53:59,190 --> 00:54:02,610 It actually stores information, otherwise generally known as state. 1247 00:54:02,610 --> 00:54:07,450 The web servers I proposed earlier can be ignorant of state. 1248 00:54:07,450 --> 00:54:10,360 All of their permanent data gets stored on the database. 1249 00:54:10,360 --> 00:54:17,710 So if we do add another database into the picture, like down here, 1250 00:54:17,710 --> 00:54:19,430 where do I put my data? 1251 00:54:19,430 --> 00:54:19,930 Right? 1252 00:54:19,930 --> 00:54:23,530 I could put my data here, Sarah's data here. 1253 00:54:23,530 --> 00:54:28,200 But then we need to make sure that every subsequent request from me goes here 1254 00:54:28,200 --> 00:54:31,930 and from Sarah goes here so that it's consistent. 1255 00:54:31,930 --> 00:54:34,772 Or, of course, she's not going to see her data. 1256 00:54:34,772 --> 00:54:35,980 I'm not going to see my data. 1257 00:54:35,980 --> 00:54:37,313 So how do we solve that problem? 1258 00:54:37,313 --> 00:54:39,354 AUDIENCE: The types of data [INAUDIBLE]. 
1259 00:54:39,354 --> 00:54:40,020 DAVID MALAN: OK. 1260 00:54:40,020 --> 00:54:43,730 So we can use a technique that-- let me toss in a buzzword, sharding. 1261 00:54:43,730 --> 00:54:49,190 To shard data means to, using some decision-making process, 1262 00:54:49,190 --> 00:54:52,250 send certain data this way and other data this way. 1263 00:54:52,250 --> 00:54:54,390 And an example we often use here on campus 1264 00:54:54,390 --> 00:54:57,260 is back in the day of Facebook or thefacebook.com, 1265 00:54:57,260 --> 00:55:01,360 when Mark finally started expanding it to other campuses, 1266 00:55:01,360 --> 00:55:04,970 it was harvard.thefacebook.com and mit.thefacebook.com 1267 00:55:04,970 --> 00:55:10,680 and berkeley.thefacebook.com or whatever the additional schools were, 1268 00:55:10,680 --> 00:55:13,860 which was a way of sharding your setup because Harvard people were 1269 00:55:13,860 --> 00:55:17,150 going to one setup and MIT people were going to another setup. 1270 00:55:17,150 --> 00:55:19,840 But, of course, a downside early on, if you remember back then, 1271 00:55:19,840 --> 00:55:22,050 is you couldn't really be friends with people in other networks. 1272 00:55:22,050 --> 00:55:23,341 That was a whole other feature. 1273 00:55:23,341 --> 00:55:25,140 But that was an example of sharding. 1274 00:55:25,140 --> 00:55:29,240 Now, if you just have one website, it's possible and reasonable 1275 00:55:29,240 --> 00:55:32,150 that anyone whose name starts with D might go to the left database. 1276 00:55:32,150 --> 00:55:35,039 Anyone whose name starts with S might go to the second database. 1277 00:55:35,039 --> 00:55:36,830 Or you could divide the alphabet similarly. 1278 00:55:36,830 --> 00:55:39,020 So you could shard based on that. 1279 00:55:39,020 --> 00:55:43,352 What might bite you there, if you're just sharding based on people's names? 1280 00:55:43,352 --> 00:55:44,987 AUDIENCE: Growth. 1281 00:55:44,987 --> 00:55:45,820 DAVID MALAN: Growth? 
1282 00:55:45,820 --> 00:55:49,502 You could maybe put A through M on one database server and N 1283 00:55:49,502 --> 00:55:50,460 through Z on the other. 1284 00:55:50,460 --> 00:55:53,180 And that works fine initially, but eventually, you got to split, like, 1285 00:55:53,180 --> 00:55:54,300 the A's through the M's. 1286 00:55:54,300 --> 00:55:57,680 And what about-- then you get down to the A's and the BA's, and then you have 1287 00:55:57,680 --> 00:55:58,380 just A's. 1288 00:55:58,380 --> 00:56:00,570 But one server's not enough for all the A's. 1289 00:56:00,570 --> 00:56:04,000 So it's an ongoing problem. 1290 00:56:04,000 --> 00:56:06,472 What else? 1291 00:56:06,472 --> 00:56:10,456 AUDIENCE: Would sharding also apply if you're topically separating your data? 1292 00:56:10,456 --> 00:56:13,291 Like, this becomes my sales database and my profile database-- 1293 00:56:13,291 --> 00:56:14,290 DAVID MALAN: Absolutely. 1294 00:56:14,290 --> 00:56:15,925 AUDIENCE: --and my product database. 1295 00:56:15,925 --> 00:56:17,760 DAVID MALAN: Yep, you can decide it however you want. 1296 00:56:17,760 --> 00:56:20,232 Names is one approach for very consumer-oriented databases. 1297 00:56:20,232 --> 00:56:22,690 You can put different types of data on different databases. 1298 00:56:22,690 --> 00:56:24,620 Same problem ultimately, as Griff proposes, 1299 00:56:24,620 --> 00:56:26,990 whereby if you just have so much sales data, 1300 00:56:26,990 --> 00:56:29,100 you might still need to shard that somehow. 1301 00:56:29,100 --> 00:56:32,860 So at some point, you need to kind of figure out how to part the waters 1302 00:56:32,860 --> 00:56:34,360 and scale out. 1303 00:56:34,360 --> 00:56:35,914 But quite possible. 1304 00:56:35,914 --> 00:56:36,414 Sarah? 1305 00:56:36,414 --> 00:56:38,080 AUDIENCE: Could you shard based on time? 1306 00:56:38,080 --> 00:56:40,350 Like long-term data gets stored-- 1307 00:56:40,350 --> 00:56:41,610 DAVID MALAN: Oh, of course. 
1308 00:56:41,610 --> 00:56:42,110 Yep. 1309 00:56:42,110 --> 00:56:45,300 You could put short-term data on some servers, long-term data on others. 1310 00:56:45,300 --> 00:56:49,450 And long-term data, frankly, could be on older, even slower, or temporarily 1311 00:56:49,450 --> 00:56:50,470 offline servers. 1312 00:56:50,470 --> 00:56:52,700 And in fact, I can only conjecture, but I really 1313 00:56:52,700 --> 00:56:56,370 don't understand why certain banks and big companies like Comcast and others 1314 00:56:56,370 --> 00:56:59,530 only let me see my statements a year past. 1315 00:56:59,530 --> 00:57:04,230 Surely in 2016, if I can get my order history on Amazon from the 2000s, 1316 00:57:04,230 --> 00:57:07,070 you can give me my bank accounts from 13 months ago. 1317 00:57:07,070 --> 00:57:10,800 And it's probably either a foolish or a conscious design decision 1318 00:57:10,800 --> 00:57:14,270 that they're just offloading old data because it costs them, ironically, 1319 00:57:14,270 --> 00:57:17,260 too much money to keep around their users' data. 1320 00:57:17,260 --> 00:57:17,907 Other thoughts? 1321 00:57:17,907 --> 00:57:19,695 AUDIENCE: What about instead of sharding, 1322 00:57:19,695 --> 00:57:25,440 what about have some sort of overlay that's able to point to-- 1323 00:57:25,440 --> 00:57:26,550 DAVID MALAN: OK. 1324 00:57:26,550 --> 00:57:31,690 So we could have, even for sharding, some notion of load balancing here. 1325 00:57:31,690 --> 00:57:34,857 And it's not load balancing in a generic sense. 1326 00:57:34,857 --> 00:57:36,190 It needs to be a decision maker. 1327 00:57:36,190 --> 00:57:37,280 Am I misinterpreting? 1328 00:57:37,280 --> 00:57:37,863 AUDIENCE: Yes. 1329 00:57:37,863 --> 00:57:39,686 So it can choose either one. 1330 00:57:39,686 --> 00:57:42,634 It could figure out which database it's in. 1331 00:57:42,634 --> 00:57:43,300 DAVID MALAN: OK.
1332 00:57:43,300 --> 00:57:46,240 So based on that, I'm either going to call it-- well, 1333 00:57:46,240 --> 00:57:49,255 it wouldn't be a load balancer if it's just generically doing this. 1334 00:57:49,255 --> 00:57:51,380 It's some kind of device that's doing the sharding. 1335 00:57:51,380 --> 00:57:54,860 And the sharding could either be done in software, to be fair, 1336 00:57:54,860 --> 00:57:59,120 and the web servers could be programmed to say the A through M's over here 1337 00:57:59,120 --> 00:58:00,906 and the N's through Z's over here. 1338 00:58:00,906 --> 00:58:02,530 You don't necessarily need a middleman. 1339 00:58:02,530 --> 00:58:04,700 But we certainly could, and it could be making 1340 00:58:04,700 --> 00:58:09,270 those decisions such that all of these servers talk to that middleman first. 1341 00:58:09,270 --> 00:58:11,310 But it turns out you could load balance. 1342 00:58:11,310 --> 00:58:14,680 We could take the same principle of earlier of load balancing, 1343 00:58:14,680 --> 00:58:16,980 but also solve this problem in a different way. 1344 00:58:16,980 --> 00:58:20,220 Let me push this up for a moment. 1345 00:58:20,220 --> 00:58:28,080 And if we erase this, let me actually call this a load balancer. 1346 00:58:28,080 --> 00:58:36,560 And let me assume that now this is going to go to databases as follows. 1347 00:58:36,560 --> 00:58:40,210 Let's actually do this, just so I have some more room. 1348 00:58:40,210 --> 00:58:43,510 So here, I'll draw a slightly bigger database. 1349 00:58:43,510 --> 00:58:46,730 Not uncommon with databases maybe to throw a little more money at it 1350 00:58:46,730 --> 00:58:51,050 because it's a lot easier to keep your data initially all in one place. 1351 00:58:51,050 --> 00:58:54,560 And so we might just vertically scale this thing initially. 
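The idea of doing the sharding in software, with the web servers sending the A through M's one way and the N's through Z's the other, can be sketched roughly as follows. This is only an illustration: the database names `db-left` and `db-right`, the `SHARDS` table, and the fallback rule are assumptions made up for the example, not anything named in the lecture.

```python
# Hypothetical shard map: first-letter ranges mapped to made-up database names.
SHARDS = [
    ("a", "m", "db-left"),
    ("n", "z", "db-right"),
]


def shard_for(name: str) -> str:
    """Pick a database by the first letter of a user's name."""
    first = name.strip().lower()[:1]
    for low, high, database in SHARDS:
        if low <= first <= high:
            return database
    # Names that do not start with a-z need an explicit policy.
    # Falling back to the first shard is an assumption for this sketch.
    return "db-left"


print(shard_for("David"))  # falls in the a-m range
print(shard_for("Sarah"))  # falls in the n-z range
```

The growth problem raised earlier shows up here as the need to edit `SHARDS` and move existing rows whenever one range outgrows its server.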
1352 00:58:54,560 --> 00:58:57,630 So throw money at it, so we still have the simplicity of one database, 1353 00:58:57,630 --> 00:59:00,230 albeit a problem for downtime if this thing goes offline. 1354 00:59:00,230 --> 00:59:02,100 But more on that in a moment. 1355 00:59:02,100 --> 00:59:05,030 But what I'm really concerned with is writes. 1356 00:59:05,030 --> 00:59:08,420 Changing information is what's ideally centralized. 1357 00:59:08,420 --> 00:59:11,980 But reading information could come from redundant copies. 1358 00:59:11,980 --> 00:59:17,625 And so what's fairly common is maybe you have a bunch of read replicas. 1359 00:59:17,625 --> 00:59:20,537 1360 00:59:20,537 --> 00:59:22,370 And that's not necessarily a technical term. 1361 00:59:22,370 --> 00:59:24,640 It's just kind of a term of art here, where 1362 00:59:24,640 --> 00:59:27,100 the replica, as the name suggests, is really 1363 00:59:27,100 --> 00:59:28,880 just a duplication of this thing. 1364 00:59:28,880 --> 00:59:31,320 But maybe it's a little smaller or slower or cheaper. 1365 00:59:31,320 --> 00:59:34,765 But there's some kind of synchronization from this one to this one. 1366 00:59:34,765 --> 00:59:40,690 So writes are coming into this one, and I'll represent writes with a W. 1367 00:59:40,690 --> 00:59:44,730 But when the user's code running on the web servers wants to read information, 1368 00:59:44,730 --> 00:59:47,150 that data's not going to come from here. 1369 00:59:47,150 --> 00:59:49,570 So that arrow is not going to exist. 1370 00:59:49,570 --> 00:59:53,950 Rather, all of the reads going back to the web servers 1371 00:59:53,950 --> 00:59:56,300 are going to come from this device, historically called 1372 00:59:56,300 --> 01:00:01,310 the slave or secondary, whereas this would be the master or primary. 1373 01:00:01,310 --> 01:00:04,770 And what's nice about this topology is that we 1374 01:00:04,770 --> 01:00:08,220 could have multiple read replicas.
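One way to picture this split, with writes going to the single primary and reads coming back from the replicas, is the toy router below. It is a sketch under stated assumptions: `Database` and `Router` are hypothetical stand-ins rather than a real database driver, reads are detected naively by checking for `SELECT`, and replicas are chosen in simple rotation.

```python
import itertools


class Database:
    """Toy stand-in for a database server; it only records what it was asked."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, sql):
        self.log.append(sql)
        return self.name  # report which server handled the statement


class Router:
    """Send writes to the primary and rotate reads across the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def execute(self, sql):
        # Naive classification, assumed for illustration only.
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = next(self._replicas) if is_read else self.primary
        return target.execute(sql)


primary = Database("primary")
router = Router(primary, [Database("replica-1"), Database("replica-2")])

print(router.execute("INSERT INTO posts VALUES (1)"))  # handled by the primary
print(router.execute("SELECT * FROM posts"))           # handled by a replica
```

Adding read capacity in this picture means appending another `Database` to the replica list; the write path is untouched.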
1375 01:00:08,220 --> 01:00:11,320 We could even add a third one in here. 1376 01:00:11,320 --> 01:00:18,630 And the decision as to whether or not this works well 1377 01:00:18,630 --> 01:00:22,600 is kind of a side effect of whatever your business is or the use case is. 1378 01:00:22,600 --> 01:00:26,680 If your website is very read heavy, this works great. 1379 01:00:26,680 --> 01:00:30,580 You have few or one server databases devoted 1380 01:00:30,580 --> 01:00:33,930 to writes-- so changes, deletions, additions, that kind of thing. 1381 01:00:33,930 --> 01:00:37,780 But you can have as many read replicas allocated as you want, 1382 01:00:37,780 --> 01:00:41,340 which are just real-time copies of the master database 1383 01:00:41,340 --> 01:00:43,290 that your code actually reads from. 1384 01:00:43,290 --> 01:00:46,510 So something like Facebook-- depends on the user. 1385 01:00:46,510 --> 01:00:50,580 Many people on Facebook probably read more information than they post, right? 1386 01:00:50,580 --> 01:00:54,110 Every time you log in, you might post one thing, maybe, let's say. 1387 01:00:54,110 --> 01:00:56,400 But you might read 10 posts from friends. 1388 01:00:56,400 --> 01:00:58,646 So in that sense, you're sort of read-heavy. 1389 01:00:58,646 --> 01:01:01,020 And you can imagine other applications-- maybe in Amazon, 1390 01:01:01,020 --> 01:01:02,478 maybe you tend to window shop more. 1391 01:01:02,478 --> 01:01:05,750 So you rarely buy things, but you go to shop around a lot. 1392 01:01:05,750 --> 01:01:08,710 So you might be very read-heavy, but you only check out infrequently. 1393 01:01:08,710 --> 01:01:10,750 So this topology might work well. 1394 01:01:10,750 --> 01:01:12,880 Now, there's kind of a problem here. 1395 01:01:12,880 --> 01:01:16,752 If I keep drawing more and more databases like this, 1396 01:01:16,752 --> 01:01:18,335 what might start to break, eventually?
1397 01:01:18,335 --> 01:01:21,436 1398 01:01:21,436 --> 01:01:22,360 AUDIENCE: The master. 1399 01:01:22,360 --> 01:01:23,651 DAVID MALAN: The master, right? 1400 01:01:23,651 --> 01:01:27,270 If we're asking the master to copy itself-- in parallel, no less-- to all 1401 01:01:27,270 --> 01:01:31,010 of these secondary databases, at some point this has got to break, right? 1402 01:01:31,010 --> 01:01:34,130 It can't infinitely handle traffic over here and then infinitely 1403 01:01:34,130 --> 01:01:36,960 duplicate itself to all of these replicas down here. 1404 01:01:36,960 --> 01:01:39,580 But what's nice and what's not uncommon is 1405 01:01:39,580 --> 01:01:41,677 to have a whole hierarchical structure. 1406 01:01:41,677 --> 01:01:42,260 You know what? 1407 01:01:42,260 --> 01:01:46,840 So if that is worrisome, let's just then have one medium sized 1408 01:01:46,840 --> 01:01:49,950 or large replica here, but then replicate in 1409 01:01:49,950 --> 01:01:54,610 sort of tree fashion off of it to these other replicas. 1410 01:01:54,610 --> 01:01:58,990 So kind of push the problem away from the super-important special database, 1411 01:01:58,990 --> 01:02:05,060 the write database-- write with a W. And then the read replicas down here 1412 01:02:05,060 --> 01:02:07,400 have their own sort of hierarchy that gives us 1413 01:02:07,400 --> 01:02:10,440 a bit of defense against that issue. 1414 01:02:10,440 --> 01:02:15,260 Now, what problem still remains in this picture? 1415 01:02:15,260 --> 01:02:18,865 What could go wrong, critique, somehow? 1416 01:02:18,865 --> 01:02:21,156 AUDIENCE: I don't understand how you would [INAUDIBLE]. 1417 01:02:21,156 --> 01:02:24,640 1418 01:02:24,640 --> 01:02:27,840 DAVID MALAN: So what I'm proposing is this is allowing us to scale. 
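As a rough back-of-the-envelope sketch of why the tree arrangement helps: if every server copies itself to at most a fixed number of others, the number of replicas grows with each tier while the write database's own replication work stays the same. The fan-out of 3 below is a made-up figure for illustration, and `tree_capacity` is a hypothetical helper.

```python
def tree_capacity(fanout: int, levels: int) -> int:
    """Replicas reachable when each server feeds at most `fanout` servers."""
    return sum(fanout ** level for level in range(1, levels + 1))


# With a fan-out of 3, the write database only ever feeds 3 servers directly.
print(tree_capacity(fanout=3, levels=1))  # 3 replicas in one tier
print(tree_capacity(fanout=3, levels=2))  # 3 + 9 = 12 replicas in two tiers
```

The cost, which the sketch ignores, is that each extra tier adds another hop of propagation delay before a change reaches the outermost replicas.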
1419 01:02:27,840 --> 01:02:32,800 So if, I assume, as in the Facebook scenario described, 1420 01:02:32,800 --> 01:02:36,340 that most of my business involves reads, where 1421 01:02:36,340 --> 01:02:38,740 I want to have as many read servers as possible, 1422 01:02:38,740 --> 01:02:42,230 but I can get by with just one writeable server, then 1423 01:02:42,230 --> 01:02:46,310 what's nice about this is that we can add additional read replicas, 1424 01:02:46,310 --> 01:02:49,000 so to speak, and handle more and more and more 1425 01:02:49,000 --> 01:02:51,770 users without having to deal with the problem 1426 01:02:51,770 --> 01:02:55,380 that I proposed earlier as a bit of a headache-- how do we actually decide, 1427 01:02:55,380 --> 01:02:58,100 based on sharding or some other logic, how to split our data? 1428 01:02:58,100 --> 01:03:00,870 I just avoid splitting our data altogether. 1429 01:03:00,870 --> 01:03:03,290 So that's the problem we've solved, is scaling. 1430 01:03:03,290 --> 01:03:07,030 We can handle lots and lots and lots of reads this way. 1431 01:03:07,030 --> 01:03:10,800 But there's still a problem, even if this is plenty of capacity for writing. 1432 01:03:10,800 --> 01:03:11,640 Grace? 1433 01:03:11,640 --> 01:03:15,480 AUDIENCE: The timing, then, between the writing and the reading 1434 01:03:15,480 --> 01:03:18,600 and if you want to overwrite or update something you've just 1435 01:03:18,600 --> 01:03:20,490 written where you're reading it from. 1436 01:03:20,490 --> 01:03:21,240 DAVID MALAN: Yeah. 1437 01:03:21,240 --> 01:03:24,250 There's a bit of latency, again, so to speak, between the time 1438 01:03:24,250 --> 01:03:26,830 you start to do something and that actually happens. 1439 01:03:26,830 --> 01:03:30,340 And you can sometimes see this, in fact, because of latency or caching, 1440 01:03:30,340 --> 01:03:31,580 even on things like Facebook. 
1441 01:03:31,580 --> 01:03:35,680 An example I always think of is on occasion, I feel like I've posted 1442 01:03:35,680 --> 01:03:37,140 or commented something on Facebook. 1443 01:03:37,140 --> 01:03:39,930 Then in another tab, I might hit Reload, and I don't see the change. 1444 01:03:39,930 --> 01:03:41,888 And then I have to reload, and then it's there. 1445 01:03:41,888 --> 01:03:44,750 But it didn't have the immediate effect that I assumed it would. 1446 01:03:44,750 --> 01:03:46,810 And that could be any number of reasons. 1447 01:03:46,810 --> 01:03:49,120 But one of them could just be propagation delays. 1448 01:03:49,120 --> 01:03:51,157 Like, this does not happen instantly. 1449 01:03:51,157 --> 01:03:53,990 It's going to take some amount of time, some number of milliseconds, 1450 01:03:53,990 --> 01:03:55,470 for that data to propagate. 1451 01:03:55,470 --> 01:03:57,390 So you get minor inconsistencies. 1452 01:03:57,390 --> 01:04:01,801 And that is problematic if, for instance, you read a value here, 1453 01:04:01,801 --> 01:04:03,800 you write that change, then you're like, oh, no. 1454 01:04:03,800 --> 01:04:04,383 Wait a minute. 1455 01:04:04,383 --> 01:04:05,900 I want to fix whatever I just did. 1456 01:04:05,900 --> 01:04:07,650 I want to fix a typo or something. 1457 01:04:07,650 --> 01:04:10,460 You might end up changing this version instead of that version. 1458 01:04:10,460 --> 01:04:13,260 1459 01:04:13,260 --> 01:04:16,090 That has to be a conscious design decision. 1460 01:04:16,090 --> 01:04:20,440 Yes, that is possible, so this is imperfect. 1461 01:04:20,440 --> 01:04:23,594 What else is worrisome here? 1462 01:04:23,594 --> 01:04:25,510 Management should not sign off on this design, 1463 01:04:25,510 --> 01:04:31,250 I would propose, at least if management has money with which they're 1464 01:04:31,250 --> 01:04:32,500 willing to solve this problem. 
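The propagation delay described here, where a change has been written but a reload still shows the old state, can be imitated with a toy model. Everything in it is an assumption for illustration: `Primary`, `Replica`, and `replicate` are hypothetical names, and the delay is modeled as an explicit step rather than real network timing.

```python
class Primary:
    """Accepts writes and remembers which changes have not been shipped yet."""

    def __init__(self):
        self.data = {}
        self.pending = []

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    """Serves reads from whatever has been copied to it so far."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, replica):
    """Ship every pending change; real systems do this continuously, with some delay."""
    for key, value in primary.pending:
        replica.data[key] = value
    primary.pending.clear()


primary, replica = Primary(), Replica()
primary.write("post:1", "hello world")
stale = replica.read("post:1")   # read before the change has propagated
replicate(primary, replica)
fresh = replica.read("post:1")   # read after replication catches up

print(stale, fresh)
```

A common mitigation, not covered in the lecture, is to send a user's reads to the primary for a short time after that user writes.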
1465 01:04:32,500 --> 01:04:37,420 AUDIENCE: From a user point of view, it might be a lot of time [INAUDIBLE]. 1466 01:04:37,420 --> 01:04:39,370 DAVID MALAN: Good-- not bad instinct. 1467 01:04:39,370 --> 01:04:41,010 But we're talking milliseconds. 1468 01:04:41,010 --> 01:04:44,620 So I would push back at this point and say, users are rarely if ever 1469 01:04:44,620 --> 01:04:46,100 going to notice this. 1470 01:04:46,100 --> 01:04:46,661 But fair. 1471 01:04:46,661 --> 01:04:49,250 1472 01:04:49,250 --> 01:04:50,170 What could go wrong? 1473 01:04:50,170 --> 01:04:53,420 Always consider that, and especially if you are starting something. 1474 01:04:53,420 --> 01:04:56,732 What are the questions you would ask of the engineers you're working with? 1475 01:04:56,732 --> 01:04:57,690 What could break first? 1476 01:04:57,690 --> 01:05:00,630 You don't even have to be an engineer to sort of identify 1477 01:05:00,630 --> 01:05:02,550 intuitively what could go wrong. 1478 01:05:02,550 --> 01:05:03,550 Grace? 1479 01:05:03,550 --> 01:05:06,550 AUDIENCE: You still have one master where you're writing everything. 1480 01:05:06,550 --> 01:05:08,882 There's no redundancy there. 1481 01:05:08,882 --> 01:05:10,970 There's no [INAUDIBLE]. 1482 01:05:10,970 --> 01:05:11,720 DAVID MALAN: Yeah. 1483 01:05:11,720 --> 01:05:16,020 And the buzz word here is single point of failure. 1484 01:05:16,020 --> 01:05:18,080 Single points of failure, generally bad. 1485 01:05:18,080 --> 01:05:21,105 And here, too, it does not take an engineering degree to isolate that, 1486 01:05:21,105 --> 01:05:23,480 so long as you have a conceptual understanding of things. 1487 01:05:23,480 --> 01:05:25,960 The fact that we have just one database for 1488 01:05:25,960 --> 01:05:29,230 writes, one master database-- very bad in that sense. 
1489 01:05:29,230 --> 01:05:33,650 If this goes offline, it would seem that our entire back end goes offline, 1490 01:05:33,650 --> 01:05:37,000 which is probably bad if our back end is where we're storing all of our products 1491 01:05:37,000 --> 01:05:39,310 or all of our user data or all of our Facebook posts 1492 01:05:39,310 --> 01:05:41,050 or whatever the tool might be doing. 1493 01:05:41,050 --> 01:05:45,500 This is really the stuff we care about, the actual data. 1494 01:05:45,500 --> 01:05:46,360 So not good. 1495 01:05:46,360 --> 01:05:49,780 And, in fact, we dealt with that earlier by introducing some redundancy 1496 01:05:49,780 --> 01:05:51,000 at the web tier. 1497 01:05:51,000 --> 01:05:55,160 So propose what's good and bad about this-- suppose that, all right, 1498 01:05:55,160 --> 01:05:59,622 I'm going to deal with this by adding a second writeable database. 1499 01:05:59,622 --> 01:06:01,080 I realize it's going to cost money. 1500 01:06:01,080 --> 01:06:04,790 But if you want me to fix this, that's the price we pay, literally. 1501 01:06:04,790 --> 01:06:07,850 But there's technical prices we now need to pay. 1502 01:06:07,850 --> 01:06:13,080 How do we kind of wire this thing in, the second database? 1503 01:06:13,080 --> 01:06:16,815 Or what questions does it invite again? 1504 01:06:16,815 --> 01:06:18,940 AUDIENCE: Write things simultaneously? 1505 01:06:18,940 --> 01:06:19,690 DAVID MALAN: Yeah. 1506 01:06:19,690 --> 01:06:20,990 Like, all right. 1507 01:06:20,990 --> 01:06:24,610 So why don't we do that-- like sharding sounded like so much work. 1508 01:06:24,610 --> 01:06:25,750 It's so hard to solve. 1509 01:06:25,750 --> 01:06:28,070 It doesn't fundamentally solve the problem long-term. 1510 01:06:28,070 --> 01:06:32,790 Let me just go ahead and write my data in duplicate to two places. 1511 01:06:32,790 --> 01:06:36,570 Well, this would be an incorrect approach, but the theory isn't bad. 
1512 01:06:36,570 --> 01:06:38,670 What would typically happen, though, is this-- 1513 01:06:38,670 --> 01:06:42,320 there's the notion of master-master relationships, whereby it 1514 01:06:42,320 --> 01:06:44,620 doesn't matter which one you write to. 1515 01:06:44,620 --> 01:06:47,370 You can configure databases to make sure that any 1516 01:06:47,370 --> 01:06:50,160 changes that happen on one immediately and automatically 1517 01:06:50,160 --> 01:06:51,380 get synced to the other. 1518 01:06:51,380 --> 01:06:55,110 So it's called master-master replication, in this case. 1519 01:06:55,110 --> 01:06:56,640 So that helps with that. 1520 01:06:56,640 --> 01:07:00,559 And now, frankly, now this opens up really interesting opportunities 1521 01:07:00,559 --> 01:07:02,600 because we could have another database over here, 1522 01:07:02,600 --> 01:07:04,120 and it could have its own databases. 1523 01:07:04,120 --> 01:07:05,994 So we have this whole family tree thing going 1524 01:07:05,994 --> 01:07:08,940 on that really gives us a lot of capacity-- horizontal scaling 1525 01:07:08,940 --> 01:07:14,660 even though, paradoxically, it's all hierarchical in this case. 1526 01:07:14,660 --> 01:07:16,920 So that's pretty good. 1527 01:07:16,920 --> 01:07:19,210 And that's kind of a pretty common solution, 1528 01:07:19,210 --> 01:07:21,300 if you have the money with which to cover that 1529 01:07:21,300 --> 01:07:23,633 and you're willing to take on the additional complexity. 1530 01:07:23,633 --> 01:07:25,990 Like Anessa, to some of your concerns with the team, 1531 01:07:25,990 --> 01:07:27,460 this is more complicated. 1532 01:07:27,460 --> 01:07:29,460 To someone else's point earlier, simple is good. 1533 01:07:29,460 --> 01:07:31,380 This is no longer very simple. 1534 01:07:31,380 --> 01:07:33,600 Thankfully, this is a common problem, so there's 1535 01:07:33,600 --> 01:07:36,100 plenty of documentation and precedent for doing this. 
1536 01:07:36,100 --> 01:07:38,810 But this is the added kind of complexity that we have. 1537 01:07:38,810 --> 01:07:42,560 There's still some other single points of failure on the screen. 1538 01:07:42,560 --> 01:07:44,270 What are those? 1539 01:07:44,270 --> 01:07:45,629 AUDIENCE: Load balancer? 1540 01:07:45,629 --> 01:07:47,170 DAVID MALAN: Yeah, the load balancer. 1541 01:07:47,170 --> 01:07:51,220 So damn it, that was such a nice solution earlier. 1542 01:07:51,220 --> 01:07:55,700 But if really we want to be uptight about this, got to fix this, too. 1543 01:07:55,700 --> 01:08:04,060 So let me go in there and put in one, two load balancers. 1544 01:08:04,060 --> 01:08:07,710 Of course now, all right-- so we have two problems, on the outside 1545 01:08:07,710 --> 01:08:08,920 and on the inside. 1546 01:08:08,920 --> 01:08:11,770 So how does this actually work? 1547 01:08:11,770 --> 01:08:15,440 What would you propose we do to get this topology working? 1548 01:08:15,440 --> 01:08:17,806 AUDIENCE: Put another load balancer on top? 1549 01:08:17,806 --> 01:08:19,930 DAVID MALAN: Yeah, we could kind of do this all day 1550 01:08:19,930 --> 01:08:22,939 long, just kind of keep stacking and unstacking and stacking. 1551 01:08:22,939 --> 01:08:28,090 So not bad, but not going to solve the problem fundamentally. 1552 01:08:28,090 --> 01:08:29,270 What else might work here? 1553 01:08:29,270 --> 01:08:32,180 AUDIENCE: Master-master load balancer [INAUDIBLE]. 1554 01:08:32,180 --> 01:08:37,063 DAVID MALAN: Yeah, that's not bad. 1555 01:08:37,063 --> 01:08:39,729 That doesn't really solve-- let's solve the first problem first. 1556 01:08:39,729 --> 01:08:43,560 How do we decide for the users where their traffic ends up? 1557 01:08:43,560 --> 01:08:44,830 So I am someone on a laptop. 1558 01:08:44,830 --> 01:08:46,470 I type in something.com. 1559 01:08:46,470 --> 01:08:47,590 I'm here in the cloud. 
1560 01:08:47,590 --> 01:08:50,930 I'm coming out of the cloud, ready to go to your website. 1561 01:08:50,930 --> 01:08:54,510 Which load balancer do I hit and how? 1562 01:08:54,510 --> 01:08:55,600 Yeah, Anessa? 1563 01:08:55,600 --> 01:08:57,630 AUDIENCE: [INAUDIBLE]. 1564 01:08:57,630 --> 01:08:58,380 DAVID MALAN: Yeah. 1565 01:08:58,380 --> 01:09:00,005 So we didn't talk about this earlier. 1566 01:09:00,005 --> 01:09:02,350 You could use DNS to geographically segregate 1567 01:09:02,350 --> 01:09:04,270 your users based on geography. 1568 01:09:04,270 --> 01:09:06,569 And companies like Akamai have done this for years, 1569 01:09:06,569 --> 01:09:09,220 whereby if they detect an IP address that they're 1570 01:09:09,220 --> 01:09:11,136 pretty sure is coming from North America, 1571 01:09:11,136 --> 01:09:12,760 they might send you to one destination. 1572 01:09:12,760 --> 01:09:15,160 If you're coming from Africa, you might go to another destination. 1573 01:09:15,160 --> 01:09:17,368 And with high probability, they can take into account 1574 01:09:17,368 --> 01:09:20,050 where the IP addresses are coming from just based 1575 01:09:20,050 --> 01:09:23,800 on the allocation of IP addresses globally around the world, which 1576 01:09:23,800 --> 01:09:25,359 is a centralized process. 1577 01:09:25,359 --> 01:09:26,398 So that could help. 1578 01:09:26,398 --> 01:09:28,939 We could do a little bit of that, and that's not a bad thing, 1579 01:09:28,939 --> 01:09:30,189 thereby splitting our traffic. 1580 01:09:30,189 --> 01:09:34,399 Unfortunately, in that model, if America goes offline, its load balancer, 1581 01:09:34,399 --> 01:09:36,520 then only African customers can visit. 1582 01:09:36,520 --> 01:09:39,810 Or conversely, if the other one goes down, only the others can.
1583 01:09:39,810 --> 01:09:42,060 So we'd have to do something adaptive, where we'd then 1584 01:09:42,060 --> 01:09:45,750 have to quickly change DNS so that, OK, if the American load 1585 01:09:45,750 --> 01:09:48,840 balancer's offline, then we better send all of our traffic 1586 01:09:48,840 --> 01:09:52,460 to the servers and load balancer that's in Africa. 1587 01:09:52,460 --> 01:09:57,670 Unfortunately, in the world of DNS, what have other DNS servers and Macs and PCs 1588 01:09:57,670 --> 01:10:01,170 and browsers unfortunately done? 1589 01:10:01,170 --> 01:10:04,500 They've cached the damn old address, which means some of our users 1590 01:10:04,500 --> 01:10:07,360 might still be given the appearance that we're offline, 1591 01:10:07,360 --> 01:10:09,830 even though we're up and running just fine with our servers 1592 01:10:09,830 --> 01:10:11,170 in Africa in this case. 1593 01:10:11,170 --> 01:10:12,860 So trade-off there. 1594 01:10:12,860 --> 01:10:16,070 I'll spoil this one, only because it's perhaps not obvious. 1595 01:10:16,070 --> 01:10:21,270 Typically, what you would do with a load balancer situation is only one 1596 01:10:21,270 --> 01:10:24,780 of them would really be operational at a time, just because it's nice and simple 1597 01:10:24,780 --> 01:10:28,000 and it avoids exactly that kind of slippery slope of a problem. 1598 01:10:28,000 --> 01:10:31,870 But these two things are talking to one another via a technique 1599 01:10:31,870 --> 01:10:33,620 you generally call heartbeats. 1600 01:10:33,620 --> 01:10:37,430 And as the name implies, both of them kind of have heartbeats. 1601 01:10:37,430 --> 01:10:40,570 And that means in technical terms that each of them 1602 01:10:40,570 --> 01:10:43,220 just kind of sends a signal to the other every second or minute 1603 01:10:43,220 --> 01:10:44,800 or whatever-- I'm alive. 1604 01:10:44,800 --> 01:10:45,850 I'm alive. 1605 01:10:45,850 --> 01:10:46,620 I'm alive.
1606 01:10:46,620 --> 01:10:49,570 Because what the other can do when it stops 1607 01:10:49,570 --> 01:10:53,620 hearing the heartbeat from the other, it can infer with reasonable accuracy 1608 01:10:53,620 --> 01:10:58,010 that, oh, that server must have died, literally or figuratively. 1609 01:10:58,010 --> 01:11:02,100 I am going to proactively take over its IP address 1610 01:11:02,100 --> 01:11:05,540 and start listening for requests on the internet on that same IP address. 1611 01:11:05,540 --> 01:11:09,390 So you have one IP address, still, but it floats between the load balancers 1612 01:11:09,390 --> 01:11:13,770 based on whichever one has decided I am now in charge, which then allows you 1613 01:11:13,770 --> 01:11:16,290 to tolerate one of them going offline. 1614 01:11:16,290 --> 01:11:19,120 And in theory, you could make this true for three or four, 1615 01:11:19,120 --> 01:11:20,790 but you get diminishing returns. 1616 01:11:20,790 --> 01:11:22,750 We could do the same thing down here. 1617 01:11:22,750 --> 01:11:24,820 So if we really care and worry about this, 1618 01:11:24,820 --> 01:11:28,010 we could do the same kind of heartbeat approach down here. 1619 01:11:28,010 --> 01:11:31,550 And then with the databases, we're still OK with this kind of hierarchy. 1620 01:11:31,550 --> 01:11:35,740 And this isn't so much a heartbeat, recall, as a synchronization. 1621 01:11:35,740 --> 01:11:41,490 But dear God, look at what we've just built. What a nightmare, right? 1622 01:11:41,490 --> 01:11:42,740 Remember what we started with. 1623 01:11:42,740 --> 01:11:46,980 We started with this. 1624 01:11:46,980 --> 01:11:48,630 And now we're up to this. 1625 01:11:48,630 --> 01:11:53,280 But this is truly what it means to design, like, an enterprise class 1626 01:11:53,280 --> 01:11:57,870 architecture and to be resilient against failure and to handle load 1627 01:11:57,870 --> 01:12:01,700 and to have built into the whole architecture the ability to scale. 
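The heartbeat arrangement between the two load balancers, where the standby takes over the shared IP address once it stops hearing "I'm alive," might be sketched like this. It is a minimal model under stated assumptions: `StandbyBalancer` is a hypothetical class, time is passed in as plain numbers instead of being read from a clock, and the actual reassignment of the address is reduced to setting a flag.

```python
class StandbyBalancer:
    """Standby that claims the shared IP if the active peer's heartbeats stop."""

    def __init__(self, timeout_seconds: float, now: float = 0.0):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = now
        self.owns_floating_ip = False

    def on_heartbeat(self, now: float) -> None:
        """Record that the peer said "I'm alive" at time `now`."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Take over if the peer has been silent for longer than the timeout."""
        if now - self.last_heartbeat > self.timeout_seconds:
            # A real system would reassign the address here; this sketch only flags it.
            self.owns_floating_ip = True
        return self.owns_floating_ip


standby = StandbyBalancer(timeout_seconds=3.0)
standby.on_heartbeat(now=1.0)
standby.on_heartbeat(now=2.0)
still_standby = standby.check(now=4.0)  # 2 seconds of silence: keep waiting
took_over = standby.check(now=6.0)      # 4 seconds of silence: take over

print(still_standby, took_over)
```

The timeout is the main design choice: too short and a brief network hiccup triggers a needless takeover, too long and users see the outage before the standby reacts.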
1628 01:12:01,700 --> 01:12:03,110 So a lot of desirable features. 1629 01:12:03,110 --> 01:12:05,230 And we didn't go from that directly to that. 1630 01:12:05,230 --> 01:12:09,390 It was, hopefully, a fairly logical story of frustrations and solutions 1631 01:12:09,390 --> 01:12:10,500 along the way. 1632 01:12:10,500 --> 01:12:12,190 But dear God, the complexity. 1633 01:12:12,190 --> 01:12:16,200 Like, we are a far cry from what was proposed as simple before. 1634 01:12:16,200 --> 01:12:18,050 So what does this mean? 1635 01:12:18,050 --> 01:12:22,340 So in the real, physical world back in the day, you would buy these servers. 1636 01:12:22,340 --> 01:12:23,650 You would wire them together. 1637 01:12:23,650 --> 01:12:24,870 You would configure them. 1638 01:12:24,870 --> 01:12:27,380 And when something dies, hopefully you have alerts set up. 1639 01:12:27,380 --> 01:12:30,720 And it's just a laundry list of operational things. 1640 01:12:30,720 --> 01:12:34,262 And so a common role in a company would be ops or operations, 1641 01:12:34,262 --> 01:12:36,220 which is all the hardware stuff, the networking 1642 01:12:36,220 --> 01:12:40,340 stuff, the behind-the-scenes stuff, the lower level details. 1643 01:12:40,340 --> 01:12:42,540 And it's fun, and it appeals to certain people. 1644 01:12:42,540 --> 01:12:46,490 But it is being supplanted, in part, by cloud services. 1645 01:12:46,490 --> 01:12:49,230 And what you get from these-- where's our acronym? 1646 01:12:49,230 --> 01:12:54,360 Might have erased it-- Infrastructure As A Service, IAAS, is 1647 01:12:54,360 --> 01:12:57,690 these same capabilities from companies like Amazon and Microsoft, 1648 01:12:57,690 --> 01:12:59,120 but in the cloud. 1649 01:12:59,120 --> 01:13:01,600 So if you want a load balancer, you don't buy a server 1650 01:13:01,600 --> 01:13:02,990 and physically plug it in. 
1651 01:13:02,990 --> 01:13:06,590 You click a button on a website that gives you a software 1652 01:13:06,590 --> 01:13:08,850 implementation of a load balancer. 1653 01:13:08,850 --> 01:13:11,590 So what's nice is because of virtualization and because 1654 01:13:11,590 --> 01:13:14,150 of software being so configurable, you can 1655 01:13:14,150 --> 01:13:17,630 implement in software what has historically been a physical device. 1656 01:13:17,630 --> 01:13:19,880 You can create the illusion that it's the same thing. 1657 01:13:19,880 --> 01:13:22,150 And so that's what you're doing with a lot of cloud services. 1658 01:13:22,150 --> 01:13:23,480 You're saying, give me a load balancer. 1659 01:13:23,480 --> 01:13:24,560 Give it this IP address. 1660 01:13:24,560 --> 01:13:27,560 Give me two back end servers or four back end servers. 1661 01:13:27,560 --> 01:13:29,210 Give me a database, two databases. 1662 01:13:29,210 --> 01:13:32,450 And it's all click, click, click or with a command line, textual interface. 1663 01:13:32,450 --> 01:13:34,520 You're wiring things together virtually. 1664 01:13:34,520 --> 01:13:36,830 So at the end of the day, it's the same skill 1665 01:13:36,830 --> 01:13:39,944 set other than you don't need to physically plug things in anymore. 1666 01:13:39,944 --> 01:13:41,360 But it's the same mental paradigm. 1667 01:13:41,360 --> 01:13:45,240 It's the same kind of decision process, the same amount of complexity. 1668 01:13:45,240 --> 01:13:47,224 But it's more virtual than it is physical. 1669 01:13:47,224 --> 01:13:49,640 And, in fact, if I pull up the one I keep mentioning, only 1670 01:13:49,640 --> 01:13:53,000 because I tend to use them myself here. 1671 01:13:53,000 --> 01:13:56,480 But this is Amazon Web Services. 1672 01:13:56,480 --> 01:13:59,460 You'll see an overwhelming list of products these days. 1673 01:13:59,460 --> 01:14:04,532 Frankly, it's to a fault, I think, how many damn different services they have. 
1674 01:14:04,532 --> 01:14:06,990 It is completely overwhelming, I think, to the uninitiated. 1675 01:14:06,990 --> 01:14:10,341 And even I have kind of started to get confused as to what exists. 1676 01:14:10,341 --> 01:14:13,340 But just to give you a teaser so you've at least heard of a few of them, 1677 01:14:13,340 --> 01:14:16,500 Amazon EC2 is elastic compute cloud. 1678 01:14:16,500 --> 01:14:19,590 This is what you would use, typically, to implement your front end, 1679 01:14:19,590 --> 01:14:20,900 your web server tier. 1680 01:14:20,900 --> 01:14:24,020 But it really is just generic, virtualized servers that 1681 01:14:24,020 --> 01:14:25,530 can do anything you want them to do. 1682 01:14:25,530 --> 01:14:28,560 In our story, we would use them as web servers. 1683 01:14:28,560 --> 01:14:33,030 Amazon has elastic load balancing, which replaces our load balancers. 1684 01:14:33,030 --> 01:14:36,310 And it's elastic in the sense that if you start to get a lot of traffic, 1685 01:14:36,310 --> 01:14:39,360 they give you more capacity, either by moving your load balancer 1686 01:14:39,360 --> 01:14:42,640 to a different virtual machine, a bigger one with more resources, 1687 01:14:42,640 --> 01:14:44,180 or maybe giving you multiple ones. 1688 01:14:44,180 --> 01:14:46,555 But what's nice and what's beautiful about their latching 1689 01:14:46,555 --> 01:14:49,360 on to this word elastic is you don't have to think about it. 1690 01:14:49,360 --> 01:14:51,240 Conceptually, though, it's doing this. 1691 01:14:51,240 --> 01:14:54,880 So understanding the problem is still important and daresay requisite. 1692 01:14:54,880 --> 01:14:57,770 But you don't have to worry as much about the solution. 1693 01:14:57,770 --> 01:15:02,120 Autoscaling is what decides how many of these front end 1694 01:15:02,120 --> 01:15:04,300 web servers in our story to turn on. 
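The autoscaling decision just described comes down to a small calculation. What follows is a minimal sketch, not Amazon's actual algorithm: the function name, the requests-per-second figures, and the minimum and maximum bounds are all assumptions chosen for illustration.

```python
import math


def desired_server_count(current_load: float, capacity_per_server: float,
                         minimum: int = 1, maximum: int = 100) -> int:
    """Return how many web servers to run for the given load.

    current_load and capacity_per_server share a unit (for example,
    requests per second); minimum and maximum are hypothetical bounds.
    """
    needed = math.ceil(current_load / capacity_per_server)
    # Never drop below the floor, never exceed the ceiling.
    return max(minimum, min(maximum, needed))


print(desired_server_count(current_load=50, capacity_per_server=200))      # prints 1
print(desired_server_count(current_load=900, capacity_per_server=200))     # prints 5
print(desired_server_count(current_load=50_000, capacity_per_server=200))  # prints 100
```

The floor keeps at least one server alive when traffic is quiet, and the cap keeps a traffic spike from producing an unbounded bill.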
1695 01:15:04,300 --> 01:15:06,870 So Amazon for you, albeit with some complexity-- 1696 01:15:06,870 --> 01:15:12,130 it's not nearly as easily done as said-- can 1697 01:15:12,130 --> 01:15:16,812 decide to turn these things on or off and give you one or two or three or 100 1698 01:15:16,812 --> 01:15:18,770 based on the load you're currently experiencing 1699 01:15:18,770 --> 01:15:19,894 and metrics you've specified. 1700 01:15:19,894 --> 01:15:23,630 1701 01:15:23,630 --> 01:15:26,120 Amazon RDS, Relational Database Service. 1702 01:15:26,120 --> 01:15:28,507 That's what can replace all of this complexity. 1703 01:15:28,507 --> 01:15:30,590 You can just say, give me one big database server, 1704 01:15:30,590 --> 01:15:35,680 and they'll figure out how big to make it and how to contract and expand. 1705 01:15:35,680 --> 01:15:39,470 And actually, you can literally check a box that says replicate this, 1706 01:15:39,470 --> 01:15:41,260 so you get an automated backup of it. 1707 01:15:41,260 --> 01:15:43,259 So this is the kind of stuff that you would just 1708 01:15:43,259 --> 01:15:46,560 spend so much time and money as a human, building up, figuring out, updating, 1709 01:15:46,560 --> 01:15:47,137 configuring. 1710 01:15:47,137 --> 01:15:49,220 All of this has been abstracted away-- there, too, 1711 01:15:49,220 --> 01:15:52,527 to our story earlier about abstraction. 1712 01:15:52,527 --> 01:15:54,360 We certainly won't go through most of these, 1713 01:15:54,360 --> 01:15:57,276 partly because I don't know all of them and partly because they're not 1714 01:15:57,276 --> 01:15:58,200 so germane. 1715 01:15:58,200 --> 01:16:02,012 But S3 is a common one-- Simple Storage Service. 1716 01:16:02,012 --> 01:16:04,720 This is a way of getting nearly infinite disk space in the cloud. 1717 01:16:04,720 --> 01:16:06,700 So for some years, Dropbox, for instance, 1718 01:16:06,700 --> 01:16:08,331 was using Amazon S3 for their data.
1719 01:16:08,331 --> 01:16:11,330 I believe they have since moved to running their own servers, presumably 1720 01:16:11,330 --> 01:16:13,840 because of cost or security or the like. 1721 01:16:13,840 --> 01:16:16,070 But you have gigabytes or terabytes of space. 1722 01:16:16,070 --> 01:16:18,110 And, in fact, for courses I teach, we move 1723 01:16:18,110 --> 01:16:21,125 all of our video files, which tend to be big-- we don't run anything 1724 01:16:21,125 --> 01:16:22,000 on Harvard's servers. 1725 01:16:22,000 --> 01:16:24,700 It's all run in the Amazon cloud because they abstract 1726 01:16:24,700 --> 01:16:26,890 all of that detail away for us. 1727 01:16:26,890 --> 01:16:29,160 So it's both overwhelming but also exciting in that 1728 01:16:29,160 --> 01:16:30,493 there are all these ingredients. 1729 01:16:30,493 --> 01:16:32,810 And the fact that this is so low-level, so to speak, 1730 01:16:32,810 --> 01:16:35,370 is what makes it Infrastructure As A Service. 1731 01:16:35,370 --> 01:16:39,720 Frankly, for startups and the like, more, I 1732 01:16:39,720 --> 01:16:43,490 think, appealing is Platform As A Service, which is, if you will, 1733 01:16:43,490 --> 01:16:45,876 a layer on top of this in terms of abstraction 1734 01:16:45,876 --> 01:16:47,750 because at the end of the day, as interesting 1735 01:16:47,750 --> 01:16:52,320 as it might be to a computer scientist or an engineer, oh, my dear God. 1736 01:16:52,320 --> 01:16:54,530 I don't really care about load balancing, sharding. 1737 01:16:54,530 --> 01:16:57,071 I don't really need to think about this to build my business, 1738 01:16:57,071 --> 01:16:59,100 especially if it's just me or just a few people. 1739 01:16:59,100 --> 01:17:02,770 The returns are probably higher on focusing on our application, not 1740 01:17:02,770 --> 01:17:04,060 this whole narrative. 
1741 01:17:04,060 --> 01:17:07,320 And so if you go to companies like Heroku, which is very popular 1742 01:17:07,320 --> 01:17:10,660 and is built on top of Amazon, you'll see, one, 1743 01:17:10,660 --> 01:17:13,470 a much more reasonable list of options. 1744 01:17:13,470 --> 01:17:18,470 But what's nice is-- let me find the page. 1745 01:17:18,470 --> 01:17:20,330 It's a Platform As A Service in the sense 1746 01:17:20,330 --> 01:17:24,460 that, ah, now we're focusing on what I care about as a software developer. 1747 01:17:24,460 --> 01:17:25,830 What language am I using? 1748 01:17:25,830 --> 01:17:27,800 What database technology do I want to use? 1749 01:17:27,800 --> 01:17:31,440 I don't care what's wired to what or where the load balancing is happening. 1750 01:17:31,440 --> 01:17:33,260 Just please abstract that all away from me, 1751 01:17:33,260 --> 01:17:36,160 so you just give me a black box, effectively, 1752 01:17:36,160 --> 01:17:38,710 that I can put my product on, and it just runs. 1753 01:17:38,710 --> 01:17:42,050 And so here, your first decision point isn't the lower level details 1754 01:17:42,050 --> 01:17:43,462 I just rattled off on Amazon. 1755 01:17:43,462 --> 01:17:45,670 It's higher level details like languages, which we'll 1756 01:17:45,670 --> 01:17:47,910 talk about more tomorrow, as well. 1757 01:17:47,910 --> 01:17:51,800 So for a younger startup, honestly, starting at, like, the Heroku layer 1758 01:17:51,800 --> 01:17:55,350 tends to be much more pleasurable than the Amazon layer. 1759 01:17:55,350 --> 01:18:00,880 Google, for instance, has App Engine and their Compute Engine. 1760 01:18:00,880 --> 01:18:02,740 They have any number of options, as well. 1761 01:18:02,740 --> 01:18:04,100 Microsoft has their own. 1762 01:18:04,100 --> 01:18:07,530 So if you google Microsoft or you bing Microsoft Azure, 1763 01:18:07,530 --> 01:18:10,320 you'll find your way here.
1764 01:18:10,320 --> 01:18:13,070 And in terms of how you decide which to use, 1765 01:18:13,070 --> 01:18:16,810 I would generally, especially for a startup, go with what you know 1766 01:18:16,810 --> 01:18:18,814 or what you know someone knows. 1767 01:18:18,814 --> 01:18:19,980 Looking for recommendations? 1768 01:18:19,980 --> 01:18:21,260 You can google around. 1769 01:18:21,260 --> 01:18:23,670 Just so I don't forget to mention it, if you 1770 01:18:23,670 --> 01:18:31,785 go to Hacker News, whose website is news.ycombinator.com, run by Y Combinator, an investment 1771 01:18:31,785 --> 01:18:35,510 fund, this is a good place to stay current 1772 01:18:35,510 --> 01:18:36,910 with these kinds of technologies. 1773 01:18:36,910 --> 01:18:39,490 I would say that quora.com is very good, too. 1774 01:18:39,490 --> 01:18:43,160 It's kind of the right community to have these kinds of technical decisions. 1775 01:18:43,160 --> 01:18:46,620 And TechCrunch, although that's more newsy than it is thoughtful discussion. 1776 01:18:46,620 --> 01:18:49,050 So those three sites together, I would say-- especially 1777 01:18:49,050 --> 01:18:52,592 if you are part of a startup, keeping those kinds of sources in mind 1778 01:18:52,592 --> 01:18:54,550 and just kind of passively reading those things 1779 01:18:54,550 --> 01:18:56,590 will help keep you at least current on a lot 1780 01:18:56,590 --> 01:18:59,110 of these options and tools and techniques. 1781 01:18:59,110 --> 01:19:01,146 But more on that tomorrow, as well. 1782 01:19:01,146 --> 01:19:04,040 1783 01:19:04,040 --> 01:19:06,421 Any questions? 1784 01:19:06,421 --> 01:19:06,920 Yeah? 1785 01:19:06,920 --> 01:19:08,150 AUDIENCE: Could you go back to Heroku? 1786 01:19:08,150 --> 01:19:08,900 DAVID MALAN: Sure. 1787 01:19:08,900 --> 01:19:12,188 1788 01:19:12,188 --> 01:19:15,152 AUDIENCE: So essentially, it's all about [INAUDIBLE]. 1789 01:19:15,152 --> 01:19:19,407 1790 01:19:19,407 --> 01:19:20,240 DAVID MALAN: You do.
1791 01:19:20,240 --> 01:19:23,240 And let me see if I can find one more screen. 1792 01:19:23,240 --> 01:19:25,070 The docs kind of change pretty frequently. 1793 01:19:25,070 --> 01:19:29,945 1794 01:19:29,945 --> 01:19:32,180 AUDIENCE: [INAUDIBLE]. 1795 01:19:32,180 --> 01:19:33,450 DAVID MALAN: I'm sorry? 1796 01:19:33,450 --> 01:19:34,366 AUDIENCE: [INAUDIBLE]. 1797 01:19:34,366 --> 01:19:40,650 1798 01:19:40,650 --> 01:19:42,020 DAVID MALAN: Oh, let's see. 1799 01:19:42,020 --> 01:19:43,380 Deploy, build-- oh, yeah. 1800 01:19:43,380 --> 01:19:45,950 So this is actually a very clever way-- OK. 1801 01:19:45,950 --> 01:19:49,020 So this is all the stuff we just spent an hour talking about. 1802 01:19:49,020 --> 01:19:52,220 Heroku makes the world feel like that. 1803 01:19:52,220 --> 01:19:56,850 Yeah, so infrastructure, platform, infrastructure, platform. 1804 01:19:56,850 --> 01:20:00,650 So this is nice, and this is compelling. 1805 01:20:00,650 --> 01:20:02,890 So what are some of the down-- work with, let's see. 1806 01:20:02,890 --> 01:20:04,300 This is just random [INAUDIBLE]. 1807 01:20:04,300 --> 01:20:05,350 I do want to show one thing. 1808 01:20:05,350 --> 01:20:05,850 Let's see. 1809 01:20:05,850 --> 01:20:07,661 Pricing is interesting. 1810 01:20:07,661 --> 01:20:09,571 Hobby. 1811 01:20:09,571 --> 01:20:10,070 Dyno. 1812 01:20:10,070 --> 01:20:12,778 So they have some of their-- dyno is not really a technical term. 1813 01:20:12,778 --> 01:20:15,350 This is their own marketing thing for how much resource 1814 01:20:15,350 --> 01:20:17,180 you get on their servers. 1815 01:20:17,180 --> 01:20:19,530 This is what I wanted, like databases. 1816 01:20:19,530 --> 01:20:21,100 No, that's not what I want. 1817 01:20:21,100 --> 01:20:24,010 Feature, add ons. 1818 01:20:24,010 --> 01:20:26,790 Explore add ons pricing. 1819 01:20:26,790 --> 01:20:27,830 OK, this is what's fun.
1820 01:20:27,830 --> 01:20:31,170 So most of these-- well, it's also overwhelming, too. 1821 01:20:31,170 --> 01:20:35,420 There are so many third-party tools, databases, libraries, software, 1822 01:20:35,420 --> 01:20:36,270 services. 1823 01:20:36,270 --> 01:20:39,302 We could spend a month, I'm sure, just even looking 1824 01:20:39,302 --> 01:20:40,760 at the definitions of these things. 1825 01:20:40,760 --> 01:20:44,360 What's nice about Heroku is they have figured out 1826 01:20:44,360 --> 01:20:47,370 how to install all of this stuff, how to configure this stuff. 1827 01:20:47,370 --> 01:20:50,000 And so if you want it, you just sort of click it and add it 1828 01:20:50,000 --> 01:20:51,710 to your shopping cart, so to speak. 1829 01:20:51,710 --> 01:20:53,770 And so long as you adhere to certain conventions 1830 01:20:53,770 --> 01:20:58,700 that Heroku has-- so you have to design your app a little bit 1831 01:20:58,700 --> 01:21:01,660 to be consistent with their design approach. 1832 01:21:01,660 --> 01:21:05,510 So you're tied a little bit to their topology, though not hugely. 1833 01:21:05,510 --> 01:21:08,510 You can just so much more easily use these services. 1834 01:21:08,510 --> 01:21:11,010 And it's a beautiful, beautiful thing. 1835 01:21:11,010 --> 01:21:13,060 But what's the downside then? 1836 01:21:13,060 --> 01:21:14,629 Sounds like all win. 1837 01:21:14,629 --> 01:21:19,030 1838 01:21:19,030 --> 01:21:21,410 AUDIENCE: A little less customizable. 1839 01:21:21,410 --> 01:21:23,340 DAVID MALAN: A little less customizable. 1840 01:21:23,340 --> 01:21:29,350 You're much more dependent on their own design decisions and their sort 1841 01:21:29,350 --> 01:21:32,950 of optimization, presumably, for common cases that maybe your own unique cases 1842 01:21:32,950 --> 01:21:35,171 or whatever don't fit well. 1843 01:21:35,171 --> 01:21:35,743 Yeah, Sarah? 
1844 01:21:35,743 --> 01:21:37,576 AUDIENCE: So are we monitoring the databases 1845 01:21:37,576 --> 01:21:40,622 less frequently so you're less likely to predict something bad is 1846 01:21:40,622 --> 01:21:41,910 going to happen down the road? 1847 01:21:41,910 --> 01:21:43,397 DAVID MALAN: Yeah, good point. 1848 01:21:43,397 --> 01:21:45,230 So maybe they're monitoring less frequently. 1849 01:21:45,230 --> 01:21:47,620 And I would also say, even if that's not true, 1850 01:21:47,620 --> 01:21:50,660 the fact that there's another party involved-- so it's not just Amazon. 1851 01:21:50,660 --> 01:21:53,460 Now there's multiple layers where things could go wrong. 1852 01:21:53,460 --> 01:21:55,870 That feels, too, worrisome along those lines. 1853 01:21:55,870 --> 01:22:01,090 1854 01:22:01,090 --> 01:22:02,240 Sounds all good. 1855 01:22:02,240 --> 01:22:04,430 So what's the catch? 1856 01:22:04,430 --> 01:22:06,294 What's another catch? 1857 01:22:06,294 --> 01:22:08,250 AUDIENCE: [INAUDIBLE]. 1858 01:22:08,250 --> 01:22:10,260 DAVID MALAN: Yeah, it costs more. 1859 01:22:10,260 --> 01:22:14,110 It's great to have all of these features and live in this little world 1860 01:22:14,110 --> 01:22:16,640 where none of that underlying infrastructure exists, 1861 01:22:16,640 --> 01:22:18,782 which is probably fine for a lot of people, 1862 01:22:18,782 --> 01:22:21,990 certainly when they're first starting or getting a startup off of the ground. 1863 01:22:21,990 --> 01:22:23,790 But certainly once you have good problems 1864 01:22:23,790 --> 01:22:27,210 like lots of users, that's the point at which you might need to decide, 1865 01:22:27,210 --> 01:22:29,370 did we really engineer this at the right level? 1866 01:22:29,370 --> 01:22:33,960 Should we have maybe started in advance, albeit at greater cost or complexity, 1867 01:22:33,960 --> 01:22:37,650 so that over time, we're just ready to go and we're ready to scale? 
1868 01:22:37,650 --> 01:22:39,960 Or was it the right call to simplify things, 1869 01:22:39,960 --> 01:22:41,950 pay for this value-added service, and not 1870 01:22:41,950 --> 01:22:44,890 have to worry about those lower-level implementation details? 1871 01:22:44,890 --> 01:22:46,330 So it totally depends. 1872 01:22:46,330 --> 01:22:49,920 But I would say in general, certainly for a small, up-and-coming startup, 1873 01:22:49,920 --> 01:22:52,550 simpler is probably good, at least if you have the money 1874 01:22:52,550 --> 01:22:53,830 to cover the marginal costs. 1875 01:22:53,830 --> 01:22:55,880 And I'll defer to the pricing pages here. 1876 01:22:55,880 --> 01:23:02,110 But what is typical here is if we look at AWS pricing, for instance, 1877 01:23:02,110 --> 01:23:06,140 there's any number of things they charge for, nickel and diming here and there. 1878 01:23:06,140 --> 01:23:08,670 But it's often literally nickels and dimes. 1879 01:23:08,670 --> 01:23:11,380 So thankfully, it takes a while for the costs to actually add up. 1880 01:23:11,380 --> 01:23:15,230 Just to give you a sense, though, if you get, 1881 01:23:15,230 --> 01:23:19,560 let's call it, a small server that has two gigabytes of memory, which 1882 01:23:19,560 --> 01:23:23,160 is about as much as a small laptop might have these days, and essentially 1883 01:23:23,160 --> 01:23:28,880 one CPU, you'll pay about 2.6 cents per hour to use that server. 1884 01:23:28,880 --> 01:23:33,536 So if you do this-- if we pull up my calculator again, 1885 01:23:33,536 --> 01:23:34,910 so it's that many cents per hour. 1886 01:23:34,910 --> 01:23:37,040 There's 24 hours in a day. 1887 01:23:37,040 --> 01:23:39,550 There's 365 days in a year. 1888 01:23:39,550 --> 01:23:43,127 You'll pay $227 to rent that server year round. 1889 01:23:43,127 --> 01:23:44,960 And this, too, is where there's a trade-off.
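The calculator arithmetic above can be reproduced directly. One assumption to flag: the $227 yearly total is consistent with an hourly rate of about $0.026, so that is the rate used here, inferred from the total rather than taken from a price list. The function name is hypothetical.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def yearly_cost(dollars_per_hour: float) -> float:
    """Cost of leaving one server running around the clock for a year."""
    return dollars_per_hour * HOURS_PER_DAY * DAYS_PER_YEAR


# Assumed rate: about $0.026 per hour, inferred from the ~$227 yearly total.
print(f"${yearly_cost(0.026):.2f} per year")  # prints $227.76 per year
```

A server runs 8,760 hours in a year, so every extra cent per hour adds $87.60 to the annual bill for each machine left on.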
1890 01:23:44,960 --> 01:23:46,800 In the consulting gig I alluded to earlier, 1891 01:23:46,800 --> 01:23:50,260 we had to handle some-- I forget the number offhand-- hundreds of thousands 1892 01:23:50,260 --> 01:23:52,570 of hits per day, which was kind of a lot. 1893 01:23:52,570 --> 01:23:55,622 And we were moving from one old architecture to another. 1894 01:23:55,622 --> 01:23:58,580 And so when we did the math back in the day, and it was some years ago, 1895 01:23:58,580 --> 01:24:01,930 it was actually going to cost us quite a bit in the cloud 1896 01:24:01,930 --> 01:24:04,840 because we were going to have so many recurring costs. 1897 01:24:04,840 --> 01:24:06,910 Certainly after a year, after two years, we 1898 01:24:06,910 --> 01:24:08,630 worried it was really going to add up. 1899 01:24:08,630 --> 01:24:11,620 By contrast, we happened to go the hardware route at the time. 1900 01:24:11,620 --> 01:24:13,980 The cloud and Amazon were not as mature at the time, 1901 01:24:13,980 --> 01:24:16,770 either, so there were some risk concerns. 1902 01:24:16,770 --> 01:24:21,680 But the upfront costs, significant-- thousands and thousands of dollars. 1903 01:24:21,680 --> 01:24:26,040 But very low marginal costs or recurring costs thereafter. 1904 01:24:26,040 --> 01:24:29,360 So that, too, was a trade-off, as well. 1905 01:24:29,360 --> 01:24:32,644 Other questions or comments? 1906 01:24:32,644 --> 01:24:35,176 AUDIENCE: [INAUDIBLE]. 1907 01:24:35,176 --> 01:24:36,134 DAVID MALAN: I'm sorry? 1908 01:24:36,134 --> 01:24:37,633 AUDIENCE: Why do they do it by hour? 1909 01:24:37,633 --> 01:24:38,630 Why not do it by year? 1910 01:24:38,630 --> 01:24:39,940 DAVID MALAN: Oh, so that's a good question. 1911 01:24:39,940 --> 01:24:41,620 So why do they do it per hour, per year? 1912 01:24:41,620 --> 01:24:44,779 It's not uncommon in this cloud-based economy 1913 01:24:44,779 --> 01:24:47,070 to really just want to spin up servers for a few hours. 
1914 01:24:47,070 --> 01:24:49,480 Like, if you get spiky traffic, you might get a real hit 1915 01:24:49,480 --> 01:24:50,500 around the holidays. 1916 01:24:50,500 --> 01:24:52,250 Or maybe you get blogged about, and so you 1917 01:24:52,250 --> 01:24:54,720 have to tolerate this for a few hours, a few days or weeks. 1918 01:24:54,720 --> 01:24:57,428 But after that, you definitely don't want to commit, necessarily, 1919 01:24:57,428 --> 01:24:58,490 for the whole year. 1920 01:24:58,490 --> 01:25:02,040 One of the best articles years ago when Amazon was first maturing 1921 01:25:02,040 --> 01:25:03,730 was the New York Times-- let me see. 1922 01:25:03,730 --> 01:25:09,480 New York Times Amazon EC2 TIFF. 1923 01:25:09,480 --> 01:25:13,220 They did this-- yeah, 2008. 1924 01:25:13,220 --> 01:25:14,160 Oh, no. 1925 01:25:14,160 --> 01:25:16,290 It's this one, 2007. 1926 01:25:16,290 --> 01:25:19,460 So this was an article I remember being so inspired by at the time. 1927 01:25:19,460 --> 01:25:23,920 They had-- let's see, public domain articles. 1928 01:25:23,920 --> 01:25:27,480 They had 11 million articles as PDF. 1929 01:25:27,480 --> 01:25:32,210 Or they wanted to create, it sounds, 11 million articles as PDFs. 1930 01:25:32,210 --> 01:25:34,965 So this was an example of something that hopefully 1931 01:25:34,965 --> 01:25:36,590 wasn't going to take them a whole year. 1932 01:25:36,590 --> 01:25:38,610 And, in fact, if you read through the article, 1933 01:25:38,610 --> 01:25:40,490 one of the inspiring takeaways at the time 1934 01:25:40,490 --> 01:25:45,182 was that the person who set this up used Amazon's cloud service to sort of scale 1935 01:25:45,182 --> 01:25:48,140 up suddenly from zero to, I don't know, a few hundred or a few thousand 1936 01:25:48,140 --> 01:25:51,940 servers, ran them for a few hours or days, shut it all down, 1937 01:25:51,940 --> 01:25:55,410 and paid some dollar amount, but some modest dollar amount in the article.
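The appeal of hourly billing for a short burst like this is easy to see with a rough calculation. Every number below is hypothetical: the server counts, hours, and price are placeholders for illustration, not figures from the article or from any price list.

```python
def burst_cost(servers: int, hours: float, dollars_per_server_hour: float,
               runs: int = 1) -> float:
    """Total cost of renting `servers` machines for `hours`, repeated `runs` times."""
    return servers * hours * dollars_per_server_hour * runs


# Hypothetical burst job: 100 servers for one day at $0.10 per server-hour.
one_day_burst = burst_cost(servers=100, hours=24, dollars_per_server_hour=0.10)

# Hypothetical alternative: one server doing the same 2,400 server-hours of work.
one_server_hours = 100 * 24
one_server_cost = burst_cost(servers=1, hours=one_server_hours,
                             dollars_per_server_hour=0.10)

print(f"burst: ${one_day_burst:.2f} over 24 hours")
print(f"single server: ${one_server_cost:.2f} over {one_server_hours / 24:.0f} days")
```

Under these assumptions the bill is the same either way, since it depends only on total server-hours; what the burst buys is finishing in a day rather than in 100 days.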
1938 01:25:55,410 --> 01:25:59,202 And I think one of his cute comments is he screwed up at one point. 1939 01:25:59,202 --> 01:26:00,660 And the PDFs didn't come out right. 1940 01:26:00,660 --> 01:26:02,326 But no big deal, they just ran it again. 1941 01:26:02,326 --> 01:26:05,250 So twice the cost, but it was still relatively few dollars. 1942 01:26:05,250 --> 01:26:07,810 And that was without having to buy or set up 1943 01:26:07,810 --> 01:26:09,770 a single server at the New York Times. 1944 01:26:09,770 --> 01:26:12,082 So for those kinds of workloads or data analytics 1945 01:26:12,082 --> 01:26:14,540 where you really just need to do a lot of number crunching, 1946 01:26:14,540 --> 01:26:16,450 then shut it all down, the cloud is amazing 1947 01:26:16,450 --> 01:26:18,950 because it would cost you massive amounts of money and time 1948 01:26:18,950 --> 01:26:22,230 to do it locally otherwise. 1949 01:26:22,230 --> 01:26:23,616 Other questions or comments? 1950 01:26:23,616 --> 01:26:27,720 1951 01:26:27,720 --> 01:26:29,080 That was the cloud. 1952 01:26:29,080 --> 01:26:33,230 Let me propose this-- I sent around an email last night. 1953 01:26:33,230 --> 01:26:35,960 And if you haven't already, do read that email, 1954 01:26:35,960 --> 01:26:40,350 and make sure you are able to log in to cs50.io during the break. 1955 01:26:40,350 --> 01:26:42,350 If not, just call me over, and I'll lend a hand. 1956 01:26:42,350 --> 01:26:45,308 But otherwise, why don't we take our 15-minute break here and come back 1957 01:26:45,308 --> 01:26:48,880 right after 3:15 to finish off the day? 1958 01:26:48,880 --> 01:26:51,578