1 00:00:00,000 --> 00:00:09,443 2 00:00:09,443 --> 00:00:12,680 SPEAKER: So much as you might understand how the internet works, 3 00:00:12,680 --> 00:00:17,505 whether it's HTTP that transports data or HTML that is that data, 4 00:00:17,505 --> 00:00:20,630 and as much as you might understand some of the fundamentals of programming 5 00:00:20,630 --> 00:00:24,290 like loops and conditions and Boolean expressions, variables, and more, 6 00:00:24,290 --> 00:00:27,350 it turns out there are so many different ways in which you 7 00:00:27,350 --> 00:00:29,420 can implement those ideas. 8 00:00:29,420 --> 00:00:31,850 And so, indeed, when it comes time to actually build 9 00:00:31,850 --> 00:00:35,240 a website, a web application, a mobile application, 10 00:00:35,240 --> 00:00:39,020 it turns out that it's rather non-obvious where to begin sometimes 11 00:00:39,020 --> 00:00:42,050 if only because you have so many options ahead of you. 12 00:00:42,050 --> 00:00:46,010 And much like the world of clothing and the world of fashion, more generally, 13 00:00:46,010 --> 00:00:50,150 is constantly evolving such that what is cool and appropriate to use now 14 00:00:50,150 --> 00:00:54,950 or to wear now might not necessarily be appropriate some months or years hence, 15 00:00:54,950 --> 00:00:59,540 the same can be said for better or for worse of the technology world insofar 16 00:00:59,540 --> 00:01:01,250 as humans are constantly innovating. 17 00:01:01,250 --> 00:01:05,450 Humans are constantly finding fault or opportunities for improvement 18 00:01:05,450 --> 00:01:08,750 in languages that we've used for years, in server software that we've 19 00:01:08,750 --> 00:01:11,210 used for years, and intuitively improving on it. 20 00:01:11,210 --> 00:01:15,050 And so the reality is that staying current with this whole world 21 00:01:15,050 --> 00:01:19,670 takes some effort even as the fundamentals largely remain constant. 
22 00:01:19,670 --> 00:01:23,034 And so what we'll try to do here is give you a sense of some of the languages, 23 00:01:23,034 --> 00:01:24,950 some of the frameworks, some of the libraries, 24 00:01:24,950 --> 00:01:29,000 some of the overarching design decisions that are both in vogue and here 25 00:01:29,000 --> 00:01:32,930 to stay right now, as well as take the lid off of some of these technologies 26 00:01:32,930 --> 00:01:35,420 and give you a better understanding of how 27 00:01:35,420 --> 00:01:38,930 some of the fundamental types of technologies from which you 28 00:01:38,930 --> 00:01:41,820 can choose actually work. 29 00:01:41,820 --> 00:01:46,190 So let's consider for the moment the so-called front end of an application. 30 00:01:46,190 --> 00:01:49,530 Front end generally refers to that which is facing the user. 31 00:01:49,530 --> 00:01:53,120 So it's the user interface and more with which the human user typically 32 00:01:53,120 --> 00:01:53,930 interacts. 33 00:01:53,930 --> 00:01:57,530 Now, we've discussed, for instance, the world of the web 34 00:01:57,530 --> 00:02:02,720 and how you might assemble a web-based experience using HTML and CSS and, even 35 00:02:02,720 --> 00:02:05,090 more dynamically, using JavaScript. 36 00:02:05,090 --> 00:02:07,760 But that's by using those native languages 37 00:02:07,760 --> 00:02:10,139 right as they come out of the box, so to speak. 38 00:02:10,139 --> 00:02:13,700 But it turns out that some tasks are not as easily done 39 00:02:13,700 --> 00:02:17,030 in those various languages as might be convenient. 40 00:02:17,030 --> 00:02:20,370 It turns out that there are certain design patterns, 41 00:02:20,370 --> 00:02:23,660 so to speak, types of code, types of markup, 42 00:02:23,660 --> 00:02:27,080 types of properties that people have found themselves using again 43 00:02:27,080 --> 00:02:28,350 and again and again. 
44 00:02:28,350 --> 00:02:32,510 And so much like you can factor out into your own CSS files and JavaScript 45 00:02:32,510 --> 00:02:35,810 files, code that you want to share across multiple files 46 00:02:35,810 --> 00:02:39,170 or even across multiple projects so can-- 47 00:02:39,170 --> 00:02:42,050 so has the world more generally realized, you know what? 48 00:02:42,050 --> 00:02:45,800 Maybe I should package up my CSS or my JavaScript in such a way 49 00:02:45,800 --> 00:02:48,290 that other people can actually use it as well 50 00:02:48,290 --> 00:02:52,400 and thus have been born things called libraries, collections of code 51 00:02:52,400 --> 00:02:56,106 that other people have written that we can all use, often under an open source 52 00:02:56,106 --> 00:02:58,730 license, which means the code is freely available for the world 53 00:02:58,730 --> 00:03:03,620 to critique, to use, to adapt sometimes, and contribute back to. 54 00:03:03,620 --> 00:03:06,350 Now, within the world of the front end, there 55 00:03:06,350 --> 00:03:08,210 are so many different JavaScript frameworks. 56 00:03:08,210 --> 00:03:12,390 Indeed, depicted here are just a few of perhaps the most popular right now. 57 00:03:12,390 --> 00:03:15,380 But even this list is going to grow stale over the coming months, 58 00:03:15,380 --> 00:03:17,460 certainly over the coming years, and the like. 
59 00:03:17,460 --> 00:03:19,430 And so really rather than dive into the weeds 60 00:03:19,430 --> 00:03:21,263 of some of these technologies in particular, 61 00:03:21,263 --> 00:03:23,240 we really aspire to give you just a sense 62 00:03:23,240 --> 00:03:26,540 of what's current, what should be in your vocabulary perhaps now, 63 00:03:26,540 --> 00:03:30,350 and perhaps some context when it comes to recruiting engineers or deciding 64 00:03:30,350 --> 00:03:34,220 among engineers which technologies to build a business on, just 65 00:03:34,220 --> 00:03:37,070 how current you are, just how dated you are, and the like. 66 00:03:37,070 --> 00:03:40,100 But invariably this kind of thing requires some due diligence 67 00:03:40,100 --> 00:03:42,770 when the time comes to design an actual project. 68 00:03:42,770 --> 00:03:46,760 Angular, Ember, Meteor, React, Vue, these and more 69 00:03:46,760 --> 00:03:50,420 are the names for various JavaScript frameworks. 70 00:03:50,420 --> 00:03:53,400 And a framework is not just a library per se. 71 00:03:53,400 --> 00:03:56,770 A framework is also typically a way of doing things. 72 00:03:56,770 --> 00:03:58,760 So a framework includes some code that you 73 00:03:58,760 --> 00:04:01,910 should integrate into your own projects, whether it's CSS, JavaScript, 74 00:04:01,910 --> 00:04:05,000 or something more, but it also includes typically 75 00:04:05,000 --> 00:04:08,210 a way of doing things, a way of naming your own files, 76 00:04:08,210 --> 00:04:13,350 a way of formatting your files, a way of building ultimately your application. 77 00:04:13,350 --> 00:04:15,350 And reasonable people, of course, will disagree, 78 00:04:15,350 --> 00:04:19,430 and so you'll find among several of these frameworks different design 79 00:04:19,430 --> 00:04:23,990 paradigms, different design beliefs as to the best way to do things. 
80 00:04:23,990 --> 00:04:26,540 And again different-- reasonable people will disagree, 81 00:04:26,540 --> 00:04:29,900 and so part of the process of choosing these frameworks 82 00:04:29,900 --> 00:04:32,920 really boils down to what resonates most with you 83 00:04:32,920 --> 00:04:35,450 or with the engineers with whom you're working. 84 00:04:35,450 --> 00:04:39,140 And indeed what resonates above all else perhaps 85 00:04:39,140 --> 00:04:41,570 is what one is most familiar with. 86 00:04:41,570 --> 00:04:45,860 In fact, it's often the case that you or engineers you're working with simply 87 00:04:45,860 --> 00:04:49,370 have done a previous project in one of these frameworks but not the others. 88 00:04:49,370 --> 00:04:55,460 And so even if that framework is, for some definition of inferior, inferior, 89 00:04:55,460 --> 00:04:58,760 that might not necessarily be an overriding concern 90 00:04:58,760 --> 00:05:02,240 if you can actually build your MVP or your prototype 91 00:05:02,240 --> 00:05:05,420 faster with that particular framework because you know it already 92 00:05:05,420 --> 00:05:07,580 than if you could build it a little bit better, 93 00:05:07,580 --> 00:05:11,010 quote unquote, in quotes insofar as reasonable people 94 00:05:11,010 --> 00:05:14,310 can disagree as to what's inferior or superior in this world, 95 00:05:14,310 --> 00:05:16,740 than if you were to design it using a completely 96 00:05:16,740 --> 00:05:20,470 new framework for which there's just a non-trivial learning curve for you. 
97 00:05:20,470 --> 00:05:22,746 And so there's, as in the case of data structures, 98 00:05:22,746 --> 00:05:25,620 as in the case of algorithms, as in the case of computer science more 99 00:05:25,620 --> 00:05:29,820 generally, there's these tradeoffs, and human time, developer time, 100 00:05:29,820 --> 00:05:32,850 learning time is certainly one of the resources 101 00:05:32,850 --> 00:05:37,170 that you have to decide how much of which you want to spend upfront. 102 00:05:37,170 --> 00:05:39,420 Meanwhile, in the world of CSS, there are also 103 00:05:39,420 --> 00:05:44,100 libraries there, collections of CSS files and frameworks 104 00:05:44,100 --> 00:05:47,910 really, methodologies for which you lay-- via which you lay out your site, 105 00:05:47,910 --> 00:05:51,760 like Bootstrap, Foundation, Semantic UI, and more, 106 00:05:51,760 --> 00:05:56,220 and these focus more so on the aesthetics of a user's experience, 107 00:05:56,220 --> 00:05:58,980 more so on the presentation of information 108 00:05:58,980 --> 00:06:03,780 and the types of user interface mechanisms, the buttons, the menus, 109 00:06:03,780 --> 00:06:06,450 the windows, and the like that a user might see on the screen. 110 00:06:06,450 --> 00:06:10,890 But here, too, there are so many different wheels 111 00:06:10,890 --> 00:06:12,420 that have been invented in the past. 112 00:06:12,420 --> 00:06:14,850 So many different people have decided, you know what? 113 00:06:14,850 --> 00:06:17,550 That default link on a web page could look much prettier. 114 00:06:17,550 --> 00:06:21,150 Or that button on a web page could look much better if you used my design. 115 00:06:21,150 --> 00:06:22,530 And so this is what's happened. 
116 00:06:22,530 --> 00:06:27,630 The world has created and shared with others in the world various files 117 00:06:27,630 --> 00:06:32,430 that you, either in the context of JavaScript or in CSS or beyond, 118 00:06:32,430 --> 00:06:35,320 can integrate into your own projects. 119 00:06:35,320 --> 00:06:37,640 So how to even begin to vet these kinds of things, 120 00:06:37,640 --> 00:06:39,390 particularly since in a class like this we 121 00:06:39,390 --> 00:06:41,760 won't go into the weeds of evaluating these 122 00:06:41,760 --> 00:06:45,130 and even then we might not reach any sort of consensus. 123 00:06:45,130 --> 00:06:47,780 So the reality is typically relying on the engineers 124 00:06:47,780 --> 00:06:50,530 with whom you're working is first and foremost the place to start. 125 00:06:50,530 --> 00:06:51,363 What do people know? 126 00:06:51,363 --> 00:06:52,690 What are they comfortable with? 127 00:06:52,690 --> 00:06:53,520 What did they like? 128 00:06:53,520 --> 00:06:55,200 What did they dislike about some framework? 129 00:06:55,200 --> 00:06:57,060 Did it actually speed up the work? 130 00:06:57,060 --> 00:06:58,680 Did it slow down the work? 131 00:06:58,680 --> 00:07:03,940 Did it create-- did it build up technical debt for them so to speak? 132 00:07:03,940 --> 00:07:08,100 For instance, just because something is easy and quick to get started 133 00:07:08,100 --> 00:07:13,440 with from the get go, is it so easy because it's riddled with poor design 134 00:07:13,440 --> 00:07:16,470 decisions such that as you get more and more users, 135 00:07:16,470 --> 00:07:20,430 maybe your application or your website's going to be slower and slower? 136 00:07:20,430 --> 00:07:23,010 Or maybe it's going to become harder to maintain, 137 00:07:23,010 --> 00:07:26,040 or it's going to be harder to onboard new people altogether. 138 00:07:26,040 --> 00:07:27,660 There are various tradeoffs there. 
139 00:07:27,660 --> 00:07:29,776 And so considering what is optimal now, what 140 00:07:29,776 --> 00:07:32,400 is optimal in the medium term, what is optimal in the long term 141 00:07:32,400 --> 00:07:35,780 should perhaps be part of that whole conversation. 142 00:07:35,780 --> 00:07:39,510 Meanwhile, it's certainly a compelling thing from professional development 143 00:07:39,510 --> 00:07:42,760 perspective, for keeping things fresh, to actually go and learn something new. 144 00:07:42,760 --> 00:07:45,316 And so certainly punctuating one's experience in tech, 145 00:07:45,316 --> 00:07:47,940 should there be an opportunity to both pick up some new skills, 146 00:07:47,940 --> 00:07:53,040 to familiarize oneself with the latest and greatest and not necessarily change 147 00:07:53,040 --> 00:07:55,560 direction with each and every fad but generally 148 00:07:55,560 --> 00:07:59,050 be familiar with some of the trends in industry. 149 00:07:59,050 --> 00:08:01,200 And there's a bunch of ways with which to do that. 150 00:08:01,200 --> 00:08:03,900 I mean one, certainly relying on Google and other search engines 151 00:08:03,900 --> 00:08:06,510 just to get a sense of what the most popular hits are 152 00:08:06,510 --> 00:08:10,230 or search results when you search for something like popular JavaScript 153 00:08:10,230 --> 00:08:13,230 framework or some such search string like that. 154 00:08:13,230 --> 00:08:17,837 Looking on websites like Hacker News from Y Combinator, where there's 155 00:08:17,837 --> 00:08:20,170 an active community of folks from the startup community, 156 00:08:20,170 --> 00:08:23,130 especially talking about these kinds of technical decisions 157 00:08:23,130 --> 00:08:25,130 and design decisions more generally. 158 00:08:25,130 --> 00:08:28,440 Websites like Quora or other Q&A websites. 
159 00:08:28,440 --> 00:08:32,220 Looking at GitHub.com, a popular web site where people store their code 160 00:08:32,220 --> 00:08:35,370 and can actually follow or star other people's 161 00:08:35,370 --> 00:08:39,630 repositories of code from which you can infer a sense of popularity 162 00:08:39,630 --> 00:08:43,740 based on how many people are following a framework x or y or z. 163 00:08:43,740 --> 00:08:46,560 But this is always a moving target, and so it's simply 164 00:08:46,560 --> 00:08:49,350 part of the conversation to have from the get go. 165 00:08:49,350 --> 00:08:51,870 And you're not necessarily going to regret a decision 166 00:08:51,870 --> 00:08:54,600 if you don't necessarily pick the most trendy 167 00:08:54,600 --> 00:08:59,250 or the one that's poised to take over all others 168 00:08:59,250 --> 00:09:00,840 because this is a fast changing world. 169 00:09:00,840 --> 00:09:03,330 And, in fact, one of the most frustrating if not expensive 170 00:09:03,330 --> 00:09:06,510 aspects of this world is just how quickly it changes. 171 00:09:06,510 --> 00:09:09,780 And so what you design today might not be what you design tomorrow, 172 00:09:09,780 --> 00:09:12,940 but that's also part of the excitement of this space. 173 00:09:12,940 --> 00:09:18,060 So with that said, that's just a glance at what the front end design process 174 00:09:18,060 --> 00:09:19,740 or decision process might be like. 175 00:09:19,740 --> 00:09:23,520 Let's take a look now at the back end, at least in the context of languages. 
176 00:09:23,520 --> 00:09:26,880 So here you have an even longer list because at least in the front end 177 00:09:26,880 --> 00:09:29,580 world, recall that the de facto standard is 178 00:09:29,580 --> 00:09:33,570 to use JavaScript in the user facing web browser experience, 179 00:09:33,570 --> 00:09:38,490 but on the back end on the servers from which the HTML and the CSS and even 180 00:09:38,490 --> 00:09:42,900 the JavaScript are ultimately coming, you have so many more design decisions. 181 00:09:42,900 --> 00:09:47,040 So you have languages like Go and Java, JavaScript, .NET, PHP, Python, Ruby, 182 00:09:47,040 --> 00:09:48,850 Scala, and so many others. 183 00:09:48,850 --> 00:09:52,920 These are perhaps just a few of the most popular these days. 184 00:09:52,920 --> 00:09:55,020 And all of them have their pluses and minuses. 185 00:09:55,020 --> 00:09:57,810 All of them have their supporters and detractors. 186 00:09:57,810 --> 00:10:00,510 And all of them have folks who already know them 187 00:10:00,510 --> 00:10:04,500 or who might have to learn them among engineers with whom you might work. 188 00:10:04,500 --> 00:10:08,810 Meanwhile, though, those languages out of the box, so to speak, 189 00:10:08,810 --> 00:10:13,796 don't necessarily make designing a web-based application easy or as easy 190 00:10:13,796 --> 00:10:14,420 as it could be. 191 00:10:14,420 --> 00:10:18,290 They don't necessarily make building a mobile application as easy as it could 192 00:10:18,290 --> 00:10:21,320 be, or even if it is relatively easy, humans 193 00:10:21,320 --> 00:10:25,370 have found over time that, gosh, every time I build a mobile application, 194 00:10:25,370 --> 00:10:28,850 I'm like copying and pasting dozens or hundreds of lines of code 195 00:10:28,850 --> 00:10:31,460 because they all share a common framework or maybe 196 00:10:31,460 --> 00:10:35,120 a common menu system or a common set of functionality. 
197 00:10:35,120 --> 00:10:39,260 And so in this world, too, have libraries of reusable code built 198 00:10:39,260 --> 00:10:43,700 up, and frameworks, libraries of code and methodologies via which you're 199 00:10:43,700 --> 00:10:46,430 building your applications, arisen. 200 00:10:46,430 --> 00:10:51,540 Among them Django, Flask, Laravel, .NET, Node.js, Rails, and the like, 201 00:10:51,540 --> 00:10:54,710 .NET being up there, too, because it generally refers to a set of languages 202 00:10:54,710 --> 00:10:59,880 you might use as well as the framework that oversees those various languages. 203 00:10:59,880 --> 00:11:03,080 And there's even more options ahead of you here. 204 00:11:03,080 --> 00:11:07,970 So how do you begin to pick among these options as well? 205 00:11:07,970 --> 00:11:12,680 Well, here, too, you're often guided by what your engineering team knows, 206 00:11:12,680 --> 00:11:15,800 perhaps what your own system administrators 207 00:11:15,800 --> 00:11:18,560 or your operational people know, so the folks who are actually 208 00:11:18,560 --> 00:11:21,050 maintaining the servers, whether they're locally on site, 209 00:11:21,050 --> 00:11:23,711 maybe they are the ones running things in the cloud, 210 00:11:23,711 --> 00:11:26,960 whether that's Amazon's or Google's or Microsoft's cloud or some other company 211 00:11:26,960 --> 00:11:30,440 still, depending on what that cloud infrastructure supports, 212 00:11:30,440 --> 00:11:34,820 might influence your decision making as to what language you might use. 213 00:11:34,820 --> 00:11:37,400 Sometimes the nature of your application might 214 00:11:37,400 --> 00:11:39,260 influence the language you might use. 
215 00:11:39,260 --> 00:11:42,500 For instance, some of these languages make it a little bit easier 216 00:11:42,500 --> 00:11:45,950 to make real-time applications, applications 217 00:11:45,950 --> 00:11:50,090 that support chat servers or immediate interactivity, 218 00:11:50,090 --> 00:11:53,330 where there's a constant connection or the illusion of a constant connection 219 00:11:53,330 --> 00:11:54,890 between browser and server. 220 00:11:54,890 --> 00:11:57,170 PHP doesn't really make that all that easy. 221 00:11:57,170 --> 00:12:00,170 You can do it, but it wasn't really designed with that use case in mind. 222 00:12:00,170 --> 00:12:04,040 By contrast, JavaScript, via a framework called Node.js, 223 00:12:04,040 --> 00:12:07,370 makes it really easy to do, and it was designed more so 224 00:12:07,370 --> 00:12:08,870 with that kind of use case in mind. 225 00:12:08,870 --> 00:12:13,100 And so here, too, you see hints of why some of these languages and in turn 226 00:12:13,100 --> 00:12:18,590 frameworks have arisen because they are actual solutions to concrete problems 227 00:12:18,590 --> 00:12:20,677 people have experienced in the past. 228 00:12:20,677 --> 00:12:22,760 And some of these languages are newer than others, 229 00:12:22,760 --> 00:12:25,140 and so they might come with more features 230 00:12:25,140 --> 00:12:27,980 so you don't have to rely as much on third-party libraries, 231 00:12:27,980 --> 00:12:32,420 some of them less vetted or maybe less robust or even less secure than what 232 00:12:32,420 --> 00:12:34,910 comes with the language itself. 233 00:12:34,910 --> 00:12:39,390 And recall, too, that these languages are often constantly evolving, 234 00:12:39,390 --> 00:12:42,710 some more quickly than others, but there are new versions 235 00:12:42,710 --> 00:12:44,210 of these languages coming out. 
236 00:12:44,210 --> 00:12:47,630 And so even within the confines of a given language like Java 237 00:12:47,630 --> 00:12:51,620 might there be new and improved features every year, every couple of years. 238 00:12:51,620 --> 00:12:54,230 And so sometimes actually picking a version 239 00:12:54,230 --> 00:12:59,220 of these languages or frameworks is one of the design decisions to bear. 240 00:12:59,220 --> 00:13:02,780 And I would say from a non-technical perspective, most compelling is just 241 00:13:02,780 --> 00:13:05,930 to be aware of these kinds of technologies, 242 00:13:05,930 --> 00:13:11,120 these kinds of buzzwords du jour, but also aware of these kinds of tradeoffs. 243 00:13:11,120 --> 00:13:13,010 It's not necessary to get into the weeds I 244 00:13:13,010 --> 00:13:16,010 think of understanding each and every language and what it's good for. 245 00:13:16,010 --> 00:13:18,470 Although once you have a general understanding 246 00:13:18,470 --> 00:13:21,920 of this world, of programming, of server side architecture, 247 00:13:21,920 --> 00:13:26,870 of HTTP, and web pages and the like, can you via Google and other websites 248 00:13:26,870 --> 00:13:29,480 I think start to wrap your mind around some of the tradeoffs 249 00:13:29,480 --> 00:13:32,210 and perhaps even start to tease apart which 250 00:13:32,210 --> 00:13:35,600 are technically compelling arguments that you might see online versus just 251 00:13:35,600 --> 00:13:38,990 religious objections to this language or that because that's often 252 00:13:38,990 --> 00:13:41,900 the case when folks get into heated discussions of language 253 00:13:41,900 --> 00:13:43,940 choices for instance. 
254 00:13:43,940 --> 00:13:47,780 But I think ultimately understanding the tradeoffs, the onboarding time 255 00:13:47,780 --> 00:13:52,130 or the learning curve for various languages, the appropriateness 256 00:13:52,130 --> 00:13:54,350 of language for certain specific use cases 257 00:13:54,350 --> 00:13:59,180 like the real-time chat applications or whatever your own product happens 258 00:13:59,180 --> 00:14:02,400 to be, what your engineers already know, what they're good at, 259 00:14:02,400 --> 00:14:05,780 what they prefer to use, what language and framework is easiest 260 00:14:05,780 --> 00:14:10,490 for new hires, maybe six months hence or two years hence, to actually come on board 261 00:14:10,490 --> 00:14:14,540 and understand so that you're not expecting the most experienced 262 00:14:14,540 --> 00:14:16,924 of new hires to constantly be in your pipeline. 263 00:14:16,924 --> 00:14:20,090 So appreciating these kinds of tradeoffs and asking these kinds of questions 264 00:14:20,090 --> 00:14:23,600 even among the engineers that are perhaps making the decision ultimately 265 00:14:23,600 --> 00:14:26,060 is a valuable way to contribute to the conversation 266 00:14:26,060 --> 00:14:29,390 and take some comfort in the fact that your product need not 267 00:14:29,390 --> 00:14:31,310 be a complete black box. 268 00:14:31,310 --> 00:14:34,580 You might not necessarily be able to implement it from scratch yourself, 269 00:14:34,580 --> 00:14:36,620 but you can at least ask the right questions 270 00:14:36,620 --> 00:14:40,460 and be a sounding board for some of the answers that come back. 271 00:14:40,460 --> 00:14:43,760 Now, there are some fundamental differences 272 00:14:43,760 --> 00:14:47,570 in some of these architectural decisions, among which 273 00:14:47,570 --> 00:14:49,490 is the choice of database. 
274 00:14:49,490 --> 00:14:54,860 Indeed, most any web application today has a back end database inside 275 00:14:54,860 --> 00:14:59,030 of which is stored data from users, whether it's purchase orders or user 276 00:14:59,030 --> 00:15:01,487 registrations and passwords and any amount of data that's 277 00:15:01,487 --> 00:15:04,070 collected from users at the end of the day is stored somewhere 278 00:15:04,070 --> 00:15:06,260 and that somewhere is called a database. 279 00:15:06,260 --> 00:15:08,590 But there's different types of databases. 280 00:15:08,590 --> 00:15:14,930 Two of the biggest categories these days are perhaps SQL and the opposite, NoSQL, 281 00:15:14,930 --> 00:15:19,120 as it's playfully called, SQL being structured query language and NoSQL 282 00:15:19,120 --> 00:15:21,640 referring to a class of databases that doesn't support SQL 283 00:15:21,640 --> 00:15:23,920 and indeed is not generally relational. 284 00:15:23,920 --> 00:15:25,660 And we'll soon see what that means. 285 00:15:25,660 --> 00:15:29,020 But even within this world, do you have a veritable menu of options: 286 00:15:29,020 --> 00:15:32,950 MariaDB, MySQL, Oracle, PostgreSQL, SQL Server. 287 00:15:32,950 --> 00:15:34,825 And then within those-- 288 00:15:34,825 --> 00:15:37,867 within that relational world do you also have the cont-- 289 00:15:37,867 --> 00:15:39,700 in addition to that relational world, do you 290 00:15:39,700 --> 00:15:43,270 have the contrast of the object-oriented or document store 291 00:15:43,270 --> 00:15:47,770 world, things like Bigtable, Cassandra, HBase, MongoDB, and others. 292 00:15:47,770 --> 00:15:50,500 And already it can be sort of overwhelming to feel like just 293 00:15:50,500 --> 00:15:54,010 as you're getting up to speed on what the web is and how web pages work, 294 00:15:54,010 --> 00:15:55,240 oh my god! 295 00:15:55,240 --> 00:15:57,900 We're just beginning to make our decisions. 
296 00:15:57,900 --> 00:15:59,650 But generally these decisions, too, can be 297 00:15:59,650 --> 00:16:02,192 guided by what your team knows, what you're comfortable with, 298 00:16:02,192 --> 00:16:03,941 what the price might be for some of these. 299 00:16:03,941 --> 00:16:05,800 And some of these are free and open source. 300 00:16:05,800 --> 00:16:08,600 Some of them have commercial licenses associated with them. 301 00:16:08,600 --> 00:16:12,670 Some of them are supported easily for you with your cloud provider, 302 00:16:12,670 --> 00:16:15,920 wherever you're hosting your servers or you might have to host them yourself. 303 00:16:15,920 --> 00:16:18,619 So you can begin to narrow the field of options. 304 00:16:18,619 --> 00:16:21,160 And indeed, especially when building multiple products, might 305 00:16:21,160 --> 00:16:23,410 you build on your own past experience. 306 00:16:23,410 --> 00:16:26,800 So, for instance, for the course and all of our web-based applications, 307 00:16:26,800 --> 00:16:30,160 we tend to use a lot of the same technologies 308 00:16:30,160 --> 00:16:33,610 and only recently have we begun to transition from one main language 309 00:16:33,610 --> 00:16:36,430 to another but doing it pretty much for all of our applications 310 00:16:36,430 --> 00:16:41,590 across the board so that we don't have to worry about some of the team members 311 00:16:41,590 --> 00:16:42,980 knowing x and y and z. 312 00:16:42,980 --> 00:16:46,450 It's just there's an economy of scale to focusing on relatively 313 00:16:46,450 --> 00:16:48,700 fewer technologies internally. 314 00:16:48,700 --> 00:16:53,920 But let's dive in a little deeper into SQL and NoSQL if only because they're 315 00:16:53,920 --> 00:16:57,740 so cleanly bucketized into SQL and not-SQL really. 316 00:16:57,740 --> 00:17:00,700 What do we mean by this, and what does a database really do? 
317 00:17:00,700 --> 00:17:05,589 So a database typically supports these-- at least these four operations 318 00:17:05,589 --> 00:17:07,990 or categories of operations playfully called 319 00:17:07,990 --> 00:17:14,400 CRUD, which stands for create, read, update, and delete, 320 00:17:14,400 --> 00:17:17,200 though you might see some variations on what the actual words are. 321 00:17:17,200 --> 00:17:21,099 But CRUD refers to those four fundamental operations. 322 00:17:21,099 --> 00:17:25,599 Now, in the world of SQL, or S-Q-L, structured query language, 323 00:17:25,599 --> 00:17:27,970 which itself is a programming language, and it's 324 00:17:27,970 --> 00:17:33,310 a programming language you use to query a database, to add data to a database, 325 00:17:33,310 --> 00:17:35,740 remove it, edit it, and the like. 326 00:17:35,740 --> 00:17:39,190 Within the world of SQL, there are-- 327 00:17:39,190 --> 00:17:41,620 is a direct mapping of these four operations 328 00:17:41,620 --> 00:17:45,160 to four keywords, or features of the SQL language, 329 00:17:45,160 --> 00:17:48,280 namely create, select, update, and delete. 330 00:17:48,280 --> 00:17:51,250 So it's almost CRUD, but it doesn't quite line up perfectly. 331 00:17:51,250 --> 00:17:55,720 So create, read, update, delete is the general notion of the four operations 332 00:17:55,720 --> 00:17:59,320 a database might support, and in the world of SQL, which we're about to dive into, 333 00:17:59,320 --> 00:18:04,120 might you see these four commands specifically create, select, update, 334 00:18:04,120 --> 00:18:05,290 and delete. 335 00:18:05,290 --> 00:18:08,410 So what does it mean to be a SQL database, or more 336 00:18:08,410 --> 00:18:11,110 generally, what does it mean to be a relational database? 
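The four CRUD operations just described can be sketched in a few lines of Python using the standard library's sqlite3 module. This is only an illustration under assumptions: the in-memory database, the `users` table, and its columns are hypothetical and do not come from the lecture. One detail worth noting: in standard SQL, new rows are added with INSERT, while CREATE is used to define things like the table itself, so the sketch uses both.

```python
import sqlite3

# Hypothetical in-memory database and table, for illustration only.
conn = sqlite3.connect(":memory:")

# CREATE defines the table (its columns), not the rows inside it.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# "Create" in CRUD: INSERT adds a new row.
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("Alice", "alice@example.com"),
)

# "Read" in CRUD: SELECT retrieves rows.
rows_after_insert = conn.execute("SELECT name, email FROM users").fetchall()

# "Update" in CRUD: UPDATE changes existing rows.
conn.execute(
    "UPDATE users SET email = ? WHERE name = ?",
    ("alice@example.org", "Alice"),
)
rows_after_update = conn.execute("SELECT name, email FROM users").fetchall()

# "Delete" in CRUD: DELETE removes rows.
conn.execute("DELETE FROM users WHERE name = ?", ("Alice",))
rows_after_delete = conn.execute("SELECT name, email FROM users").fetchall()

conn.close()
```

The `?` placeholders let the database library fill in the values, rather than pasting them into the SQL text by hand.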
337 00:18:11,110 --> 00:18:14,350 Because a relational database is often historically 338 00:18:14,350 --> 00:18:17,680 what people think of when they think of databases and only in recent years 339 00:18:17,680 --> 00:18:20,980 has this NoSQL trend been catching up that changes the paradigm. 340 00:18:20,980 --> 00:18:24,230 And we'll look at the flip side in just a moment. 341 00:18:24,230 --> 00:18:27,310 So if you've ever seen this, whether it's this version of Excel 342 00:18:27,310 --> 00:18:31,990 or some equivalent version of Numbers or Google Spreadsheets or the like, 343 00:18:31,990 --> 00:18:35,990 this is kind of a relational database. 344 00:18:35,990 --> 00:18:39,160 It is a piece of software that allows you to lay out 345 00:18:39,160 --> 00:18:41,650 your data in rows and columns. 346 00:18:41,650 --> 00:18:45,850 And among those rows and columns are there typically relationships. 347 00:18:45,850 --> 00:18:51,010 Consider after all the last time you used a spreadsheet, if ever, odds 348 00:18:51,010 --> 00:18:54,040 are there was some kind of meaning if you put data 349 00:18:54,040 --> 00:18:59,150 in column A versus B versus C versus D. In other words, 350 00:18:59,150 --> 00:19:02,500 when adding data to a spreadsheet, typically if you're using it correctly, 351 00:19:02,500 --> 00:19:05,740 you don't just start plopping your data in any random box that doesn't yet 352 00:19:05,740 --> 00:19:08,290 have a number or a word in it. 353 00:19:08,290 --> 00:19:11,500 You generally organize the data such that in column A 354 00:19:11,500 --> 00:19:13,570 might be one type of data, column B might 355 00:19:13,570 --> 00:19:16,340 be another type of data, and so forth. 356 00:19:16,340 --> 00:19:20,890 And so it's relational in the sense that the numbers and the data within 357 00:19:20,890 --> 00:19:22,090 relate to one another. 
358 00:19:22,090 --> 00:19:25,720 And it's also relational in the sense that you can have multiple sheets 359 00:19:25,720 --> 00:19:26,700 even within a file. 360 00:19:26,700 --> 00:19:30,250 So with Excel and Numbers and Google Sheets, you get just one sheet 361 00:19:30,250 --> 00:19:33,190 or worksheet by default. But if you ever want 362 00:19:33,190 --> 00:19:36,400 to have multiple types of data but all in the same file 363 00:19:36,400 --> 00:19:39,250 just because it's kind of nice and orderly to keep it all together, 364 00:19:39,250 --> 00:19:42,310 you can click the plus and create a new sheet 365 00:19:42,310 --> 00:19:45,610 and have a completely different tabular structure, a different number of rows 366 00:19:45,610 --> 00:19:48,160 and columns and different meanings for those columns 367 00:19:48,160 --> 00:19:51,520 but somehow the data is all related, ergo 368 00:19:51,520 --> 00:19:53,390 this notion of a relational database. 369 00:19:53,390 --> 00:19:56,800 So a relational database stores data in tables, 370 00:19:56,800 --> 00:20:01,670 and a table is in turn a set of rows and columns. 371 00:20:01,670 --> 00:20:04,700 So why does this actually matter? 372 00:20:04,700 --> 00:20:09,880 Well, Excel is not all that powerful when it comes to large datasets. 373 00:20:09,880 --> 00:20:13,540 In fact, it wasn't all that long ago that Excel I believe only 374 00:20:13,540 --> 00:20:22,250 supported as many as 65,536 or 35 rows probably, 375 00:20:22,250 --> 00:20:24,000 and that's because, long story short, they 376 00:20:24,000 --> 00:20:28,750 used the 16-bit integer, the biggest number for which is 65,535, 377 00:20:28,750 --> 00:20:35,650 and so Excel physically couldn't count as high as 65,536 or 7 or 8 378 00:20:35,650 --> 00:20:38,530 because they just didn't use enough storage underneath the hood.
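The arithmetic behind that 16-bit ceiling can be checked in a couple of lines. This is only a sketch of the counting argument, assuming an unsigned 16-bit counter; it says nothing about how Excel was actually implemented.

```python
BITS = 16

# 16 bits can take on 2**16 different patterns.
distinct_values = 2 ** BITS

# Counting from 0, the largest value is one less than the number of patterns.
largest_unsigned = distinct_values - 1

print(distinct_values, largest_unsigned)
```

So a 16-bit counter has 65,536 distinct values, the largest of which is 65,535, which is why both numbers come up.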
379 00:20:38,530 --> 00:20:42,220 But even if you had that much data, and that's quite a lot of rows, 380 00:20:42,220 --> 00:20:44,110 the software just tended to be super slow 381 00:20:44,110 --> 00:20:46,002 at least in my own experience back in the day 382 00:20:46,002 --> 00:20:47,710 trying to manipulate very large datasets. 383 00:20:47,710 --> 00:20:51,700 And Excel just wasn't designed for tens of thousands of rows. 384 00:20:51,700 --> 00:20:53,740 By the time you're at that much data, you 385 00:20:53,740 --> 00:20:58,240 should really be graduating, so to speak, to an actual relational database 386 00:20:58,240 --> 00:21:03,220 management system, a server-driven database that actually leverages not 387 00:21:03,220 --> 00:21:06,820 just files but memory more effectively. 388 00:21:06,820 --> 00:21:09,070 In fact, what-- among the features you get 389 00:21:09,070 --> 00:21:13,810 from products like MariaDB and MySQL and Oracle and Postgres and the like 390 00:21:13,810 --> 00:21:17,950 is you get really smart people who have implemented the software in such a way 391 00:21:17,950 --> 00:21:22,750 that it makes your creates and your reads and your updates and your deletes 392 00:21:22,750 --> 00:21:26,770 faster than they might be if you were just storing all of your data 393 00:21:26,770 --> 00:21:27,910 in a big file. 394 00:21:27,910 --> 00:21:30,790 For instance, in a big file like Excel, if you 395 00:21:30,790 --> 00:21:33,880 want to search for some information, you can hit Command F or Control F 396 00:21:33,880 --> 00:21:38,230 or whatever, type in a keyword, and then Excel or Numbers or Sheets 397 00:21:38,230 --> 00:21:39,520 will search for it. 398 00:21:39,520 --> 00:21:43,600 But generally these spreadsheet programs are going to search for your data 399 00:21:43,600 --> 00:21:46,100 pretty much by a brute force, search top to bottom, 400 00:21:46,100 --> 00:21:48,782 left to right, looking in every darn cell for that data. 
401 00:21:48,782 --> 00:21:50,990 That's fine if you've got a pretty small spreadsheet. 402 00:21:50,990 --> 00:21:53,230 We slow humans aren't going to notice the difference. 403 00:21:53,230 --> 00:21:56,710 But in the context of really big datasets, tens of thousands 404 00:21:56,710 --> 00:21:59,500 of rows, let alone millions or billions, it 405 00:21:59,500 --> 00:22:01,660 does not suffice to look at every piece of data 406 00:22:01,660 --> 00:22:07,150 when looking for a certain phrase or a certain number or some such value. 407 00:22:07,150 --> 00:22:13,030 You want the database itself to do some anticipatory optimization, sort 408 00:22:13,030 --> 00:22:16,600 of working its magic underneath the hood using various algorithms and data 409 00:22:16,600 --> 00:22:19,570 structures, so as to optimize those queries 410 00:22:19,570 --> 00:22:22,360 and to give me answers in logarithmic time, 411 00:22:22,360 --> 00:22:28,070 not linear time, or time that's faster than searching the whole darn thing. 412 00:22:28,070 --> 00:22:31,540 So at some point, spreadsheet software does not cut it, 413 00:22:31,540 --> 00:22:35,020 and you transition to a more proper relational database. 414 00:22:35,020 --> 00:22:38,050 But moreover at that point, you have to start 415 00:22:38,050 --> 00:22:41,350 deciding how you want the database to store your data. 416 00:22:41,350 --> 00:22:43,750 Because at the end of the day, we humans generally 417 00:22:43,750 --> 00:22:48,550 know a little more about the programs we write about the data 418 00:22:48,550 --> 00:22:50,530 we're going to be storing. 
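To get a feel for linear versus logarithmic time, here is a small sketch that counts how many values each approach inspects. Binary search over a sorted list stands in for what a database does; real databases typically rely on index structures such as B-trees, so treat this as an analogy under that assumption, with made-up function names.

```python
def linear_search_steps(sorted_values, target):
    """Count values inspected when scanning top to bottom."""
    steps = 0
    for value in sorted_values:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_values, target):
    """Count values inspected when halving the search range each time."""
    low, high = 0, len(sorted_values) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            break
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


values = list(range(1_000_000))  # already sorted
target = 999_999                 # worst case for the top-to-bottom scan

linear_steps = linear_search_steps(values, target)
binary_steps = binary_search_steps(values, target)
print(linear_steps, binary_steps)
```

With a million rows, the scan looks at every one of them, while the halving approach needs at most about 20 looks.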
419 00:22:50,530 --> 00:22:54,820 And by this I mean, if I am storing a bunch of data in a database, 420 00:22:54,820 --> 00:22:59,072 I probably know better than the computer might know which of these values 421 00:22:59,072 --> 00:23:01,780 is always going to be like an integer or which of these is always 422 00:23:01,780 --> 00:23:04,330 going to be a dollar amount or which phrase is always 423 00:23:04,330 --> 00:23:07,660 going to look like a time of day or a date or day 424 00:23:07,660 --> 00:23:09,970 of the week or some other such value. 425 00:23:09,970 --> 00:23:13,870 And so we humans can actually help databases help us 426 00:23:13,870 --> 00:23:17,750 be more highly performing by providing them with hints, 427 00:23:17,750 --> 00:23:22,480 otherwise known as data types, that tell the database what type of data to store 428 00:23:22,480 --> 00:23:26,380 and therefore how to store it most efficiently. 429 00:23:26,380 --> 00:23:30,110 Some of those popular data types in the world of SQL then are these, 430 00:23:30,110 --> 00:23:32,020 and let's just take a look at a few of these. 431 00:23:32,020 --> 00:23:33,730 So char and varchar. 432 00:23:33,730 --> 00:23:36,820 So char being shorthand for character, and it's not 433 00:23:36,820 --> 00:23:39,690 a single character like a or b or c. 434 00:23:39,690 --> 00:23:43,510 Character, or char, generally refers to a column 435 00:23:43,510 --> 00:23:48,040 in a database that is going to store one or more 436 00:23:48,040 --> 00:23:51,320 characters a little confusingly, a string, so to speak, 437 00:23:51,320 --> 00:23:54,380 where a string is a sequence of 0 or more characters. 
438 00:23:54,380 --> 00:23:57,340 So when designing a database column that you know 439 00:23:57,340 --> 00:24:01,210 is going to contain a word or a sentence or even a paragraph, 440 00:24:01,210 --> 00:24:03,250 you can tell the database, hey, database, 441 00:24:03,250 --> 00:24:07,690 make this column this many characters wide, i.e. 442 00:24:07,690 --> 00:24:10,000 allocate that much data upfront. 443 00:24:10,000 --> 00:24:12,520 But if you're not sure, as might often be the case-- 444 00:24:12,520 --> 00:24:14,230 maybe someone has a short name. 445 00:24:14,230 --> 00:24:16,300 Maybe someone has a long name. 446 00:24:16,300 --> 00:24:18,970 Maybe someone has a long address or a short address. 447 00:24:18,970 --> 00:24:21,760 If you don't really know what the right length is 448 00:24:21,760 --> 00:24:25,240 for a column for the values a user is going to provide, 449 00:24:25,240 --> 00:24:29,980 you can instead use varchar for variable length character strings, which is 450 00:24:29,980 --> 00:24:32,570 to say you specify only an upper bound. 451 00:24:32,570 --> 00:24:36,680 So I don't know what the longest name is in the whole world. 452 00:24:36,680 --> 00:24:40,660 But my name is D-a-v-i-d, five feels like it's kind of short. 453 00:24:40,660 --> 00:24:43,000 Probably some people with longer names in the world. 454 00:24:43,000 --> 00:24:44,079 20, is that enough? 455 00:24:44,079 --> 00:24:44,620 I don't know? 456 00:24:44,620 --> 00:24:45,340 50? 457 00:24:45,340 --> 00:24:46,280 I don't know, 100? 458 00:24:46,280 --> 00:24:47,222 Probably. 459 00:24:47,222 --> 00:24:49,930 I should probably Google to find out with a bit more reassurance, 460 00:24:49,930 --> 00:24:55,410 but this is a decision that the web designer or the database designer 461 00:24:55,410 --> 00:24:56,512 is going to have to make. 
462 00:24:56,512 --> 00:24:58,720 You can't just tell, and you don't want to just tell, 463 00:24:58,720 --> 00:25:04,309 the database accept any length string because the more flexible 464 00:25:04,309 --> 00:25:07,600 you expect the database to be, the more generous you expect the database to be, 465 00:25:07,600 --> 00:25:10,450 the less optimization it can do for you. 466 00:25:10,450 --> 00:25:14,980 By contrast, the more precise you can be, the more conservative you can be, 467 00:25:14,980 --> 00:25:19,080 the more optimization algorithmically the database can 468 00:25:19,080 --> 00:25:23,010 do so that when you ask for data back, it can give you those answers faster. 469 00:25:23,010 --> 00:25:25,480 When you insert data, it can insert it faster. 470 00:25:25,480 --> 00:25:28,350 So the more helpful we humans can be with our databases, 471 00:25:28,350 --> 00:25:30,160 the more help the database can be in turn, 472 00:25:30,160 --> 00:25:33,368 and that's probably a good thing when we have lots and lots of data and users 473 00:25:33,368 --> 00:25:38,130 because we want the common case to be highly performing. 474 00:25:38,130 --> 00:25:41,790 It might cost me a minute, five minutes upfront to really noodle on the problem 475 00:25:41,790 --> 00:25:43,560 and figure out what the best design is. 476 00:25:43,560 --> 00:25:46,830 But that cost is going to be amortized over thousands 477 00:25:46,830 --> 00:25:50,430 of users, millions of users, who are then benefiting thereafter 478 00:25:50,430 --> 00:25:52,680 from a better database design. 479 00:25:52,680 --> 00:25:54,320 So where is the line to be drawn? 480 00:25:54,320 --> 00:25:58,260 We'll explore this in the context of an example, but it kind of depends. 481 00:25:58,260 --> 00:26:00,900 There is no right answer necessarily. 482 00:26:00,900 --> 00:26:05,250 It really depends on your use case and the data you're trying to store. 
483 00:26:05,250 --> 00:26:08,660 With numbers, too, do you have some discretion. 484 00:26:08,660 --> 00:26:11,542 Integer means what it is, generally a 32-bit value, which 485 00:26:11,542 --> 00:26:13,750 means you can have a number from negative two billion 486 00:26:13,750 --> 00:26:16,070 to positive two billion, give or take. 487 00:26:16,070 --> 00:26:17,210 But that might be overkill. 488 00:26:17,210 --> 00:26:19,460 If you know you're dealing with really small integers, 489 00:26:19,460 --> 00:26:21,500 maybe you don't need 32 bits. 490 00:26:21,500 --> 00:26:25,094 Maybe you want fewer and so you might just say, small int. 491 00:26:25,094 --> 00:26:26,510 Doesn't need to be that many bits. 492 00:26:26,510 --> 00:26:28,534 I know my values aren't going to get that large. 493 00:26:28,534 --> 00:26:30,200 I might as well save the database space. 494 00:26:30,200 --> 00:26:32,210 Or by contrast, wait a minute. 495 00:26:32,210 --> 00:26:36,560 Going to have more than two billion users, success permitting. 496 00:26:36,560 --> 00:26:39,790 I'm going to, therefore, want to use a big int like 64 bits, 497 00:26:39,790 --> 00:26:45,170 so I can have many, many, many, many users or rows in my database. 498 00:26:45,170 --> 00:26:47,930 And indeed some of the most popular websites out there 499 00:26:47,930 --> 00:26:49,142 have run into this issue. 500 00:26:49,142 --> 00:26:51,350 The Facebooks, YouTubes, and the others of the world, 501 00:26:51,350 --> 00:26:53,600 well, they just have so much darn data, they 502 00:26:53,600 --> 00:26:58,180 had better not cap the number of rows in their database table 503 00:26:58,180 --> 00:27:03,060 at only two billion because they might well have that many and more. 504 00:27:03,060 --> 00:27:05,820 Then why not just choose varchar with a really big number? 505 00:27:05,820 --> 00:27:09,600 Why not choose big int with a really big number of bits? 506 00:27:09,600 --> 00:27:12,090 Well, it's wasteful. 
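The "give or take two billion" figure falls out of the bit width. Here is a sketch of the ranges, assuming the common sizes of 16, 32, and 64 bits for small int, integer, and big int, and assuming signed two's-complement storage; exact names and sizes can vary from one database to another.

```python
def signed_range(bits):
    """Return (min, max) for a signed two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


smallint_range = signed_range(16)
integer_range = signed_range(32)
bigint_range = signed_range(64)

for name, (low, high) in [
    ("smallint", smallint_range),
    ("integer", integer_range),
    ("bigint", bigint_range),
]:
    print(f"{name}: {low:,} to {high:,}")
```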
507 00:27:12,090 --> 00:27:15,660 You shouldn't over-allocate because then you're just spending more space, 508 00:27:15,660 --> 00:27:18,210 and space costs money and might even cost time 509 00:27:18,210 --> 00:27:20,560 to search if there's more bits to be looked at. 510 00:27:20,560 --> 00:27:24,600 And so you don't necessarily want to just cop out and say, use as much space 511 00:27:24,600 --> 00:27:29,340 as you want or as is necessary because, again, we can't be as helpful therefore 512 00:27:29,340 --> 00:27:31,210 to the database. 513 00:27:31,210 --> 00:27:35,400 Now, numbers are an interesting one, and this is true in programming languages 514 00:27:35,400 --> 00:27:39,630 whether it's SQL or C or C++ or yet others. 515 00:27:39,630 --> 00:27:44,460 It turns out that choosing how big your data is, or anticipating it, 516 00:27:44,460 --> 00:27:47,730 has some real impact in some cases of numbers. 517 00:27:47,730 --> 00:27:54,900 So it turns out that an integer of course is just a number like negative 1, 0, 1, 518 00:27:54,900 --> 00:27:57,100 and on up in both directions. 519 00:27:57,100 --> 00:28:00,690 But a floating point value or float is a real number, 520 00:28:00,690 --> 00:28:03,840 a number that's not necessarily an integer, but a real number that 521 00:28:03,840 --> 00:28:07,500 has a decimal point and some number of digits after that decimal point 522 00:28:07,500 --> 00:28:11,190 that may or may not be representable precisely as a fraction. 523 00:28:11,190 --> 00:28:14,670 So that's a real number or a float. 524 00:28:14,670 --> 00:28:16,637 If you want more bits or precision than that, 525 00:28:16,637 --> 00:28:19,470 you can actually specify double precision, which gives you more bits 526 00:28:19,470 --> 00:28:22,870 and therefore you can have even more digits after the decimal point.
527 00:28:22,870 --> 00:28:26,020 But the key takeaway here is that at the end of the day, 528 00:28:26,020 --> 00:28:29,050 it's going to be finite if you're representing a number. 529 00:28:29,050 --> 00:28:32,940 And so if you do use something like a float, even a double precision float, 530 00:28:32,940 --> 00:28:35,340 which gives you more bits of precision, 531 00:28:35,340 --> 00:28:38,700 at the end of the day, last I recall from grade school, 532 00:28:38,700 --> 00:28:43,050 there is an infinite number of numbers in the world, both integers 533 00:28:43,050 --> 00:28:44,620 and real numbers for that matter. 534 00:28:44,620 --> 00:28:48,960 So in both the case of these integer-based numbers and these floating point 535 00:28:48,960 --> 00:28:52,530 values, you can only count so high, or you can only 536 00:28:52,530 --> 00:28:54,900 specify a number so precisely. 537 00:28:54,900 --> 00:28:58,020 And at the end of the day, you might have some overflow 538 00:28:58,020 --> 00:29:01,860 where you just can't represent bigger numbers, whether positive or negative, 539 00:29:01,860 --> 00:29:05,040 or you just can't represent enough decimal points-- 540 00:29:05,040 --> 00:29:09,330 enough numbers after the decimal point to represent a number 541 00:29:09,330 --> 00:29:11,980 perfectly accurately. 542 00:29:11,980 --> 00:29:13,290 And so there's this tradeoff. 543 00:29:13,290 --> 00:29:16,300 You might want more and more space, but at some point, 544 00:29:16,300 --> 00:29:18,599 you can't have an infinite amount of space. 545 00:29:18,599 --> 00:29:19,890 Computers are physical devices. 546 00:29:19,890 --> 00:29:23,040 They only have a physical amount of memory inside of them. 547 00:29:23,040 --> 00:29:24,580 You might have to draw a line.
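That imprecision is easy to see firsthand. The sketch below uses Python's built-in float, which is a double precision value, as a stand-in for a database's floating point column; the variable names are just for illustration.

```python
# 0.1 and 0.2 have no exact binary representation, so their sum is slightly off.
total = 0.1 + 0.2
is_exact = (total == 0.3)

# Adding ten dimes one at a time does not land exactly on one dollar.
running = 0.0
for _ in range(10):
    running += 0.1

print(total, is_exact, running)
```

The errors are tiny, but they are real, and they accumulate the more arithmetic you do.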
548 00:29:24,580 --> 00:29:27,840 And so if you've ever seen some older movies like Superman 3, 549 00:29:27,840 --> 00:29:31,260 which has a great incarnation of this or somewhat more recently, 550 00:29:31,260 --> 00:29:35,580 Office Space, there's a money-making scam whereby the companies in question, 551 00:29:35,580 --> 00:29:41,182 long story short, were constantly manipulating monetary amounts 552 00:29:41,182 --> 00:29:43,140 in their database systems, but they were always 553 00:29:43,140 --> 00:29:45,090 rounding off fractions of pennies. 554 00:29:45,090 --> 00:29:48,540 And so the masterminds in both movies started 555 00:29:48,540 --> 00:29:51,040 pocketing all of those fractions of pennies, 556 00:29:51,040 --> 00:29:53,610 but hilariousness ensues when they don't quite 557 00:29:53,610 --> 00:29:56,070 realize how much those fractions of pennies add up. 558 00:29:56,070 --> 00:29:58,275 But that too is an issue of imprecision. 559 00:29:58,275 --> 00:30:02,610 We in the human world generally, when going to stores and such, 560 00:30:02,610 --> 00:30:05,650 use only two decimal points of precision. 561 00:30:05,650 --> 00:30:07,830 But investment banks and banks more generally 562 00:30:07,830 --> 00:30:10,140 might actually use more decimal point-- more numbers 563 00:30:10,140 --> 00:30:12,030 after the decimal point than that. 564 00:30:12,030 --> 00:30:16,740 And so having the ability of expressing numbers more precisely is compelling. 565 00:30:16,740 --> 00:30:20,460 Thankfully, there does exist Decimal, which 566 00:30:20,460 --> 00:30:24,060 allows you to specify how many numbers maximally you essentially 567 00:30:24,060 --> 00:30:29,250 want before and after the decimal point or the total number in question. 568 00:30:29,250 --> 00:30:31,480 And so that would be an alternative to these others.
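Python happens to have a `decimal` module that behaves in a similar spirit to a Decimal column, so it can serve as a rough sketch of the idea. The interest calculation below is hypothetical, with made-up amounts, and is only there to produce a fraction of a penny like the ones in those movies.

```python
from decimal import Decimal, ROUND_HALF_UP

# Exact decimal arithmetic: two widgets at $9.99 each.
price = Decimal("9.99")
total = price * 2

# A hypothetical calculation that yields more than two decimal places.
interest = Decimal("1234.56") * Decimal("0.0325")

# Round to whole cents, and see what fraction of a penny is left over.
rounded = interest.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
leftover = interest - rounded

print(total, interest, rounded, leftover)
```

Here the leftover is a fraction of a cent, and because the values are decimal rather than floating point, it is tracked exactly instead of being lost to representation error.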
569 00:30:31,480 --> 00:30:34,380 But it might end up then costing you more space just 570 00:30:34,380 --> 00:30:35,800 to get that additional precision. 571 00:30:35,800 --> 00:30:38,010 So here, too, as with the decisions around frameworks 572 00:30:38,010 --> 00:30:40,710 and libraries and languages, here, too, there's a tradeoff. 573 00:30:40,710 --> 00:30:44,310 Even at this lower level, when you really get into it, deciding 574 00:30:44,310 --> 00:30:46,890 how to store your data in a database. 575 00:30:46,890 --> 00:30:51,630 Lastly, and a little more easily, there are data types like Date and Time 576 00:30:51,630 --> 00:30:54,210 and Timestamp, which do as they say. 577 00:30:54,210 --> 00:30:55,230 They look like dates. 578 00:30:55,230 --> 00:30:56,105 They look like times. 579 00:30:56,105 --> 00:30:58,230 They look like timestamps, just some counter 580 00:30:58,230 --> 00:31:00,540 from some preordained moment in time. 581 00:31:00,540 --> 00:31:02,460 And these data types are commonly used as you 582 00:31:02,460 --> 00:31:05,580 might guess to store these types of data in a database. 583 00:31:05,580 --> 00:31:07,137 When did the user last log in? 584 00:31:07,137 --> 00:31:08,970 When did he or she register for the website? 585 00:31:08,970 --> 00:31:12,600 When did he or she buy something from our catalog or the like? 586 00:31:12,600 --> 00:31:15,040 You can represent those and more data types 587 00:31:15,040 --> 00:31:19,500 in a standard relational database that supports SQL. 588 00:31:19,500 --> 00:31:22,470 But you have some other options, too. 589 00:31:22,470 --> 00:31:24,570 It turns out that in a relational database, 590 00:31:24,570 --> 00:31:28,770 you can be even more helpful to the database by telling it in advance 591 00:31:28,770 --> 00:31:31,170 if any of your columns should be considered 592 00:31:31,170 --> 00:31:35,590 a primary key or a foreign key or a unique constraint.
593 00:31:35,590 --> 00:31:36,910 Now, what does this mean? 594 00:31:36,910 --> 00:31:41,400 Well, typically with data, it is useful to be 595 00:31:41,400 --> 00:31:47,984 able to uniquely identify a row in your table in your spreadsheets 596 00:31:47,984 --> 00:31:49,650 without having to look at the whole row. 597 00:31:49,650 --> 00:31:52,680 For instance, when using Excel or Numbers or Google Sheets, 598 00:31:52,680 --> 00:31:54,930 you'll notice that by default, all of the rows 599 00:31:54,930 --> 00:31:57,360 are just numbered 1 through whatever. 600 00:31:57,360 --> 00:32:01,040 That's useful because if you are collaborating with someone or you 601 00:32:01,040 --> 00:32:03,050 yourself are just trying to find some value, 602 00:32:03,050 --> 00:32:08,030 you could just jump ahead to like row 50 to identify the 50th row of your data. 603 00:32:08,030 --> 00:32:10,640 You don't have to look for a specific name 604 00:32:10,640 --> 00:32:13,490 or address or purchase order or whatever it 605 00:32:13,490 --> 00:32:15,440 is that you're storing in this table. 606 00:32:15,440 --> 00:32:17,720 You can just jump to the row number in question. 607 00:32:17,720 --> 00:32:20,540 A relational database very often takes the same approach, 608 00:32:20,540 --> 00:32:23,690 using some piece of data, usually just an integer 1, 2, 3, 4, 609 00:32:23,690 --> 00:32:26,870 just like the spreadsheet programs, to uniquely identify 610 00:32:26,870 --> 00:32:29,450 the rows so that you can access them very 611 00:32:29,450 --> 00:32:33,020 quickly via that number or that index. 612 00:32:33,020 --> 00:32:36,350 A foreign key, we'll see, is a notion of a piece 613 00:32:36,350 --> 00:32:39,830 of data that exists in two separate tables-- 614 00:32:39,830 --> 00:32:42,276 two sheets where there's an interrelationship 615 00:32:42,276 --> 00:32:44,150 but more on that kind of example in a moment. 
616 00:32:44,150 --> 00:32:48,620 And a unique key, a unique column, is one where 617 00:32:48,620 --> 00:32:50,660 you should not see any duplicates. 618 00:32:50,660 --> 00:32:53,870 So, for instance, maybe when building a website that has users 619 00:32:53,870 --> 00:32:56,930 register for your website, if you want to ensure 620 00:32:56,930 --> 00:33:01,850 that no user can have the same email address as another, 621 00:33:01,850 --> 00:33:04,220 you can specify to your database, hey, database, make 622 00:33:04,220 --> 00:33:08,330 sure that malan@harvard.edu, or whatever the user's email address is, 623 00:33:08,330 --> 00:33:11,720 only appears once in a column in my database. 624 00:33:11,720 --> 00:33:18,660 Don't let David or not-David register with that same email address. 625 00:33:18,660 --> 00:33:21,320 And so this is a useful way to ensure that you 626 00:33:21,320 --> 00:33:25,400 have correct behavior of your website and you have integrity of your data 627 00:33:25,400 --> 00:33:28,520 so that you don't accidentally have duplicate values, which 628 00:33:28,520 --> 00:33:30,320 would lead potentially to ambiguity. 629 00:33:30,320 --> 00:33:34,800 And there's even more features you might get from a typical database. 630 00:33:34,800 --> 00:33:38,540 So let's indeed now try an example whereby we decide 631 00:33:38,540 --> 00:33:41,210 how best to store data in my database. 632 00:33:41,210 --> 00:33:44,560 But to simulate my database I'm going to quite simply just use Excel here. 633 00:33:44,560 --> 00:33:46,310 I could use Apple Numbers or Google Sheets 634 00:33:46,310 --> 00:33:48,860 or the like or any spreadsheet program, but at the end of the day 635 00:33:48,860 --> 00:33:51,890 I'm really just using this because it's a program with rows and columns.
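Before that walkthrough, the primary key and unique column ideas from a moment ago can be sketched with Python's built-in `sqlite3` module. The `users` table, its columns, and the example address are all hypothetical. Note that SQLite accepts a declaration like VARCHAR(100) but does not enforce the length; other databases generally do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"      # numbered 1, 2, 3, ... like spreadsheet rows
    " email VARCHAR(100) UNIQUE"    # no two rows may share an email address
    ")"
)

conn.execute("INSERT INTO users (email) VALUES (?)", ("user@example.com",))

# A second registration with the same address should be refused.
try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("user@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The primary key lets us jump straight to a row by its number.
first_user = conn.execute(
    "SELECT id, email FROM users WHERE id = 1"
).fetchone()

print(duplicate_rejected, first_user)
```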
636 00:33:51,890 --> 00:33:56,510 In reality, if I am a business owner and I have a web-based store 637 00:33:56,510 --> 00:34:00,341 and I sell widgets and sprockets on my store, 638 00:34:00,341 --> 00:34:02,090 the reality is I want to keep track of who 639 00:34:02,090 --> 00:34:05,270 has bought what so I know what my revenue is, 640 00:34:05,270 --> 00:34:08,000 so I know to whom I need to ship things, and so forth. 641 00:34:08,000 --> 00:34:10,430 And so I'm going to pretend to be the database 642 00:34:10,430 --> 00:34:14,510 here so that we can walk through an example where we design this database 643 00:34:14,510 --> 00:34:17,840 but realize that the actual data that's being inputted by the user 644 00:34:17,840 --> 00:34:23,030 into my website's front end, the HTML, JavaScript, and CSS user interface, 645 00:34:23,030 --> 00:34:26,840 is going to get sent to the server, as by an HTML form, 646 00:34:26,840 --> 00:34:31,699 where my back end language, whether it's Python or PHP or Java or Ruby 647 00:34:31,699 --> 00:34:33,710 or the like with some framework probably, 648 00:34:33,710 --> 00:34:37,080 is going to be ultimately storing it in a database. 649 00:34:37,080 --> 00:34:40,909 And that database in turn might be MySQL or MariaDB or Oracle or Postgres 650 00:34:40,909 --> 00:34:42,469 or something else. 651 00:34:42,469 --> 00:34:46,610 We're just going to focus on what any of those databases might-- 652 00:34:46,610 --> 00:34:50,630 how any of those databases might potentially store the information. 653 00:34:50,630 --> 00:34:52,844 So someone has just submitted a form on our website. 654 00:34:52,844 --> 00:34:55,219 They've given their credit card information and the like, 655 00:34:55,219 --> 00:34:58,250 and therefore it is time for my website to store 656 00:34:58,250 --> 00:34:59,630 this information in a database. 657 00:34:59,630 --> 00:35:00,730 What am I going to store?
658 00:35:00,730 --> 00:35:05,420 Well, if they've bought a widget, I might type in widgets, quantity 1, 659 00:35:05,420 --> 00:35:09,530 and maybe it was Zamyla Chan who bought this widget, 660 00:35:09,530 --> 00:35:13,760 and she is at the CS Building at 33 Oxford Street. 661 00:35:13,760 --> 00:35:19,370 And that's in Cambridge, and that's in Massachusetts in 02138, USA. 662 00:35:19,370 --> 00:35:23,210 So here is some information therefore that I might store. 663 00:35:23,210 --> 00:35:25,190 This is good because I know to whom to ship it. 664 00:35:25,190 --> 00:35:27,500 I know how many widgets I have sold. 665 00:35:27,500 --> 00:35:30,350 And maybe the price should be in there as well. 666 00:35:30,350 --> 00:35:35,630 So she paid maybe $9.99 for this widget. 667 00:35:35,630 --> 00:35:37,430 All right, now, let's fast forward in time, 668 00:35:37,430 --> 00:35:39,846 and let's assume that someone else has visited my website, 669 00:35:39,846 --> 00:35:43,530 and they too have decided to buy a widget but multiple widgets. 670 00:35:43,530 --> 00:35:45,050 We upsold them. 671 00:35:45,050 --> 00:35:47,550 So a widget was bought, quantity 2. 672 00:35:47,550 --> 00:35:53,720 This is, say, Rob Bowden also in 33 Oxford Street, Cambridge, Mass, 02138, 673 00:35:53,720 --> 00:35:59,630 USA, and this one was a total of $19.98 since he bought two of them. 674 00:35:59,630 --> 00:36:03,200 So I've just been storing this data in sort of freeform format 675 00:36:03,200 --> 00:36:06,360 but each of these columns clearly has meaning. 676 00:36:06,360 --> 00:36:10,430 So maybe this first column should really be called Product. 677 00:36:10,430 --> 00:36:13,750 678 00:36:13,750 --> 00:36:16,020 This one should be called Quantity. 679 00:36:16,020 --> 00:36:18,090 This one could be called Name. 680 00:36:18,090 --> 00:36:19,890 This is maybe Street. 681 00:36:19,890 --> 00:36:21,360 This is maybe City. 682 00:36:21,360 --> 00:36:23,490 This is maybe State.
683 00:36:23,490 --> 00:36:27,260 This is Zip. 684 00:36:27,260 --> 00:36:29,030 This is maybe Country. 685 00:36:29,030 --> 00:36:30,920 And this is maybe Total. 686 00:36:30,920 --> 00:36:34,670 But even here there are some opportunities for disagreement. 687 00:36:34,670 --> 00:36:38,400 This is a little US centric, the fact that we have cities and states, 688 00:36:38,400 --> 00:36:39,704 as well as zip codes. 689 00:36:39,704 --> 00:36:41,870 Indeed it might be the case that zip codes don't all 690 00:36:41,870 --> 00:36:44,900 follow the same format. Indeed, even in the US, sometimes people write them 691 00:36:44,900 --> 00:36:47,990 with five digits, sometimes with nine digits and a hyphen, 692 00:36:47,990 --> 00:36:49,910 so there's a design opportunity there. 693 00:36:49,910 --> 00:36:53,360 But let's drill in deeper as to what data type these various fields 694 00:36:53,360 --> 00:36:56,270 should be at least right now. 695 00:36:56,270 --> 00:37:00,050 Let me make room at the top here so we can just make notes 696 00:37:00,050 --> 00:37:02,060 as to the data types, but in reality these 697 00:37:02,060 --> 00:37:06,900 would be stored not in the table itself but somewhere else in the database. 698 00:37:06,900 --> 00:37:09,230 What data type should product be? 699 00:37:09,230 --> 00:37:15,690 And remember that among our options are data types like these. Product? 700 00:37:15,690 --> 00:37:16,560 It's not a number. 701 00:37:16,560 --> 00:37:18,320 So we can knock off most of these options. 702 00:37:18,320 --> 00:37:22,340 It's definitely not a date, time, or timestamp, so then it boils down to 703 00:37:22,340 --> 00:37:24,320 is it a char or a varchar? 704 00:37:24,320 --> 00:37:27,960 705 00:37:27,960 --> 00:37:31,665 So char is a fixed-length field, so we have to decide in advance 706 00:37:31,665 --> 00:37:36,300 is it going to be eight characters, 16 characters, 100 characters or something 707 00:37:36,300 --> 00:37:37,330 else.
708 00:37:37,330 --> 00:37:40,360 Varchar would mean we just know the upper bound. 709 00:37:40,360 --> 00:37:42,180 So I don't know. 710 00:37:42,180 --> 00:37:48,750 W-i-d-g-e-t is 6, so minimally it's got to be six characters but then 711 00:37:48,750 --> 00:37:50,750 there's sprocket, s-p-r-o-c-k-e-t. 712 00:37:50,750 --> 00:37:53,710 713 00:37:53,710 --> 00:37:55,920 Not sure if I've ever had to spell that word. 714 00:37:55,920 --> 00:37:58,720 That's eight characters so six isn't going to cut it. 715 00:37:58,720 --> 00:38:02,070 So maybe we do something like char8. 716 00:38:02,070 --> 00:38:03,600 But there's a tradeoff here. 717 00:38:03,600 --> 00:38:10,500 If I specify now that this field is maximally going to be eight characters, 718 00:38:10,500 --> 00:38:15,540 then I can't sell anything with a longer name than sprocket. 719 00:38:15,540 --> 00:38:18,079 I could change the database size-- or the column size 720 00:38:18,079 --> 00:38:21,120 later on but it's ideal to get these things right from the get go and not 721 00:38:21,120 --> 00:38:24,390 have to go back in and change your infrastructure or hire someone 722 00:38:24,390 --> 00:38:26,340 to come in and make modifications. 723 00:38:26,340 --> 00:38:28,230 So maybe that's a little shortsighted. 724 00:38:28,230 --> 00:38:29,730 Maybe it shouldn't be eight. 725 00:38:29,730 --> 00:38:33,810 Maybe it should be twice that, like 16. 726 00:38:33,810 --> 00:38:34,949 I don't know, and I don't-- 727 00:38:34,949 --> 00:38:36,990 I'm not necessarily going to offer an answer here 728 00:38:36,990 --> 00:38:39,656 because it entirely depends on what data you're trying to store, 729 00:38:39,656 --> 00:38:41,180 what items you're trying to sell. 
730 00:38:41,180 --> 00:38:44,100 This, in fact, now is even more wasteful because even though I'm now 731 00:38:44,100 --> 00:38:47,640 anticipating product names that are up to 16 characters, 732 00:38:47,640 --> 00:38:53,050 the char data type is going to use for every product name 16 characters, 733 00:38:53,050 --> 00:38:56,640 even if a whole bunch of those are blank because the word isn't long enough 734 00:38:56,640 --> 00:38:58,650 to need 16 characters. 735 00:38:58,650 --> 00:39:03,600 So maybe I would go for a variable length field, not char but varchar, 736 00:39:03,600 --> 00:39:08,280 whereby the maximum length of my column should be 16 characters. 737 00:39:08,280 --> 00:39:11,370 But here, as in CS more generally, there's a tradeoff. 738 00:39:11,370 --> 00:39:13,500 It might seem like a win, like, OK, well, 739 00:39:13,500 --> 00:39:15,970 the problem is I'm using too much space all the time. 740 00:39:15,970 --> 00:39:17,710 Let me just put an upper bound. 741 00:39:17,710 --> 00:39:20,700 There's got to be some price you pay, there's got to be a tradeoff, 742 00:39:20,700 --> 00:39:21,900 and indeed there is. 743 00:39:21,900 --> 00:39:24,510 It turns out that a database can generally 744 00:39:24,510 --> 00:39:28,140 search your data more quickly if it knows 745 00:39:28,140 --> 00:39:30,420 the entire column is the same width. 746 00:39:30,420 --> 00:39:34,462 Long story short, if it knows that this column has eight characters, eight 747 00:39:34,462 --> 00:39:36,420 characters, eight characters, eight characters, 748 00:39:36,420 --> 00:39:40,080 it can use very simple arithmetic to jump mathematically 749 00:39:40,080 --> 00:39:44,760 from one row in that column to another because they're all the same distance 750 00:39:44,760 --> 00:39:46,050 apart essentially.
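That "simple arithmetic" can be sketched with a plain string standing in for a fixed-width column. This is a toy model under stated assumptions, not how any particular database lays out its storage: the width of 8, the helper `read_fixed`, and the third product name "gear" are all made up for illustration.

```python
WIDTH = 8  # every value padded to 8 characters, in the spirit of char(8)

names = ["widget", "sprocket", "gear"]

# Pad each name with spaces so every entry occupies exactly WIDTH characters.
column = "".join(name.ljust(WIDTH) for name in names)


def read_fixed(column, row, width=WIDTH):
    """Jump straight to a row: its start is simply row * width."""
    start = row * width
    return column[start:start + width].rstrip()


third = read_fixed(column, 2)
print(repr(column), third)
```

Because every entry is the same distance apart, finding row 2 takes one multiplication rather than a walk through rows 0 and 1. The price is the padding: "gear" occupies 8 characters even though it needs only 4.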
751 00:39:46,050 --> 00:39:49,650 But if you have a varchar column and variable length, 752 00:39:49,650 --> 00:39:53,220 you can think of the column not as being perfectly smooth on both sides 753 00:39:53,220 --> 00:39:55,940 but kind of jagged on one side. 754 00:39:55,940 --> 00:39:57,030 Some rows are short. 755 00:39:57,030 --> 00:39:57,990 Some rows are long. 756 00:39:57,990 --> 00:40:00,960 And so you can't just blindly use simple arithmetic 757 00:40:00,960 --> 00:40:05,700 and jump eight characters at a time if the length were eight or 16 characters 758 00:40:05,700 --> 00:40:07,830 at a time if the length were 16. 759 00:40:07,830 --> 00:40:09,000 So it's a tradeoff. 760 00:40:09,000 --> 00:40:12,980 If we want to be able to search through our product names quickly, 761 00:40:12,980 --> 00:40:14,520 we might not want to use a varchar. 762 00:40:14,520 --> 00:40:16,860 So here, too, no right answer. 763 00:40:16,860 --> 00:40:18,300 It's a tradeoff. 764 00:40:18,300 --> 00:40:20,210 And it might not matter for small datasets. 765 00:40:20,210 --> 00:40:21,210 Indeed, it probably doesn't. 766 00:40:21,210 --> 00:40:24,376 If you don't have many customers, you don't have many products, but certainly 767 00:40:24,376 --> 00:40:26,720 at scale these kinds of things matter. 768 00:40:26,720 --> 00:40:30,060 And even when building something that's not going to have that many users, 769 00:40:30,060 --> 00:40:32,860 just getting it right doesn't cost that much upfront time.
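The "simple arithmetic" point above can be made concrete with a small Python sketch. This is only a toy model, not how any particular database lays out its storage: the byte strings `fixed` and `variable`, the helper names, and the third product "gear" are all made up for illustration. With a fixed width, row n starts at n times the width; with variable widths, the code has to walk past every earlier row first.

```python
WIDTH = 8  # plays the role of a CHAR(8) column

# "gear" is a hypothetical third product, added only for illustration.
names = ["widget", "sprocket", "gear"]

# Fixed width: every name is padded with spaces to exactly WIDTH bytes.
fixed = b"".join(name.ljust(WIDTH).encode("ascii") for name in names)

def fixed_lookup(row: int) -> str:
    """Jump straight to the row with one multiplication."""
    start = row * WIDTH
    return fixed[start:start + WIDTH].decode("ascii").rstrip()

# Variable width: each name is stored as a one-byte length followed by its bytes.
variable = b"".join(bytes([len(name)]) + name.encode("ascii") for name in names)

def variable_lookup(row: int) -> str:
    """Walk past every earlier row, since their lengths differ."""
    pos = 0
    for _ in range(row):
        pos += 1 + variable[pos]  # skip the length byte plus that many bytes
    length = variable[pos]
    return variable[pos + 1:pos + 1 + length].decode("ascii")
```

The fixed layout spends 24 bytes on these three names while the variable layout spends 21, which is the space-versus-speed tradeoff in miniature.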
770 00:40:32,860 --> 00:40:36,360 The most important thing, I dare say, is to actually give it some thought 771 00:40:36,360 --> 00:40:39,780 and not just leave it to chance or take the easiest way out because invariably 772 00:40:39,780 --> 00:40:42,960 over time, you will build up so-called technical debt, 773 00:40:42,960 --> 00:40:45,810 where you make poor decision, poor decision, poor decision, and now 774 00:40:45,810 --> 00:40:49,770 you have a very expensive decision later on if you have to go back and change 775 00:40:49,770 --> 00:40:51,540 a lot of those things. 776 00:40:51,540 --> 00:40:52,800 What about quantity? 777 00:40:52,800 --> 00:40:56,580 Well, quantity, nicely enough, would seem to fall more cleanly 778 00:40:56,580 --> 00:40:58,080 into one of these fields. 779 00:40:58,080 --> 00:40:59,250 Now, is that an integer? 780 00:40:59,250 --> 00:41:00,270 Is that a big integer? 781 00:41:00,270 --> 00:41:05,370 I think we're doing pretty well if we need more than 2 billion products sold. 782 00:41:05,370 --> 00:41:07,380 Maybe we're sort of a smaller shop, and we 783 00:41:07,380 --> 00:41:10,145 can get away with an integer or even a small int, 784 00:41:10,145 --> 00:41:11,520 but this too would be a tradeoff. 785 00:41:11,520 --> 00:41:12,970 How many bits do you want to spend? 786 00:41:12,970 --> 00:41:16,136 I would say that the default typically would be an integer unless you really 787 00:41:16,136 --> 00:41:20,020 are expecting a huge amount of data, billions of rows. 788 00:41:20,020 --> 00:41:23,805 So we might say something like integer here. Name. 789 00:41:23,805 --> 00:41:25,800 Oh, gosh, this is that can of worms again. 790 00:41:25,800 --> 00:41:27,270 How long is a maximum name? 791 00:41:27,270 --> 00:41:31,290 Maybe I do some googling and some due diligence as to the maximum length of names. 792 00:41:31,290 --> 00:41:32,769 Maybe I just want to cut it off.
793 00:41:32,769 --> 00:41:35,310 You've probably been to a website before where you're happily 794 00:41:35,310 --> 00:41:39,810 entering your information, and then you keep typing and nothing is happening. 795 00:41:39,810 --> 00:41:42,870 And that's because the programmer, or the database designer, 796 00:41:42,870 --> 00:41:45,315 has decided your name doesn't need to be that long. 797 00:41:45,315 --> 00:41:47,190 Or your address doesn't need to be that long. 798 00:41:47,190 --> 00:41:49,260 And it's infuriating sometimes because there 799 00:41:49,260 --> 00:41:52,980 are assumptions, naive, insensitive assumptions sometimes, 800 00:41:52,980 --> 00:41:57,420 but that boiled down to perhaps either calculated design decisions or maybe 801 00:41:57,420 --> 00:42:00,180 just poor design decisions. 802 00:42:00,180 --> 00:42:01,890 So I don't know what is right here. 803 00:42:01,890 --> 00:42:06,180 16 feels a little too conservative, so maybe I 804 00:42:06,180 --> 00:42:08,890 would say something like varchar 128. 805 00:42:08,890 --> 00:42:11,640 But even that I'd probably want to take a look at my customer base 806 00:42:11,640 --> 00:42:15,360 and see if that's well beyond the limit of what I might actually need. 807 00:42:15,360 --> 00:42:16,380 Same thing for street. 808 00:42:16,380 --> 00:42:17,340 Same thing for city. 809 00:42:17,340 --> 00:42:19,048 I don't really know what the right length 810 00:42:19,048 --> 00:42:23,380 is, but let's assume it's going to be varchars for those. 811 00:42:23,380 --> 00:42:28,070 So we'll just use a dot, dot, dot to suggest that it's an open question. 812 00:42:28,070 --> 00:42:29,890 State is an interesting one. 813 00:42:29,890 --> 00:42:35,080 If we expect to have only US customers, we can do a little optimization here. 
814 00:42:35,080 --> 00:42:38,630 If every US state has a two character abbreviation, 815 00:42:38,630 --> 00:42:42,730 we could do char2 so that we get that performance 816 00:42:42,730 --> 00:42:47,530 benefit of knowing that every row is the same width, two characters, 817 00:42:47,530 --> 00:42:50,920 so long as we're comfortable not selling products to anyone else in the world 818 00:42:50,920 --> 00:42:54,760 beyond the United States. Zip code, too. 819 00:42:54,760 --> 00:42:56,480 Design opportunity there. 820 00:42:56,480 --> 00:43:01,810 I think it's fair to say that integer, while seemingly correct, 821 00:43:01,810 --> 00:43:04,870 might get you into some trouble, at least here in Massachusetts, 822 00:43:04,870 --> 00:43:09,790 where we have a whole bunch of zip codes that start with zero. 823 00:43:09,790 --> 00:43:11,860 Like in the world of numbers and integers, 824 00:43:11,860 --> 00:43:13,970 leading zeros are meaningless. 825 00:43:13,970 --> 00:43:16,240 You can have as many zeros to the left of your number 826 00:43:16,240 --> 00:43:20,090 and they don't change the actual value of your number. 827 00:43:20,090 --> 00:43:22,700 But in a zip code, it does have meaning. 828 00:43:22,700 --> 00:43:25,150 It is the first of five digits here, and so 829 00:43:25,150 --> 00:43:29,290 calling this an integer probably isn't very wise because if the database is 830 00:43:29,290 --> 00:43:30,550 like most humans, 831 00:43:30,550 --> 00:43:32,780 the database might ignore that first digit, 832 00:43:32,780 --> 00:43:38,230 and so my zip code is going to appear to be 2138, which really isn't right. 833 00:43:38,230 --> 00:43:39,559 Now, we could fix that in code. 834 00:43:39,559 --> 00:43:42,100 We can make sure that, well, if we ever see a zip code that's 835 00:43:42,100 --> 00:43:45,935 only four or fewer digits, let's prepend some zeros, but that feels messy.
836 00:43:45,935 --> 00:43:48,310 If we're going to put that data in there from the get go, 837 00:43:48,310 --> 00:43:51,400 let's make sure it comes back to us correctly. 838 00:43:51,400 --> 00:43:54,020 And so I might actually say something here. 839 00:43:54,020 --> 00:43:56,440 Even though it looks like a number, maybe I 840 00:43:56,440 --> 00:43:59,380 would actually say it's a char5 field, or maybe it's 841 00:43:59,380 --> 00:44:03,850 nine or 10 if I want to have a hyphen in there for US zip codes. 842 00:44:03,850 --> 00:44:05,230 Country, too. 843 00:44:05,230 --> 00:44:10,810 Here maybe it's-- going to be a three-character abbreviation 844 00:44:10,810 --> 00:44:13,930 or two-character abbreviation, not sure what's best there. 845 00:44:13,930 --> 00:44:17,290 846 00:44:17,290 --> 00:44:19,210 Really depends, too, on what countries you want 847 00:44:19,210 --> 00:44:23,560 to sell to if not just the US perhaps, so there's a design opportunity there. 848 00:44:23,560 --> 00:44:26,680 And then perhaps the last to consider is this total. 849 00:44:26,680 --> 00:44:30,280 I think it's fair to say that integer would not be correct 850 00:44:30,280 --> 00:44:33,430 because we would either be rounding down or rounding up 851 00:44:33,430 --> 00:44:36,560 all of the money we're supposed to be collecting from our customers. 852 00:44:36,560 --> 00:44:38,880 So we probably want one of these. 853 00:44:38,880 --> 00:44:43,637 And some databases differ, but generally a data type like Decimal is ideal. 854 00:44:43,637 --> 00:44:45,970 You don't want to even get into the business of worrying 855 00:44:45,970 --> 00:44:49,000 about these rounding errors or errors of imprecision 856 00:44:49,000 --> 00:44:51,280 as in Superman 3 and Office Space.
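Both pitfalls just described, the vanishing leading zero and the rounding of money, are easy to see in a few lines of Python. This is a minimal sketch rather than anything a database does internally; the variable names and the 9.99 unit price are assumptions made up for the illustration.

```python
from decimal import Decimal

# A zip code treated as a number loses its leading zero.
zip_as_int = int("02138")   # the integer 2138
zip_as_text = "02138"       # text keeps all five digits

# Binary floating point cannot represent most decimal fractions exactly,
# so sums of money can drift by tiny amounts.
float_total = 0.1 + 0.2

# Decimal keeps a fixed number of digits on each side of the decimal point.
decimal_total = Decimal("0.10") + Decimal("0.20")

# Hypothetical unit price of 9.99 for ten items.
order_total = Decimal("9.99") * 10
```

Here `zip_as_int` is 2138, `float_total` is not exactly 0.3, and `decimal_total` and `order_total` are exactly 0.30 and 99.90.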
857 00:44:51,280 --> 00:44:54,677 Much better to just say that you want a fixed number of digits 858 00:44:54,677 --> 00:44:57,760 to the left and a fixed number of digits to the right of the decimal place 859 00:44:57,760 --> 00:45:01,000 so that you are not losing even fractions of pennies 860 00:45:01,000 --> 00:45:04,300 or mischarging anyone or losing out in any way. 861 00:45:04,300 --> 00:45:06,130 So we might use Decimal in that way. 862 00:45:06,130 --> 00:45:08,350 Some databases, though, have an actual currency data 863 00:45:08,350 --> 00:45:12,070 type, which operates similarly. 864 00:45:12,070 --> 00:45:20,860 So there remains to be seen some opportunities for improvement. 865 00:45:20,860 --> 00:45:24,730 If I continue to sell widgets, let alone sprockets, 866 00:45:24,730 --> 00:45:28,090 I'm going to have more and more and more rows in this table. 867 00:45:28,090 --> 00:45:31,140 And if Rob and Zamyla end up being repeat customers, 868 00:45:31,140 --> 00:45:36,240 I might have more and more Robs and more and more Zamylas in the same table. 869 00:45:36,240 --> 00:45:41,770 And as that happens, there begins to be quite a bit of redundancy. 870 00:45:41,770 --> 00:45:45,710 Indeed, what can you factor out over time? 871 00:45:45,710 --> 00:45:49,780 Well, certainly if Zamyla and Rob keep ordering more and more items 872 00:45:49,780 --> 00:45:55,076 from my database, I could just keep updating the quantity, 873 00:45:55,076 --> 00:45:56,325 but that feels a little messy. 874 00:45:56,325 --> 00:45:59,980 It'd be nice to have a veritable history of all of my sales. 875 00:45:59,980 --> 00:46:01,990 I don't want to just aggregate everything. 876 00:46:01,990 --> 00:46:06,970 So adding more and more rows for every sale seems pretty compelling, 877 00:46:06,970 --> 00:46:09,790 but then I'm going to see Zamyla Chan and Rob 878 00:46:09,790 --> 00:46:15,010 Bowden again and again and again and again and again in this table.
879 00:46:15,010 --> 00:46:18,940 And I'm also going to see their address again and again and again and again. 880 00:46:18,940 --> 00:46:24,520 And herein lie now the capabilities and the features 881 00:46:24,520 --> 00:46:25,810 of a relational database. 882 00:46:25,810 --> 00:46:29,080 You know what I'm going to do? Rather than just treat this as my one 883 00:46:29,080 --> 00:46:32,380 and only table, let me go ahead and just rename this sheet 884 00:46:32,380 --> 00:46:34,960 or worksheet to be Orders. 885 00:46:34,960 --> 00:46:36,590 I could call it anything I want. 886 00:46:36,590 --> 00:46:37,340 And you know what? 887 00:46:37,340 --> 00:46:42,530 Let me create another table or sheet, and let me call this Customers. 888 00:46:42,530 --> 00:46:44,680 So even though, again, I'm using Excel here, 889 00:46:44,680 --> 00:46:49,630 this is just like I might be doing in Oracle or SQL Server 890 00:46:49,630 --> 00:46:53,500 or in Postgres or MySQL or the like, I've just created a second table. 891 00:46:53,500 --> 00:46:55,630 But as per the name relational database, there's 892 00:46:55,630 --> 00:46:59,340 going to be a relation across these two tables now. 893 00:46:59,340 --> 00:47:00,970 And what's that relation going to be? 894 00:47:00,970 --> 00:47:01,803 Well, you know what? 895 00:47:01,803 --> 00:47:06,760 I'm going to go ahead and copy all of this customer data 896 00:47:06,760 --> 00:47:12,790 and actually cut it and paste it over into this new table called Customers. 897 00:47:12,790 --> 00:47:15,010 And now this isn't quite sufficient. 898 00:47:15,010 --> 00:47:18,520 I'm going to go ahead and notice that Excel has already 899 00:47:18,520 --> 00:47:20,230 numbered these things for me. 900 00:47:20,230 --> 00:47:23,710 But I'm going to go ahead just for clarity and add my own column, 901 00:47:23,710 --> 00:47:25,992 and I'm going to call this ID. 902 00:47:25,992 --> 00:47:27,575 And it's going to be, say, an integer.
903 00:47:27,575 --> 00:47:29,430 904 00:47:29,430 --> 00:47:34,230 I'm going to call Zamyla my first customer, Rob my second customer, 905 00:47:34,230 --> 00:47:38,350 and in this case, notice now these unique identifiers are part of my data. 906 00:47:38,350 --> 00:47:41,940 It's not just part of Excel's arbitrary numbering on the left and arbitrary 907 00:47:41,940 --> 00:47:43,530 lettering on the top. 908 00:47:43,530 --> 00:47:47,370 Rather these are now actual pieces of data in my database 909 00:47:47,370 --> 00:47:50,430 that will be stored and backed up and so forth. 910 00:47:50,430 --> 00:47:54,150 But notice, now that Zamyla is customer number 1 911 00:47:54,150 --> 00:47:59,130 and Rob is customer number 2, each of whom lives at these addresses, 912 00:47:59,130 --> 00:48:03,690 I don't have to worry now about redundantly storing that data because 913 00:48:03,690 --> 00:48:07,380 in my orders table now, any time Rob or Zamyla 914 00:48:07,380 --> 00:48:10,740 or some other customer purchases from my website-- 915 00:48:10,740 --> 00:48:13,080 notice I can shrink this. 916 00:48:13,080 --> 00:48:15,810 And I can say, you know what? 917 00:48:15,810 --> 00:48:19,470 This is the customer who bought this. 918 00:48:19,470 --> 00:48:21,510 It's going to be an integer. 919 00:48:21,510 --> 00:48:23,430 And you know who bought that first widget? 920 00:48:23,430 --> 00:48:24,720 Well, it was Zamyla. 921 00:48:24,720 --> 00:48:27,600 And you know who bought that second widget and the third widget, 922 00:48:27,600 --> 00:48:30,060 too, since quantity was 2? It was Rob. 923 00:48:30,060 --> 00:48:33,100 And if some new customer comes into my database-- 924 00:48:33,100 --> 00:48:36,690 so, for instance, suppose that someone new orders from my website, 925 00:48:36,690 --> 00:48:38,820 they are going to become customer number 3.
926 00:48:38,820 --> 00:48:41,010 That will be, for instance, Doug Lloyd, and suppose 927 00:48:41,010 --> 00:48:45,360 he, too, is at that same address at that same zip code in the USA. 928 00:48:45,360 --> 00:48:49,080 But now in my orders table, suppose that Doug has bought 10 widgets. 929 00:48:49,080 --> 00:48:50,430 He really went all in. 930 00:48:50,430 --> 00:48:54,600 Well, he, too, is going to have widget there, quantity 10, 931 00:48:54,600 --> 00:49:04,260 his customer ID is 3, and he, of course, is going to have spent $99.90 with us 932 00:49:04,260 --> 00:49:05,220 in total. 933 00:49:05,220 --> 00:49:08,460 So notice how we've factored out the common information 934 00:49:08,460 --> 00:49:10,080 to eliminate a redundancy. 935 00:49:10,080 --> 00:49:16,159 Notice now that if Doug or Rob or Zamyla move addresses or change their address, 936 00:49:16,159 --> 00:49:19,200 or if we were storing more information, like their phone number and email 937 00:49:19,200 --> 00:49:23,470 address and other personal data, too, we could change it in just one place. 938 00:49:23,470 --> 00:49:26,970 And not in our orders table because, indeed, there's now distinct semantics. 939 00:49:26,970 --> 00:49:29,010 Our orders table stores orders. 940 00:49:29,010 --> 00:49:31,750 Our customers table stores customers. 941 00:49:31,750 --> 00:49:34,530 And if we wanted yet another table, as we probably should have, 942 00:49:34,530 --> 00:49:36,630 it could store, say, products. 943 00:49:36,630 --> 00:49:38,520 In fact, there's still this redundancy. 944 00:49:38,520 --> 00:49:41,130 Let's go ahead and create another table called Products inside 945 00:49:41,130 --> 00:49:44,370 of which is an ID field as well as a product field, 946 00:49:44,370 --> 00:49:47,760 and then, just as before, let's start numbering our IDs from 1. 
947 00:49:47,760 --> 00:49:51,600 So our first product is a widget and while we've not sold any yet, 948 00:49:51,600 --> 00:49:55,230 our second product, ID 2, is a sprocket. 949 00:49:55,230 --> 00:49:59,020 Now, in this way in my orders table, can I store not a product per se, 950 00:49:59,020 --> 00:50:00,720 but a product ID. 951 00:50:00,720 --> 00:50:04,260 And so now even though my table is frankly 952 00:50:04,260 --> 00:50:06,750 becoming more and more cryptic and a little harder 953 00:50:06,750 --> 00:50:08,070 for me to wrap my mind around-- 954 00:50:08,070 --> 00:50:10,200 what am I looking at, it's all just numbers-- 955 00:50:10,200 --> 00:50:13,560 it is now what we would call normalized in the context of a database. 956 00:50:13,560 --> 00:50:17,160 And a database typically is not meant to be looked at by human eyes just 957 00:50:17,160 --> 00:50:17,700 like this. 958 00:50:17,700 --> 00:50:21,625 Rather it's meant to be queried and data created and updated and deleted. 959 00:50:21,625 --> 00:50:24,000 And so there are certain commands in this language called 960 00:50:24,000 --> 00:50:27,420 SQL that actually facilitate programmatically, 961 00:50:27,420 --> 00:50:31,890 using a programming language, what I've been doing with my keyboard and fingers 962 00:50:31,890 --> 00:50:32,640 alone. 963 00:50:32,640 --> 00:50:35,190 Indeed, the commands with which you can manipulate the data 964 00:50:35,190 --> 00:50:38,580 in the database itself are going to be SQL's commands, 965 00:50:38,580 --> 00:50:41,740 create, select, update, delete, and others.
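To make those four commands concrete, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. SQLite is just a convenient stand-in for the database servers named earlier, and the table layout, the zip code values, and the variable names are assumptions for illustration, not the lecture's actual spreadsheet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# CREATE: define a table, then insert rows into it.
# (SQLite accepts VARCHAR(128) and CHAR(5) but does not enforce the lengths.)
conn.execute("CREATE TABLE customers (id INTEGER, name VARCHAR(128), zip CHAR(5))")
conn.executemany(
    "INSERT INTO customers (id, name, zip) VALUES (?, ?, ?)",
    [(1, "Zamyla", "02138"), (2, "Rob", "02138"), (3, "Doug", "02138")],
)

# SELECT: read data back out.
second = conn.execute("SELECT name FROM customers WHERE id = 2").fetchone()[0]

# UPDATE: change existing data in place (a hypothetical move to another zip code).
conn.execute("UPDATE customers SET zip = '02134' WHERE id = 3")
doug_zip = conn.execute("SELECT zip FROM customers WHERE id = 3").fetchone()[0]

# DELETE: remove rows.
conn.execute("DELETE FROM customers WHERE id = 3")
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Each statement does in one line what took a cut, a paste, or some retyping in the spreadsheet.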
966 00:50:41,740 --> 00:50:44,490 Indeed, much like Scratch has the various puzzle pieces 967 00:50:44,490 --> 00:50:47,280 via which you can implement logic in a Scratch-based program, 968 00:50:47,280 --> 00:50:50,400 so does SQL have these puzzle pieces, if you will, 969 00:50:50,400 --> 00:50:54,180 via which you can create and select and update and delete data 970 00:50:54,180 --> 00:50:57,780 from your database just like I've been simulating by using Excel here 971 00:50:57,780 --> 00:50:58,950 and my keyboard. 972 00:50:58,950 --> 00:51:02,070 And indeed, some of these more sophisticated concepts, 973 00:51:02,070 --> 00:51:05,220 like primary key and foreign key and unique key, 974 00:51:05,220 --> 00:51:08,640 now rather start to jump out at us because if we consider what my orders 975 00:51:08,640 --> 00:51:14,470 table now looks like, notice that it's indeed mostly numbers, 976 00:51:14,470 --> 00:51:18,550 but those numbers are essentially keys into another table. 977 00:51:18,550 --> 00:51:21,660 In fact, if you look at products, my products table 978 00:51:21,660 --> 00:51:25,510 has an ID column, which has unique numbers 1, 2, 979 00:51:25,510 --> 00:51:29,550 and so forth. My customers table has its own ID column. 980 00:51:29,550 --> 00:51:31,980 And these are the same numbers, but with different meaning. 981 00:51:31,980 --> 00:51:36,400 These are customer numbers 1, 2, 3, and so forth. 982 00:51:36,400 --> 00:51:40,290 So in each of these tables, customers and products, 983 00:51:40,290 --> 00:51:46,230 is that ID column a primary key for the customers and products tables 984 00:51:46,230 --> 00:51:47,160 respectively. 985 00:51:47,160 --> 00:51:50,460 Within each of those tables, it is that ID column 986 00:51:50,460 --> 00:51:53,530 that uniquely identifies rows. 987 00:51:53,530 --> 00:51:57,030 Zamyla is and shall always be customer number one. 988 00:51:57,030 --> 00:51:59,640 Rob is and shall always be customer number 2.
989 00:51:59,640 --> 00:52:02,050 Doug is and shall always be customer number 3. 990 00:52:02,050 --> 00:52:06,310 So those IDs those primary keys must not change. 991 00:52:06,310 --> 00:52:08,460 They must be invariant, and as such they can 992 00:52:08,460 --> 00:52:11,700 be reliably used to uniquely identify customers 993 00:52:11,700 --> 00:52:14,970 or, in the context of products, uniquely identify 994 00:52:14,970 --> 00:52:18,030 a product or, in the case of orders-- 995 00:52:18,030 --> 00:52:19,410 we forgot something. 996 00:52:19,410 --> 00:52:22,260 It would seem valuable if we continue this train of thought 997 00:52:22,260 --> 00:52:30,610 to also have here in my orders table an order ID that should probably represent 998 00:52:30,610 --> 00:52:35,110 each of these orders, which is just going to similarly be an integer that 999 00:52:35,110 --> 00:52:38,650 just keeps track really of how many total orders have been placed, 1000 00:52:38,650 --> 00:52:45,040 1, 2, 3, 4, 5, 6 on up, all the way up to 2 billion or best yet even higher 1001 00:52:45,040 --> 00:52:45,830 than that. 1002 00:52:45,830 --> 00:52:47,560 But notice these other numbers now. 1003 00:52:47,560 --> 00:52:51,610 The product column is no longer the name widget or sprocket. 1004 00:52:51,610 --> 00:52:53,770 The quantity column, still just an integer. 1005 00:52:53,770 --> 00:52:56,860 That's not anything to do with a key even though it's also an integer 1006 00:52:56,860 --> 00:52:59,440 but customer is an ID. 1007 00:52:59,440 --> 00:53:03,280 But it's not a primary key, nor is product a primary key here. 1008 00:53:03,280 --> 00:53:07,450 In this context of my orders table is product 1009 00:53:07,450 --> 00:53:11,710 and is customer a foreign key because those two columns are 1010 00:53:11,710 --> 00:53:14,870 primary keys in two other tables. 
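The roles of these keys can be sketched in SQL, again using SQLite from Python as a stand-in. The column names `product_id` and `customer_id` and the exact schema are assumptions for illustration. The sketch declares each table's ID as its primary key, declares the two columns in orders as foreign keys, shows the database refusing an order for a customer who does not exist, and then uses the keys to look the names back up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products  (id INTEGER PRIMARY KEY, product TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,                        -- the order ID
    product_id  INTEGER NOT NULL REFERENCES products(id),   -- foreign key
    quantity    INTEGER NOT NULL,                           -- a plain integer, not a key
    customer_id INTEGER NOT NULL REFERENCES customers(id)   -- foreign key
);
INSERT INTO customers (id, name) VALUES (1, 'Zamyla'), (2, 'Rob'), (3, 'Doug');
INSERT INTO products (id, product) VALUES (1, 'widget'), (2, 'sprocket');
INSERT INTO orders (id, product_id, quantity, customer_id)
VALUES (1, 1, 1, 1), (2, 1, 2, 2), (3, 1, 10, 3);
""")

# An order that points at a nonexistent customer 99 is rejected.
try:
    conn.execute("INSERT INTO orders (product_id, quantity, customer_id) VALUES (1, 1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Follow the foreign keys back to human-readable names.
rows = conn.execute("""
    SELECT orders.id, customers.name, products.product, orders.quantity
    FROM orders
    JOIN customers ON orders.customer_id = customers.id
    JOIN products  ON orders.product_id  = products.id
    ORDER BY orders.id
""").fetchall()
```

The orders table itself still holds nothing but numbers; the names only appear once the query matches each foreign key against the primary key it refers to.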
1011 00:53:14,870 --> 00:53:17,710 So within one table if you have an ID, it 1012 00:53:17,710 --> 00:53:19,750 should be generally considered your primary key 1013 00:53:19,750 --> 00:53:22,540 if that is the role it's playing, uniquely identifying your rows. 1014 00:53:22,540 --> 00:53:25,660 But if that same number appears in some other table 1015 00:53:25,660 --> 00:53:30,370 for the purpose of cross-referencing really, it is a foreign key. 1016 00:53:30,370 --> 00:53:34,480 And suffice it to say that in SQL, this database language, 1017 00:53:34,480 --> 00:53:38,230 even though this looks cryptic to us humans, realize that with SQL 1018 00:53:38,230 --> 00:53:42,790 can you stitch these distinct tables or sheets back together. 1019 00:53:42,790 --> 00:53:46,690 You can quote unquote join SQL tables in such a way 1020 00:53:46,690 --> 00:53:49,780 that you can take your customers table and your orders table 1021 00:53:49,780 --> 00:53:54,030 and reassemble them so that you see next to each order, 1022 00:53:54,030 --> 00:53:56,290 say, on your administrative web page that 1023 00:53:56,290 --> 00:53:58,450 allows you to see all of your recent orders, 1024 00:53:58,450 --> 00:54:01,947 not the customer IDs of who has bought what but actually 1025 00:54:01,947 --> 00:54:04,780 the customer names and their addresses and maybe their phone numbers 1026 00:54:04,780 --> 00:54:05,937 and e-mails and more. 1027 00:54:05,937 --> 00:54:08,020 You can join this information back together again. 1028 00:54:08,020 --> 00:54:12,400 And what databases are good at is doing exactly that kind of joining, 1029 00:54:12,400 --> 00:54:15,620 not to mention searching or more. 1030 00:54:15,620 --> 00:54:19,300 But sheesh, this was a lot of work just to get to this point, right? 1031 00:54:19,300 --> 00:54:23,140 It was pretty easy to make one worksheet and just put all of my orders in there.
1032 00:54:23,140 --> 00:54:26,751 But then we went down this slope of oh, well, maybe we should factor this out. 1033 00:54:26,751 --> 00:54:27,250 Oh, wait. 1034 00:54:27,250 --> 00:54:28,208 We can factor this out. 1035 00:54:28,208 --> 00:54:29,860 Oh, maybe we should add some IDs here. 1036 00:54:29,860 --> 00:54:32,540 We just created a whole lot of work for ourselves. 1037 00:54:32,540 --> 00:54:36,340 Now, I dare say it will pay off over the long run, 1038 00:54:36,340 --> 00:54:39,130 and indeed our database will be much better designed 1039 00:54:39,130 --> 00:54:42,370 where better design will lead to faster performance, less 1040 00:54:42,370 --> 00:54:46,360 redundant storage of data, and more, but it certainly took a lot of work. 1041 00:54:46,360 --> 00:54:49,090 So it turns out there is the opposite of a SQL database 1042 00:54:49,090 --> 00:54:53,020 that's been in vogue for some time called a noSQL database, 1043 00:54:53,020 --> 00:54:56,710 or a document store, an object-oriented database where 1044 00:54:56,710 --> 00:55:00,340 the defining characteristic really is that it is not SQL. 1045 00:55:00,340 --> 00:55:03,100 It does not store data in rows and columns. 1046 00:55:03,100 --> 00:55:06,880 It does not store data in one or more tables that can then be joined. 1047 00:55:06,880 --> 00:55:10,810 Rather it stores all of your data really all together 1048 00:55:10,810 --> 00:55:12,340 in a hierarchical structure. 1049 00:55:12,340 --> 00:55:15,410 And that's an oversimplification because there are other features. 1050 00:55:15,410 --> 00:55:17,500 But consider this example here. 1051 00:55:17,500 --> 00:55:21,460 This is written in essentially a format that's called JSON, JavaScript Object 1052 00:55:21,460 --> 00:55:23,920 Notation, but this idea of a noSQL database 1053 00:55:23,920 --> 00:55:27,490 has no fundamental connection to JavaScript the language. 
1054 00:55:27,490 --> 00:55:31,660 Just so happens this tends to be the language with which these data 1055 00:55:31,660 --> 00:55:33,430 structures are represented. 1056 00:55:33,430 --> 00:55:37,060 The curly brace here and here just means here is an object of information. 1057 00:55:37,060 --> 00:55:39,610 The quotes are just used around words and numbers, 1058 00:55:39,610 --> 00:55:43,440 and the colons separate keys from values where keys and values is 1059 00:55:43,440 --> 00:55:46,270 a very common paradigm where on the left is metadata 1060 00:55:46,270 --> 00:55:50,170 and on the right is data typically, key and value respectively. 1061 00:55:50,170 --> 00:55:52,060 Square brackets just mean an array, which 1062 00:55:52,060 --> 00:55:55,020 means that this is an array or a list of two values, 1063 00:55:55,020 --> 00:55:59,590 something comma something, which happens to be GPS coordinates, latitude 1064 00:55:59,590 --> 00:56:00,820 and longitude here. 1065 00:56:00,820 --> 00:56:01,690 So what is this? 1066 00:56:01,690 --> 00:56:02,860 What are we looking at? 1067 00:56:02,860 --> 00:56:05,680 This appears to be an object, shall we say, 1068 00:56:05,680 --> 00:56:09,220 that represents the city of Allston where Harvard's business 1069 00:56:09,220 --> 00:56:11,980 school is, where Harvard's engineering school will soon be. 1070 00:56:11,980 --> 00:56:16,030 And so this object contains a bit of hierarchical information, 1071 00:56:16,030 --> 00:56:18,760 not a huge amount, but notice it has an ID, 1072 00:56:18,760 --> 00:56:22,330 which happens to be its zip code 02134. 1073 00:56:22,330 --> 00:56:24,490 It has a city name, Allston. 1074 00:56:24,490 --> 00:56:29,770 Has a location which by convention is a comma-separated list of two values, 1075 00:56:29,770 --> 00:56:33,470 latitude and longitude, and so that's kind of some hierarchy. 1076 00:56:33,470 --> 00:56:35,260 It's not just a simple value.
1077 00:56:35,260 --> 00:56:40,750 And then there's a population of 23,775 at last count though surely to rise. 1078 00:56:40,750 --> 00:56:43,100 And then it's in the state of Massachusetts. 1079 00:56:43,100 --> 00:56:45,790 So this is actually a snippet from a database called 1080 00:56:45,790 --> 00:56:50,830 MongoDB, which is a very popular noSQL database that stores data essentially 1081 00:56:50,830 --> 00:56:51,740 like this. 1082 00:56:51,740 --> 00:56:55,060 So rather than flatten all of your data as is 1083 00:56:55,060 --> 00:56:57,580 the case in a relational database using SQL, 1084 00:56:57,580 --> 00:57:01,720 an object-oriented database or a document store like this noSQL database 1085 00:57:01,720 --> 00:57:06,070 called MongoDB really stores things as key value pairs. 1086 00:57:06,070 --> 00:57:10,280 And those key value pairs might actually have some hierarchical structure. 1087 00:57:10,280 --> 00:57:14,800 So if you, for instance, stored an order like we just did, 1088 00:57:14,800 --> 00:57:16,750 instead of storing it in rows and columns, 1089 00:57:16,750 --> 00:57:21,580 you would just store it as one big chunk of information like this. 1090 00:57:21,580 --> 00:57:25,390 And inside of that object, an order object 1091 00:57:25,390 --> 00:57:27,730 might actually be the entire customer. 1092 00:57:27,730 --> 00:57:35,400 Inside of that customer might be his or her city and state and so forth. 1093 00:57:35,400 --> 00:57:41,000 So that might actually be retained in some hierarchy like you see here. 1094 00:57:41,000 --> 00:57:43,850 And so this is just a different way of viewing the world.
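As a rough sketch of that idea, here are two such documents written as Python dictionaries and serialized to JSON. The field names follow the description above, but the coordinates are approximate placeholders rather than the values on the slide, and the shape of the order document (including Doug's zip code) is an assumption made up for illustration, not a MongoDB requirement.

```python
import json

# The city document as described; "loc" holds approximate placeholder coordinates.
city = {
    "_id": "02134",           # the zip code as text, so the leading zero survives
    "city": "Allston",
    "loc": [42.35, -71.13],   # latitude, longitude (approximate, for illustration)
    "pop": 23775,
    "state": "MA",
}

# A hypothetical order document: the whole customer is nested inside the order
# instead of being referenced by an ID in a separate table.
order = {
    "_id": 3,
    "product": "widget",
    "quantity": 10,
    "total": "99.90",
    "customer": {
        "name": "Doug Lloyd",
        "zip": "02138",       # assumed value
        "country": "USA",
    },
}

as_text = json.dumps(order, indent=2)  # what would be stored or sent over the wire
restored = json.loads(as_text)         # and read back as the same nested structure
customer_name = restored["customer"]["name"]
```

No join is needed to find out who placed the order, but if Doug orders again, his name and address are stored again inside the second document.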
1095 00:57:43,850 --> 00:57:47,892 It has typically been a more efficient way of viewing 1096 00:57:47,892 --> 00:57:49,600 and modeling your world because you don't 1097 00:57:49,600 --> 00:57:53,350 have to give frankly as much thought to the design and the division of some 1098 00:57:53,350 --> 00:57:55,460 of your data and the normalization thereof, 1099 00:57:55,460 --> 00:57:58,060 but you do sometimes pay a performance penalty. 1100 00:57:58,060 --> 00:58:01,330 You do sometimes pay a penalty in redundancy of data, 1101 00:58:01,330 --> 00:58:05,680 though there are ways to avoid that by reusing something like that ID field. 1102 00:58:05,680 --> 00:58:08,800 So it really is ultimately a different philosophy right now. 1103 00:58:08,800 --> 00:58:11,920 And it's a competing alternative to something like a relational database, 1104 00:58:11,920 --> 00:58:13,900 and here, too, will there be an opportunity 1105 00:58:13,900 --> 00:58:16,810 to read up on and to debate exactly what is 1106 00:58:16,810 --> 00:58:20,110 best for your actual problem at hand. 1107 00:58:20,110 --> 00:58:21,010 And now mobile. 1108 00:58:21,010 --> 00:58:23,170 Up until now we focused on the front end, 1109 00:58:23,170 --> 00:58:26,740 on the back end of really web-based applications 1110 00:58:26,740 --> 00:58:30,590 that you might access on a laptop or desktop or even a mobile device. 1111 00:58:30,590 --> 00:58:33,610 But what we haven't given thought to is the design opportunities 1112 00:58:33,610 --> 00:58:35,890 for mobile devices specifically. 1113 00:58:35,890 --> 00:58:39,630 Indeed, most any of you who have a smartphone these days, 1114 00:58:39,630 --> 00:58:41,680 iPhone, Android, or the like, have probably 1115 00:58:41,680 --> 00:58:45,580 downloaded some application that did not come with your phone.
1116 00:58:45,580 --> 00:58:49,270 And you downloaded that from the Google Play Store or the Apple App Store 1117 00:58:49,270 --> 00:58:54,950 and that software is quite likely written in a very specific language. 1118 00:58:54,950 --> 00:58:56,980 Indeed, the language for Android is typically 1119 00:58:56,980 --> 00:58:58,900 Java in which programs are written. 1120 00:58:58,900 --> 00:59:02,630 The languages in which iPhone and iPad applications are written 1121 00:59:02,630 --> 00:59:07,030 are Objective-C or more recently Swift, and so there, too, 1122 00:59:07,030 --> 00:59:09,070 at least in the world of iPhones and iPads, 1123 00:59:09,070 --> 00:59:11,584 do you have design discretion over what language 1124 00:59:11,584 --> 00:59:13,750 you use with Swift being the more modern and the one 1125 00:59:13,750 --> 00:59:15,400 that Apple's really been pushing. 1126 00:59:15,400 --> 00:59:21,180 But even then, do you have the option to not implement a native application per 1127 00:59:21,180 --> 00:59:26,280 se, one that is implemented in Java or in Objective-C or Swift, all of which 1128 00:59:26,280 --> 00:59:28,710 are programming languages. You can actually 1129 00:59:28,710 --> 00:59:33,240 implement a web-based application but package it up 1130 00:59:33,240 --> 00:59:37,230 in a way that makes it seem like it's a native application, 1131 00:59:37,230 --> 00:59:41,160 allows you to distribute it via the App Store, via the Google Play Store, 1132 00:59:41,160 --> 00:59:45,900 so that it ends up putting an icon on your customers' phones. 1133 00:59:45,900 --> 00:59:50,430 But when they click it, they're not seeing an iPhone application per se 1134 00:59:50,430 --> 00:59:52,510 or an Android application per se.
1135 00:59:52,510 --> 00:59:55,890 They are seeing really a secretly embedded web 1136 00:59:55,890 --> 00:59:58,920 browser whereby your application is implemented 1137 00:59:58,920 --> 01:00:02,490 at the end of the day in JavaScript and HTML and CSS, 1138 01:00:02,490 --> 01:00:05,250 but it's got a nice little rectangular window around it, 1139 01:00:05,250 --> 01:00:09,420 so the users don't realize that they're looking at Safari or Chrome 1140 01:00:09,420 --> 01:00:13,200 because all of the menus of those browsers have been stripped away. 1141 01:00:13,200 --> 01:00:15,600 All you get is an embedded web browser. 1142 01:00:15,600 --> 01:00:19,860 And so here, do you have an opportunity to choose among these options. 1143 01:00:19,860 --> 01:00:22,620 And so one of the design decisions one makes 1144 01:00:22,620 --> 01:00:28,260 when designing for a mobile user base is do we develop an Android application? 1145 01:00:28,260 --> 01:00:31,260 Do we develop an iPhone or iPad application? 1146 01:00:31,260 --> 01:00:34,560 Do we do both environments still? 1147 01:00:34,560 --> 01:00:36,090 And how do you choose among those? 1148 01:00:36,090 --> 01:00:38,460 Well, it certainly depends on your demographic. 1149 01:00:38,460 --> 01:00:42,030 Android is by far the most popular mobile operating system these days. 1150 01:00:42,030 --> 01:00:45,060 But in certain contexts, a campus like this, iPhones 1151 01:00:45,060 --> 01:00:47,040 are actually even more popular. 1152 01:00:47,040 --> 01:00:51,550 So do you want to cater to one demographic or another or ideally both? 1153 01:00:51,550 --> 01:00:55,770 Both is probably your instinctive answer, but that comes with a tradeoff. 1154 01:00:55,770 --> 01:00:57,960 That certainly comes with a price. 
1155 01:00:57,960 --> 01:01:01,556 If you want to ship some new and improved tool that you 1156 01:01:01,556 --> 01:01:03,430 want to make available to the world or a game 1157 01:01:03,430 --> 01:01:05,471 or any other piece of mobile software, well, it'd 1158 01:01:05,471 --> 01:01:09,120 be nice to have it available to all users with smartphones. 1159 01:01:09,120 --> 01:01:11,670 But then you're going to have to know how to program in Java. 1160 01:01:11,670 --> 01:01:14,503 You're going to have to know how to program in Swift or Objective-C. 1161 01:01:14,503 --> 01:01:18,720 Or you're going to have to know how to take this hybrid approach of developing 1162 01:01:18,720 --> 01:01:24,240 it using JavaScript and HTML and CSS, but there, too, there's a tradeoff. 1163 01:01:24,240 --> 01:01:27,540 You tend to get very good performance out of Android applications 1164 01:01:27,540 --> 01:01:31,411 that are natively written in Java and native applications in iOS that 1165 01:01:31,411 --> 01:01:32,910 are written in Objective-C or Swift. 1166 01:01:32,910 --> 01:01:34,470 They just tend to be very responsive. 1167 01:01:34,470 --> 01:01:36,900 They tend to follow a very similar paradigm. 1168 01:01:36,900 --> 01:01:39,810 Menus and buttons and so forth all tend to look and feel the same 1169 01:01:39,810 --> 01:01:41,640 and therefore be familiar to users. 1170 01:01:41,640 --> 01:01:43,290 And they're very responsive. 1171 01:01:43,290 --> 01:01:45,040 Touch a button, something happens quickly. 1172 01:01:45,040 --> 01:01:47,640 There doesn't seem typically to be much latency. 
1173 01:01:47,640 --> 01:01:51,480 In hybrid applications that are really written in JavaScript, HTML, and CSS, 1174 01:01:51,480 --> 01:01:55,410 especially if they're technically server side hosted on your servers 1175 01:01:55,410 --> 01:01:57,330 or in some cloud server, they might actually 1176 01:01:57,330 --> 01:02:00,780 feel a little slower because there's a whole internet between you 1177 01:02:00,780 --> 01:02:02,280 and your user's experience. 1178 01:02:02,280 --> 01:02:05,571 Or they might have to download more data that would be better to just bundle up 1179 01:02:05,571 --> 01:02:08,100 in the application itself, or the menus and the buttons 1180 01:02:08,100 --> 01:02:15,210 don't quite feel as native as the default Android and iOS user 1181 01:02:15,210 --> 01:02:16,042 experiences. 1182 01:02:16,042 --> 01:02:17,250 So a bit of a tradeoff there. 1183 01:02:17,250 --> 01:02:21,120 OK, so if you don't want to pay that penalty of performance and perception, 1184 01:02:21,120 --> 01:02:23,570 implement the Android app and the iOS app. 1185 01:02:23,570 --> 01:02:27,810 But now you need two developers or one developer who knows both platforms. 1186 01:02:27,810 --> 01:02:33,910 So that, too, comes with a cost, both in time or salary or talent or the like. 1187 01:02:33,910 --> 01:02:36,480 So there, too, it's not obvious how to go. 
1188 01:02:36,480 --> 01:02:41,370 And even more recently are there frameworks like Cordova, Ionic, Meteor, 1189 01:02:41,370 --> 01:02:44,160 React Native, Supersonic, Xamarin, and more 1190 01:02:44,160 --> 01:02:48,690 that actually offer yet a fourth option whereby you implement 1191 01:02:48,690 --> 01:02:52,650 your application in some neutral language like JavaScript, 1192 01:02:52,650 --> 01:02:57,210 and then using these frameworks, these tools that other people have kindly 1193 01:02:57,210 --> 01:02:59,610 or commercially developed for us to use, you 1194 01:02:59,610 --> 01:03:05,010 can essentially convert or translate that middle language JavaScript 1195 01:03:05,010 --> 01:03:10,890 to Objective-C or to Swift or to Java or really to the underlying code that 1196 01:03:10,890 --> 01:03:13,530 gets shipped ultimately to the app stores 1197 01:03:13,530 --> 01:03:17,040 so that you can actually develop native applications 1198 01:03:17,040 --> 01:03:18,960 but in an intermediate language. 1199 01:03:18,960 --> 01:03:21,900 But there the learning curve might be a little bit higher. 1200 01:03:21,900 --> 01:03:23,940 Indeed, the menu of options is even longer 1201 01:03:23,940 --> 01:03:26,110 than the list of native languages itself. 1202 01:03:26,110 --> 01:03:30,870 So that requires some learning curve or some time or some talent or money 1203 01:03:30,870 --> 01:03:31,510 again. 1204 01:03:31,510 --> 01:03:33,590 And so there, too, there are some tradeoffs. 1205 01:03:33,590 --> 01:03:36,660 And so it really depends ultimately on what is your application 1206 01:03:36,660 --> 01:03:39,750 and who you have working with you and what is most important 1207 01:03:39,750 --> 01:03:41,580 and how much time do you have and what do 1208 01:03:41,580 --> 01:03:45,870 you view the technological horizon looking like some months ahead? 
1209 01:03:45,870 --> 01:03:49,950 So at the end of the day, these technology stacks, as they're called, 1210 01:03:49,950 --> 01:03:52,360 are really just menus of options. 1211 01:03:52,360 --> 01:03:55,590 And those menus are constantly evolving, and they focus on the front end 1212 01:03:55,590 --> 01:03:57,720 and on the back end, on the server, on the client, 1213 01:03:57,720 --> 01:03:59,610 on mobile devices, laptops, and desktops. 1214 01:03:59,610 --> 01:04:01,800 There are solutions to any number of problems. 1215 01:04:01,800 --> 01:04:04,950 Indeed, the process of software engineering and developing a product 1216 01:04:04,950 --> 01:04:07,620 and developing a web app or a native application 1217 01:04:07,620 --> 01:04:11,610 itself is first doing some due diligence and bringing yourself up to speed 1218 01:04:11,610 --> 01:04:14,245 on what the design possibilities are, having a discussion, 1219 01:04:14,245 --> 01:04:17,120 having a debate even, with the engineers with whom you'll be working, 1220 01:04:17,120 --> 01:04:19,730 and ultimately making the most informed decision that you can 1221 01:04:19,730 --> 01:04:22,430 with an eye toward what is trending now, what has been trending, 1222 01:04:22,430 --> 01:04:25,100 and where the industry might be going but ultimately 1223 01:04:25,100 --> 01:04:28,400 focusing on solving optimally your own problems 1224 01:04:28,400 --> 01:04:33,750 and choosing among these various and ever-changing technology stacks. 1225 01:04:33,750 --> 01:04:35,363