1 00:00:00,000 --> 00:00:03,451 [MUSIC PLAYING] 2 00:00:03,451 --> 00:00:20,230 3 00:00:20,230 --> 00:00:23,530 CARTER ZENKE: Well, hello, one and all, and welcome back to CS50's Introduction 4 00:00:23,530 --> 00:00:25,960 to Programming with R. My name is Carter Zenke, 5 00:00:25,960 --> 00:00:29,470 and this is our lecture on packaging programs, by which we can share them 6 00:00:29,470 --> 00:00:30,790 with the world. 7 00:00:30,790 --> 00:00:33,820 Now, together today, we'll make a package called ducksay. 8 00:00:33,820 --> 00:00:38,110 And the goal of ducksay is to take as input some character string and output 9 00:00:38,110 --> 00:00:41,690 a textual representation of a duck saying that character string. 10 00:00:41,690 --> 00:00:43,990 So for instance, if I typed in "hello, world," 11 00:00:43,990 --> 00:00:48,940 I might see in my console now this duck saying hello to the world. 12 00:00:48,940 --> 00:00:52,190 Now, if you're familiar with CS50 or with programming more generally, 13 00:00:52,190 --> 00:00:55,390 you might have heard of a package or a program called cowsay. 14 00:00:55,390 --> 00:00:57,460 And cowsay is very similar in spirit. 15 00:00:57,460 --> 00:01:02,500 It takes some piece of text and outputs in your terminal a cow saying that text. 16 00:01:02,500 --> 00:01:06,940 Now, cowsay is not an R package, but we will take some inspiration from it 17 00:01:06,940 --> 00:01:10,990 to make our own package, one called ducksay. 18 00:01:10,990 --> 00:01:14,200 But to do so, we need to learn more about packages. 19 00:01:14,200 --> 00:01:17,050 Up until now, we've been users of packages. 20 00:01:17,050 --> 00:01:19,970 We have seen how to install, how to load, 21 00:01:19,970 --> 00:01:22,850 and how to use functions inside of packages. 22 00:01:22,850 --> 00:01:26,960 But what we haven't seen is the source code behind our packages. 23 00:01:26,960 --> 00:01:28,550 How are they made?
24 00:01:28,550 --> 00:01:30,710 Well, it turns out when you download a package 25 00:01:30,710 --> 00:01:34,880 from somewhere like CRAN, what you get is a single binary file. 26 00:01:34,880 --> 00:01:38,780 But thankfully, you yourself don't have to write binary to create a package. 27 00:01:38,780 --> 00:01:42,920 You can instead use what we'll call source code-- the individual dot R files 28 00:01:42,920 --> 00:01:45,590 and folders that compose your package. 29 00:01:45,590 --> 00:01:48,230 And when it comes time to share your package with others, 30 00:01:48,230 --> 00:01:51,290 we'll take the source code and build it or compile it 31 00:01:51,290 --> 00:01:55,070 into some single file you can share around the world. 32 00:01:55,070 --> 00:01:57,410 But let's focus first on source code. 33 00:01:57,410 --> 00:02:00,890 I'll come back now to RStudio to make some source code for this package. 34 00:02:00,890 --> 00:02:03,290 And by convention, we write the source code 35 00:02:03,290 --> 00:02:07,680 for some package inside a folder that has that package's name. 36 00:02:07,680 --> 00:02:10,039 So if I want to make a package called ducksay, 37 00:02:10,039 --> 00:02:14,780 I should first create a folder in my working directory called ducksay. 38 00:02:14,780 --> 00:02:18,890 To do so, I can use this function here, dir.create, 39 00:02:18,890 --> 00:02:21,950 to create a directory-- in this case, called ducksay. 40 00:02:21,950 --> 00:02:23,220 I'll hit Enter here. 41 00:02:23,220 --> 00:02:28,580 And notice how in my File Explorer, I see a new folder called ducksay. 42 00:02:28,580 --> 00:02:33,050 Well, I want to make all my future files and folders inside of this folder here-- 43 00:02:33,050 --> 00:02:34,100 my future package. 44 00:02:34,100 --> 00:02:38,480 So I'll now set my working directory to ducksay.
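A minimal sketch of the two console steps just described. One assumption beyond the lecture: it builds the folder inside a throwaway directory from `tempfile()` and restores the old working directory afterward, so experimenting doesn't disturb your real files.

```r
# Assumption: work under a throwaway temporary directory, not your real project folder
parent <- tempfile("packages-")
dir.create(parent)

# By convention, a package's source code lives in a folder named after the package
pkg_dir <- file.path(parent, "ducksay")
dir.create(pkg_dir)

# setwd() changes the working directory and invisibly returns the previous one
old_wd <- setwd(pkg_dir)
print(basename(getwd()))  # the working directory is now the ducksay folder

# Restore the original working directory when done experimenting
setwd(old_wd)
```

In the lecture itself, the working directory simply stays inside ducksay, so every later file lands in the package folder.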
45 00:02:38,480 --> 00:02:41,660 And now, it's a bit like me going inside of this folder, 46 00:02:41,660 --> 00:02:45,680 by clicking on it and seeing this blank slate in which we can begin writing 47 00:02:45,680 --> 00:02:48,200 the source code for our package. 48 00:02:48,200 --> 00:02:51,080 Now, this blank slate is a bit scary at first, 49 00:02:51,080 --> 00:02:53,000 but thankfully, there's a structure to how 50 00:02:53,000 --> 00:02:56,810 packages are organized in R. Let's take a look at what a typical package might 51 00:02:56,810 --> 00:02:57,710 look like. 52 00:02:57,710 --> 00:03:00,350 So in general, if you're making a package in R, 53 00:03:00,350 --> 00:03:03,050 you will tend to have these files and folders here. 54 00:03:03,050 --> 00:03:08,030 First of all, one called DESCRIPTION in all caps, which describes your package-- 55 00:03:08,030 --> 00:03:12,290 what is the name of it, what version number is it, who wrote it, and so on. 56 00:03:12,290 --> 00:03:14,810 You'll also have a file called NAMESPACE, 57 00:03:14,810 --> 00:03:17,300 which you'll use to define the functions in your package 58 00:03:17,300 --> 00:03:20,180 some end user might be able to use. 59 00:03:20,180 --> 00:03:24,770 And then you'll have some folders here-- one called man, which stands for manual. 60 00:03:24,770 --> 00:03:28,220 You'll put inside that folder the documentation for your functions-- 61 00:03:28,220 --> 00:03:30,110 some instructions on how to use them. 62 00:03:30,110 --> 00:03:32,240 And here we have an R folder, too. 63 00:03:32,240 --> 00:03:36,320 That's where we'll place the R files that have your function definitions inside 64 00:03:36,320 --> 00:03:36,920 of them. 65 00:03:36,920 --> 00:03:39,990 And of course, we'll need some tests to test our code, 66 00:03:39,990 --> 00:03:42,792 so we'll put those in this folder called tests.
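To make the layout concrete, here is a sketch that creates the core files and folders listed above using base R alone (the temporary parent directory is an assumption). The files start out empty, so this only shows the shape of a package, not a working one; DESCRIPTION gets its fields in the steps that follow.

```r
# Assumption: a temporary location stands in for your own working directory
pkg_dir <- file.path(tempfile("packages-"), "ducksay")
dir.create(pkg_dir, recursive = TRUE)

# Core files: DESCRIPTION (package metadata) and NAMESPACE (what users can call)
invisible(file.create(file.path(pkg_dir, c("DESCRIPTION", "NAMESPACE"))))

# Core folders: man (documentation), R (function definitions), tests (unit tests)
for (folder in c("man", "R", "tests")) {
  dir.create(file.path(pkg_dir, folder))
}

print(list.files(pkg_dir))
```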
67 00:03:42,792 --> 00:03:44,750 Now, if you're running a more advanced package, 68 00:03:44,750 --> 00:03:46,542 you might have other files and folders too, 69 00:03:46,542 --> 00:03:52,100 but these are the core ones that say this folder here is an R package. 70 00:03:52,100 --> 00:03:53,360 So let's begin. 71 00:03:53,360 --> 00:03:55,640 Let's begin with our description file and describe 72 00:03:55,640 --> 00:03:57,230 the package we want to create. 73 00:03:57,230 --> 00:04:02,900 I'll come back now to RStudio and create for this folder here a description file. 74 00:04:02,900 --> 00:04:06,380 I can do so with file.create, and I'll 75 00:04:06,380 --> 00:04:10,730 choose to make a DESCRIPTION file, in all caps with no file extension. 76 00:04:10,730 --> 00:04:14,720 Notice how now inside my ducksay folder, I have this file 77 00:04:14,720 --> 00:04:19,250 called DESCRIPTION, which I can open up and see, well, another blank slate. 78 00:04:19,250 --> 00:04:21,649 And so it turns out here too, there are some conventions 79 00:04:21,649 --> 00:04:24,623 on how to organize our description file. 80 00:04:24,623 --> 00:04:26,540 In fact, our description file will be composed 81 00:04:26,540 --> 00:04:30,980 of individual fields that tell us some information about this package. 82 00:04:30,980 --> 00:04:33,020 But what are those fields? 83 00:04:33,020 --> 00:04:36,770 Well, the ones you need to know about-- you need to include in your file-- 84 00:04:36,770 --> 00:04:38,060 are these here. 85 00:04:38,060 --> 00:04:41,060 One called Package, which is the name of our package 86 00:04:41,060 --> 00:04:43,550 as we want somebody to install it. 87 00:04:43,550 --> 00:04:46,550 We've seen install.packages, and we'll often 88 00:04:46,550 --> 00:04:49,580 include the name of the package we want to install as the input 89 00:04:49,580 --> 00:04:50,472 to that function.
90 00:04:50,472 --> 00:04:52,430 Well, this is the name we want the user to type 91 00:04:52,430 --> 00:04:56,450 in to install.packages to install our package. 92 00:04:56,450 --> 00:04:59,160 We also have to say the title of our package, which 93 00:04:59,160 --> 00:05:00,410 is a bit more English-friendly. 94 00:05:00,410 --> 00:05:02,760 You could capitalize things, include spaces, and so on, 95 00:05:02,760 --> 00:05:05,570 but similar idea to the package field as well. 96 00:05:05,570 --> 00:05:07,425 We have here a description as well. 97 00:05:07,425 --> 00:05:09,050 What is the description of our package? 98 00:05:09,050 --> 00:05:10,760 What does it do, and so on? 99 00:05:10,760 --> 00:05:11,990 And a version number. 100 00:05:11,990 --> 00:05:12,920 What version is it? 101 00:05:12,920 --> 00:05:17,930 If we change it over time, is it version 1.0, 2.0, 3.0, or so on? 102 00:05:17,930 --> 00:05:20,600 And then finally here, we have some information on the authors. 103 00:05:20,600 --> 00:05:22,010 Who wrote this package? 104 00:05:22,010 --> 00:05:23,210 What was their role? 105 00:05:23,210 --> 00:05:25,250 And then a license. 106 00:05:25,250 --> 00:05:27,680 That is, what are the legal terms under which you can actually 107 00:05:27,680 --> 00:05:31,340 use the code if you want to install this package and use it yourself? 108 00:05:31,340 --> 00:05:34,820 So let's now add these fields to our description file 109 00:05:34,820 --> 00:05:37,040 and make it an R package. 110 00:05:37,040 --> 00:05:39,530 I'll come back now to RStudio, and let's go ahead 111 00:05:39,530 --> 00:05:41,810 and start by naming our package. 112 00:05:41,810 --> 00:05:44,660 Well, we said before, our package will be called ducksay.
113 00:05:44,660 --> 00:05:49,370 So in the description file, I'll add this package field followed by a colon, 114 00:05:49,370 --> 00:05:53,030 and I'll say that our package is named, in this case, ducksay-- 115 00:05:53,030 --> 00:05:55,020 just some English text here. 116 00:05:55,020 --> 00:05:59,660 Then I'll add the title, and I'll choose to maybe title case our package here. 117 00:05:59,660 --> 00:06:02,450 I'll call it Duck Say, just like this, and I'll 118 00:06:02,450 --> 00:06:04,050 give a description to it as well. 119 00:06:04,050 --> 00:06:08,780 I'll say the description of this package is going to be "Say hello with a duck." 120 00:06:08,780 --> 00:06:11,690 The purpose is to say hello with a duck. 121 00:06:11,690 --> 00:06:14,330 And this is the first version of the package. 122 00:06:14,330 --> 00:06:18,290 I'm just starting out on my journey of developing this package. 123 00:06:18,290 --> 00:06:21,210 But now I need to include some more information too, 124 00:06:21,210 --> 00:06:24,810 like who wrote this package and what are the legal terms in which we can actually 125 00:06:24,810 --> 00:06:26,400 use the package too? 126 00:06:26,400 --> 00:06:32,010 So to define who wrote the package, I could use this field called Authors@R. 127 00:06:32,010 --> 00:06:35,550 And whereas up above, we've been using some English text, just 128 00:06:35,550 --> 00:06:38,580 regular old characters and so on, in this field, 129 00:06:38,580 --> 00:06:43,020 we can actually use an R function to define who wrote the package 130 00:06:43,020 --> 00:06:45,360 and what role they played. 131 00:06:45,360 --> 00:06:48,720 Now, to add a new author to this package, I'll use a function, 132 00:06:48,720 --> 00:06:50,400 one called person. 133 00:06:50,400 --> 00:06:52,680 And if I look at documentation, I would know 134 00:06:52,680 --> 00:06:57,130 that the first argument to this person function is the person's first name. 
135 00:06:57,130 --> 00:06:58,860 So my first name is Carter. 136 00:06:58,860 --> 00:07:01,060 And the second argument is their last name. 137 00:07:01,060 --> 00:07:03,060 So in this case, my last name is Zenke. 138 00:07:03,060 --> 00:07:07,200 And there are a few other parameters as well that this person function can take-- 139 00:07:07,200 --> 00:07:08,580 namely, an email. 140 00:07:08,580 --> 00:07:13,920 I could say my email here is carter@cs50.harvard.edu. 141 00:07:13,920 --> 00:07:16,860 And this lets people know if they want to contact the package author, 142 00:07:16,860 --> 00:07:19,620 they can email me at this email here. 143 00:07:19,620 --> 00:07:24,300 And then we should also specify what role each person played. 144 00:07:24,300 --> 00:07:28,210 So there's a parameter here called role as well. 145 00:07:28,210 --> 00:07:31,440 Now, because package authors can play more than one role, 146 00:07:31,440 --> 00:07:34,890 role will take as input a vector of roles. 147 00:07:34,890 --> 00:07:36,973 And there is actually a defined set of roles, 148 00:07:36,973 --> 00:07:39,390 which you can learn if you look at the documentation here, 149 00:07:39,390 --> 00:07:41,700 but I'll focus on a few in particular. 150 00:07:41,700 --> 00:07:46,200 One role is an author of the package-- somebody who contributed to it in full. 151 00:07:46,200 --> 00:07:50,370 And we denote somebody as an author by typing in aut-- 152 00:07:50,370 --> 00:07:52,650 this abbreviation here for author. 153 00:07:52,650 --> 00:07:55,560 So in this case, I'm saying that me, Carter Zenke, I'm 154 00:07:55,560 --> 00:07:58,020 an author of this package. 155 00:07:58,020 --> 00:07:59,520 Now, there are more roles, too. 156 00:07:59,520 --> 00:08:02,790 One role that is important as well is the creator 157 00:08:02,790 --> 00:08:05,100 of the package, which we denote with cre. 158 00:08:05,100 --> 00:08:07,200 It's the abbreviation for creator.
159 00:08:07,200 --> 00:08:11,550 Now, creator and author seem pretty similar in meaning, 160 00:08:11,550 --> 00:08:14,280 but in R, they have two distinct meanings. 161 00:08:14,280 --> 00:08:18,570 An author is anybody who at any time contributed to the package. 162 00:08:18,570 --> 00:08:22,650 A creator, though, is the person who now maintains the package. 163 00:08:22,650 --> 00:08:25,890 They're in charge of updating it, making sure it's up to date and so on, 164 00:08:25,890 --> 00:08:26,620 over time. 165 00:08:26,620 --> 00:08:28,537 So these two are distinct, and there has to be 166 00:08:28,537 --> 00:08:32,799 at least one creator of some package-- a person who's maintaining it over time. 167 00:08:32,799 --> 00:08:35,530 So these are two main roles, but there is one more as well. 168 00:08:35,530 --> 00:08:40,200 I should also say that I am the copyright holder for this code. 169 00:08:40,200 --> 00:08:42,900 I own it, and I'm the person who owns a license that I will then 170 00:08:42,900 --> 00:08:45,430 specify underneath myself here. 171 00:08:45,430 --> 00:08:46,800 So these three roles-- 172 00:08:46,800 --> 00:08:49,620 author, creator, and copyright holder-- are the ones 173 00:08:49,620 --> 00:08:52,830 you will need to create a package in R. 174 00:08:52,830 --> 00:08:55,650 But now let's think through the license field-- 175 00:08:55,650 --> 00:08:57,120 license down below. 176 00:08:57,120 --> 00:09:00,342 Well, if you're a lawyer, you could maybe write your own license. 177 00:09:00,342 --> 00:09:03,300 But in general, it's best practice to rely on some standardized license 178 00:09:03,300 --> 00:09:04,710 that already exists. 
179 00:09:04,710 --> 00:09:08,130 And in fact, if you want to share your code for free online for others 180 00:09:08,130 --> 00:09:10,410 to use freely as well, there's a whole community 181 00:09:10,410 --> 00:09:14,970 that has created various licenses-- one called the Free and Open Source Software 182 00:09:14,970 --> 00:09:15,600 community. 183 00:09:15,600 --> 00:09:17,440 This community has created several licenses 184 00:09:17,440 --> 00:09:22,800 that you can just use and adopt or adapt to share your software online for free. 185 00:09:22,800 --> 00:09:26,970 Among the typical licenses are these-- the MIT license, by a bunch of friends 186 00:09:26,970 --> 00:09:30,360 down the road, and the GNU General Public License, similar in spirit 187 00:09:30,360 --> 00:09:32,800 to the MIT license as well. 188 00:09:32,800 --> 00:09:35,730 Now, the MIT license begins as follows. 189 00:09:35,730 --> 00:09:38,150 "Permission is hereby granted, free of charge, 190 00:09:38,150 --> 00:09:40,192 to any person obtaining a copy of this software," 191 00:09:40,192 --> 00:09:42,567 basically to deal with the software without restriction-- 192 00:09:42,567 --> 00:09:44,400 so saying you can share this code freely, 193 00:09:44,400 --> 00:09:46,720 and you can use it freely as well. 194 00:09:46,720 --> 00:09:49,655 So I might want to adopt this license for my software here, 195 00:09:49,655 --> 00:09:52,750 and I'll go ahead and say that in my description file as well. 196 00:09:52,750 --> 00:09:54,750 I'll come back over here to RStudio and say, 197 00:09:54,750 --> 00:09:57,630 I want to build on top of the MIT license. 198 00:09:57,630 --> 00:10:00,670 I want to license my software under this language here. 199 00:10:00,670 --> 00:10:05,530 So I could simply type in MIT as the template for my license now. 200 00:10:05,530 --> 00:10:09,450 But it turns out that in R, there's a bit more we need to specify here. 
201 00:10:09,450 --> 00:10:12,510 I should also specify the year which I created this software 202 00:10:12,510 --> 00:10:16,530 and who the copyright holder is inside the license itself. 203 00:10:16,530 --> 00:10:18,280 This is particular to MIT license as well. 204 00:10:18,280 --> 00:10:23,113 So if I want to add on to this MIT license, I can take MIT as my template 205 00:10:23,113 --> 00:10:25,280 but then add on some other file that gives some more 206 00:10:25,280 --> 00:10:26,780 information about this license. 207 00:10:26,780 --> 00:10:31,160 I could say MIT + file and then LICENSE in all caps. 208 00:10:31,160 --> 00:10:32,150 And this is convention. 209 00:10:32,150 --> 00:10:34,490 If I want to add on to my license, I do so 210 00:10:34,490 --> 00:10:37,490 with a file called LICENSE in all caps. 211 00:10:37,490 --> 00:10:40,430 And now this says the entire license for my software 212 00:10:40,430 --> 00:10:44,090 is the MIT license as the base, plus some file 213 00:10:44,090 --> 00:10:46,610 that I'll include called license. 214 00:10:46,610 --> 00:10:48,200 Well, let's create it now. 215 00:10:48,200 --> 00:10:50,960 I'll get on my console here and say file.create. 216 00:10:50,960 --> 00:10:53,330 I'll create a file called license-- 217 00:10:53,330 --> 00:10:57,590 no extension here-- open it up, and now add in some placeholders 218 00:10:57,590 --> 00:10:58,610 that I should fill. 219 00:10:58,610 --> 00:11:00,770 For the MIT license in particular, I need 220 00:11:00,770 --> 00:11:04,200 to say, again, what year it was that I created this software. 221 00:11:04,200 --> 00:11:07,280 So I'll say YEAR in all caps and the year I created the software, 222 00:11:07,280 --> 00:11:10,490 and then the copyright holder as well. 223 00:11:10,490 --> 00:11:12,860 In this case I'll say just the ducksay authors, 224 00:11:12,860 --> 00:11:17,330 referencing again this author's field in my description file. 
225 00:11:17,330 --> 00:11:20,610 So now we have the basic bare-bone structure 226 00:11:20,610 --> 00:11:22,770 for our package called ducksay. 227 00:11:22,770 --> 00:11:26,460 We have a description, a license, authorship, and so on. 228 00:11:26,460 --> 00:11:31,320 Let me ask, what questions do we have so far on packages in R 229 00:11:31,320 --> 00:11:33,030 and creating those of our very own? 230 00:11:33,030 --> 00:11:35,790 231 00:11:35,790 --> 00:11:38,910 AUDIENCE: Which are the best practices to enumerate the version number, 232 00:11:38,910 --> 00:11:42,810 and when do we choose when to increment the number to the left of the point 233 00:11:42,810 --> 00:11:44,580 and the one to the right of the point? 234 00:11:44,580 --> 00:11:46,955 CARTER ZENKE: So a good question about version numbering. 235 00:11:46,955 --> 00:11:49,230 I want to change the version number that we're using. 236 00:11:49,230 --> 00:11:51,272 Let me show you what we're doing now and show you 237 00:11:51,272 --> 00:11:53,460 another process called semantic versioning, too. 238 00:11:53,460 --> 00:11:57,870 I'll come back over here, and here I'm using just 1.0 for simplicity, 239 00:11:57,870 --> 00:12:00,090 but actually it turns out some of the community use 240 00:12:00,090 --> 00:12:02,220 a convention known as semantic versioning. 241 00:12:02,220 --> 00:12:06,180 And that versioning system actually allows for three numbers here. 242 00:12:06,180 --> 00:12:09,720 Now by convention, each of these numbers has some certain meaning. 243 00:12:09,720 --> 00:12:13,590 In this case, this last number is known as the patch version. 244 00:12:13,590 --> 00:12:17,520 If you make some bugfix, you would increment this number here. 245 00:12:17,520 --> 00:12:20,730 This middle number is known as the minor version. 246 00:12:20,730 --> 00:12:23,250 If you add in some new feature, like a new function, 247 00:12:23,250 --> 00:12:26,160 for instance, you would increment that number too. 
248 00:12:26,160 --> 00:12:29,880 But it turns out that this first number, like 1 here, 249 00:12:29,880 --> 00:12:32,040 is known as the major version. 250 00:12:32,040 --> 00:12:34,170 You would only increment this if you made 251 00:12:34,170 --> 00:12:39,030 some change that broke the conventions you prior used in your package 252 00:12:39,030 --> 00:12:40,780 and somebody who's relying on your package 253 00:12:40,780 --> 00:12:44,740 would have to update their code too to still use your package. 254 00:12:44,740 --> 00:12:47,770 So this is one set of conventions for versioning here. 255 00:12:47,770 --> 00:12:53,730 We'll go back, though, and use 1.0 just for simplicity. 256 00:12:53,730 --> 00:12:54,360 OK. 257 00:12:54,360 --> 00:12:58,170 So we have here now our basic bare-bones R package, 258 00:12:58,170 --> 00:13:01,050 but our goal is to also add some source code. 259 00:13:01,050 --> 00:13:04,260 And we can do it in the same way we've usually written source code before-- 260 00:13:04,260 --> 00:13:06,750 by first writing some tests, writing our code, 261 00:13:06,750 --> 00:13:09,300 and writing documentation for our code. 262 00:13:09,300 --> 00:13:12,960 Now, we saw before, if I want to test inside of my package-- 263 00:13:12,960 --> 00:13:14,860 I want to run some unit tests as well-- 264 00:13:14,860 --> 00:13:19,080 I should write them inside of this folder called tests. 265 00:13:19,080 --> 00:13:21,960 And we can certainly create this folder ourselves, 266 00:13:21,960 --> 00:13:25,260 but as we get into the weeds of structuring R package, 267 00:13:25,260 --> 00:13:27,150 we might want some help. 268 00:13:27,150 --> 00:13:31,860 So thankfully, there is a package that helps you write packages in R, 269 00:13:31,860 --> 00:13:33,690 one called devtools. 
270 00:13:33,690 --> 00:13:37,380 This package helps us give us some tools to write our very own package, 271 00:13:37,380 --> 00:13:41,100 and among the functions it has for unit testing are these-- 272 00:13:41,100 --> 00:13:43,840 one called use_testthat. 273 00:13:43,840 --> 00:13:46,170 So we saw testthat last time. 274 00:13:46,170 --> 00:13:49,260 It's a package for unit testing R software. 275 00:13:49,260 --> 00:13:53,440 If I want to use the testthat package to test my code, all I have to do 276 00:13:53,440 --> 00:13:55,750 is use use_testthat. 277 00:13:55,750 --> 00:13:59,230 And then, once I've done that, if I want to create some testing 278 00:13:59,230 --> 00:14:02,770 file for a function, I could simply use use_test, 279 00:14:02,770 --> 00:14:06,670 and that will create for me a new testing file for my function. 280 00:14:06,670 --> 00:14:10,210 And then finally, once I have all those tests written, 281 00:14:10,210 --> 00:14:14,930 if I want to run those tests, all I have to do is run the test function as well. 282 00:14:14,930 --> 00:14:18,220 So very helpful for us for structuring our package and running our unit 283 00:14:18,220 --> 00:14:19,478 tests too. 284 00:14:19,478 --> 00:14:22,270 So let's begin by writing some unit tests for our ducksay function. 285 00:14:22,270 --> 00:14:26,190 I'll come back now to RStudio, and let's try to use use_testthat. 286 00:14:26,190 --> 00:14:28,690 Well, because this function is part of the devtools package, 287 00:14:28,690 --> 00:14:32,290 I'll first need to load, if not install, the devtools package. 288 00:14:32,290 --> 00:14:37,450 So here down below, I'll use library devtools to load the devtools package, 289 00:14:37,450 --> 00:14:39,280 assuming it is installed. 290 00:14:39,280 --> 00:14:44,560 I'll hit Enter here, and now, let's use use_testthat to configure our package 291 00:14:44,560 --> 00:14:47,500 to run tests with the package testthat. 
292 00:14:47,500 --> 00:14:51,580 Well, I'll use down below here use_testthat, and I'll hit Enter, 293 00:14:51,580 --> 00:14:55,360 and we'll see a few things have happened actually that I'll see in my console 294 00:14:55,360 --> 00:14:56,260 down below. 295 00:14:56,260 --> 00:15:00,970 Now, the first thing I see is that my description file has been added to. 296 00:15:00,970 --> 00:15:05,440 I see now this new field called Suggests, and as part of Suggests, 297 00:15:05,440 --> 00:15:07,450 I now see testthat. 298 00:15:07,450 --> 00:15:10,690 This means when somebody installs R package, 299 00:15:10,690 --> 00:15:14,770 it will be suggested that they also install testthat at a version 300 00:15:14,770 --> 00:15:17,770 greater than or equal to 3.0. 301 00:15:17,770 --> 00:15:19,915 Well, why would we suggest testthat? 302 00:15:19,915 --> 00:15:22,540 Well, maybe these want to test R code themselves, in which case 303 00:15:22,540 --> 00:15:26,360 they'll need to use testthat, because we used it ourselves as well. 304 00:15:26,360 --> 00:15:29,770 If you want somebody, though, not just to suggest some code-- it would actually 305 00:15:29,770 --> 00:15:32,080 be suggested to use your code-- you could also 306 00:15:32,080 --> 00:15:34,880 make it required they install some other package as well. 307 00:15:34,880 --> 00:15:37,820 So I can make a field called Requires, like this, 308 00:15:37,820 --> 00:15:41,020 and list any packages I want to require the user 309 00:15:41,020 --> 00:15:43,630 to install to use my own package. 310 00:15:43,630 --> 00:15:46,610 Now, the user here likely won't be testing our software for us, 311 00:15:46,610 --> 00:15:50,075 so only suggest it, not require it, but if you do want to require some code-- 312 00:15:50,075 --> 00:15:53,200 some package-- you can actually use Requires as a field in your description 313 00:15:53,200 --> 00:15:55,270 file as well. 314 00:15:55,270 --> 00:15:58,860 I also see here config/testthat/edition. 
315 00:15:58,860 --> 00:16:01,830 This just means that when our tests are run, 316 00:16:01,830 --> 00:16:05,800 we'll be sure to use the version 3 of testthat. 317 00:16:05,800 --> 00:16:07,900 But a few other things have happened as well. 318 00:16:07,900 --> 00:16:10,020 If I look down below my console here, I'll 319 00:16:10,020 --> 00:16:14,790 see it's created some new folders for me-- namely, one called tests over here. 320 00:16:14,790 --> 00:16:19,410 If I click on tests, I'll see well a new file, testthat.R, 321 00:16:19,410 --> 00:16:22,110 and a folder, also called testthat. 322 00:16:22,110 --> 00:16:26,730 If I open up testthat.R, this is a file that was automatically created for me, 323 00:16:26,730 --> 00:16:29,370 and it includes some configuration for testthat 324 00:16:29,370 --> 00:16:32,070 as I run my tests inside of this package. 325 00:16:32,070 --> 00:16:34,680 We'll leave this alone for now, but notice how I also 326 00:16:34,680 --> 00:16:36,930 have a folder called testthat. 327 00:16:36,930 --> 00:16:41,100 And it's inside this folder that I will actually write my unit tests themselves, 328 00:16:41,100 --> 00:16:44,040 similar to what we saw last time. 329 00:16:44,040 --> 00:16:47,910 Now that the structure been set up, I can actually write my tests now. 330 00:16:47,910 --> 00:16:52,200 And see how it's suggesting that I use use_test to create my very first unit 331 00:16:52,200 --> 00:16:54,330 test for my function. 332 00:16:54,330 --> 00:16:58,060 Well, here I want to use_test, and I want 333 00:16:58,060 --> 00:17:02,890 to create a test for, in this case, a function called ducksay. 334 00:17:02,890 --> 00:17:07,810 So I'll enter as input to use_test the function's name, ducksay, just 335 00:17:07,810 --> 00:17:08,619 like that. 336 00:17:08,619 --> 00:17:11,680 I'll hit Enter now, and I'll see a few things happen again. 
337 00:17:11,680 --> 00:17:16,390 One, I now have this new file, called test-ducksay.R, 338 00:17:16,390 --> 00:17:21,700 which is inside my testthat folder, which itself is inside my tests folder. 339 00:17:21,700 --> 00:17:25,240 And now I can modify test-ducksay.R. 340 00:17:25,240 --> 00:17:28,990 It's given me here some basic structure for my test file, 341 00:17:28,990 --> 00:17:30,550 but I don't want to use this so far. 342 00:17:30,550 --> 00:17:31,640 I'll just remove it. 343 00:17:31,640 --> 00:17:34,870 And now, I want to think about how I could describe ducksay. 344 00:17:34,870 --> 00:17:38,620 What do I want it to do in this testing file? 345 00:17:38,620 --> 00:17:42,340 Well, we saw last time we could use some code a bit 346 00:17:42,340 --> 00:17:45,970 like this to describe how I want to how we want ducksay to run. 347 00:17:45,970 --> 00:17:49,900 I could say describe and then use ducksay here 348 00:17:49,900 --> 00:17:53,600 to say I'm going to describe how I want ducksay to run. 349 00:17:53,600 --> 00:17:54,850 Well, what do I want it to do? 350 00:17:54,850 --> 00:17:58,000 I think the first thing I want it to do is to work with cat. 351 00:17:58,000 --> 00:18:03,400 So I could say it can print to the console with cat. 352 00:18:03,400 --> 00:18:08,710 And now, I'll include some test to see if ducksay can print with cat. 353 00:18:08,710 --> 00:18:10,870 And what I mean by this is as follows. 354 00:18:10,870 --> 00:18:14,620 If I use cat here and gave as input ducksay, 355 00:18:14,620 --> 00:18:18,400 I should see the output of ducks in the console. 356 00:18:18,400 --> 00:18:21,670 Ducksay will simply return to me some character string, 357 00:18:21,670 --> 00:18:25,510 but cat will take care of outputting it to the console. 358 00:18:25,510 --> 00:18:28,300 Now, how could I make this a test? 
359 00:18:28,300 --> 00:18:30,580 I have the code I want to have run, but I 360 00:18:30,580 --> 00:18:33,170 want to test that it's doing what I want it to do. 361 00:18:33,170 --> 00:18:36,640 Well, it turns out that similar to expect_equals, which you saw last time, 362 00:18:36,640 --> 00:18:41,620 there is a function called expect_output that can expect when I run this code, 363 00:18:41,620 --> 00:18:44,380 I get output in my console. 364 00:18:44,380 --> 00:18:48,550 So I'll use this function, expect_output, part of testthat, 365 00:18:48,550 --> 00:18:53,300 to say I expect that when I run this code, cat with ducksay, 366 00:18:53,300 --> 00:18:56,600 I'll see some output in my console-- 367 00:18:56,600 --> 00:18:58,550 anything at all. 368 00:18:58,550 --> 00:19:02,330 And that seems to be our first description now for ducksay. 369 00:19:02,330 --> 00:19:06,050 But I think we could still do a little better-- get more specific, if we will. 370 00:19:06,050 --> 00:19:08,870 So here we could say it prints to the console with cat, 371 00:19:08,870 --> 00:19:10,190 but what should it print? 372 00:19:10,190 --> 00:19:12,815 Well, it should print "hello, world" at least, so I'll go ahead 373 00:19:12,815 --> 00:19:16,460 and say it can say hello to the world. 374 00:19:16,460 --> 00:19:18,650 That's another feature now of ducksay. 375 00:19:18,650 --> 00:19:24,920 And I'll say that well, when I want ducksay to run, I expect to see "hello, 376 00:19:24,920 --> 00:19:26,960 world" in the output. 377 00:19:26,960 --> 00:19:30,710 Now, we saw last time one called expect_equal-- 378 00:19:30,710 --> 00:19:33,260 one called expect_equal with ducksay here. 379 00:19:33,260 --> 00:19:36,980 And I could say I expect that ducksay will return to me a string that 380 00:19:36,980 --> 00:19:38,900 is equal to "hello, world." 381 00:19:38,900 --> 00:19:42,860 But I argue this might not work as I intend it to. 
382 00:19:42,860 --> 00:19:44,720 Because if we look at our output here-- 383 00:19:44,720 --> 00:19:47,150 here's our intended output of ducksay-- 384 00:19:47,150 --> 00:19:51,140 why might it not work, if we were to say I expect this output 385 00:19:51,140 --> 00:19:54,130 to be equal to "hello, world"? 386 00:19:54,130 --> 00:19:56,980 Well, it seems like this is not strictly equal to "hello, world." 387 00:19:56,980 --> 00:19:59,770 I have "hello, world" and then a duck below it. 388 00:19:59,770 --> 00:20:02,020 So what I would rather do is ask a different question. 389 00:20:02,020 --> 00:20:06,190 Is "hello, world" somewhere in this output we've gotten back? 390 00:20:06,190 --> 00:20:08,710 Not is it equal to "hello, world," but is "hello, world" 391 00:20:08,710 --> 00:20:10,690 somewhere inside of it? 392 00:20:10,690 --> 00:20:14,620 Now thankfully, there is another function besides expect_equal-- one 393 00:20:14,620 --> 00:20:16,870 called expect_match. 394 00:20:16,870 --> 00:20:22,210 I can expect to find a match of "hello, world" inside this output of ducksay. 395 00:20:22,210 --> 00:20:23,450 Let's try it out. 396 00:20:23,450 --> 00:20:27,680 I'll come back over here, and I can use expect_match like this. 397 00:20:27,680 --> 00:20:31,870 I'll say expect_match now between the return value of ducksay 398 00:20:31,870 --> 00:20:33,670 and this character string "hello, world," 399 00:20:33,670 --> 00:20:36,550 and that will treat this as a pattern-- 400 00:20:36,550 --> 00:20:38,500 hello comma space world. 401 00:20:38,500 --> 00:20:42,160 And if it finds that pattern inside the return value of ducksay, 402 00:20:42,160 --> 00:20:43,570 well, this test will pass-- 403 00:20:43,570 --> 00:20:44,770 no errors at all. 404 00:20:44,770 --> 00:20:48,430 If it can't find that pattern, though, in ducksay, it will raise an error, 405 00:20:48,430 --> 00:20:50,440 and our tests will fail.
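As a sketch of the two tests described so far, the testthat file might look like the following. It assumes testthat is installed and, so the block runs on its own, defines a stand-in ducksay() whose duck art is only an approximation of the one on screen.

```r
library(testthat)

# Stand-in for the package's function; the duck art is an approximation
ducksay <- function() {
  paste("hello, world", ">(. )__", " (____/", sep = "\n")
}

describe("ducksay()", {
  it("can print to the console with cat()", {
    # Passes as long as evaluating the code prints something
    expect_output(cat(ducksay()))
  })

  it("can say hello to the world", {
    # Strict equality would fail: the return value also contains the duck
    expect_false(identical(ducksay(), "hello, world"))
    # expect_match() only needs "hello, world" to appear somewhere
    expect_match(ducksay(), "hello, world")
  })
})
```

Using describe() and it() keeps each expectation attached to a plain-language statement of what ducksay should do, which is also what the test report prints.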
406 00:20:50,440 --> 00:20:52,420 So again, expect_match is good for trying 407 00:20:52,420 --> 00:20:58,830 to find this pattern, "hello, world," inside the output of ducksay right here. 408 00:20:58,830 --> 00:21:02,240 So I think these tests are in pretty good shape. 409 00:21:02,240 --> 00:21:04,850 I now know exactly what I want ducksay to do. 410 00:21:04,850 --> 00:21:07,430 It should work with cat, and it should print out 411 00:21:07,430 --> 00:21:10,820 some output that says hello to the world. 412 00:21:10,820 --> 00:21:14,390 But now that we have our tests, we need to write our actual code. 413 00:21:14,390 --> 00:21:16,490 We need to write the function ducksay itself. 414 00:21:16,490 --> 00:21:20,750 And for that, we saw we could use this folder called R. 415 00:21:20,750 --> 00:21:22,850 In general, in working with packages, we're 416 00:21:22,850 --> 00:21:28,070 going to write all of our dot R files inside of a folder called R. 417 00:21:28,070 --> 00:21:30,740 But again, rather than structuring this ourselves, 418 00:21:30,740 --> 00:21:33,620 we could rely on devtools to do it for us. 419 00:21:33,620 --> 00:21:37,700 I could use a function called use_r, which is available once devtools is loaded, 420 00:21:37,700 --> 00:21:40,520 and pass in the name of the function I hope to create. 421 00:21:40,520 --> 00:21:45,020 And then I'll get an R file to write my function definition in. 422 00:21:45,020 --> 00:21:48,110 So let's try this now, now that we've written our unit tests. 423 00:21:48,110 --> 00:21:50,930 I'll come back now to RStudio, and let's see 424 00:21:50,930 --> 00:21:56,030 if I can use use_r to create for me the file I 425 00:21:56,030 --> 00:21:57,470 should define ducksay in. 426 00:21:57,470 --> 00:22:02,570 I'll go to my console now to use use_r, and I'll go back up in my File Explorer 427 00:22:02,570 --> 00:22:05,120 to the ducksay folder here. 428 00:22:05,120 --> 00:22:09,410 And I'll try to create now this file to define ducksay in.
429 00:22:09,410 --> 00:22:12,230 I'll say I want to create the R file 430 00:22:12,230 --> 00:22:13,800 for this new function, ducksay. 431 00:22:13,800 --> 00:22:17,570 I'll hit Enter now and see a few things happening. 432 00:22:17,570 --> 00:22:22,430 One, I see I have a folder called R-- brand new thanks to use_r. 433 00:22:22,430 --> 00:22:25,850 And I also see I have a new file, ducksay.R, 434 00:22:25,850 --> 00:22:28,490 which has been created inside of this R folder 435 00:22:28,490 --> 00:22:31,530 to keep things organized in this case. 436 00:22:31,530 --> 00:22:35,060 I can close my DESCRIPTION file here-- let me save it first-- 437 00:22:35,060 --> 00:22:38,390 and then I'll go ahead and close test-ducksay.R here and focus now 438 00:22:38,390 --> 00:22:43,670 on ducksay.R. Well, how should I write ducksay? 439 00:22:43,670 --> 00:22:45,890 If I look at my output here-- 440 00:22:45,890 --> 00:22:48,380 here I have my intended output-- 441 00:22:48,380 --> 00:22:51,230 I notice that I really have three lines of output 442 00:22:51,230 --> 00:22:53,630 I hope to return from this function. 443 00:22:53,630 --> 00:22:57,950 I have hello comma space world, the top half of my duck, 444 00:22:57,950 --> 00:23:00,530 and the bottom half of my duck. 445 00:23:00,530 --> 00:23:02,610 These are all character strings. 446 00:23:02,610 --> 00:23:06,170 So let me actually ask our group here: which function 447 00:23:06,170 --> 00:23:11,780 that we've seen so far do you think would help us combine these strings? 448 00:23:11,780 --> 00:23:16,220 If I want to have three different lines here-- hello, world, top of my duck, 449 00:23:16,220 --> 00:23:21,230 bottom of my duck-- what function could I use to combine these strings 450 00:23:21,230 --> 00:23:25,220 and perhaps return them from my ducksay function? 451 00:23:25,220 --> 00:23:27,517 AUDIENCE: Maybe the paste function? 452 00:23:27,517 --> 00:23:28,850 CARTER ZENKE: Yeah, maybe paste.
453 00:23:28,850 --> 00:23:32,780 So we saw before that paste is good for combining different strings, 454 00:23:32,780 --> 00:23:34,760 and we can actually use paste here. 455 00:23:34,760 --> 00:23:36,890 But instead of separating now with spaces, 456 00:23:36,890 --> 00:23:40,843 we could separate with new lines-- backslash n, an escape character. 457 00:23:40,843 --> 00:23:41,760 So let's try that out. 458 00:23:41,760 --> 00:23:45,500 I'll come back now to my file, and let's define for ourselves 459 00:23:45,500 --> 00:23:47,990 the ducksay function using paste. 460 00:23:47,990 --> 00:23:52,680 I'll say here I want to make a new function called ducksay that currently 461 00:23:52,680 --> 00:23:54,600 doesn't take any input at all. 462 00:23:54,600 --> 00:23:57,630 But inside of this function, I will return 463 00:23:57,630 --> 00:24:01,800 the result of calling paste on three different strings. 464 00:24:01,800 --> 00:24:04,380 The first one will be the string 465 00:24:04,380 --> 00:24:08,220 "hello, world," at the very top of my output. 466 00:24:08,220 --> 00:24:11,010 And then comes the very beginning of my duck here-- 467 00:24:11,010 --> 00:24:15,240 I could give it a little beak, some eyes, and now the top of its body here. 468 00:24:15,240 --> 00:24:18,540 And then underneath, I could use the bottom of the duck, which 469 00:24:18,540 --> 00:24:20,880 will look a bit like this-- 470 00:24:20,880 --> 00:24:23,460 some underscores and then a forward slash. 471 00:24:23,460 --> 00:24:29,130 And now, I think we have what looks to be our intended output, thanks to paste. 472 00:24:29,130 --> 00:24:34,680 But as we said before, paste's default is to combine these strings using a space. 473 00:24:34,680 --> 00:24:36,520 And we want a new line instead. 474 00:24:36,520 --> 00:24:39,690 So I should now change paste's sep parameter.
475 00:24:39,690 --> 00:24:43,350 I'll change it from a space to backslash n, which we've seen before-- 476 00:24:43,350 --> 00:24:48,330 this new line character that says I want to separate each of these character 477 00:24:48,330 --> 00:24:50,250 strings by a new line. 478 00:24:50,250 --> 00:24:53,810 It's as if I were hitting Enter each time on my keyboard. 479 00:24:53,810 --> 00:24:56,030 So I'll save now this function. 480 00:24:56,030 --> 00:25:00,440 And when ducksay runs, it should now return to me this output. 481 00:25:00,440 --> 00:25:05,130 But I've defined ducksay here, and I want to use it. 482 00:25:05,130 --> 00:25:08,120 It turns out I can't do that just yet. 483 00:25:08,120 --> 00:25:11,430 We saw earlier this idea of a NAMESPACE file, 484 00:25:11,430 --> 00:25:16,310 which tells us which functions in our package an end user can use. 485 00:25:16,310 --> 00:25:18,890 And now that we've defined our ducksay function, 486 00:25:18,890 --> 00:25:22,430 we should actually include it in our package's NAMESPACE-- 487 00:25:22,430 --> 00:25:26,690 the list of functions that an end user could use in our package. 488 00:25:26,690 --> 00:25:30,290 So let me now create this file called NAMESPACE. 489 00:25:30,290 --> 00:25:34,070 I'll say file.create, NAMESPACE down below, 490 00:25:34,070 --> 00:25:40,520 and I can then see in my folder called ducksay a new file called NAMESPACE. 491 00:25:40,520 --> 00:25:44,540 I'll open this one up, and what should I include in NAMESPACE? 492 00:25:44,540 --> 00:25:48,650 Well, by convention, we use a directive here called export. 493 00:25:48,650 --> 00:25:49,340 Export. 494 00:25:49,340 --> 00:25:51,860 Export says take a function that I've defined 495 00:25:51,860 --> 00:25:56,430 and make it available to the end user who installs this package. 496 00:25:56,430 --> 00:26:01,450 In this case, I'll export the ducksay function, just like this.
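Here is a minimal sketch of the ducksay.R definition described above. The exact duck art on screen isn't reproduced in the transcript, so the two duck strings are an approximation built from the description (a beak, an eye, four underscores, and a forward slash).

```r
# R/ducksay.R (the duck art is an approximation)
ducksay <- function() {
  paste(
    "hello, world",  # what the duck says
    ">(. )__",       # top half: beak, eye, and back
    " (____/",       # bottom half: four underscores and a forward slash
    sep = "\n"       # put each piece on its own line instead of using a space
  )
}

# paste()'s default separator is a single space
paste("hello,", "world")
#> [1] "hello, world"

# cat() turns each "\n" into a line break when printing
cat(ducksay())
```

Returning the string and leaving the printing to cat() is what lets the expect_output test wrap the call in cat().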
497 00:26:01,450 --> 00:26:05,130 So to be clear, I've now defined my ducksay function 498 00:26:05,130 --> 00:26:09,930 inside a file called ducksay.R, which itself is inside an R folder 499 00:26:09,930 --> 00:26:11,460 to keep things organized. 500 00:26:11,460 --> 00:26:14,130 And then once I've defined it, I want to make 501 00:26:14,130 --> 00:26:17,700 it available to a user, which I'll do through the NAMESPACE file 502 00:26:17,700 --> 00:26:22,410 and say I want the ducksay function in particular to be available to our end 503 00:26:22,410 --> 00:26:24,180 users here. 504 00:26:24,180 --> 00:26:28,470 Now once I do that, I can make use of another devtools function, 505 00:26:28,470 --> 00:26:30,810 one called load_all-- 506 00:26:30,810 --> 00:26:35,730 load_all-- that says whatever functions I've defined in my package, 507 00:26:35,730 --> 00:26:38,370 like ducksay here, I want you to load them 508 00:26:38,370 --> 00:26:41,250 so I can use them right here in my console. 509 00:26:41,250 --> 00:26:46,110 I'll go ahead and run load_all, and I'll see this is loading the ducksay package now. 510 00:26:46,110 --> 00:26:49,740 What I can do now is use ducksay in my console. 511 00:26:49,740 --> 00:26:54,237 I could say I want to cat the result of calling ducksay, just like this. 512 00:26:54,237 --> 00:26:55,320 And let's see what we get. 513 00:26:55,320 --> 00:26:56,670 Fingers crossed. 514 00:26:56,670 --> 00:26:59,820 We get a cute duck saying hello to the world. 515 00:26:59,820 --> 00:27:02,250 And now we could test our code more thoroughly too. 516 00:27:02,250 --> 00:27:07,410 I could run test in my console as well, and now I'll run those tests I created. 517 00:27:07,410 --> 00:27:10,800 Let me open up my tests folder and the testthat folder inside it. 518 00:27:10,800 --> 00:27:12,270 Here are those tests.
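As a sketch of the two files just described, the following writes R/ducksay.R and a NAMESPACE containing export(ducksay) into a temporary folder using only base R, so it can run anywhere. The temporary location is an assumption, the duck art is approximate, and a real package also needs the DESCRIPTION file, which this sketch omits. That is why the load_all() call is left commented out.

```r
# Build a throwaway copy of the package layout in a temporary directory
pkg_dir <- file.path(tempdir(), "ducksay")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# R/ducksay.R holds the function definition
writeLines(c(
  "ducksay <- function() {",
  "  paste(\"hello, world\", \">(. )__\", \" (____/\", sep = \"\\n\")",
  "}"
), file.path(pkg_dir, "R", "ducksay.R"))

# NAMESPACE lists what end users may call; export() makes ducksay public
writeLines("export(ducksay)", file.path(pkg_dir, "NAMESPACE"))

# With a DESCRIPTION file in place, devtools could then load the package:
# devtools::load_all(pkg_dir)
```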
519 00:27:12,270 --> 00:27:16,140 If I now run the test function, thanks to devtools, 520 00:27:16,140 --> 00:27:21,630 I will be able to run all the tests I defined in this file. 521 00:27:21,630 --> 00:27:23,340 Now just one more thing here too. 522 00:27:23,340 --> 00:27:26,340 Last time, we used source at the top of our file 523 00:27:26,340 --> 00:27:31,890 to give this file access to a function like ducksay in ducksay.R. 524 00:27:31,890 --> 00:27:37,620 But now that we've used load_all and exported this function from our package, 525 00:27:37,620 --> 00:27:40,320 we can simply run load_all and then run the tests, 526 00:27:40,320 --> 00:27:43,800 and they will have access to that function called ducksay. 527 00:27:43,800 --> 00:27:47,070 No more using source so long as we're inside a package 528 00:27:47,070 --> 00:27:50,790 that we've exported our functions from. 529 00:27:50,790 --> 00:27:54,190 OK, so we've seen now how to define unit tests for our package 530 00:27:54,190 --> 00:27:57,010 and how to write code that adheres to those tests. 531 00:27:57,010 --> 00:28:02,320 Let me ask, what questions do we have on what we've seen so far, either 532 00:28:02,320 --> 00:28:05,200 on testing our code, writing our functions, 533 00:28:05,200 --> 00:28:08,230 or defining our package more generally? 534 00:28:08,230 --> 00:28:10,930 AUDIENCE: When you run the test program on the terminal, what's 535 00:28:10,930 --> 00:28:13,327 the meaning of the colored letters FWS? 536 00:28:13,327 --> 00:28:14,660 CARTER ZENKE: Ah, good question. 537 00:28:14,660 --> 00:28:18,280 So if I look over here in the console, I'll see some pretty output, 538 00:28:18,280 --> 00:28:19,925 let's say, from testthat. 539 00:28:19,925 --> 00:28:21,800 And let me walk through it step by step here. 540 00:28:21,800 --> 00:28:23,890 So here I see testing ducksay. 541 00:28:23,890 --> 00:28:25,990 That is the function we are testing, right?
542 00:28:25,990 --> 00:28:30,130 I'll also see F, W, S, and OK. 543 00:28:30,130 --> 00:28:34,210 Now, these are different kinds of results we can get from our tests here. 544 00:28:34,210 --> 00:28:37,720 F corresponds to fail down below. 545 00:28:37,720 --> 00:28:40,180 Fail-- we didn't pass this particular test. 546 00:28:40,180 --> 00:28:42,940 W stands for a warning. 547 00:28:42,940 --> 00:28:46,240 We saw last time how our tests sometimes raise warnings. 548 00:28:46,240 --> 00:28:50,620 Well, this would be the number of tests that gave me a warning, in this case. 549 00:28:50,620 --> 00:28:51,790 S stands for skip. 550 00:28:51,790 --> 00:28:53,900 It turns out you can skip tests if you want to. 551 00:28:53,900 --> 00:28:57,830 And then OK means this test passed with flying colors. 552 00:28:57,830 --> 00:29:00,770 So here, I see these two tests-- they both passed, 553 00:29:00,770 --> 00:29:05,750 and I'll see a 2 in the OK and a 2 total that are passing down below. 554 00:29:05,750 --> 00:29:09,470 If I had more than one function to test, I might see more than one of these 555 00:29:09,470 --> 00:29:11,720 and see the total number of ones that were passed, 556 00:29:11,720 --> 00:29:16,130 skipped, warned about, or failed overall down at the bottom of these results 557 00:29:16,130 --> 00:29:17,292 here as well. 558 00:29:17,292 --> 00:29:19,250 So I hope that helps clarify what exactly we're 559 00:29:19,250 --> 00:29:21,980 seeing as a result of using test. 560 00:29:21,980 --> 00:29:25,050 But great question there. 561 00:29:25,050 --> 00:29:28,850 OK, so I think we're in a pretty good place, 562 00:29:28,850 --> 00:29:31,460 but there's arguably one more thing to consider now.
563 00:29:31,460 --> 00:29:36,170 So I've seen that my code can both print to the console with cat 564 00:29:36,170 --> 00:29:38,150 and include "hello, world" in the output, 565 00:29:38,150 --> 00:29:42,800 but an important question here too is, does it include a duck? 566 00:29:42,800 --> 00:29:44,360 So let's see that as well. 567 00:29:44,360 --> 00:29:47,780 I'll come back now to RStudio and update these tests. 568 00:29:47,780 --> 00:29:49,400 Let me add a new one-- 569 00:29:49,400 --> 00:29:52,610 a new test that says ducksay-- 570 00:29:52,610 --> 00:29:58,290 it can even say hello with a duck, just like this. 571 00:29:58,290 --> 00:30:02,240 And I think I could probably use a very similar structure to what I used before 572 00:30:02,240 --> 00:30:04,157 with expect_match. 573 00:30:04,157 --> 00:30:05,990 In this case, though, I could expect a match 574 00:30:05,990 --> 00:30:11,100 between the output of ducksay and the duck that I have to show to the user. 575 00:30:11,100 --> 00:30:14,090 So I could use expect_match, like I did before, 576 00:30:14,090 --> 00:30:17,280 and then enter ducksay, just like this. 577 00:30:17,280 --> 00:30:20,810 And now I want to expect a match between my duck pattern 578 00:30:20,810 --> 00:30:24,650 and whatever I see in the return value of ducksay. 579 00:30:24,650 --> 00:30:27,135 But I'll probably need a new object for this duck, 580 00:30:27,135 --> 00:30:29,510 since I don't want to type in the whole duck as an argument here. 581 00:30:29,510 --> 00:30:31,520 I could go up above here and define myself 582 00:30:31,520 --> 00:30:33,740 a new duck, similar to how we did it before.
583 00:30:33,740 --> 00:30:37,490 I'll paste together the top half of this duck with a cute little beak, 584 00:30:37,490 --> 00:30:41,240 and a top here, and then the bottom half of my duck-- 585 00:30:41,240 --> 00:30:44,390 1, 2, 3, 4 underscores-- and then a forward slash, 586 00:30:44,390 --> 00:30:48,650 and that is my duck, so long as I separate each with a backslash n, 587 00:30:48,650 --> 00:30:50,030 just like that. 588 00:30:50,030 --> 00:30:54,410 And now, I think what I could do is expect a match between the return 589 00:30:54,410 --> 00:30:56,900 value of ducksay and this duck-- 590 00:30:56,900 --> 00:30:59,720 this duck I've created over here. 591 00:30:59,720 --> 00:31:02,720 Now, you might think this would work. 592 00:31:02,720 --> 00:31:04,760 But I'd argue there's one more thing to consider 593 00:31:04,760 --> 00:31:08,630 here, which is I told you earlier, expect_match 594 00:31:08,630 --> 00:31:11,990 will take the pattern we've defined here and look 595 00:31:11,990 --> 00:31:14,450 for it in the return value of ducksay. 596 00:31:14,450 --> 00:31:17,510 In this case, this is our pattern. 597 00:31:17,510 --> 00:31:21,585 Well, these patterns are more formally called regular expressions. 598 00:31:21,585 --> 00:31:24,710 We won't get into them today, but in general, one thing to know about them 599 00:31:24,710 --> 00:31:28,160 is that these characters, parentheses and a dot, 600 00:31:28,160 --> 00:31:31,370 have a special meaning inside of regular expressions. 601 00:31:31,370 --> 00:31:34,400 They don't actually mean literally a parenthesis or a dot. 602 00:31:34,400 --> 00:31:36,530 They mean something else entirely. 603 00:31:36,530 --> 00:31:39,740 So if I want to treat this pattern not as this thing called 604 00:31:39,740 --> 00:31:42,770 a regular expression but exactly as I see it here, 605 00:31:42,770 --> 00:31:45,830 I can set another parameter equal to TRUE instead-- 606 00:31:45,830 --> 00:31:47,450 one called fixed.
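A small sketch of why fixed = TRUE matters here, assuming testthat is installed. The duck art is an approximation of the one on screen. expect_match() passes fixed on to grepl(), so with fixed = TRUE the duck is matched character for character instead of being read as a regular expression.

```r
library(testthat)

# Stand-in ducksay(); the duck art is an approximation
ducksay <- function() {
  paste("hello, world", ">(. )__", " (____/", sep = "\n")
}

# The duck on its own, built the same way as inside the function
duck <- paste(">(. )__", " (____/", sep = "\n")

# Read as a regular expression, "(" starts a group and "." matches any
# character, so the pattern does not mean "this exact duck"; here the
# unmatched "(" in the bottom half even makes the pattern invalid
regex_result <- tryCatch(
  grepl(duck, ducksay()),
  error = function(e) NA,
  warning = function(w) NA
)

# fixed = TRUE compares the characters literally, so the duck is found
expect_match(ducksay(), duck, fixed = TRUE)
```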
607 00:31:47,450 --> 00:31:51,170 Fixed says, I want you to treat these characters here 608 00:31:51,170 --> 00:31:53,900 not as part of some regular expression, but instead, 609 00:31:53,900 --> 00:31:57,410 exactly as we see them here-- a greater-than sign, a parenthesis, 610 00:31:57,410 --> 00:31:59,300 and a dot or a period. 611 00:31:59,300 --> 00:32:01,850 So more on those another time, but for now, 612 00:32:01,850 --> 00:32:04,730 let's just say I want to look for exactly this pattern 613 00:32:04,730 --> 00:32:06,380 inside the output of ducksay. 614 00:32:06,380 --> 00:32:10,460 I'll leave this as is now, and I'll go down below and run my tests with test 615 00:32:10,460 --> 00:32:11,100 again. 616 00:32:11,100 --> 00:32:14,420 And now I'll see all three tests are passing. 617 00:32:14,420 --> 00:32:15,410 None are failing. 618 00:32:15,410 --> 00:32:16,610 None are giving us warnings. 619 00:32:16,610 --> 00:32:17,570 None have been skipped. 620 00:32:17,570 --> 00:32:20,810 All, in this case, have passed. 621 00:32:20,810 --> 00:32:23,570 So we've fixed our tests. 622 00:32:23,570 --> 00:32:25,550 We've written our code. 623 00:32:25,550 --> 00:32:30,200 One next step is to document how to use our functions. 624 00:32:30,200 --> 00:32:31,880 Maybe a user is new to our package. 625 00:32:31,880 --> 00:32:33,005 They don't know what to do. 626 00:32:33,005 --> 00:32:35,570 We want to give them some guidance on how to use our functions. 627 00:32:35,570 --> 00:32:38,600 In fact, you've probably seen that to access documentation, 628 00:32:38,600 --> 00:32:42,560 you can use question mark followed by the name of some function. 629 00:32:42,560 --> 00:32:47,060 And right now, if I use question mark ducksay, well, I don't see anything. 630 00:32:47,060 --> 00:32:49,530 There's no documentation for ducksay. 631 00:32:49,530 --> 00:32:51,830 Well, let's go ahead and fix that.
632 00:32:51,830 --> 00:32:55,640 Thankfully, I can define my own documentation for ducksay 633 00:32:55,640 --> 00:32:59,330 by putting it inside of this folder we saw earlier-- 634 00:32:59,330 --> 00:33:03,500 one called man, where man stands for manual. 635 00:33:03,500 --> 00:33:06,470 But what will go inside this man folder? 636 00:33:06,470 --> 00:33:11,570 It turns out a variety of files all ending with dot Rd, where 637 00:33:11,570 --> 00:33:14,600 dot Rd stands for R documentation. 638 00:33:14,600 --> 00:33:19,100 In fact, inside these files, we'll write not just plain text, 639 00:33:19,100 --> 00:33:22,100 but we'll actually write something called a markup language. 640 00:33:22,100 --> 00:33:24,410 Now, a markup language is not a programming language. 641 00:33:24,410 --> 00:33:27,470 There are no functions and loops and so on. 642 00:33:27,470 --> 00:33:30,920 Instead, it's a language for formatting text. 643 00:33:30,920 --> 00:33:34,950 Now for instance, R's markup language looks a bit like this. 644 00:33:34,950 --> 00:33:37,820 I can give each of my documentation files 645 00:33:37,820 --> 00:33:42,110 some particular parameters, like title, description, and usage here. 646 00:33:42,110 --> 00:33:45,710 Here, title says, what is the title of my documentation? 647 00:33:45,710 --> 00:33:48,230 Description says, describe this function for me. 648 00:33:48,230 --> 00:33:51,650 And usage says, how should I use this function too? 649 00:33:51,650 --> 00:33:55,380 There are other commands we'll see here, but our dot Rd files 650 00:33:55,380 --> 00:33:59,250 will look a lot like this, and we'll then render them or convert them 651 00:33:59,250 --> 00:34:02,520 to those same files you're used to seeing when you use the question 652 00:34:02,520 --> 00:34:04,560 mark down in your console. 653 00:34:04,560 --> 00:34:05,970 So let's try this out now. 654 00:34:05,970 --> 00:34:10,650 I'll come back to RStudio and try to make some documentation for ducksay.
655 00:34:10,650 --> 00:34:13,770 Well, I probably want to first create for myself 656 00:34:13,770 --> 00:34:17,610 that man folder to put my documentation inside of. 657 00:34:17,610 --> 00:34:23,610 So I could use that same function we saw earlier, dir.create. 658 00:34:23,610 --> 00:34:28,170 And I'll create for myself the folder called man, short for manual. 659 00:34:28,170 --> 00:34:33,330 And it's inside of this folder that I will store all of my dot Rd files. 660 00:34:33,330 --> 00:34:38,340 I'll say file.create now, man/ducksay.Rd. 661 00:34:38,340 --> 00:34:39,690 And this is convention. 662 00:34:39,690 --> 00:34:43,710 I'm putting this file, ducksay.Rd, inside the man folder, 663 00:34:43,710 --> 00:34:48,300 and I'm calling it-- giving it the same name as the function it should document. 664 00:34:48,300 --> 00:34:53,840 So this file, ducksay.Rd, should document the same function, ducksay. 665 00:34:53,840 --> 00:34:57,830 I'll go ahead now and create this file, and if I open now my man folder, 666 00:34:57,830 --> 00:35:01,370 I should see ducksay.Rd right there inside. 667 00:35:01,370 --> 00:35:03,470 I'll open it up, and what do I see? 668 00:35:03,470 --> 00:35:06,080 Well, nothing yet, but I'd argue we could 669 00:35:06,080 --> 00:35:10,370 go ahead and use R's markup language to create some documentation now 670 00:35:10,370 --> 00:35:12,290 for ducksay. 671 00:35:12,290 --> 00:35:16,460 Now, I've read the documentation for creating documentation in R, 672 00:35:16,460 --> 00:35:18,560 and there are several different keywords you 673 00:35:18,560 --> 00:35:20,750 can use to create your documentation. 674 00:35:20,750 --> 00:35:23,510 Among the most important ones are these here. 675 00:35:23,510 --> 00:35:25,700 One is backslash name. 676 00:35:25,700 --> 00:35:30,470 And inside these curly braces here, we'll include the name of our function 677 00:35:30,470 --> 00:35:33,183 we're trying to document-- in this case, ducksay in particular.
678 00:35:33,183 --> 00:35:35,600 This is the name of the function we're trying to document. 679 00:35:35,600 --> 00:35:39,800 The next most important one is going to be backslash alias. 680 00:35:39,800 --> 00:35:43,010 Backslash alias is what you want the user to type 681 00:35:43,010 --> 00:35:46,100 in their console to see your documentation. 682 00:35:46,100 --> 00:35:50,180 For instance, if I go down to my console now and I use question mark ducksay, 683 00:35:50,180 --> 00:35:52,340 well, my alias is ducksay-- 684 00:35:52,340 --> 00:35:54,090 literally this right here. 685 00:35:54,090 --> 00:35:57,450 If any user were to go to their console and use question mark ducksay, 686 00:35:57,450 --> 00:36:00,870 they could see this documentation that I have now created for them, 687 00:36:00,870 --> 00:36:02,640 as long as they've installed my package. 688 00:36:02,640 --> 00:36:07,150 So here, both my name and my alias are ducksay. 689 00:36:07,150 --> 00:36:09,510 And now, here comes our title. 690 00:36:09,510 --> 00:36:11,160 What is the title of this function? 691 00:36:11,160 --> 00:36:14,190 Kind of a more English characterization, like capitals and spaces 692 00:36:14,190 --> 00:36:18,310 and so on-- we'll call this function Duck Say, just like this, 693 00:36:18,310 --> 00:36:20,460 and provide a description. 694 00:36:20,460 --> 00:36:24,780 I'll say that this is a duck that says hello. 695 00:36:24,780 --> 00:36:28,860 And just like that, with these four lines of markup language, 696 00:36:28,860 --> 00:36:31,980 we can actually already see it being rendered or converted 697 00:36:31,980 --> 00:36:34,260 into our documentation file. 698 00:36:34,260 --> 00:36:38,250 If I go to my console now and run question mark ducksay, 699 00:36:38,250 --> 00:36:42,570 I'll see my very first R documentation file.
700 00:36:42,570 --> 00:36:46,590 Notice how here, the name of this function is actually 701 00:36:46,590 --> 00:36:49,500 included in my documentation, right up here. 702 00:36:49,500 --> 00:36:51,810 The alias is also included in what I use down here. 703 00:36:51,810 --> 00:36:54,600 I said question mark ducksay and got this documentation file. 704 00:36:54,600 --> 00:36:57,120 The title is there too-- backslash title, Duck Say. 705 00:36:57,120 --> 00:36:58,570 We see that right here. 706 00:36:58,570 --> 00:37:02,670 And so too is the description, a duck that says hello. 707 00:37:02,670 --> 00:37:05,340 So we could keep adding to this documentation using 708 00:37:05,340 --> 00:37:07,260 this same syntax here. 709 00:37:07,260 --> 00:37:10,560 There are other kinds of components we can add to our documentation file 710 00:37:10,560 --> 00:37:11,288 as well. 711 00:37:11,288 --> 00:37:13,080 In fact, let's go ahead and add a few more. 712 00:37:13,080 --> 00:37:17,790 Let's add one called usage, which tells people how to use our function. 713 00:37:17,790 --> 00:37:22,950 I'll use backslash usage here, and this will say how I want users to use it, in fact. 714 00:37:22,950 --> 00:37:25,080 So I'll say ducksay here. 715 00:37:25,080 --> 00:37:30,570 And by convention, in usage, we include the function's name, some parentheses, 716 00:37:30,570 --> 00:37:32,910 and if there are any parameters, we include those 717 00:37:32,910 --> 00:37:35,020 in the function's parentheses as well. 718 00:37:35,020 --> 00:37:37,320 But currently, there are no parameters. 719 00:37:37,320 --> 00:37:40,170 I'll also include a section called value, which 720 00:37:40,170 --> 00:37:42,150 is the return value of this function. 721 00:37:42,150 --> 00:37:43,710 What does it return to us? 722 00:37:43,710 --> 00:37:48,270 Well, it returns to us really a string representation of a duck 723 00:37:48,270 --> 00:37:51,030 saying hello to the world.
724 00:37:51,030 --> 00:37:54,450 And then finally, we can also include some examples 725 00:37:54,450 --> 00:37:57,630 of how to use this function, in case people are unfamiliar. 726 00:37:57,630 --> 00:38:00,810 I could say examples here and provide some examples 727 00:38:00,810 --> 00:38:03,037 of how people could use ducksay. 728 00:38:03,037 --> 00:38:04,620 Maybe I want them to use it with cat. 729 00:38:04,620 --> 00:38:07,560 So I'll show them, look, you can use ducksay like this. 730 00:38:07,560 --> 00:38:11,340 Take cat and pass as input ducksay, just like that. 731 00:38:11,340 --> 00:38:14,640 And now with these other pieces of syntax here, 732 00:38:14,640 --> 00:38:19,440 I can say question mark ducksay and see the new-- 733 00:38:19,440 --> 00:38:22,440 oops, let me save this file first-- and then run question mark ducksay, 734 00:38:22,440 --> 00:38:25,830 and I should see the now rendered version of what I'm 735 00:38:25,830 --> 00:38:28,620 seeing on the left-hand side over here. 736 00:38:28,620 --> 00:38:30,870 Notice how we have some new pieces. 737 00:38:30,870 --> 00:38:32,370 I see usage now. 738 00:38:32,370 --> 00:38:33,420 I see value. 739 00:38:33,420 --> 00:38:38,760 And I see some examples as well down below in my documentation file. 740 00:38:38,760 --> 00:38:44,220 So we've seen now how to document our functions using this markup 741 00:38:44,220 --> 00:38:45,390 language here. 742 00:38:45,390 --> 00:38:49,350 What questions do we have on how to document our code 743 00:38:49,350 --> 00:38:51,570 and how to render it in our console? 744 00:38:51,570 --> 00:38:54,430 745 00:38:54,430 --> 00:38:56,510 OK, seeing none, let's keep going then. 746 00:38:56,510 --> 00:38:58,360 And I think we're now in a pretty good spot. 747 00:38:58,360 --> 00:39:02,020 So we now have the ability to write our own functions, to test them, 748 00:39:02,020 --> 00:39:04,870 and to write documentation for those functions.
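Putting the pieces above together, here is a sketch that creates man/ducksay.Rd in a temporary copy of the package folder and renders it as plain text with base R's tools package. The temporary location is an assumption. Rd2txt() gives a text rendering similar to, but not identical with, the help page that question mark ducksay opens.

```r
# Work in a temporary copy of the package folder
pkg_dir <- file.path(tempdir(), "ducksay")
man_dir <- file.path(pkg_dir, "man")
dir.create(man_dir, recursive = TRUE, showWarnings = FALSE)

# Rd markup; each backslash is doubled inside an R string
rd_lines <- c(
  "\\name{ducksay}",
  "\\alias{ducksay}",
  "\\title{Duck Say}",
  "\\description{A duck that says hello.}",
  "\\usage{ducksay()}",
  "\\value{A string representation of a duck saying hello to the world.}",
  "\\examples{",
  "cat(ducksay())",
  "}"
)
rd_file <- file.path(man_dir, "ducksay.Rd")
writeLines(rd_lines, rd_file)

# Parse the markup and print a plain-text rendering of the help page
rd <- tools::parse_Rd(rd_file)
tools::Rd2txt(rd)
```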
749 00:39:04,870 --> 00:39:07,240 So what should we do now? 750 00:39:07,240 --> 00:39:11,290 Well, ideally, we want to package it up and share it with the world. 751 00:39:11,290 --> 00:39:14,980 And in fact, this process of taking what we have as source code 752 00:39:14,980 --> 00:39:19,030 and converting it into a single file has a particular name. 753 00:39:19,030 --> 00:39:22,450 It's called building our package. 754 00:39:22,450 --> 00:39:26,140 Building our package-- taking it from source code into a single file 755 00:39:26,140 --> 00:39:28,510 we could share around the world. 756 00:39:28,510 --> 00:39:31,030 Now, there are a few options for building our source 757 00:39:31,030 --> 00:39:33,220 code into that single file. 758 00:39:33,220 --> 00:39:34,450 Among them are these-- 759 00:39:34,450 --> 00:39:38,320 build, which is a devtools function that takes our source code 760 00:39:38,320 --> 00:39:41,020 and gives us some single file at the end. 761 00:39:41,020 --> 00:39:43,960 But build, it turns out, is actually a wrapper 762 00:39:43,960 --> 00:39:48,490 on top of a base R command called R CMD build. 763 00:39:48,490 --> 00:39:49,720 They have the same purpose. 764 00:39:49,720 --> 00:39:53,420 R CMD build, though, works in your actual computer's terminal, 765 00:39:53,420 --> 00:39:54,680 not in the R console. 766 00:39:54,680 --> 00:39:58,370 So we'll instead use build to keep ourselves inside the R console 767 00:39:58,370 --> 00:40:01,280 and build our package into a single file. 768 00:40:01,280 --> 00:40:05,930 So we'll still rely on devtools now and use their build function in particular. 769 00:40:05,930 --> 00:40:09,510 Let me come back now to RStudio and show you how this exactly works. 770 00:40:09,510 --> 00:40:14,210 So notice here how I'm inside of my ducksay folder-- 771 00:40:14,210 --> 00:40:16,220 my ducksay package now, if you will. 772 00:40:16,220 --> 00:40:19,500 Let me go ahead and close my previous files here.
773 00:40:19,500 --> 00:40:24,230 Let me run this command called build, thanks to devtools. 774 00:40:24,230 --> 00:40:27,800 If I run build, I'll see some output here. 775 00:40:27,800 --> 00:40:30,860 And I'll see down below the file I have gotten 776 00:40:30,860 --> 00:40:35,330 from building this source code into a single file I can share with others. 777 00:40:35,330 --> 00:40:40,160 I see it's called ducksay_1.0.tar.gz. 778 00:40:40,160 --> 00:40:43,130 And if I move up one level in my folder structure, 779 00:40:43,130 --> 00:40:47,060 I'll see this file actually right next to the ducksay folder-- 780 00:40:47,060 --> 00:40:50,300 ducksay_1.0.tar.gz. 781 00:40:50,300 --> 00:40:53,450 And this is a funky kind of file name, but it stands for, essentially, 782 00:40:53,450 --> 00:40:54,530 a compressed archive, if you will. 783 00:40:54,530 --> 00:40:56,240 It's very similar in spirit to a zip file. 784 00:40:56,240 --> 00:40:58,040 It's also sometimes called a tarball: a tar archive compressed with gzip. 785 00:40:58,040 --> 00:41:01,160 So this is basically a single file with which we can share our code: 786 00:41:01,160 --> 00:41:03,830 email it to somebody, post it online, et cetera. 787 00:41:03,830 --> 00:41:09,410 This is all that source code in our folder now in one single file. 788 00:41:09,410 --> 00:41:11,840 So we've done pretty well so far. 789 00:41:11,840 --> 00:41:14,630 But before I share this, I think I've forgotten 790 00:41:14,630 --> 00:41:17,390 kind of one important thing, which is that the duck actually only 791 00:41:17,390 --> 00:41:19,100 says "hello, world" right now. 792 00:41:19,100 --> 00:41:22,050 It doesn't take as input any given kind of string. 793 00:41:22,050 --> 00:41:25,160 So I probably want to update our code and rebuild this package 794 00:41:25,160 --> 00:41:26,390 again and again. 795 00:41:26,390 --> 00:41:29,715 And in fact, you'll find the package-building process is often iterative.
796 00:41:29,715 --> 00:41:32,840 You build it, you add something new, you build it again, add something new, 797 00:41:32,840 --> 00:41:35,250 build it again, and so on and so forth. 798 00:41:35,250 --> 00:41:39,440 So let's go ahead and update our package and rebuild our code. 799 00:41:39,440 --> 00:41:42,920 Let's go back over here and consider how we could make this duck 800 00:41:42,920 --> 00:41:45,890 say any given phrase that we have. 801 00:41:45,890 --> 00:41:50,850 Well, I'll go back to my ducksay source code inside my folder here. 802 00:41:50,850 --> 00:41:55,760 I'll go back to my tests folder, open up now my tests for ducksay, 803 00:41:55,760 --> 00:42:02,730 and let me add one more test description for this function here. 804 00:42:02,730 --> 00:42:05,180 I'll come down below, and I'll say, I want 805 00:42:05,180 --> 00:42:12,170 to make sure that this can say hello, or can say any given phrase, rather. 806 00:42:12,170 --> 00:42:14,930 Ducksay can say any given phrase. 807 00:42:14,930 --> 00:42:19,580 And now to exemplify this, I want to include this test here. 808 00:42:19,580 --> 00:42:23,900 I expect a match between running, let's say, ducksay 809 00:42:23,900 --> 00:42:26,780 with a given phrase like "quack," like that, 810 00:42:26,780 --> 00:42:31,340 and the text "quack": I just have to find "quack" anywhere inside of that return value, 811 00:42:31,340 --> 00:42:32,640 just like this. 812 00:42:32,640 --> 00:42:37,220 So now, again I'm saying that ducksay should be able to say any given phrase. 813 00:42:37,220 --> 00:42:40,010 If I run ducksay and pass as input quack, 814 00:42:40,010 --> 00:42:43,800 I should see quack inside of that return value. 815 00:42:43,800 --> 00:42:46,970 So it's a good test, but I now need to implement it in code. 816 00:42:46,970 --> 00:42:50,960 I'll come back now to my ducksay.R file.
817 00:42:50,960 --> 00:42:55,640 Going back to my package folder and into R, where we store our R files, 818 00:42:55,640 --> 00:43:00,830 I'll open ducksay.R and see my function definition. 819 00:43:00,830 --> 00:43:01,830 Well, I could do this. 820 00:43:01,830 --> 00:43:06,187 I could say ducksay now takes as input a given phrase, just like that. 821 00:43:06,187 --> 00:43:08,270 And I'll make sure that instead of "hello, world," 822 00:43:08,270 --> 00:43:11,300 we say that given phrase. 823 00:43:11,300 --> 00:43:17,600 But now, per my tests, I still want to run ducksay, or be able to run ducksay, 824 00:43:17,600 --> 00:43:20,390 like this, without any arguments whatsoever. 825 00:43:20,390 --> 00:43:23,510 And I still expect to see "hello, world" when ducksay 826 00:43:23,510 --> 00:43:25,530 is run without any arguments here. 827 00:43:25,530 --> 00:43:30,080 So I could go back to ducksay now and say that phrase has a default 828 00:43:30,080 --> 00:43:33,200 value of "hello, world," just like this. 829 00:43:33,200 --> 00:43:36,830 If I supply a value, well, I'll see the phrase there, hopefully, 830 00:43:36,830 --> 00:43:39,350 and if I don't supply a value, well, I'll hopefully 831 00:43:39,350 --> 00:43:42,450 see "hello, world" there instead. 832 00:43:42,450 --> 00:43:45,500 So let me test my code interactively now. 833 00:43:45,500 --> 00:43:50,630 I've updated my function here, so I should again run load_all. 834 00:43:50,630 --> 00:43:54,360 That reloads my function now that I've redefined it here 835 00:43:54,360 --> 00:43:57,180 and makes it available to me in my console. 836 00:43:57,180 --> 00:44:00,450 I now have access to the latest version of ducksay. 837 00:44:00,450 --> 00:44:05,250 I can run cat ducksay and then give as input quack, 838 00:44:05,250 --> 00:44:08,402 and hopefully we'll see quack on top of our duck. 839 00:44:08,402 --> 00:44:10,110 So it seems to be working just right now.
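Putting the pieces together, the updated function might look like the sketch below. The ASCII duck is an assumption (the lecture's exact art may differ); what matters is the default value for phrase, which keeps ducksay() with no arguments working. The lecture checks this behavior with testthat tests; here the same two expectations are written with base R's stopifnot() and grepl() so the sketch runs without extra packages.

```r
# Assumed ASCII duck; the default argument is the point of this sketch
ducksay <- function(phrase = "hello, world") {
  paste(
    phrase,
    ">(. )__",
    " (___/",
    sep = "\n"
  )
}

cat(ducksay(), "\n", sep = "")         # no argument: default "hello, world"
cat(ducksay("quack"), "\n", sep = "")  # an argument overrides the default

# Base-R stand-ins for the two tests described in the lecture
# (there they are written with testthat)
stopifnot(
  grepl("hello, world", ducksay(), fixed = TRUE),
  grepl("quack", ducksay("quack"), fixed = TRUE)
)
```

A default argument is what lets one function satisfy both the old test (no input) and the new one (any given phrase) without a separate code path.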
840 00:44:10,110 --> 00:44:12,810 Let me go ahead and run test and see what I can see. 841 00:44:12,810 --> 00:44:16,800 I'll see that all four of my tests are passing. 842 00:44:16,800 --> 00:44:20,280 So I've updated my tests and my source code. 843 00:44:20,280 --> 00:44:21,970 What else should I update? 844 00:44:21,970 --> 00:44:23,040 Well, my documentation. 845 00:44:23,040 --> 00:44:27,870 I'll go back now to my ducksay package folder, open up man, open up ducksay.Rd, 846 00:44:27,870 --> 00:44:29,760 and I'll update my documentation. 847 00:44:29,760 --> 00:44:34,770 I'll then say now that the return value of ducksay 848 00:44:34,770 --> 00:44:37,950 is no longer a duck saying hello to the world. 849 00:44:37,950 --> 00:44:41,700 It's really a duck saying the given phrase. 850 00:44:41,700 --> 00:44:44,730 And similarly, I should now update my usage. 851 00:44:44,730 --> 00:44:48,870 By convention, we include the function's name, remember, some parentheses, 852 00:44:48,870 --> 00:44:52,090 remember, and also, the parameters to this function. 853 00:44:52,090 --> 00:44:54,960 So one parameter now is this one called phrase. 854 00:44:54,960 --> 00:44:58,150 And I've given it a default value of "hello, world." 855 00:44:58,150 --> 00:45:01,890 So this is by convention how I would write that out in my documentation. 856 00:45:01,890 --> 00:45:05,490 I would give the parameter and its default value with an equal sign 857 00:45:05,490 --> 00:45:06,720 separating them. 858 00:45:06,720 --> 00:45:09,210 And then finally, down below my examples here, 859 00:45:09,210 --> 00:45:12,360 I could give one more example of using ducksay. 860 00:45:12,360 --> 00:45:15,900 Underneath this here, I could say you can also, if you want to, 861 00:45:15,900 --> 00:45:19,690 give ducksay some input, like quack, just like that. 862 00:45:19,690 --> 00:45:22,590 So this, I think, covers us in terms of our function 863 00:45:22,590 --> 00:45:26,190 itself, our tests, and our documentation.
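The usage convention described above mirrors the function's own signature: the name, parentheses, and each parameter with its default value after an equal sign. As a rough illustration, the hypothetical helper usage_line() below derives that line from ducksay()'s parameters using base R's args() and deparse(). It is not part of the lecture or of any documentation tool, and the duck art is assumed.

```r
# Same assumed ducksay() as in the package source
ducksay <- function(phrase = "hello, world") {
  paste(phrase, ">(. )__", " (___/", sep = "\n")
}

# Hypothetical helper: build a usage line from a function's parameters.
# args() returns a function with the same parameters and a NULL body;
# deparse() turns that into text: the header, then "NULL"
usage_line <- function(fun, name) {
  header <- paste(deparse(args(fun)), collapse = " ")
  header <- sub("\\s*NULL\\s*$", "", header)  # drop the NULL body
  sub("^function\\s*", name, header)          # swap "function " for the name
}

usage_line(ducksay, "ducksay")
# "ducksay(phrase = \"hello, world\")", i.e. ducksay(phrase = "hello, world")
```

Comparing a line like this against the usage section is one way to catch documentation that has drifted from the code after a signature change.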
864 00:45:26,190 --> 00:45:31,290 I can test my documentation by rerendering it with ?ducksay, 865 00:45:31,290 --> 00:45:33,180 and now I'll see it on the right-hand side. 866 00:45:33,180 --> 00:45:37,980 I will in fact see my updated usage, my updated value, 867 00:45:37,980 --> 00:45:40,990 and my updated examples down below. 868 00:45:40,990 --> 00:45:44,040 So I think we're in a pretty good spot. 869 00:45:44,040 --> 00:45:46,710 Let me now rebuild this code. 870 00:45:46,710 --> 00:45:49,870 I'll simply use build again, just like this. 871 00:45:49,870 --> 00:45:55,350 And now, I should see the updated version of my package in a single file. 872 00:45:55,350 --> 00:45:58,950 And to be clear, if I wanted to share this code with somebody else, 873 00:45:58,950 --> 00:46:03,090 I would need to rebuild it every time I modify it to put all the updates inside 874 00:46:03,090 --> 00:46:03,690 of ducksay-- 875 00:46:03,690 --> 00:46:10,290 this folder-- into this new file, ducksay_1.0.tar.gz. 876 00:46:10,290 --> 00:46:11,250 OK. 877 00:46:11,250 --> 00:46:15,960 Let me ask, what questions do we have on iteratively updating and now 878 00:46:15,960 --> 00:46:19,260 rebuilding our package over time? 879 00:46:19,260 --> 00:46:24,840 AUDIENCE: Is there a command in R which rebuilds the package for us 880 00:46:24,840 --> 00:46:26,960 as soon as the file changes? 881 00:46:26,960 --> 00:46:28,210 CARTER ZENKE: A good question. 882 00:46:28,210 --> 00:46:32,460 So is there a command in R that rebuilds the package for us as we change it? 883 00:46:32,460 --> 00:46:33,687 Not that I am aware of. 884 00:46:33,687 --> 00:46:35,520 So I'm familiar with devtools in particular, 885 00:46:35,520 --> 00:46:38,280 and I don't think there is a function that would do exactly that. 886 00:46:38,280 --> 00:46:40,780 I know in other languages, there can be functions like that, 887 00:46:40,780 --> 00:46:44,730 but not that I'm familiar with in R. A good question, though.
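The tarball's name is not arbitrary: R CMD build, which devtools' build() wraps, names it after the Package and Version fields in DESCRIPTION. A minimal sketch, assuming the DESCRIPTION created earlier in the lecture declares Package: ducksay and Version: 1.0 (the Title here is a placeholder):

```r
# Assumed DESCRIPTION; only the Package and Version fields matter here
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: ducksay",
  "Title: Duck Say",
  "Version: 1.0"
), desc_file)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files
# into a character matrix: one row per record, one column per field
desc <- read.dcf(desc_file)

# R CMD build names its output <Package>_<Version>.tar.gz
tarball_name <- paste0(desc[1, "Package"], "_", desc[1, "Version"], ".tar.gz")
tarball_name
# "ducksay_1.0.tar.gz"
```

This is also why each rebuild overwrites ducksay_1.0.tar.gz: unless Version is bumped (say, to 1.1), every build produces the same file name.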
888 00:46:44,730 --> 00:46:45,940 Let's keep going, then. 889 00:46:45,940 --> 00:46:47,430 And so we've rebuilt our package. 890 00:46:47,430 --> 00:46:49,020 We now have it as a single file. 891 00:46:49,020 --> 00:46:52,560 I think what's left to do now is to really use our package. 892 00:46:52,560 --> 00:46:55,680 So let's see if we can create ourselves a new program that 893 00:46:55,680 --> 00:46:57,510 uses exactly this package. 894 00:46:57,510 --> 00:47:01,170 Maybe I will make a program called greet.R 895 00:47:01,170 --> 00:47:04,440 that will, instead of giving a user just a plain old simple hello, 896 00:47:04,440 --> 00:47:06,480 give them a hello from a duck. 897 00:47:06,480 --> 00:47:11,610 So I will move my working directory up one level, out of my ducksay folder. 898 00:47:11,610 --> 00:47:16,410 I'll use setwd and then, in quotes here, dot dot. 899 00:47:16,410 --> 00:47:19,920 That means move me one level higher in my working directory. 900 00:47:19,920 --> 00:47:23,640 So I'm presently in ducksay, but now I'll be right next to ducksay, 901 00:47:23,640 --> 00:47:26,850 if you will, in the same view I have on the right-hand side here. 902 00:47:26,850 --> 00:47:33,430 I'll now create for myself a new program called greet.R, just like this, 903 00:47:33,430 --> 00:47:35,430 and I'll see greet.R show up over here. 904 00:47:35,430 --> 00:47:38,760 I'll open it, and now let's write this program together. 905 00:47:38,760 --> 00:47:42,540 Well, now that I have my very own package, one called ducksay, 906 00:47:42,540 --> 00:47:45,660 I could use library to load it and, with it, my function ducksay. 907 00:47:45,660 --> 00:47:50,550 I could use library here and say I want to load this library called-- 908 00:47:50,550 --> 00:47:52,770 load this package called ducksay. 909 00:47:52,770 --> 00:47:55,050 And once I've done that, what do I want to do?
910 00:47:55,050 --> 00:47:57,580 I now want to ask the user for their name, 911 00:47:57,580 --> 00:48:00,960 so I could use let's say something like readline. 912 00:48:00,960 --> 00:48:04,860 I could say I want to ask the user for their name. 913 00:48:04,860 --> 00:48:08,010 I could create an object called name and store in it the result of readline, 914 00:48:08,010 --> 00:48:10,920 just like that, and ask, what's your name? 915 00:48:10,920 --> 00:48:14,040 Readline, as you've seen before, will take their input 916 00:48:14,040 --> 00:48:17,217 and return it to us, storing it in this object called name. 917 00:48:17,217 --> 00:48:19,050 And then I could create a greeting for them. 918 00:48:19,050 --> 00:48:21,150 I could say maybe let's have a greeting here. 919 00:48:21,150 --> 00:48:23,610 But the greeting will actually be the result 920 00:48:23,610 --> 00:48:28,290 of calling our function ducksay, which is inside this package called ducksay. 921 00:48:28,290 --> 00:48:32,070 I'll use ducksay here and pass as input in this case, 922 00:48:32,070 --> 00:48:37,350 well, the combination of hello and then let's say the user's name. 923 00:48:37,350 --> 00:48:42,060 And now down below, I could use cat, of course, to print our greeting. 924 00:48:42,060 --> 00:48:46,680 So top to bottom, I'm loading this package we created called ducksay. 925 00:48:46,680 --> 00:48:48,960 I'm asking the user for their name. 926 00:48:48,960 --> 00:48:53,240 I'm then using the function we defined as part of ducksay, called ducksay 927 00:48:53,240 --> 00:48:53,840 as well. 928 00:48:53,840 --> 00:48:58,160 I'm going to pass in the concatenated version of hello and their name. 929 00:48:58,160 --> 00:49:02,930 And then I'm going to print the result of calling ducksay here. 930 00:49:02,930 --> 00:49:06,500 But before I can run this: I have built my package, 931 00:49:06,500 --> 00:49:09,140 but what I haven't done is install it. 932 00:49:09,140 --> 00:49:10,050 So I've built it.
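As a compact recap of greet.R, here is a sketch that can run without the package installed. It defines a stand-in ducksay() (assumed duck art) where the real script calls library(ducksay), and it wraps the greeting in a hypothetical greet() helper so the paste() step is easy to check.

```r
# Stand-in for the packaged function so this sketch runs on its own;
# in the real greet.R this definition is replaced by library(ducksay).
# The duck art is an assumption.
ducksay <- function(phrase = "hello, world") {
  paste(phrase, ">(. )__", " (___/", sep = "\n")
}

# Hypothetical helper: paste() joins its arguments with a space by default,
# so "hello," and "Carter" become "hello, Carter"
greet <- function(name) {
  ducksay(paste("hello,", name))
}

# readline() waits for input in an interactive session; when R runs
# non-interactively (e.g. via Rscript) it returns "" without waiting
name <- readline("What's your name? ")
cat(greet(name), "\n", sep = "")
```

Keeping the greeting logic in a small function separates the part worth testing (the text passed to ducksay) from the part that needs a person at the keyboard (readline).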
933 00:49:10,050 --> 00:49:11,550 I could share this file with others. 934 00:49:11,550 --> 00:49:13,638 But I still need to install it on my own computer. 935 00:49:13,638 --> 00:49:15,680 And if anybody wants to use this package as well, 936 00:49:15,680 --> 00:49:18,150 they need to install it on their computer too. 937 00:49:18,150 --> 00:49:20,810 So thankfully, there are tools we can use to install 938 00:49:20,810 --> 00:49:24,200 packages in R. We've seen one already-- 939 00:49:24,200 --> 00:49:26,420 one called install.packages. 940 00:49:26,420 --> 00:49:31,340 In fact, you can use install.packages to install packages not just from the CRAN 941 00:49:31,340 --> 00:49:35,360 but also from an individual file, like the one we have here. 942 00:49:35,360 --> 00:49:39,467 You could also use a base R command called R CMD INSTALL. 943 00:49:39,467 --> 00:49:41,300 You can use that in your terminal, but we'll 944 00:49:41,300 --> 00:49:46,410 stick to using install.packages, which keeps us inside the R console itself. 945 00:49:46,410 --> 00:49:50,420 So let's now install our package with install.packages. 946 00:49:50,420 --> 00:49:53,190 I'll come back now to RStudio, and I think 947 00:49:53,190 --> 00:49:55,890 I could use this single file I now have-- 948 00:49:55,890 --> 00:49:58,440 the built version of my package. 949 00:49:58,440 --> 00:50:00,900 And I can install it now with install.packages. 950 00:50:00,900 --> 00:50:06,840 I'll say install.packages, and I'll now use the file name itself-- 951 00:50:06,840 --> 00:50:10,290 ducksay_1.0.tar.gz. 952 00:50:10,290 --> 00:50:12,780 This tarball, kind of like a zip file, is what I 953 00:50:12,780 --> 00:50:14,820 can use to store my package contents. 954 00:50:14,820 --> 00:50:19,230 I'll install this here, and now that it's installed, I see done. 955 00:50:19,230 --> 00:50:24,150 I can now run source on greet.R and type in Carter. 956 00:50:24,150 --> 00:50:24,780 Voila.
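The lecture installs with install.packages("ducksay_1.0.tar.gz"). For a single path ending in .tar.gz, install.packages() infers a local install from the extension, but the sketch below spells out repos = NULL and type = "source" to make that explicit. To stay self-contained it first recreates and builds a minimal, assumed ducksay source tree (placeholder DESCRIPTION values and duck art, now with the phrase argument). It then installs the tarball into a temporary library through the lib argument, a choice made here so your usual library is untouched, and finally loads the package with library() the way greet.R does. Raw strings require R 4.0 or later.

```r
# Recreate and build a minimal, assumed ducksay source tree
parent <- tempfile("pkgs")
pkg_dir <- file.path(parent, "ducksay")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)
writeLines(c(
  "Package: ducksay",
  "Title: Duck Say",
  "Version: 1.0",
  "Author: Example Author",
  "Maintainer: Example Author <author@example.com>",
  "Description: Prints a duck saying a phrase.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))
writeLines("export(ducksay)", file.path(pkg_dir, "NAMESPACE"))
writeLines(r"--(ducksay <- function(phrase = "hello, world") {
  paste(phrase, ">(. )__", " (___/", sep = "\n")
})--", file.path(pkg_dir, "R", "ducksay.R"))

old_wd <- setwd(parent)
system2(file.path(R.home("bin"), "R"), c("CMD", "build", "ducksay"))
setwd(old_wd)
tarball <- file.path(parent, "ducksay_1.0.tar.gz")

# Install from the local tarball: repos = NULL means "not from a
# repository", type = "source" means "this is a source package"
lib_dir <- file.path(parent, "lib")
dir.create(lib_dir)
install.packages(tarball, repos = NULL, type = "source", lib = lib_dir)

# Load the installed package and greet someone, as greet.R does
library(ducksay, lib.loc = lib_dir)
cat(ducksay(paste("hello,", "Carter")), "\n", sep = "")
```

Anyone you send the tarball to would run the same install.packages() call, just without the lib argument, so the package lands in their default library.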
957 00:50:24,780 --> 00:50:28,560 I now have a duck that can say hello to anyone who enters in their name, 958 00:50:28,560 --> 00:50:32,980 thanks to this package I made called ducksay. 959 00:50:32,980 --> 00:50:33,720 Well. 960 00:50:33,720 --> 00:50:37,260 We've seen now how to build R packages, how to install them, 961 00:50:37,260 --> 00:50:38,340 and how to use them. 962 00:50:38,340 --> 00:50:41,423 Now the only thing left to do is to share them with the world. 963 00:50:41,423 --> 00:50:43,590 So if you want to share your package with the world, 964 00:50:43,590 --> 00:50:45,180 you have a number of options. 965 00:50:45,180 --> 00:50:49,190 You could use the CRAN, so long as your package adheres to its guidelines. 966 00:50:49,190 --> 00:50:50,940 You could use a service like GitHub, which 967 00:50:50,940 --> 00:50:53,482 is a tool for sharing software and collaborating with others. 968 00:50:53,482 --> 00:50:57,030 You can even share your package over email with a friend. 969 00:50:57,030 --> 00:50:59,430 Now however you choose to share your code, 970 00:50:59,430 --> 00:51:00,600 I hope you keep in mind just how much you've 971 00:51:00,600 --> 00:51:02,558 learned throughout this course and what 972 00:51:02,558 --> 00:51:04,050 you have to share with the world. 973 00:51:04,050 --> 00:51:08,250 In fact, you began by learning how to represent data using vectors and data 974 00:51:08,250 --> 00:51:08,910 frames. 975 00:51:08,910 --> 00:51:13,050 You graduated to transforming data, using subsets, conditions, 976 00:51:13,050 --> 00:51:14,670 and logical expressions. 977 00:51:14,670 --> 00:51:18,210 You then saw how to make your analysis more efficient using loops and functions 978 00:51:18,210 --> 00:51:22,068 and dipped your toes into this paradigm called functional programming.
979 00:51:22,068 --> 00:51:25,110 And in the second half of the course, you saw packages like the tidyverse 980 00:51:25,110 --> 00:51:28,560 and all they could do-- how they could tidy your data, help you visualize it, 981 00:51:28,560 --> 00:51:30,420 and help you test your programs too. 982 00:51:30,420 --> 00:51:32,730 All that's left now is to take all you've learned, 983 00:51:32,730 --> 00:51:35,910 package it up, and share it now with the world. 984 00:51:35,910 --> 00:51:38,340 We're so excited to see what you'll create. 985 00:51:38,340 --> 00:51:42,530 This was CS50's Introduction to Programming with R. 986 00:51:42,530 --> 00:51:45,000