WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.451 [MUSIC PLAYING] 00:00:20.230 --> 00:00:23.530 CARTER ZENKE: Well, hello, one and all, and welcome back to CS50's Introduction 00:00:23.530 --> 00:00:25.960 to Programming with R. My name is Carter Zenke, 00:00:25.960 --> 00:00:29.470 and this is our lecture on packaging programs, by which we can share them 00:00:29.470 --> 00:00:30.790 with the world. 00:00:30.790 --> 00:00:33.820 Now, together today, we'll make a package called ducksay. 00:00:33.820 --> 00:00:38.110 And the goal of ducksay is to take as input some character string and output 00:00:38.110 --> 00:00:41.690 a textual representation of a duck saying that character string. 00:00:41.690 --> 00:00:43.990 So for instance, if I typed in "hello, world," 00:00:43.990 --> 00:00:48.940 I might see in my console now this duck saying hello to the world. 00:00:48.940 --> 00:00:52.190 Now, if you're familiar with CS50 or with programming more generally, 00:00:52.190 --> 00:00:55.390 you might have heard of a package or a program called cowsay. 00:00:55.390 --> 00:00:57.460 And cowsay is very similar in spirit. 00:00:57.460 --> 00:01:02.500 It takes some piece of text and outputs in your terminal a cow saying that text. 00:01:02.500 --> 00:01:06.940 Now, cowsay is not an R package, but we will take some inspiration from it 00:01:06.940 --> 00:01:10.990 to make our own package, one called ducksay. 00:01:10.990 --> 00:01:14.200 But to do so, we need to learn more about packages. 00:01:14.200 --> 00:01:17.050 Up until now, we've been users of packages. 00:01:17.050 --> 00:01:19.970 We have seen how to install, how to load, 00:01:19.970 --> 00:01:22.850 and how to use functions inside of packages. 00:01:22.850 --> 00:01:26.960 But what we haven't seen is the source code behind our packages. 00:01:26.960 --> 00:01:28.550 How are they made? 
00:01:28.550 --> 00:01:30.710 Well, it turns out when you download a package 00:01:30.710 --> 00:01:34.880 from something like the CRAN, what you get is a single binary file. 00:01:34.880 --> 00:01:38.780 But thankfully, you yourself don't have to write binary to create a package. 00:01:38.780 --> 00:01:42.920 You can instead use what we'll call source code-- the individual dot R files 00:01:42.920 --> 00:01:45.590 and folders that compose your package. 00:01:45.590 --> 00:01:48.230 And when it comes time to share your package with others, 00:01:48.230 --> 00:01:51.290 we'll take the source code and build it or compile it 00:01:51.290 --> 00:01:55.070 into some single file you can share around the world. 00:01:55.070 --> 00:01:57.410 But let's focus first on source code. 00:01:57.410 --> 00:02:00.890 I'll come back now to RStudio to make some source code for this package. 00:02:00.890 --> 00:02:03.290 And by convention, we write the source code 00:02:03.290 --> 00:02:07.680 for some package inside a folder that has that package's name. 00:02:07.680 --> 00:02:10.039 So if I want to make a package called ducksay, 00:02:10.039 --> 00:02:14.780 I should first create a folder in my working directory called ducksay. 00:02:14.780 --> 00:02:18.890 To do so, I can use this function here, dir.create, 00:02:18.890 --> 00:02:21.950 to create a directory-- in this case, called ducksay. 00:02:21.950 --> 00:02:23.220 I'll hit Enter here. 00:02:23.220 --> 00:02:28.580 And notice how in my File Explorer, I see a new folder called ducksay. 00:02:28.580 --> 00:02:33.050 Well, I want to make all my future files and folders inside of this folder here-- 00:02:33.050 --> 00:02:34.100 my future package. 00:02:34.100 --> 00:02:38.480 So I'll set now my working directory to ducksay. 
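The two console steps just described can be sketched as follows. This is a minimal illustration, not the lecture's exact session: it works inside tempdir() (an assumption made so that the sketch does not touch a real project), whereas the lecture creates the folder in the current working directory.

```r
# Work in a scratch location; tempdir() is an assumption for illustration only.
old_wd <- setwd(tempdir())

dir.create("ducksay", showWarnings = FALSE)  # folder named after the package
setwd("ducksay")                             # step inside the future package

inside <- basename(getwd())                  # name of the folder we are now in
print(inside)

setwd(old_wd)                                # restore the original working directory
```

Restoring the working directory at the end is only a courtesy of the sketch; when developing the package interactively you would stay inside the ducksay folder.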
00:02:38.480 --> 00:02:41.660 And now, it's a bit equivalent to me going inside of this folder, 00:02:41.660 --> 00:02:45.680 by clicking on it and seeing this blank slate in which we can begin writing 00:02:45.680 --> 00:02:48.200 the source code for our package. 00:02:48.200 --> 00:02:51.080 Now, this blank slate is a bit scary at first, 00:02:51.080 --> 00:02:53.000 but thankfully, there's a structure to how 00:02:53.000 --> 00:02:56.810 packages are organized in R. Let's take a look at what a typical package might 00:02:56.810 --> 00:02:57.710 look like. 00:02:57.710 --> 00:03:00.350 So in general, if you're making a package in R, 00:03:00.350 --> 00:03:03.050 you will tend to have these files and folders here. 00:03:03.050 --> 00:03:08.030 One first of all called DESCRIPTION in all caps, which describes your package-- 00:03:08.030 --> 00:03:12.290 what is the name of it, what version number is it, who wrote it, and so on. 00:03:12.290 --> 00:03:14.810 You'll also have a file called NAMESPACE, 00:03:14.810 --> 00:03:17.300 which you'll use to define the functions in your package 00:03:17.300 --> 00:03:20.180 some end user might be able to use. 00:03:20.180 --> 00:03:24.770 And then you'll have some folders here-- one called man, which stands for manual. 00:03:24.770 --> 00:03:28.220 You'll put inside that folder the documentation for your functions-- 00:03:28.220 --> 00:03:30.110 some instructions on how to use them. 00:03:30.110 --> 00:03:32.240 And here we have an R folder, too. 00:03:32.240 --> 00:03:36.320 We'll actually place your R files that have your function definitions inside 00:03:36.320 --> 00:03:36.920 of them. 00:03:36.920 --> 00:03:39.990 And of course, we'll need some tests to test our code, 00:03:39.990 --> 00:03:42.792 so we'll put those in this folder called tests. 
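The typical layout listed above can be sketched with base R alone. The folder name ducksay-skeleton and the use of tempdir() are assumptions for illustration; later in the lecture, helper functions create most of these pieces instead.

```r
# Build an empty package skeleton in a scratch folder (hypothetical location).
pkg <- file.path(tempdir(), "ducksay-skeleton")
dir.create(pkg, showWarnings = FALSE)

# Two top-level files: package metadata and the list of exported functions.
file.create(file.path(pkg, c("DESCRIPTION", "NAMESPACE")))

# Three folders: documentation, function definitions, and unit tests.
for (folder in c("man", "R", "tests")) {
  dir.create(file.path(pkg, folder), showWarnings = FALSE)
}

contents <- list.files(pkg)  # top-level files and folders
print(contents)
```

The files are empty at this point; the sketch only shows where each piece lives.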
00:03:42.792 --> 00:03:44.750 Now, if you're running a more advanced package, 00:03:44.750 --> 00:03:46.542 you might have other files and folders too, 00:03:46.542 --> 00:03:52.100 but these are the core ones that say this folder here is an R package. 00:03:52.100 --> 00:03:53.360 So let's begin. 00:03:53.360 --> 00:03:55.640 Let's begin with our description file and describe 00:03:55.640 --> 00:03:57.230 the package we want to create. 00:03:57.230 --> 00:04:02.900 I'll come back now to RStudio and create for this folder here a description file. 00:04:02.900 --> 00:04:06.380 I can do so with file.create, and I'll 00:04:06.380 --> 00:04:10.730 choose to make a DESCRIPTION file, in all caps with no file extension. 00:04:10.730 --> 00:04:14.720 Notice how now inside my ducksay folder, I have this file 00:04:14.720 --> 00:04:19.250 called description, which I can open up and see, well, another blank slate. 00:04:19.250 --> 00:04:21.649 And so it turns out here too, there are some conventions 00:04:21.649 --> 00:04:24.623 on how to organize our description file. 00:04:24.623 --> 00:04:26.540 In fact, our description file will be composed 00:04:26.540 --> 00:04:30.980 of individual fields that tell us some information about this package. 00:04:30.980 --> 00:04:33.020 But what are those fields? 00:04:33.020 --> 00:04:36.770 Well, the ones you need to know about-- you need to include in your file-- 00:04:36.770 --> 00:04:38.060 are these here. 00:04:38.060 --> 00:04:41.060 One called package, which is the name of our package 00:04:41.060 --> 00:04:43.550 as we want somebody to install it. 00:04:43.550 --> 00:04:46.550 We've seen install.packages, and we'll often 00:04:46.550 --> 00:04:49.580 include the name of the package we want to install as the input 00:04:49.580 --> 00:04:50.472 to that function. 00:04:50.472 --> 00:04:52.430 Well, this is the name we want the user to type 00:04:52.430 --> 00:04:56.450 in to install.packages to install our package. 
00:04:56.450 --> 00:04:59.160 We also have to say the title of our package, which 00:04:59.160 --> 00:05:00.410 is a bit more English-friendly. 00:05:00.410 --> 00:05:02.760 You could capitalize things, include spaces, and so on, 00:05:02.760 --> 00:05:05.570 but similar idea to the package field as well. 00:05:05.570 --> 00:05:07.425 We have here a description as well. 00:05:07.425 --> 00:05:09.050 What is the description of our package? 00:05:09.050 --> 00:05:10.760 What does it do, and so on? 00:05:10.760 --> 00:05:11.990 And a version number. 00:05:11.990 --> 00:05:12.920 What version is it? 00:05:12.920 --> 00:05:17.930 If we changed over time, is it version 1.0, 2.0, 3.0, or so on? 00:05:17.930 --> 00:05:20.600 And then finally here, we have some information on the authors. 00:05:20.600 --> 00:05:22.010 Who wrote this package? 00:05:22.010 --> 00:05:23.210 What was their role? 00:05:23.210 --> 00:05:25.250 And then a license file. 00:05:25.250 --> 00:05:27.680 That is, what are the legal terms under which you can actually 00:05:27.680 --> 00:05:31.340 use the code if you want to install this package and use it yourself? 00:05:31.340 --> 00:05:34.820 So let's now add these fields to our description file 00:05:34.820 --> 00:05:37.040 and make it an R package. 00:05:37.040 --> 00:05:39.530 I'll come back now to RStudio, and let's go ahead 00:05:39.530 --> 00:05:41.810 and start by naming our package. 00:05:41.810 --> 00:05:44.660 Well, we said before, our package will be called ducksay. 00:05:44.660 --> 00:05:49.370 So in the description file, I'll add this package field followed by a colon, 00:05:49.370 --> 00:05:53.030 and I'll say that our package is named, in this case, ducksay-- 00:05:53.030 --> 00:05:55.020 just some English text here. 00:05:55.020 --> 00:05:59.660 Then I'll add the title, and I'll choose to maybe title case our package here. 
00:05:59.660 --> 00:06:02.450 I'll call it Duck Say, just like this, and I'll 00:06:02.450 --> 00:06:04.050 give a description to it as well. 00:06:04.050 --> 00:06:08.780 I'll say the description of this package is going to be "Say hello with a duck." 00:06:08.780 --> 00:06:11.690 The purpose is to say hello with a duck. 00:06:11.690 --> 00:06:14.330 And this is the first version of the package. 00:06:14.330 --> 00:06:18.290 I'm just starting out on my journey of developing this package. 00:06:18.290 --> 00:06:21.210 But now I need to include some more information too, 00:06:21.210 --> 00:06:24.810 like who wrote this package and what are the legal terms in which we can actually 00:06:24.810 --> 00:06:26.400 use the package too? 00:06:26.400 --> 00:06:32.010 So to define who wrote the package, I could use this field called Authors@R. 00:06:32.010 --> 00:06:35.550 And whereas up above, we've been using some English text, just 00:06:35.550 --> 00:06:38.580 regular old characters and so on, in this field, 00:06:38.580 --> 00:06:43.020 we can actually use an R function to define who wrote the package 00:06:43.020 --> 00:06:45.360 and what role they played. 00:06:45.360 --> 00:06:48.720 Now, to add a new author to this package, I'll use a function, 00:06:48.720 --> 00:06:50.400 one called person. 00:06:50.400 --> 00:06:52.680 And if I look at documentation, I would know 00:06:52.680 --> 00:06:57.130 that the first argument to this person function is the person's first name. 00:06:57.130 --> 00:06:58.860 So my first name is Carter. 00:06:58.860 --> 00:07:01.060 And the second argument is their last name. 00:07:01.060 --> 00:07:03.060 So in this case, my last name is Zenke. 00:07:03.060 --> 00:07:07.200 And there are a few other parameters as well that this person can have-- 00:07:07.200 --> 00:07:08.580 namely, an email. 00:07:08.580 --> 00:07:13.920 I could say my email here is carter@cs50.harvard.edu. 
00:07:13.920 --> 00:07:16.860 And this lets people know if they want to contact the package author, 00:07:16.860 --> 00:07:19.620 they can email me at this email here. 00:07:19.620 --> 00:07:24.300 And then we should also specify what role each person played. 00:07:24.300 --> 00:07:28.210 So there's a parameter here called role as well. 00:07:28.210 --> 00:07:31.440 Now, because package authors can play more than one role, 00:07:31.440 --> 00:07:34.890 role will take as input a vector of roles. 00:07:34.890 --> 00:07:36.973 And there are actually a defined set of roles, 00:07:36.973 --> 00:07:39.390 which you can learn if you look at the documentation here, 00:07:39.390 --> 00:07:41.700 but I'll focus on a few in particular. 00:07:41.700 --> 00:07:46.200 One role is an author of the package-- somebody who contributed to it in full. 00:07:46.200 --> 00:07:50.370 And we denote somebody as an author by typing in aut-- 00:07:50.370 --> 00:07:52.650 this abbreviation here for author. 00:07:52.650 --> 00:07:55.560 So in this case, I'm saying that me, Carter Zenke, I'm 00:07:55.560 --> 00:07:58.020 an author of this package. 00:07:58.020 --> 00:07:59.520 Now, there's more roles, too. 00:07:59.520 --> 00:08:02.790 One role that is important as well is the creator 00:08:02.790 --> 00:08:05.100 of the package, which we note with cre. 00:08:05.100 --> 00:08:07.200 It's the abbreviation for creator. 00:08:07.200 --> 00:08:11.550 Now, creator and author seem pretty similar in meaning, 00:08:11.550 --> 00:08:14.280 but in R, they have two distinct meanings. 00:08:14.280 --> 00:08:18.570 An author is anybody who at any time contributed to the package. 00:08:18.570 --> 00:08:22.650 A creator, though, is the person who now maintains the package. 00:08:22.650 --> 00:08:25.890 They're in charge of updating it, making sure it's up to date and so on, 00:08:25.890 --> 00:08:26.620 over time. 
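As a small illustration of the person function described so far, here is a sketch using base R's utils::person(). Only the two roles covered up to this point are shown; the variable name author is an assumption for illustration.

```r
# person() ships with R in the utils package, which is attached by default.
author <- person(
  "Carter", "Zenke",                    # given name, then family name
  email = "carter@cs50.harvard.edu",
  role = c("aut", "cre")                # author and creator (maintainer)
)

print(author)   # a human-readable summary of the person
author$role     # the vector of role abbreviations
```

Because role takes a vector, one person can hold several roles at once, which is why the abbreviations are wrapped in c().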
00:08:26.620 --> 00:08:28.537 So these two are distinct, and there has to be 00:08:28.537 --> 00:08:32.799 at least one creator of some package-- a person who's maintaining it over time. 00:08:32.799 --> 00:08:35.530 So these are two main roles, but there is one more as well. 00:08:35.530 --> 00:08:40.200 I should also say that I am the copyright holder for this code. 00:08:40.200 --> 00:08:42.900 I own it, and I'm the person who owns a license that I will then 00:08:42.900 --> 00:08:45.430 specify underneath myself here. 00:08:45.430 --> 00:08:46.800 So these three roles-- 00:08:46.800 --> 00:08:49.620 author, creator, and copyright holder-- are the ones 00:08:49.620 --> 00:08:52.830 you will need to create a package in R. 00:08:52.830 --> 00:08:55.650 But now let's think through the license field-- 00:08:55.650 --> 00:08:57.120 license down below. 00:08:57.120 --> 00:09:00.342 Well, if you're a lawyer, you could maybe write your own license. 00:09:00.342 --> 00:09:03.300 But in general, it's best practice to rely on some standardized license 00:09:03.300 --> 00:09:04.710 that already exists. 00:09:04.710 --> 00:09:08.130 And in fact, if you want to share your code for free online for others 00:09:08.130 --> 00:09:10.410 to use freely as well, there's a whole community 00:09:10.410 --> 00:09:14.970 that has created various licenses-- one called the Free and Open Source Software 00:09:14.970 --> 00:09:15.600 community. 00:09:15.600 --> 00:09:17.440 This community has created several licenses 00:09:17.440 --> 00:09:22.800 that you can just use and adopt or adapt to share your software online for free. 00:09:22.800 --> 00:09:26.970 Among the typical licenses are these-- the MIT license, by a bunch of friends 00:09:26.970 --> 00:09:30.360 down the road, and the GNU General Public License, similar in spirit 00:09:30.360 --> 00:09:32.800 to the MIT license as well. 00:09:32.800 --> 00:09:35.730 Now, the MIT license begins as follows. 
00:09:35.730 --> 00:09:38.150 "Permission is hereby granted, free of charge, 00:09:38.150 --> 00:09:40.192 to any person obtaining a copy of this software," 00:09:40.192 --> 00:09:42.567 basically to deal with the software without restriction-- 00:09:42.567 --> 00:09:44.400 so saying you can share this code freely, 00:09:44.400 --> 00:09:46.720 and you can use it freely as well. 00:09:46.720 --> 00:09:49.655 So I might want to adopt this license for my software here, 00:09:49.655 --> 00:09:52.750 and I'll go ahead and say that in my description file as well. 00:09:52.750 --> 00:09:54.750 I'll come back over here to RStudio and say, 00:09:54.750 --> 00:09:57.630 I want to build on top of the MIT license. 00:09:57.630 --> 00:10:00.670 I want to license my software under this language here. 00:10:00.670 --> 00:10:05.530 So I could simply type in MIT as the template for my license now. 00:10:05.530 --> 00:10:09.450 But it turns out that in R, there's a bit more we need to specify here. 00:10:09.450 --> 00:10:12.510 I should also specify the year which I created this software 00:10:12.510 --> 00:10:16.530 and who the copyright holder is inside the license itself. 00:10:16.530 --> 00:10:18.280 This is particular to MIT license as well. 00:10:18.280 --> 00:10:23.113 So if I want to add on to this MIT license, I can take MIT as my template 00:10:23.113 --> 00:10:25.280 but then add on some other file that gives some more 00:10:25.280 --> 00:10:26.780 information about this license. 00:10:26.780 --> 00:10:31.160 I could say MIT + file and then LICENSE in all caps. 00:10:31.160 --> 00:10:32.150 And this is convention. 00:10:32.150 --> 00:10:34.490 If I want to add on to my license, I do so 00:10:34.490 --> 00:10:37.490 with a file called LICENSE in all caps. 
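Putting the fields discussed so far together, a DESCRIPTION file along these lines might look like the sketch below. It is written from R so the result can be read back and checked; the scratch folder name and the year placeholder are assumptions for illustration, and the accompanying LICENSE file is shown with the two fields that R's MIT template expects (YEAR and COPYRIGHT HOLDER).

```r
# Scratch folder for illustration only (hypothetical location).
pkg <- file.path(tempdir(), "ducksay-description")
dir.create(pkg, showWarnings = FALSE)

# DESCRIPTION: one "Field: value" pair per entry; indented lines continue a field.
writeLines(c(
  "Package: ducksay",
  "Title: Duck Say",
  "Description: Say hello with a duck.",
  "Version: 1.0",
  'Authors@R: person("Carter", "Zenke", email = "carter@cs50.harvard.edu",',
  '    role = c("aut", "cre", "cph"))',
  "License: MIT + file LICENSE"
), file.path(pkg, "DESCRIPTION"))

# LICENSE: the extra file named by "MIT + file LICENSE".
writeLines(c(
  paste("YEAR:", format(Sys.Date(), "%Y")),  # placeholder: the current year
  "COPYRIGHT HOLDER: ducksay authors"
), file.path(pkg, "LICENSE"))

# Read the fields back; Authors@R holds R code, so it can be evaluated.
desc <- read.dcf(file.path(pkg, "DESCRIPTION"))
authors <- eval(parse(text = desc[1, "Authors@R"]))
print(desc[1, "License"])
print(authors)
```

Reading the file back with read.dcf() is just a convenient way to confirm that each field parses; it is not a required step when writing a package by hand.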
00:10:37.490 --> 00:10:40.430 And now this says the entire license for my software 00:10:40.430 --> 00:10:44.090 is the MIT license as the base, plus some file 00:10:44.090 --> 00:10:46.610 that I'll include called license. 00:10:46.610 --> 00:10:48.200 Well, let's create it now. 00:10:48.200 --> 00:10:50.960 I'll get on my console here and say file.create. 00:10:50.960 --> 00:10:53.330 I'll create a file called license-- 00:10:53.330 --> 00:10:57.590 no extension here-- open it up, and now add in some placeholders 00:10:57.590 --> 00:10:58.610 that I should fill. 00:10:58.610 --> 00:11:00.770 For the MIT license in particular, I need 00:11:00.770 --> 00:11:04.200 to say, again, what year it was that I created this software. 00:11:04.200 --> 00:11:07.280 So I'll say YEAR in all caps and the year I created the software, 00:11:07.280 --> 00:11:10.490 and then the copyright holder as well. 00:11:10.490 --> 00:11:12.860 In this case I'll say just the ducksay authors, 00:11:12.860 --> 00:11:17.330 referencing again this Authors field in my description file. 00:11:17.330 --> 00:11:20.610 So now we have the basic bare-bones structure 00:11:20.610 --> 00:11:22.770 for our package called ducksay. 00:11:22.770 --> 00:11:26.460 We have a description, a license, authorship, and so on. 00:11:26.460 --> 00:11:31.320 Let me ask, what questions do we have so far on packages in R 00:11:31.320 --> 00:11:33.030 and creating those of our very own? 00:11:35.790 --> 00:11:38.910 AUDIENCE: Which are the best practices to enumerate the version number, 00:11:38.910 --> 00:11:42.810 and when do we choose when to increment the number to the left of the point 00:11:42.810 --> 00:11:44.580 and the one to the right of the point? 00:11:44.580 --> 00:11:46.955 CARTER ZENKE: So a good question about version numbering. 00:11:46.955 --> 00:11:49.230 I want to change the version number that we're using. 
00:11:49.230 --> 00:11:51.272 Let me show you what we're doing now and show you 00:11:51.272 --> 00:11:53.460 another process called semantic versioning, too. 00:11:53.460 --> 00:11:57.870 I'll come back over here, and here I'm using just 1.0 for simplicity, 00:11:57.870 --> 00:12:00.090 but actually it turns out some of the community use 00:12:00.090 --> 00:12:02.220 a convention known as semantic versioning. 00:12:02.220 --> 00:12:06.180 And that versioning system actually allows for three numbers here. 00:12:06.180 --> 00:12:09.720 Now by convention, each of these numbers has some certain meaning. 00:12:09.720 --> 00:12:13.590 In this case, this last number is known as the patch version. 00:12:13.590 --> 00:12:17.520 If you make some bugfix, you would increment this number here. 00:12:17.520 --> 00:12:20.730 This middle number is known as the minor version. 00:12:20.730 --> 00:12:23.250 If you add in some new feature, like a new function, 00:12:23.250 --> 00:12:26.160 for instance, you would increment that number too. 00:12:26.160 --> 00:12:29.880 But it turns out that this first number, like 1 here, 00:12:29.880 --> 00:12:32.040 is known as the major version. 00:12:32.040 --> 00:12:34.170 You would only increment this if you made 00:12:34.170 --> 00:12:39.030 some change that broke the conventions you prior used in your package 00:12:39.030 --> 00:12:40.780 and somebody who's relying on your package 00:12:40.780 --> 00:12:44.740 would have to update their code too to still use your package. 00:12:44.740 --> 00:12:47.770 So this is one set of conventions for versioning here. 00:12:47.770 --> 00:12:53.730 We'll go back, though, and use 1.0 just for simplicity. 00:12:53.730 --> 00:12:54.360 OK. 00:12:54.360 --> 00:12:58.170 So we have here now our basic bare-bones R package, 00:12:58.170 --> 00:13:01.050 but our goal is to also add some source code. 
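The semantic versioning convention described a moment ago can be sketched as a small helper. The function bump_version is hypothetical (it is not part of R or of the lecture), and resetting the lower numbers to zero after a bump follows the usual semantic versioning convention rather than anything stated above.

```r
# Hypothetical helper: increment one part of a "major.minor.patch" version string.
bump_version <- function(version, part = c("major", "minor", "patch")) {
  part <- match.arg(part)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))

  idx <- match(part, c("major", "minor", "patch"))
  parts[idx] <- parts[idx] + 1L
  if (idx < 3) parts[(idx + 1):3] <- 0L  # lower parts start over at zero

  paste(parts, collapse = ".")
}

bump_version("1.0.0", "patch")  # bug fix:          "1.0.1"
bump_version("1.0.1", "minor")  # new feature:      "1.1.0"
bump_version("1.1.0", "major")  # breaking change:  "2.0.0"

# Base R can compare version numbers directly.
package_version("1.0.1") < package_version("1.1.0")  # TRUE
```

In practice the version lives in the Version field of the DESCRIPTION file, so a bump simply means editing that one line.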
00:13:01.050 --> 00:13:04.260 And we can do it in the same way we've usually written source code before-- 00:13:04.260 --> 00:13:06.750 by first writing some tests, writing our code, 00:13:06.750 --> 00:13:09.300 and writing documentation for our code. 00:13:09.300 --> 00:13:12.960 Now, we saw before, if I want to test inside of my package-- 00:13:12.960 --> 00:13:14.860 I want to run some unit tests as well-- 00:13:14.860 --> 00:13:19.080 I should write them inside of this folder called tests. 00:13:19.080 --> 00:13:21.960 And we can certainly create this folder ourselves, 00:13:21.960 --> 00:13:25.260 but as we get into the weeds of structuring our package, 00:13:25.260 --> 00:13:27.150 we might want some help. 00:13:27.150 --> 00:13:31.860 So thankfully, there is a package that helps you write packages in R, 00:13:31.860 --> 00:13:33.690 one called devtools. 00:13:33.690 --> 00:13:37.380 This package gives us some tools to write our very own package, 00:13:37.380 --> 00:13:41.100 and among the functions it has for unit testing are these-- 00:13:41.100 --> 00:13:43.840 one called use_testthat. 00:13:43.840 --> 00:13:46.170 So we saw testthat last time. 00:13:46.170 --> 00:13:49.260 It's a package for unit testing R software. 00:13:49.260 --> 00:13:53.440 If I want to use the testthat package to test my code, all I have to do 00:13:53.440 --> 00:13:55.750 is use use_testthat. 00:13:55.750 --> 00:13:59.230 And then, once I've done that, if I want to create some testing 00:13:59.230 --> 00:14:02.770 file for a function, I could simply use use_test, 00:14:02.770 --> 00:14:06.670 and that will create for me a new testing file for my function. 00:14:06.670 --> 00:14:10.210 And then finally, once I have all those tests written, 00:14:10.210 --> 00:14:14.930 if I want to run those tests, all I have to do is run the test function as well. 
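The three helpers just listed can be sketched as one short workflow. This is a guarded illustration rather than a transcript of the lecture's console: it only runs the helpers when devtools is installed and the working directory already contains a DESCRIPTION file, and the variable names in_package_root and has_devtools are assumptions. The use_*() helpers themselves come from the usethis package, which is attached when devtools is loaded.

```r
# Meant to be run from the package root (the ducksay folder with DESCRIPTION).
in_package_root <- file.exists("DESCRIPTION")
has_devtools <- requireNamespace("devtools", quietly = TRUE)

if (in_package_root && has_devtools) {
  library(devtools)     # also attaches usethis, which supplies the use_*() helpers

  use_testthat()        # set up tests/ and record testthat in DESCRIPTION
  use_test("ducksay")   # create tests/testthat/test-ducksay.R
  test()                # run every test in the package
} else {
  message("Run this from a package root with devtools installed.")
}
```

The guard is only there so the sketch is harmless outside a package; in an interactive session inside the ducksay folder, the three calls are all that is needed.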
00:14:14.930 --> 00:14:18.220 So very helpful for us for structuring our package and running our unit 00:14:18.220 --> 00:14:19.478 tests too. 00:14:19.478 --> 00:14:22.270 So let's begin by writing some unit tests for our ducksay function. 00:14:22.270 --> 00:14:26.190 I'll come back now to RStudio, and let's try to use use_testthat. 00:14:26.190 --> 00:14:28.690 Well, because this function is part of the devtools package, 00:14:28.690 --> 00:14:32.290 I'll first need to load, if not install, the devtools package. 00:14:32.290 --> 00:14:37.450 So here down below, I'll use library devtools to load the devtools package, 00:14:37.450 --> 00:14:39.280 assuming it is installed. 00:14:39.280 --> 00:14:44.560 I'll hit Enter here, and now, let's use use_testthat to configure our package 00:14:44.560 --> 00:14:47.500 to run tests with the package testthat. 00:14:47.500 --> 00:14:51.580 Well, I'll use down below here use_testthat, and I'll hit Enter, 00:14:51.580 --> 00:14:55.360 and we'll see a few things have happened actually that I'll see in my console 00:14:55.360 --> 00:14:56.260 down below. 00:14:56.260 --> 00:15:00.970 Now, the first thing I see is that my description file has been added to. 00:15:00.970 --> 00:15:05.440 I see now this new field called Suggests, and as part of Suggests, 00:15:05.440 --> 00:15:07.450 I now see testthat. 00:15:07.450 --> 00:15:10.690 This means when somebody installs our package, 00:15:10.690 --> 00:15:14.770 it will be suggested that they also install testthat at a version 00:15:14.770 --> 00:15:17.770 greater than or equal to 3.0. 00:15:17.770 --> 00:15:19.915 Well, why would we suggest testthat? 00:15:19.915 --> 00:15:22.540 Well, maybe they want to test our code themselves, in which case 00:15:22.540 --> 00:15:26.360 they'll need to use testthat, because we used it ourselves as well. 
00:15:26.360 --> 00:15:29.770 If you want, though, not just to suggest some package-- one that would actually 00:15:29.770 --> 00:15:32.080 be required to use your code-- you could also 00:15:32.080 --> 00:15:34.880 make it required they install that other package as well. 00:15:34.880 --> 00:15:37.820 So I can make a field called Imports, like this, 00:15:37.820 --> 00:15:41.020 and list any packages I want to require the user 00:15:41.020 --> 00:15:43.630 to install to use my own package. 00:15:43.630 --> 00:15:46.610 Now, the user here likely won't be testing our software for us, 00:15:46.610 --> 00:15:50.075 so we only suggest it, not require it, but if you do want to require some code-- 00:15:50.075 --> 00:15:53.200 some package-- you can actually use Imports as a field in your description 00:15:53.200 --> 00:15:55.270 file as well. 00:15:55.270 --> 00:15:58.860 I also see here config/testthat/edition. 00:15:58.860 --> 00:16:01.830 This just means that when our tests are run, 00:16:01.830 --> 00:16:05.800 we'll be sure to use the third edition of testthat. 00:16:05.800 --> 00:16:07.900 But a few other things have happened as well. 00:16:07.900 --> 00:16:10.020 If I look down below my console here, I'll 00:16:10.020 --> 00:16:14.790 see it's created some new folders for me-- namely, one called tests over here. 00:16:14.790 --> 00:16:19.410 If I click on tests, I'll see well a new file, testthat.R, 00:16:19.410 --> 00:16:22.110 and a folder, also called testthat. 00:16:22.110 --> 00:16:26.730 If I open up testthat.R, this is a file that was automatically created for me, 00:16:26.730 --> 00:16:29.370 and it includes some configuration for testthat 00:16:29.370 --> 00:16:32.070 as I run my tests inside of this package. 00:16:32.070 --> 00:16:34.680 We'll leave this alone for now, but notice how I also 00:16:34.680 --> 00:16:36.930 have a folder called testthat. 
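The dependency-related DESCRIPTION fields discussed above can be sketched as follows. In R's DESCRIPTION format, packages that must be installed are listed under Imports (or Depends), while optional ones go under Suggests. The stringr entry is purely hypothetical, since ducksay as built in this lecture does not require any other package, and the temporary file is an assumption for illustration.

```r
# Write an illustrative DESCRIPTION to a temporary file (hypothetical location).
desc_file <- tempfile("DESCRIPTION-")
writeLines(c(
  "Package: ducksay",
  "Version: 1.0",
  "Imports: stringr",                # hypothetical hard requirement
  "Suggests: testthat (>= 3.0.0)",   # optional: only needed to run the tests
  "Config/testthat/edition: 3"       # run tests with testthat's third edition
), desc_file)

deps <- read.dcf(desc_file, fields = c("Imports", "Suggests"))
print(deps)
```

Packages under Imports are installed along with the package itself, whereas packages under Suggests are left to the user's discretion.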
00:16:36.930 --> 00:16:41.100 And it's inside this folder that I will actually write my unit tests themselves, 00:16:41.100 --> 00:16:44.040 similar to what we saw last time. 00:16:44.040 --> 00:16:47.910 Now that the structure has been set up, I can actually write my tests now. 00:16:47.910 --> 00:16:52.200 And see how it's suggesting that I use use_test to create my very first unit 00:16:52.200 --> 00:16:54.330 test for my function. 00:16:54.330 --> 00:16:58.060 Well, here I want to use_test, and I want 00:16:58.060 --> 00:17:02.890 to create a test for, in this case, a function called ducksay. 00:17:02.890 --> 00:17:07.810 So I'll enter as input to use_test the function's name, ducksay, just 00:17:07.810 --> 00:17:08.619 like that. 00:17:08.619 --> 00:17:11.680 I'll hit Enter now, and I'll see a few things happen again. 00:17:11.680 --> 00:17:16.390 One, I now have this new file, called test-ducksay.R, 00:17:16.390 --> 00:17:21.700 which is inside my testthat folder, which itself is inside my tests folder. 00:17:21.700 --> 00:17:25.240 And now I can modify test-ducksay.R. 00:17:25.240 --> 00:17:28.990 It's given me here some basic structure for my test file, 00:17:28.990 --> 00:17:30.550 but I don't want to use this so far. 00:17:30.550 --> 00:17:31.640 I'll just remove it. 00:17:31.640 --> 00:17:34.870 And now, I want to think about how I could describe ducksay. 00:17:34.870 --> 00:17:38.620 What do I want it to do in this testing file? 00:17:38.620 --> 00:17:42.340 Well, we saw last time we could use some code a bit 00:17:42.340 --> 00:17:45.970 like this to describe how we want ducksay to run. 00:17:45.970 --> 00:17:49.900 I could say describe and then use ducksay here 00:17:49.900 --> 00:17:53.600 to say I'm going to describe how I want ducksay to run. 00:17:53.600 --> 00:17:54.850 Well, what do I want it to do? 00:17:54.850 --> 00:17:58.000 I think the first thing I want it to do is to work with cat. 
00:17:58.000 --> 00:18:03.400 So I could say it can print to the console with cat. 00:18:03.400 --> 00:18:08.710 And now, I'll include some test to see if ducksay can print with cat. 00:18:08.710 --> 00:18:10.870 And what I mean by this is as follows. 00:18:10.870 --> 00:18:14.620 If I use cat here and gave as input ducksay, 00:18:14.620 --> 00:18:18.400 I should see the output of ducksay in the console. 00:18:18.400 --> 00:18:21.670 Ducksay will simply return to me some character string, 00:18:21.670 --> 00:18:25.510 but cat will take care of outputting it to the console. 00:18:25.510 --> 00:18:28.300 Now, how could I make this a test? 00:18:28.300 --> 00:18:30.580 I have the code I want to have run, but I 00:18:30.580 --> 00:18:33.170 want to test that it's doing what I want it to do. 00:18:33.170 --> 00:18:36.640 Well, it turns out that similar to expect_equal, which you saw last time, 00:18:36.640 --> 00:18:41.620 there is a function called expect_output that can expect when I run this code, 00:18:41.620 --> 00:18:44.380 I get output in my console. 00:18:44.380 --> 00:18:48.550 So I'll use this function, expect_output, part of testthat, 00:18:48.550 --> 00:18:53.300 to say I expect that when I run this code, cat with ducksay, 00:18:53.300 --> 00:18:56.600 I'll see some output in my console-- 00:18:56.600 --> 00:18:58.550 anything at all. 00:18:58.550 --> 00:19:02.330 And that seems to be our first description now for ducksay. 00:19:02.330 --> 00:19:06.050 But I think we could still do a little better-- get more specific, if we will. 00:19:06.050 --> 00:19:08.870 So here we could say it prints to the console with cat, 00:19:08.870 --> 00:19:10.190 but what should it print? 00:19:10.190 --> 00:19:12.815 Well, it should print "hello, world" at least, so I'll go ahead 00:19:12.815 --> 00:19:16.460 and say it can say hello to the world. 00:19:16.460 --> 00:19:18.650 That's another feature now of ducksay. 
00:19:18.650 --> 00:19:24.920 And I'll say that well, when I want ducksay to run, I expect to see "hello, 00:19:24.920 --> 00:19:26.960 world" in the output. 00:19:26.960 --> 00:19:30.710 Now, we saw last time a function called expect_equal-- 00:19:30.710 --> 00:19:33.260 I could use expect_equal with ducksay here. 00:19:33.260 --> 00:19:36.980 And I could say I expect that ducksay will return to me a string that 00:19:36.980 --> 00:19:38.900 is equal to "hello, world." 00:19:38.900 --> 00:19:42.860 But I argue this might not work as I intend it to. 00:19:42.860 --> 00:19:44.720 Because if we look at our output here-- 00:19:44.720 --> 00:19:47.150 here's our intended output of ducksay-- 00:19:47.150 --> 00:19:51.140 why might it not work, if we were to say I expect this output 00:19:51.140 --> 00:19:54.130 to be equal to "hello, world"? 00:19:54.130 --> 00:19:56.980 Well, it seems like this is not strictly equal to "hello, world." 00:19:56.980 --> 00:19:59.770 I have "hello, world" and then some duck at the end. 00:19:59.770 --> 00:20:02.020 So what I would rather do is ask a different question. 00:20:02.020 --> 00:20:06.190 Is "hello, world" somewhere in this output we've gotten back? 00:20:06.190 --> 00:20:08.710 Not is it equal to "hello, world," but is "hello, world" 00:20:08.710 --> 00:20:10.690 somewhere inside of it? 00:20:10.690 --> 00:20:14.620 Now thankfully, there is another function besides expect_equal-- one 00:20:14.620 --> 00:20:16.870 called expect_match. 00:20:16.870 --> 00:20:22.210 I can expect to find a match of "hello, world" inside this output of ducksay. 00:20:22.210 --> 00:20:23.450 I think I can try it out. 00:20:23.450 --> 00:20:27.680 I'll come back over here, and I can use expect_match like this. 
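A sketch of what test-ducksay.R might contain at this point is shown below, assuming the testthat package is installed. The ducksay stand-in at the top, including its duck drawing, is a placeholder added only so the sketch runs on its own; in the package, the real function would live in the R folder and the test file would contain just the describe block.

```r
# Stand-in so the sketch is self-contained; the duck art is a placeholder.
ducksay <- function() {
  paste("hello, world", ">(. )__", " (____/", sep = "\n")
}

if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  describe("ducksay()", {
    it("can print to the console with cat", {
      # With no pattern given, expect_output() only checks that something prints.
      expect_output(cat(ducksay()))
    })
    it("can say hello to the world", {
      # expect_match() looks for the pattern anywhere in the returned string.
      expect_match(ducksay(), "hello, world")
    })
  })
}
```

Using expect_match rather than expect_equal keeps the test from breaking whenever the duck drawing itself changes.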
00:20:27.680 --> 00:20:31.870 I'll say expect_match now between the return value of ducksay 00:20:31.870 --> 00:20:33.670 and this character string "hello, world," 00:20:33.670 --> 00:20:36.550 and that will treat this as a pattern-- 00:20:36.550 --> 00:20:38.500 hello comma space world. 00:20:38.500 --> 00:20:42.160 And if it finds that pattern inside the return value of ducksay, 00:20:42.160 --> 00:20:43.570 well, this will be true-- 00:20:43.570 --> 00:20:44.770 no errors at all. 00:20:44.770 --> 00:20:48.430 If it can't find that pattern, though, in ducksay, it will raise an error, 00:20:48.430 --> 00:20:50.440 and our tests will fail. 00:20:50.440 --> 00:20:52.420 So again, expect_match is good for trying 00:20:52.420 --> 00:20:58.830 to find this pattern, "hello, world," inside the output of ducksay right here. 00:20:58.830 --> 00:21:02.240 So I think these tests are in pretty good shape. 00:21:02.240 --> 00:21:04.850 I now know exactly what I want ducksay to do. 00:21:04.850 --> 00:21:07.430 It should work with cat, and it should print out 00:21:07.430 --> 00:21:10.820 some output that says hello to the world. 00:21:10.820 --> 00:21:14.390 But now that we have our tests, we need to write our actual code. 00:21:14.390 --> 00:21:16.490 We need to write the function ducksay itself. 00:21:16.490 --> 00:21:20.750 And for that, we saw we could use this folder called R. 00:21:20.750 --> 00:21:22.850 In general, in working with packages, we're 00:21:22.850 --> 00:21:28.070 going to write all of our dot R files inside of a folder called R. 00:21:28.070 --> 00:21:30.740 But again, rather than structuring this ourselves, 00:21:30.740 --> 00:21:33.620 we could rely on devtools to do it for us. 00:21:33.620 --> 00:21:37.700 I could use a function in devtools called use_r 00:21:37.700 --> 00:21:40.520 and pass in the function name I hope to create. 00:21:40.520 --> 00:21:45.020 And then I'll get an R file to write my function definition in. 
00:21:45.020 --> 00:21:48.110 So let's try this now, now that we've written our unit tests. 00:21:48.110 --> 00:21:50.930 I'll come back now to RStudio, and let's see 00:21:50.930 --> 00:21:56.030 if I can use use_r to create for me the function ducksay and the file I 00:21:56.030 --> 00:21:57.470 should define it in. 00:21:57.470 --> 00:22:02.570 I'll go to my console now and use use_r, and then go back up in my File Explorer 00:22:02.570 --> 00:22:05.120 to ducksay as the folder here. 00:22:05.120 --> 00:22:09.410 And I'll try to create now this file to define ducksay in. 00:22:09.410 --> 00:22:12.230 I'll say I want to create this new function, ducksay, 00:22:12.230 --> 00:22:13.800 and the R file for it. 00:22:13.800 --> 00:22:17.570 I'll hit Enter now and see a few things happening. 00:22:17.570 --> 00:22:22.430 One, I see I have a folder called R-- brand new thanks to use_r. 00:22:22.430 --> 00:22:25.850 And I also see I have a new file, ducksay.R, 00:22:25.850 --> 00:22:28.490 which has been created inside of this R folder 00:22:28.490 --> 00:22:31.530 to keep things organized in this case. 00:22:31.530 --> 00:22:35.060 I can close my description here and my-- let me save it first-- 00:22:35.060 --> 00:22:38.390 and then I'll go ahead and remove test ducksay here and focus now 00:22:38.390 --> 00:22:43.670 on ducksay.R. Well, how should I write ducksay? 00:22:43.670 --> 00:22:45.890 If I look at my output here-- 00:22:45.890 --> 00:22:48.380 here I have my intended output-- 00:22:48.380 --> 00:22:51.230 I notice that I really have three lines of output 00:22:51.230 --> 00:22:53.630 I hope to return from this function. 00:22:53.630 --> 00:22:57.950 I have hello comma space world, the top half of my duck, 00:22:57.950 --> 00:23:00.530 and the bottom half of my duck. 00:23:00.530 --> 00:23:02.610 These are all character strings. 
00:23:02.610 --> 00:23:06.170 So I'll actually ask our group here, what function 00:23:06.170 --> 00:23:11.780 we've seen so far do you think would help us combine these strings? 00:23:11.780 --> 00:23:16.220 If I want to have three different lines here-- hello, world, top of my duck, 00:23:16.220 --> 00:23:21.230 bottom of my duck, what function could I use to combine these strings 00:23:21.230 --> 00:23:25.220 and perhaps return them from my ducksay function? 00:23:25.220 --> 00:23:27.517 AUDIENCE: Maybe the paste function? 00:23:27.517 --> 00:23:28.850 CARTER ZENKE: Yeah, maybe paste. 00:23:28.850 --> 00:23:32.780 So we saw before that paste is good for combining different strings, 00:23:32.780 --> 00:23:34.760 and we can actually use paste here. 00:23:34.760 --> 00:23:36.890 But instead of separating now with spaces, 00:23:36.890 --> 00:23:40.843 we could separate with new lines-- our backslash n escape character. 00:23:40.843 --> 00:23:41.760 So let's try that out. 00:23:41.760 --> 00:23:45.500 I'll come back now to my file, and let's define for ourselves 00:23:45.500 --> 00:23:47.990 the ducksay function using paste. 00:23:47.990 --> 00:23:52.680 I'll say here I want to make a new function called ducksay that currently 00:23:52.680 --> 00:23:54.600 doesn't take any input at all. 00:23:54.600 --> 00:23:57.630 But inside of this function, I will return 00:23:57.630 --> 00:24:01.800 the result of calling paste on three different strings. 00:24:01.800 --> 00:24:04.380 The first one will be my first string here, 00:24:04.380 --> 00:24:08.220 one called "hello, world" at the very top of my output. 00:24:08.220 --> 00:24:11.010 And then the very beginning of my duck here-- 00:24:11.010 --> 00:24:15.240 I could give it a little beak, some eyes, and now the top of its body here.
00:24:15.240 --> 00:24:18.540 And then underneath, I could use the bottom of the duck, which 00:24:18.540 --> 00:24:20.880 will look a bit like this-- 00:24:20.880 --> 00:24:23.460 some underscores and then a forward slash. 00:24:23.460 --> 00:24:29.130 And now, I think we have what looks to be our intended output, thanks to paste. 00:24:29.130 --> 00:24:34.680 But as we said before, paste's default is to combine these strings using a space. 00:24:34.680 --> 00:24:36.520 And we want a new line instead. 00:24:36.520 --> 00:24:39.690 So I should change now the sep parameter to paste, 00:24:39.690 --> 00:24:43.350 which we've seen before, from a space to a backslash n-- 00:24:43.350 --> 00:24:48.330 this new line character that says I want to separate each of these character 00:24:48.330 --> 00:24:50.250 strings by a new line, 00:24:50.250 --> 00:24:53.810 as though I were hitting Enter each time on my keyboard. 00:24:53.810 --> 00:24:56.030 So I'll save now this function. 00:24:56.030 --> 00:25:00.440 And when ducksay runs, it should now return to me this output. 00:25:00.440 --> 00:25:05.130 But I've defined ducksay here, and I want to use it. 00:25:05.130 --> 00:25:08.120 Turns out I can't do that just yet. 00:25:08.120 --> 00:25:11.430 We saw earlier this idea of a NAMESPACE file, 00:25:11.430 --> 00:25:16.310 which tells us which functions in our package an end user can use. 00:25:16.310 --> 00:25:18.890 And so now that we've defined our ducksay function, 00:25:18.890 --> 00:25:22.430 we should actually include it in our package's NAMESPACE-- 00:25:22.430 --> 00:25:26.690 the list of functions that an end user could use in our package. 00:25:26.690 --> 00:25:30.290 So let me now create this file called NAMESPACE. 00:25:30.290 --> 00:25:34.070 I'll say file.create, NAMESPACE down below, 00:25:34.070 --> 00:25:40.520 and I can then see in my folder called ducksay a new file called NAMESPACE. 00:25:40.520 --> 00:25:44.540 I'll open this one up, and what should I include in NAMESPACE?
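The function just described, three strings joined by paste with a newline separator, might look like the sketch below. The exact duck characters are an approximation reconstructed from the spoken description (a beak, an eye, four underscores, a forward slash), not a copy of what was typed on screen.

```r
# Sketch of R/ducksay.R; the duck art approximates the one drawn in lecture
ducksay <- function() {
  paste(
    "hello, world",
    ">(. )__",
    " (____/",
    sep = "\n"  # join the three strings with newlines instead of spaces
  )
}

# cat() interprets the newlines, so the duck prints across three lines
cat(ducksay(), "\n")
```

Returning the string, rather than printing it inside the function, leaves it to the caller to decide whether to cat it, test it, or combine it with something else.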
00:25:44.540 --> 00:25:48.650 Well, by convention, we have a function here called export. 00:25:48.650 --> 00:25:49.340 Export. 00:25:49.340 --> 00:25:51.860 Export says take a function that I've defined 00:25:51.860 --> 00:25:56.430 and make it available to the end user who installs this package. 00:25:56.430 --> 00:26:01.450 In this case, I'll export the ducksay function just like this. 00:26:01.450 --> 00:26:05.130 So to be clear, I've now defined my ducksay function 00:26:05.130 --> 00:26:09.930 inside a file called ducksay.R, which itself was inside an R folder 00:26:09.930 --> 00:26:11.460 to keep things organized. 00:26:11.460 --> 00:26:14.130 And then once I've defined it, I want to make 00:26:14.130 --> 00:26:17.700 it available to a user, which I'll do through the NAMESPACE file 00:26:17.700 --> 00:26:22.410 and say I want the ducksay function in particular to be available to our end 00:26:22.410 --> 00:26:24.180 users here. 00:26:24.180 --> 00:26:28.470 Now once I do that, I can make use of another devtools function, 00:26:28.470 --> 00:26:30.810 one called load_all-- 00:26:30.810 --> 00:26:35.730 load_all-- that says whatever functions I exported from my package, 00:26:35.730 --> 00:26:38.370 like ducksay here, I want you to load them 00:26:38.370 --> 00:26:41.250 so I can use them right here in my console. 00:26:41.250 --> 00:26:46.110 I'll go ahead and load all, and I'll see this is loading the ducksay package now. 00:26:46.110 --> 00:26:49.740 What I can do now is use ducksay in my console. 00:26:49.740 --> 00:26:54.237 I could say I want to cat the result of calling ducksay, just like this. 00:26:54.237 --> 00:26:55.320 And let's see what we get. 00:26:55.320 --> 00:26:56.670 Fingers crossed. 00:26:56.670 --> 00:26:59.820 We get a cute duck saying hello to the world. 00:26:59.820 --> 00:27:02.250 And now we could test our code more thoroughly too.
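As a sketch of the NAMESPACE step, the snippet below writes the single export line into a throwaway folder rather than a real package. The temporary path and the object names are assumptions made for illustration; in the lecture the file sits at the top level of the ducksay folder.

```r
# Work in a throwaway directory so no real package is overwritten
pkg <- file.path(tempdir(), "ducksay")
dir.create(pkg, showWarnings = FALSE)

# NAMESPACE lists the functions the package makes available to users
namespace_path <- file.path(pkg, "NAMESPACE")
writeLines("export(ducksay)", namespace_path)
print(readLines(namespace_path))

# From the real package root, with devtools installed, the next step would be:
# devtools::load_all()
```

The load_all call is left as a comment because it expects a complete package, including the DESCRIPTION file created earlier.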
00:27:02.250 --> 00:27:07.410 I could run test as well in my console, and now I'll run those tests I created. 00:27:07.410 --> 00:27:10.800 Let me open up inside my tests and testthat folder here. 00:27:10.800 --> 00:27:12.270 Here were those tests. 00:27:12.270 --> 00:27:16.140 If I now run test, the function, thanks to devtools, 00:27:16.140 --> 00:27:21.630 I will be able to run all the tests I defined in this file. 00:27:21.630 --> 00:27:23.340 Now just one more thing here too. 00:27:23.340 --> 00:27:26.340 Last time, we used source at the top of our file 00:27:26.340 --> 00:27:31.890 to give this file access to a function like ducksay in ducksay.R. 00:27:31.890 --> 00:27:37.620 But now that we've used load all and exported this function from our package, 00:27:37.620 --> 00:27:40.320 we can simply load all and then run the tests, 00:27:40.320 --> 00:27:43.800 and they will have access to that function called ducksay. 00:27:43.800 --> 00:27:47.070 No more using source so long as we're inside a package 00:27:47.070 --> 00:27:50.790 that we've exported our functions from. 00:27:50.790 --> 00:27:54.190 OK, so we've seen now how to define unit tests for our package, 00:27:54.190 --> 00:27:57.010 how to write code that adheres to those tests. 00:27:57.010 --> 00:28:02.320 Let me ask, what questions do we have on what we've seen so far, either 00:28:02.320 --> 00:28:05.200 on testing our code, writing our functions, 00:28:05.200 --> 00:28:08.230 or defining our package more generally? 00:28:08.230 --> 00:28:10.930 AUDIENCE: When you run the test program on the terminal, what's 00:28:10.930 --> 00:28:13.327 the meaning of the colored letters FWS? 00:28:13.327 --> 00:28:14.660 CARTER ZENKE: Ah, good question. 00:28:14.660 --> 00:28:18.280 So if I look over here in the console, I'll see some pretty output, 00:28:18.280 --> 00:28:19.925 let's say, from testthat. 00:28:19.925 --> 00:28:21.800 And let me walk through it step by step here.
00:28:21.800 --> 00:28:23.890 So here I see testing ducksay. 00:28:23.890 --> 00:28:25.990 That is the function we are testing, right? 00:28:25.990 --> 00:28:30.130 I'll also see FWS and OK. 00:28:30.130 --> 00:28:34.210 Now, these are different kinds of results we can get from our tests here. 00:28:34.210 --> 00:28:37.720 It seems like F corresponds to fail down below. 00:28:37.720 --> 00:28:40.180 Fail-- we didn't pass this particular test. 00:28:40.180 --> 00:28:42.940 W stands for a warning. 00:28:42.940 --> 00:28:46.240 We saw last time how our tests sometimes raise warnings. 00:28:46.240 --> 00:28:50.620 Well, this would be the number of tests that gave me a warning, in this case. 00:28:50.620 --> 00:28:51.790 S stands for skip. 00:28:51.790 --> 00:28:53.900 It turns out you can skip tests if you want to. 00:28:53.900 --> 00:28:57.830 And then OK means this test passed with flying colors. 00:28:57.830 --> 00:29:00.770 So here, I see these two tests-- they both passed, 00:29:00.770 --> 00:29:05.750 and I'll see a 2 in the OK and a 2 total that are passing down below. 00:29:05.750 --> 00:29:09.470 If I had more than one function to test I might see more than one of these 00:29:09.470 --> 00:29:11.720 and see the total number of ones that were passed, 00:29:11.720 --> 00:29:16.130 skipped, warned about, or failed overall down at the bottom of this results 00:29:16.130 --> 00:29:17.292 here as well. 00:29:17.292 --> 00:29:19.250 So I hope that helps clarify what exactly we're 00:29:19.250 --> 00:29:21.980 seeing as a result of using test. 00:29:21.980 --> 00:29:25.050 But great question there. 00:29:25.050 --> 00:29:28.850 OK, so I think we're in a pretty good place, 00:29:28.850 --> 00:29:31.460 but there's arguably one more thing to consider now. 
00:29:31.460 --> 00:29:36.170 So I've seen that my code can both print to the console with cat, 00:29:36.170 --> 00:29:38.150 and it includes "hello, world" in the output, 00:29:38.150 --> 00:29:42.800 but an important thing here too is, Does it include a duck? 00:29:42.800 --> 00:29:44.360 So let's see that as well. 00:29:44.360 --> 00:29:47.780 I'll come back now to RStudio and update these tests now. 00:29:47.780 --> 00:29:49.400 Let me add a new one-- 00:29:49.400 --> 00:29:52.610 a new test that says ducksay-- 00:29:52.610 --> 00:29:58.290 it can even say hello with a duck, just like this. 00:29:58.290 --> 00:30:02.240 And I think I could probably use a very similar structure to what I used before 00:30:02.240 --> 00:30:04.157 with expect_match. 00:30:04.157 --> 00:30:05.990 In this case, though, I could expect a match 00:30:05.990 --> 00:30:11.100 between the output of ducksay and the duck that I have to show to the user. 00:30:11.100 --> 00:30:14.090 So I could use expect_match, like I did before, 00:30:14.090 --> 00:30:17.280 and then enter ducksay, just like this. 00:30:17.280 --> 00:30:20.810 And now I want to expect a match between my duck pattern 00:30:20.810 --> 00:30:24.650 and whatever I see in the return value of ducksay. 00:30:24.650 --> 00:30:27.135 But I'll probably need a new object for this duck, 00:30:27.135 --> 00:30:29.510 and I want to type in the whole duck as an argument here. 00:30:29.510 --> 00:30:31.520 I could go up above here and define myself 00:30:31.520 --> 00:30:33.740 a new duck, similar to how we did it before. 00:30:33.740 --> 00:30:37.490 I'll paste together the top half of this duck with a cute little beak, 00:30:37.490 --> 00:30:41.240 and a top here, and then the bottom half of my duck-- 00:30:41.240 --> 00:30:44.390 1, 2, 3, 4 underscores-- and then I forward slash, 00:30:44.390 --> 00:30:48.650 and that is my duck, so long as I separate each with a backslash n, 00:30:48.650 --> 00:30:50.030 just like that. 
00:30:50.030 --> 00:30:54.410 And now, I think what I could do is expect a match between the return 00:30:54.410 --> 00:30:56.900 value of ducksay and this duck-- 00:30:56.900 --> 00:30:59.720 this duck I've created over here. 00:30:59.720 --> 00:31:02.720 Now, this, you think, might work. 00:31:02.720 --> 00:31:04.760 But I'd argue there's one more thing to consider 00:31:04.760 --> 00:31:08.630 here, which is I told you earlier, expect_match 00:31:08.630 --> 00:31:11.990 will take the pattern we've defined here and look 00:31:11.990 --> 00:31:14.450 for it in the return value of ducksay. 00:31:14.450 --> 00:31:17.510 In this case, this is our pattern. 00:31:17.510 --> 00:31:21.585 Well, these patterns are more formally called regular expressions. 00:31:21.585 --> 00:31:24.710 And we won't get into them today, but in general, one thing to know about them 00:31:24.710 --> 00:31:28.160 is that these characters, parentheses and a dot, 00:31:28.160 --> 00:31:31.370 have a special meaning inside of regular expressions. 00:31:31.370 --> 00:31:34.400 They don't actually mean literally a parenthesis or a dot. 00:31:34.400 --> 00:31:36.530 They mean something else entirely. 00:31:36.530 --> 00:31:39.740 So if I want to treat this pattern not as this thing called 00:31:39.740 --> 00:31:42.770 a regular expression but exactly as I see it here, 00:31:42.770 --> 00:31:45.830 I can set the other parameter equal to true instead-- 00:31:45.830 --> 00:31:47.450 one called fixed. 00:31:47.450 --> 00:31:51.170 Fixed says, I want you to treat these characters here 00:31:51.170 --> 00:31:53.900 not as part of some regular expression, but instead, 00:31:53.900 --> 00:31:57.410 exactly as we see them here-- a greater-than sign, a parenthesis, 00:31:57.410 --> 00:31:59.300 and a dot or a period.
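To see why fixed matters, here is a small sketch using base R's grepl, which offers the same fixed option. It is shown in place of expect_match only so the snippet runs without testthat; the duck characters and the object names are illustrative assumptions.

```r
# Approximation of what ducksay returns
output <- paste("hello, world", ">(. )__", " (____/", sep = "\n")
top <- ">(. )__"

# As a regular expression, "(. )" means "any one character, then a space",
# so the pattern fails to match the literal parentheses and dot
as_regex <- grepl(top, output)

# With fixed = TRUE, every character is matched exactly as written
as_literal <- grepl(top, output, fixed = TRUE)

print(c(regex = as_regex, literal = as_literal))
```

The same duck text is found only when it is treated literally, which is the behavior the test needs.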
00:31:59.300 --> 00:32:01.850 So more on those another time, but for now, 00:32:01.850 --> 00:32:04.730 let's just say I want to look for exactly this pattern 00:32:04.730 --> 00:32:06.380 inside the output of ducksay. 00:32:06.380 --> 00:32:10.460 I'll leave this as is now, and I'll go down below and run my tests with test 00:32:10.460 --> 00:32:11.100 again. 00:32:11.100 --> 00:32:14.420 And now I'll see all three tests are passing. 00:32:14.420 --> 00:32:15.410 None are failing. 00:32:15.410 --> 00:32:16.610 None are giving us warnings. 00:32:16.610 --> 00:32:17.570 None have been skipped. 00:32:17.570 --> 00:32:20.810 All, in this case, have passed. 00:32:20.810 --> 00:32:23.570 So we've fixed our tests. 00:32:23.570 --> 00:32:25.550 We've written our code. 00:32:25.550 --> 00:32:30.200 One next step is to document how to use our function. 00:32:30.200 --> 00:32:31.880 Maybe a user is new to our package. 00:32:31.880 --> 00:32:33.005 They don't know what to do. 00:32:33.005 --> 00:32:35.570 We want to give them some guidance on how to use our functions. 00:32:35.570 --> 00:32:38.600 In fact, you've probably seen that to access documentation, 00:32:38.600 --> 00:32:42.560 you can use question mark followed by the name of some function. 00:32:42.560 --> 00:32:47.060 And right now, if I use question mark ducksay, well, I don't see anything. 00:32:47.060 --> 00:32:49.530 There's no documentation for ducksay. 00:32:49.530 --> 00:32:51.830 Well, let's go ahead and fix that. 00:32:51.830 --> 00:32:55.640 Thankfully, I can define my own documentation for ducksay 00:32:55.640 --> 00:32:59.330 by putting it inside of this folder we saw earlier-- 00:32:59.330 --> 00:33:03.500 one called man, where man stands for manual. 00:33:03.500 --> 00:33:06.470 But what will go inside this man folder? 00:33:06.470 --> 00:33:11.570 It turns out a variety of files all ending with dot Rd, where 00:33:11.570 --> 00:33:14.600 dot Rd stands for R documentation.
00:33:14.600 --> 00:33:19.100 In fact, inside these files, we'll write not just plain text, 00:33:19.100 --> 00:33:22.100 but we'll actually write something called a markup language. 00:33:22.100 --> 00:33:24.410 Now, a markup language is not a programming language. 00:33:24.410 --> 00:33:27.470 There are no functions and loops and so on. 00:33:27.470 --> 00:33:30.920 Instead, a language for formatting some text. 00:33:30.920 --> 00:33:34.950 Now for instance, R's markup language looks a bit like this. 00:33:34.950 --> 00:33:37.820 I can give each of my documentation files 00:33:37.820 --> 00:33:42.110 some particular parameters, like title, description, and usage here. 00:33:42.110 --> 00:33:45.710 Here, title says, what is the title of my documentation? 00:33:45.710 --> 00:33:48.230 Description says, describe this function for me. 00:33:48.230 --> 00:33:51.650 And usage says, how should I use this function too? 00:33:51.650 --> 00:33:55.380 There are other commands we'll see here, but our dot Rd files 00:33:55.380 --> 00:33:59.250 will look a lot like this and will then render them or convert them 00:33:59.250 --> 00:34:02.520 to those same files you're used to seeing when you use the question 00:34:02.520 --> 00:34:04.560 mark down in your console. 00:34:04.560 --> 00:34:05.970 So let's try this out now. 00:34:05.970 --> 00:34:10.650 I'll come back to RStudio and try to make some documentation for ducksay. 00:34:10.650 --> 00:34:13.770 Well, I want to probably first create for myself 00:34:13.770 --> 00:34:17.610 that man folder to put my documentation inside of. 00:34:17.610 --> 00:34:23.610 So I could use that same function we saw earlier, dir.create. 00:34:23.610 --> 00:34:28.170 And I'll create for myself the folder called man, short for manual. 00:34:28.170 --> 00:34:33.330 And it's inside of this folder that I will store all of my dot Rd files. 00:34:33.330 --> 00:34:38.340 I'll say file.create now, man/ducksay.Rd. 
00:34:38.340 --> 00:34:39.690 And this is convention. 00:34:39.690 --> 00:34:43.710 I'm putting this file, ducksay.Rd, inside the man folder, 00:34:43.710 --> 00:34:48.300 and I'm calling it-- giving it the same name as the function it should document. 00:34:48.300 --> 00:34:53.840 So this file, ducksay.Rd, should document the same function, ducksay. 00:34:53.840 --> 00:34:57.830 I'll go ahead now and create this file, and if I open now my man folder, 00:34:57.830 --> 00:35:01.370 I should see ducksay.Rd right there inside. 00:35:01.370 --> 00:35:03.470 I'll open it up, and what do I see? 00:35:03.470 --> 00:35:06.080 Well, nothing yet, but I'd argue we could 00:35:06.080 --> 00:35:10.370 go ahead and use R's markup language to create some documentation now 00:35:10.370 --> 00:35:12.290 for ducksay. 00:35:12.290 --> 00:35:16.460 Now, I've read the documentation for creating documentation in R, 00:35:16.460 --> 00:35:18.560 and there are several different keywords you 00:35:18.560 --> 00:35:20.750 can use to create your documentation. 00:35:20.750 --> 00:35:23.510 Among the most important ones are these here. 00:35:23.510 --> 00:35:25.700 One is slash name. 00:35:25.700 --> 00:35:30.470 And inside these curly braces here will include the name of our function 00:35:30.470 --> 00:35:33.183 we're trying to document-- in this case, ducksay in particular. 00:35:33.183 --> 00:35:35.600 This is the name of the function we're trying to document. 00:35:35.600 --> 00:35:39.800 The next one, most important one, is going to be slash alias. 00:35:39.800 --> 00:35:43.010 Slash alias is what you want the user to type 00:35:43.010 --> 00:35:46.100 in in their console to see your documentation. 00:35:46.100 --> 00:35:50.180 For instance, if I go down to my console now and I use question mark ducksay, 00:35:50.180 --> 00:35:52.340 well, my alias is ducksay-- 00:35:52.340 --> 00:35:54.090 literally this right here. 
00:35:54.090 --> 00:35:57.450 If any user were to go to their console and use question mark ducksay, 00:35:57.450 --> 00:36:00.870 they could see this documentation that I have now created for them, 00:36:00.870 --> 00:36:02.640 as long as I've installed my package. 00:36:02.640 --> 00:36:07.150 So also, my alias is similarly ducksay as well. 00:36:07.150 --> 00:36:09.510 And now, here comes our title. 00:36:09.510 --> 00:36:11.160 What is the title of this function? 00:36:11.160 --> 00:36:14.190 Kind of a more English characterization, like capitals and spaces 00:36:14.190 --> 00:36:18.310 and so on-- we'll call this function Duck Say, just like this, 00:36:18.310 --> 00:36:20.460 and provide a description. 00:36:20.460 --> 00:36:24.780 I'll say that this is a duck that says hello. 00:36:24.780 --> 00:36:28.860 And just like that, with these four lines of markup language, 00:36:28.860 --> 00:36:31.980 we can actually already see it being rendered or converted 00:36:31.980 --> 00:36:34.260 into our documentation file. 00:36:34.260 --> 00:36:38.250 If I go on my console now and run question mark ducksay, 00:36:38.250 --> 00:36:42.570 I'll see my very first R documentation file. 00:36:42.570 --> 00:36:46.590 Notice how here, the name of this function is actually 00:36:46.590 --> 00:36:49.500 included in my documentation, right up here. 00:36:49.500 --> 00:36:51.810 The alias is also included in what I use down here. 00:36:51.810 --> 00:36:54.600 I said question mark ducksay and got this documentation file. 00:36:54.600 --> 00:36:57.120 The title is there too-- slash title ducksay. 00:36:57.120 --> 00:36:58.570 We see that right here. 00:36:58.570 --> 00:37:02.670 And so too is the description, a duck that says hello. 00:37:02.670 --> 00:37:05.340 So we could keep adding to this documentation using 00:37:05.340 --> 00:37:07.260 this same syntax here. 
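The four lines of markup described above can be sketched as follows. In the file itself, each keyword is written with a backslash and its content in curly braces; the snippet writes them to a temporary file (an assumption for illustration) instead of man/ducksay.Rd, so the backslashes appear doubled inside the R strings.

```r
# Write the four-line documentation file to a temporary location
rd_path <- file.path(tempdir(), "ducksay.Rd")
writeLines(c(
  "\\name{ducksay}",
  "\\alias{ducksay}",
  "\\title{Duck Say}",
  "\\description{A duck that says hello.}"
), rd_path)

# Show the file as it would appear inside the man folder
cat(readLines(rd_path), sep = "\n")
```

The name and the alias happen to be the same here; the alias is what a user types after the question mark.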
00:37:07.260 --> 00:37:10.560 There are other kinds of components we can add to our documentation file 00:37:10.560 --> 00:37:11.288 as well. 00:37:11.288 --> 00:37:13.080 In fact, let's go ahead and add a few more. 00:37:13.080 --> 00:37:17.790 Let's add one called usage, which tells people how to use our function. 00:37:17.790 --> 00:37:22.950 I'll use slash usage here, and this will say how I want users to use it, in fact. 00:37:22.950 --> 00:37:25.080 So I'll say ducksay here. 00:37:25.080 --> 00:37:30.570 And by convention, in usage, we include the function's name, some parentheses, 00:37:30.570 --> 00:37:32.910 and if there are any parameters, we include those 00:37:32.910 --> 00:37:35.020 in the function's parentheses as well. 00:37:35.020 --> 00:37:37.320 But currently, there are no parameters. 00:37:37.320 --> 00:37:40.170 I'll also include a section called value, which 00:37:40.170 --> 00:37:42.150 is the return value of this function. 00:37:42.150 --> 00:37:43.710 What does it return to us? 00:37:43.710 --> 00:37:48.270 Well, it returns to us really a string representation of a duck 00:37:48.270 --> 00:37:51.030 saying hello to the world. 00:37:51.030 --> 00:37:54.450 And then finally, we can also include some examples 00:37:54.450 --> 00:37:57.630 of how to use this function, in case people are unfamiliar. 00:37:57.630 --> 00:38:00.810 I could say examples here and provide some examples 00:38:00.810 --> 00:38:03.037 of how people could use ducksay. 00:38:03.037 --> 00:38:04.620 Maybe I want them to use it with cat. 00:38:04.620 --> 00:38:07.560 So I'll show them, look, you can use ducksay like this. 00:38:07.560 --> 00:38:11.340 Take cat and pass as input ducksay, just like that.
00:38:11.340 --> 00:38:14.640 And now with these other pieces of syntax here, 00:38:14.640 --> 00:38:19.440 I can say question mark ducksay and see the new-- 00:38:19.440 --> 00:38:22.440 oops, let me save this file first-- and then run question mark ducksay, 00:38:22.440 --> 00:38:25.830 and I should see the now rendered version of what I'm 00:38:25.830 --> 00:38:28.620 seeing on the left-hand side over here. 00:38:28.620 --> 00:38:30.870 Notice how we have some new pieces. 00:38:30.870 --> 00:38:32.370 I see usage now. 00:38:32.370 --> 00:38:33.420 I see value. 00:38:33.420 --> 00:38:38.760 And I see some examples as well down below in my documentation file. 00:38:38.760 --> 00:38:44.220 So we've seen now how to document our functions using this markup 00:38:44.220 --> 00:38:45.390 language here. 00:38:45.390 --> 00:38:49.350 What questions do we have on how to document our code 00:38:49.350 --> 00:38:51.570 and how to render it in our console? 00:38:54.430 --> 00:38:56.510 OK, seeing none, let's keep going then. 00:38:56.510 --> 00:38:58.360 And I think we're now in a pretty good spot. 00:38:58.360 --> 00:39:02.020 So we now have the ability to write our own functions, to test them, 00:39:02.020 --> 00:39:04.870 and to write documentation for those functions. 00:39:04.870 --> 00:39:07.240 So what should we do now? 00:39:07.240 --> 00:39:11.290 Well, ideally, we want to package this up and share it with the world. 00:39:11.290 --> 00:39:14.980 And in fact, this process of taking what we have as source code 00:39:14.980 --> 00:39:19.030 and converting it into a single file has a particular name. 00:39:19.030 --> 00:39:22.450 This name is called building our package. 00:39:22.450 --> 00:39:26.140 Building our package-- taking it from source code into a single file 00:39:26.140 --> 00:39:28.510 we could share around the world. 00:39:28.510 --> 00:39:31.030 Now, there are a few options for building our source 00:39:31.030 --> 00:39:33.220 code into that single file.
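With usage, value, and examples added, the documentation file might look like the sketch below. It is again written to a temporary path (an assumption for illustration), and parse_Rd from the base tools package is used only as a quick check that the markup is well formed; the section wording is a paraphrase of what was said.

```r
# Fuller documentation file with usage, value, and examples sections
rd_path <- file.path(tempdir(), "ducksay_full.Rd")
writeLines(c(
  "\\name{ducksay}",
  "\\alias{ducksay}",
  "\\title{Duck Say}",
  "\\description{A duck that says hello.}",
  "\\usage{",
  "ducksay()",
  "}",
  "\\value{",
  "A string representation of a duck saying hello to the world.",
  "}",
  "\\examples{",
  "cat(ducksay())",
  "}"
), rd_path)

# Parse the file to confirm the braces and keywords are well formed
parsed <- tools::parse_Rd(rd_path)
print(class(parsed))
```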
00:39:33.220 --> 00:39:34.450 Among them are these-- 00:39:34.450 --> 00:39:38.320 build, which is a devtools function that takes our source code 00:39:38.320 --> 00:39:41.020 and gives us some single file at the end. 00:39:41.020 --> 00:39:43.960 But build, it turns out, is actually a wrapper 00:39:43.960 --> 00:39:48.490 on top of a base R command called R CMD build. 00:39:48.490 --> 00:39:49.720 They have the same purpose. 00:39:49.720 --> 00:39:53.420 R CMD build, though, works in your actual computer's terminal, 00:39:53.420 --> 00:39:54.680 not in the console. 00:39:54.680 --> 00:39:58.370 So we'll instead use build to keep ourselves inside the R console 00:39:58.370 --> 00:40:01.280 and build our package into a single file. 00:40:01.280 --> 00:40:05.930 So we'll still rely on devtools now and use their build function in particular. 00:40:05.930 --> 00:40:09.510 Let me come back now to RStudio and show you how this exactly works. 00:40:09.510 --> 00:40:14.210 So notice here how I'm inside of my ducksay folder-- 00:40:14.210 --> 00:40:16.220 my ducksay package now, if you will. 00:40:16.220 --> 00:40:19.500 Let me go ahead and close my previous files here. 00:40:19.500 --> 00:40:24.230 Let me run this command called build, thanks to devtools. 00:40:24.230 --> 00:40:27.800 If I run build, I'll see some output here. 00:40:27.800 --> 00:40:30.860 And I'll see down below the file I have gotten 00:40:30.860 --> 00:40:35.330 from building this source code into a single file I can share with others. 00:40:35.330 --> 00:40:40.160 I see it's called ducksay_1.0.tar.gz. 00:40:40.160 --> 00:40:43.130 And if I move up one level in my folder structure, 00:40:43.130 --> 00:40:47.060 I'll see this file actually right next to the ducksay folder-- 00:40:47.060 --> 00:40:50.300 ducksay_1.0.tar.gz. 00:40:50.300 --> 00:40:53.450 And this is a funky kind of file name, but this stands for, essentially, 00:40:53.450 --> 00:40:54.530 a zip file, if you will. 
00:40:54.530 --> 00:40:56.240 It's very similar in spirit. 00:40:56.240 --> 00:40:58.040 It's also called a tarball sometimes. 00:40:58.040 --> 00:41:01.160 So this is basically a single file which we can share our code, 00:41:01.160 --> 00:41:03.830 email it to somebody, post it online, et cetera. 00:41:03.830 --> 00:41:09.410 This is all that source code in our folder now in one single file. 00:41:09.410 --> 00:41:11.840 So we've done pretty well so far. 00:41:11.840 --> 00:41:14.630 But before I share this, I think I've forgotten 00:41:14.630 --> 00:41:17.390 kind of one important thing, which is the duck actually only 00:41:17.390 --> 00:41:19.100 says "hello, world" right now. 00:41:19.100 --> 00:41:22.050 It doesn't take as input any given kind of string. 00:41:22.050 --> 00:41:25.160 So I probably want to update our code and rebuild this package 00:41:25.160 --> 00:41:26.390 again and again. 00:41:26.390 --> 00:41:29.715 And in fact, you'll find a package building process often iterative. 00:41:29.715 --> 00:41:32.840 You build it, you add something new, you build it again, add something new, 00:41:32.840 --> 00:41:35.250 build it again, and so on and so forth. 00:41:35.250 --> 00:41:39.440 So let's go ahead and update our package and rebuild our code. 00:41:39.440 --> 00:41:42.920 Let's go back over here and consider how we could make this duck 00:41:42.920 --> 00:41:45.890 say any given phrase that we have. 00:41:45.890 --> 00:41:50.850 Well, I'll go back to my ducksay source code inside my folder here. 00:41:50.850 --> 00:41:55.760 I'll go back to my tests folder, open up now my tests for ducksay, 00:41:55.760 --> 00:42:02.730 and let me add one more description for each of these for this function here. 00:42:02.730 --> 00:42:05.180 I'll come down below, and I'll say, I want 00:42:05.180 --> 00:42:12.170 to make sure that this can say hello, or can say any given phrase, rather. 00:42:12.170 --> 00:42:14.930 Ducksay can say any given phrase. 
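A small sketch of the build step, assuming devtools is installed and the working directory is the package root. The guard makes the snippet do nothing when run elsewhere; the last lines only illustrate how the tarball name combines the package name and version, using the values mentioned above.

```r
# Build only when run from a package root with devtools available
if (requireNamespace("devtools", quietly = TRUE) && file.exists("DESCRIPTION")) {
  devtools::build()
}

# The built file is named <Package>_<Version>.tar.gz
tarball <- paste0("ducksay", "_", "1.0", ".tar.gz")
print(tarball)
```

Because the source folder and the tarball are separate, edits made to the folder after building are not reflected in the tarball until build is run again.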
00:42:14.930 --> 00:42:19.580 And now to exemplify this, I want to include this test here. 00:42:19.580 --> 00:42:23.900 I expect to match between running, let's say, ducksay, 00:42:23.900 --> 00:42:26.780 with a given phrase like "quack," like that, 00:42:26.780 --> 00:42:31.340 and I just have to find "quack" anywhere inside of that given return value, 00:42:31.340 --> 00:42:32.640 just like this. 00:42:32.640 --> 00:42:37.220 So now, again I'm saying that ducksay should be able to say any given phrase. 00:42:37.220 --> 00:42:40.010 If I run it with ducksay and pass as input quack, 00:42:40.010 --> 00:42:43.800 I should see quack inside of that return value. 00:42:43.800 --> 00:42:46.970 So it's a good test, but I need to implement it now in code. 00:42:46.970 --> 00:42:50.960 I'll come back now to my ducksay.R file. 00:42:50.960 --> 00:42:55.640 Going back to my main folder inside R, where we store our R files, 00:42:55.640 --> 00:43:00.830 open now ducksay.R, and I'll see my function definition. 00:43:00.830 --> 00:43:01.830 Well, I could do this. 00:43:01.830 --> 00:43:06.187 I could say ducksay now takes as input a given phrase, just like that. 00:43:06.187 --> 00:43:08.270 And I'll make sure that instead of "hello, world," 00:43:08.270 --> 00:43:11.300 we say that given phrase. 00:43:11.300 --> 00:43:17.600 But now, per my tests, I still want to run ducksay, or be able to run ducksay, 00:43:17.600 --> 00:43:20.390 like this, without any arguments whatsoever. 00:43:20.390 --> 00:43:23.510 And I still expect to see "hello, world" when ducksay 00:43:23.510 --> 00:43:25.530 is run without any arguments here. 00:43:25.530 --> 00:43:30.080 So I could go back to ducksay now and say that phrase has a default 00:43:30.080 --> 00:43:33.200 value of hello, world, just like this. 
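The updated function, with a phrase parameter that defaults to "hello, world", might be sketched like this. The duck characters remain an approximation of the one shown in lecture.

```r
# ducksay now accepts any phrase, defaulting to "hello, world"
ducksay <- function(phrase = "hello, world") {
  paste(phrase, ">(. )__", " (____/", sep = "\n")
}

# Called without arguments, the earlier behavior is preserved
cat(ducksay(), "\n")

# Called with an argument, the duck says the given phrase instead
cat(ducksay("quack"), "\n")
```

The default value is what lets the earlier tests, which call ducksay with no arguments, keep passing alongside the new one.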
00:43:33.200 --> 00:43:36.830 If I supply a value, well, I'll see the phrase there, hopefully, 00:43:36.830 --> 00:43:39.350 and if I don't supply a value, well, I'll hopefully 00:43:39.350 --> 00:43:42.450 see "hello, world" there instead. 00:43:42.450 --> 00:43:45.500 So let me test my code interactively now. 00:43:45.500 --> 00:43:50.630 I've updated my function here, so I should again run load all. 00:43:50.630 --> 00:43:54.360 I'm going to update my function after I've redefined it here 00:43:54.360 --> 00:43:57.180 and make it available to myself in my console. 00:43:57.180 --> 00:44:00.450 I now have access to the latest version of ducksay. 00:44:00.450 --> 00:44:05.250 I can run cat ducksay and then give as input quack, 00:44:05.250 --> 00:44:08.402 and hopefully we'll see quack on top of our duck. 00:44:08.402 --> 00:44:10.110 So it seems to be working just right now. 00:44:10.110 --> 00:44:12.810 Let me go ahead and run test and see what I can see. 00:44:12.810 --> 00:44:16.800 I'll see that all four of my tests are passing. 00:44:16.800 --> 00:44:20.280 So I've updated my tests, my source code. 00:44:20.280 --> 00:44:21.970 What else should I update? 00:44:21.970 --> 00:44:23.040 Well, my documentation. 00:44:23.040 --> 00:44:27.870 I'll go back now to my ducksay package folder, open up man, open up ducksay.Rd, 00:44:27.870 --> 00:44:29.760 and I'll update my documentation. 00:44:29.760 --> 00:44:34.770 I'll then say now that the return value of ducksay 00:44:34.770 --> 00:44:37.950 is no longer a duck saying hello to the world. 00:44:37.950 --> 00:44:41.700 It's really a duck saying the given phrase. 00:44:41.700 --> 00:44:44.730 And similarly, I should update now my usage. 00:44:44.730 --> 00:44:48.870 By convention, we include the function's name, remember, some parentheses, 00:44:48.870 --> 00:44:52.090 remember, and also, the parameters to this function. 00:44:52.090 --> 00:44:54.960 So one parameter now is this one called phrase.
00:44:54.960 --> 00:44:58.150 And I've given a default value of hello, world. 00:44:58.150 --> 00:45:01.890 So this is by convention how I would write that out in my documentation. 00:45:01.890 --> 00:45:05.490 I would give the parameter and its default value with an equal sign 00:45:05.490 --> 00:45:06.720 separating them. 00:45:06.720 --> 00:45:09.210 And then finally, down below my examples here, 00:45:09.210 --> 00:45:12.360 I could give one more example of using ducksay. 00:45:12.360 --> 00:45:15.900 Underneath this here, I could say you can also, if you want to, 00:45:15.900 --> 00:45:19.690 give ducksay some input, like quack, just like that. 00:45:19.690 --> 00:45:22.590 So this, I think, covers us in terms of our function 00:45:22.590 --> 00:45:26.190 itself, our tests, and our documentation. 00:45:26.190 --> 00:45:31.290 I can test my documentation by rerendering it with ?ducksay, 00:45:31.290 --> 00:45:33.180 and now I'll see it on the right-hand side. 00:45:33.180 --> 00:45:37.980 I will in fact see my updated usage, my updated value, 00:45:37.980 --> 00:45:40.990 and my updated examples down below. 00:45:40.990 --> 00:45:44.040 So I think we're in a pretty good spot. 00:45:44.040 --> 00:45:46.710 Let me now rebuild this code. 00:45:46.710 --> 00:45:49.870 I'll simply use build again, just like this. 00:45:49.870 --> 00:45:55.350 And now, I should see my updated version of my package now in a single file. 00:45:55.350 --> 00:45:58.950 And to be clear, if I wanted to share this code with somebody else, 00:45:58.950 --> 00:46:03.090 I would need to rebuild it every time I modify it to put all the updates inside 00:46:03.090 --> 00:46:03.690 of ducksay-- 00:46:03.690 --> 00:46:10.290 this folder-- into this new file, ducksay_1.0.tar.gz. 00:46:10.290 --> 00:46:11.250 OK. 00:46:11.250 --> 00:46:15.960 Let me ask, what questions do we have on iteratively updating and now 00:46:15.960 --> 00:46:19.260 rebuilding our package over time?
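The update, test, and rebuild loop described above could be wrapped into one helper, sketched below. The name rebuild is hypothetical and not part of devtools. The sketch assumes devtools is installed and that pkg points at the package's source folder.

```r
# Hypothetical helper (not a devtools function) bundling the iterative loop:
# reload the source, run the tests, then rebuild the single-file package.
rebuild <- function(pkg = ".") {
  devtools::load_all(pkg)  # make the latest function definitions available
  devtools::test(pkg)      # run the tests in the package's tests folder
  devtools::build(pkg)     # produce the updated .tar.gz file
}

# Example call, assuming the working directory is the ducksay folder:
# rebuild()
```

The helper still has to be called by hand after each change, so it only shortens the loop and does not rebuild automatically.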
00:46:19.260 --> 00:46:24.840 AUDIENCE: Is there a command in R which rebuilds the package for us 00:46:24.840 --> 00:46:26.960 as soon as the file changes? 00:46:26.960 --> 00:46:28.210 CARTER ZENKE: A good question. 00:46:28.210 --> 00:46:32.460 So is there a command in R that rebuilds the package for us as we change it? 00:46:32.460 --> 00:46:33.687 Not that I am aware of. 00:46:33.687 --> 00:46:35.520 So I'm familiar with devtools in particular, 00:46:35.520 --> 00:46:38.280 and I don't think there is a function that would do exactly that. 00:46:38.280 --> 00:46:40.780 I know in other languages, there can be functions like that, 00:46:40.780 --> 00:46:44.730 but not that I'm familiar with in R. A good question, though. 00:46:44.730 --> 00:46:45.940 Let's keep going, then. 00:46:45.940 --> 00:46:47.430 And so we've rebuilt our package. 00:46:47.430 --> 00:46:49.020 We now have it as a single file. 00:46:49.020 --> 00:46:52.560 I think what's left to do now is to really use our package. 00:46:52.560 --> 00:46:55.680 So let's see if we can create ourselves a new program that 00:46:55.680 --> 00:46:57.510 uses exactly this package. 00:46:57.510 --> 00:47:01.170 Maybe I will make a program called greet.R 00:47:01.170 --> 00:47:04.440 that will instead of giving a user just a plain old simple hello 00:47:04.440 --> 00:47:06.480 give them a hello from a duck. 00:47:06.480 --> 00:47:11.610 So I will move my working directory up one level, out of my ducksay folder. 00:47:11.610 --> 00:47:16.410 I'll use setwd and then in quotes here, dot dot. 00:47:16.410 --> 00:47:19.920 That means move me one level higher in my working directory. 00:47:19.920 --> 00:47:23.640 So I'm presently in ducksay, but now I'll be right next to ducksay, 00:47:23.640 --> 00:47:26.850 if you will, in the same view I have on the right-hand side here.
00:47:26.850 --> 00:47:33.430 I'll now create for myself a new program called greet.R, just like this, 00:47:33.430 --> 00:47:35.430 and I'll see greet.R show up over here. 00:47:35.430 --> 00:47:38.760 I'll open it, and now let's write this program together. 00:47:38.760 --> 00:47:42.540 Well, now that I have my very own package, one called ducksay, 00:47:42.540 --> 00:47:45.660 I could use library to load my function ducksay. 00:47:45.660 --> 00:47:50.550 I could use library here and say I want to load this library called-- 00:47:50.550 --> 00:47:52.770 load this package called ducksay. 00:47:52.770 --> 00:47:55.050 And once I've done that, what do I want to do? 00:47:55.050 --> 00:47:57.580 I want to ask now the user for their name, 00:47:57.580 --> 00:48:00.960 so I could use let's say something like readline. 00:48:00.960 --> 00:48:04.860 I could say I want to ask the user for their name. 00:48:04.860 --> 00:48:08.010 I could use name and then store inside it the result of readline, 00:48:08.010 --> 00:48:10.920 just like that, and ask, what's your name? 00:48:10.920 --> 00:48:14.040 Readline, as you've seen before, will take their input 00:48:14.040 --> 00:48:17.217 and return it to us, storing it in this object called name. 00:48:17.217 --> 00:48:19.050 And then I could create a greeting for them. 00:48:19.050 --> 00:48:21.150 I could say maybe let's have a greeting here. 00:48:21.150 --> 00:48:23.610 But the greeting will actually be the result 00:48:23.610 --> 00:48:28.290 of calling our function ducksay, which is inside this package called ducksay. 00:48:28.290 --> 00:48:32.070 I'll use ducksay here and pass as input in this case, 00:48:32.070 --> 00:48:37.350 well, the combination of hello and then let's say the user's name. 00:48:37.350 --> 00:48:42.060 And now down below, I could use cat, of course, to print our greeting. 00:48:42.060 --> 00:48:46.680 So top to bottom, I'm loading this package we created called ducksay.
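A sketch of what greet.R might contain, based on the steps just described. The stand-in definition of ducksay is a hypothetical fallback, with an assumed duck drawing, so the sketch still runs when the package has not been installed yet. The exact prompt text and the use of paste to combine hello with the name are also assumptions.

```r
# greet.R (sketch): greet the user with a duck.
if (requireNamespace("ducksay", quietly = TRUE)) {
  library(ducksay)  # load the package once it has been installed
} else {
  # Hypothetical stand-in so this sketch runs without the package;
  # the duck art is assumed, not taken from the lecture.
  ducksay <- function(phrase = "hello, world") {
    paste(phrase, ">(. )__", " (____/", sep = "\n")
  }
}

# Ask the user for their name (returns "" when run non-interactively).
name <- readline("What's your name? ")

# Combine "hello," with the name and hand the result to ducksay.
greeting <- ducksay(paste("hello,", name))

# cat prints the greeting with its line breaks rendered.
cat(greeting, "\n")
```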
00:48:46.680 --> 00:48:48.960 I'm asking the user for their name. 00:48:48.960 --> 00:48:53.240 I'm then using the function we defined as part of ducksay, called ducksay 00:48:53.240 --> 00:48:53.840 as well. 00:48:53.840 --> 00:48:58.160 I'm going to pass in the concatenated version of hello and their name. 00:48:58.160 --> 00:49:02.930 And then I'm going to print the result of calling ducksay here. 00:49:02.930 --> 00:49:06.500 But before I can do this, I have built my package. 00:49:06.500 --> 00:49:09.140 But what I haven't done is installed it. 00:49:09.140 --> 00:49:10.050 So I've built it. 00:49:10.050 --> 00:49:11.550 I could share this file with others. 00:49:11.550 --> 00:49:13.638 But I still need to install it on my own computer. 00:49:13.638 --> 00:49:15.680 And if anybody wants to use this package as well, 00:49:15.680 --> 00:49:18.150 they need to install it on their computer too. 00:49:18.150 --> 00:49:20.810 So thankfully, there are tools we can use to install 00:49:20.810 --> 00:49:24.200 packages in R. We've seen one already-- 00:49:24.200 --> 00:49:26.420 one called install.packages. 00:49:26.420 --> 00:49:31.340 In fact, you can use install.packages to install packages not just from the CRAN 00:49:31.340 --> 00:49:35.360 but also from an individual file, like the one we have here. 00:49:35.360 --> 00:49:39.467 You could also use a command that comes with R called R CMD INSTALL. 00:49:39.467 --> 00:49:41.300 You can use that in your terminal, but we'll 00:49:41.300 --> 00:49:46.410 stick now to using install.packages, as it keeps us inside the R console itself. 00:49:46.410 --> 00:49:50.420 So let's now install our package with install.packages. 00:49:50.420 --> 00:49:53.190 I'll come back now to RStudio, and I think 00:49:53.190 --> 00:49:55.890 I could use this single file I now have-- 00:49:55.890 --> 00:49:58.440 the compiled version of my package. 00:49:58.440 --> 00:50:00.900 And I can install it now with install.packages.
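Installing from the built file might look like the sketch below. It assumes ducksay_1.0.tar.gz sits in the current working directory. Passing repos = NULL and type = "source" explicitly is one way to tell install.packages to treat the argument as a local source file rather than a package name to look up on CRAN.

```r
# Sketch: install the package from the single built file.
tarball <- "ducksay_1.0.tar.gz"  # assumed to be in the working directory

if (file.exists(tarball)) {
  # repos = NULL means "install from this file, not from a repository".
  install.packages(tarball, repos = NULL, type = "source")
  library(ducksay)
  cat(ducksay("quack"), "\n")
} else {
  message("Could not find ", tarball, "; build the package first.")
}
```

The terminal alternative mentioned above would be R CMD INSTALL followed by the same file name.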
00:50:00.900 --> 00:50:06.840 I'll say install.packages, and I'll now use the file name itself-- 00:50:06.840 --> 00:50:10.290 ducksay_1.0.tar.gz. 00:50:10.290 --> 00:50:12.780 This tarball is kind of like a zip file that I 00:50:12.780 --> 00:50:14.820 can use to store my package contents. 00:50:14.820 --> 00:50:19.230 I'll install this here, and now that it's installed, I see done. 00:50:19.230 --> 00:50:24.150 I can now run source on greet and type in Carter. 00:50:24.150 --> 00:50:24.780 Voila. 00:50:24.780 --> 00:50:28.560 I now have a duck that can say hello to anyone who enters in their name, 00:50:28.560 --> 00:50:32.980 thanks to this package I made called ducksay. 00:50:32.980 --> 00:50:33.720 Well. 00:50:33.720 --> 00:50:37.260 We've seen now how to build R packages, how to install them, 00:50:37.260 --> 00:50:38.340 and how to use them. 00:50:38.340 --> 00:50:41.423 Now the only thing I have to do is to share them with the world. 00:50:41.423 --> 00:50:43.590 So if you want to share your package with the world, 00:50:43.590 --> 00:50:45.180 you have a number of options. 00:50:45.180 --> 00:50:49.190 You could use the CRAN, so long as your package adheres to their guidelines. 00:50:49.190 --> 00:50:50.940 You could use a service like GitHub, which 00:50:50.940 --> 00:50:53.482 is a tool for sharing software and collaborating with others. 00:50:53.482 --> 00:50:57.030 You can even share your package over email with a friend. 00:50:57.030 --> 00:50:59.430 Now however you choose to share your code, 00:50:59.430 --> 00:51:00.600 I hope you keep in mind just how much you've 00:51:00.600 --> 00:51:02.558 learned over the course of this course and what 00:51:02.558 --> 00:51:04.050 you have to share with the world. 00:51:04.050 --> 00:51:08.250 In fact, you began by learning how to represent data using vectors and data
00:51:08.910 --> 00:51:13.050 You graduated to transforming data, using subsets, conditions, 00:51:13.050 --> 00:51:14.670 and logical expressions. 00:51:14.670 --> 00:51:18.210 You then saw how to make your analysis more efficient using loops and functions 00:51:18.210 --> 00:51:22.068 and dipping your toes into this paradigm called functional programming. 00:51:22.068 --> 00:51:25.110 And in the second half of the course, you saw packages like the tidyverse 00:51:25.110 --> 00:51:28.560 and all they could do-- how they could tidy your data, help you visualize it, 00:51:28.560 --> 00:51:30.420 and help you test your programs too. 00:51:30.420 --> 00:51:32.730 All that's left now is to take all you've learned, 00:51:32.730 --> 00:51:35.910 package it up, and share it now with the world. 00:51:35.910 --> 00:51:38.340 We're so excited to see what you'll create. 00:51:38.340 --> 00:51:42.530 This was CS50's Introduction to Programming with R.