[MUSIC PLAYING] CARTER ZENKE: Well, hello, one and all, and welcome back to CS50's Introduction to Programming with R. My name is Carter Zenke, and this is our lecture on packaging programs, by which we can share them with the world. Now, together today, we'll make a package called ducksay. And the goal of ducksay is to take as input some character string and output a textual representation of a duck saying that character string. 

So for instance, if I typed in "hello, world," I might see in my console now this duck saying hello to the world. Now, if you're familiar with CS50 or with programming more generally, you might have heard of a package or a program called cowsay. And cowsay is very similar in spirit. It takes some piece of text and outputs in your terminal a cow saying that text. 

Now, cowsay is not an R package, but we will take some inspiration from it to make our own package, one called ducksay. But to do so, we need to learn more about packages. Up until now, we've been users of packages. We have seen how to install, how to load, and how to use functions inside of packages. But what we haven't seen is the source code behind our packages. How are they made? 

Well, it turns out when you download a package from something like the CRAN, what you get is a single binary file. But thankfully, you yourself don't have to write binary to create a package. You can instead use what we'll call source code-- the individual dot R files and folders that compose your package. 

And when it comes time to share your package with others, we'll take the source code and build it or compile it into some single file you can share around the world. But let's focus first on source code. I'll come back now to RStudio to make some source code for this package. And by convention, we write the source code for some package inside a folder that has that package's name. 

So if I want to make a package called ducksay, I should first create a folder in my working directory called ducksay. To do so, I can use this function here, dir.create, to create a directory-- in this case, called ducksay. I'll hit Enter here. And notice how in my File Explorer, I see a new folder called ducksay. 

Well, I want to make all my future files and folders inside of this folder here-- my future package. So I'll set now my working directory to ducksay. And now, it's a bit equivalent to me going inside of this folder, by clicking on it and seeing this blank slate in which we can begin writing the source code for our package. 
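The two console steps just described can be sketched as follows. This is a minimal sketch in base R, assuming the folder is created inside whatever the current working directory happens to be.

```r
# Create a folder named after the package, then move into it.
dir.create("ducksay")   # returns TRUE if the folder was created
setwd("ducksay")        # later files are created relative to this folder

# Confirm where we are now working.
basename(getwd())
```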

Now, this blank slate is a bit scary at first, but thankfully, there's a structure to how packages are organized in R. Let's take a look at what a typical package might look like. So in general, if you're making a package in R, you will tend to have these files and folders here. One first of all called DESCRIPTION in all caps, which describes your package-- what is the name of it, what version number is it, who wrote it, and so on. 

You'll also have a file called NAMESPACE, which you'll use to define the functions in your package some end user might be able to use. And then you'll have some folders here-- one called man, which stands for manual. You'll put inside that folder the documentation for your functions-- some instructions on how to use them. And here we have an R folder, too, where you'll actually place your R files that have your function definitions inside of them. And of course, we'll need some tests to test our code, so we'll put those in this folder called tests. 

Now, if you're running a more advanced package, you might have other files and folders too, but these are the core ones that say this folder here is an R package. So let's begin. Let's begin with our description file and describe the package we want to create. I'll come back now to RStudio and create for this folder here a description file. 

I can do so with file.create, and I'll choose to make a DESCRIPTION file, in all caps with no file extension. Notice how now inside my ducksay folder, I have this file called DESCRIPTION, which I can open up and see, well, another blank slate. And so it turns out here too, there are some conventions on how to organize our description file. In fact, our description file will be composed of individual fields that tell us some information about this package. But what are those fields? 

Well, the ones you need to know about-- you need to include in your file-- are these here. One called package, which is the name of our package as we want somebody to install it. We've seen install.packages, and we'll often include the name of the package we want to install as the input to that function. 

Well, this is the name we want the user to type in to install.packages to install our package. We also have to say the title of our package, which is a bit more English-friendly. You could capitalize things, include spaces, and so on, but similar idea to the package field as well. 

We have here a description as well. What is the description of our package? What does it do, and so on? And a version number. What version is it? If we changed over time, is it version 1.0, 2.0, 3.0, or so on? And then finally here, we have some information on the authors. Who wrote this package? What was their role? And then a license field. That is, what are the legal terms under which you can actually use the code if you want to install this package and use it yourself? 

So let's now add these fields to our description file and make it an R package. I'll come back now to RStudio, and let's go ahead and start by naming our package. Well, we said before, our package will be called ducksay. So in the description file, I'll add this package field followed by a colon, and I'll say that our package is named, in this case, ducksay-- just some English text here. 

Then I'll add the title, and I'll choose to maybe title case our package here. I'll call it Duck Say, just like this, and I'll give a description to it as well. I'll say the description of this package is going to be "Say hello with a duck." The purpose is to say hello with a duck. And this is the first version of the package. I'm just starting out on my journey of developing this package. 

But now I need to include some more information too, like who wrote this package and what are the legal terms in which we can actually use the package too? So to define who wrote the package, I could use this field called Authors@R. And whereas up above, we've been using some English text, just regular old characters and so on, in this field, we can actually use an R function to define who wrote the package and what role they played. 

Now, to add a new author to this package, I'll use a function, one called person. And if I look at documentation, I would know that the first argument to this person function is the person's first name. So my first name is Carter. And the second argument is their last name. So in this case, my last name is Zenke. 

And there are a few other parameters as well that this person can have-- namely, an email. I could say my email here is carter@cs50.harvard.edu. And this lets people know if they want to contact the package author, they can email me at this email here. 

And then we should also specify what role each person played. So there's a parameter here called role as well. Now, because package authors can play more than one role, role will take as input a vector of roles. And there are actually a defined set of roles, which you can learn if you look at the documentation here, but I'll focus on a few in particular. 

One role is an author of the package-- somebody who contributed to it in full. And we denote somebody as an author by typing in aut-- this abbreviation here for author. So in this case, I'm saying that me, Carter Zenke, I'm an author of this package. 

Now, there's more roles, too. One role that is important as well is the creator of the package, which we note with cre. It's the abbreviation for creator. Now, creator and author seem pretty similar in meaning, but in R, they have two distinct meanings. An author is anybody who at any time contributed to the package. A creator, though, is the person who now maintains the package. They're in charge of updating it, making sure it's up to date and so on, over time. 

So these two are distinct, and there has to be at least one creator of some package-- a person who's maintaining it over time. So these are two main roles, but there is one more as well. I should also say that I am the copyright holder for this code. I own it, and I'm the person who owns a license that I will then specify underneath myself here. So these three roles-- author, creator, and copyright holder-- are the ones you will need to create a package in R. 
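As a rough illustration of the author entry just described, here is what the call to person might look like when run on its own in the console. The abbreviation "cph" for copyright holder is not spelled out above; it is the standard role code, added here as an assumption about what was typed.

```r
# person() comes from the utils package, which R loads by default.
author <- person(
  "Carter", "Zenke",
  email = "carter@cs50.harvard.edu",
  role = c("aut", "cre", "cph")  # author, creator (maintainer), copyright holder
)

# Show the author the way R formats it.
format(author)
```

In the description file itself, the same call would follow the Authors@R field name and a colon.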

But now let's think through the license field-- license down below. Well, if you're a lawyer, you could maybe write your own license. But in general, it's best practice to rely on some standardized license that already exists. And in fact, if you want to share your code for free online for others to use freely as well, there's a whole community that has created various licenses-- one called the Free and Open Source Software community. 

This community has created several licenses that you can just use and adopt or adapt to share your software online for free. Among the typical licenses are these-- the MIT license, by a bunch of friends down the road, and the GNU General Public License, similar in spirit to the MIT license as well. 

Now, the MIT license begins as follows. "Permission is hereby granted, free of charge, to any person obtaining a copy of this software," basically to deal with the software without restriction-- so saying you can share this code freely, and you can use it freely as well. 

So I might want to adopt this license for my software here, and I'll go ahead and say that in my description file as well. I'll come back over here to RStudio and say, I want to build on top of the MIT license. I want to license my software under this language here. So I could simply type in MIT as the template for my license now. 

But it turns out that in R, there's a bit more we need to specify here. I should also specify the year in which I created this software and who the copyright holder is inside the license itself. This is particular to the MIT license as well. So if I want to add on to this MIT license, I can take MIT as my template but then add on some other file that gives some more information about this license. I could say MIT + file and then LICENSE in all caps. 

And this is convention. If I want to add on to my license, I do so with a file called LICENSE in all caps. And now this says the entire license for my software is the MIT license as the base, plus some file that I'll include called LICENSE. Well, let's create it now. I'll go to my console here and say file.create. I'll create a file called LICENSE-- no extension here-- open it up, and now add in some placeholders that I should fill. 

For the MIT license in particular, I need to say, again, what year it was that I created this software. So I'll say YEAR in all caps and the year I created the software, and then the copyright holder as well. In this case I'll say just the ducksay authors, referencing again this Authors field in my description file. 
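Putting the fields together, the two files might end up looking like the text written out below. This is a sketch under a few assumptions: it writes into a temporary folder (named ducksay-example here purely for illustration) rather than the real package folder, it uses the current year as a stand-in for the year typed in the lecture, and it assumes the second LICENSE line is labeled COPYRIGHT HOLDER, which is the form R expects for the MIT template.

```r
# Work in a throwaway folder so nothing in the real package is overwritten.
pkg_dir <- file.path(tempdir(), "ducksay-example")
dir.create(pkg_dir, showWarnings = FALSE)

# The DESCRIPTION file: one "Field: value" pair per entry.
# A line that starts with spaces continues the field above it.
description <- c(
  "Package: ducksay",
  "Title: Duck Say",
  "Description: Say hello with a duck.",
  "Version: 1.0",
  'Authors@R: person("Carter", "Zenke", email = "carter@cs50.harvard.edu",',
  '    role = c("aut", "cre", "cph"))',
  "License: MIT + file LICENSE"
)
writeLines(description, file.path(pkg_dir, "DESCRIPTION"))

# The LICENSE file that "MIT + file LICENSE" points to.
license <- c(
  paste("YEAR:", format(Sys.Date(), "%Y")),  # placeholder year
  "COPYRIGHT HOLDER: ducksay authors"
)
writeLines(license, file.path(pkg_dir, "LICENSE"))

# read.dcf() parses the "Field: value" format, so we can check our work.
fields <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
fields[, c("Package", "Version", "License")]
```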

So now we have the basic bare-bone structure for our package called ducksay. We have a description, a license, authorship, and so on. Let me ask, what questions do we have so far on packages in R and creating those of our very own? 

AUDIENCE: Which are the best practices to enumerate the version number, and when do we choose when to increment the number to the left of the point and the one to the right of the point? 

CARTER ZENKE: So a good question about version numbering. I want to change the version number that we're using. Let me show you what we're doing now and show you another process called semantic versioning, too. I'll come back over here, and here I'm using just 1.0 for simplicity, but actually it turns out some of the community use a convention known as semantic versioning. And that versioning system actually allows for three numbers here. 

Now by convention, each of these numbers has some certain meaning. In this case, this last number is known as the patch version. If you make some bugfix, you would increment this number here. This middle number is known as the minor version. If you add in some new feature, like a new function, for instance, you would increment that number too. 

But it turns out that this first number, like 1 here, is known as the major version. You would only increment this if you made some change that broke the conventions you previously used in your package and somebody who's relying on your package would have to update their code too to still use your package. So this is one set of conventions for versioning here. We'll go back, though, and use 1.0 just for simplicity. 
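To make the three numbers concrete, here is a small sketch of how each kind of change moves the version. The helper bump_version is hypothetical, written only for this illustration; resetting the numbers to the right back to zero is the usual semantic versioning convention rather than something stated above.

```r
# Hypothetical helper: bump one part of a "major.minor.patch" version string.
bump_version <- function(version, part = c("patch", "minor", "major")) {
  part <- match.arg(part)
  nums <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(nums) == 3)
  nums <- switch(part,
    major = c(nums[1] + 1L, 0L, 0L),        # breaking change
    minor = c(nums[1], nums[2] + 1L, 0L),   # new feature
    patch = c(nums[1], nums[2], nums[3] + 1L)  # bug fix
  )
  paste(nums, collapse = ".")
}

bump_version("1.0.0", "patch")  # "1.0.1"
bump_version("1.0.0", "minor")  # "1.1.0"
bump_version("1.4.2", "major")  # "2.0.0"
```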

OK. So we have here now our basic bare-bones R package, but our goal is to also add some source code. And we can do it in the same way we've usually written source code before-- by first writing some tests, writing our code, and writing documentation for our code. Now, we saw before, if I want to test inside of my package-- I want to run some unit tests as well-- I should write them inside of this folder called tests. 

And we can certainly create this folder ourselves, but as we get into the weeds of structuring our package, we might want some help. So thankfully, there is a package that helps you write packages in R, one called devtools. This package gives us some tools to write our very own package, and among the functions it has for unit testing are these-- one called use_testthat. 

So we saw testthat last time. It's a package for unit testing R software. If I want to use the testthat package to test my code, all I have to do is use use_testthat. And then, once I've done that, if I want to create some testing file for a function, I could simply use use_test, and that will create for me a new testing file for my function. 

And then finally, once I have all those tests written, if I want to run those tests, all I have to do is run the test function as well. So very helpful for us for structuring our package and running our unit tests too. So let's begin by writing some unit tests for our ducksay function. 

I'll come back now to RStudio, and let's try to use use_testthat. Well, because this function is part of the devtools package, I'll first need to load, if not install, the devtools package. So here down below, I'll use library devtools to load the devtools package, assuming it is installed. 

I'll hit Enter here, and now, let's use use_testthat to configure our package to run tests with the package testthat. Well, I'll use down below here use_testthat, and I'll hit Enter, and we'll see a few things have happened actually that I'll see in my console down below. 

Now, the first thing I see is that my description file has been added to. I see now this new field called Suggests, and as part of Suggests, I now see testthat. This means when somebody installs our package, it will be suggested that they also install testthat at a version greater than or equal to 3.0. 

Well, why would we suggest testthat? Well, maybe they want to test our code themselves, in which case they'll need to use testthat, because we used it ourselves as well. If, though, you don't want to just suggest some package-- if it's actually needed to use your code-- you could also make it required that they install some other package as well. So I can make a field called Requires, like this, and list any packages I want to require the user to install to use my own package. 

Now, the user here likely won't be testing our software for us, so we'll only suggest it, not require it, but if you do want to require some code-- some package-- you can actually use Requires as a field in your description file as well. I also see here config/testthat/edition. This just means that when our tests are run, we'll be sure to use the version 3 of testthat. 

But a few other things have happened as well. If I look down below my console here, I'll see it's created some new folders for me-- namely, one called tests over here. If I click on tests, I'll see well a new file, testthat.R, and a folder, also called testthat. If I open up testthat.R, this is a file that was automatically created for me, and it includes some configuration for testthat as I run my tests inside of this package. 

We'll leave this alone for now, but notice how I also have a folder called testthat. And it's inside this folder that I will actually write my unit tests themselves, similar to what we saw last time. Now that the structure has been set up, I can actually write my tests now. And see how it's suggesting that I use use_test to create my very first unit test for my function. 

Well, here I want to use_test, and I want to create a test for, in this case, a function called ducksay. So I'll enter as input to use_test the function's name, ducksay, just like that. I'll hit Enter now, and I'll see a few things happen again. One, I now have this new file, called test-ducksay.R, which is inside my testthat folder, which itself is inside my tests folder. And now I can modify test-ducksay.R. 

It's given me here some basic structure for my test file, but I don't want to use this so far. I'll just remove it. And now, I want to think about how I could describe ducksay. What do I want it to do in this testing file? Well, we saw last time we could use some code a bit like this to describe how we want ducksay to run. 

I could say describe and then use ducksay here to say I'm going to describe how I want ducksay to run. Well, what do I want it to do? I think the first thing I want it to do is to work with cat. So I could say it can print to the console with cat. And now, I'll include some test to see if ducksay can print with cat. 

And what I mean by this is as follows. If I use cat here and gave as input ducksay, I should see the output of ducksay in the console. Ducksay will simply return to me some character string, but cat will take care of outputting it to the console. 

Now, how could I make this a test? I have the code I want to have run, but I want to test that it's doing what I want it to do. Well, it turns out that similar to expect_equal, which you saw last time, there is a function called expect_output that can expect when I run this code, I get output in my console. 

So I'll use this function, expect_output, part of testthat, to say I expect that when I run this code, cat with ducksay, I'll see some output in my console-- anything at all. And that seems to be our first description now for ducksay. But I think we could still do a little better-- get more specific, if we will. 

So here we could say it prints to the console with cat, but what should it print? Well, it should print "hello, world" at least, so I'll go ahead and say it can say hello to the world. That's another feature now of ducksay. And I'll say that well, when I want ducksay to run, I expect to see "hello, world" in the output. 

Now, we saw last time a function called expect_equal, and I could use expect_equal with ducksay here. I could say I expect that ducksay will return to me a string that is equal to "hello, world." But I argue this might not work as I intend it to. Because if we look at our output here-- here's our intended output of ducksay-- why might it not work, if we were to say I expect this output to be equal to "hello, world"? 

Well, it seems like this is not strictly equal to "hello, world." I have "hello, world" and then some duck at the end. So what I would rather do is ask a different question. Is "hello, world" somewhere in this output we've gotten back? Not is it equal to "hello, world," but is "hello, world" somewhere inside of it? 

Now thankfully, there is another function besides expect_equal-- one called expect_match. I can expect to find a match of "hello, world" inside this output of ducksay. I think I can try it out. I'll come back over here, and I can use expect_match like this. I'll say expect_match now between the return value of ducksay and this character string "hello, world," and that will treat this as a pattern-- hello comma space world. 

And if it finds that pattern inside the return value of ducksay, well, this will be true-- no errors at all. If I can't find that pattern, though, in ducksay, it will raise an error, and our tests will fail. So again, expect_match is good for trying to find this pattern, "hello, world," inside the output of ducksay right here. 
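The difference between these checks can be seen with plain base R, without testthat at all. The sketch below only approximates what expect_output, expect_equal, and expect_match look at; the string stored in intended is an assumption about the duck's exact shape, since ducksay itself has not been written yet at this point.

```r
# Assumed intended output of ducksay(): a greeting followed by a two-line duck.
intended <- "hello, world\n>(. )__\n (____/"

# Roughly what expect_output() checks: did running the code print anything?
printed <- capture.output(cat(intended))
length(printed) > 0

# Why expect_equal() is too strict here: the duck is part of the string too.
identical(intended, "hello, world")

# Roughly what expect_match() checks: is the pattern somewhere in the string?
grepl("hello, world", intended)
```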

So I think these tests are in pretty good shape. I now know exactly what I want ducksay to do. It should work with cat, and it should print out some output that says hello to the world. But now that we have our tests, we need to write our actual code. We need to write the function ducksay itself. And for that, we saw we could use this folder called R. 

In general, in working with packages, we're going to write all of our dot R files inside of a folder called R. But again, rather than structuring this ourselves, we could rely on devtools to do it for us. I could use a function in devtools called use_r and pass in the function name I hope to create. And then I'll get an R file to write my function definition in. 

So let's try this now, now that we've written our unit tests. I'll come back now to RStudio, and let's see if I can use use_r to create for me the function ducksay and the file I should define it in. I'll go to my console now and use use_r, and then go back up in my File Explorer to ducksay as the folder here. 

And I'll try to create now this file to define ducksay in. I'll say I want to create this new function, ducksay, and the R file for it. I'll hit Enter now and see a few things happening. One, I see I have a folder called R-- brand new thanks to use_r. And I also see I have a new file, ducksay.R, which has been created inside of this R folder to keep things organized in this case. 

I can close my description here and my-- let me save it first-- and then I'll go ahead and remove test ducksay here and focus now on ducksay.R. Well, how should I write ducksay? If I look at my output here-- here I have my intended output-- I notice that I really have three lines of output I hope to return from this function. 

I have hello comma space world, the top half of my duck, and the bottom half of my duck. These are all character strings. So let me actually ask our group here: what function we've seen so far do you think would help us combine these strings? 

If I want to have three different lines here-- hello, world, top of my duck, bottom of my duck, what function could I use to combine these strings and perhaps return them from my ducksay function? 

AUDIENCE: Maybe the paste function? 

CARTER ZENKE: Yeah, maybe paste. So we saw before that paste is good for combining different strings, and we can actually use paste here. But instead of separating now with spaces, we could separate with new lines-- our backslash n escape character. 

So let's try that out. I'll come back now to my file, and let's define for ourselves the ducksay function using paste. I'll say here I want to make a new function called ducksay that currently doesn't take any input at all. But inside of this function, I will return the result of calling paste on three different strings. The first one will be my first string here, one called "hello, world" at the very top of my output. 

And then the very beginning of my duck here-- I could give it a little beak, some eyes, and now the top of its body here. And then underneath, I could use the bottom of the duck, which will look a bit like this-- some underscores and then a forward slash. And now, I think we have what looks to be our intended output, thanks to paste. 

But as we said before, paste's default is to combine these strings using a space. And we want a new line instead. So I should now change the sep parameter to paste, which we've seen before, from a space to a backslash n-- this new line character that says I want to separate each of these character strings by a new line, as if I were hitting Enter each time on my keyboard. 
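Written out, the function described so far might look something like this. The exact characters of the duck are an assumption pieced together from the description (a beak, an eye, four underscores, and a forward slash); the use of paste with sep set to a new line is as described.

```r
# ducksay takes no input yet; it returns one string with three lines.
ducksay <- function() {
  paste(
    "hello, world",
    ">(. )__",   # top half of the duck: beak, eye, back
    " (____/",   # bottom half: four underscores, then a forward slash
    sep = "\n"   # join with new lines instead of paste's default space
  )
}

# cat() is what actually prints the string to the console.
cat(ducksay())
```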

So I'll save now this function. And when ducksay runs, it should now return to me this output. But I've defined ducksay here, and I want to use it. Turns out I can't do that just yet. We saw earlier this idea of a NAMESPACE file, which tells us which functions in our package an end user can use. And so now we've defined our ducksay function, we should actually include it in our package's NAMESPACE-- the list of functions that an end user could use in our package. 

So let me now create this file called NAMESPACE. I'll say file.create, NAMESPACE down below, and I can then see in my folder called ducksay a new file called NAMESPACE. I'll open this one up, and what should I include in NAMESPACE? Well, by convention, we have a function here called export. 

Export says take a function that I've defined and make it available to the end user who installs this package. In this case, I'll export the ducksay function just like this. So to be clear, I've now defined my ducksay function inside a file called ducksay.R, which itself was inside an R folder to keep things organized. 

And then once I've defined it, I want to make it available to a user, which I'll do through the NAMESPACE file and say I want the ducksay function in particular to be available to our end users here. Now once I do that, I can make use of another devtools function, one called load_all, that says whatever functions I exported from my package, like ducksay here, I want you to load them so I can use them right here in my console. 

I'll go ahead and load all, and I'll see this is loading the ducksay package now. What I can do now is use ducksay in my console. I could say I want to cat the result of calling ducksay, just like this. And let's see what we get. Fingers crossed. We get a cute duck saying hello to the world. 
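To see how the pieces fit together, here is a self-contained sketch that builds a tiny copy of the package in a temporary folder and then loads it. Several things here are assumptions made for illustration: the temporary location, the pared-down DESCRIPTION, the duck's exact shape, and the check for whether devtools is installed, which is there only so the sketch still runs on a machine without it.

```r
# Build a miniature ducksay package in a throwaway folder.
pkg <- file.path(tempdir(), "ducksay")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

# A pared-down DESCRIPTION, just enough to identify the package.
writeLines(
  c("Package: ducksay",
    "Title: Duck Say",
    "Description: Say hello with a duck.",
    "Version: 1.0"),
  file.path(pkg, "DESCRIPTION")
)

# NAMESPACE: make the ducksay function available to end users.
writeLines("export(ducksay)", file.path(pkg, "NAMESPACE"))

# R/ducksay.R: the function definition itself.
writeLines(
  c('ducksay <- function() {',
    '  paste("hello, world", ">(. )__", " (____/", sep = "\\n")',
    '}'),
  file.path(pkg, "R", "ducksay.R")
)

# Load the exported functions, then use one from the console.
if (requireNamespace("devtools", quietly = TRUE)) {
  devtools::load_all(pkg)
  cat(ducksay())
}
```

When working inside the package folder itself, calling load_all with no arguments does the same job.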

And now we could test our code more thoroughly too. I could run test as well in my console, and now I'll run those tests I created. Let me open up inside my tests and testthat folder here. Here were those tests. If I now run test the function, thanks to devtools, I will be able to run all the tests I defined in this file. 

Now just one more thing here too. Last time, we used source at the top of our file to give this file access to a function like ducksay in ducksay.R. But now that we've used load_all and exported this function from our package, we can simply load_all and then run the tests, and they will have access to that function called ducksay. No more using source so long as we're inside a package that we've exported our functions from. 

OK, so we've seen now how to define unit tests for our package, how to write code that adheres to those tests. Let me ask, what questions do we have on what we've seen so far, either on testing our code, writing our functions, or defining our package more generally? 

AUDIENCE: When you run the test program on the terminal, what's the meaning of the colored letters FWS? 

CARTER ZENKE: Ah, good question. So if I look over here in the console, I'll see some pretty output, let's say, from testthat. And let me walk through it step by step here. So here I see testing ducksay. That is the function we are testing, right? I'll also see FWS and OK. Now, these are different kinds of results we can get from our tests here. 

It seems like F corresponds to fail down below. Fail-- we didn't pass this particular test. W stands for a warning. We saw last time how our tests sometimes raise warnings. Well, this would be the number of tests that gave me a warning, in this case. 

S stands for skip. It turns out you can skip tests if you want to. And then OK means this test passed with flying colors. So here, I see these two tests-- they both passed, and I'll see a 2 in the OK and a 2 total that are passing down below. 

If I had more than one function to test, I might see more than one of these and see the total number of ones that were passed, skipped, warned about, or failed overall down at the bottom of these results here as well. So I hope that helps clarify what exactly we're seeing as a result of using test. But great question there. 

OK, so I think we're in a pretty good place, but there's arguably one more thing to consider now. So I've seen that my code can both print to the console with cat, and it includes "hello, world" in the output, but an important thing here too is, does it include a duck? So let's see that as well. 

I'll come back now to RStudio and update these tests now. Let me add a new one-- a new test that says ducksay-- it can even say hello with a duck, just like this. And I think I could probably use a very similar structure to what I used before with expect_match. In this case, though, I could expect a match between the output of ducksay and the duck that I have to show to the user. 

So I could use expect_match, like I did before, and then enter ducksay, just like this. And now I want to expect a match between my duck pattern and whatever I see in the return value of ducksay. But I'll probably need a new object for this duck, and I want to type in the whole duck as an argument here. 

I could go up above here and define myself a new duck, similar to how we did it before. I'll paste together the top half of this duck with a cute little beak, and a top here, and then the bottom half of my duck-- 1, 2, 3, 4 underscores-- and then a forward slash, and that is my duck, so long as I separate each with a backslash n, just like that. 

And now, I think what I could do is expect a match between the return value of ducksay and this duck-- this duck I've created over here. Now, this, you think, might work. But I'd argue there's one more thing to consider here, which is I told you earlier, expect_match will take the pattern we've defined here and look for it in the return value of ducksay. In this case, this is our pattern. 

Well, these patterns are more formally called regular expressions. And we won't get into them today, but in general, one thing to know about them is that these characters, parentheses and a dot, have a special meaning inside of regular expressions. They don't actually mean literally a parenthesis or a dot. They mean something else entirely. 

So if I want to treat this pattern not as this thing called a regular expression but exactly as I see it here, I can set another parameter equal to TRUE instead-- one called fixed. Fixed says, I want you to treat these characters here not as part of some regular expression, but instead, exactly as we see them here-- a greater-than sign, a parenthesis, and a dot or a period. 

So more on those another time, but for now, let's just say I want to look for exactly this pattern inside the output of ducksay. I'll leave this as is now, and I'll go down below and run my tests with test again. And now I'll see all three tests are passing. None are failing. None are giving us warnings. None have been skipped. All, in this case, have passed. 
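The effect of fixed can be seen with base R's grepl, which takes a pattern and a fixed argument much like expect_match does. This is only a sketch: the duck's exact shape and the output string are assumptions, and grepl stands in here for the matching that the test performs.

```r
# Assumed return value of ducksay(), and the duck we want to find inside it.
output <- "hello, world\n>(. )__\n (____/"
duck <- paste(">(. )__", " (____/", sep = "\n")

# In a regular expression, "." matches any single character...
grepl("a.c", "abc")                 # TRUE
# ...but with fixed = TRUE it only matches a literal dot.
grepl("a.c", "abc", fixed = TRUE)   # FALSE

# Look for the duck exactly as typed, parentheses and dot included.
grepl(duck, output, fixed = TRUE)   # TRUE
```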

So we've fixed our tests. We've written our code. One next step is to document how to use our function. Maybe a user is new to our package. They don't know what to do. We want to give them some guidance on how to use our functions. 

In fact, you've probably seen to access documentation, you can use question mark followed by the name of some function. And right now, if I use question mark ducksay, well, I don't see anything. There's no documentation for ducksay. 

Well, let's go ahead and fix that. Thankfully, I can define my own documentation for ducksay by putting it inside of this folder we saw earlier-- one called man, where man stands for manual. But what will go inside this man folder? It turns out a variety of files all ending with dot Rd, where dot Rd stands for R documentation. 

In fact, inside these files, we'll write not just plain text, but we'll actually write something called a markup language. Now, a markup language is not a programming language. There are no functions and loops and so on. Instead, it's a language for formatting some text. Now for instance, R's markup language looks a bit like this. I can give each of my documentation files some particular parameters, like title, description, and usage here. 

Here, title says, what is the title of my documentation? Description says, describe this function for me. And usage says, how should I use this function? There are other commands we'll see here, but our dot Rd files will look a lot like this, and we'll then render them or convert them to those same files you're used to seeing when you use the question mark down in your console. 

So let's try this out now. I'll come back to RStudio and try to make some documentation for ducksay. Well, I want to probably first create for myself that man folder to put my documentation inside of. So I could use that same function we saw earlier, dir.create. And I'll create for myself the folder called man, short for manual. 

And it's inside of this folder that I will store all of my dot Rd files. I'll say file.create now, man/ducksay.Rd. And this is convention. I'm putting this file, ducksay.Rd, inside the man folder, and I'm calling it-- giving it the same name as the function it should document. So this file, ducksay.Rd, should document the same function, ducksay. 

I'll go ahead now and create this file, and if I open now my man folder, I should see ducksay.Rd right there inside. I'll open it up, and what do I see? Well, nothing yet, but I'd argue we could go ahead and use R's markup language to create some documentation now for ducksay. 

Now, I've read the documentation for creating documentation in R, and there are several different keywords you can use to create your documentation. Among the most important ones are these here. One is slash name. And inside these curly braces here, we'll include the name of the function we're trying to document-- in this case, ducksay in particular. This is the name of the function we're trying to document. 

The next one, most important one, is going to be slash alias. Slash alias is what you want the user to type in in their console to see your documentation. For instance, if I go down to my console now and I use question mark ducksay, well, my alias is ducksay-- literally this right here. If any user were to go to their console and use question mark ducksay, they could see this documentation that I have now created for them, as long as I've installed my package. So also, my alias is similarly ducksay as well. 

And now, here comes our title. What is the title of this function? Kind of a more English characterization, like capitals and spaces and so on-- we'll call this function Duck Say, just like this, and provide a description. I'll say that this is a duck that says hello. 

And just like that, with these four lines of markup language, we can actually already see it being rendered or converted into our documentation file. If I go on my console now and run question mark ducksay, I'll see my very first R documentation file. 

Notice how here, the name of this function is actually included in my documentation, right up here. The alias is also included in what I use down here. I said question mark ducksay and got this documentation file. The title is there too-- slash title Duck Say. We see that right here. And so too is the description, a duck that says hello. 
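As a rough sketch of those four lines, the snippet below writes them to a temporary man/ducksay.Rd and renders them as plain text with tools::Rd2txt. The temporary folder and the raw-string syntax (which needs R 4.0 or later) are choices made for this illustration, not part of the lecture's steps.

```r
# Write the four lines of markup to a temporary man/ducksay.Rd
# (a temporary folder is used so nothing in your own project is touched).
man_dir <- file.path(tempfile("ducksay-"), "man")
dir.create(man_dir, recursive = TRUE)
rd_file <- file.path(man_dir, "ducksay.Rd")

# r"(...)" is a raw string, so the backslashes need no escaping.
writeLines(c(
  r"(\name{ducksay})",
  r"(\alias{ducksay})",
  r"(\title{Duck Say})",
  r"(\description{A duck that says hello.})"
), rd_file)

# Render the markup as plain text, roughly what ?ducksay would display.
tools::Rd2txt(rd_file, options = list(underline_titles = FALSE))
```

Inside a real package, the file would simply live at man/ducksay.Rd and be typed by hand, as described above.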

So we could keep adding to this documentation using this same syntax here. There are other kinds of components we can add to our documentation file as well. In fact, let's go ahead and add a few more. Let's add one called usage, which tells people how to use our function. I'll use slash usage here, and this will say how I want users to use it, in fact. So I'll say ducksay here. 

And by convention, in usage, we include the function's name, some parentheses, and if there are any parameters, we include those in the function's parentheses as well. But currently, there are no parameters. I'll also include a section called value, which is the return value of this function. 

What does it return to us? Well, it returns to us really a string representation of a duck saying hello to the world. And then finally, we can also include some examples of how to use this function, in case people are unfamiliar. I could say examples here and provide some examples of how people could use ducksay. 

Maybe I want them to use it with cat. So I'll show them, look, you can use ducksay like this. Take cat and pass as input ducksay, just like that. And now with these other pieces of syntax here, I can say question mark ducksay and see the new-- oops, let me save this file first-- and then run question mark ducksay, and I should see the now rendered version of what I'm seeing on the left-hand side over here. 

Notice how we have some new pieces. I see usage now. I see value. And I see some examples as well down below in my documentation file. So we've seen now how to document our functions using this markup language here. What questions do we have on how to document our code and how to render it in our console? 

OK, seeing none, let's keep going then. And I think we're now in a pretty good spot. So we now have the ability to write our own functions, to test them, and to write documentation for those functions. So what should we do now? Well, ideally, we want to package up and share it with the world. 

And in fact, this process of taking what we have as source code and converting it into a single file has a particular name. This name is called building our package. Building our package-- taking it from source code into a single file we could share around the world. 

Now, there are a few options for building our source code into that single file. Among them are these-- build, which is a devtools function that takes our source code and gives us some single file at the end. But build, it turns out, is actually a wrapper on top of a base R command called R CMD build. 

They have the same purpose. R CMD build, though, works in your actual computer's terminal, not in the console. So we'll instead use build to keep ourselves inside the R console and build our package into a single file. So we'll still rely on devtools now and use their build function in particular. 

Let me come back now to RStudio and show you how this exactly works. So notice here how I'm inside of my ducksay folder-- my ducksay package now, if you will. Let me go ahead and close my previous files here. Let me run this command called build, thanks to devtools. 

If I run build, I'll see some output here. And I'll see down below the file I have gotten from building this source code into a single file I can share with others. I see it's called ducksay_1.0.tar.gz. And if I move up one level in my folder structure, I'll see this file actually right next to the ducksay folder-- ducksay_1.0.tar.gz. 

And this is a funky kind of file name, but this stands for, essentially, a zip file, if you will. It's very similar in spirit. It's also called a tarball sometimes. So this is basically a single file with which we can share our code, email it to somebody, post it online, et cetera. This is all that source code in our folder now in one single file. 

So we've done pretty well so far. But before I share this, I think I've forgotten kind of one important thing, which is the duck actually only says "hello, world" right now. It doesn't take as input any given kind of string. So I probably want to update our code and rebuild this package again and again. And in fact, you'll find the package-building process is often iterative. You build it, you add something new, you build it again, add something new, build it again, and so on and so forth. 
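For a sense of what building produces, here is a sketch that calls R CMD build (the command build wraps) from inside R with system2, on a throwaway package in a temporary folder. Every DESCRIPTION value and the body of the duck are placeholders assumed for illustration; in practice you would just run build from inside your own package folder.

```r
# Create a throwaway package skeleton in a temporary folder.
# All DESCRIPTION values below are placeholders assumed for illustration.
work_dir <- tempfile("build-demo-")
pkg_dir <- file.path(work_dir, "ducksay")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)

writeLines(c(
  "Package: ducksay",
  "Title: Duck Say",
  "Version: 1.0",
  "Description: Say hello with a duck.",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane@example.com>",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

writeLines("export(ducksay)", file.path(pkg_dir, "NAMESPACE"))

# The function as it stands at this point: no parameters yet.
writeLines(c(
  'ducksay <- function() {',
  '  paste("hello, world", ">(. )__", " (____/", sep = "\\n")',
  '}'
), file.path(pkg_dir, "R", "ducksay.R"))

# R CMD build writes the tarball into the current working directory,
# so step into the parent folder first, then step back out.
old_wd <- setwd(work_dir)
system2(file.path(R.home("bin"), "R"), c("CMD", "build", "ducksay"))
setwd(old_wd)

# The result sits right next to the ducksay folder.
tarball <- file.path(work_dir, "ducksay_1.0.tar.gz")
print(file.exists(tarball))

# Like a zip file, the tarball can list what it contains.
contents <- untar(tarball, list = TRUE)
print(contents)
```

The file name combines the Package and Version fields from DESCRIPTION, which is where ducksay_1.0 comes from.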

So let's go ahead and update our package and rebuild our code. Let's go back over here and consider how we could make this duck say any given phrase that we have. Well, I'll go back to my ducksay source code inside my folder here. I'll go back to my tests folder, open up now my tests for ducksay, and let me add one more description for this function here. 

I'll come down below, and I'll say, I want to make sure that this can say hello, or can say any given phrase, rather. Ducksay can say any given phrase. And now to exemplify this, I want to include this test here. I expect a match between running, let's say, ducksay, with a given phrase like "quack," like that, and I just have to find "quack" anywhere inside of that given return value, just like this. 
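A sketch of that new test might look like the following. So that it runs by itself, it defines a stand-in ducksay that takes a phrase (the duck's body is assumed for illustration); inside the package, the test file would contain only the test_that call and rely on the real function.

```r
# Stand-in for the package's function so this sketch runs by itself
# (assumed for illustration; the real one lives in R/ducksay.R).
ducksay <- function(phrase) {
  paste(phrase, ">(. )__", " (____/", sep = "\n")
}

# Only run the test if testthat is installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  test_that("ducksay can say any given phrase", {
    # "quack" has no special regular-expression characters,
    # so fixed = TRUE is not needed here.
    expect_match(ducksay("quack"), "quack")
  })
}
```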

So now, again I'm saying that ducksay should be able to say any given phrase. If I run it with ducksay and pass as input quack, I should see quack inside of that return value. So it's a good test, but I need to implement it now in code. I'll come back now to my ducksay.R file. Going back to my main folder inside R, where we store our R files, open now ducksay.R, and I'll see my function definition. 

Well, I could do this. I could say ducksay now takes as input a given phrase, just like that. And I'll make sure that instead of "hello, world," we say that given phrase. But now, per my tests, I still want to run ducksay, or be able to run ducksay, like this, without any arguments whatsoever. And I still expect to see "hello, world" when ducksay is run without any arguments here. 

So I could go back to ducksay now and say that phrase has a default value of hello, world, just like this. If I supply a value, well, I'll see the phrase there, hopefully, and if I don't supply a value, well, I'll hopefully see "hello, world" there instead. 
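As a minimal sketch, a function with a default value for phrase could look like this. The exact lines of the duck are assumed for illustration; the point is the phrase = "hello, world" default in the function's signature.

```r
# phrase falls back to "hello, world" when no argument is supplied.
# The duck's body below is assumed for illustration.
ducksay <- function(phrase = "hello, world") {
  paste(phrase, ">(. )__", " (____/", sep = "\n")
}

# Without an argument, the default is used.
cat(ducksay(), "\n", sep = "")

# With an argument, the given phrase replaces the default.
cat(ducksay("quack"), "\n", sep = "")
```

Because the default keeps ducksay() working with no arguments, the earlier tests do not need to change.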

So let me test my code interactively now. I've updated my function here, so I should again run load_all. I'm going to update my function after I've redefined it here and make it available to myself in my console. 

I now have access to the latest version of ducksay. I can run cat ducksay and then give as input quack, and hopefully we'll see quack on top of our duck. So it seems to be working just right now. Let me go ahead and run test and see what I can see. I'll see that all four of my tests are passing. 

So I've updated my tests, my source code. What else should I update? Well, my documentation. I'll go back now to my ducksay package folder, open up man, open up ducksay.Rd, and I'll update my documentation. I'll then say now that the return value of ducksay is no longer a duck saying hello to the world. It's really a duck saying the given phrase. 

And similarly, I should update now my usage. By convention, we include the function's name, remember, some parentheses, remember, and also, the parameters to this function. So one parameter now is this one called phrase. And I've given a default value of hello, world. So this is by convention how I would write that out in my documentation. I would give the parameter and its default value with an equal sign separating them. 

And then finally, down below my examples here, I could give one more example of using ducksay. Underneath this here, I could say you can also, if you want to, give ducksay some input, like quack, just like that. So this, I think, covers us in terms of our function itself, our tests, and our documentation. 

I can test my documentation by rerendering it with ?ducksay, and now I'll see it on the right-hand side. I will in fact see my updated usage, my updated value, and my updated examples down below. So I think we're in a pretty good spot. 

Let me now rebuild this code. I'll simply use build again, just like this. And now, I should see my updated version of my package now in a single file. And to be clear, if I wanted to share this code with somebody else, I would need to rebuild it every time I modify it to put all the updates inside of ducksay-- this folder-- into this new file, ducksay_1.0.tar.gz. 
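Put together, the updated documentation file might look roughly like the sketch below, which writes it to a temporary file and renders it with tools::Rd2txt. The wording inside value is paraphrased from the description above, and the temporary file and raw string (R 4.0 or later) are choices made for this illustration.

```r
# A temporary stand-in for man/ducksay.Rd.
rd_file <- tempfile(fileext = ".Rd")

# r"---(...)---" is a raw string, so backslashes and quotes need no escaping.
writeLines(r"---(\name{ducksay}
\alias{ducksay}
\title{Duck Say}
\description{A duck that says hello.}
\usage{
ducksay(phrase = "hello, world")
}
\value{
A string representation of a duck saying the given phrase.
}
\examples{
cat(ducksay())
cat(ducksay("quack"))
}
)---", rd_file)

# Parse the markup, then render it as plain text.
rd <- tools::parse_Rd(rd_file)
tools::Rd2txt(rd, options = list(underline_titles = FALSE))
```

Note how usage spells out the parameter and its default with an equal sign, mirroring the function's own definition.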

OK. Let me ask, what questions do we have on iteratively updating and now rebuilding our package over time? 

AUDIENCE: Is there a command in R which rebuilds the package for us as soon as the file changes? 

CARTER ZENKE: A good question. So is there a command in R that rebuilds the package for us as we change it? Not that I am aware of. So I'm familiar with devtools in particular, and I don't think there is a function that would do exactly that. I know in other languages, there can be functions like that, but not that I'm familiar with in R. A good question, though. 

Let's keep going, then. And so we've rebuilt our package. We now have it as a single file. I think what's left to do now is to really use our package. So let's see if we can create ourselves a new program that uses exactly this package. Maybe I will make a program called greet.R that will instead of giving a user just a plain old simple hello give them a hello from a duck. 

So I will move my working directory up one level, out of my ducksay folder. I'll use setwd and then in quotes here, dot dot. That means move me one level higher in my working directory. So I'm presently in ducksay, but now I'll be right next to ducksay, if you will, in the same view I have on the right-hand side here. I'll now create for myself a new program called greet.R, just like this, and I'll see greet.R show up over here. I'll open it, and now let's write this program together. 

Well, now that I have my very own package, one called ducksay, I could use library to load my function ducksay. I could use library here and say I want to load this library called-- load this package called ducksay. And once I've done that, what do I want to do? I want to ask now the user for their name, so I could use let's say something like readline. 

I could say I want to ask the user for their name. I could use name and then store inside it the result of readline, just like that, and ask, what's your name? Readline, as you've seen before, will take their input and return it to us, storing it in this object called name. 

And then I could create a greeting for them. I could say maybe let's have a greeting here. But the greeting will actually be the result of calling our function ducksay, which is inside this package called ducksay. I'll use ducksay here and pass as input in this case, well, the combination of hello and then let's say the user's name. 

And now down below, I could use cat, of course, to print our greeting. So top to bottom, I'm loading this package we created called ducksay. I'm asking the user for their name. I'm then using the function we defined as part of ducksay, called ducksay as well. I'm going to pass in the concatenated version of hello and their name. And then I'm going to print the result of calling ducksay here. 

But before I can do this, I have built my package. But what I haven't done is installed it. So I've built it. I could share this file with others. But I still need to install it on my own computer. And if anybody wants to use this package as well, they need to install it on their computer too. 

So thankfully, there are tools we can use to install packages in R. We've seen one already-- one called install.packages. In fact, you can use install.packages to install packages not just from the CRAN but also from an individual file, like the one we have here. 

You could also use a base R command called R CMD INSTALL. You can use that in your terminal, but we'll stick now to using install.packages, which keeps us inside the R console itself. So let's now install our package with install.packages. I'll come back now to RStudio, and I think I could use this single file I now have-- the compiled version of my package. And I can install it now with install.packages. 

I'll say install.packages, and I'll now use the file name itself-- ducksay_1.0.tar.gz. This tarball is kind of like a zip file that I can use to store my package contents. I'll install this here, and now that it's installed, I see done. I can now run source on greet.R and type in Carter. Voila. I now have a duck that can say hello to anyone who enters in their name, thanks to this package I made called ducksay. 
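The whole path from source code to greeting can be sketched end to end as below. To stay out of your real library, this version builds a throwaway copy of the package and installs it into a temporary library folder; the DESCRIPTION values, the duck's body, the lib_dir folder, and the explicit repos = NULL and type = "source" arguments are all assumptions made for this illustration.

```r
# Throwaway package source plus a temporary library to install into.
work_dir <- tempfile("install-demo-")
pkg_dir <- file.path(work_dir, "ducksay")
lib_dir <- file.path(work_dir, "library")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)
dir.create(lib_dir)

# Placeholder metadata, assumed for illustration.
writeLines(c(
  "Package: ducksay",
  "Title: Duck Say",
  "Version: 1.0",
  "Description: Say hello with a duck.",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane@example.com>",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))
writeLines("export(ducksay)", file.path(pkg_dir, "NAMESPACE"))
writeLines(c(
  'ducksay <- function(phrase = "hello, world") {',
  '  paste(phrase, ">(. )__", " (____/", sep = "\\n")',
  '}'
), file.path(pkg_dir, "R", "ducksay.R"))

# Build the tarball next to the ducksay folder.
old_wd <- setwd(work_dir)
system2(file.path(R.home("bin"), "R"), c("CMD", "build", "ducksay"))
setwd(old_wd)
tarball <- file.path(work_dir, "ducksay_1.0.tar.gz")

# Install from the local file rather than from CRAN.
install.packages(tarball, repos = NULL, type = "source", lib = lib_dir)

# The body of greet.R: load the package, ask for a name, greet.
library(ducksay, lib.loc = lib_dir)

# readline only waits for input in an interactive session,
# so fall back to a fixed name otherwise.
name <- if (interactive()) readline("What's your name? ") else "Carter"
greeting <- ducksay(paste("hello,", name))
cat(greeting, "\n", sep = "")
```

With the package installed in your usual library, greet.R itself needs only the last few lines: library(ducksay), the readline call, and the cat of the greeting.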

Well. We've seen now how to build R packages, how to install them, and how to use them. Now the only thing I have to do is to share them with the world. So if you want to share your package with the world, you have a number of options. You could use the CRAN, so long as your package adheres to their guidelines. You could use a service like GitHub, which is a tool for sharing software and collaborating with others. You can even share your package over email with a friend. 

Now however you choose to share your code, I hope you keep in mind just how much you've learned over the course of this course and what you have to share with the world. In fact, you began by learning how to represent data using vectors and data frames. You graduated to transforming data, using subsets, conditions, and logical expressions. 

You then saw how to make your analysis more efficient using loops and functions and dipping your toes into this paradigm called functional programming. And in the second half of the course, you saw packages like the tidyverse and all they could do-- how they could tidy your data, help you visualize it, and help you test your programs too. All that's left now is to take all you've learned, package it up, and share it now with the world. We're so excited to see what you'll create. This was CS50's Introduction to Programming with R.