1 00:00:00,000 --> 00:00:02,994 [MUSIC PLAYING] 2 00:00:02,994 --> 00:00:19,217 3 00:00:19,217 --> 00:00:20,800 CARTER ZENKE: Well, hello one and all. 4 00:00:20,800 --> 00:00:23,200 And welcome back to CS50's introduction to programming 5 00:00:23,200 --> 00:00:24,970 with R. My name is Carter Zenke. 6 00:00:24,970 --> 00:00:27,520 And this is our lecture on testing programs. 7 00:00:27,520 --> 00:00:30,850 We'll see today all the ways our programs could go wrong, 8 00:00:30,850 --> 00:00:32,860 how to handle these things called errors, 9 00:00:32,860 --> 00:00:36,970 and see how to test our programs to ensure they behave as we intend. 10 00:00:36,970 --> 00:00:40,090 So let's jump in and see all the ways a function I've written 11 00:00:40,090 --> 00:00:41,890 could go a little bit wrong. 12 00:00:41,890 --> 00:00:46,660 I have here in RStudio a function I've defined called average. 13 00:00:46,660 --> 00:00:51,280 And this function average is defined in this file called average.R. 14 00:00:51,280 --> 00:00:55,900 And the purpose of this average function is to take as input a vector of numbers 15 00:00:55,900 --> 00:00:59,440 and return to me the single average number it finds across all 16 00:00:59,440 --> 00:01:01,330 of those numbers in that vector. 17 00:01:01,330 --> 00:01:06,340 So notice here how I'm using the built in functions sum and length. 18 00:01:06,340 --> 00:01:08,890 And you might know if you're familiar with averages or means 19 00:01:08,890 --> 00:01:12,220 that that's defined as basically taking the sum of numbers you have 20 00:01:12,220 --> 00:01:14,770 and dividing by the number of numbers you have. 21 00:01:14,770 --> 00:01:17,770 So it's exactly what I'm doing here with sum and length. 22 00:01:17,770 --> 00:01:21,260 And let me go ahead and presume that I figured that sum and length are 23 00:01:21,260 --> 00:01:22,190 correctly implemented. 
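A minimal sketch of the function being described, assuming it is defined exactly as the sum of the vector divided by its length (the transcript does not show the file contents, so the layout here is a guess):

```r
# average.R (sketch): the mean as sum divided by count.
# Assumes x is a numeric vector; nothing is checked yet.
average <- function(x) {
  sum(x) / length(x)
}

average(c(1, 2, 3))  # 2
```

At this point the function trusts its input completely, which is what the rest of the lecture goes on to address.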
24 00:01:22,190 --> 00:01:27,020 I can rely on these functions just as well in my own function called average. 25 00:01:27,020 --> 00:01:30,050 Now, it turns out there is already a function called mean, 26 00:01:30,050 --> 00:01:33,860 which does this very same thing built into R. It returns to us the mean 27 00:01:33,860 --> 00:01:35,840 or the average of some set of numbers. 28 00:01:35,840 --> 00:01:38,780 But our goal today is to write our own version 29 00:01:38,780 --> 00:01:40,730 of that function called average. 30 00:01:40,730 --> 00:01:42,560 So we can kind of see the design decisions 31 00:01:42,560 --> 00:01:45,230 that went into writing a function like mean 32 00:01:45,230 --> 00:01:47,990 in R. So here is my average function. 33 00:01:47,990 --> 00:01:51,140 Let's go ahead and try it and think about what could go wrong, actually. 34 00:01:51,140 --> 00:01:53,330 So I said before that this function average should 35 00:01:53,330 --> 00:01:56,570 take as input a vector of numbers. 36 00:01:56,570 --> 00:02:01,190 But we've seen some ways a user could give us not numbers, but text. 37 00:02:01,190 --> 00:02:04,730 If you recall using readline, you might know that readline by default 38 00:02:04,730 --> 00:02:07,160 takes input as text and hands it back as text. 39 00:02:07,160 --> 00:02:10,340 So maybe I might have forgotten to convert that text to a number. 40 00:02:10,340 --> 00:02:15,350 I could run average in my console here, first defining it up above on line 1. 41 00:02:15,350 --> 00:02:16,760 I could run average. 42 00:02:16,760 --> 00:02:21,260 And let's say I've forgotten to convert some input from the user to a number. 43 00:02:21,260 --> 00:02:24,080 And I instead have now a vector of characters or character 44 00:02:24,080 --> 00:02:26,520 representations of these numbers here. 45 00:02:26,520 --> 00:02:30,043 So I'll pass as input this vector 1, 2, and 3. 46 00:02:30,043 --> 00:02:31,460 But those are not numbers, per se.
47 00:02:31,460 --> 00:02:33,270 They're actually characters here. 48 00:02:33,270 --> 00:02:34,520 I'll go ahead and run average. 49 00:02:34,520 --> 00:02:37,305 And now, I'll see this error. 50 00:02:37,305 --> 00:02:39,680 This is probably not the first time you've seen an error. 51 00:02:39,680 --> 00:02:41,690 Probably when you're programming, you've seen lots and lots of errors. 52 00:02:41,690 --> 00:02:44,520 But let's give these errors a more formal name. 53 00:02:44,520 --> 00:02:47,655 So these errors are more formally called exceptions. 54 00:02:47,655 --> 00:02:50,780 And an exception occurs when something exceptional happens in your program, 55 00:02:50,780 --> 00:02:52,640 but not in a good way. 56 00:02:52,640 --> 00:02:56,280 It happens when our program encounters some situation, some scenario 57 00:02:56,280 --> 00:02:57,530 it doesn't know how to handle. 58 00:02:57,530 --> 00:03:00,200 And instead, it stops entirely. 59 00:03:00,200 --> 00:03:04,640 So a question then becomes how could we handle these exceptions or these errors 60 00:03:04,640 --> 00:03:05,690 in our code? 61 00:03:05,690 --> 00:03:09,200 And one way to do so is to handle them more proactively. 62 00:03:09,200 --> 00:03:10,010 Preempt them. 63 00:03:10,010 --> 00:03:12,560 And do something else instead of encountering 64 00:03:12,560 --> 00:03:14,510 this error or this exception. 65 00:03:14,510 --> 00:03:18,080 So let's see if we could take that approach now in our own function here 66 00:03:18,080 --> 00:03:19,130 called average. 67 00:03:19,130 --> 00:03:21,050 I'll come back now to RStudio. 68 00:03:21,050 --> 00:03:24,500 And let's think through what exactly caused this exception. 69 00:03:24,500 --> 00:03:28,940 Well, if I look at it here, I'll see that I gave the sum function, it seems, 70 00:03:28,940 --> 00:03:32,060 an invalid 'type' (character) of argument.
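The failure described above can be reproduced with a short sketch. The `tryCatch` wrapper is not part of the lecture; it is added here only so the script keeps running and the error text can be inspected:

```r
# Unchecked version of average(), as assumed from the description.
average <- function(x) {
  sum(x) / length(x)
}

# Character representations of numbers, e.g. unconverted readline() input.
result <- tryCatch(
  average(c("1", "2", "3")),
  error = function(e) conditionMessage(e)  # keep only the error text
)

print(result)  # the message raised inside sum(), mentioning type character
```

The error originates in `sum`, not in `average` itself, which is why the message talks about an argument type rather than about averages.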
71 00:03:32,060 --> 00:03:34,190 So it seems like the problem was, in fact, 72 00:03:34,190 --> 00:03:38,880 that I gave as input to the average function this vector of characters. 73 00:03:38,880 --> 00:03:42,320 So what could I do to check for this before I maybe pass 74 00:03:42,320 --> 00:03:45,170 this input down into sum and length? 75 00:03:45,170 --> 00:03:48,680 I could probably use something like a conditional to ask some question. 76 00:03:48,680 --> 00:03:50,630 But what question would I ask? 77 00:03:50,630 --> 00:03:55,372 Well, I probably could ask is this vector numeric or is it not? 78 00:03:55,372 --> 00:03:57,830 And maybe I would consider the case where it isn't numeric, 79 00:03:57,830 --> 00:04:00,538 where I might get an exception to handle that case in particular. 80 00:04:00,538 --> 00:04:05,120 So here before I run line 3 now, trying to sum up these numbers, 81 00:04:05,120 --> 00:04:07,340 and finding their length, and dividing therein, 82 00:04:07,340 --> 00:04:09,830 why don't I go ahead and try to ask the question? 83 00:04:09,830 --> 00:04:14,990 Is, let's say, this vector x, is it not numeric? 84 00:04:14,990 --> 00:04:16,170 Just like this. 85 00:04:16,170 --> 00:04:20,240 So I'm going to make use of now this function, is.numeric, 86 00:04:20,240 --> 00:04:24,980 which asks the question, returns me true or false, is x a vector of numbers 87 00:04:24,980 --> 00:04:26,280 or is it not? 88 00:04:26,280 --> 00:04:29,030 And when I use this exclamation point here, I'm essentially asking 89 00:04:29,030 --> 00:04:32,780 is the vector x not full of numbers? 90 00:04:32,780 --> 00:04:36,230 And now I have the option here of handling that error before it 91 00:04:36,230 --> 00:04:39,230 might happen down here on line 5. 92 00:04:39,230 --> 00:04:42,080 So what could I do to handle this error? 
93 00:04:42,080 --> 00:04:44,600 Well, a convention sometimes in the R world 94 00:04:44,600 --> 00:04:48,110 is to return a special value, one like NA. 95 00:04:48,110 --> 00:04:51,290 So if we give as input to our function average 96 00:04:51,290 --> 00:04:55,670 a vector that doesn't include numbers, I could say no, no, let's stop here 97 00:04:55,670 --> 00:04:59,480 and just return NA, instead of getting this error ultimately. 98 00:04:59,480 --> 00:05:01,670 So I'll go ahead and do just that on line 3. 99 00:05:01,670 --> 00:05:06,500 I'll say if we find that this vector x is not full of numbers, is not numeric, 100 00:05:06,500 --> 00:05:09,600 I'll go ahead and return NA instead. 101 00:05:09,600 --> 00:05:14,210 And hopefully I'll now avoid this error by kind of preempting it and handling 102 00:05:14,210 --> 00:05:15,680 it up above. 103 00:05:15,680 --> 00:05:18,320 Let me go ahead and redefine my average function now 104 00:05:18,320 --> 00:05:20,637 to update what it has included here. 105 00:05:20,637 --> 00:05:22,720 I'll go ahead and run the same thing I did before, 106 00:05:22,720 --> 00:05:25,970 giving as input to average this vector of characters. 107 00:05:25,970 --> 00:05:30,250 And I'll see I'll get back now just NA and no error. 108 00:05:30,250 --> 00:05:32,530 Now, we've handled it, preempted it before it has 109 00:05:32,530 --> 00:05:35,530 had the chance to arise in this case. 110 00:05:35,530 --> 00:05:39,700 But if we're going to do something a little unexpected 111 00:05:39,700 --> 00:05:43,270 here, like return NA when the user might have thought they were getting back 112 00:05:43,270 --> 00:05:47,440 a number, it's worth thinking about how to alert the user to that fact. 113 00:05:47,440 --> 00:05:50,980 Right now, we're handling this error silently, if you will. 114 00:05:50,980 --> 00:05:53,230 Meaning we're not going to raise anything to the user. 115 00:05:53,230 --> 00:05:55,255 We're going to hand them back NA. 
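A sketch of the silent version described above, assuming the check is written with `!is.numeric(x)` and an early `return(NA)`:

```r
# Preempt the error: hand back NA when x is not numeric.
average <- function(x) {
  if (!is.numeric(x)) {
    return(NA)  # silent: the caller gets no explanation
  }
  sum(x) / length(x)
}

average(c("1", "2", "3"))  # NA, and no error
average(c(1, 2, 3))        # 2
```

Nothing is printed to explain the NA, so a caller who ignores the return value would never learn the input was rejected.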
116 00:05:55,255 --> 00:05:57,130 Unless they looked at the return value, well, 117 00:05:57,130 --> 00:05:59,710 they wouldn't know anything in particular was wrong 118 00:05:59,710 --> 00:06:01,468 or that they had done anything wrong. 119 00:06:01,468 --> 00:06:03,760 So let's think through how we could alert the user here 120 00:06:03,760 --> 00:06:07,180 and let them know what it is exactly we're doing here. 121 00:06:07,180 --> 00:06:11,530 Now, one way to do that is to make use of this function built into R 122 00:06:11,530 --> 00:06:12,940 called message. 123 00:06:12,940 --> 00:06:16,270 Message allows you to essentially send a message to the console 124 00:06:16,270 --> 00:06:17,890 while a function is running. 125 00:06:17,890 --> 00:06:19,650 So let's see if we could use message here. 126 00:06:19,650 --> 00:06:21,560 I'll go back now to my function. 127 00:06:21,560 --> 00:06:25,220 And maybe before I return NA, I could let the user 128 00:06:25,220 --> 00:06:27,900 know what it is I'm about to do. 129 00:06:27,900 --> 00:06:31,370 I could decide to send them a message using the message function. 130 00:06:31,370 --> 00:06:34,250 And it turns out that as input to this message function, 131 00:06:34,250 --> 00:06:36,530 I can provide the character string showing 132 00:06:36,530 --> 00:06:39,260 the message I want to tell the user. 133 00:06:39,260 --> 00:06:42,050 I want to tell them what it is I'm doing and probably tell them 134 00:06:42,050 --> 00:06:43,160 why I'm doing it. 135 00:06:43,160 --> 00:06:47,690 So first, I'll say that maybe this input x here, our vector, I'll 136 00:06:47,690 --> 00:06:53,180 say that x, this x here must be a numeric vector. 137 00:06:53,180 --> 00:06:56,060 So this is the cause of why I'm returning 138 00:06:56,060 --> 00:06:58,640 NA not, let's say, the actual average. 139 00:06:58,640 --> 00:07:00,770 And now, I could say what I'm doing instead. 140 00:07:00,770 --> 00:07:04,400 I'm going to return NA instead. 
141 00:07:04,400 --> 00:07:07,190 Now, if I were to run this function, I need 142 00:07:07,190 --> 00:07:11,360 to first redefine it, go back down to my console, and provide the same input. 143 00:07:11,360 --> 00:07:13,140 And now, let's see what happens. 144 00:07:13,140 --> 00:07:14,450 I'll see that message. 145 00:07:14,450 --> 00:07:16,820 So now, we're not being silent anymore. 146 00:07:16,820 --> 00:07:20,555 The user who's run this function, they would get back NA as a return value. 147 00:07:20,555 --> 00:07:21,680 But now they would know it. 148 00:07:21,680 --> 00:07:26,285 They would say-- it would say x must be a numeric vector returning NA instead. 149 00:07:26,285 --> 00:07:28,160 So we've kind of gone away from being silent. 150 00:07:28,160 --> 00:07:31,920 And now, the user knows exactly what has gone wrong, perhaps, in this function. 151 00:07:31,920 --> 00:07:34,170 So a little more intuitive now. 152 00:07:34,170 --> 00:07:37,310 But it turns out that, by convention, message 153 00:07:37,310 --> 00:07:40,130 is often used when things are going just smoothly. 154 00:07:40,130 --> 00:07:42,620 We're trying to tell the user exactly what's going on. 155 00:07:42,620 --> 00:07:44,875 It kind of tells them a bit of a progress indicator. 156 00:07:44,875 --> 00:07:47,000 Gives them an idea of what their function is doing. 157 00:07:47,000 --> 00:07:50,630 It's meant to be used in cases where something has not gone wrong. 158 00:07:50,630 --> 00:07:54,530 But I'd argue, in this case, something has gone wrong. 159 00:07:54,530 --> 00:07:59,000 We gave as input to the average function an input it should not have been given. 160 00:07:59,000 --> 00:08:03,020 So there are ways to take a message and to escalate it, if you will, 161 00:08:03,020 --> 00:08:03,598 in severity. 162 00:08:03,598 --> 00:08:06,140 To let the user know that actually, something has gone wrong. 163 00:08:06,140 --> 00:08:08,540 There might be a potential issue here. 
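A sketch of the `message` version just demonstrated. The exact punctuation of the message string is an assumption, since the transcript only gives it as spoken:

```r
# Tell the user what is happening before returning NA.
average <- function(x) {
  if (!is.numeric(x)) {
    message("`x` must be a numeric vector. Returning NA instead.")
    return(NA)
  }
  sum(x) / length(x)
}

result <- average(c("1", "2", "3"))  # prints the message
print(result)                        # NA
```

The function still returns NA; the only change is that the reason now appears in the console.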
164 00:08:08,540 --> 00:08:10,850 Now, if we were to escalate this message, 165 00:08:10,850 --> 00:08:14,600 we could instead convert it into something called a warning. 166 00:08:14,600 --> 00:08:17,120 Now a warning is good when your function encounters 167 00:08:17,120 --> 00:08:18,950 something that is a potential issue. 168 00:08:18,950 --> 00:08:21,260 It's a bit similar to if you've driven a car 169 00:08:21,260 --> 00:08:23,032 and your check engine light pops up. 170 00:08:23,032 --> 00:08:26,240 Whether you're driving that car, you'll know that, well, your car could still 171 00:08:26,240 --> 00:08:27,440 continue running. 172 00:08:27,440 --> 00:08:28,610 But you might want to check under the hood 173 00:08:28,610 --> 00:08:30,860 and make sure everything is going as you expect it to. 174 00:08:30,860 --> 00:08:33,440 So a warning tells a user that something has gone wrong 175 00:08:33,440 --> 00:08:35,302 that could be a potential issue. 176 00:08:35,302 --> 00:08:37,760 And I think this is more in line with what's happened here. 177 00:08:37,760 --> 00:08:41,299 The user has given us a value that they really shouldn't have given us. 178 00:08:41,299 --> 00:08:43,640 So let's, instead of messaging them, warn them. 179 00:08:43,640 --> 00:08:45,590 And tell them that, look, you should not have done this. 180 00:08:45,590 --> 00:08:48,132 You should make sure this is exactly what you want in the end 181 00:08:48,132 --> 00:08:49,590 as far as the return value. 182 00:08:49,590 --> 00:08:54,427 So I'll convert message here to a warning instead, just like this. 183 00:08:54,427 --> 00:08:57,260 And now that means that the user will not get a regular old message. 184 00:08:57,260 --> 00:09:01,220 They'll get a warning indicating some potential issue. 185 00:09:01,220 --> 00:09:02,660 I'll go ahead and back to line 1. 186 00:09:02,660 --> 00:09:04,370 And I'll redefine this function. 
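A sketch of the same function after swapping `message` for `warning`, again with the wording of the string assumed:

```r
# Escalate from a message to a warning: a potential issue, but recoverable.
average <- function(x) {
  if (!is.numeric(x)) {
    warning("`x` must be a numeric vector. Returning NA instead.")
    return(NA)
  }
  sum(x) / length(x)
}

result <- average(c("1", "2", "3"))  # emits a warning
print(result)                        # NA
```

Unlike an error, the warning does not interrupt the function, so the `return(NA)` on the next line still runs.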
187 00:09:04,370 --> 00:09:08,000 And now what will happen if I run it again with that same input, 188 00:09:08,000 --> 00:09:12,050 I'll see I get not a message, but a warning message. 189 00:09:12,050 --> 00:09:15,740 In this case, I see warning message in my function, 190 00:09:15,740 --> 00:09:19,160 and its input here, the exact message I typed on line 3, 191 00:09:19,160 --> 00:09:23,022 x must be a numeric vector, returning NA instead. 192 00:09:23,022 --> 00:09:24,980 So this is a way of alerting the user that they 193 00:09:24,980 --> 00:09:28,610 might get some value they didn't expect because there 194 00:09:28,610 --> 00:09:31,040 was a potential issue, which is they didn't give us 195 00:09:31,040 --> 00:09:33,830 an actual numeric vector. 196 00:09:33,830 --> 00:09:38,075 So a warning then is good for some potential issue in your function 197 00:09:38,075 --> 00:09:40,070 that it could still recover from. 198 00:09:40,070 --> 00:09:42,290 We could still return NA here. 199 00:09:42,290 --> 00:09:46,430 But there is one more level of severity, going from a warning 200 00:09:46,430 --> 00:09:48,440 to a full-fledged error. 201 00:09:48,440 --> 00:09:51,800 I think we could here have a discussion of whether a warning or an error 202 00:09:51,800 --> 00:09:53,660 is best for this scenario. 203 00:09:53,660 --> 00:09:56,960 On the one hand, I could argue that this function average 204 00:09:56,960 --> 00:10:01,340 is supposed to fundamentally take a vector of numbers and return to me 205 00:10:01,340 --> 00:10:02,150 a number. 206 00:10:02,150 --> 00:10:06,470 If I haven't done that, my function cannot accomplish its goal at all. 207 00:10:06,470 --> 00:10:11,060 In that case, I might not want to just warn the user and return NA, 208 00:10:11,060 --> 00:10:14,547 I might want to just stop entirely and say, look, you've given me an input. 209 00:10:14,547 --> 00:10:16,130 And I have no idea what to do with it. 
210 00:10:16,130 --> 00:10:17,390 I can't handle it at all. 211 00:10:17,390 --> 00:10:19,710 I'm going to stop my function in its entirety. 212 00:10:19,710 --> 00:10:23,130 So let's see what it would look like if we actually not just warn the user, 213 00:10:23,130 --> 00:10:25,150 but stop the function entirely. 214 00:10:25,150 --> 00:10:27,210 Now, it just so turns out that R has 215 00:10:27,210 --> 00:10:33,300 this function called stop that allows us to raise or to throw this error. 216 00:10:33,300 --> 00:10:38,850 So let's upgrade now our warning to a full-fledged error using stop, 217 00:10:38,850 --> 00:10:42,540 letting the user know that we simply cannot proceed with the input they have 218 00:10:42,540 --> 00:10:43,350 given us. 219 00:10:43,350 --> 00:10:45,330 I'll go back now to my average function. 220 00:10:45,330 --> 00:10:50,370 And let me go ahead and use stop, much like I used message and warning, 221 00:10:50,370 --> 00:10:53,460 I'll give it an error message in this case. 222 00:10:53,460 --> 00:10:56,730 But now what I should do is not return NA. 223 00:10:56,730 --> 00:10:59,370 In fact, if I were to run this code, I would never 224 00:10:59,370 --> 00:11:04,710 get to line 4 because as stop implies, my function will stop on line 3. 225 00:11:04,710 --> 00:11:05,910 It will not continue. 226 00:11:05,910 --> 00:11:09,270 It will not return any kind of value in this case. 227 00:11:09,270 --> 00:11:12,300 Why don't I go ahead and remove line 4 now? 228 00:11:12,300 --> 00:11:14,010 And now, what will happen is this. 229 00:11:14,010 --> 00:11:16,800 We're going to ask the question, is this input numeric? 230 00:11:16,800 --> 00:11:17,700 Or is it not? 231 00:11:17,700 --> 00:11:21,100 If it's not numeric, well, I will throw or raise this error 232 00:11:21,100 --> 00:11:23,170 that the user will now see. 233 00:11:23,170 --> 00:11:26,230 Let me go ahead and run or redefine this average function.
234 00:11:26,230 --> 00:11:28,480 Pass as input, the same thing we've been doing so far. 235 00:11:28,480 --> 00:11:31,420 And now, I'll see, well, an error. 236 00:11:31,420 --> 00:11:34,400 And we're kind of back where we started, getting an error now. 237 00:11:34,400 --> 00:11:36,010 But this one is more precise. 238 00:11:36,010 --> 00:11:38,500 It's one that we've raised or thrown ourselves. 239 00:11:38,500 --> 00:11:41,560 And it tells us exactly what has happened and why it has happened. 240 00:11:41,560 --> 00:11:45,760 Here we see error in our function average. x, the input, 241 00:11:45,760 --> 00:11:49,172 must be a numeric vector returning-- oops-- returning NA instead. 242 00:11:49,172 --> 00:11:51,130 And actually, that's probably not true anymore. 243 00:11:51,130 --> 00:11:54,460 So this stop on line 3 doesn't seem to return us anything, 244 00:11:54,460 --> 00:11:55,482 at least not NA now. 245 00:11:55,482 --> 00:11:57,940 So let's go ahead and remove that part of the message here 246 00:11:57,940 --> 00:12:00,970 to make sure that the user doesn't anticipate an NA. 247 00:12:00,970 --> 00:12:04,930 Why don't we just say x must be a numeric vector? 248 00:12:04,930 --> 00:12:06,500 I'll go ahead and redefine it. 249 00:12:06,500 --> 00:12:10,210 And now rerun it, and we should see exactly the error 250 00:12:10,210 --> 00:12:12,350 we were hoping to see here. 251 00:12:12,350 --> 00:12:15,700 So we've seen now how to talk to the user and message 252 00:12:15,700 --> 00:12:19,030 them about these kinds of potential issues in their functions. 253 00:12:19,030 --> 00:12:20,070 We've seen message. 254 00:12:20,070 --> 00:12:21,180 We've seen warning. 255 00:12:21,180 --> 00:12:22,440 We've seen stop. 
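A sketch of the `stop` version arrived at above, with the corrected message that no longer promises an NA. The `tryCatch` wrapper is an addition for illustration, so the sketch can run to completion and show the error text:

```r
# Refuse to continue when x is not numeric: raise an error with stop().
average <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector.")
  }
  sum(x) / length(x)
}

# Nothing after stop() runs, so no value is returned from average().
outcome <- tryCatch(
  average(c("1", "2", "3")),
  error = function(e) conditionMessage(e)
)

print(outcome)  # the text passed to stop()
```

Because `stop` ends the function immediately, a `return(NA)` placed after it would be unreachable, which is why it was removed.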
256 00:12:22,440 --> 00:12:27,030 Let me ask now what questions we have about any of these functions so far, 257 00:12:27,030 --> 00:12:30,060 and how we convey or communicate about these errors that 258 00:12:30,060 --> 00:12:32,880 could happen in our functions. 259 00:12:32,880 --> 00:12:36,930 AUDIENCE: What's the difference between using message versus print versus cat 260 00:12:36,930 --> 00:12:38,490 to display an error message? 261 00:12:38,490 --> 00:12:39,865 CARTER ZENKE: So a good question. 262 00:12:39,865 --> 00:12:44,070 We've seen so far these functions like print, and cat, and now this one 263 00:12:44,070 --> 00:12:45,000 called message. 264 00:12:45,000 --> 00:12:47,790 They all seem to show us some text in the console. 265 00:12:47,790 --> 00:12:50,700 Well, a message is a more special kind of text output 266 00:12:50,700 --> 00:12:53,010 that we could later on choose to suppress. 267 00:12:53,010 --> 00:12:56,100 So you've probably seen so far, suppressWarnings. 268 00:12:56,100 --> 00:12:58,620 There is also a function called suppressMessages 269 00:12:58,620 --> 00:13:01,590 you could use to hide those messages as they come up. 270 00:13:01,590 --> 00:13:04,650 There is no such feature though for print or for cat. 271 00:13:04,650 --> 00:13:07,380 A message is more particular and exclusive to showing 272 00:13:07,380 --> 00:13:11,940 the user a message they could either view or decline to view later on. 273 00:13:11,940 --> 00:13:13,600 Good question. 274 00:13:13,600 --> 00:13:15,750 OK, so let's consider other scenarios here 275 00:13:15,750 --> 00:13:18,240 that we could try to address in our function. 276 00:13:18,240 --> 00:13:20,590 We've considered so far what happens if we 277 00:13:20,590 --> 00:13:22,900 don't get the type of input we're expecting, 278 00:13:22,900 --> 00:13:25,390 in this case, a non-numeric input. 279 00:13:25,390 --> 00:13:29,080 But there are other scenarios we should probably consider and anticipate. 
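The difference raised in the audience question can be seen in a small sketch. The function name `notify` is hypothetical and exists only for this illustration:

```r
# A hypothetical function that reports in two different ways.
notify <- function() {
  message("sent with message()")  # a condition, can be suppressed
  cat("sent with cat()\n")        # plain console output
  invisible(NULL)
}

# suppressMessages() hides the message() text; the cat() text still prints.
suppressMessages(notify())
```

Only the line written with `cat` appears, since `suppressMessages` has no effect on ordinary printed output.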
280 00:13:29,080 --> 00:13:33,700 And one of them might be if our input has NAs in it. 281 00:13:33,700 --> 00:13:36,340 So we've seen that the mean function, if it's 282 00:13:36,340 --> 00:13:41,360 given some input that has NA, well, it returns to us NA instead. 283 00:13:41,360 --> 00:13:44,650 So if we want our function to do the very same thing, 284 00:13:44,650 --> 00:13:46,480 maybe we could have a check here. 285 00:13:46,480 --> 00:13:49,750 Maybe after I check to see if the input is numeric, 286 00:13:49,750 --> 00:13:51,730 I could ask another question. 287 00:13:51,730 --> 00:13:54,940 I could ask this one here, if any, let's say, 288 00:13:54,940 --> 00:14:00,730 if any of the values in this x vector are NA, 289 00:14:00,730 --> 00:14:04,270 just like this, why don't we go ahead and do something else 290 00:14:04,270 --> 00:14:07,240 before we run, now, line 8? 291 00:14:07,240 --> 00:14:12,670 But what is it I should do if any of these numbers are NA? 292 00:14:12,670 --> 00:14:15,890 Well, I could, of course, return NA, like we decided to do earlier. 293 00:14:15,890 --> 00:14:17,620 But now there's a question here. 294 00:14:17,620 --> 00:14:20,740 I don't want to silently return NA. 295 00:14:20,740 --> 00:14:22,480 And I have three options. 296 00:14:22,480 --> 00:14:24,190 I could either message the user. 297 00:14:24,190 --> 00:14:25,630 I could warn them. 298 00:14:25,630 --> 00:14:28,480 Or I could throw an error using stop. 299 00:14:28,480 --> 00:14:30,160 Let me actually ask our audience here. 300 00:14:30,160 --> 00:14:31,420 What would you use? 301 00:14:31,420 --> 00:14:35,800 Would you use message, or warning, or stop in this case? 302 00:14:35,800 --> 00:14:40,120 How might you try to handle this particular input? 303 00:14:40,120 --> 00:14:44,650 AUDIENCE: I think it because is that-- if something or an error happened, 304 00:14:44,650 --> 00:14:45,730 it will-- 305 00:14:45,730 --> 00:14:47,022 the user [INAUDIBLE] the error. 
306 00:14:47,022 --> 00:14:48,147 CARTER ZENKE: A good point. 307 00:14:48,147 --> 00:14:51,070 So we've seen that message is good for just conveying information, 308 00:14:51,070 --> 00:14:53,140 when there's nothing really wrong going on. 309 00:14:53,140 --> 00:14:55,930 Whereas, a warning is good to mention a potential issue 310 00:14:55,930 --> 00:14:57,790 the user should take a closer look at. 311 00:14:57,790 --> 00:15:01,300 And I'd argue that in this case, a warning is probably better. 312 00:15:01,300 --> 00:15:03,370 We're doing something a little unexpected. 313 00:15:03,370 --> 00:15:06,910 We're returning NA, as opposed to in this case, the average the user might 314 00:15:06,910 --> 00:15:09,130 have been expecting, so I think a warning 315 00:15:09,130 --> 00:15:11,088 would be best here to alert the user that there 316 00:15:11,088 --> 00:15:12,800 might be some potential issue here. 317 00:15:12,800 --> 00:15:14,110 I'll come back to RStudio. 318 00:15:14,110 --> 00:15:16,870 And let's go ahead and actually implement now this warning. 319 00:15:16,870 --> 00:15:20,840 I'll do it the same way I did before with warning here before I return NA. 320 00:15:20,840 --> 00:15:21,860 I'll give a warning. 321 00:15:21,860 --> 00:15:27,800 And in this case, I'll say as follows, that x, the input now, contains 322 00:15:27,800 --> 00:15:32,130 one or more NA values, just like this. 323 00:15:32,130 --> 00:15:35,060 So now, my function is looking a little bit better 324 00:15:35,060 --> 00:15:38,810 at handling these kinds of cases I might not have anticipated. 325 00:15:38,810 --> 00:15:42,800 First, if we get a non-numeric input, we're going to go ahead and stop. 326 00:15:42,800 --> 00:15:46,970 We fundamentally cannot continue with a non-numeric input. 
327 00:15:46,970 --> 00:15:51,350 Then we're going to ask the question, if any of the values in our vector x 328 00:15:51,350 --> 00:15:54,380 are NA, we're going to warn the user and tell them 329 00:15:54,380 --> 00:15:56,532 that x contains one or more NA values. 330 00:15:56,532 --> 00:15:57,740 They might not know that yet. 331 00:15:57,740 --> 00:16:01,820 And we're going to ourselves return NA by convention here. 332 00:16:01,820 --> 00:16:03,880 If though, none of these conditions are true, 333 00:16:03,880 --> 00:16:05,630 we're going to go down to the bottom here. 334 00:16:05,630 --> 00:16:09,890 And we're going to return the average, just as we would otherwise. 335 00:16:09,890 --> 00:16:14,060 So let me go ahead and run this definition for the average function. 336 00:16:14,060 --> 00:16:18,020 I'll go ahead and give it, let's say, some faulty input, like we saw before. 337 00:16:18,020 --> 00:16:20,070 I'll get the error, like we see. 338 00:16:20,070 --> 00:16:23,340 Why don't I try now giving it an input with some NAs 339 00:16:23,340 --> 00:16:28,980 I'll give it 1, the number, 2 the number, and 3, NA, let's see. 340 00:16:28,980 --> 00:16:32,760 Now, I get NA back and this warning message down below. 341 00:16:32,760 --> 00:16:35,430 Well, let's try a normal input average, and I'll 342 00:16:35,430 --> 00:16:37,710 give it 1, 2, and 3, all numbers. 343 00:16:37,710 --> 00:16:41,050 Here we'll see the average of those numbers, 2. 344 00:16:41,050 --> 00:16:43,290 So I think our function now is much better designed. 345 00:16:43,290 --> 00:16:46,570 We're able to handle these edge cases that the user might have given us. 346 00:16:46,570 --> 00:16:48,570 And now, we can alert them to exactly what we're 347 00:16:48,570 --> 00:16:51,540 going to do to handle those cases. 
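Putting both checks together, the function at this point might look like the following sketch. The use of `any(is.na(x))` and the wording of both strings are assumptions based on what is said aloud:

```r
# average() with both checks: error on non-numeric input, warning on NAs.
average <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector.")
  }
  if (any(is.na(x))) {
    warning("`x` contains one or more NA values.")
    return(NA)
  }
  sum(x) / length(x)
}

average(c(1, 2, 3))  # 2
```

Calling `average(c(1, 2, NA))` passes the first check, because an NA inside a numeric vector is still numeric, and is then caught by the second.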
348 00:16:51,540 --> 00:16:54,810 Now, when we come back, we'll see how to actually test 349 00:16:54,810 --> 00:16:56,940 this program, this function in particular, and make 350 00:16:56,940 --> 00:16:58,890 sure it's behaving like we intend. 351 00:16:58,890 --> 00:17:02,670 We'll come back in five and see how to test programs like these. 352 00:17:02,670 --> 00:17:03,990 Well, we're back. 353 00:17:03,990 --> 00:17:07,680 And we've seen so far how to preempt a few potential errors in functions 354 00:17:07,680 --> 00:17:08,430 we've written. 355 00:17:08,430 --> 00:17:12,960 What's next is to actually test our code and make sure it behaves as we intend. 356 00:17:12,960 --> 00:17:16,530 And we'll do so by writing what we'll call unit tests. 357 00:17:16,530 --> 00:17:19,650 Now, a unit test is some code that we write ourselves 358 00:17:19,650 --> 00:17:22,440 to test some unit of our program. 359 00:17:22,440 --> 00:17:24,390 But what are those units? 360 00:17:24,390 --> 00:17:26,700 Well, functions-- or sorry-- programs are composed 361 00:17:26,700 --> 00:17:28,830 of individual units called functions. 362 00:17:28,830 --> 00:17:34,290 So unit tests now are code we can write to test individual functions inside 363 00:17:34,290 --> 00:17:35,208 of our programs. 364 00:17:35,208 --> 00:17:37,500 And we'll go ahead and write some unit tests of our own 365 00:17:37,500 --> 00:17:39,882 by now testing our average function. 366 00:17:39,882 --> 00:17:41,340 So let's go ahead and do just that. 367 00:17:41,340 --> 00:17:42,690 I'll go back to RStudio. 368 00:17:42,690 --> 00:17:46,440 And I will, by convention, to test this function average, 369 00:17:46,440 --> 00:17:52,080 create a new file called test-average.R. So I will go ahead down here. 370 00:17:52,080 --> 00:17:56,880 And I'll say, I want to create a new file called test-average.R. 371 00:17:56,880 --> 00:17:58,920 And I'll see that file was created for me. 
372 00:17:58,920 --> 00:18:04,050 If I now go to my File Explorer over here, I can open up test-average. 373 00:18:04,050 --> 00:18:06,810 And I'll see a blank page, in which I can write 374 00:18:06,810 --> 00:18:09,870 my tests for this average function. 375 00:18:09,870 --> 00:18:13,830 Well again, by convention, what I'll do is write a function 376 00:18:13,830 --> 00:18:18,030 to test this code that I've written now in average.R, in particular, 377 00:18:18,030 --> 00:18:19,590 this function average. 378 00:18:19,590 --> 00:18:23,740 I can call it test_average, just like this. 379 00:18:23,740 --> 00:18:25,810 And I'll make sure it is a function. 380 00:18:25,810 --> 00:18:27,520 Doesn't take any inputs for now. 381 00:18:27,520 --> 00:18:30,600 But within this test here that I've written, 382 00:18:30,600 --> 00:18:32,790 this function I can use to test my code, I'll 383 00:18:32,790 --> 00:18:35,910 then provide one or more test cases. 384 00:18:35,910 --> 00:18:40,830 Now, a test case is some representative scenario our function might encounter. 385 00:18:40,830 --> 00:18:43,350 And we want to ask the question, did our function 386 00:18:43,350 --> 00:18:47,250 return the right value for this particular scenario? 387 00:18:47,250 --> 00:18:50,400 So if I want to ask a question, I could do that using a conditional, 388 00:18:50,400 --> 00:18:51,000 as we've seen. 389 00:18:51,000 --> 00:18:52,530 So maybe I'll ask the question here. 390 00:18:52,530 --> 00:18:57,630 If, let's say, average, let me call our function, and give it as input 391 00:18:57,630 --> 00:19:01,980 this vector we saw earlier, 1, 2, and 3, all numbers. 392 00:19:01,980 --> 00:19:07,170 If the return value of average, given this input, is equal to 2, 393 00:19:07,170 --> 00:19:08,520 well what should I say? 394 00:19:08,520 --> 00:19:11,430 I could probably say, in this case, that average, 395 00:19:11,430 --> 00:19:14,280 my function average passed the test. 
396 00:19:14,280 --> 00:19:16,560 And I'll give it a little smiley face, just for fun. 397 00:19:16,560 --> 00:19:19,620 Otherwise, though, if I don't get that value back, 398 00:19:19,620 --> 00:19:22,860 what should I show to the user or to myself here, the programmer? 399 00:19:22,860 --> 00:19:26,760 I could probably say something like, well, average failed the test. 400 00:19:26,760 --> 00:19:29,920 And I'll give it a little sad face, to make sure I know what's going on. 401 00:19:29,920 --> 00:19:34,920 So this is my first test for this function average. 402 00:19:34,920 --> 00:19:37,410 Notice how I've defined one test case. 403 00:19:37,410 --> 00:19:42,480 My test case is when I give average the input 1, 2, and 3 as a vector, 404 00:19:42,480 --> 00:19:45,420 I then expect that I'll get back the value 2. 405 00:19:45,420 --> 00:19:47,970 And if I do, I'll say average passed the test. 406 00:19:47,970 --> 00:19:50,333 If not, I'll go ahead and say we failed the test. 407 00:19:50,333 --> 00:19:52,250 And now, for cleanliness here, let me go ahead 408 00:19:52,250 --> 00:19:56,100 and say backslash n to add a new line to each of these messages here. 409 00:19:56,100 --> 00:20:00,450 And why don't we go ahead and try to run this test_average function? 410 00:20:00,450 --> 00:20:03,240 Well, before we do so, we probably want to know a few things. 411 00:20:03,240 --> 00:20:06,450 One is I've only defined the test_average function here. 412 00:20:06,450 --> 00:20:09,180 I've defined it as having this test case here. 413 00:20:09,180 --> 00:20:13,020 But if I want to run it, I should still call this function, probably 414 00:20:13,020 --> 00:20:14,940 down at the bottom of my file down here. 415 00:20:14,940 --> 00:20:17,757 I'll say let's run test_average. 416 00:20:17,757 --> 00:20:20,840 And one thing you might notice if you're being particularly observant here 417 00:20:20,840 --> 00:20:24,260 is that I'm calling the average function.
418 00:20:24,260 --> 00:20:28,070 But at least within this file here, test-average.R, 419 00:20:28,070 --> 00:20:30,760 well I don't see average defined. 420 00:20:30,760 --> 00:20:33,920 What we don't want to do is this, I don't want to go over to average.R, 421 00:20:33,920 --> 00:20:37,340 copy and paste this, and put it over in test-average.R. 422 00:20:37,340 --> 00:20:42,560 What I can do more simply is run source within this file. 423 00:20:42,560 --> 00:20:46,010 I could say source, and then the name of the file 424 00:20:46,010 --> 00:20:50,930 I want to run before I run the rest of the code now in this program here. 425 00:20:50,930 --> 00:20:55,220 I'm going to run now the code in average.R, which will give me access 426 00:20:55,220 --> 00:20:56,840 to this average function. 427 00:20:56,840 --> 00:21:00,260 And I can then later on call it in this file. 428 00:21:00,260 --> 00:21:04,880 So we're kind of, if you will, importing this function into this file here. 429 00:21:04,880 --> 00:21:09,140 We're not-- we're now able to use any function we've defined in average.R 430 00:21:09,140 --> 00:21:12,260 in this new file, test-average.R because we've sourced it. 431 00:21:12,260 --> 00:21:15,990 We've run it before we've run any code in this file here. 432 00:21:15,990 --> 00:21:20,100 So I think this is all we'll need to test our average function. 433 00:21:20,100 --> 00:21:22,750 Why don't I go ahead and run this test here? 434 00:21:22,750 --> 00:21:25,590 I'll go ahead and click on source to run this file now. 435 00:21:25,590 --> 00:21:28,290 And we'll see, average passed the test. 436 00:21:28,290 --> 00:21:32,820 So it seems like in this case, if I give my average function the input 1, 2, 437 00:21:32,820 --> 00:21:35,820 and 3, it will return to me the value 2. 438 00:21:35,820 --> 00:21:39,570 Well, what are some other test cases we could think of? 
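[Editor's sketch] The hand-written test described above might look roughly like the following. This is a minimal sketch, not the lecture's exact file: the two-line `average` here is a stand-in so the block runs on its own (in the lecture, `source("average.R")` loads the real function instead), and the exact message text and the use of `cat` are assumptions.

```r
# Stand-in for source("average.R"): an assumed, minimal average()
average <- function(x) {
  sum(x) / length(x)
}

# One test function holding a single test case
test_average <- function() {
  if (average(c(1, 2, 3)) == 2) {
    cat("`average` passed test :)\n")
  } else {
    cat("`average` failed test :(\n")
  }
}

# Defining the function is not enough; it has to be called to run
test_average()
```

Sourcing the real file instead of redefining `average` keeps a single copy of the function, which is the reason the lecture avoids copy and paste here.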
439 00:21:39,570 --> 00:21:43,230 Like ideally, we'd think through some representative cases that we should 440 00:21:43,230 --> 00:21:46,050 know how to handle, but also ones that kind of 441 00:21:46,050 --> 00:21:48,210 cover a broad range of scenarios. 442 00:21:48,210 --> 00:21:51,150 Here, I've been testing positive numbers. 443 00:21:51,150 --> 00:21:55,520 But it would be worthwhile to test if average can work with negative numbers 444 00:21:55,520 --> 00:21:56,020 too. 445 00:21:56,020 --> 00:21:58,590 So let's add a new test case. I might just kind of 446 00:21:58,590 --> 00:22:00,090 copy and paste this for now. 447 00:22:00,090 --> 00:22:03,570 I'll take my test case here, and add a new one down below. 448 00:22:03,570 --> 00:22:07,140 And why don't I change now the input to the average function? 449 00:22:07,140 --> 00:22:10,260 I'll give it now some negative numbers, more representative examples 450 00:22:10,260 --> 00:22:15,150 here, negative 1, negative 2, and negative 3. 451 00:22:15,150 --> 00:22:16,560 And what should I get back? 452 00:22:16,560 --> 00:22:21,330 Well, negative 2 is the average of negative 1, negative 2, and negative 3. 453 00:22:21,330 --> 00:22:23,340 That's what I should expect. 454 00:22:23,340 --> 00:22:27,150 Here, I've tested now both positive and negative numbers. 455 00:22:27,150 --> 00:22:29,700 But it's probably worth testing zero too, 456 00:22:29,700 --> 00:22:31,950 which is neither positive nor negative. 457 00:22:31,950 --> 00:22:35,610 I'm trying to think of scenarios that might go beyond the usual cases 458 00:22:35,610 --> 00:22:38,520 but are still important for me to be able to handle appropriately. 459 00:22:38,520 --> 00:22:40,680 So why don't I make a new test case? 460 00:22:40,680 --> 00:22:42,030 One that involves zero? 461 00:22:42,030 --> 00:22:43,800 I'll go down here and add a new one. 462 00:22:43,800 --> 00:22:49,530 Maybe I'll do negative 1, and 0, to use that number, and then 1.
463 00:22:49,530 --> 00:22:52,410 So now we're going between negative and positive, 464 00:22:52,410 --> 00:22:54,150 and neither negative nor positive. 465 00:22:54,150 --> 00:22:57,780 And the average here should be, well, zero. 466 00:22:57,780 --> 00:23:03,060 So here, I have three test cases in this one test function. 467 00:23:03,060 --> 00:23:05,220 First, I'll test positive numbers. 468 00:23:05,220 --> 00:23:06,960 Then I'll test negative numbers. 469 00:23:06,960 --> 00:23:10,680 Then I'll test positive, negative, and neither positive nor negative 470 00:23:10,680 --> 00:23:14,640 numbers, hoping for the right output in each case. 471 00:23:14,640 --> 00:23:17,340 I'll still run my test_average function down below. 472 00:23:17,340 --> 00:23:19,980 Let me clear my console, click on source. 473 00:23:19,980 --> 00:23:24,240 And now, I'll see average seems to have passed all three tests. 474 00:23:24,240 --> 00:23:27,180 So my code seems to be doing pretty well here. 475 00:23:27,180 --> 00:23:31,800 But if we wanted to keep going and adding more test cases to this, 476 00:23:31,800 --> 00:23:34,740 I'd argue that we'd get pretty bored pretty quickly. 477 00:23:34,740 --> 00:23:36,570 And it would be a lot of copy/paste. 478 00:23:36,570 --> 00:23:39,660 Like I've already written here 21 lines of code 479 00:23:39,660 --> 00:23:42,660 to test a function that was 10 lines of code. 480 00:23:42,660 --> 00:23:45,630 And if we wanted to test our programs, and every time, 481 00:23:45,630 --> 00:23:50,310 had to write three, four times the amount of code to test that function, 482 00:23:50,310 --> 00:23:52,540 well, nobody would test their code. 483 00:23:52,540 --> 00:23:54,340 And we want people to test their code. 484 00:23:54,340 --> 00:23:56,880 So thankfully, people who are in the R community 485 00:23:56,880 --> 00:23:59,910 have developed their own package to allow us to make 486 00:23:59,910 --> 00:24:03,090 testing easier, and arguably, more fun.
487 00:24:03,090 --> 00:24:06,270 So a package that is canonical in the R community to test your programs, 488 00:24:06,270 --> 00:24:07,875 is called testthat. 489 00:24:07,875 --> 00:24:11,610 It allows you to test that your function behaves as you might expect. 490 00:24:11,610 --> 00:24:16,530 So let's go ahead and use testthat now to improve the design of our tests 491 00:24:16,530 --> 00:24:20,190 and make it easier to write test cases like these. 492 00:24:20,190 --> 00:24:23,550 Now, testthat comes with a function called 493 00:24:23,550 --> 00:24:28,380 test_that, which allows me to make a new test for my code. 494 00:24:28,380 --> 00:24:31,078 But before I can use it, I, of course, need to install testthat. 495 00:24:31,078 --> 00:24:33,870 So if you haven't already, go down to your console down here 496 00:24:33,870 --> 00:24:38,280 and say install.packages, and install testthat. 497 00:24:38,280 --> 00:24:41,410 Once you've installed it, you then need to load it. 498 00:24:41,410 --> 00:24:44,280 So I'll go ahead and load testthat, just like this, 499 00:24:44,280 --> 00:24:47,015 by doing library followed by testthat. 500 00:24:47,015 --> 00:24:48,390 Now I'll go ahead and Enter here. 501 00:24:48,390 --> 00:24:51,330 And I'll see that I've now loaded testthat. 502 00:24:51,330 --> 00:24:55,240 I now have access to functions like test_that. 503 00:24:55,240 --> 00:24:58,848 So I think what I've written here so far is pretty good. 504 00:24:58,848 --> 00:25:00,390 It at least has some good test cases. 505 00:25:00,390 --> 00:25:03,147 But I don't need any of this to use testthat anymore. 506 00:25:03,147 --> 00:25:04,980 I'm going to go ahead and delete most of it, 507 00:25:04,980 --> 00:25:09,390 but still include now my average.R import, if you will.
508 00:25:09,390 --> 00:25:12,790 I'm taking whatever I've written in average.R and making it able to make 509 00:25:12,790 --> 00:25:13,290 me-- 510 00:25:13,290 --> 00:25:17,160 making myself able to use it now in test-average.R. Now, 511 00:25:17,160 --> 00:25:22,150 we said before that testthat comes with a function called test_that. 512 00:25:22,150 --> 00:25:27,520 And we use this function to define a new test for some function that we have. 513 00:25:27,520 --> 00:25:30,580 So I'll go ahead and go back to test-average.R. 514 00:25:30,580 --> 00:25:34,030 And I'll go ahead and use test_that. 515 00:25:34,030 --> 00:25:39,110 And the first input to test_that is a description of the test I want to run. 516 00:25:39,110 --> 00:25:41,770 So here I'll say I want to test that-- 517 00:25:41,770 --> 00:25:43,270 I want to test that-- 518 00:25:43,270 --> 00:25:47,350 oops-- that average, let's say, the average function here, 519 00:25:47,350 --> 00:25:51,290 calculates the mean, or in this case, the average of these numbers. 520 00:25:51,290 --> 00:25:54,070 So this is kind of an English sentence now. 521 00:25:54,070 --> 00:25:58,100 I'm going to test that average calculates mean. 522 00:25:58,100 --> 00:26:01,240 Well, the next argument is the set of test cases 523 00:26:01,240 --> 00:26:05,680 I want to run to ensure that testthat-- or to ensure that average calculates 524 00:26:05,680 --> 00:26:07,510 the mean appropriately. 525 00:26:07,510 --> 00:26:12,760 By convention, I'll put these test cases inside of these curly braces 526 00:26:12,760 --> 00:26:14,240 as a second argument now. 527 00:26:14,240 --> 00:26:19,010 And I can now provide several test cases inside of this one function 528 00:26:19,010 --> 00:26:21,050 that I've decided to create here. 529 00:26:21,050 --> 00:26:23,030 Now, how could I say-- 530 00:26:23,030 --> 00:26:24,860 or express a test case? 531 00:26:24,860 --> 00:26:28,370 Well, testthat comes with some functions we can use.
532 00:26:28,370 --> 00:26:30,950 And we really use them by expecting-- or saying 533 00:26:30,950 --> 00:26:34,220 what we expect to happen when our function returns some value. 534 00:26:34,220 --> 00:26:37,730 One of these functions here is expect_equal. 535 00:26:37,730 --> 00:26:40,700 We could expect that when our function is run, 536 00:26:40,700 --> 00:26:44,510 we should get back a return value that is equal to some other value, 537 00:26:44,510 --> 00:26:47,040 much like we just did with our conditionals earlier. 538 00:26:47,040 --> 00:26:49,010 But now, I'll use expect_equal. 539 00:26:49,010 --> 00:26:50,990 Let me go back now to my code. 540 00:26:50,990 --> 00:26:56,810 And inside of my test here, I'll go ahead and define a few test cases. 541 00:26:56,810 --> 00:27:02,030 The first one will be I want to expect equality between the return 542 00:27:02,030 --> 00:27:07,850 value of the average function when given 1, 2, and 3 as input, and this value 2 543 00:27:07,850 --> 00:27:09,360 on the right hand side. 544 00:27:09,360 --> 00:27:12,770 So to be clear here, the first input to expect_equal 545 00:27:12,770 --> 00:27:15,950 is the argument, the value we'll get back from, 546 00:27:15,950 --> 00:27:18,020 in this case, our average function. 547 00:27:18,020 --> 00:27:20,810 And the next argument is the value we expect 548 00:27:20,810 --> 00:27:23,330 to find as the return value of average. 549 00:27:23,330 --> 00:27:25,790 I'm going to expect those are now equal. 550 00:27:25,790 --> 00:27:27,470 And this is our test case. 551 00:27:27,470 --> 00:27:29,527 There are no conditionals, no nothing else. 552 00:27:29,527 --> 00:27:31,610 We're going to go ahead and just use this to test, 553 00:27:31,610 --> 00:27:37,280 did average return to us 2 when we gave it as input a vector of 1, 2, and 3? 554 00:27:37,280 --> 00:27:39,470 Well, let's now add our other test cases. 555 00:27:39,470 --> 00:27:42,170 I could copy/paste this and change the input. 
556 00:27:42,170 --> 00:27:45,860 I'll do negative 1, negative 2, and negative 3 557 00:27:45,860 --> 00:27:48,080 to test now for negative values. 558 00:27:48,080 --> 00:27:50,390 The expected value is negative 2 now. 559 00:27:50,390 --> 00:27:51,570 I'll do the same now. 560 00:27:51,570 --> 00:27:54,440 But for negative 1, 0, and 1. 561 00:27:54,440 --> 00:27:57,130 The expected value now is going to be zero. 562 00:27:57,130 --> 00:27:59,630 And why don't we go ahead and just add some more test cases? 563 00:27:59,630 --> 00:28:01,190 Now it's just so easy for us. 564 00:28:01,190 --> 00:28:06,410 One thing I could do is test maybe more than an odd number of numbers. 565 00:28:06,410 --> 00:28:07,970 I've always been testing three here. 566 00:28:07,970 --> 00:28:10,580 Maybe I'll test four as another scenario. 567 00:28:10,580 --> 00:28:15,620 I'll go ahead and do, let's say, negative 2, negative 1, 1, and 2. 568 00:28:15,620 --> 00:28:18,950 So now we test both positive and negative numbers. 569 00:28:18,950 --> 00:28:21,680 But now, we're giving an even number of numbers as input. 570 00:28:21,680 --> 00:28:25,410 And we should get back, of course, zero in the end. 571 00:28:25,410 --> 00:28:28,193 So this, then, is our test of our average function. 572 00:28:28,193 --> 00:28:30,110 Let's go ahead and see what could happen here. 573 00:28:30,110 --> 00:28:34,760 Notice how at the top of RStudio, I now see a button called Run Tests. 574 00:28:34,760 --> 00:28:38,720 This means we're going to run every test we see in this file. 575 00:28:38,720 --> 00:28:42,200 I could, alternatively though, go down to the bottom of my console and just 576 00:28:42,200 --> 00:28:43,890 source this file to run it. 577 00:28:43,890 --> 00:28:48,635 I could say source test-average.R and let's see what kind of output we get. 578 00:28:48,635 --> 00:28:49,760 I'll go ahead and run this. 579 00:28:49,760 --> 00:28:51,980 And oh, test passed. 
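[Editor's sketch] The test_that version with the four expect_equal cases described above might look like this. It is a sketch under stated assumptions: the one-line `average` is a stand-in so the block is self-contained (the lecture's file starts with `source("average.R")` instead), and the backticks in the description string are a stylistic guess.

```r
library(testthat)

# Stand-in for source("average.R"): an assumed, minimal average()
average <- function(x) sum(x) / length(x)

# First argument: a description; second: the test cases in curly braces
test_that("`average` calculates mean", {
  expect_equal(average(c(1, 2, 3)), 2)        # positive numbers
  expect_equal(average(c(-1, -2, -3)), -2)    # negative numbers
  expect_equal(average(c(-1, 0, 1)), 0)       # includes zero
  expect_equal(average(c(-2, -1, 1, 2)), 0)   # an even count of numbers
})
```

Each expect_equal call takes the actual value first and the expected value second, so a new case is one line rather than a whole if/else block.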
580 00:28:51,980 --> 00:28:55,430 We have a little gold medal here to say our function worked 581 00:28:55,430 --> 00:28:56,720 as we intended it to. 582 00:28:56,720 --> 00:29:00,680 Here, I can see that average will return to me all these values for each 583 00:29:00,680 --> 00:29:03,350 of these test cases, making it much easier 584 00:29:03,350 --> 00:29:07,940 now to write test cases like these, thanks to testthat. 585 00:29:07,940 --> 00:29:12,800 Let me ask now, what questions do we have on defining these test cases 586 00:29:12,800 --> 00:29:16,433 and using a package like testthat? 587 00:29:16,433 --> 00:29:18,350 AUDIENCE: Don't we have to source average.R? 588 00:29:18,350 --> 00:29:23,120 Because if you have a huge file, and if you only want to test one function, 589 00:29:23,120 --> 00:29:25,880 then it won't be like a good idea to source the entire file? 590 00:29:25,880 --> 00:29:27,130 CARTER ZENKE: A good question. 591 00:29:27,130 --> 00:29:30,560 So notice here how I actually ran source test-average.R because my goal 592 00:29:30,560 --> 00:29:33,320 was to run these tests here top to bottom. 593 00:29:33,320 --> 00:29:37,820 Test-average.R already sources or runs, if you will, average.R, 594 00:29:37,820 --> 00:29:42,710 giving me access to any functions inside of that file here. 595 00:29:42,710 --> 00:29:45,650 When we come back another time, next lecture, 596 00:29:45,650 --> 00:29:47,750 we'll see how to make packages of our code. 597 00:29:47,750 --> 00:29:51,487 And we'll see how to write tests that don't require us to put source up top. 598 00:29:51,487 --> 00:29:54,320 But so long as we're not writing packages and just testing our code, 599 00:29:54,320 --> 00:29:58,550 we're going to need to include source average.R up top to give us access 600 00:29:58,550 --> 00:29:59,660 to average. 601 00:29:59,660 --> 00:30:02,900 So we can run it inside this test file here.
602 00:30:02,900 --> 00:30:08,120 What other questions do we have on testthat or testing our code so far? 603 00:30:08,120 --> 00:30:11,553 AUDIENCE: When is an appropriate time to write tests? 604 00:30:11,553 --> 00:30:13,970 CARTER ZENKE: Yeah, when is it appropriate to write tests? 605 00:30:13,970 --> 00:30:15,803 And what time is appropriate to write tests? 606 00:30:15,803 --> 00:30:19,170 So there are-- so I'd say many varying philosophies on this. 607 00:30:19,170 --> 00:30:22,440 There is a kind of a movement or a philosophy called 608 00:30:22,440 --> 00:30:25,950 test-driven development, which argues you should write tests 609 00:30:25,950 --> 00:30:27,840 before you even write your code. 610 00:30:27,840 --> 00:30:30,480 And by writing your tests, you kind of get your mind 611 00:30:30,480 --> 00:30:32,190 around what you want your code to do. 612 00:30:32,190 --> 00:30:34,740 And then you write code to pass those tests. 613 00:30:34,740 --> 00:30:36,810 On the other hand, folks might say, well, I just 614 00:30:36,810 --> 00:30:39,870 want to get something done, I'll write the code, and then I'll test it. 615 00:30:39,870 --> 00:30:42,210 There are arguments on both sides to be made. 616 00:30:42,210 --> 00:30:45,270 It's going to be up to you and your team to decide when you want to test 617 00:30:45,270 --> 00:30:46,620 and how you want to test. 618 00:30:46,620 --> 00:30:50,800 This is telling us how we could test now using packages like testthat. 619 00:30:50,800 --> 00:30:54,970 But good question on when to test as well. 620 00:30:54,970 --> 00:30:58,240 OK, so here, we've written a pretty good test. 621 00:30:58,240 --> 00:31:00,660 There are many test cases here for our average function. 622 00:31:00,660 --> 00:31:03,060 But there are still other scenarios to test. 623 00:31:03,060 --> 00:31:06,870 And in particular, we saw what could happen if we gave average 624 00:31:06,870 --> 00:31:09,840 some input that included NA values.
625 00:31:09,840 --> 00:31:12,990 Well, we could just as well test the result of average 626 00:31:12,990 --> 00:31:17,070 when it's given some NA values as we could some regular values like these. 627 00:31:17,070 --> 00:31:23,100 So let's go back now and add some new tests and test cases to our file here. 628 00:31:23,100 --> 00:31:28,320 Now, if I want to test what average does, when it's given some input that 629 00:31:28,320 --> 00:31:32,130 includes NA values, well, I could keep adding test cases here 630 00:31:32,130 --> 00:31:35,820 to my single function, or my single test, average calculates mean. 631 00:31:35,820 --> 00:31:39,630 But if I were to keep going and adding more and more tests, 632 00:31:39,630 --> 00:31:42,480 this function would become quite, quite long. 633 00:31:42,480 --> 00:31:48,120 So ideally, what I want to do instead is maybe divide up my test, my test cases, 634 00:31:48,120 --> 00:31:50,490 into a way that makes logical sense. 635 00:31:50,490 --> 00:31:54,960 Here, I argue, I'm going to have all my test cases that are giving average 636 00:31:54,960 --> 00:31:59,250 some pretty typical inputs, numbers, I'm going to find the average of them, 637 00:31:59,250 --> 00:32:01,320 and get back and check for equality. 638 00:32:01,320 --> 00:32:05,220 But if I'm going to give average some new type of input, like inputs 639 00:32:05,220 --> 00:32:09,330 that include NAs, well, maybe I should make a new test for that. 640 00:32:09,330 --> 00:32:14,280 And I can do that by including more than one instance of this testthat function. 641 00:32:14,280 --> 00:32:16,440 I could say I want to test that. 642 00:32:16,440 --> 00:32:20,700 Now, average, let's say, how do I want to word this? 643 00:32:20,700 --> 00:32:27,240 I want to say test_that average warns about NAs in input. 
644 00:32:27,240 --> 00:32:31,760 So we saw before that our goal, when we wrote the average function, 645 00:32:31,760 --> 00:32:36,590 was to test and to make sure that it gave us a warning when the input 646 00:32:36,590 --> 00:32:38,670 x included NA values. 647 00:32:38,670 --> 00:32:41,210 So we could write some test cases to make sure 648 00:32:41,210 --> 00:32:44,460 that that is what is happening with our average function. 649 00:32:44,460 --> 00:32:46,460 So here's my description of this test. 650 00:32:46,460 --> 00:32:49,100 I'll go ahead and give myself some space now for test cases. 651 00:32:49,100 --> 00:32:55,580 And I want to test that average raises or throws a warning here. 652 00:32:55,580 --> 00:32:59,480 And it seems like expecting equality might not work because this 653 00:32:59,480 --> 00:33:01,730 allows me to test two distinct values. 654 00:33:01,730 --> 00:33:04,430 But a warning is something else entirely. 655 00:33:04,430 --> 00:33:09,530 Well, thankfully, in testthat, we have access to other expectations 656 00:33:09,530 --> 00:33:14,540 we can say, one including test warning, or expect_warning in this case. 657 00:33:14,540 --> 00:33:17,840 We can say we want to expect a warning from this function 658 00:33:17,840 --> 00:33:20,650 or expect no warning at all. 659 00:33:20,650 --> 00:33:23,760 So let me go over here and say I want to expect 660 00:33:23,760 --> 00:33:27,720 that I'll get a warning from the average function 661 00:33:27,720 --> 00:33:32,580 when I give it some input like this, maybe 1, NA, and 3. 662 00:33:32,580 --> 00:33:34,710 So some input that involves an NA. 663 00:33:34,710 --> 00:33:36,270 I can do the same thing down below. 664 00:33:36,270 --> 00:33:39,780 And I could say, why don't I give it maybe all NAs here, 665 00:33:39,780 --> 00:33:41,430 a vector of three NAs? 
666 00:33:41,430 --> 00:33:45,420 And now I could expect that when I run average in this way, 667 00:33:45,420 --> 00:33:47,460 I expect I'll get a warning. 668 00:33:47,460 --> 00:33:50,280 So let's go ahead and rerun our tests now, 669 00:33:50,280 --> 00:33:54,622 testing for both a calculation of the mean and a warning from average. 670 00:33:54,622 --> 00:33:55,830 Let me go ahead and run this. 671 00:33:55,830 --> 00:33:58,950 And oh, what do we see? 672 00:33:58,950 --> 00:34:00,107 Test failed. 673 00:34:00,107 --> 00:34:01,440 So let's see what happened here. 674 00:34:01,440 --> 00:34:05,490 If I scroll back up, I'll see that one test passed. 675 00:34:05,490 --> 00:34:08,040 That seems to be my first one here, average still 676 00:34:08,040 --> 00:34:09,300 seems to calculate the mean. 677 00:34:09,300 --> 00:34:13,440 But if I look down below, my next test is that average 678 00:34:13,440 --> 00:34:15,780 warns about NAs in the input. 679 00:34:15,780 --> 00:34:19,260 And in fact, what I've gotten it seems, from average, 680 00:34:19,260 --> 00:34:23,760 is not a warning, but an error, error in average. 681 00:34:23,760 --> 00:34:27,989 When I gave it NA, NA, NA, x must be a numeric vector. 682 00:34:27,989 --> 00:34:32,130 So although I expected a warning on line 12, 683 00:34:32,130 --> 00:34:35,560 it seems like I got an error instead. 684 00:34:35,560 --> 00:34:38,040 So that's probably cause for me to go back to my code, 685 00:34:38,040 --> 00:34:39,960 and see what could happen, so I could fix it, 686 00:34:39,960 --> 00:34:43,920 and make sure it adheres to these expectations of my function. 687 00:34:43,920 --> 00:34:46,620 Let me come back now to average.R and think 688 00:34:46,620 --> 00:34:48,969 through what could be going wrong here. 
689 00:34:48,969 --> 00:34:52,650 Well, we got, it seems, this error, that x 690 00:34:52,650 --> 00:34:56,100 must be a numeric vector, when we wanted, it seems, 691 00:34:56,100 --> 00:34:58,480 this warning down below. 692 00:34:58,480 --> 00:35:04,080 So maybe what happened is that when I gave it a vector of all NAs, maybe 693 00:35:04,080 --> 00:35:08,290 it found that that vector is not numeric, which it might well have done. 694 00:35:08,290 --> 00:35:10,840 So let me go ahead and get on my console here and test this. 695 00:35:10,840 --> 00:35:16,140 I could say is.numeric, and give as input NA, NA, NA, and ask 696 00:35:16,140 --> 00:35:17,920 is that numeric or not? 697 00:35:17,920 --> 00:35:19,870 Hmm, so it's not. 698 00:35:19,870 --> 00:35:22,840 So because this vector of NAs is not numeric, 699 00:35:22,840 --> 00:35:28,270 I would first throw my error that x must be a numeric vector. 700 00:35:28,270 --> 00:35:33,070 But what I really want to do is return NA if I get a vector of NAs. 701 00:35:33,070 --> 00:35:36,580 So I think I should probably reorder here this handling of my errors. 702 00:35:36,580 --> 00:35:40,220 Let me go ahead and reorder this and put this up top first. 703 00:35:40,220 --> 00:35:43,300 So now, what we'll do is first check. 704 00:35:43,300 --> 00:35:48,460 Is NA-- or is the vector-- is the vector we got as input to average here, 705 00:35:48,460 --> 00:35:49,960 does it include any NA values? 706 00:35:49,960 --> 00:35:52,480 If so, we'll raise a warning and return an NA. 707 00:35:52,480 --> 00:35:54,760 And then we'll check if it's numeric. 708 00:35:54,760 --> 00:35:57,380 I think this might help us solve our problem here. 709 00:35:57,380 --> 00:36:01,000 Let me go back to test-average.R. Let me rerun these tests with our updated 710 00:36:01,000 --> 00:36:02,830 version of average.R. 711 00:36:02,830 --> 00:36:05,920 And we'll see all the tests passed.
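[Editor's sketch] The reordering described above, with the NA check moved ahead of the numeric check, might look like the following. The body of `average` is reconstructed from the messages quoted in the lecture, so the exact wording and layout are assumptions rather than the contents of the actual average.R.

```r
library(testthat)

# Assumed reconstruction of average() after the fix:
# the NA check now runs before the numeric check
average <- function(x) {
  if (any(is.na(x))) {
    warning("`x` contains one or more NA values.")
    return(NA)
  }
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector.")
  }
  sum(x) / length(x)
}

test_that("`average` warns about NAs in input", {
  expect_warning(average(c(1, NA, 3)))
  expect_warning(average(c(NA, NA, NA)))
})
```

The order matters because `c(NA, NA, NA)` is a logical vector, so `is.numeric` returns FALSE for it; with the numeric check first, the all-NA case would hit `stop` and never reach the warning.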
712 00:36:05,920 --> 00:36:07,690 We have a little confetti, some rainbows. 713 00:36:07,690 --> 00:36:10,700 We seem to be moving along pretty well. 714 00:36:10,700 --> 00:36:14,920 So what questions do we have on this new version of our test? 715 00:36:14,920 --> 00:36:17,690 We've tested how average will calculate the mean. 716 00:36:17,690 --> 00:36:19,820 And will we get a warning, now? 717 00:36:19,820 --> 00:36:22,010 What should we do next? 718 00:36:22,010 --> 00:36:26,060 And what questions do we have before we move on? 719 00:36:26,060 --> 00:36:28,310 AUDIENCE: When you get-- when the user gets a warning, 720 00:36:28,310 --> 00:36:34,090 can we use pass to get the user to rerun the input to have a warning in-- 721 00:36:34,090 --> 00:36:34,828 or an error? 722 00:36:34,828 --> 00:36:38,120 CARTER ZENKE: So I'm hearing a question about handling these errors or warnings 723 00:36:38,120 --> 00:36:39,920 as they come up in our code. 724 00:36:39,920 --> 00:36:42,770 And it turns out that R actually has a function called try 725 00:36:42,770 --> 00:36:46,880 and a function called tryCatch that let us handle errors and warnings as they 726 00:36:46,880 --> 00:36:47,970 arise in our code. 727 00:36:47,970 --> 00:36:50,720 We won't focus on those today, but certainly learn more about them 728 00:36:50,720 --> 00:36:52,718 if you're curious about them too. 729 00:36:52,718 --> 00:36:55,010 Let's keep going here and see what else we should test. 730 00:36:55,010 --> 00:36:59,090 So I argue that we've tested that average returns us the right value 731 00:36:59,090 --> 00:37:02,930 to calculate the mean, that it also warns about NAs in input. 732 00:37:02,930 --> 00:37:04,610 But what else could we test? 733 00:37:04,610 --> 00:37:08,990 Well, it seems like we should also test that average actually returns 734 00:37:08,990 --> 00:37:12,260 to us NA if we give it NA values. 735 00:37:12,260 --> 00:37:14,660 Here, we're only expecting a warning.
736 00:37:14,660 --> 00:37:18,510 And we're not so much testing if we're getting back the right return value. 737 00:37:18,510 --> 00:37:19,820 So let me do just that. 738 00:37:19,820 --> 00:37:26,000 I'll go ahead and add a new test case, one that tests that average returns NA. 739 00:37:26,000 --> 00:37:27,320 I'll test that here. 740 00:37:27,320 --> 00:37:34,260 Average returns NA with NAs in our input, just like this. 741 00:37:34,260 --> 00:37:37,550 And I'll go ahead and add some more test cases. 742 00:37:37,550 --> 00:37:41,330 Well here, it seems like expect_equal might work for me. 743 00:37:41,330 --> 00:37:44,240 I'm going to test that the return value of average 744 00:37:44,240 --> 00:37:46,680 will be equal to an NA value. 745 00:37:46,680 --> 00:37:49,280 So I'll go ahead and expect_equal. 746 00:37:49,280 --> 00:37:51,230 I'll go ahead and give the same kind of input 747 00:37:51,230 --> 00:37:54,140 I gave down below here, 1, NA, and 3. 748 00:37:54,140 --> 00:37:57,620 I'll expect that to be equal now to NA. 749 00:37:57,620 --> 00:38:02,750 Let me go down over here and say I expect_equal between this vector 750 00:38:02,750 --> 00:38:05,900 of just all NA values and the NA value itself. 751 00:38:05,900 --> 00:38:09,800 Let me go ahead and actually make this a vector, just like that. 752 00:38:09,800 --> 00:38:11,370 Make this a vector as well. 753 00:38:11,370 --> 00:38:14,990 And now we're passing into average those same inputs down below. 754 00:38:14,990 --> 00:38:18,710 But now, I'm testing to see if the return value is NA. 755 00:38:18,710 --> 00:38:22,400 I'll go back to my console now and run these tests that I've just added. 756 00:38:22,400 --> 00:38:26,690 And we'll see, hmm, something a little bit curious. 757 00:38:26,690 --> 00:38:29,210 I'll see test passed. 758 00:38:29,210 --> 00:38:30,890 And I'll see test passed. 759 00:38:30,890 --> 00:38:34,010 But I'll see a warning, it seems, in my second test. 
760 00:38:34,010 --> 00:38:36,980 That average returns NA with NAs in input. 761 00:38:36,980 --> 00:38:42,560 And I get back a warning that x contains one or more NA values. 762 00:38:42,560 --> 00:38:46,700 Well, that is kind of expected because if we look at our average function, 763 00:38:46,700 --> 00:38:50,390 we'll see that if we do give our function NA values, 764 00:38:50,390 --> 00:38:52,490 we're going to throw this warning. 765 00:38:52,490 --> 00:38:55,550 But what we're testing in this test is not 766 00:38:55,550 --> 00:38:58,910 so much that we get a warning or not, we already did that down below. 767 00:38:58,910 --> 00:39:03,080 We're testing if the return value is equal to NA. 768 00:39:03,080 --> 00:39:07,142 So this would be a good chance for us to use a function like suppressWarnings. 769 00:39:07,142 --> 00:39:09,350 We're saying, we don't really care about the warning 770 00:39:09,350 --> 00:39:12,840 we get, we just want to test the return value in this case. 771 00:39:12,840 --> 00:39:19,460 So I'll wrap the average function, in this case, inside of suppressWarnings-- 772 00:39:19,460 --> 00:39:22,150 suppressWarning-- 773 00:39:22,150 --> 00:39:25,480 suppressWarnings, sorry, let me make it plural up above. 774 00:39:25,480 --> 00:39:28,330 And now, I think we should probably solve our problem here. 775 00:39:28,330 --> 00:39:29,830 I'll go ahead and rerun these tests. 776 00:39:29,830 --> 00:39:33,530 And now, we'll see, I have three tests passing overall. 777 00:39:33,530 --> 00:39:34,780 Well, what else could we test? 778 00:39:34,780 --> 00:39:38,650 We saw before in average.R that we also want to stop. 779 00:39:38,650 --> 00:39:41,020 We want to end our function, throw an error 780 00:39:41,020 --> 00:39:43,660 if we give it some non-numeric input. 781 00:39:43,660 --> 00:39:45,790 We could just as well test for that. 782 00:39:45,790 --> 00:39:48,670 I have this other set of functions, thanks to testthat.
783 00:39:48,670 --> 00:39:52,120 One is called expect_error and expect_no_error, 784 00:39:52,120 --> 00:39:56,390 those test if my function has stopped given some given input. 785 00:39:56,390 --> 00:39:58,310 So I could use now expect_error. 786 00:39:58,310 --> 00:40:01,450 We come back to test-average, and go down below and say, 787 00:40:01,450 --> 00:40:07,150 I'll test that maybe average stops if x, our input, 788 00:40:07,150 --> 00:40:09,618 is non-numeric, just like this. 789 00:40:09,618 --> 00:40:11,410 And now, I'll go ahead and expect that I'll 790 00:40:11,410 --> 00:40:15,940 get back some error if I give, as input to the average function, 791 00:40:15,940 --> 00:40:19,180 some value that is not numeric, maybe something like-- 792 00:40:19,180 --> 00:40:23,260 something like quack, just like this, or something like that test 793 00:40:23,260 --> 00:40:28,870 we saw before, 1 as a character, 2 as a character, and 3 as a character. 794 00:40:28,870 --> 00:40:30,760 So this is non-numeric input. 795 00:40:30,760 --> 00:40:33,250 Let's see if we get back the error from average. 796 00:40:33,250 --> 00:40:34,270 I'll go ahead and run. 797 00:40:34,270 --> 00:40:35,470 I'll first save my file. 798 00:40:35,470 --> 00:40:37,510 I'll then run test-average. 799 00:40:37,510 --> 00:40:40,360 And now, I'll see four tests passing too. 800 00:40:40,360 --> 00:40:44,530 So here, we've seen how to write test cases for our code thanks 801 00:40:44,530 --> 00:40:49,060 to expect_warning, expect_equal, and expect_error. 802 00:40:49,060 --> 00:40:51,880 What other questions do we have on testing 803 00:40:51,880 --> 00:40:57,610 our code using these kinds of test cases and testthat more generally? 804 00:40:57,610 --> 00:41:01,090 AUDIENCE: One thing which comes up very much in computer science 805 00:41:01,090 --> 00:41:03,550 is floating point inaccuracies, right? 806 00:41:03,550 --> 00:41:05,908 So can we account for that? 
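The four tests described so far can be sketched in one file. This is a minimal sketch, not the lecture's exact code: the body of average is an assumed reconstruction from the behavior described (warn and return NA when NAs are present, stop on non-numeric input), the first two test names are assumptions, and average is defined inline here rather than loaded from average.R so that the example stands on its own.

```r
library(testthat)

# Assumed reconstruction of average.R, based only on the behavior described
# in the lecture: warn and return NA when NAs are present, stop on
# non-numeric input, otherwise divide the sum by the count.
average <- function(x) {
  if (any(is.na(x))) {
    warning("`x` contains one or more NA values.")
    return(NA)
  }
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector.")
  }
  sum(x) / length(x)
}

test_that("`average` calculates mean", {
  expect_equal(average(c(1, 2, 3)), 2)
  expect_equal(average(c(-1, -2, -3)), -2)
})

test_that("`average` warns about NAs in input", {
  expect_warning(average(c(1, NA, 3)))
  expect_warning(average(c(NA, NA, NA)))
})

test_that("`average` returns NA with NAs in input", {
  # The warning is already covered by the test above, so silence it here
  # and check only the return value.
  expect_equal(suppressWarnings(average(c(1, NA, 3))), NA)
  expect_equal(suppressWarnings(average(c(NA, NA, NA))), NA)
})

test_that("`average` stops if `x` is non-numeric", {
  expect_error(average("quack"))
  expect_error(average(c("1", "2", "3")))
})
```

Keeping the warning check and the return-value check in separate tests means each test fails for exactly one reason, which is why suppressWarnings appears only in the third test.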
807 00:41:05,908 --> 00:41:07,450 CARTER ZENKE: A really good question. 808 00:41:07,450 --> 00:41:10,117 Actually, an excellent segue I was just going to talk about now. 809 00:41:10,117 --> 00:41:14,560 So it seems like in our code, we are testing integer numbers. 810 00:41:14,560 --> 00:41:16,420 I have here, 1, 2, and 3. 811 00:41:16,420 --> 00:41:18,650 And we get back a whole number, like 2. 812 00:41:18,650 --> 00:41:20,300 Same for all these other test cases. 813 00:41:20,300 --> 00:41:23,540 But to your point, we've missed an important kind 814 00:41:23,540 --> 00:41:27,847 of test case, which involves these floating point or decimal numbers. 815 00:41:27,847 --> 00:41:29,930 And due to what you've said about decimal numbers, 816 00:41:29,930 --> 00:41:31,970 and the way they're represented, there are some special considerations 817 00:41:31,970 --> 00:41:34,980 we take into account before we can test those kinds of numbers. 818 00:41:34,980 --> 00:41:37,820 So let's see what we should take into account before we test, 819 00:41:37,820 --> 00:41:40,590 in this case, floating point or decimal values. 820 00:41:40,590 --> 00:41:45,530 So let's actually go ahead over here and think through how we could test it, 821 00:41:45,530 --> 00:41:46,850 at least hypothetically. 822 00:41:46,850 --> 00:41:50,030 I could use here expect_equal still. 823 00:41:50,030 --> 00:41:53,150 And give now as input the average function. 824 00:41:53,150 --> 00:41:55,880 But I'll give it some floating point values. 825 00:41:55,880 --> 00:42:01,820 Maybe in this case, I will try to test these, 0.1 and 0.5, 826 00:42:01,820 --> 00:42:05,900 taking the average of this, which will be 0.3. 827 00:42:05,900 --> 00:42:11,210 Now, it seems to us, as humans, that if we were to do this kind of calculation, 828 00:42:11,210 --> 00:42:13,310 we would get the following kind of answer. 829 00:42:13,310 --> 00:42:20,810 That 0.1 plus 0.5 divided by 2 is equal to exactly 0.3. 
830 00:42:20,810 --> 00:42:25,310 This is the average of these numbers, 0.1 and 0.5. 831 00:42:25,310 --> 00:42:30,140 But it turns out that computers can't do math exactly like this. 832 00:42:30,140 --> 00:42:33,740 That there are actually an infinite number of floating point numbers, 833 00:42:33,740 --> 00:42:36,470 of decimal numbers, and only a finite number of bits 834 00:42:36,470 --> 00:42:38,030 we can use to represent them. 835 00:42:38,030 --> 00:42:41,990 Which leads to this problem known as floating-point imprecision. 836 00:42:41,990 --> 00:42:44,210 We have so many decimal numbers to represent 837 00:42:44,210 --> 00:42:48,020 and only so few bits to represent them that we can't represent all of them 838 00:42:48,020 --> 00:42:49,130 precisely. 839 00:42:49,130 --> 00:42:52,040 And in fact, a computer, even one running R, 840 00:42:52,040 --> 00:42:55,760 might perform that same calculation and arrive at this, 841 00:42:55,760 --> 00:43:02,870 0.1 plus 0.5 divided by 2 is equal to 0.299999-- 842 00:43:02,870 --> 00:43:05,790 lots of nines, then a lot of other values after that. 843 00:43:05,790 --> 00:43:07,880 So not exactly 0.3. 844 00:43:07,880 --> 00:43:11,690 And so because of this, we need to allow for some tolerance 845 00:43:11,690 --> 00:43:14,690 when we test our floating point values. 846 00:43:14,690 --> 00:43:17,040 But before we do that, let me kind of prove to you 847 00:43:17,040 --> 00:43:19,100 that this is what's happening in R, even. 848 00:43:19,100 --> 00:43:21,260 I'll come back to my computer here. 849 00:43:21,260 --> 00:43:23,750 And let's get an idea of this imprecision 850 00:43:23,750 --> 00:43:26,090 that happens in floating point values. 851 00:43:26,090 --> 00:43:28,080 Let's go back down to my console here. 852 00:43:28,080 --> 00:43:34,820 And if I were to print, let's say, this value 0.3, I'll get back 0.3. 853 00:43:34,820 --> 00:43:40,820 But if I push R just a little bit and I ask it what really is 0.3?
854 00:43:40,820 --> 00:43:48,230 I could say print 0.3 and show me now 17 digits after the decimal point. 855 00:43:48,230 --> 00:43:51,530 Let's see what we get. 856 00:43:51,530 --> 00:43:56,570 0.29999999999-- so it turns out that 0.3 is 857 00:43:56,570 --> 00:43:58,970 just one of those decimal numbers we can't properly 858 00:43:58,970 --> 00:44:03,530 represent due to having so many floating point values and so few bits 859 00:44:03,530 --> 00:44:04,670 to represent them. 860 00:44:04,670 --> 00:44:07,040 Now, because of this reality, like we said, 861 00:44:07,040 --> 00:44:09,770 we do need to allow for something called tolerance 862 00:44:09,770 --> 00:44:11,970 when we're testing code like this. 863 00:44:11,970 --> 00:44:16,850 Now, tolerance is the range of values I will accept above or below my expected 864 00:44:16,850 --> 00:44:20,150 value as being equal to that expected value. 865 00:44:20,150 --> 00:44:24,740 Mathematically, we could say this, maybe my expected value is 0.3, 866 00:44:24,740 --> 00:44:27,380 but in terms of numbers being equal to 0.3, 867 00:44:27,380 --> 00:44:30,590 I'll say any number between this range here, plus or minus 868 00:44:30,590 --> 00:44:35,000 let's say 1 times 10 to the negative 8, some small number here. 869 00:44:35,000 --> 00:44:37,760 Now, this number is our tolerance. 870 00:44:37,760 --> 00:44:40,310 And we can change it, if we wanted to too, depending 871 00:44:40,310 --> 00:44:44,390 on our needs for precision, or how we're presenting these numbers here. 872 00:44:44,390 --> 00:44:47,900 I'm gonna come back now to RStudio and show you that expect_equal actually 873 00:44:47,900 --> 00:44:52,140 has a parameter called tolerance that we could use to change this value here. 874 00:44:52,140 --> 00:44:53,330 I'll come back over. 875 00:44:53,330 --> 00:44:55,940 And let's look now at expect_equal. 
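The imprecision and the idea of tolerance can be sketched in a few lines of R. This is an illustrative sketch under stated assumptions: the average defined here is a simplified stand-in without input checks, the test name is hypothetical, and the 0.1 + 0.2 comparison is a well-known example added for illustration rather than one taken from the lecture.

```r
library(testthat)

# Simplified stand-in for the lecture's average function (no input checks)
average <- function(x) {
  sum(x) / length(x)
}

# Default printing rounds for display. Asking for 17 significant digits
# shows that the stored value is only an approximation of 0.3.
print(0.3)
print(0.3, digits = 17)

# A well-known case where an exact comparison fails (not from the lecture)
exactly_equal <- (0.1 + 0.2) == 0.3

# Tolerance by hand: accept anything within 1e-8 of the expected value
expected <- 0.3
tolerance <- 1e-8
within_tolerance <- abs((0.1 + 0.2) - expected) <= tolerance

test_that("`average` handles floating-point values", {
  # Relies on the default tolerance chosen by testthat
  expect_equal(average(c(0.1, 0.5)), 0.3)
  # Overrides that tolerance explicitly
  expect_equal(average(c(0.1, 0.5)), 0.3, tolerance = 1e-8)
})
```

Here exactly_equal is FALSE while within_tolerance is TRUE, which is the reason to compare floating-point results with a tolerance rather than with ==.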
876 00:44:55,940 --> 00:45:01,160 If I clear my console and go here, and say, in this particular test, 877 00:45:01,160 --> 00:45:02,840 I want to set some tolerance here. 878 00:45:02,840 --> 00:45:05,780 I could say as another parameter, tolerance, 879 00:45:05,780 --> 00:45:10,130 and set it equal to some small value, let's say 1 times 10 880 00:45:10,130 --> 00:45:14,810 to the negative 8, which I represent now as 1e negative 8 here. 881 00:45:14,810 --> 00:45:18,027 Now, it turns out that testthat gives you some default 882 00:45:18,027 --> 00:45:19,610 tolerance that's already been decided. 883 00:45:19,610 --> 00:45:22,652 So I'm actually going to rely on them to choose my tolerance for me here. 884 00:45:22,652 --> 00:45:24,980 But you could, if you wanted to, override it 885 00:45:24,980 --> 00:45:27,950 with this parameter called tolerance. 886 00:45:27,950 --> 00:45:31,030 So we've seen now this idea of floating-point imprecision, 887 00:45:31,030 --> 00:45:34,280 and this idea of tolerance, which we can use to better test our floating point 888 00:45:34,280 --> 00:45:36,410 values inside of our tests. 889 00:45:36,410 --> 00:45:40,610 Let me ask now what questions we have on either of these topics here. 890 00:45:40,610 --> 00:45:44,270 AUDIENCE: Why writing code spending much time to test another code, 891 00:45:44,270 --> 00:45:47,328 while you can go to the code that you wrote and just test it simply? 892 00:45:47,328 --> 00:45:48,870 CARTER ZENKE: A really good question. 893 00:45:48,870 --> 00:45:51,950 So maybe you're in the habit of kind of just testing your code in the console, 894 00:45:51,950 --> 00:45:53,658 for instance, kind of like I did earlier. 895 00:45:53,658 --> 00:45:56,900 If I come back over here, maybe I wrote the average function 896 00:45:56,900 --> 00:45:59,750 and I decided I'm pretty confident this will work. 897 00:45:59,750 --> 00:46:01,940 I'll assign it to be the average here.
898 00:46:01,940 --> 00:46:06,110 And I'll just run a few test cases, like average c, 1, 2, 3 here. 899 00:46:06,110 --> 00:46:07,220 OK, that seems to work. 900 00:46:07,220 --> 00:46:12,680 Maybe I'll do average, and then maybe c negative 1, negative 2, negative 3. 901 00:46:12,680 --> 00:46:14,460 That seems to work as well. 902 00:46:14,460 --> 00:46:17,820 Now, the reason you might not do this and instead spend 903 00:46:17,820 --> 00:46:22,320 more time to write your own test is simply just robustness of your tests. 904 00:46:22,320 --> 00:46:26,760 Notice, I can very quickly test for a lot of different scenarios in my tests 905 00:46:26,760 --> 00:46:30,720 here and make sure that my function works as it expects. 906 00:46:30,720 --> 00:46:34,500 It's also very useful if you're collaborating with others. 907 00:46:34,500 --> 00:46:36,570 Let's say you're both-- 908 00:46:36,570 --> 00:46:39,180 somebody else and you are both writing the average function. 909 00:46:39,180 --> 00:46:41,640 Well, you could collectively decide on what 910 00:46:41,640 --> 00:46:45,630 tests you will run to make sure that the average function is correct. 911 00:46:45,630 --> 00:46:49,680 And if somebody later were to go into the average function and make a change, 912 00:46:49,680 --> 00:46:52,260 they could test to make sure that their change did not 913 00:46:52,260 --> 00:46:53,950 break the code altogether. 914 00:46:53,950 --> 00:46:56,160 So this is a good way to standardize what it means 915 00:46:56,160 --> 00:46:58,390 for your code to be correct as well. 916 00:46:58,390 --> 00:47:03,540 But a good question on why would I even spend time writing tests like these. 917 00:47:03,540 --> 00:47:08,100 OK, so I think we've so far seen a lot of good tests for our code now. 
918 00:47:08,100 --> 00:47:10,800 When we come back, we'll see how to test not just 919 00:47:10,800 --> 00:47:14,500 numbers, like we did here, but also strings as well 920 00:47:14,500 --> 00:47:18,720 and focus on these two philosophies, one called test-driven development and one 921 00:47:18,720 --> 00:47:20,400 called behavior-driven development. 922 00:47:20,400 --> 00:47:22,290 We'll see you all in a few. 923 00:47:22,290 --> 00:47:23,550 Well, we're back. 924 00:47:23,550 --> 00:47:26,520 And so we're going to next focus on testing these return values that 925 00:47:26,520 --> 00:47:30,180 will be strings, as well as focus on a few philosophies of testing, 926 00:47:30,180 --> 00:47:33,510 namely test-driven development and behavior-driven development. 927 00:47:33,510 --> 00:47:35,880 Now, what is test-driven development? 928 00:47:35,880 --> 00:47:40,170 Well, it is an answer to when and how we should write our tests. 929 00:47:40,170 --> 00:47:42,390 And central to this philosophy is that tests 930 00:47:42,390 --> 00:47:44,910 should be at the heart of your development process. 931 00:47:44,910 --> 00:47:48,480 In fact, it even argues you should probably be writing tests 932 00:47:48,480 --> 00:47:50,700 before you write the code. 933 00:47:50,700 --> 00:47:53,730 Now, let's consider here, I want to write a function that 934 00:47:53,730 --> 00:47:56,670 says hello to a user, one like a greet. 935 00:47:56,670 --> 00:47:58,920 Well, to make that happen, I should probably 936 00:47:58,920 --> 00:48:03,130 first write the tests for that code and then write the code itself. 937 00:48:03,130 --> 00:48:06,540 So let's do just that now over in RStudio. 938 00:48:06,540 --> 00:48:09,300 Come back over here, and create a file that 939 00:48:09,300 --> 00:48:13,590 will test this function called greet that doesn't yet exist. 940 00:48:13,590 --> 00:48:15,840 I'll do file.create. 
941 00:48:15,840 --> 00:48:20,670 And then I'll say I want to create this file called test-greet.R. 942 00:48:20,670 --> 00:48:22,870 And I'll say-- it was created here-- 943 00:48:22,870 --> 00:48:26,490 I'll go ahead and go to my File Explorer and open up test-greet.R. 944 00:48:26,490 --> 00:48:30,870 And now I could start defining some tests to test this code that 945 00:48:30,870 --> 00:48:32,850 doesn't even exist yet. 946 00:48:32,850 --> 00:48:34,870 But why would I do that? 947 00:48:34,870 --> 00:48:39,300 Well, by writing tests, I make it much clearer to me and to others 948 00:48:39,300 --> 00:48:42,030 what it is I want this code to do. 949 00:48:42,030 --> 00:48:45,000 And once I have in mind what I want the code to do, 950 00:48:45,000 --> 00:48:48,580 I'm better able to write that code itself. 951 00:48:48,580 --> 00:48:52,440 So the very first thing I probably want to test here is what that test-- 952 00:48:52,440 --> 00:48:58,140 the greet function that it can say, let's say, hello to a user, 953 00:48:58,140 --> 00:48:59,070 just like this. 954 00:48:59,070 --> 00:49:02,310 That's the core part of this greet function I'm going to write, 955 00:49:02,310 --> 00:49:04,740 that it says hello to a user. 956 00:49:04,740 --> 00:49:08,400 And now, I could define some test cases to make sure that this is the reality. 957 00:49:08,400 --> 00:49:10,440 Why don't I go ahead and define at least one? 958 00:49:10,440 --> 00:49:15,570 And I'll say I'm going to expect that if I were to run greet and give it 959 00:49:15,570 --> 00:49:19,320 as input, Carter, I would get back as the return value 960 00:49:19,320 --> 00:49:22,870 now hello, comma, space, Carter. 961 00:49:22,870 --> 00:49:27,510 So here is my very first test and test case for this function greet 962 00:49:27,510 --> 00:49:29,460 that doesn't yet exist. 
963 00:49:29,460 --> 00:49:33,090 I'm going to say that I want to be able to use greet in a way 964 00:49:33,090 --> 00:49:37,260 that it passes in a user's name and returns to me then the user's name, 965 00:49:37,260 --> 00:49:40,530 but with a prefix of hello, comma, space. 966 00:49:40,530 --> 00:49:42,750 So that is our very first test. 967 00:49:42,750 --> 00:49:46,380 Let me go ahead and make sure I include this eventual file 968 00:49:46,380 --> 00:49:51,300 that I will create called greet.R, in which I'll define greet itself. 969 00:49:51,300 --> 00:49:55,050 And once I've done this, well, I could probably start developing. 970 00:49:55,050 --> 00:49:57,923 Let me go ahead and now go back to my File Explorer. 971 00:49:57,923 --> 00:50:00,840 And I could either create a new file by hitting this plus button here. 972 00:50:00,840 --> 00:50:03,600 Or I could go ahead and do file.create. 973 00:50:03,600 --> 00:50:08,370 I'll use greet.R. And then I'll go ahead and go to File Explorer 974 00:50:08,370 --> 00:50:12,210 here and open up greet, this blank canvas for me here. 975 00:50:12,210 --> 00:50:14,880 And now, I could define my greet function 976 00:50:14,880 --> 00:50:18,950 to do exactly what I see it should do in my test. 977 00:50:18,950 --> 00:50:21,740 Well, I'll say I have this function here called greet. 978 00:50:21,740 --> 00:50:24,620 And I'll define it as a function that takes some input. 979 00:50:24,620 --> 00:50:28,100 Maybe in this case, the input is called to-- because we're going 980 00:50:28,100 --> 00:50:30,740 to say hello to someone in this case. 981 00:50:30,740 --> 00:50:34,730 I'll go ahead and say that this function should return some value. 982 00:50:34,730 --> 00:50:38,810 And we've seen this function called paste before that concatenates strings. 983 00:50:38,810 --> 00:50:40,610 I bet that's what we might need here. 984 00:50:40,610 --> 00:50:42,750 I'll use paste just like this.
985 00:50:42,750 --> 00:50:46,280 And I'll paste together hello, comma, and then 986 00:50:46,280 --> 00:50:50,600 whatever value is supplied as input to this function under the argument 987 00:50:50,600 --> 00:50:52,280 or parameter to. 988 00:50:52,280 --> 00:50:56,360 And notice how paste here will take care of the space between the comma 989 00:50:56,360 --> 00:50:59,090 and whoever we're saying hello to. 990 00:50:59,090 --> 00:51:03,800 So now, thanks to my test, I have a very clear idea of what this code should do. 991 00:51:03,800 --> 00:51:08,330 And if I were to run this now, test-greet.R, 992 00:51:08,330 --> 00:51:10,850 we'll see that the test passed. 993 00:51:10,850 --> 00:51:13,970 So it seems like now, greet is working for me. 994 00:51:13,970 --> 00:51:16,130 I could go ahead and add more test cases. 995 00:51:16,130 --> 00:51:18,270 In fact, this is an iterative process. 996 00:51:18,270 --> 00:51:21,390 I might write some tests, write some code, write some tests, 997 00:51:21,390 --> 00:51:22,230 write some code. 998 00:51:22,230 --> 00:51:24,000 Here now, I could test other names. 999 00:51:24,000 --> 00:51:29,910 Maybe I'll say expect_equal, I'll greet Mario, and hope to see hello, Mario. 1000 00:51:29,910 --> 00:51:32,380 I'll do maybe Peach as well. 1001 00:51:32,380 --> 00:51:33,780 And hello to Peach. 1002 00:51:33,780 --> 00:51:35,610 I'm just choosing some representative names 1003 00:51:35,610 --> 00:51:38,910 I might get now and pass into this greet function. 1004 00:51:38,910 --> 00:51:41,350 Let's do Bowser as well. 1005 00:51:41,350 --> 00:51:46,170 So now, with these expanded test cases, I'll go ahead and test my code again. 1006 00:51:46,170 --> 00:51:49,110 And I'll see that it still seems to be working. 
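The test-first workflow just described can be sketched as follows. This is a minimal sketch under stated assumptions: the test name is a guess at the wording used on screen, and greet is defined inline here, whereas in the lecture it lives in greet.R and the test file loads it from there.

```r
library(testthat)

# In the lecture this definition lives in greet.R. It is inlined here so
# the example runs on its own.
greet <- function(to) {
  # paste() joins its arguments with a single space by default,
  # which supplies the space after the comma.
  return(paste("hello,", to))
}

# In test-driven development these expectations are written first,
# before greet exists, and greet is then written to satisfy them.
test_that("`greet` says hello to a user", {
  expect_equal(greet("Carter"), "hello, Carter")
  expect_equal(greet("Mario"), "hello, Mario")
  expect_equal(greet("Peach"), "hello, Peach")
  expect_equal(greet("Bowser"), "hello, Bowser")
})
```

Because the expectations pin down the exact string, any later change to greet that alters the punctuation or spacing would be caught the next time the tests run.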
1007 00:51:49,110 --> 00:51:52,740 And now, if I were to modify greet in my greet.R file, 1008 00:51:52,740 --> 00:51:57,030 I could very quickly test to make sure I didn't break it with any changes 1009 00:51:57,030 --> 00:51:58,950 that I had made. 1010 00:51:58,950 --> 00:52:03,780 So this is one philosophy of development, test-driven development. 1011 00:52:03,780 --> 00:52:07,440 But there is a related philosophy that is still interesting to learn about, 1012 00:52:07,440 --> 00:52:10,290 one called behavior-driven development. 1013 00:52:10,290 --> 00:52:14,190 So test-driven development focuses on designing these test cases 1014 00:52:14,190 --> 00:52:16,320 for our code, giving it representative cases 1015 00:52:16,320 --> 00:52:20,130 and seeing if it actually follows through on the expected values. 1016 00:52:20,130 --> 00:52:24,210 Behavior-driven development is slightly different in that it requires 1017 00:52:24,210 --> 00:52:27,300 us to first define what it is we want our function 1018 00:52:27,300 --> 00:52:30,420 to do and describe its behavior. 1019 00:52:30,420 --> 00:52:34,347 Now, testthat allows us to use behavior-driven development 1020 00:52:34,347 --> 00:52:36,180 and actually gives us a few functions we can 1021 00:52:36,180 --> 00:52:38,490 use that kind of operate a bit like an English language 1022 00:52:38,490 --> 00:52:42,063 to define what it is our function behavior should be. 1023 00:52:42,063 --> 00:52:43,230 Let me show you what I mean. 1024 00:52:43,230 --> 00:52:44,560 So I'll come back over here. 1025 00:52:44,560 --> 00:52:47,100 And the two functions we have in testthat 1026 00:52:47,100 --> 00:52:53,310 to engage in behavior-driven development are these, describe and it. 1027 00:52:53,310 --> 00:52:57,870 Where describe is a way of describing what it is we want our function to do. 
1028 00:52:57,870 --> 00:53:01,320 And it is a way of saying that our function should do something 1029 00:53:01,320 --> 00:53:02,040 in particular. 1030 00:53:02,040 --> 00:53:06,330 So let's go ahead and switch now to this philosophy of testing. 1031 00:53:06,330 --> 00:53:08,400 I'll avoid using test_that. 1032 00:53:08,400 --> 00:53:10,800 And now, I'll start using describe. 1033 00:53:10,800 --> 00:53:14,800 So in particular, describe lets me do something like this. 1034 00:53:14,800 --> 00:53:18,840 I want to describe now my greet function. 1035 00:53:18,840 --> 00:53:23,850 And by convention, stylistically, I'll include these empty curly braces here. 1036 00:53:23,850 --> 00:53:27,990 And now I can say inside these curly braces, 1037 00:53:27,990 --> 00:53:30,840 what it is I want my function to do. 1038 00:53:30,840 --> 00:53:33,960 Inside these curly braces, I'll describe what my function now 1039 00:53:33,960 --> 00:53:36,900 should do using this it function. 1040 00:53:36,900 --> 00:53:41,910 I could say that it, in this case, can say hello to a user. 1041 00:53:41,910 --> 00:53:45,060 And as a second argument now to it, I'll provide 1042 00:53:45,060 --> 00:53:50,440 some test cases that show examples of it saying hello to the user, namely this. 1043 00:53:50,440 --> 00:53:54,930 I could say maybe I have this object called name, 1044 00:53:54,930 --> 00:53:56,850 I'll set it equal to Carter. 1045 00:53:56,850 --> 00:54:02,280 And I'll expect that when I run greet and pass as input name, 1046 00:54:02,280 --> 00:54:04,500 I should see hello, Carter. 1047 00:54:04,500 --> 00:54:06,150 Similar now to before. 1048 00:54:06,150 --> 00:54:08,130 But notice what it is we've done. 1049 00:54:08,130 --> 00:54:10,410 We've kind of used a bit of English language 1050 00:54:10,410 --> 00:54:12,060 in the form of these functions. 1051 00:54:12,060 --> 00:54:15,210 Here, I'm going to describe my greet function.
1052 00:54:15,210 --> 00:54:16,410 Well, what should it do? 1053 00:54:16,410 --> 00:54:18,450 It can say hello to a user. 1054 00:54:18,450 --> 00:54:21,280 And here's an example of it doing just that. 1055 00:54:21,280 --> 00:54:22,620 So pretty cool. 1056 00:54:22,620 --> 00:54:26,550 Let me go ahead and now run test-greet, using this other philosophy here, 1057 00:54:26,550 --> 00:54:31,620 test-greet.R. And I'll see that test has passed. 1058 00:54:31,620 --> 00:54:33,630 What else could our function do? 1059 00:54:33,630 --> 00:54:36,060 What kind of behavior do we want it to exhibit? 1060 00:54:36,060 --> 00:54:40,350 Well, maybe I could also say that it, describing greet now, 1061 00:54:40,350 --> 00:54:44,350 can say hello to the world, just like this. 1062 00:54:44,350 --> 00:54:47,520 And now, provide an example of it saying hello to the world. 1063 00:54:47,520 --> 00:54:53,490 I'll expect_equal that when I run greet, without any input at all, 1064 00:54:53,490 --> 00:54:55,350 I'll say hello to the world. 1065 00:54:55,350 --> 00:54:56,130 I get that back-- 1066 00:54:56,130 --> 00:54:59,340 I'll get that back out as a return value now. 1067 00:54:59,340 --> 00:55:03,690 So here, we see a fuller version of a description of greet. 1068 00:55:03,690 --> 00:55:05,730 First, it can say hello to a user. 1069 00:55:05,730 --> 00:55:07,920 And it can say hello to the world. 1070 00:55:07,920 --> 00:55:12,420 And by convention, I'm using this can say, can say here 1071 00:55:12,420 --> 00:55:16,170 because when we use it, it reads more like English to say it can do this, 1072 00:55:16,170 --> 00:55:17,140 it can do that. 1073 00:55:17,140 --> 00:55:19,300 So by convention here, we're just using these-- 1074 00:55:19,300 --> 00:55:20,230 that grammar here. 1075 00:55:20,230 --> 00:55:24,320 But I could use any kind of text inside this it function here. 1076 00:55:24,320 --> 00:55:25,820 Let me go ahead and try to run this.
1077 00:55:25,820 --> 00:55:28,810 I'll say source test-greet.R, and we get-- 1078 00:55:28,810 --> 00:55:34,090 oop-- seems like, if I scroll up now, that one of our tests has failed. 1079 00:55:34,090 --> 00:55:37,210 So here we see error in greet. 1080 00:55:37,210 --> 00:55:38,740 Greet can say hello to the world. 1081 00:55:38,740 --> 00:55:41,800 But we actually get back an error, and not, in this case, hello, world. 1082 00:55:41,800 --> 00:55:48,010 Error in greet argument to is missing with no default. 1083 00:55:48,010 --> 00:55:52,600 So let's look now at greet.R and see what could have happened here. 1084 00:55:52,600 --> 00:55:56,230 I ran greet with no input. 1085 00:55:56,230 --> 00:55:59,422 And if I go look at greet itself, well, now it 1086 00:55:59,422 --> 00:56:01,630 kind of makes sense because I didn't supply a default 1087 00:56:01,630 --> 00:56:04,630 value for to if none is supplied. 1088 00:56:04,630 --> 00:56:05,590 So what should I do? 1089 00:56:05,590 --> 00:56:07,360 Maybe supply a default value here? 1090 00:56:07,360 --> 00:56:11,500 I could go ahead and say that to has a default value of world. 1091 00:56:11,500 --> 00:56:15,100 And fix this code after I have described it how-- 1092 00:56:15,100 --> 00:56:17,230 described how it should work already. 1093 00:56:17,230 --> 00:56:19,270 Let me go ahead and now rerun these tests 1094 00:56:19,270 --> 00:56:21,610 with this updated version of greet. 1095 00:56:21,610 --> 00:56:24,370 And we'll see that both of my tests have passed. 1096 00:56:24,370 --> 00:56:26,290 That greet can say hello to a user. 1097 00:56:26,290 --> 00:56:29,840 And it can say hello to the world. 1098 00:56:29,840 --> 00:56:33,520 So we've seen now how to test these strings for equality, 1099 00:56:33,520 --> 00:56:36,910 how to use test-driven development and behavior-driven development. 
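The behavior-driven version of these tests, together with the fix to greet, can be sketched like this. It is a minimal sketch, assuming greet is defined inline rather than loaded from greet.R; the describe label and the two it descriptions follow the wording spoken in the lecture.

```r
library(testthat)

# greet with a default value for `to`, so that calling greet() with no
# input no longer fails with a "missing, with no default" error.
greet <- function(to = "world") {
  return(paste("hello,", to))
}

# describe() names the thing under test. Each it() states one behavior
# and holds the expectations that demonstrate it.
describe("greet()", {
  it("can say hello to a user", {
    name <- "Carter"
    expect_equal(greet(name), "hello, Carter")
  })

  it("can say hello to the world", {
    expect_equal(greet(), "hello, world")
  })
})
```

Read aloud, the structure forms sentences such as "greet can say hello to the world", which is the point of the describe and it grammar.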
1100 00:56:36,910 --> 00:56:39,730 So what questions do we have on test-driven development 1101 00:56:39,730 --> 00:56:43,630 or behavior-driven development? 1102 00:56:43,630 --> 00:56:46,480 Seeing none, so let's focus on last-- one last topic for today, 1103 00:56:46,480 --> 00:56:48,670 one called test coverage. 1104 00:56:48,670 --> 00:56:51,580 Now, the goal of today has been to help you write tests 1105 00:56:51,580 --> 00:56:53,560 that systematically test your code. 1106 00:56:53,560 --> 00:56:57,640 And one measure you can use to figure out how much of the code you're testing 1107 00:56:57,640 --> 00:56:59,770 is one called test coverage. 1108 00:56:59,770 --> 00:57:03,310 When you have programs that are composed of not just one function or two 1109 00:57:03,310 --> 00:57:06,730 but many, test coverage can tell you how many of those functions 1110 00:57:06,730 --> 00:57:08,770 you've tested reliably. 1111 00:57:08,770 --> 00:57:13,030 Now, we've seen today how to spot errors in our programs, 1112 00:57:13,030 --> 00:57:16,840 how to handle those errors, and how to write tests to test our code 1113 00:57:16,840 --> 00:57:18,952 to ensure it behaves as we intend. 1114 00:57:18,952 --> 00:57:20,660 When we come back, we'll go ahead and see 1115 00:57:20,660 --> 00:57:23,660 how we can package our code up and share it with the world. 1116 00:57:23,660 --> 00:57:25,750 We'll see you then. 1117 00:57:25,750 --> 00:57:27,000