WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:01.464 --> 00:00:04.392
[MUSIC PLAYING]

00:00:19.628 --> 00:00:21.920
CARTER ZENKE: Well hello, one and all, and welcome back

00:00:21.920 --> 00:00:25.850
to CS50's Introduction to Programming with R. My name is Carter Zenke,

00:00:25.850 --> 00:00:28.632
and this is our lecture on applying functions.

00:00:28.632 --> 00:00:30.590
We'll take a look at functions-- in particular,

00:00:30.590 --> 00:00:32.810
writing some of our very own-- and we'll also

00:00:32.810 --> 00:00:36.680
learn about loops, how to repeat certain segments of code over time.

00:00:36.680 --> 00:00:38.840
Towards the end of lecture, we'll combine

00:00:38.840 --> 00:00:41.150
these two ideas to dip our toes into this thing called

00:00:41.150 --> 00:00:42.740
functional programming.

00:00:42.740 --> 00:00:44.270
So let's begin.

00:00:44.270 --> 00:00:47.300
Let's start by actually looking at another program we had from before,

00:00:47.300 --> 00:00:51.230
one called count.R. And if you remember, this program was

00:00:51.230 --> 00:00:54.980
designed to count up some number of votes we typed in in the console.

00:00:54.980 --> 00:00:58.070
So if I click source here, I can run this program.

00:00:58.070 --> 00:01:02.990
And I'll enter 100 votes for Mario, let's say, and 150 for Peach,

00:01:02.990 --> 00:01:04.670
and 120 for Bowser.

00:01:04.670 --> 00:01:08.900
And I'll see that, all in all, I typed in 370 votes.

00:01:08.900 --> 00:01:11.420
So this is the same program from before.

00:01:11.420 --> 00:01:13.490
And I'd argue this program is correct--

00:01:13.490 --> 00:01:14.900
it works just fine--

00:01:14.900 --> 00:01:17.600
but there's an opportunity for improved design here.

00:01:17.600 --> 00:01:20.370
I could write this code a little bit better.
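For reference, here is a minimal sketch of what count.R plausibly looks like before any refactoring. The transcript never shows the file, so the object names (mario, peach, bowser, total), the prompt strings, and the output line are assumptions; only the pattern of three read-and-convert lines followed by a sum comes from the description.

```r
# Sketch of count.R before refactoring. Object names, prompts, and the
# output format are assumptions; the transcript only describes the pattern.

# Lines 1-3 repeat the same steps: prompt, read a line, convert to integer.
mario <- as.integer(readline("Mario: "))
peach <- as.integer(readline("Peach: "))
bowser <- as.integer(readline("Bowser: "))

# Add the three counts and report the total.
total <- sum(mario, peach, bowser)
cat("Total votes:", total, "\n")
```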

00:01:20.370 --> 00:01:24.270
And one thing I noticed, in particular, is that on lines one, two, and three,

00:01:24.270 --> 00:01:27.840
I'm repeating some functionality over and over and over again.

00:01:27.840 --> 00:01:31.770
I'm asking the user for some input and converting that to an integer.

00:01:31.770 --> 00:01:35.040
I'm doing the same thing on line two, and the same thing on line three.

00:01:35.040 --> 00:01:38.220
And when you find yourself repeating this kind of functionality

00:01:38.220 --> 00:01:40.890
over and over and over again, it's a good clue

00:01:40.890 --> 00:01:44.340
that defining your own function might help you.

00:01:44.340 --> 00:01:47.850
So we want to define a function here, but how do we do so?

00:01:47.850 --> 00:01:51.660
We've certainly used functions before, but to create a function

00:01:51.660 --> 00:01:53.490
is something else entirely.

00:01:53.490 --> 00:01:55.830
Well, to create a function in R, we're going to use

00:01:55.830 --> 00:01:59.700
this keyword called function, followed by some parentheses.

00:01:59.700 --> 00:02:04.050
Now, you might also think, well, I want to give this function some kind of name

00:02:04.050 --> 00:02:06.040
that I could reuse throughout my program.

00:02:06.040 --> 00:02:09.330
So to give a function a name, I could use this syntax here.

00:02:09.330 --> 00:02:12.790
Maybe get_votes is assigned this function here.

00:02:12.790 --> 00:02:15.600
So get_votes will now be the name for my function,

00:02:15.600 --> 00:02:20.360
because I want this function to, let's say, get some votes from the user.

00:02:20.360 --> 00:02:21.880
Now, this is pretty good.

00:02:21.880 --> 00:02:24.980
We have a name for our function, but what we also probably want

00:02:24.980 --> 00:02:27.800
is the ability for our function to take some input, some arguments,

00:02:27.800 --> 00:02:29.820
and run with those arguments.

00:02:29.820 --> 00:02:32.210
So if I wanted to change how my function runs,

00:02:32.210 --> 00:02:36.140
I could supply parameters inside of these parentheses here.

00:02:36.140 --> 00:02:40.310
I could supply any number of them, separated by commas.

00:02:40.310 --> 00:02:43.700
But then, even with a name and some parameters,

00:02:43.700 --> 00:02:45.630
our function needs to do something.

00:02:45.630 --> 00:02:47.630
And to define what our function actually does,

00:02:47.630 --> 00:02:49.700
we'll use these curly braces here, in which

00:02:49.700 --> 00:02:53.510
we can define our function's body, that is, the lines of code that

00:02:53.510 --> 00:02:56.120
will run when our function is run.

00:02:56.120 --> 00:02:58.370
And down below, towards the end of these curly braces,

00:02:58.370 --> 00:03:01.820
we'll say what our function should return to us, the programmer,

00:03:01.820 --> 00:03:03.590
after it finishes running.

00:03:03.590 --> 00:03:06.980
So with this syntax here, let's actually go ahead and define

00:03:06.980 --> 00:03:12.560
our very own get_votes function that can ask the user for some number of votes.

00:03:12.560 --> 00:03:17.510
Let's come back to RStudio here and write this function called get_votes.

00:03:17.510 --> 00:03:19.760
I want it to have the same functionality, essentially,

00:03:19.760 --> 00:03:23.590
as what I'm doing on lines one, two, and three here, but to define it,

00:03:23.590 --> 00:03:26.290
I'll first do this at the top of my program here.

00:03:26.290 --> 00:03:28.560
So I define it, first and foremost, and then I

00:03:28.560 --> 00:03:31.090
can use it later on in my program.

00:03:31.090 --> 00:03:34.440
So I'll type, like we saw, get_votes.

00:03:34.440 --> 00:03:36.510
That's the name of this function.

00:03:36.510 --> 00:03:40.110
And I'll assign it to be some new function here,

00:03:40.110 --> 00:03:43.150
and I'll provide a function body just like this.
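Putting those pieces together, the general shape of a function definition in R, and the still-empty get_votes arrived at here, can be sketched as follows. One detail the transcript doesn't mention: a function with an empty body still runs, and calling it returns NULL.

```r
# General shape:
#   name <- function(parameter1, parameter2) {
#     body: the code that runs on each call
#   }

# The first version of get_votes: a name, no parameters, and an empty body.
get_votes <- function() {

}

# Calling it is legal; an empty body evaluates to NULL.
print(get_votes())
```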

00:03:43.150 --> 00:03:47.190
So now I have my very first function in R, albeit an empty one, right?

00:03:47.190 --> 00:03:51.340
So I want this function, again, to get some votes from the user.

00:03:51.340 --> 00:03:55.320
So inside these curly braces, inside the function's body,

00:03:55.320 --> 00:03:58.440
I should define what code I want to run when I call

00:03:58.440 --> 00:04:01.620
or when I use this function later on in my program.

00:04:01.620 --> 00:04:04.740
Well, what I want to do is very similar to what's now on lines five, six,

00:04:04.740 --> 00:04:05.340
and seven.

00:04:05.340 --> 00:04:09.750
I want to ask the user for an integer, prompting them with something

00:04:09.750 --> 00:04:13.200
like, let's say, just enter votes for now, just like that.

00:04:13.200 --> 00:04:17.490
And I'll assign the result to this object called votes that is now

00:04:17.490 --> 00:04:20.430
visible inside of this function for me.

00:04:20.430 --> 00:04:22.980
And once I get those votes from the user,

00:04:22.980 --> 00:04:25.620
well, I want get_votes to return them to me, the programmer,

00:04:25.620 --> 00:04:29.890
so I could assign them to some objects, like Mario, Peach, or Bowser down below.

00:04:29.890 --> 00:04:34.650
So to return some value from a function, I can use the return function

00:04:34.650 --> 00:04:39.270
and, inside parentheses, say which object's value I want to return.

00:04:39.270 --> 00:04:42.300
So here, in total, is now my function.

00:04:42.300 --> 00:04:45.300
On line two, I'm asking the user to enter some votes,

00:04:45.300 --> 00:04:47.280
converting that to an integer, and storing it

00:04:47.280 --> 00:04:49.770
in this object called votes inside of this function.

00:04:49.770 --> 00:04:52.950
And then, on line three, I'm returning the value of votes

00:04:52.950 --> 00:04:55.990
so I can reuse it later on in my program.

00:04:55.990 --> 00:04:59.760
So now I think I've implemented lines six, seven, and eight here

00:04:59.760 --> 00:05:01.440
as a separate function.

00:05:01.440 --> 00:05:03.510
I should feel free to use that function now.

00:05:03.510 --> 00:05:07.560
On line six, I'll no longer use as.integer and readline.

00:05:07.560 --> 00:05:12.480
I'll instead use get_votes, and call it using these parentheses here.

00:05:12.480 --> 00:05:14.640
I'll do the same for line seven.

00:05:14.640 --> 00:05:16.140
get_votes.

00:05:16.140 --> 00:05:18.340
And line eight as well.

00:05:18.340 --> 00:05:22.520
And now, before I run this program again, let's walk through, top to bottom,

00:05:22.520 --> 00:05:23.780
what I've just done.

00:05:23.780 --> 00:05:27.620
On line one, I have defined this function called get_votes.

00:05:27.620 --> 00:05:32.210
I've told R exactly what inputs it takes-- in this case, none--

00:05:32.210 --> 00:05:35.810
and I've told R exactly what it should do when I call this function.

00:05:35.810 --> 00:05:37.790
First line two, then line three.

00:05:37.790 --> 00:05:40.290
And I've given it a name, get_votes here.

00:05:40.290 --> 00:05:43.880
So, now, on lines six, seven, and eight, I can call get_votes.

00:05:43.880 --> 00:05:47.660
And every time I do, R will effectively go back to these lines two and three,

00:05:47.660 --> 00:05:50.120
run those, and return to me the value that I've

00:05:50.120 --> 00:05:53.750
asked it to return for each of Mario, Peach, and Bowser.

00:05:53.750 --> 00:05:55.650
So now let's run our program.

00:05:55.650 --> 00:05:58.430
I'll click source here, and enter some number of votes.

00:05:58.430 --> 00:06:02.060
I think we had 100, first, for Mario, so I'll say 100.

00:06:02.060 --> 00:06:05.960
And then 150 for Peach, and 120 for Bowser.

00:06:05.960 --> 00:06:10.170
And now I see the same functionality, but now in my own function.
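Here is a sketch of the program at this stage: get_votes is defined once at the top and then called three times. The prompt text "Enter votes: ", the object names, and the final cat() line are assumptions about the lecture's file, not a copy of it.

```r
# Defined once, at the top of the program.
get_votes <- function() {
  votes <- as.integer(readline("Enter votes: "))  # read and convert
  return(votes)                                   # hand the value back
}

# Each call re-runs the body and returns a fresh value.
mario <- get_votes()
peach <- get_votes()
bowser <- get_votes()

total <- sum(mario, peach, bowser)
cat("Total votes:", total, "\n")
```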

00:06:10.170 --> 00:06:14.030
So, congratulations, this is your first function in R.

00:06:14.030 --> 00:06:19.380
But in doing this, I think I've actually lost some functionality.

00:06:19.380 --> 00:06:23.400
If I run this program again, I'll see enter votes.

00:06:23.400 --> 00:06:26.880
And before, we had this nice ability to prompt

00:06:26.880 --> 00:06:29.580
the user with whatever particular prompt we wanted.

00:06:29.580 --> 00:06:31.680
So how could we get that back?

00:06:31.680 --> 00:06:34.680
Well, one thing we could define now is a parameter

00:06:34.680 --> 00:06:38.040
to this function, some input that changes how it runs.

00:06:38.040 --> 00:06:42.000
And as you saw, we can define those parameters inside these parentheses

00:06:42.000 --> 00:06:44.250
here, after our function keyword.

00:06:44.250 --> 00:06:47.010
So one thing I want to do is be able to change

00:06:47.010 --> 00:06:49.200
the prompt that I prompt the user with.

00:06:49.200 --> 00:06:52.300
So I'll call this parameter prompt, just like this.

00:06:52.300 --> 00:06:56.310
And now my function can take this input called prompt.

00:06:56.310 --> 00:07:01.320
Well, when I do that, maybe I also want to use that particular prompt the user

00:07:01.320 --> 00:07:02.040
has entered--

00:07:02.040 --> 00:07:04.040
that I, the programmer, have entered as the input

00:07:04.040 --> 00:07:06.760
to get_votes-- and prompt the user with that instead.

00:07:06.760 --> 00:07:10.020
So now inside this function, I have access to this parameter,

00:07:10.020 --> 00:07:14.550
this argument called prompt, that I can then use to prompt the user on line two

00:07:14.550 --> 00:07:15.280
here.

00:07:15.280 --> 00:07:17.140
So let's try this.

00:07:17.140 --> 00:07:21.150
I'll now give as input to this function Mario, like we had before,

00:07:21.150 --> 00:07:25.710
and Peach, like we had before, and Bowser, like we had before.
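With the new parameter, get_votes might look like the sketch below: each call passes a different string, which readline() then displays. Only the parameter name prompt comes from the lecture; the exact prompt strings are assumptions.

```r
# prompt is a parameter: each caller supplies the text to show the user.
get_votes <- function(prompt) {
  votes <- as.integer(readline(prompt))
  return(votes)
}

# The argument "Mario: " becomes the value of prompt for that call, and so on.
mario <- get_votes("Mario: ")
peach <- get_votes("Peach: ")
bowser <- get_votes("Bowser: ")
```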

00:07:25.710 --> 00:07:28.980
Each time I call this function, I'm setting this character string,

00:07:28.980 --> 00:07:33.300
Mario, as the value of prompt, and then prompting the user with that prompt.

00:07:33.300 --> 00:07:34.600
Let's see what happens.

00:07:34.600 --> 00:07:35.910
I'll click on source.

00:07:35.910 --> 00:07:38.790
And now, instead of enter votes, I'll see Mario.

00:07:38.790 --> 00:07:44.820
I'll type in 100, 150, and 120, and I'll get back the same result.

00:07:44.820 --> 00:07:46.110
So pretty handy here.

00:07:46.110 --> 00:07:50.460
We're able to define parameters for our functions and pass different values each time we call them.

00:07:50.460 --> 00:07:54.570
Now, one more improvement we could make: on line three,

00:07:54.570 --> 00:07:59.520
I'm explicitly returning votes from this function, but it turns out that in R,

00:07:59.520 --> 00:08:02.550
by default, the last computed value--

00:08:02.550 --> 00:08:03.780
in this case votes--

00:08:03.780 --> 00:08:06.240
will be returned automatically for me.

00:08:06.240 --> 00:08:10.380
So on line three, I actually don't need to explicitly say return votes.

00:08:10.380 --> 00:08:15.210
I could omit that, and R will, by default, return the last computed value

00:08:15.210 --> 00:08:18.060
inside this function, which is votes.

00:08:18.060 --> 00:08:23.220
So stylistically, we often want to avoid typing return when R, by default,

00:08:23.220 --> 00:08:26.610
already does that for us with the last computed value here.

00:08:26.610 --> 00:08:27.960
Let me run this program again.

00:08:27.960 --> 00:08:31.890
I'll click source, I'll type 100 votes for Mario, 150 for Peach,

00:08:31.890 --> 00:08:34.320
and 120 for Bowser, and now we're back in business.

00:08:34.320 --> 00:08:38.280
We have 370 votes in total.
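A sketch of the implicit-return version follows. Because get_votes waits on the user, the block also includes a small hypothetical function, double_it, that shows the same rule with no input required: the value of the last expression evaluated becomes the return value.

```r
# No return() needed: votes is the last expression, so its value is returned.
get_votes <- function(prompt) {
  votes <- as.integer(readline(prompt))
  votes
}

# Hypothetical helper illustrating the same rule without user input.
double_it <- function(x) {
  doubled <- x * 2
  doubled  # last value computed, so it is the return value
}

print(double_it(21))  # [1] 42
```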

00:08:38.280 --> 00:08:42.630
Now, let's say I get a little bit lazy as a programmer,

00:08:42.630 --> 00:08:47.520
and maybe sometimes I forget to enter some value for this parameter we

00:08:47.520 --> 00:08:49.380
defined called prompt, right?

00:08:49.380 --> 00:08:53.910
I could go back to what we had before, calling this function without any input.

00:08:53.910 --> 00:08:59.040
But now, because my function is defined as taking this parameter called prompt,

00:08:59.040 --> 00:09:00.390
I might run into some trouble.

00:09:00.390 --> 00:09:04.590
If I click source here, I'll see that I get an error.

00:09:04.590 --> 00:09:05.880
Error in get_votes:

00:09:05.880 --> 00:09:09.510
argument "prompt" is missing, with no default.

00:09:09.510 --> 00:09:13.950
So if you're defining a function and you want the user or the programmer

00:09:13.950 --> 00:09:17.130
to be required to supply some argument to that function,

00:09:17.130 --> 00:09:19.860
well, you're going to need to do exactly what we did here,

00:09:19.860 --> 00:09:22.920
where if I define this parameter and don't give it a default,

00:09:22.920 --> 00:09:26.130
the user or the programmer must supply some value for it.

00:09:26.130 --> 00:09:30.390
But if I'm being a little bit nice and I want to catch somebody doing this

00:09:30.390 --> 00:09:33.270
and provide them with some default value, I could do that too.

00:09:33.270 --> 00:09:38.100
Up here, after function, I could say prompt and give it some default value.

00:09:38.100 --> 00:09:42.430
Maybe set it equal to, initially, enter votes, just like this.

00:09:42.430 --> 00:09:46.620
So now if I or somebody else doesn't provide some input,

00:09:46.620 --> 00:09:50.130
well, the default value for prompt will be enter votes.

00:09:50.130 --> 00:09:51.100
Let me try this.

00:09:51.100 --> 00:09:54.450
I'll click on source, and now I'll see no error.

00:09:54.450 --> 00:09:56.460
I instead see enter votes.
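Both behaviors can be checked without typing anything. In the sketch below, a copy of the function with no default (the hypothetical get_votes_no_default) is called with no argument, and tryCatch() captures the resulting error message; the second definition adds the default. The default string "Enter votes: " is an assumption about the lecture's exact text.

```r
# No default: calling with no argument fails once readline() needs prompt.
get_votes_no_default <- function(prompt) {
  as.integer(readline(prompt))
}
message_text <- tryCatch(
  get_votes_no_default(),
  error = function(e) conditionMessage(e)
)
print(message_text)  # says that argument "prompt" is missing, with no default

# With a default, prompt falls back to "Enter votes: " when omitted.
get_votes <- function(prompt = "Enter votes: ") {
  votes <- as.integer(readline(prompt))
  votes
}
print(formals(get_votes)$prompt)
```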

00:09:56.460 --> 00:10:00.360
So even though we didn't supply any input to get_votes here,

00:10:00.360 --> 00:10:03.790
we defined a default value that is used instead.

00:10:03.790 --> 00:10:05.880
So prompt, when no value is supplied,

00:10:05.880 --> 00:10:09.940
is now going to be equal to enter votes, as we've seen down below.

00:10:09.940 --> 00:10:11.460
So, pretty good.

00:10:11.460 --> 00:10:14.498
If I want to override this, as I might often want to do,

00:10:14.498 --> 00:10:15.540
I could do it as follows.

00:10:15.540 --> 00:10:19.210
I could go back to what we did before and type in some input, like Mario

00:10:19.210 --> 00:10:23.720
or like Peach or, let's say, like Bowser, just like this.

00:10:23.720 --> 00:10:27.580
And because this is the first input I've given to my function,

00:10:27.580 --> 00:10:31.630
and prompt is the first parameter, well, Mario will override, let's say,

00:10:31.630 --> 00:10:35.860
the default value of prompt, and same for Peach, and same for Bowser.

00:10:35.860 --> 00:10:39.940
There are a few ways to supply arguments to functions, as we've seen so far.

00:10:39.940 --> 00:10:41.440
One is positionally.

00:10:41.440 --> 00:10:44.590
Here, notice that the very first argument to get_votes

00:10:44.590 --> 00:10:47.590
becomes the value for the first parameter, prompt.

00:10:47.590 --> 00:10:50.860
If, though, I had more than one parameter and more than one argument,

00:10:50.860 --> 00:10:52.960
I could supply them separated by commas.

00:10:52.960 --> 00:10:57.700
So maybe this would be my first argument here, Mario, followed by a comma.

00:10:57.700 --> 00:11:00.520
I could then provide some other value for the next argument

00:11:00.520 --> 00:11:02.470
if I had one in my function here.

00:11:02.470 --> 00:11:04.000
But I don't, so I won't.
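get_votes has only one parameter, so the difference between positional and named matching is easier to see with two. The sketch uses a hypothetical function, describe_votes, that isn't part of the lecture; it exists only to show how R pairs arguments with parameters.

```r
# Hypothetical function with two parameters; votes has a default of 0.
describe_votes <- function(candidate, votes = 0) {
  paste0(candidate, ": ", votes)
}

print(describe_votes("Mario", 100))                       # matched by position
print(describe_votes(candidate = "Peach", votes = 150))   # matched by name
print(describe_votes(votes = 120, candidate = "Bowser"))  # names allow any order
print(describe_votes("Mario"))                            # votes uses its default
```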

00:11:04.000 --> 00:11:07.660
The other way I could supply the argument for a particular parameter

00:11:07.660 --> 00:11:10.310
is by actually using the parameter's name.

00:11:10.310 --> 00:11:12.970
So the name of this parameter is prompt here.

00:11:12.970 --> 00:11:16.120
I could override that and make sure to explicitly say

00:11:16.120 --> 00:11:20.430
that this prompt is going to be equal to Mario, using syntax like this.

00:11:20.430 --> 00:11:24.890
I could say that get_votes, I know, explicitly has this parameter

00:11:24.890 --> 00:11:29.330
called prompt that I'll set equal, now, to Mario, and same for Peach,

00:11:29.330 --> 00:11:31.100
and same for Bowser.

00:11:31.100 --> 00:11:36.620
So now I'm able to run this code by supplying, or overriding,

00:11:36.620 --> 00:11:39.770
the default value of prompt.

00:11:39.770 --> 00:11:42.530
So I think this is a pretty good first function.

00:11:42.530 --> 00:11:47.060
If I click source to run it, I can enter 100 votes for Mario, 150 for Peach,

00:11:47.060 --> 00:11:49.730
and 120 for Bowser, getting that total back.

00:11:49.730 --> 00:11:52.550
But what I find interesting now is

00:11:52.550 --> 00:11:56.270
that if I go to my environment on the right-hand side,

00:11:56.270 --> 00:11:59.840
I'll see a few different objects that I have.

00:11:59.840 --> 00:12:05.180
I see Bowser, I see Mario, I see Peach and total, and I see get_votes,

00:12:05.180 --> 00:12:08.570
the function which we defined, but what I don't see

00:12:08.570 --> 00:12:11.540
is this votes object or prompt.

00:12:11.540 --> 00:12:15.980
And actually, if I go down to my console and ask R to give me the value of votes

00:12:15.980 --> 00:12:18.500
as it currently is, well, I'll see an error:

00:12:18.500 --> 00:12:20.690
object 'votes' not found.

00:12:20.690 --> 00:12:22.020
Now, why is that?

00:12:22.020 --> 00:12:25.220
Well, this has to do with an idea in programming called scope.
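The scope behavior just observed can be reproduced in a few lines. To avoid waiting on user input, this sketch uses a hypothetical get_fixed_votes that creates a local object, inner_votes, instead of calling readline().

```r
# inner_votes is created inside the function, in that call's own environment.
get_fixed_votes <- function() {
  inner_votes <- 100L
  inner_votes
}

# The value comes back out of the function...
total <- get_fixed_votes()
print(total)

# ...but the name inner_votes does not exist in the global environment.
print(exists("inner_votes"))
print(tryCatch(inner_votes, error = function(e) conditionMessage(e)))
```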

00:12:25.220 --> 00:12:30.050
And scope tells us in what context objects like these are defined.

00:12:30.050 --> 00:12:32.750
To visualize this, let's think about our environment

00:12:32.750 --> 00:12:35.840
here and think about what scope we have in terms of our objects

00:12:35.840 --> 00:12:37.500
in that environment.

00:12:37.500 --> 00:12:40.010
So here is a representation of our environment.

00:12:40.010 --> 00:12:43.400
We have those four objects we saw earlier, Mario, Peach,

00:12:43.400 --> 00:12:44.390
Bowser, and total.

00:12:44.390 --> 00:12:48.050
These are accessible to me, the programmer, pretty much at all times.

00:12:48.050 --> 00:12:51.530
We also have, of course, our function get_votes.

00:12:51.530 --> 00:12:55.730
But get_votes is best viewed as a kind of black box.

00:12:55.730 --> 00:13:00.080
I don't exactly know what's going on inside of it when I call that function.

00:13:00.080 --> 00:13:03.410
If you think about calling a function like sum or mean,

00:13:03.410 --> 00:13:07.040
you likely don't know exactly what code was defined to compute those values;

00:13:07.040 --> 00:13:08.540
you just know that it works.

00:13:08.540 --> 00:13:11.240
You give some input, and you get that output back.

00:13:11.240 --> 00:13:13.880
Well, the same is true now of our own functions.

00:13:13.880 --> 00:13:17.240
We just have to trust that we ourselves have defined these functions to take

00:13:17.240 --> 00:13:20.450
some input and produce some output for us, and we, the programmer,

00:13:20.450 --> 00:13:24.710
can't actually access those objects we defined inside the function.

00:13:24.710 --> 00:13:27.410
If we want to use those, we'd have to metaphorically

00:13:27.410 --> 00:13:29.990
zoom in or go inside that function's context

00:13:29.990 --> 00:13:34.430
to be able to use and see those values, like votes and prompt.

00:13:34.430 --> 00:13:37.130
But for us, as the programmer, these objects

00:13:37.130 --> 00:13:40.160
are only defined inside the scope of, now,

00:13:40.160 --> 00:13:43.370
our function, which is why we can't see them in our global environment,

00:13:43.370 --> 00:13:46.460
as we saw in R.

00:13:46.460 --> 00:13:49.100
So we have now defined our first function.

00:13:49.100 --> 00:13:52.400
We've taken some inputs, and we've returned some values.

00:13:52.400 --> 00:13:54.980
We've also seen this question of scope here.

00:13:54.980 --> 00:13:58.430
Let me ask, what questions do we have so far on scope

00:13:58.430 --> 00:14:01.730
or on defining our own functions?

00:14:01.730 --> 00:14:04.988
AUDIENCE: What if the user enters a string or a character as an input?

00:14:04.988 --> 00:14:07.280
Is there any way to handle the errors that we will get?

00:14:07.280 --> 00:14:08.988
CARTER ZENKE: Yeah, really good question.

00:14:08.988 --> 00:14:11.630
So up until now, we've been a good user.

00:14:11.630 --> 00:14:14.450
We've been supplying numbers to the actual program here.

00:14:14.450 --> 00:14:17.000
But we need to think defensively as programmers

00:14:17.000 --> 00:14:19.760
and think about what could happen if we entered something

00:14:19.760 --> 00:14:21.540
that wasn't a number, for instance.

00:14:21.540 --> 00:14:23.040
So let's see what might happen here.

00:14:23.040 --> 00:14:26.240
I'll come back to my computer, and let's go back

00:14:26.240 --> 00:14:30.200
to our program called count.R. And let me close my environment,

00:14:30.200 --> 00:14:33.950
but now think a little more maliciously as a user.

00:14:33.950 --> 00:14:35.870
What could I do to break this program?

00:14:35.870 --> 00:14:38.960
Well, one thing I could do is enter some value

00:14:38.960 --> 00:14:41.150
that I don't think the program expected to see.

00:14:41.150 --> 00:14:44.150
So if I click source now to run the program again,

00:14:44.150 --> 00:14:47.600
let me enter something funny, like duck for Mario,

00:14:47.600 --> 00:14:51.800
or quack for Peach, or cat for Bowser.

00:14:51.800 --> 00:14:54.030
And these are certainly not numbers.

00:14:54.030 --> 00:14:57.560
So if I hit enter now, oh, this is some pretty bad output.

00:14:57.560 --> 00:15:01.070
So what I see down below is that total votes is now

00:15:01.070 --> 00:15:05.330
equal to NA, this special value that means not available.

00:15:05.330 --> 00:15:07.670
And I see some warning messages.

00:15:07.670 --> 00:15:11.930
Now, if I look at this particular one-- in get_votes(prompt = "Mario"),

00:15:11.930 --> 00:15:14.120
NAs introduced by coercion.

00:15:14.120 --> 00:15:15.360
Well, what does that mean?

00:15:15.360 --> 00:15:18.440
Well, we saw a little while ago that coercion is this process which

00:15:18.440 --> 00:15:21.470
converts one storage mode to another,

00:15:21.470 --> 00:15:25.130
and it seems like we do that on line two, with as.integer.

00:15:25.130 --> 00:15:27.530
We convert some character string the user gave us

00:15:27.530 --> 00:15:31.700
via readline to some number, or some integer in particular.

00:15:31.700 --> 00:15:35.930
But what might happen if I did something like I did here

00:15:35.930 --> 00:15:38.570
and gave cat instead of an actual number?

00:15:38.570 --> 00:15:41.592
Well, as.integer will, one, say, I don't know

00:15:41.592 --> 00:15:44.550
what the heck you want me to do with that, so I'll give you NA instead.

00:15:44.550 --> 00:15:48.200
And two, it will also give me what's called a warning, telling me that, look,

00:15:48.200 --> 00:15:51.680
I couldn't do what you wanted me to do with the input you gave me.

00:15:51.680 --> 00:15:58.040
So this is why we now see this value NA as opposed to, let's say, cat or duck

00:15:58.040 --> 00:15:59.790
or quack instead.
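The coercion behavior described above can be checked directly. This sketch uses tryCatch() only to capture the warning's text so it can be printed, and the hard-coded values stand in for what a user would type.

```r
# Text that looks like a whole number converts cleanly.
print(as.integer("100"))

# Text that doesn't becomes NA and triggers a warning; capture its message.
warning_text <- tryCatch(
  as.integer("cat"),
  warning = function(w) conditionMessage(w)
)
print(warning_text)  # "NAs introduced by coercion"

# A single NA makes the whole sum NA.
mario <- 100L
bowser <- NA_integer_  # what as.integer("cat") evaluates to
total <- sum(mario, bowser)
print(total)
```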

00:15:59.790 --> 00:16:02.960
So what could we do to fix this?

00:16:02.960 --> 00:16:07.010
I think one thing we could do is try to catch this problem.

00:16:07.010 --> 00:16:10.280
If we see inside this function that we actually

00:16:10.280 --> 00:16:14.360
got an NA value for votes, well, we could return something else entirely.

00:16:14.360 --> 00:16:15.390
We could start there.

00:16:15.390 --> 00:16:17.890
So let's go back to our program and make that happen for us.

00:16:17.890 --> 00:16:21.320
We could use what we saw last time, called conditionals, where conditionals

00:16:21.320 --> 00:16:24.860
will just test for something and take some particular action based on

00:16:24.860 --> 00:16:26.120
the result of that test.

00:16:26.120 --> 00:16:30.410
So, here, let's say-- let's assume the user enters some bad value,

00:16:30.410 --> 00:16:33.317
like duck, and now votes is NA.

00:16:33.317 --> 00:16:35.150
Well, I don't want to do what we did before,

00:16:35.150 --> 00:16:37.310
which was return votes automatically.

00:16:37.310 --> 00:16:40.400
I'd rather first ask, is votes NA?

00:16:40.400 --> 00:16:44.810
And if it is, let's go ahead and not return the actual NA value.

00:16:44.810 --> 00:16:48.770
Why don't we return something like 0, maybe, just to kick things off?

00:16:48.770 --> 00:16:50.270
If I say if now--

00:16:50.270 --> 00:16:56.990
if votes is NA, well, then inside I could return some special value,

00:16:56.990 --> 00:17:00.290
like 0, saying that, look, we couldn't count your votes.

00:17:00.290 --> 00:17:04.520
But otherwise, let's say, we could go ahead and safely return votes.

00:17:04.520 --> 00:17:08.940
So if votes is not NA, we can go ahead and return votes instead.

00:17:08.940 --> 00:17:11.270
And I think this will be a little bit safer for us.
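A sketch of get_votes with the conditional follows. The test has to use is.na(): a comparison like votes == NA evaluates to NA rather than TRUE or FALSE, and if() cannot branch on NA. The default prompt string is an assumption.

```r
get_votes <- function(prompt = "Enter votes: ") {
  votes <- as.integer(readline(prompt))
  # is.na() returns TRUE or FALSE; votes == NA would give NA instead.
  if (is.na(votes)) {
    return(0)      # input wasn't a number, so count zero votes
  } else {
    return(votes)  # input was a valid integer
  }
}
```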

00:17:11.270 --> 00:17:14.089
If I go ahead and click on source now--

00:17:14.089 --> 00:17:17.480
let me go ahead and type in duck for Mario, quack for Peach,

00:17:17.480 --> 00:17:20.720
and cat for Bowser, and--

00:17:20.720 --> 00:17:24.035
so it seems total votes is now 0.

00:17:24.035 --> 00:17:25.160
That's a little bit better.

00:17:25.160 --> 00:17:26.960
It's no longer NA.

00:17:26.960 --> 00:17:30.800
We seem to have simply not counted cat, duck, or quack,

00:17:30.800 --> 00:17:32.750
but I still get these warnings.

00:17:32.750 --> 00:17:36.440
Now, we'll look at these warnings in more depth in a future lecture.

00:17:36.440 --> 00:17:40.280
R does have warnings and errors, which are more generally known as exceptions,

00:17:40.280 --> 00:17:44.927
but, for now, we can handle them using a function called suppressWarnings.

00:17:44.927 --> 00:17:46.760
suppressWarnings allows me, the programmer,

00:17:46.760 --> 00:17:50.780
to say, look, I know something went wrong, but I'm handling it myself.

00:17:50.780 --> 00:17:51.920
I know how to do this.

00:17:51.920 --> 00:17:56.360
So let's see if we can tell as.integer to not give us a warning

00:17:56.360 --> 00:17:58.610
anymore, because we're handling it a little bit later.

00:17:58.610 --> 00:18:00.200
Let's go back to RStudio here.

00:18:00.200 --> 00:18:04.940
And I could use, like we said, this function called suppressWarnings,

00:18:04.940 --> 00:18:09.050
where suppressWarnings takes as input an expression that could give us a warning,

00:18:09.050 --> 00:18:11.280
in this case, the call to as.integer.

00:18:11.280 --> 00:18:15.770
So now if I give as input this particular call-- like this,

00:18:15.770 --> 00:18:17.480
suppressWarnings--

00:18:17.480 --> 00:18:21.770
what I'm effectively saying is that as you take the user's input

00:18:21.770 --> 00:18:24.965
and you convert it to an integer, if you encounter a warning,

00:18:24.965 --> 00:18:26.090
don't give me that warning.
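The effect of suppressWarnings() can be seen in isolation. In this sketch the hard-coded "duck" stands in for user input; inside get_votes, the same wrapper goes around as.integer(readline(prompt)). The second call uses tryCatch() only to show that no warning escapes.

```r
# Without the wrapper, this line would print "NAs introduced by coercion".
quiet <- suppressWarnings(as.integer("duck"))
print(quiet)  # [1] NA

# Confirm no warning escapes: the handler below would only run if one did.
escaped <- tryCatch(
  suppressWarnings(as.integer("duck")),
  warning = function(w) "a warning escaped"
)
print(escaped)  # still NA
```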

00:18:26.090 --> 00:18:29.030
Just suppress it, keep it quiet, and I, the programmer,

00:18:29.030 --> 00:18:31.800
will handle it later on.

00:18:31.800 --> 00:18:33.620
So let's try this now.

00:18:33.620 --> 00:18:38.510
I'll click source, and then I'll do 100 votes for Mario, 150 for Peach,

00:18:38.510 --> 00:18:40.520
and 120 for Bowser.

00:18:40.520 --> 00:18:42.320
And I think now we're back in action.

00:18:42.320 --> 00:18:43.980
Although that was actually-- that was some good input,

00:18:43.980 --> 00:18:45.230
so let's try the bad input.

00:18:45.230 --> 00:18:50.270
Let's go ahead and do duck for Mario, quack for Peach, and cat for Bowser.

00:18:50.270 --> 00:18:55.700
And now we see total votes is 0, but no warnings are raised, thanks

00:18:55.700 --> 00:18:57.133
to suppressWarnings.

00:18:57.133 --> 00:18:59.300
So we'll see this in more depth in a future lecture,

00:18:59.300 --> 00:19:02.240
but, for now, just think about suppressing those warnings

00:19:02.240 --> 00:19:04.820
as silencing them, because we, the programmer, know

00:19:04.820 --> 00:19:08.150
how to handle those in our own code.

00:19:08.150 --> 00:19:13.250
There's one more improvement I see here, which is that this block from line three

00:19:13.250 --> 00:19:14.180
to line seven--

00:19:14.180 --> 00:19:18.980
this if-else-- could be simplified, converted to one line of code.

00:19:18.980 --> 00:19:22.400
And, in fact, if you have an if-else statement where

00:19:22.400 --> 00:19:27.290
inside the if and inside the else you're simply returning one value or another,

00:19:27.290 --> 00:19:30.800
well, you could simplify this and use a function called ifelse,

00:19:30.800 --> 00:19:33.890
just like this, where the first argument to ifelse

00:19:33.890 --> 00:19:39.240
is the logical expression to test, in this case, whether votes is NA, just like this.

00:19:39.240 --> 00:19:41.120
And if that expression is true--

00:19:41.120 --> 00:19:44.060
if votes is NA-- well, the second argument

00:19:44.060 --> 00:19:47.180
will be the thing we return, the value we get back from ifelse,

00:19:47.180 --> 00:19:50.150
and then the third argument will be what we get back

00:19:50.150 --> 00:19:52.550
if this logical expression is false.

00:19:52.550 --> 00:19:54.360
It's the else in ifelse.

00:19:54.360 --> 00:19:56.640
So I'll say votes here instead.

00:19:56.640 --> 00:20:00.320
So now, to be clear, line three is doing exactly the same work

00:20:00.320 --> 00:20:04.190
as lines 5 through 9 but is much shorter and, I would argue, more readable,

00:20:04.190 --> 00:20:06.920
and so I can now get rid of lines 5 through 9

00:20:06.920 --> 00:20:08.990
and shorten this function even more.

00:20:08.990 --> 00:20:13.640
And because R will return to me the last computed value,

00:20:13.640 --> 00:20:17.220
whatever ifelse returns will be the return value of my function itself.

00:20:17.220 --> 00:20:20.040
So if votes is NA, ifelse will return 0,

00:20:20.040 --> 00:20:23.740
and so will my function; likewise, if votes is a number, both will return votes.

00:20:23.740 --> 00:20:25.150
So let's try this again.

00:20:25.150 --> 00:20:29.490
I'll click source, and let me go ahead and say something like duck for Mario,

00:20:29.490 --> 00:20:34.080
but I'll enter 150 for Peach, and cat for Bowser.

00:20:34.080 --> 00:20:37.200
And now we see our total votes is 150.

00:20:37.200 --> 00:20:41.340
And I think we've really simplified this function for ourselves here.

00:20:41.340 --> 00:20:44.010
Now, as we've done this, I think you're seeing

00:20:44.010 --> 00:20:48.090
the power of putting this functionality inside of a function.

00:20:48.090 --> 00:20:50.670
If I hadn't done this, if I had had to repeat

00:20:50.670 --> 00:20:54.690
this code over and over and over again, my code would have been much longer.
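Here is a sketch of the function after both refinements, followed by calling code. ifelse(test, yes, no) returns yes where test is TRUE and no where it is FALSE, and because it is the last expression, its result is what get_votes returns. The prompt strings and the output line are assumptions.

```r
get_votes <- function(prompt = "Enter votes: ") {
  # Convert quietly; bad input becomes NA without printing a warning.
  votes <- suppressWarnings(as.integer(readline(prompt)))
  # Last expression, so its value is returned: 0 if NA, otherwise votes.
  ifelse(is.na(votes), 0, votes)
}

mario <- get_votes("Mario: ")
peach <- get_votes("Peach: ")
bowser <- get_votes("Bowser: ")

total <- sum(mario, peach, bowser)
cat("Total votes:", total, "\n")
```

A design note: ifelse() is vectorized, built to test whole vectors at once. For a single value like votes, `if (is.na(votes)) 0 else votes` is an equivalent one-line alternative.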
00:20:54.690 --> 00:20:58.350 You can imagine me repeating that same conditional if else, if else, 00:20:58.350 --> 00:21:00.750 if else through all of my prompts to the user. 00:21:00.750 --> 00:21:05.397 But by converting that code into my very own function, I can modularize things. 00:21:05.397 --> 00:21:08.730 I can make things easier to maintain and update, which is why in the first place 00:21:08.730 --> 00:21:11.280 we would write functions like these. 00:21:11.280 --> 00:21:14.850 So we've seen how to write our very first function in R, 00:21:14.850 --> 00:21:17.790 to handle some errors our user could present us with. 00:21:17.790 --> 00:21:21.760 What other questions do we have about defining these functions? 00:21:21.760 --> 00:21:27.130 AUDIENCE: If we had the first version of our function get_votes 00:21:27.130 --> 00:21:32.710 where we were still not checking for NA values introduced by coercion, 00:21:32.710 --> 00:21:37.930 would we need to actually store our computation in the votes 00:21:37.930 --> 00:21:41.958 object, or could we just return the value directly? 00:21:41.958 --> 00:21:43.750 CARTER ZENKE: Yeah, a really good question. 00:21:43.750 --> 00:21:46.630 I think something that gets at shortening this program even more. 00:21:46.630 --> 00:21:50.592 If we go back, rewind a little bit to maybe not handling these NAs that 00:21:50.592 --> 00:21:53.050 could be introduced, but instead just returning, let's say, 00:21:53.050 --> 00:21:54.820 whatever number the user gives us, 00:21:54.820 --> 00:21:56.260 I could probably shorten it even more. 00:21:56.260 --> 00:21:59.052 So let me go back to RStudio and show you what that could look like. 00:21:59.052 --> 00:22:03.830 I will maybe get rid of this ifelse here, and I'll instead maybe do this. 00:22:03.830 --> 00:22:06.850 I'll go back to what we had before, which was assigning to this object 00:22:06.850 --> 00:22:09.358 votes whatever number the user types in.
00:22:09.358 --> 00:22:12.400 Now, I think you were asking, could we just get rid of this object votes? 00:22:12.400 --> 00:22:16.720 Could we simply have this, as.integer, readline, given some prompt? 00:22:16.720 --> 00:22:18.820 I think we could, because as.integer will still 00:22:18.820 --> 00:22:21.490 return for us whatever number the user has typed in, 00:22:21.490 --> 00:22:23.950 and therefore-- because it's the last line of my function-- 00:22:23.950 --> 00:22:27.080 my function will instead return that value as well. 00:22:27.080 --> 00:22:29.060 So I'll click on source to run my program. 00:22:29.060 --> 00:22:33.070 I'll type 100 for Mario, 150 for Peach, and 120 for Bowser, 00:22:33.070 --> 00:22:35.350 and now I see the same result we wanted. 00:22:35.350 --> 00:22:40.060 I would argue, though, because we want to keep this value and test its actual-- 00:22:40.060 --> 00:22:42.000 its value later on, like, was it NA or was it 00:22:42.000 --> 00:22:44.750 a number-- we might want to actually store it in a separate object 00:22:44.750 --> 00:22:47.350 and then test that value a little bit later, like we did here. 00:22:47.350 --> 00:22:50.740 But a great question, and a good optimization too. 00:22:50.740 --> 00:22:53.230 OK, so our program is better. 00:22:53.230 --> 00:22:55.300 It's certainly better than it was before, 00:22:55.300 --> 00:22:57.550 but there's still one thing I think that's missing, 00:22:57.550 --> 00:23:03.220 which is, if I click source now and I type in, maybe, quack for Mario, 00:23:03.220 --> 00:23:05.800 I've missed my chance now to enter Mario's votes. 00:23:05.800 --> 00:23:08.830 Wouldn't it be nice if instead my program could reprompt me 00:23:08.830 --> 00:23:13.300 every time to enter in a number for Mario, and it wouldn't stop, let's say, 00:23:13.300 --> 00:23:16.780 until I do comply and enter in the number for Mario's votes?
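To make the audience question concrete, here is a small R sketch of the two versions discussed: one that stores the converted value in votes before returning it, and one that returns the value directly. The function names are hypothetical, and both take the typed text as an argument (an assumption) rather than calling readline(), so they can run without a console.

```r
# Hypothetical version 1: store the converted value in an object, then return it
get_votes_stored <- function(text) {
  votes <- as.integer(text)
  votes
}

# Hypothetical version 2: skip the intermediate object; the value of the
# last expression evaluated is what the function returns
get_votes_direct <- function(text) {
  as.integer(text)
}

print(get_votes_stored("120"))
print(get_votes_direct("120"))
```

Both return the same value. The stored version is still useful when, as in the lecture, you want to test the value (for example with is.na()) before returning it.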
00:23:16.780 --> 00:23:20.250 Well, for that, we'll actually need some new structure, one called a loop. 00:23:20.250 --> 00:23:22.000 And in just a few minutes, we'll come back 00:23:22.000 --> 00:23:24.550 and talk about how to implement these loops in R code. 00:23:24.550 --> 00:23:26.230 We'll see you all in five. 00:23:26.230 --> 00:23:27.332 Well, we're back. 00:23:27.332 --> 00:23:29.290 And, as promised, we're going to learn together 00:23:29.290 --> 00:23:31.832 about these things called loops, these structures that let us 00:23:31.832 --> 00:23:34.570 repeat some code some number of times. 00:23:34.570 --> 00:23:37.390 Now, for this, I brought along a friend, the CS50 duck debugger, 00:23:37.390 --> 00:23:40.690 which is great to talk to about my code and the illogic in my thoughts, 00:23:40.690 --> 00:23:42.670 but also great for thinking about loops. 00:23:42.670 --> 00:23:45.640 In particular, if you have a duck or any kind of object to squeeze, 00:23:45.640 --> 00:23:48.820 you could use that to think about how loops work under the hood. 00:23:48.820 --> 00:23:52.600 So let's go ahead and jump in and see what this duck can teach us about loops. 00:23:52.600 --> 00:23:55.660 So if I go back to my code over here in RStudio, 00:23:55.660 --> 00:23:59.320 I have a program that kind of simulates me squeezing this duck three times, 00:23:59.320 --> 00:24:02.182 for instance, like this. 00:24:02.182 --> 00:24:05.140 It's a bit more of a squeak than a quack, but we'll go with it for now. 00:24:05.140 --> 00:24:07.450 If I type source here-- click source-- 00:24:07.450 --> 00:24:11.230 I'll see quack, quack, quack, me squeezing this duck now 00:24:11.230 --> 00:24:16.360 three different times, so putting what we just did physically now into text. 00:24:16.360 --> 00:24:21.400 So let's visualize this code in terms of a flow chart and see what it looks like. 00:24:21.400 --> 00:24:24.640 Well, here, at the top of my program, I start it.
00:24:24.640 --> 00:24:29.170 I click source, my program begins, and the next step then is to say, quack. 00:24:29.170 --> 00:24:32.260 Every arrow, now, indicates some next step of my program. 00:24:32.260 --> 00:24:35.260 Well, after I say quack one time, I squeeze once, 00:24:35.260 --> 00:24:37.550 what will I do but say quack again? 00:24:37.550 --> 00:24:38.540 Just like this. 00:24:38.540 --> 00:24:40.490 And I'll quack again, just like this. 00:24:40.490 --> 00:24:43.900 And now I'm at the end of my program, so I stop entirely. 00:24:43.900 --> 00:24:49.810 But let's say I want to quack my duck or squeeze it more than three times. 00:24:49.810 --> 00:24:54.730 If I have only this to work with, what might quickly become my problem? 00:24:54.730 --> 00:24:59.260 If I want to do this five times or 10 times, or even more, well, 00:24:59.260 --> 00:25:01.788 I'd probably need to do a lot of copying and pasting. 00:25:01.788 --> 00:25:04.330 If I come back to my code here to show you what I need to do, 00:25:04.330 --> 00:25:09.220 if I want to simulate squeezing this duck not just three times but 5 or 6 00:25:09.220 --> 00:25:13.780 or 10 or more, well, I need to copy line three, put it on line four, 00:25:13.780 --> 00:25:17.350 copy line four, put it on line five, and so on and so forth 00:25:17.350 --> 00:25:20.820 to repeat this code some number of times. 00:25:20.820 --> 00:25:23.353 Now, thankfully, we don't have to do this in R. 00:25:23.353 --> 00:25:26.520 And, in fact, as a programmer, you should be looking out for cases like these 00:25:26.520 --> 00:25:30.640 and thinking, I could probably use a loop instead. 00:25:30.640 --> 00:25:34.740 So let's see what kind of loops R offers us, what kind of keywords 00:25:34.740 --> 00:25:39.390 we could use to make a loop and to repeat this code some number of times. 00:25:39.390 --> 00:25:44.070 Well, one of the first loops we have at our disposal is one called repeat.
00:25:44.070 --> 00:25:46.590 Repeat allows us to repeat whatever code is 00:25:46.590 --> 00:25:50.640 inside of its curly braces infinitely, however many times we want to. 00:25:50.640 --> 00:25:55.050 So let me go ahead and go into my RStudio again, and I'll type repeat now, 00:25:55.050 --> 00:25:58.590 this keyword, and I will then inside of those curly braces 00:25:58.590 --> 00:26:02.490 put this function, cat, which will print to the screen quack. 00:26:02.490 --> 00:26:07.260 And before I run this code, let's visualize its flowchart 00:26:07.260 --> 00:26:09.210 to see how it might work. 00:26:09.210 --> 00:26:12.750 Well, here on my screen I have this program. 00:26:12.750 --> 00:26:15.540 I'm going to start that program, and then 00:26:15.540 --> 00:26:19.360 I'm going to quack or squeeze my duck, and then I'm going to follow the arrow 00:26:19.360 --> 00:26:21.560 and quack, just like this. 00:26:21.560 --> 00:26:24.378 And then I'm going to follow the arrow and quack just like this, 00:26:24.378 --> 00:26:26.170 and I'm going to go follow the arrow again. 00:26:26.170 --> 00:26:32.560 And I worry I'd be stuck here for a very long time, because it seems 00:26:32.560 --> 00:26:36.280 like our next step is always to go back and to quack and to quack and to quack 00:26:36.280 --> 00:26:36.880 again. 00:26:36.880 --> 00:26:41.860 So before we dive into fixing this, let's talk about a bit of vocabulary. 00:26:41.860 --> 00:26:44.860 Now, what we've created here is, in fact, a loop. 00:26:44.860 --> 00:26:48.190 We're repeating this code over and over and over again. 00:26:48.190 --> 00:26:53.830 Now, each time we repeat this set of code, we're calling that one iteration. 00:26:53.830 --> 00:26:57.100 So, in other words, when I loop again and again and again 00:26:57.100 --> 00:26:59.860 I'm iterating again and again and again. 
00:26:59.860 --> 00:27:04.030 One iteration means one pass through my code, top to bottom, 00:27:04.030 --> 00:27:05.620 inside of that loop. 00:27:05.620 --> 00:27:08.470 And I think what we've created here is something called 00:27:08.470 --> 00:27:11.050 an infinite loop, one that will never, ever end, 00:27:11.050 --> 00:27:14.620 because there's no condition telling us when to stop looping. 00:27:14.620 --> 00:27:17.980 So we'll need to figure out how to break out of this kind of loop 00:27:17.980 --> 00:27:20.230 and figure out what we could do to get out of it. 00:27:20.230 --> 00:27:22.780 Now, thankfully, R does offer us some keywords 00:27:22.780 --> 00:27:26.110 to do just that, so let's explore them now in R. 00:27:26.110 --> 00:27:27.580 I'll come back to RStudio. 00:27:27.580 --> 00:27:32.470 We'll want to introduce these two keywords here, break and next. 00:27:32.470 --> 00:27:35.650 Break symbolizes breaking out of some loop. 00:27:35.650 --> 00:27:39.610 When R encounters that break keyword, it will end that loop entirely; 00:27:39.610 --> 00:27:42.100 it will stop wherever it is and end that loop. 00:27:42.100 --> 00:27:46.420 Next, on the other hand, says wherever you are in this iteration, 00:27:46.420 --> 00:27:49.540 go ahead and start the next iteration from the top. 00:27:49.540 --> 00:27:55.630 So let's try these out now in R. If I come back to my program, duck.R, 00:27:55.630 --> 00:27:59.470 I don't want to repeat this quack over and over and over again. 00:27:59.470 --> 00:28:01.270 But let's say I just-- 00:28:01.270 --> 00:28:04.360 maybe accidentally, I click source now, and, well, 00:28:04.360 --> 00:28:07.300 my computer is just stuck saying quack, quack, quack, quack 00:28:07.300 --> 00:28:09.040 over and over infinitely forever. 00:28:09.040 --> 00:28:11.710 I could, if I wanted to, exit this program.
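The break and next keywords described above can be seen side by side in a small R sketch. This example is illustrative and not taken from the lecture: the loop counts upward, uses next to skip the value 3, and uses break to stop once the count passes 5.

```r
i <- 0
visited <- numeric(0)

repeat {
  i <- i + 1
  if (i > 5) {
    break   # leave the loop entirely
  }
  if (i == 3) {
    next    # abandon this iteration and start the next one from the top
  }
  visited <- c(visited, i)   # only reached when neither break nor next ran
}

print(visited)  # 1 2 4 5
```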
00:28:11.710 --> 00:28:16.660 If I press Control-C, that means stop this program whether we're in a loop 00:28:16.660 --> 00:28:17.740 or we're not. 00:28:17.740 --> 00:28:21.160 So that can save us, Control-C. But ideally, I 00:28:21.160 --> 00:28:24.640 should consider a stop condition before I go ahead 00:28:24.640 --> 00:28:27.370 and repeat something infinitely many times. 00:28:27.370 --> 00:28:30.820 Now, what could I do if I wanted to quack, let's say, three times? 00:28:30.820 --> 00:28:34.600 One thing I could do is think about counting, like, maybe on my fingers. 00:28:34.600 --> 00:28:37.450 If I want to quack three times, I could maybe start at three. 00:28:37.450 --> 00:28:42.310 And if I have three here, I could quack, I could go down to two and quack 00:28:42.310 --> 00:28:45.040 again, go down to one and quack again. 00:28:45.040 --> 00:28:47.050 And then finally, at 0, I'm done. 00:28:47.050 --> 00:28:49.120 I shouldn't squeeze my duck anymore. 00:28:49.120 --> 00:28:54.790 So one thing we could do is try to put this idea of counting now in code. 00:28:54.790 --> 00:28:58.360 Well, I could create an object to store the number of times 00:28:58.360 --> 00:29:00.460 I want to squeeze this duck. 00:29:00.460 --> 00:29:03.220 I could, by convention, call it i, and that 00:29:03.220 --> 00:29:06.560 will keep track of the number of times I want to iterate in this loop. 00:29:06.560 --> 00:29:10.930 So on line one, I'll say I'm going to assign i the value three. 00:29:10.930 --> 00:29:14.050 It's kind of similar to me holding up my hand with three fingers, 00:29:14.050 --> 00:29:15.010 for instance. 00:29:15.010 --> 00:29:19.930 And now, as I repeat this code, I don't want to repeat it infinitely. 00:29:19.930 --> 00:29:24.460 I want to have some condition under which I break out of this loop.
00:29:24.460 --> 00:29:27.070 And as we saw before with my fingers, maybe the condition 00:29:27.070 --> 00:29:33.040 is if i is equal to 0, well, at that point I want to break out of this loop. 00:29:33.040 --> 00:29:35.950 I want to exit it and not loop anymore. 00:29:35.950 --> 00:29:41.260 But I shouldn't run this code just yet, because while I've set i equal 00:29:41.260 --> 00:29:44.380 to 3, like I have my fingers here, what I haven't done 00:29:44.380 --> 00:29:47.590 is make a mechanism for actually dropping fingers, going from 3 to 2, 00:29:47.590 --> 00:29:49.420 from 2 to 1, from 1 to 0. 00:29:49.420 --> 00:29:54.280 So maybe after I quack, I'll go ahead and adjust the value of i. 00:29:54.280 --> 00:29:58.520 I'll set it equal to i minus 1, just like this. 00:29:58.520 --> 00:30:03.220 And then let's say-- maybe in the case that i is 0, eventually 00:30:03.220 --> 00:30:05.590 we're going to break out of the loop, but if it's not, 00:30:05.590 --> 00:30:08.650 why don't I go ahead and go to the next iteration? 00:30:08.650 --> 00:30:12.610 When we see the next keyword, we'll stop our current iteration and go to the top 00:30:12.610 --> 00:30:16.370 again to repeat our code, top to bottom, just like this. 00:30:16.370 --> 00:30:20.150 So the flowchart for this looks a bit more like this. 00:30:20.150 --> 00:30:22.750 I'm going to start my program, and I'm first 00:30:22.750 --> 00:30:26.830 going to set i equal to 3, kind of like I did on my fingers here. 00:30:26.830 --> 00:30:31.600 Then I'm going to squeeze the duck, going to quack, subtract one from i, 00:30:31.600 --> 00:30:35.530 and ask a question, is i equal to 0? 00:30:35.530 --> 00:30:38.830 If it's not, well, I'll go back up and I'll squeeze the duck again. 00:30:38.830 --> 00:30:41.350 I'll subtract 1, and ask that same question. 00:30:41.350 --> 00:30:45.130 And if I ever get to ask that question and the result is true, 00:30:45.130 --> 00:30:47.960 well, then I'll stop my program.
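The countdown repeat loop described above can be sketched in R as follows. The wrapper function quack_repeat and its argument n are hypothetical additions so the loop can be reused with any count; the body mirrors the lecture's steps: quack, subtract 1 from i, break at 0, otherwise go to the next iteration.

```r
# Hypothetical wrapper so the countdown loop works for any starting count n
quack_repeat <- function(n) {
  i <- n
  repeat {
    cat("quack\n")
    i <- i - 1        # drop one "finger"
    if (i == 0) {
      break           # stop condition reached: leave the loop
    } else {
      next            # back to the top of the loop (optional here)
    }
  }
}

quack_repeat(3)
```

Because a repeat loop runs its body before checking anything, calling quack_repeat(0) would take i to -1 and never reach i == 0, looping forever; a condition such as i <= 0 would be a safer choice.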
00:30:47.960 --> 00:30:51.190 So let's visualize this here using some interactive stuff. 00:30:51.190 --> 00:30:56.300 I have my iPad here that can count, let's say, from 3, just like this-- 00:30:56.300 --> 00:30:56.800 whoops-- 00:30:56.800 --> 00:30:59.390 3 down to 0, let's say. 00:30:59.390 --> 00:31:02.600 So, currently, when i is 3, what do I do? 00:31:02.600 --> 00:31:05.360 I squeeze my duck once, just like this. 00:31:05.360 --> 00:31:09.280 I'll then subtract one from i, where i is kind of this iPad here 00:31:09.280 --> 00:31:12.070 where I subtract 1 now, and now i is 2. 00:31:12.070 --> 00:31:15.700 Well, I'll go back up and I'll squeeze again, I'll subtract 1 from i. 00:31:15.700 --> 00:31:17.110 Now i is 1. 00:31:17.110 --> 00:31:20.300 I'll ask the question, is i equal to 0? 00:31:20.300 --> 00:31:22.220 It's not, so we'll go back up again. 00:31:22.220 --> 00:31:25.920 I'll squeeze it up one more time, and I'll subtract 1 again. 00:31:25.920 --> 00:31:29.510 And now i is equal to 0, so we'll stop our program. 00:31:29.510 --> 00:31:32.180 There will be no more squeezing of this duck. 00:31:32.180 --> 00:31:36.080 So this is one way to approach the problem of creating some loop 00:31:36.080 --> 00:31:39.320 and having it repeat a certain number of times. 00:31:39.320 --> 00:31:42.110 But R comes with other kinds of loops too. 00:31:42.110 --> 00:31:45.140 A repeat loop is great when you want to do something at least once. 00:31:45.140 --> 00:31:47.570 I want to squeeze this duck at least one time. 00:31:47.570 --> 00:31:50.750 But if I only want to do it if some condition is true 00:31:50.750 --> 00:31:54.570 or while some condition is true, I could use another kind of loop as well. 00:31:54.570 --> 00:31:57.020 This loop is called a while loop. 00:31:57.020 --> 00:32:03.060 A while loop lets us repeat some set of code while some condition is true. 00:32:03.060 --> 00:32:08.330 So let's see what that looks like now in R. 
I will remove what I currently have 00:32:08.330 --> 00:32:11.520 and, instead, implement this while loop. 00:32:11.520 --> 00:32:17.550 So if I want to make a while loop, I can use while, just like this, 00:32:17.550 --> 00:32:20.510 and I'll make a condition to repeat under. 00:32:20.510 --> 00:32:24.920 As long as this condition is true, I will repeat the code inside this 00:32:24.920 --> 00:32:26.360 while loop's curly braces. 00:32:26.360 --> 00:32:30.200 So I could say, maybe, while i is not equal to 0, 00:32:30.200 --> 00:32:33.800 I want to repeat whatever code is inside of this while loop. 00:32:33.800 --> 00:32:37.820 Well, I want to quack, so I'll say quack just like before, 00:32:37.820 --> 00:32:40.730 backslash n, make a new line, and then why 00:32:40.730 --> 00:32:44.450 don't I go ahead and add back this kind of helper object I had called i? 00:32:44.450 --> 00:32:46.520 i is assigned the value 3. 00:32:46.520 --> 00:32:51.180 And after I quack, well, I'll subtract one from i just like this. 00:32:51.180 --> 00:32:54.020 i is now assigned the value i minus 1. 00:32:54.020 --> 00:32:58.100 So a bit shorter than our repeat loop, but I'd argue 00:32:58.100 --> 00:32:59.690 they do the same exact thing now. 00:32:59.690 --> 00:33:02.930 So let's visualize what's happening in this program. 00:33:02.930 --> 00:33:07.620 Well, the first thing we do is we set i equal to 3, just like this. 00:33:07.620 --> 00:33:11.480 And then we ask the question-- before we do anything else, we ask the question, 00:33:11.480 --> 00:33:14.300 is i not equal to 0? 00:33:14.300 --> 00:33:18.140 If it's not equal to 0, if that is true, we're going to squeeze our duck 00:33:18.140 --> 00:33:20.330 and subtract one from i. 00:33:20.330 --> 00:33:24.890 And then we'll ask the question again, is i not equal to 0?
00:33:24.890 --> 00:33:28.520 And if ever in our loop that question is false, that is, 00:33:28.520 --> 00:33:33.290 this condition is no longer true, we will stop, exit our loop entirely. 00:33:33.290 --> 00:33:35.540 So let's visualize this now. 00:33:35.540 --> 00:33:37.790 i was first set to 3. 00:33:37.790 --> 00:33:39.950 So I'll set i here to 3. 00:33:39.950 --> 00:33:43.490 And now the difference is, before I do anything, 00:33:43.490 --> 00:33:45.020 I'm going to check this condition. 00:33:45.020 --> 00:33:49.490 Now is i equal to 0, or is i not equal to 0? 00:33:49.490 --> 00:33:52.940 Well i is 3 so, yes, i is not equal to 0. 00:33:52.940 --> 00:33:57.650 I'll go ahead and quack, just like this, and I'll subtract one from i. 00:33:57.650 --> 00:34:00.980 Now I'll go back to the top of my loop and ask that question again. 00:34:00.980 --> 00:34:03.800 Is i not equal to 0? 00:34:03.800 --> 00:34:05.810 Well, i is 2, so it's not equal to 0. 00:34:05.810 --> 00:34:08.870 I'll go ahead and squeeze my duck, subtract 1 from i, 00:34:08.870 --> 00:34:11.000 and then ask the question again. 00:34:11.000 --> 00:34:13.190 Is i not equal to 0? 00:34:13.190 --> 00:34:14.600 Yes, it's not equal to 0. 00:34:14.600 --> 00:34:16.730 I'll squeeze, subtract 1 from i. 00:34:16.730 --> 00:34:21.199 But now, when I ask the question, is i not equal to 0, 00:34:21.199 --> 00:34:23.870 well, i is not not equal to zero. 00:34:23.870 --> 00:34:28.370 In fact, it is 0, so I will exit my loop altogether, squeezing my duck now 00:34:28.370 --> 00:34:30.080 three times in total. 00:34:30.080 --> 00:34:31.409 So let's visualize this. 00:34:31.409 --> 00:34:33.320 I'll come back to RStudio now. 00:34:33.320 --> 00:34:36.230 And if I run this code by clicking source, 00:34:36.230 --> 00:34:44.060 I will have quacked exactly three times, counting down from three, two, and one. 00:34:44.060 --> 00:34:50.630 OK, so just as we have counted down, we could also imagine counting up.
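The countdown while loop described above can be sketched in R like this. As before, the wrapper function quack_while is a hypothetical addition; the loop itself follows the lecture: set i to the starting count, check i != 0 before each iteration, quack, then subtract 1.

```r
# Hypothetical wrapper around the lecture's countdown while loop
quack_while <- function(n) {
  i <- n
  # The condition is checked before every iteration, including the first
  while (i != 0) {
    cat("quack\n")
    i <- i - 1
  }
}

quack_while(3)
```

Unlike the repeat version, quack_while(0) checks the condition first and simply quacks zero times.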
00:34:50.630 --> 00:34:58.760 Maybe I start i at 1 and my condition now is to loop so long as i is less than 00:34:58.760 --> 00:34:59.660 or equal to 3. 00:34:59.660 --> 00:35:04.220 Like I could imagine one, two, three times, but not four. 00:35:04.220 --> 00:35:10.340 So I'll say, while i is less than or equal to 3, I want to keep quacking. 00:35:10.340 --> 00:35:14.570 But now I need to actually increase i as I go. 00:35:14.570 --> 00:35:18.990 On each iteration of my loop, on each run through the code, top to bottom, 00:35:18.990 --> 00:35:21.570 I want to increase i by 1. 00:35:21.570 --> 00:35:25.590 And now let's visualize what this is doing in terms of a flow chart. 00:35:25.590 --> 00:35:29.460 Here, very similar idea, but we're starting now from 1. 00:35:29.460 --> 00:35:34.740 i equals 1, and we'll ask this question, is i less than or equal to 3? 00:35:34.740 --> 00:35:38.550 If it is, we'll squeeze our duck and add 1 to i. 00:35:38.550 --> 00:35:41.520 If it's not, as we go back and approach that question again 00:35:41.520 --> 00:35:44.760 at the top of our loop, if it's ever not the case that i is less than 00:35:44.760 --> 00:35:48.180 or equal to 3, we'll stop, we won't loop anymore. 00:35:48.180 --> 00:35:53.550 So, again, let's visualize this, but now i is first set to 1. 00:35:53.550 --> 00:35:58.890 So before we do anything, we ask the question, is i less than or equal to 3? 00:35:58.890 --> 00:35:59.700 It is. 00:35:59.700 --> 00:36:04.320 We'll squeeze our duck, add 1 now to i, and ask the question again. 00:36:04.320 --> 00:36:06.870 Is i less than or equal to 3? 00:36:06.870 --> 00:36:07.740 It is. 00:36:07.740 --> 00:36:09.750 I'll squeeze, add 1 to i. 00:36:09.750 --> 00:36:11.070 Now it's 3. 00:36:11.070 --> 00:36:13.260 Is 3 less than or equal to 3? 00:36:13.260 --> 00:36:16.290 Well, it's equal to, so I'll go ahead and squeeze, add 1 to i.
00:36:16.290 --> 00:36:20.670 And now it's 4, and 4 is not less than or equal to 3, 00:36:20.670 --> 00:36:24.480 so we'll go ahead and stop our loop and not loop anymore. 00:36:24.480 --> 00:36:28.830 I'll come back now to RStudio and show you what this looks like now. 00:36:28.830 --> 00:36:32.760 If I click source, I'll see quack, quack, quack. 00:36:32.760 --> 00:36:36.160 Again, quacking three separate times. 00:36:36.160 --> 00:36:39.510 So we've seen now two kinds of loops in R, 00:36:39.510 --> 00:36:43.140 one called a repeat loop and one called a while loop. 00:36:43.140 --> 00:36:47.640 Let me ask, what questions do we have about these loops so far? 00:36:47.640 --> 00:36:51.870 AUDIENCE: How do we decide when to use repeat 00:36:51.870 --> 00:36:55.038 and when to use while, or does it not matter? 00:36:55.038 --> 00:36:56.830 CARTER ZENKE: Yeah, a really good question. 00:36:56.830 --> 00:36:59.790 So in general, as we'll see, a repeat loop 00:36:59.790 --> 00:37:02.850 tends to be good when you want to do something at least once-- 00:37:02.850 --> 00:37:06.180 you want to quack at least once, you want to prompt the user at least 00:37:06.180 --> 00:37:10.050 once-- and then check if you should repeat or not. 00:37:10.050 --> 00:37:13.322 A while loop is good if, at the very beginning, before you do anything else, 00:37:13.322 --> 00:37:15.030 you want to check some condition, and you 00:37:15.030 --> 00:37:18.160 want to repeat that code while some condition is true. 00:37:18.160 --> 00:37:19.938 So you could think of a repeat being like, 00:37:19.938 --> 00:37:22.230 do this once, but then check if you should do it again, 00:37:22.230 --> 00:37:24.930 whereas a while loop is more like, if our condition is true, 00:37:24.930 --> 00:37:27.800 we should be repeating this code over and over again. 00:37:27.800 --> 00:37:29.770 Really good question there. 00:37:29.770 --> 00:37:31.080 All right, let's keep going.
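The repeat-versus-while answer above can be checked with a small R sketch built on the count-up loop from the lecture. The two helper functions are hypothetical and count how many times each loop body runs instead of printing quacks, so the difference is easy to see: while checks its condition first, whereas repeat runs its body once before checking.

```r
# Hypothetical helper: count-up while loop (condition checked first)
count_while <- function(start, end) {
  runs <- 0
  i <- start
  while (i <= end) {
    runs <- runs + 1
    i <- i + 1
  }
  runs
}

# Hypothetical helper: same count-up logic with repeat (condition checked after the body)
count_repeat <- function(start, end) {
  runs <- 0
  i <- start
  repeat {
    runs <- runs + 1     # the body always runs at least once
    i <- i + 1
    if (i > end) {
      break
    }
  }
  runs
}

count_while(1, 3)   # 3
count_repeat(1, 3)  # 3
count_while(5, 3)   # 0: the condition is already false before the first iteration
count_repeat(5, 3)  # 1: the body runs once before the check
```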
00:37:31.080 --> 00:37:36.960 And one more loop we have available to us in R is one called a for loop. 00:37:36.960 --> 00:37:42.510 A for loop lets us do some piece of code for each element in some list 00:37:42.510 --> 00:37:44.580 or vector of elements. 00:37:44.580 --> 00:37:49.770 So instead of now using the while keyword, I could use the for keyword. 00:37:49.770 --> 00:37:51.000 I could say for-- 00:37:51.000 --> 00:37:55.110 and it turns out that inside of the parentheses of a for loop 00:37:55.110 --> 00:37:56.670 I need a few different components. 00:37:56.670 --> 00:38:00.840 I need, still, some kind of helper object to keep track of each iteration, 00:38:00.840 --> 00:38:05.220 but I also need some vector of elements to do some piece of code 00:38:05.220 --> 00:38:09.300 for each element in that vector. 00:38:09.300 --> 00:38:14.760 So I'll say for i in, and then have a vector here, let's say, 1, 2, and 3. 00:38:14.760 --> 00:38:19.830 And now I have the same as before, my body of this loop, where inside of it 00:38:19.830 --> 00:38:22.840 I want to quack, just like this. 00:38:22.840 --> 00:38:27.480 Now, notice there's no need for me now to increment or decrement 00:38:27.480 --> 00:38:31.680 i, to add or subtract 1, because this is all taken care of thanks 00:38:31.680 --> 00:38:32.730 to our for loop. 00:38:32.730 --> 00:38:39.000 What the for loop will do is first set i equal to 1, and then it'll say quack. 00:38:39.000 --> 00:38:42.510 And then set i equal to 2, and then quack again. 00:38:42.510 --> 00:38:45.960 And then set i equal to 3, and quack again. 00:38:45.960 --> 00:38:48.990 But then, at the end of this vector, once there 00:38:48.990 --> 00:38:53.410 are no more elements to iterate over, well, our for loop is done. 00:38:53.410 --> 00:38:59.430 So i, as it iterates, kind of assumes the value of 1 and then 2 and then 3.
00:38:59.430 --> 00:39:02.610 And if I were to click source now, I would see quack, quack, 00:39:02.610 --> 00:39:05.760 and quack down here in my console. 00:39:05.760 --> 00:39:08.100 To simplify this just a little bit more, 00:39:08.100 --> 00:39:11.820 I could maybe make a vector that's going to be a little more dynamic than this. 00:39:11.820 --> 00:39:17.700 Like I could imagine myself typing in 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to quack 00:39:17.700 --> 00:39:20.820 10 times now, if I were to click source here. 00:39:20.820 --> 00:39:24.870 But that's going to get really tedious if I want to quack more than, let's say, 00:39:24.870 --> 00:39:26.080 three or four times. 00:39:26.080 --> 00:39:28.680 So what I could do instead is use R's syntax 00:39:28.680 --> 00:39:32.410 to give me some vector that is between certain numbers. 00:39:32.410 --> 00:39:35.160 So 1 colon 10, for instance, would say, give me 00:39:35.160 --> 00:39:38.070 a vector that includes 1 through 10 inclusive, which 00:39:38.070 --> 00:39:39.690 I can show you in my console here. 00:39:39.690 --> 00:39:43.410 1 colon 10 gives me one through 10 inclusive. 00:39:43.410 --> 00:39:47.190 With this, I can actually change how many times I loop. 00:39:47.190 --> 00:39:49.680 If I click source, I'll see 10 quacks. 00:39:49.680 --> 00:39:55.200 If I change this to 1 colon 3, well, now I'm able to see quack, quack, quack 00:39:55.200 --> 00:39:57.150 down here in my console. 00:39:57.150 --> 00:40:00.300 So a for loop is going to be the tool for you 00:40:00.300 --> 00:40:03.510 if you have some list, some vector of elements to loop over, 00:40:03.510 --> 00:40:09.100 and then you want to do some piece of code for each of those elements there. 00:40:09.100 --> 00:40:13.920 OK, so now we've seen the three kinds of loops in R. We've seen repeat loops, 00:40:13.920 --> 00:40:16.920 we've seen while loops, and we've seen for loops.
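The for loop described above, including the 1 colon 10 shorthand, can be sketched in R as follows. The object name seen is a hypothetical addition used only to record the values i takes on; the closing comparison of 1:0 and seq_len(0) is an extra caution rather than something covered in the lecture.

```r
# i takes each value of the vector in turn; no manual increment is needed
for (i in 1:3) {
  cat("quack\n")
}

# 1:10 is the integer vector 1, 2, ..., 10 (both ends included)
print(1:10)

# Recording the values i takes on makes the iteration visible
seen <- integer(0)
for (i in 1:3) {
  seen <- c(seen, i)
}
print(seen)  # 1 2 3

# Caution: 1:0 is c(1, 0), so a loop over 1:n runs twice when n is 0,
# while seq_len(0) is empty and the loop body never runs
print(length(1:0))         # 2
print(length(seq_len(0)))  # 0
```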
00:40:16.920 --> 00:40:20.270 Our next step will be to apply these same loops to improve 00:40:20.270 --> 00:40:21.860 the design of our programs. 00:40:21.860 --> 00:40:24.230 We'll come back in five and do just that. 00:40:24.230 --> 00:40:25.670 See you all soon. 00:40:25.670 --> 00:40:26.930 Well, we're back. 00:40:26.930 --> 00:40:29.060 And, as promised, we're now going to explore 00:40:29.060 --> 00:40:33.690 how we could apply functions to make the design of our programs even better. 00:40:33.690 --> 00:40:39.860 So let's pick up where we last left off, writing this program called count.R. 00:40:39.860 --> 00:40:42.080 And we left off with this idea of wanting 00:40:42.080 --> 00:40:46.130 to reprompt the user any number of times until they 00:40:46.130 --> 00:40:50.360 comply with whatever kind of input we want, in this case, a number. 00:40:50.360 --> 00:40:54.500 So we saw just a little bit ago that a repeat loop is a great loop 00:40:54.500 --> 00:40:56.570 to use when you want to do something at least 00:40:56.570 --> 00:41:00.830 once and then check if you should do it again or break out of the loop. 00:41:00.830 --> 00:41:04.640 Now, in this case, I do want to prompt the user at least once. 00:41:04.640 --> 00:41:07.160 I want to tell them to input some number at least once. 00:41:07.160 --> 00:41:09.368 And if they don't comply, well, then I'll loop again, 00:41:09.368 --> 00:41:11.330 but at least I want to do it once. 00:41:11.330 --> 00:41:14.780 So I could use a repeat loop here, and I could use it 00:41:14.780 --> 00:41:17.430 in the same way we just saw, using the repeat keyword 00:41:17.430 --> 00:41:20.350 followed by some curly braces just like this. 00:41:20.350 --> 00:41:24.480 And then inside those curly braces-- inside of this loop's body-- 00:41:24.480 --> 00:41:28.330 I could be sure to prompt at least once, just like this.
00:41:28.330 --> 00:41:30.840 I will ask the user for some number of votes 00:41:30.840 --> 00:41:34.080 and store it in this object called votes. 00:41:34.080 --> 00:41:38.130 But, now, I don't want to run this code, because this will make an infinite loop. 00:41:38.130 --> 00:41:41.100 I'll be constantly asking the user to enter in some number of votes. 00:41:41.100 --> 00:41:45.270 So I need some condition under which I would break out of this loop. 00:41:45.270 --> 00:41:51.060 And I think that condition might be if the votes we receive is not NA-- 00:41:51.060 --> 00:41:55.800 if we get back some valid number of votes, that will be not equal to NA. 00:41:55.800 --> 00:42:00.120 If we do get some weird input like duck, though, that will be NA, in which case 00:42:00.120 --> 00:42:01.800 we should keep looping. 00:42:01.800 --> 00:42:04.650 So let me ask the question, is votes-- 00:42:04.650 --> 00:42:08.310 in this case, is it not equal to-- 00:42:08.310 --> 00:42:11.190 is votes not equal to NA? 00:42:11.190 --> 00:42:12.400 Just like this. 00:42:12.400 --> 00:42:15.690 And if it's not, well, we're going to break out of this loop. 00:42:15.690 --> 00:42:17.340 We're going to say, look, we're done. 00:42:17.340 --> 00:42:21.090 Votes is not NA, we don't need to ask the user any more. 00:42:21.090 --> 00:42:27.038 Alternatively, if votes is NA, we could continue on to the next iteration. 00:42:27.038 --> 00:42:28.830 Now, there's one improvement here, which is 00:42:28.830 --> 00:42:32.880 that technically, when we get to the bottom of this repeat loop, 00:42:32.880 --> 00:42:35.580 get to its last curly brace in the body here, 00:42:35.580 --> 00:42:38.550 it will automatically go back up to the top, 00:42:38.550 --> 00:42:41.010 do the next iteration from the top of its body. 00:42:41.010 --> 00:42:45.240 So I'd argue that this extra next here isn't really needed.
00:42:45.240 --> 00:42:48.390 We're going to go to the next iteration regardless if we don't break out 00:42:48.390 --> 00:42:50.280 of this loop altogether. 00:42:50.280 --> 00:42:53.270 And now, I think, we have a good loop going. 00:42:53.270 --> 00:42:55.020 I'm going to ask the user for their votes. 00:42:55.020 --> 00:42:59.070 If, in other words, this is a valid vote, a valid number, 00:42:59.070 --> 00:43:01.740 we're going to break out of the loop, not prompt them anymore. 00:43:01.740 --> 00:43:03.070 And what could we do? 00:43:03.070 --> 00:43:04.830 Well, now we don't need to check if votes 00:43:04.830 --> 00:43:08.760 is NA, because if we get down to line eight we know votes is not NA. 00:43:08.760 --> 00:43:13.000 I could simply return votes overall. 00:43:13.000 --> 00:43:15.090 So here, I think, is a better implementation, 00:43:15.090 --> 00:43:17.220 one that will prompt the user again and again 00:43:17.220 --> 00:43:19.740 until they enter some number of votes that we actually want. 00:43:19.740 --> 00:43:23.280 Let me go ahead and click source, and let me type 100 votes for Mario. 00:43:23.280 --> 00:43:27.420 But now maybe I'll type duck for Peach. 00:43:27.420 --> 00:43:28.620 And I'm re-prompted. 00:43:28.620 --> 00:43:31.230 OK, maybe I type quack for Peach. 00:43:31.230 --> 00:43:32.430 I'm re-prompted. 00:43:32.430 --> 00:43:33.600 Maybe I'll now comply. 00:43:33.600 --> 00:43:37.545 I'll say, OK, 150 votes for Peach, and now I move on to Bowser. 00:43:37.545 --> 00:43:38.920 So this seems to be working here. 00:43:38.920 --> 00:43:44.280 I'll go ahead and type 120, and now I'll see my total votes was 370. 
00:43:44.280 --> 00:43:48.540 Now, one more improvement is that it seems to me a little extraneous 00:43:48.540 --> 00:43:54.300 to break out of this loop and then return, because a return actually 00:43:54.300 --> 00:43:57.600 signifies that, no matter where we are in our function, 00:43:57.600 --> 00:44:01.530 we're going to stop the function altogether and return the value we have. 00:44:01.530 --> 00:44:05.250 So it seems to me like I could move this return from line eight 00:44:05.250 --> 00:44:07.740 to inside of this if statement. 00:44:07.740 --> 00:44:12.420 And now if votes is not NA, if we have some valid number of votes, 00:44:12.420 --> 00:44:15.600 we won't just break and then return; we'll simply return. 00:44:15.600 --> 00:44:18.780 Because a return would, by nature, break us out of the loop anyway. 00:44:18.780 --> 00:44:21.430 We're going to stop this function altogether. 00:44:21.430 --> 00:44:22.810 So let's try this again. 00:44:22.810 --> 00:44:24.870 I'll save my program, click source. 00:44:24.870 --> 00:44:28.860 I'll type 100 for Mario, 150 for Peach, 120 for Bowser. 00:44:28.860 --> 00:44:32.790 And now I think we're in good hands if we have good input. 00:44:32.790 --> 00:44:34.050 But if I click source again-- 00:44:34.050 --> 00:44:39.180 let me go ahead and type duck for Mario, get prompted again, maybe quack for Mario, 00:44:39.180 --> 00:44:41.190 maybe 100 for Mario. 00:44:41.190 --> 00:44:45.820 Now I think we're doing well with invalid input as well. 00:44:45.820 --> 00:44:47.430 So pretty good. 00:44:47.430 --> 00:44:52.380 One other thing we could do, though, is think about these lines, 10 through 12. 00:44:52.380 --> 00:44:55.830 Well, it seems to me like, for each candidate that I have, 00:44:55.830 --> 00:44:58.380 I want to get some number of votes for them. 00:44:58.380 --> 00:45:00.600 Notice how I said "for each candidate."
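Here is a sketch of `get_votes()` as finally described, with `return()` inside the `if` so it ends both the loop and the function. The name `get_votes` and the `prompt` argument follow the lecture; the extra `read` argument (defaulting to `readline`) is a hypothetical addition so the function can be exercised without typing, and `suppressWarnings()` silences R's "NAs introduced by coercion" warning. Treat this as an approximation, not the lecture's exact file.

```r
get_votes <- function(prompt = "Enter votes: ", read = readline) {
  repeat {
    # Ask at least once; invalid text such as "duck" becomes NA
    votes <- suppressWarnings(as.integer(read(prompt)))
    if (!is.na(votes)) {
      # return() leaves the loop and the function in one step
      return(votes)
    }
    # Otherwise, reaching the closing brace starts the next iteration
  }
}
```

In an interactive session, `get_votes("Mario: ")` keeps prompting until a whole number is entered.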
00:45:00.600 --> 00:45:03.540 Well, if we want to do something for each candidate, 00:45:03.540 --> 00:45:07.170 for each item in some list or some vector, well, a for loop 00:45:07.170 --> 00:45:09.473 might be a great tool for us here. 00:45:09.473 --> 00:45:11.640 Why don't I go ahead and try to make a for loop now? 00:45:11.640 --> 00:45:16.380 I'll say for-- as we saw before, this helper object called i. 00:45:16.380 --> 00:45:23.180 For i in-- well, I want to prompt the user for every candidate that I have. 00:45:23.180 --> 00:45:27.860 And although we just saw for loops being used with numeric vectors-- vectors that 00:45:27.860 --> 00:45:31.040 include 1, 2, 3, 4, and so on-- we can also 00:45:31.040 --> 00:45:33.410 use for loops with non-numeric vectors. 00:45:33.410 --> 00:45:36.650 I could give a vector of the candidates that I have and know that a for loop 00:45:36.650 --> 00:45:39.650 will loop over each candidate in that vector. 00:45:39.650 --> 00:45:41.370 So I could do something like this. 00:45:41.370 --> 00:45:44.540 I could say, for i in a vector of my candidates-- 00:45:44.540 --> 00:45:48.200 Mario, Peach, and then Bowser-- 00:45:48.200 --> 00:45:51.800 and then I'll provide the body of this for loop. 00:45:51.800 --> 00:45:55.100 Now, what do I want to do in this loop? 00:45:55.100 --> 00:46:00.290 Well, on each iteration, i will be assigned the next element of my vector. 00:46:00.290 --> 00:46:03.530 First, it will be Mario, then Peach, then-- 00:46:03.530 --> 00:46:06.330 not Boswer-- Bowser, just like this. 00:46:06.330 --> 00:46:09.470 And then I want to, in this case, ask the user 00:46:09.470 --> 00:46:11.600 for some number of votes on each iteration, 00:46:11.600 --> 00:46:14.060 first for Mario, then for Peach, then for Bowser. 00:46:14.060 --> 00:46:18.100 So I could probably simply call get_votes just like this, 00:46:18.100 --> 00:46:22.120 and maybe store it in some object called votes, like that.
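Looping over a character vector works just like looping over numbers: the loop variable takes each element in turn. This small sketch collects each value of `i` in a hypothetical `seen` vector so the order is visible:

```r
candidates <- c("Mario", "Peach", "Bowser")

seen <- c()
for (i in candidates) {
  # On each iteration, i holds the next candidate's name
  seen <- c(seen, i)
}
print(seen)  # the three names, in their original order
```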
00:46:22.120 --> 00:46:26.610 But now the question is, how would I show the user the right prompt? 00:46:26.610 --> 00:46:31.100 I can't just type in Mario here, because then Mario 00:46:31.100 --> 00:46:33.600 would show up in the prompt on every iteration of my loop. 00:46:33.600 --> 00:46:35.790 I need something more dynamic than that. 00:46:35.790 --> 00:46:39.120 One thing I could do is take advantage of how 00:46:39.120 --> 00:46:44.820 this object i is actually assigned the value of each element on each iteration. 00:46:44.820 --> 00:46:48.420 So on the first iteration, i will be equal to Mario. 00:46:48.420 --> 00:46:51.300 On the second iteration, i will be equal to Peach. 00:46:51.300 --> 00:46:54.150 On the third iteration, i will be equal to Bowser, 00:46:54.150 --> 00:46:56.790 so I could use that to my advantage. 00:46:56.790 --> 00:46:59.070 I could take the candidate name, let's say, 00:46:59.070 --> 00:47:03.300 and maybe add in dynamically this colon space with, 00:47:03.300 --> 00:47:05.280 let's say, paste0, like we've seen before. 00:47:05.280 --> 00:47:06.660 I could say paste0-- 00:47:06.660 --> 00:47:11.700 I want to paste together the candidate's name followed by colon space. 00:47:11.700 --> 00:47:15.810 So now, on each iteration, i will first be equal to Mario. 00:47:15.810 --> 00:47:19.380 We'll get votes for Mario by prompting the user for Mario's votes. 00:47:19.380 --> 00:47:22.110 Then, on the next iteration, i will be Peach, 00:47:22.110 --> 00:47:25.320 and we'll prompt the user for Peach's votes, followed by a colon space. 00:47:25.320 --> 00:47:27.540 And then the same thing for Bowser. 00:47:27.540 --> 00:47:32.070 So I think I could get rid of, let's say, this code down below here. 00:47:32.070 --> 00:47:35.130 But what am I left with? 00:47:35.130 --> 00:47:40.920 Well, it seems like on line 14 I was summing up Mario, Peach, and Bowser, 00:47:40.920 --> 00:47:44.490 but those objects don't exist for me anymore.
00:47:44.490 --> 00:47:47.220 I only have this one value now called votes, which 00:47:47.220 --> 00:47:50.310 seems to get changed every iteration. 00:47:50.310 --> 00:47:54.660 On the first iteration it will be Mario's votes, the next iteration it will be Peach's votes, 00:47:54.660 --> 00:47:57.240 the next it will be Bowser's votes. 00:47:57.240 --> 00:48:02.400 What ideas do we have for how to solve this problem? 00:48:02.400 --> 00:48:06.420 Any ideas for how we could maybe count up these votes 00:48:06.420 --> 00:48:09.510 while we go through our loop? 00:48:09.510 --> 00:48:14.490 AUDIENCE: First I think we should put it inside the for loop or return the sum. 00:48:14.490 --> 00:48:15.990 CARTER ZENKE: Yeah, so, a good idea. 00:48:15.990 --> 00:48:18.270 We still want to return their sum, and 00:48:18.270 --> 00:48:21.180 you're thinking about doing this within the for loop. 00:48:21.180 --> 00:48:25.650 One thing that comes to mind is maybe trying to keep a running sum. 00:48:25.650 --> 00:48:29.550 That is, let's first get Mario's votes, add them to our total, 00:48:29.550 --> 00:48:33.750 then get Peach's votes, add those to our total, then get Bowser's votes, 00:48:33.750 --> 00:48:34.710 add those to our total. 00:48:34.710 --> 00:48:39.000 And at the end of our loop, we will have a total number of votes to count up. 00:48:39.000 --> 00:48:42.550 So let's see this in action in R. I'll come back to RStudio here. 00:48:42.550 --> 00:48:46.800 And if I can't have separate objects now for Mario, Peach, and Bowser, 00:48:46.800 --> 00:48:48.360 well, no problem. 00:48:48.360 --> 00:48:52.260 What I could do instead is start my count a little bit earlier. 00:48:52.260 --> 00:48:55.350 Maybe I'll set total initially equal to 0. 00:48:55.350 --> 00:48:58.740 So before I loop, I assume, well, there are 0 votes. 00:48:58.740 --> 00:49:02.310 But then, on each iteration of my loop, what will I do?
00:49:02.310 --> 00:49:05.760 I'll ask the user for some number of votes for the candidate, 00:49:05.760 --> 00:49:07.860 whether it's Mario, Peach, or Bowser. 00:49:07.860 --> 00:49:12.690 And then, down below, I'll add those votes to the total. 00:49:12.690 --> 00:49:19.080 I will update total to include the total plus the new votes we've received. 00:49:19.080 --> 00:49:21.708 So I think I could now get rid of line 16. 00:49:21.708 --> 00:49:24.250 And let's think through what this is doing line by line here. 00:49:24.250 --> 00:49:26.340 Well, first, total is 0. 00:49:26.340 --> 00:49:30.690 And if I go into my loop now, i will first be equal to Mario 00:49:30.690 --> 00:49:32.160 on this first iteration. 00:49:32.160 --> 00:49:34.350 So I'll prompt the user for Mario's votes 00:49:34.350 --> 00:49:36.330 and store them in this object called votes. 00:49:36.330 --> 00:49:37.740 Let's say it's 100. 00:49:37.740 --> 00:49:42.330 On line 13, I take this object called total and update it. 00:49:42.330 --> 00:49:43.890 I add Mario's votes to it. 00:49:43.890 --> 00:49:47.790 If Mario had 100 votes, total will now be 100. 00:49:47.790 --> 00:49:48.900 Then I'll move on. 00:49:48.900 --> 00:49:52.650 i will then become Peach, and I'll ask for Peach's votes now. 00:49:52.650 --> 00:49:56.310 Well, if Peach's votes is 150, on line 13 00:49:56.310 --> 00:49:59.580 I'll again say 100, which is the current value of total, 00:49:59.580 --> 00:50:03.302 plus 150; that's the new value of total, 250. 00:50:03.302 --> 00:50:05.010 And you can see how we're kind of keeping 00:50:05.010 --> 00:50:08.310 a running track of our number of votes for each candidate. 00:50:08.310 --> 00:50:10.080 We'll do the same for Bowser, and I think 00:50:10.080 --> 00:50:14.460 at the end of this we will have a total number of votes for every candidate. 00:50:14.460 --> 00:50:16.700 So let me go ahead and click source now.
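The paste0() prompt and the running total can be checked without typing anything. This sketch swaps a lookup into a hypothetical `responses` vector (holding the sample numbers entered in the lecture: 100, 150, 120) in place of the call to `get_votes()`. The rest follows the loop as described.

```r
# Hypothetical stand-in for what a user would type at each prompt
responses <- c(Mario = 100, Peach = 150, Bowser = 120)

total <- 0
prompts <- c()
for (i in c("Mario", "Peach", "Bowser")) {
  prompt <- paste0(i, ": ")      # "Mario: ", then "Peach: ", then "Bowser: "
  prompts <- c(prompts, prompt)
  votes <- responses[[i]]        # in count.R this would be get_votes(prompt)
  total <- total + votes         # keep a running sum across iterations
}
cat("Total:", total, "\n")
```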
00:50:16.700 --> 00:50:19.910 And I'll see, if I type in 100 for Mario, 150 for Peach, 00:50:19.910 --> 00:50:26.070 and 120 for Bowser, well, we still now have our total, but now using this loop. 00:50:26.070 --> 00:50:30.080 So I'd argue we've made our program a little more efficient using these loops, 00:50:30.080 --> 00:50:33.810 and easier to read, easier to change as well. 00:50:33.810 --> 00:50:38.570 Now, what questions do we have about this program as we've written it? 00:50:38.570 --> 00:50:40.580 We've added in a few loops. 00:50:40.580 --> 00:50:43.370 We have a repeat loop and a for loop. 00:50:43.370 --> 00:50:47.060 What other questions do we have about this program? 00:50:47.060 --> 00:50:49.310 Seeing none so far, let's keep going here. 00:50:49.310 --> 00:50:51.440 And one thing that we can do with these loops 00:50:51.440 --> 00:50:53.930 is think about how we could apply them to other problems. 00:50:53.930 --> 00:50:56.570 So one problem we saw a little bit earlier 00:50:56.570 --> 00:50:59.120 was this problem of working with a table of data 00:50:59.120 --> 00:51:00.953 that had our candidates' votes in it. 00:51:00.953 --> 00:51:02.870 So, if you recall, we had a table that looked a bit 00:51:02.870 --> 00:51:05.180 like this, where for each candidate we had 00:51:05.180 --> 00:51:08.510 the number of votes they received at the polls, this physical location, 00:51:08.510 --> 00:51:11.300 and the number of votes they received in the mail. 00:51:11.300 --> 00:51:15.740 So it seems like Mario received 37 votes at the polls, the physical location, 00:51:15.740 --> 00:51:18.110 and 63 votes in the mail. 00:51:18.110 --> 00:51:21.920 But then our question was, well, how many votes did each candidate receive? 00:51:21.920 --> 00:51:25.670 That is, for each candidate, what was the sum of their votes?
00:51:25.670 --> 00:51:29.970 And then for each voting method, like poll or mail, well, 00:51:29.970 --> 00:51:33.330 how many votes did we receive overall in those columns too? 00:51:33.330 --> 00:51:37.370 So to visualize-- let me grab my clicker over here-- to visualize, 00:51:37.370 --> 00:51:40.130 let's say that we wanted to find Mario's total votes. 00:51:40.130 --> 00:51:43.198 Well, we would just sum up the row for Mario here. 00:51:43.198 --> 00:51:46.490 And then if we want to find the total number of votes we received at the poll 00:51:46.490 --> 00:51:50.700 or in the mail, we would sum up each value in these columns here. 00:51:50.700 --> 00:51:55.970 So notice how, again, we're saying for each candidate, or for each column. 00:51:55.970 --> 00:51:57.580 We want to sum up those votes. 00:51:57.580 --> 00:52:02.390 Well, we could probably use a for loop to accomplish this same task now. 00:52:02.390 --> 00:52:04.550 Let's go back to R and see how this could work. 00:52:04.550 --> 00:52:05.960 I'll come to RStudio. 00:52:05.960 --> 00:52:10.530 And let's make a new program, one that is called tabulate. 00:52:10.530 --> 00:52:11.780 Let me go ahead and actually-- 00:52:11.780 --> 00:52:13.072 I think I have it open already. 00:52:13.072 --> 00:52:16.550 I'll click on tabulate here, and I'll see a blank file called 00:52:16.550 --> 00:52:22.850 tabulate.R. Now, my goal is to read in this csv of votes that we have, 00:52:22.850 --> 00:52:25.070 one called votes.csv. 00:52:25.070 --> 00:52:30.290 So I'll use read.csv, and I'll try to open votes.csv. 00:52:30.290 --> 00:52:32.150 If you look in my File Explorer here, you'll 00:52:32.150 --> 00:52:35.600 see I do have a file called votes.csv. 00:52:35.600 --> 00:52:38.630 Now let me click source here to run this program, 00:52:38.630 --> 00:52:43.010 and I should now be able to view votes, the data frame.
00:52:43.010 --> 00:52:47.630 So a similar thing to what we've seen earlier, but one thing is now different. 00:52:47.630 --> 00:52:52.700 Notice how in a prior lecture we saw that we had a column called candidates. 00:52:52.700 --> 00:52:56.840 Well, now what we've done is we've decided that the row names for this data 00:52:56.840 --> 00:52:58.610 frame are the candidates themselves. 00:52:58.610 --> 00:53:02.840 So Mario is the name of this first row, Peach is the name of the second, 00:53:02.840 --> 00:53:04.310 and Bowser is the third. 00:53:04.310 --> 00:53:08.180 This allows us to define our data frame as exclusively numbers. 00:53:08.180 --> 00:53:12.470 We could sum, like, 37, 63, 43, 107. 00:53:12.470 --> 00:53:16.130 And, moreover, it allows us to better subset our data frame, 00:53:16.130 --> 00:53:18.210 as we'll see in just a bit. 00:53:18.210 --> 00:53:21.260 So let's say my goal, at first, is to sum up 00:53:21.260 --> 00:53:25.460 the number of votes for each candidate across both the poll and the mail. 00:53:25.460 --> 00:53:29.900 Well, in tabulate.R, I could start by doing that by making a for loop, 00:53:29.900 --> 00:53:32.180 doing something for each candidate. 00:53:32.180 --> 00:53:39.440 So I could say, for candidate, let's say, in row names votes, 00:53:39.440 --> 00:53:43.310 just like this, and get a body for this loop. 00:53:43.310 --> 00:53:46.550 And what I've done here is I've decided that I 00:53:46.550 --> 00:53:49.370 no longer need to call this value i. 00:53:49.370 --> 00:53:50.390 Could call it candidate. 00:53:50.390 --> 00:53:52.140 I could call it really anything I want to, 00:53:52.140 --> 00:53:55.730 and I could use that inside of my loop here. 00:53:55.730 --> 00:53:59.630 The other thing I've decided is that instead of defining a list 00:53:59.630 --> 00:54:03.980 of the candidates-- in this case Mario, Peach, and Bowser-- 00:54:03.980 --> 00:54:06.410 I could be more dynamic than that. 
00:54:06.410 --> 00:54:09.470 I could decide to tell R that it should tell me 00:54:09.470 --> 00:54:12.020 what my row names are, what my candidates are, 00:54:12.020 --> 00:54:14.040 and allow it to iterate over those. 00:54:14.040 --> 00:54:17.930 So I'll ask for the row names of the votes data frame. 00:54:17.930 --> 00:54:20.330 If I actually see them down in my console below, 00:54:20.330 --> 00:54:24.290 I'll see that I get a vector of Mario, Peach, and Bowser. 00:54:24.290 --> 00:54:28.010 So the same structure for our loop now, but different ways 00:54:28.010 --> 00:54:32.210 of asking for a helper object to iterate with, and an actual vector 00:54:32.210 --> 00:54:35.130 of, in this case, candidate names. 00:54:35.130 --> 00:54:38.330 So now we have a loop to go over every candidate's name, 00:54:38.330 --> 00:54:42.140 and our next goal is to find out how many votes each candidate received 00:54:42.140 --> 00:54:44.600 across all of their columns here. 00:54:44.600 --> 00:54:47.720 Now, the first thing to do might be to subset my data 00:54:47.720 --> 00:54:52.640 frame, to figure out for each candidate which rows correspond to that candidate. 00:54:52.640 --> 00:54:56.960 Now, we saw last time ways to subset data frames using the subset function. 00:54:56.960 --> 00:55:01.070 But now that we actually have this row name being 00:55:01.070 --> 00:55:05.570 equal to the candidate's name, we can make this even more efficient. 00:55:05.570 --> 00:55:07.310 Let me visualize this for you here. 00:55:07.310 --> 00:55:10.010 If we have our data frame called votes and I 00:55:10.010 --> 00:55:12.590 want to find all of Mario's votes-- 00:55:12.590 --> 00:55:16.690 well, if Mario is the row name for one of my rows in my data frame, 00:55:16.690 --> 00:55:19.510 I could simply use the name Mario in the place 00:55:19.510 --> 00:55:21.970 I would normally put the row's index. 00:55:21.970 --> 00:55:23.590 For instance, like this.
00:55:23.590 --> 00:55:27.430 If I say votes bracket Mario as the character string, 00:55:27.430 --> 00:55:30.190 because Mario is the name of one of my row names, 00:55:30.190 --> 00:55:33.760 I'll then get back the row corresponding to the name Mario. 00:55:33.760 --> 00:55:36.790 And same thing with Peach, and same thing with Bowser. 00:55:36.790 --> 00:55:39.310 So we're very quickly now subsetting our data 00:55:39.310 --> 00:55:43.870 to find each candidate's rows and their number of votes across all the columns 00:55:43.870 --> 00:55:44.750 here. 00:55:44.750 --> 00:55:46.300 Let's come back and try this out. 00:55:46.300 --> 00:55:48.910 I will show you in the console that, indeed, 00:55:48.910 --> 00:55:54.190 if I type votes bracket Mario and then comma space 00:55:54.190 --> 00:55:59.140 to say I want all columns, but only the row associated with the name Mario, 00:55:59.140 --> 00:56:01.840 well, I'll get back a single row from this data 00:56:01.840 --> 00:56:04.900 frame that includes Mario's votes. 00:56:04.900 --> 00:56:10.180 So if I can do this at least in my console now with particular names, 00:56:10.180 --> 00:56:14.680 I bet I could do it in my for loop where candidate will stand in 00:56:14.680 --> 00:56:16.430 for any given candidate's name. 00:56:16.430 --> 00:56:18.800 First it will be Mario, then it will be Peach, 00:56:18.800 --> 00:56:21.800 then it will be Bowser on each successive iteration. 00:56:21.800 --> 00:56:26.600 So to subset this data, I could use votes, the data frame's name, 00:56:26.600 --> 00:56:30.980 followed by brackets, followed by the row name, in this case candidate 00:56:30.980 --> 00:56:34.610 on each iteration, updating, and then comma space, 00:56:34.610 --> 00:56:38.270 saying I want all columns for whatever row 00:56:38.270 --> 00:56:43.710 corresponds to this candidate's name, on whatever iteration it is that we're on.
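Row-name subsetting is easy to try on a small data frame. A few assumptions apply to this sketch: only Mario's figures (37 at the polls, 63 by mail) and Peach's (43 and 107) appear in the lecture, so Bowser is left out rather than invented. The column names `poll` and `mail` are also assumed, so the real `votes.csv` may differ.

```r
# Candidates are row names, so the data frame holds only numbers
votes <- data.frame(
  poll = c(37, 43),
  mail = c(63, 107),
  row.names = c("Mario", "Peach")
)

rownames(votes)          # "Mario" "Peach"
votes["Mario", ]         # one row: all columns for Mario
sum(votes["Mario", ])    # 100

for (candidate in rownames(votes)) {
  # candidate stands in for each row name in turn
  print(sum(votes[candidate, ]))
}
```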
00:56:43.710 --> 00:56:46.010 So now that I have this working for me, I 00:56:46.010 --> 00:56:48.830 could probably put this into the function sum 00:56:48.830 --> 00:56:51.950 to get back the total number of votes across all the columns. 00:56:51.950 --> 00:56:55.040 If you give sum a data frame of one row, it 00:56:55.040 --> 00:56:58.670 will sum up all the values in that given row. 00:56:58.670 --> 00:57:04.220 OK, so now I seem to have the sum for each candidate's votes, 00:57:04.220 --> 00:57:07.650 but I still need some place to store it to look at it later. 00:57:07.650 --> 00:57:12.410 So one thing I could do is make this object called total votes, just 00:57:12.410 --> 00:57:13.400 like this. 00:57:13.400 --> 00:57:16.960 But what's the problem now? 00:57:16.960 --> 00:57:22.810 If I were to run this code top to bottom, what might I lose? 00:57:22.810 --> 00:57:26.470 And a question here is, what might the value of total votes 00:57:26.470 --> 00:57:30.370 be if I were to look at it at the very end of my loop? 00:57:30.370 --> 00:57:32.900 Any ideas here? 00:57:32.900 --> 00:57:35.600 Why can't I just leave my code like this, 00:57:35.600 --> 00:57:39.410 and what might be the last value of total votes, do you think? 00:57:39.410 --> 00:57:43.510 AUDIENCE: OK, the last value will be the sum of votes 00:57:43.510 --> 00:57:47.185 of the last candidate, the last one. 00:57:47.185 --> 00:57:49.810 CARTER ZENKE: Yeah, the last candidate that we have in our list 00:57:49.810 --> 00:57:52.030 would be the final value of total votes. 00:57:52.030 --> 00:57:53.620 So let's actually test this out. 00:57:53.620 --> 00:57:56.380 If I go back to my RStudio here-- 00:57:56.380 --> 00:58:00.670 and why don't I run this code by clicking source? 00:58:00.670 --> 00:58:04.730 And now let me check on the value of total votes, just like this. 00:58:04.730 --> 00:58:07.510 Well, total votes seems to be 120. 00:58:07.510 --> 00:58:08.800 Who had 120 votes? 
00:58:08.800 --> 00:58:11.020 Seems like it was Bowser. 00:58:11.020 --> 00:58:14.530 But why is total votes equal to Bowser's total votes? 00:58:14.530 --> 00:58:17.960 Well, let's think about this going top to bottom through our loop. 00:58:17.960 --> 00:58:22.580 First, candidate is equal to Mario, and we'll subset our data frame 00:58:22.580 --> 00:58:23.870 to find Mario's votes. 00:58:23.870 --> 00:58:27.890 We'll sum those up across all the columns and store it now in total votes. 00:58:27.890 --> 00:58:30.110 But then we'll go on to the next iteration. 00:58:30.110 --> 00:58:34.460 Candidate will next be Peach, and we'll subset our data frame to find 00:58:34.460 --> 00:58:36.830 Peach's votes, sum them across all the columns, 00:58:36.830 --> 00:58:41.660 and effectively overwrite Mario's votes with Peach's. 00:58:41.660 --> 00:58:44.090 So now total votes is Peach's total votes. 00:58:44.090 --> 00:58:46.190 But then when Bowser comes along, well, Bowser 00:58:46.190 --> 00:58:47.643 will also overwrite Peach's votes. 00:58:47.643 --> 00:58:49.310 At the end of our loop, what do we have? 00:58:49.310 --> 00:58:54.090 Well, only one candidate's votes, and not all of them. 00:58:54.090 --> 00:58:56.420 So it seems like we need some way of making 00:58:56.420 --> 00:59:01.280 a vector of these actual total votes, and we could 00:59:01.280 --> 00:59:03.980 do that using some new syntax in R. 00:59:03.980 --> 00:59:07.850 One thing I could do is initially make an empty vector, 00:59:07.850 --> 00:59:11.960 just like this, total votes, and set it equal to this, 00:59:11.960 --> 00:59:13.910 c followed by some parentheses. 00:59:13.910 --> 00:59:17.480 This is that same c function we saw earlier, but with nothing inside it, it means the empty vector. 00:59:17.480 --> 00:59:19.460 Nothing, at least at first.
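The overwriting problem just described is easy to reproduce, because assigning to the same object on every iteration keeps only the last value. This sketch uses the same hypothetical two-candidate data frame (Mario and Peach only), so the surviving value is Peach's 150 and not Bowser's 120 as in the lecture:

```r
votes <- data.frame(poll = c(37, 43), mail = c(63, 107),
                    row.names = c("Mario", "Peach"))

for (candidate in rownames(votes)) {
  # Each iteration replaces the previous value instead of keeping it
  total_votes <- sum(votes[candidate, ])
}
total_votes  # only the last candidate's sum survives
```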
00:59:19.460 --> 00:59:24.800 And then in my loop I bet we could add to this vector so we get back at the end 00:59:24.800 --> 00:59:29.300 not any single candidate's votes, but a whole vector of their votes. 00:59:29.300 --> 00:59:31.070 Now, we'll need some new syntax for this, 00:59:31.070 --> 00:59:33.830 and some new feature we haven't seen yet in R. 00:59:33.830 --> 00:59:37.050 But let's visualize what we could do with that syntax. 00:59:37.050 --> 00:59:40.280 So here is a visualization of the empty vector total votes. 00:59:40.280 --> 00:59:43.020 There's nothing here, because this is an empty vector right now. 00:59:43.020 --> 00:59:46.400 But if I wanted to add some new element to it-- 00:59:46.400 --> 00:59:49.610 and not just add the element but give it some name too-- 00:59:49.610 --> 00:59:50.960 I could certainly do that. 00:59:50.960 --> 00:59:55.580 I could say total votes, bracket, and the name I want to give this element, 00:59:55.580 --> 00:59:58.880 and then assign the value for that element. 00:59:58.880 --> 01:00:02.660 So if I want to add to this vector total votes an element named 01:00:02.660 --> 01:00:08.190 Mario that has the value 100, well, I could do it using this syntax here. 01:00:08.190 --> 01:00:10.850 Well, what if I want to later add Peach's votes? 01:00:10.850 --> 01:00:14.300 Let's imagine this is the next iteration of our for loop. 01:00:14.300 --> 01:00:16.645 I would say, total votes, bracket, Peach. 01:00:16.645 --> 01:00:18.770 And that would then make a new element to my vector 01:00:18.770 --> 01:00:21.410 called Peach with the value 150. 01:00:21.410 --> 01:00:23.750 And same, let's say, for Bowser. 01:00:23.750 --> 01:00:26.390 On my next iteration, I will add Bowser's votes. 01:00:26.390 --> 01:00:28.820 And now, at the end of my loop, let's say, 01:00:28.820 --> 01:00:33.920 I now have a vector of Mario, Peach, and Bowser's votes all together now. 01:00:33.920 --> 01:00:35.280 So let's try it. 
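The visualization above maps directly onto R's name-based assignment. This sketch uses the lecture's totals and shows that assigning to a new name grows the vector one named element at a time. Strictly speaking, `c()` with no arguments returns `NULL`, and R turns that into a named numeric vector on the first assignment.

```r
total_votes <- c()             # starts empty (c() with no arguments is NULL)

total_votes["Mario"] <- 100    # adds an element named "Mario"
total_votes["Peach"] <- 150    # adds an element named "Peach"
total_votes["Bowser"] <- 120   # adds an element named "Bowser"

names(total_votes)             # "Mario" "Peach" "Bowser"
total_votes["Peach"]           # look up a value by its name
```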
01:00:35.280 --> 01:00:38.060 I'll come back to RStudio here, and let's try 01:00:38.060 --> 01:00:42.470 using this process of adding named elements to our vectors. 01:00:42.470 --> 01:00:47.540 Well, on line five I might say that I want to add to total votes 01:00:47.540 --> 01:00:52.790 a new element whose name is, well, whatever the candidate's name is 01:00:52.790 --> 01:00:53.720 on any iteration. 01:00:53.720 --> 01:00:56.810 So I'll say, total votes, bracket, candidate-- 01:00:56.810 --> 01:01:00.680 meaning that whatever the candidate's name is on this iteration, 01:01:00.680 --> 01:01:03.770 I want to add a new element with that name-- 01:01:03.770 --> 01:01:07.730 and I'll give it the value of the sum of their votes. 01:01:07.730 --> 01:01:13.160 So if I click source, and now I go ahead and inspect the value of total votes 01:01:13.160 --> 01:01:17.640 by typing in my console and hitting enter, I'll see a much better output. 01:01:17.640 --> 01:01:20.970 I actually see each candidate's name in my vector, 01:01:20.970 --> 01:01:26.970 and I see the value now that they were assigned, 100 for Mario, 150 for Peach, 01:01:26.970 --> 01:01:29.910 and 120 for Bowser. 01:01:29.910 --> 01:01:34.440 So what questions do we have about this program as it exists now? 01:01:34.440 --> 01:01:40.530 AUDIENCE: Could we, instead of adding the total votes in a named vector, 01:01:40.530 --> 01:01:45.513 could we add a new column to our votes data frame? 01:01:45.513 --> 01:01:46.930 CARTER ZENKE: We absolutely could. 01:01:46.930 --> 01:01:52.620 So we could decide instead to make a new vector and add that as a column 01:01:52.620 --> 01:01:54.060 to our data frame. 01:01:54.060 --> 01:01:56.310 It just depends on what kind of output you want to do. 
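Putting the pieces together gives the loop described here, with one named element added per candidate. This is a sketch on the hypothetical two-row data frame used above, not on the lecture's `votes.csv`:

```r
votes <- data.frame(poll = c(37, 43), mail = c(63, 107),
                    row.names = c("Mario", "Peach"))

total_votes <- c()
for (candidate in rownames(votes)) {
  # Name each new element after the current candidate
  total_votes[candidate] <- sum(votes[candidate, ])
}
total_votes  # a named vector: Mario 100, Peach 150
```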
01:01:56.310 --> 01:01:58.770 So, here, we wanted this output of a named vector, 01:01:58.770 --> 01:02:03.540 but we could change this to, let's say, not supply a name to each element, 01:02:03.540 --> 01:02:06.545 and instead just add some new element after element, much like this. 01:02:06.545 --> 01:02:08.170 Let me show you that real briefly here. 01:02:08.170 --> 01:02:09.540 I'll come back to RStudio. 01:02:09.540 --> 01:02:12.930 And let's say I don't want to give it some name, 01:02:12.930 --> 01:02:15.900 I just want to kind of add in successive elements here. 01:02:15.900 --> 01:02:21.090 I could say total votes becomes the combination 01:02:21.090 --> 01:02:27.160 of the current state of total votes, adding in this new element here. 01:02:27.160 --> 01:02:30.030 So a little bit tricky to parse, but let's see what happens here. 01:02:30.030 --> 01:02:34.260 I'll click source, and I'll show you the value of total votes 01:02:34.260 --> 01:02:35.920 now, just like this. 01:02:35.920 --> 01:02:37.140 And what do I get back? 01:02:37.140 --> 01:02:40.957 Well, a total votes vector that is Mario's votes, then Peach's votes, 01:02:40.957 --> 01:02:41.790 then Bowser's votes. 01:02:41.790 --> 01:02:45.390 And I could, if I wanted to, add this as a column in my data frame. 01:02:45.390 --> 01:02:49.350 I could say votes total, let's say, and then I could say, 01:02:49.350 --> 01:02:53.260 make that the value of this vector here. 01:02:53.260 --> 01:02:55.770 So now if I run source-- 01:02:55.770 --> 01:03:00.870 so I click source, I should see that I have this new column called total. 
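The unnamed alternative from this answer builds the vector by combining it with each new sum, then attaches it as a column. The transcript renders the column assignment as "votes total". This sketch assumes that means the `$` syntax, `votes$total`, and it uses the same hypothetical two-row data frame:

```r
votes <- data.frame(poll = c(37, 43), mail = c(63, 107),
                    row.names = c("Mario", "Peach"))

total_votes <- c()
for (candidate in rownames(votes)) {
  # Combine what we have so far with one new, unnamed element
  total_votes <- c(total_votes, sum(votes[candidate, ]))
}

# Elements are in row order, so they line up with the data frame's rows
votes$total <- total_votes
votes
```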
01:03:00.870 --> 01:03:04.770 And effectively what I've done here is-- if I remove this part first-- 01:03:04.770 --> 01:03:09.390 is I've decided to start with this empty vector and then, on each iteration, 01:03:09.390 --> 01:03:12.210 I want to take whatever is in that current vector 01:03:12.210 --> 01:03:14.880 and simply append or add on some new element, which 01:03:14.880 --> 01:03:17.460 will be the sum of the current candidate's votes. 01:03:17.460 --> 01:03:20.820 So, on the first iteration, we'll add our very first element 01:03:20.820 --> 01:03:22.320 to this empty vector here. 01:03:22.320 --> 01:03:25.500 But then on the next iteration, we'll have a vector of one element 01:03:25.500 --> 01:03:28.140 and we'll add in one more element, and on the third, 01:03:28.140 --> 01:03:30.730 add the third, and the fourth, add the fourth, and so on. 01:03:30.730 --> 01:03:35.740 So a good way to add vectors together using c as well. 01:03:35.740 --> 01:03:37.590 I hope that helps. 01:03:37.590 --> 01:03:40.530 OK, so let's go back to what we had before here. 01:03:40.530 --> 01:03:45.450 Let me press Command-Z a few times to go back to our named vector, 01:03:45.450 --> 01:03:47.430 and let's see what else we could do. 01:03:47.430 --> 01:03:51.270 So if I click source, I'll see total votes again, 01:03:51.270 --> 01:03:52.740 exactly as we want it to be. 01:03:52.740 --> 01:03:56.250 But what if I wanted to sum up the columns too to figure out, 01:03:56.250 --> 01:03:58.830 for each voting method, how many votes did we receive? 01:03:58.830 --> 01:04:03.610 That is, for each poll and mail column, how many votes were there in those? 01:04:03.610 --> 01:04:05.640 Well, I could really just change my for loop. 01:04:05.640 --> 01:04:09.970 Instead of using row names, I could iterate over column names.
01:04:09.970 --> 01:04:12.810 So column names will tell me-- if I click on the console-- 01:04:12.810 --> 01:04:18.000 column names will tell me, what columns do I have inside this data frame? 01:04:18.000 --> 01:04:20.700 And I could then change my loop appropriately. 01:04:20.700 --> 01:04:25.110 Instead of calling each column candidate on each iteration, 01:04:25.110 --> 01:04:27.450 I could call it maybe, like, voting method 01:04:27.450 --> 01:04:30.360 for the polling or the mail-in ballots, and then I 01:04:30.360 --> 01:04:32.620 could change how I subset my data frame. 01:04:32.620 --> 01:04:37.600 Instead of subsetting by row, I could subset now by column, like this. 01:04:37.600 --> 01:04:41.640 And I could then update the name of each of these elements 01:04:41.640 --> 01:04:44.910 to instead be the same name as the method we're counting. 01:04:44.910 --> 01:04:48.720 So pretty much the same idea, same flow, but just a different 01:04:48.720 --> 01:04:50.610 process across columns now. 01:04:50.610 --> 01:04:53.730 If I click source, I'll see in total votes 01:04:53.730 --> 01:04:58.470 that I've now counted up the total number of votes for each column. 01:04:58.470 --> 01:05:03.210 OK, so it turns out that this method of doing things in R, 01:05:03.210 --> 01:05:07.380 this kind of analysis-- applying some function for every row and for every 01:05:07.380 --> 01:05:07.890 column-- 01:05:07.890 --> 01:05:11.640 is so common in R that we actually have some family of functions 01:05:11.640 --> 01:05:14.670 we can use to do that same analysis. 
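Switching the loop from rows to columns changes only what you iterate over and how you subset. This sketch uses the same hypothetical two-row data frame; the loop variable name `method` is illustrative, since the transcript says only "voting method".

```r
votes <- data.frame(poll = c(37, 43), mail = c(63, 107),
                    row.names = c("Mario", "Peach"))

total_votes <- c()
for (method in colnames(votes)) {
  # votes[, method] selects one whole column as a numeric vector
  total_votes[method] <- sum(votes[, method])
}
total_votes  # one total per voting method
```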
01:05:14.670 --> 01:05:18.780 And as we move from this world of writing procedures-- that is, 01:05:18.780 --> 01:05:21.180 specifying a loop like this and specifying everything 01:05:21.180 --> 01:05:25.020 we should do inside that loop-- to relying more on functions, 01:05:25.020 --> 01:05:28.920 we'll enter into this world called functional programming, where 01:05:28.920 --> 01:05:31.500 in functional programming we can actually use functions 01:05:31.500 --> 01:05:34.650 to do the work of iteration for us. 01:05:34.650 --> 01:05:38.070 Now, one common hallmark of functional programming 01:05:38.070 --> 01:05:42.670 is applying some function across these individual rows and individual columns. 01:05:42.670 --> 01:05:45.660 So R gives us this function called apply. 01:05:45.660 --> 01:05:49.770 And if I wanted to have the same result we just saw with our for loop but now 01:05:49.770 --> 01:05:53.520 using this function, I could use the following syntax. 01:05:53.520 --> 01:05:57.600 I could use this function called apply and give it three arguments. 01:05:57.600 --> 01:06:00.430 The first one is the data frame to work with. 01:06:00.430 --> 01:06:02.800 In this case, votes, as we see here. 01:06:02.800 --> 01:06:05.380 The next one is one called MARGIN. 01:06:05.380 --> 01:06:07.300 And MARGIN stands for-- 01:06:07.300 --> 01:06:13.270 if I want to apply this function across all of the rows or all of the columns 01:06:13.270 --> 01:06:17.140 here-- when MARGIN is 1, that means apply some function 01:06:17.140 --> 01:06:18.820 across all of the rows. 01:06:18.820 --> 01:06:23.980 When MARGIN is 2, that means apply this function across all of the columns here. 01:06:23.980 --> 01:06:28.808 And then, finally, the third argument to apply is a function itself. 01:06:28.808 --> 01:06:30.850 And this is a hallmark of functional programming. 01:06:30.850 --> 01:06:34.540 We can pass functions as input to other functions. 
01:06:34.540 --> 01:06:37.180 In this case, we're telling apply, here is the function 01:06:37.180 --> 01:06:41.500 I want you to use to basically work across all of these rows 01:06:41.500 --> 01:06:43.670 and all of these columns here. 01:06:43.670 --> 01:06:46.750 So when MARGIN is equal to 1, what will happen? 01:06:46.750 --> 01:06:49.990 When I apply the function sum, well, for every row 01:06:49.990 --> 01:06:54.280 I will get back a sum of every element in that row. 01:06:54.280 --> 01:06:57.520 When MARGIN, though, is 2, what will I get? 01:06:57.520 --> 01:07:01.750 I'll get back the sum of every element inside each of these columns here, 01:07:01.750 --> 01:07:05.300 storing it, let's say, at the bottom of our data frame here. 01:07:05.300 --> 01:07:07.390 So let's see this in action. 01:07:07.390 --> 01:07:10.990 I'll come back to RStudio here, and let's try to use these apply 01:07:10.990 --> 01:07:14.890 functions instead of doing things more procedurally, typing a loop 01:07:14.890 --> 01:07:17.260 and then everything we want to do inside of that loop. 01:07:17.260 --> 01:07:20.260 I argue I could actually write all this code in terms 01:07:20.260 --> 01:07:23.110 of a single function call using apply. 01:07:23.110 --> 01:07:29.830 Well, I want to apply this function on a given data frame, votes, as we just saw. 01:07:29.830 --> 01:07:33.940 I want to apply it across all of my rows, that is, for every candidate 01:07:33.940 --> 01:07:36.070 that I have in my data frame. 01:07:36.070 --> 01:07:39.700 And the function I want to apply is the sum function. 01:07:39.700 --> 01:07:43.180 I want to take all of these rows and, for each row, 01:07:43.180 --> 01:07:46.000 I want to sum up all of those values. 01:07:46.000 --> 01:07:51.040 Now, if I run this line of code on line two, what will I get? 01:07:51.040 --> 01:07:54.700 The same exact result. 
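The three-argument apply() call described above can be sketched like this, again on a hypothetical `votes` data frame whose poll/mail split is invented. MARGIN = 1 calls sum() once per row (candidate); MARGIN = 2 calls it once per column (voting method).

```r
# Hypothetical data: the poll/mail split is made up for illustration
votes <- data.frame(
  poll = c(60, 90, 70),
  mail = c(40, 60, 50),
  row.names = c("Mario", "Peach", "Bowser")
)

# MARGIN = 1: apply sum() to each row, giving one total per candidate
by_candidate <- apply(votes, MARGIN = 1, FUN = sum)

# MARGIN = 2: apply sum() to each column, giving one total per voting method
by_method <- apply(votes, MARGIN = 2, FUN = sum)

print(by_candidate)
print(by_method)
```

For plain sums, base R's rowSums() and colSums() give the same results; apply() is the more general tool because FUN can be any function.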
I'll get, now, a named vector-- 01:07:54.700 --> 01:07:59.050 Mario, Peach, and Bowser, these three elements here, with Mario being 100, 01:07:59.050 --> 01:08:02.110 Peach being 150, and Bowser being 120. 01:08:02.110 --> 01:08:06.250 Notice how, if I go back to my data frame, this is the same thing we had, 01:08:06.250 --> 01:08:10.090 where for every row I've now found the sum, and apply 01:08:10.090 --> 01:08:13.120 has returned to me the name of that row and the result 01:08:13.120 --> 01:08:16.660 of summing all the values in that row. 01:08:16.660 --> 01:08:17.979 Let's think about this too. 01:08:17.979 --> 01:08:20.350 What if I changed MARGIN to 2? 01:08:20.350 --> 01:08:25.090 Well, this would find me the sum of every individual column, all the values 01:08:25.090 --> 01:08:26.890 within those individual columns. 01:08:26.890 --> 01:08:29.890 Whereas before we had to change row names to column names 01:08:29.890 --> 01:08:33.970 and change various other objects, now I can simply change 1 to 2 01:08:33.970 --> 01:08:35.920 to work on columns here. 01:08:35.920 --> 01:08:40.149 Let me click source, and now I'll see, if I go ahead and hit line two, 01:08:40.149 --> 01:08:44.770 command enter, now I get back that same result, the names of each of my columns 01:08:44.770 --> 01:08:49.660 and the result of summing up all of their values here. 01:08:49.660 --> 01:08:54.430 OK, so we've seen a much better way now to approach this same problem. 01:08:54.430 --> 01:08:57.732 Instead of doing things procedurally-- writing a loop and saying exactly what 01:08:57.732 --> 01:08:59.649 should happen in each iteration of that loop-- 01:08:59.649 --> 01:09:03.130 I can rely on a function like apply to do a lot of that work for me. 01:09:03.130 --> 01:09:06.040 And, moreover, I can pass a function as input 01:09:06.040 --> 01:09:12.250 to apply for it to use on each iteration that it goes through in my data frame.
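Because apply() takes a function as input, the function doesn't have to be a built-in like sum. The sketch below passes a hypothetical helper, `mail_share`, that computes the fraction of each candidate's votes that arrived by mail. Both the helper and the poll/mail split in `votes` are invented for illustration.

```r
# Hypothetical data: the poll/mail split is made up for illustration
votes <- data.frame(
  poll = c(60, 90, 70),
  mail = c(40, 60, 50),
  row.names = c("Mario", "Peach", "Bowser")
)

# Hypothetical helper: share of a candidate's votes that came by mail
mail_share <- function(row) {
  # Inside apply(), each row arrives as a numeric vector named by column
  row[["mail"]] / sum(row)
}

# Pass our own function to apply() exactly as we passed sum
shares <- apply(votes, MARGIN = 1, FUN = mail_share)
print(round(shares, 2))
```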
01:09:12.250 --> 01:09:15.670 Now, let me ask here, having seen these apply functions 01:09:15.670 --> 01:09:18.910 and how they work, what questions do we have about them? 01:09:18.910 --> 01:09:23.470 AUDIENCE: My question: instead of using the procedural approach of, like, 01:09:23.470 --> 01:09:28.330 sorting and un-sorting, are there any existing functions 01:09:28.330 --> 01:09:32.950 that I can use on the data frame for sorting the data in the rows 01:09:32.950 --> 01:09:33.479 or columns? 01:09:33.479 --> 01:09:34.729 CARTER ZENKE: A good question. 01:09:34.729 --> 01:09:37.270 So one thing you might want to do is sort your data. 01:09:37.270 --> 01:09:40.283 R does come with a function called sort that can do just that. 01:09:40.283 --> 01:09:41.950 Let me show you a little bit of it here. 01:09:41.950 --> 01:09:46.810 If I come back to RStudio, let's say I want to store this vector 01:09:46.810 --> 01:09:48.550 I'm given from apply. 01:09:48.550 --> 01:09:52.220 I could call this vector something like total_votes, just like this. 01:09:52.220 --> 01:09:54.940 And let me run line one and then line two. 01:09:54.940 --> 01:10:00.800 Now I have total votes being this vector of named elements across my columns. 01:10:00.800 --> 01:10:03.140 Let's say I wanted to sort these. 01:10:03.140 --> 01:10:03.980 Let me say-- 01:10:03.980 --> 01:10:08.150 I could use the sort function here, and I could type total_votes inside 01:10:08.150 --> 01:10:08.900 as the input to sort.
01:10:08.900 --> 01:10:13.220 And now, if I hit command enter on sort, I should see-- well, 01:10:13.220 --> 01:10:17.090 it's kind of already in sorted order, at least going low to high-- 01:10:17.090 --> 01:10:21.620 but now if I type question mark sort to see how I could change the order here, 01:10:21.620 --> 01:10:25.700 I might see that sort has this parameter called decreasing, 01:10:25.700 --> 01:10:27.890 which is false by default, which means that we're 01:10:27.890 --> 01:10:29.810 going to sort up instead of down. 01:10:29.810 --> 01:10:34.580 But now if I want to sort going low to high or increasing order, 01:10:34.580 --> 01:10:37.250 I could set decreasing-- 01:10:37.250 --> 01:10:41.780 sorry, no, if I want to set the vector going from high to low, 01:10:41.780 --> 01:10:45.260 let's say, instead of low to high, I could set decreasing equal to true. 01:10:45.260 --> 01:10:49.100 And then we'll see this vector is now sorted from high to low. 01:10:49.100 --> 01:10:52.160 I could do the same for every candidate that I have. 01:10:52.160 --> 01:10:54.710 Let me update total votes across the rows. 01:10:54.710 --> 01:10:57.930 Let me now run sort on line three. 01:10:57.930 --> 01:11:01.840 And now I have my candidates in sorted order as well. 01:11:01.840 --> 01:11:05.880 So a cool trick if you want to sort your data now using this sort function. 01:11:05.880 --> 01:11:08.220 And you can change whether it goes up or down 01:11:08.220 --> 01:11:10.690 using this decreasing parameter here. 01:11:10.690 --> 01:11:12.510 So we've seen a lot today. 01:11:12.510 --> 01:11:15.750 We've seen how to define our very own functions. 01:11:15.750 --> 01:11:19.770 We've seen how to write our own loops to repeat code multiple times, 01:11:19.770 --> 01:11:22.380 and we've seen how to combine these two ideas, 01:11:22.380 --> 01:11:24.810 dipping our toes into functional programming. 01:11:24.810 --> 01:11:27.768 That is, using functions to do the work of iteration for us.
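Sorting the per-candidate totals that apply() returns, with and without decreasing = TRUE, might look like this. The `votes` data frame is the same hypothetical example with an invented poll/mail split.

```r
# Hypothetical data: the poll/mail split is made up for illustration
votes <- data.frame(
  poll = c(60, 90, 70),
  mail = c(40, 60, 50),
  row.names = c("Mario", "Peach", "Bowser")
)

total_votes <- apply(votes, MARGIN = 1, FUN = sum)

# decreasing = FALSE is the default, so this sorts low to high
low_to_high <- sort(total_votes)

# decreasing = TRUE sorts high to low; names stay attached to their values
high_to_low <- sort(total_votes, decreasing = TRUE)

print(low_to_high)
print(high_to_low)
```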
01:11:27.768 --> 01:11:30.810 When we come back next time, we'll actually see how to clean up our data, 01:11:30.810 --> 01:11:33.690 how to tidy it to make analysis like these even easier. 01:11:33.690 --> 01:11:35.430 All that and more next time. 01:11:35.430 --> 01:11:37.340 See you soon.