1 00:00:00,000 --> 00:00:01,464 2 00:00:01,464 --> 00:00:04,392 [MUSIC PLAYING] 3 00:00:04,392 --> 00:00:19,628 4 00:00:19,628 --> 00:00:21,920 CARTER ZENKE: Well hello, one and all, and welcome back 5 00:00:21,920 --> 00:00:25,850 to CS50's Introduction to Programming with R. My name is Carter Zenke, 6 00:00:25,850 --> 00:00:28,632 and this is our lecture on applying functions. 7 00:00:28,632 --> 00:00:30,590 We'll take a look at functions-- in particular, 8 00:00:30,590 --> 00:00:32,810 writing some of our very own-- and we'll also 9 00:00:32,810 --> 00:00:36,680 learn about loops, how to repeat certain segments of code over time. 10 00:00:36,680 --> 00:00:38,840 Towards the end of lecture, we'll combine 11 00:00:38,840 --> 00:00:41,150 these two ideas to dip our toes into this thing called 12 00:00:41,150 --> 00:00:42,740 functional programming. 13 00:00:42,740 --> 00:00:44,270 So let's begin. 14 00:00:44,270 --> 00:00:47,300 Let's start by actually looking at another program we had from before, 15 00:00:47,300 --> 00:00:51,230 one called count.R. And if you remember, this program was 16 00:00:51,230 --> 00:00:54,980 designed to count up some number of votes we typed in in the console. 17 00:00:54,980 --> 00:00:58,070 So if I click source here, I can run this program. 18 00:00:58,070 --> 00:01:02,990 And I'll enter in 100 votes for Mario, let's say, and 150 for Peach, 19 00:01:02,990 --> 00:01:04,670 and 120 for Bowser. 20 00:01:04,670 --> 00:01:08,900 And I'll see that, all in all, I typed in 370 votes. 21 00:01:08,900 --> 00:01:11,420 So this is the same program from before. 22 00:01:11,420 --> 00:01:13,490 And I'd argue this program is correct-- 23 00:01:13,490 --> 00:01:14,900 it works just fine-- 24 00:01:14,900 --> 00:01:17,600 but there's an opportunity for improved design here. 25 00:01:17,600 --> 00:01:20,370 I could write this code a little bit better. 
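For orientation, here is a rough sketch of what count.R likely looks like at this point. The transcript does not show the file itself, so the prompt strings and the wording of the final message are assumptions; only the repeated ask-and-convert pattern on the first three lines is described in the lecture.

```r
mario <- as.integer(readline("Mario: "))    # line 1: prompt text is an assumption
peach <- as.integer(readline("Peach: "))    # line 2: same ask-and-convert step
bowser <- as.integer(readline("Bowser: "))  # line 3: and once more

total <- sum(mario, peach, bowser)          # add up the three counts
cat("Total votes:", total)                  # output wording is an assumption
```

Entering 100, 150, and 120 at the three prompts gives the total of 370 mentioned above.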
26 00:01:20,370 --> 00:01:24,270 And one thing I noticed, in particular, is on lines one, two, and three, 27 00:01:24,270 --> 00:01:27,840 I'm repeating some functionality over and over and over again. 28 00:01:27,840 --> 00:01:31,770 I'm asking the user for some input, converting that to an integer. 29 00:01:31,770 --> 00:01:35,040 I'm doing the same thing on line two, and the same thing on line three. 30 00:01:35,040 --> 00:01:38,220 And when you find yourself repeating this kind of functionality 31 00:01:38,220 --> 00:01:40,890 over and over and over again, it's a good clue 32 00:01:40,890 --> 00:01:44,340 that defining your own function might help you. 33 00:01:44,340 --> 00:01:47,850 So we want to define a function here, but how do we do so? 34 00:01:47,850 --> 00:01:51,660 We've certainly used functions before, but to create a function 35 00:01:51,660 --> 00:01:53,490 is something else entirely. 36 00:01:53,490 --> 00:01:55,830 Well, to create a function in R we're going to use 37 00:01:55,830 --> 00:01:59,700 this keyword called function followed by some parentheses. 38 00:01:59,700 --> 00:02:04,050 Now, you might also think, well, I want to give this function some kind of name 39 00:02:04,050 --> 00:02:06,040 that I could reuse throughout my program. 40 00:02:06,040 --> 00:02:09,330 So to give a function a name, I could use this syntax here. 41 00:02:09,330 --> 00:02:12,790 Maybe get votes is assigned this function here. 42 00:02:12,790 --> 00:02:15,600 So get votes now will be the name for my function, 43 00:02:15,600 --> 00:02:20,360 because I want this function to, let's say, get some votes from the user. 44 00:02:20,360 --> 00:02:21,880 Now, this is pretty good. 45 00:02:21,880 --> 00:02:24,980 I have a name for our function, but what we also probably want 46 00:02:24,980 --> 00:02:27,800 is the ability for our function to take some input, some arguments, 47 00:02:27,800 --> 00:02:29,820 and run with those arguments. 
48 00:02:29,820 --> 00:02:32,210 So if I wanted to change how my function runs, 49 00:02:32,210 --> 00:02:36,140 I could supply parameters inside of these parentheses here. 50 00:02:36,140 --> 00:02:40,310 I could supply any number of them separated by commas. 51 00:02:40,310 --> 00:02:43,700 But then, even with a name and some parameters, 52 00:02:43,700 --> 00:02:45,630 our function needs to do something. 53 00:02:45,630 --> 00:02:47,630 So essentially what our function actually does-- 54 00:02:47,630 --> 00:02:49,700 we'll use these curly braces here, in which 55 00:02:49,700 --> 00:02:53,510 we can define our function's body, that is, the lines of code that 56 00:02:53,510 --> 00:02:56,120 will run when our function is run. 57 00:02:56,120 --> 00:02:58,370 And down below, towards the end of these curly braces, 58 00:02:58,370 --> 00:03:01,820 we'll say what our function should return to us, the programmer, 59 00:03:01,820 --> 00:03:03,590 after it finishes running. 60 00:03:03,590 --> 00:03:06,980 So with this syntax here, let's actually go ahead and define 61 00:03:06,980 --> 00:03:12,560 our very own get votes function that can ask the user for some number of votes. 62 00:03:12,560 --> 00:03:17,510 Come back to RStudio here, and let's write this function called get votes. 63 00:03:17,510 --> 00:03:19,760 I want it to have the same functionality, essentially, 64 00:03:19,760 --> 00:03:23,590 as what I'm doing on lines one, two, and three here, but to define it, 65 00:03:23,590 --> 00:03:26,290 I'll first do this at the top of my program here. 66 00:03:26,290 --> 00:03:28,560 So I define it, first and foremost, and then I 67 00:03:28,560 --> 00:03:31,090 can use it later on in my program. 68 00:03:31,090 --> 00:03:34,440 So I'll type, like we saw, get_votes. 69 00:03:34,440 --> 00:03:36,510 That's the name of this function. 
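Put together, the pieces described so far (a name, the function keyword, parameters in parentheses, a body in curly braces, and a return value) have roughly the shape below. This is a made-up illustration rather than part of count.R: describe_votes, candidate, and votes are placeholder names chosen only to show the syntax.

```r
# General shape of a function definition in R.
# `describe_votes`, `candidate`, and `votes` are hypothetical names.
describe_votes <- function(candidate, votes) {
  # Body: the code that runs each time the function is called
  label <- paste(candidate, "received", votes, "votes")

  # The value handed back to whoever called the function
  return(label)
}

# Calling the function with two arguments, separated by a comma
describe_votes("Mario", 100)
```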
70 00:03:36,510 --> 00:03:40,110 And I'll assign it to be some new function here, 71 00:03:40,110 --> 00:03:43,150 and I'll provide a function body just like this. 72 00:03:43,150 --> 00:03:47,190 So now I have my very first function in R, albeit an empty one, right? 73 00:03:47,190 --> 00:03:51,340 So I want this function, again, to get some votes from the user. 74 00:03:51,340 --> 00:03:55,320 So inside these curly braces, inside the function's body, 75 00:03:55,320 --> 00:03:58,440 I should define what code I want to run when I call 76 00:03:58,440 --> 00:04:01,620 or when I use this function later on in my program. 77 00:04:01,620 --> 00:04:04,740 Well, the thing I want to do is very similar to lines, now, five, six, 78 00:04:04,740 --> 00:04:05,340 and seven. 79 00:04:05,340 --> 00:04:09,750 I want to maybe ask the user for an integer prompting them with something 80 00:04:09,750 --> 00:04:13,200 like, let's say, just enter votes for now, just like that. 81 00:04:13,200 --> 00:04:17,490 And I'll maybe assign the result to this object called votes that is now 82 00:04:17,490 --> 00:04:20,430 visible inside of this function for me. 83 00:04:20,430 --> 00:04:22,980 And once I get those votes from the user, 84 00:04:22,980 --> 00:04:25,620 well, I want get votes to return them to me, the programmer, 85 00:04:25,620 --> 00:04:29,890 so I could assign them to some objects, like Mario, Peach, or Bowser down below. 86 00:04:29,890 --> 00:04:34,650 So to return some value from a function, I can use the keyword return 87 00:04:34,650 --> 00:04:39,270 and, inside parentheses, say which object's value I want to return. 88 00:04:39,270 --> 00:04:42,300 So here, in total, is now my function. 89 00:04:42,300 --> 00:04:45,300 On line two, I'm asking the user to enter some votes, 90 00:04:45,300 --> 00:04:47,280 converting that to an integer, and storing it 91 00:04:47,280 --> 00:04:49,770 in this object called votes inside of this function. 
92 00:04:49,770 --> 00:04:52,950 And then, on line three, I'm returning the value of votes 93 00:04:52,950 --> 00:04:55,990 so I can reuse it later on in my program. 94 00:04:55,990 --> 00:04:59,760 So now I think I've implemented lines six, seven, and eight here 95 00:04:59,760 --> 00:05:01,440 as its own separate function. 96 00:05:01,440 --> 00:05:03,510 I should feel free to use that function now. 97 00:05:03,510 --> 00:05:07,560 On line six, I'll actually not use as.integer and readline. 98 00:05:07,560 --> 00:05:12,480 I'll instead use get_votes, and call it using these parentheses here. 99 00:05:12,480 --> 00:05:14,640 I'll do the same for line seven. 100 00:05:14,640 --> 00:05:16,140 Get_votes. 101 00:05:16,140 --> 00:05:18,340 And line eight as well. 102 00:05:18,340 --> 00:05:22,520 And now, before I run this program again, let's walk through top to bottom 103 00:05:22,520 --> 00:05:23,780 what I've just done. 104 00:05:23,780 --> 00:05:27,620 On line one, I have defined this function called get_votes. 105 00:05:27,620 --> 00:05:32,210 I've told R exactly what inputs it takes-- in this case, none-- 106 00:05:32,210 --> 00:05:35,810 I've told R exactly what it should do when I call this function. 107 00:05:35,810 --> 00:05:37,790 First line two, then line three. 108 00:05:37,790 --> 00:05:40,290 And I've given it some name, get_votes here. 109 00:05:40,290 --> 00:05:43,880 So, now, on lines six, seven, and eight, I can call get_votes. 110 00:05:43,880 --> 00:05:47,660 And every time I do, R will effectively go back to these lines two and three, 111 00:05:47,660 --> 00:05:50,120 run those, and return to me the value that I've 112 00:05:50,120 --> 00:05:53,750 asked it to return for each of Mario, Peach, and Bowser. 113 00:05:53,750 --> 00:05:55,650 So now let's run our program. 114 00:05:55,650 --> 00:05:58,430 I'll click source here, and enter in some number of votes. 
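The walkthrough above corresponds to a file along these lines. It is a sketch reconstructed from the description: the layout is arranged so the definition sits on lines one to four and the calls on lines six to eight, and the wording of the final message is an assumption.

```r
get_votes <- function() {                         # line 1: define the function
  votes <- as.integer(readline("Enter votes: "))  # line 2: ask, then convert
  return(votes)                                   # line 3: hand the value back
}

mario <- get_votes()                              # line 6
peach <- get_votes()                              # line 7
bowser <- get_votes()                             # line 8

total <- sum(mario, peach, bowser)
cat("Total votes:", total)                        # output wording is an assumption
```

Each call runs lines two and three again, so the prompt appears three times.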
115 00:05:58,430 --> 00:06:02,060 I think we had 100, first, for Mario, so I'll say 100. 116 00:06:02,060 --> 00:06:05,960 And then 150 for Peach, and 120 for Bowser. 117 00:06:05,960 --> 00:06:10,170 And now I see the same functionality, but now in my own function. 118 00:06:10,170 --> 00:06:14,030 So, congratulations, this is your first function in R. 119 00:06:14,030 --> 00:06:19,380 But if I'm doing this, I actually think I have lost some functionality. 120 00:06:19,380 --> 00:06:23,400 Because if I run this program again, I'll see enter votes. 121 00:06:23,400 --> 00:06:26,880 And before, we had this nice ability to prompt 122 00:06:26,880 --> 00:06:29,580 the user with some particular prompt that we wanted to. 123 00:06:29,580 --> 00:06:31,680 So how could we get that back? 124 00:06:31,680 --> 00:06:34,680 Well, one thing we could define now is a parameter 125 00:06:34,680 --> 00:06:38,040 to this function, some input that changes how it runs. 126 00:06:38,040 --> 00:06:42,000 And as you saw, we can define those parameters inside these parentheses 127 00:06:42,000 --> 00:06:44,250 here after our function keyword. 128 00:06:44,250 --> 00:06:47,010 So one thing I want to do is be able to change 129 00:06:47,010 --> 00:06:49,200 the prompt that I prompt the user with. 130 00:06:49,200 --> 00:06:52,300 So I'll call this parameter prompt, just like this. 131 00:06:52,300 --> 00:06:56,310 And now my function can take this input called prompt. 132 00:06:56,310 --> 00:07:01,320 Well, when I do that, maybe I also want to use that particular prompt the user 133 00:07:01,320 --> 00:07:02,040 has entered-- 134 00:07:02,040 --> 00:07:04,040 that I, the programmer have entered as the input 135 00:07:04,040 --> 00:07:06,760 to get votes-- and prompt the user with that instead. 
136 00:07:06,760 --> 00:07:10,020 So now inside this function, I have access to this parameter, 137 00:07:10,020 --> 00:07:14,550 this argument called prompt, that I can then use to prompt the user on line two 138 00:07:14,550 --> 00:07:15,280 here. 139 00:07:15,280 --> 00:07:17,140 So let's try this. 140 00:07:17,140 --> 00:07:21,150 I'll now give as input to this function Mario, like we had before, 141 00:07:21,150 --> 00:07:25,710 and Peach, like we had before, and Bowser, like we had before. 142 00:07:25,710 --> 00:07:28,980 Each time I call this function, I'm setting this character string 143 00:07:28,980 --> 00:07:33,300 Mario equal to prompt, and then prompting the user with that prompt now. 144 00:07:33,300 --> 00:07:34,600 Let's see what happens. 145 00:07:34,600 --> 00:07:35,910 I'll click on source. 146 00:07:35,910 --> 00:07:38,790 And now, instead of enter votes, I'll see Mario. 147 00:07:38,790 --> 00:07:44,820 I'll type in 100, 150, and 120, and I'll get back the same result. 148 00:07:44,820 --> 00:07:46,110 So pretty handy here. 149 00:07:46,110 --> 00:07:50,460 You're able to define parameters for our functions and change them over time. 150 00:07:50,460 --> 00:07:54,570 Now, one optimization still we could make is that on line three 151 00:07:54,570 --> 00:07:59,520 I'm explicitly returning votes from this function, but it turns out that in R, 152 00:07:59,520 --> 00:08:02,550 by default, the last computed value-- 153 00:08:02,550 --> 00:08:03,780 in this case votes-- 154 00:08:03,780 --> 00:08:06,240 will be returned automatically for me. 155 00:08:06,240 --> 00:08:10,380 So on line three, I actually don't need to explicitly say return votes. 156 00:08:10,380 --> 00:08:15,210 I could omit that, and R will, by default, return the last computed value 157 00:08:15,210 --> 00:08:18,060 inside this function, which is votes. 
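Here is a sketch of the function after both changes: it takes a prompt parameter and relies on R returning the last evaluated expression. The prompt strings are assumptions, and writing votes on its own as the final line is one plausible way to express "the last computed value"; the lecture's exact final line is not visible in the transcript.

```r
get_votes <- function(prompt) {
  # Show whatever text the caller passed in, then convert the reply
  votes <- as.integer(readline(prompt))
  # No return() needed: the last evaluated expression is the return value
  votes
}

mario <- get_votes("Mario: ")
peach <- get_votes("Peach: ")
bowser <- get_votes("Bowser: ")

total <- sum(mario, peach, bowser)
cat("Total votes:", total)
```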
158 00:08:18,060 --> 00:08:23,220 So stylistically, we often want to avoid typing return when R by default 159 00:08:23,220 --> 00:08:26,610 actually does that for us for the last computed value here. 160 00:08:26,610 --> 00:08:27,960 Let me run this program again. 161 00:08:27,960 --> 00:08:31,890 I'll click source, I'll type 100 votes for Mario, 150 for Peach, 162 00:08:31,890 --> 00:08:34,320 120 for Bowser, and now we're back in business. 163 00:08:34,320 --> 00:08:38,280 We have 370 votes in total. 164 00:08:38,280 --> 00:08:42,630 Now, let's say I get a little bit lazy as a programmer, 165 00:08:42,630 --> 00:08:47,520 and maybe sometimes I forget to enter in some value for this parameter we 166 00:08:47,520 --> 00:08:49,380 defined called prompt, right? 167 00:08:49,380 --> 00:08:53,910 I could go back to what we had before, using these functions without any input. 168 00:08:53,910 --> 00:08:59,040 But now, because my function is defined as taking this parameter called prompt, 169 00:08:59,040 --> 00:09:00,390 I might run into some trouble. 170 00:09:00,390 --> 00:09:04,590 If I click source here, I'll see that I get an error. 171 00:09:04,590 --> 00:09:05,880 Error in get votes. 172 00:09:05,880 --> 00:09:09,510 Argument prompt is missing with no default. 173 00:09:09,510 --> 00:09:13,950 So if you're defining a function and you want the user or the programmer 174 00:09:13,950 --> 00:09:17,130 to need to define some argument to that function, 175 00:09:17,130 --> 00:09:19,860 well, you're going to need to do exactly what we did here, 176 00:09:19,860 --> 00:09:22,920 where if I define this parameter and don't give it a default, 177 00:09:22,920 --> 00:09:26,130 the user or the programmer must supply some value for it. 178 00:09:26,130 --> 00:09:30,390 But if I'm being a little bit nice and I want to maybe catch somebody doing this 179 00:09:30,390 --> 00:09:33,270 and provide them with some default value, I could do that too. 
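To see the "missing with no default" error without stopping a script, one option is to capture it. The sketch below uses tryCatch, which this lecture has not introduced, purely as a convenience for recording the message; the variable name message_text is hypothetical.

```r
get_votes <- function(prompt) {
  votes <- as.integer(readline(prompt))
  votes
}

# Calling without an argument fails as soon as `prompt` is actually used.
# tryCatch() is only here to capture the message so the script keeps going.
message_text <- tryCatch(
  get_votes(),
  error = function(e) conditionMessage(e)
)

print(message_text)
```

A parameter without a default is therefore effectively required: the caller has to supply it.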
180 00:09:33,270 --> 00:09:38,100 Up here in function, I could say, prompt, and give it some default value. 181 00:09:38,100 --> 00:09:42,430 Maybe set it equal to, initially, enter votes, just like this. 182 00:09:42,430 --> 00:09:46,620 So now if me or somebody else doesn't provide some input, 183 00:09:46,620 --> 00:09:50,130 well, the default value for prompt will be enter votes. 184 00:09:50,130 --> 00:09:51,100 Let me try this. 185 00:09:51,100 --> 00:09:54,450 I'll click on source, and now I'll see no error. 186 00:09:54,450 --> 00:09:56,460 I instead see enter votes. 187 00:09:56,460 --> 00:10:00,360 So even though we didn't supply some input to get votes here, 188 00:10:00,360 --> 00:10:03,790 we defined some default value that is used instead. 189 00:10:03,790 --> 00:10:05,880 So, prompt, when there's no value supplied, 190 00:10:05,880 --> 00:10:09,940 is going to be equal now to enter votes, as we've seen down below. 191 00:10:09,940 --> 00:10:11,460 So, pretty good. 192 00:10:11,460 --> 00:10:14,498 If I want to override this, as I might often want to do, 193 00:10:14,498 --> 00:10:15,540 I could do it as follows. 194 00:10:15,540 --> 00:10:19,210 I could go back to what we did before and type in some input, like Mario 195 00:10:19,210 --> 00:10:23,720 or like Peach or, let's say, like Bowser, just like this. 196 00:10:23,720 --> 00:10:27,580 And because this is the first input I've given to my function, 197 00:10:27,580 --> 00:10:31,630 and prompt is the first parameter, well, Mario will override, let's say, 198 00:10:31,630 --> 00:10:35,860 the default value of prompt, and same for Peach, and same for Bowser. 199 00:10:35,860 --> 00:10:39,940 There are a few ways to supply arguments to functions, as we've seen so far. 200 00:10:39,940 --> 00:10:41,440 One is positionally. 201 00:10:41,440 --> 00:10:44,590 Here, notice that the very first argument to get votes 202 00:10:44,590 --> 00:10:47,590 becomes the value for the first parameter, prompt. 
203 00:10:47,590 --> 00:10:50,860 If, though, I had more than one parameter and more than one argument, 204 00:10:50,860 --> 00:10:52,960 I could define them separated by commas. 205 00:10:52,960 --> 00:10:57,700 So maybe this would be my first argument here, Mario followed by a comma. 206 00:10:57,700 --> 00:11:00,520 I could then provide some other value for the next argument 207 00:11:00,520 --> 00:11:02,470 if I had one in my function here. 208 00:11:02,470 --> 00:11:04,000 But I don't, so I won't. 209 00:11:04,000 --> 00:11:07,660 The other way I could define the argument for a particular parameter 210 00:11:07,660 --> 00:11:10,310 is by actually using the parameter's name. 211 00:11:10,310 --> 00:11:12,970 So the name of this parameter is prompt here. 212 00:11:12,970 --> 00:11:16,120 I could override that and make sure to explicitly say 213 00:11:16,120 --> 00:11:20,430 that this prompt is going to be equal to Mario using syntax like this. 214 00:11:20,430 --> 00:11:24,890 I could say that get votes, I know, explicitly has this argument parameter 215 00:11:24,890 --> 00:11:29,330 called prompt that I'll set equal, now, to Mario, and same for Peach, 216 00:11:29,330 --> 00:11:31,100 and same for Bowser. 217 00:11:31,100 --> 00:11:36,620 So now I'm able here to run this code by supplying or overriding 218 00:11:36,620 --> 00:11:39,770 the default value now of prompt. 219 00:11:39,770 --> 00:11:42,530 So I think this is a pretty good first function. 220 00:11:42,530 --> 00:11:47,060 If I click source and I click run, I can do 100 votes for Mario, 150 for Peach, 221 00:11:47,060 --> 00:11:49,730 120 for Bowser, getting that total back. 222 00:11:49,730 --> 00:11:52,550 But what's interesting that I notice now is 223 00:11:52,550 --> 00:11:56,270 if I go to my environment on my right hand side, 224 00:11:56,270 --> 00:11:59,840 I'll see a few different objects that I have. 
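The three ways of calling the function discussed here can be placed side by side. This is a sketch: the default text "Enter votes: " and the candidate prompts are assumed spellings of what is read aloud in the lecture.

```r
# `prompt` now has a default, so callers may leave it out
get_votes <- function(prompt = "Enter votes: ") {
  votes <- as.integer(readline(prompt))
  votes
}

mario <- get_votes()                      # no argument: the default prompt is used
peach <- get_votes("Peach: ")             # positional: first argument fills `prompt`
bowser <- get_votes(prompt = "Bowser: ")  # named: the parameter is spelled out
```

Naming the parameter matters more once a function has several of them, because named arguments do not depend on order.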
225 00:11:59,840 --> 00:12:05,180 I see Bowser, I see Mario, I see Peach, and total, I see get votes, 226 00:12:05,180 --> 00:12:08,570 the function which we defined, but what I don't see 227 00:12:08,570 --> 00:12:11,540 is this votes object or prompt. 228 00:12:11,540 --> 00:12:15,980 And actually if I go down to my console and ask R to give me the value of votes 229 00:12:15,980 --> 00:12:18,500 as it currently is, well, I'll see error. 230 00:12:18,500 --> 00:12:20,690 Object votes not found. 231 00:12:20,690 --> 00:12:22,020 Now, why is that? 232 00:12:22,020 --> 00:12:25,220 Well, this has to do with what we call in programming this idea of scope. 233 00:12:25,220 --> 00:12:30,050 And scope tells us in what context objects like these are defined. 234 00:12:30,050 --> 00:12:32,750 To visualize this, let's think about our environment 235 00:12:32,750 --> 00:12:35,840 here and think about what scope we have in terms of our objects 236 00:12:35,840 --> 00:12:37,500 in that environment. 237 00:12:37,500 --> 00:12:40,010 So here is a representation of our environment. 238 00:12:40,010 --> 00:12:43,400 We have those four objects we saw earlier, Mario, Peach, 239 00:12:43,400 --> 00:12:44,390 Bowser, and Total. 240 00:12:44,390 --> 00:12:48,050 These are accessible to me, the programmer, pretty much at all times. 241 00:12:48,050 --> 00:12:51,530 We also have of course our function get_votes. 242 00:12:51,530 --> 00:12:55,730 But get_votes is kind of best viewed as this black box. 243 00:12:55,730 --> 00:13:00,080 I don't exactly know what's going inside of it when I call that function. 244 00:13:00,080 --> 00:13:03,410 If you think about calling a function like sum or mean, 245 00:13:03,410 --> 00:13:07,040 you likely don't know exactly what code was defined to compute those values, 246 00:13:07,040 --> 00:13:08,540 you just know that it kind of works. 247 00:13:08,540 --> 00:13:11,240 You give some input, and you get that output back. 
248 00:13:11,240 --> 00:13:13,880 Well, the same thing now with our own functions. 249 00:13:13,880 --> 00:13:17,240 We just have to trust that we ourselves have defined these functions to take 250 00:13:17,240 --> 00:13:20,450 some input and produce some output for us, and we, the programmer, 251 00:13:20,450 --> 00:13:24,710 can't actually access those objects we defined inside the function. 252 00:13:24,710 --> 00:13:27,410 If we want to use those-- just kind of metaphorically 253 00:13:27,410 --> 00:13:29,990 zoom in or go inside that function's context 254 00:13:29,990 --> 00:13:34,430 to then be able to use and see those values, like votes and prompt. 255 00:13:34,430 --> 00:13:37,130 But for our sake as the programmer, these objects 256 00:13:37,130 --> 00:13:40,160 are only defined inside the scope of, now, 257 00:13:40,160 --> 00:13:43,370 our function, which is why we can't see them in our global environment, 258 00:13:43,370 --> 00:13:46,460 as we saw in R. 259 00:13:46,460 --> 00:13:49,100 So we have now defined our first function. 260 00:13:49,100 --> 00:13:52,400 We've taken some inputs, we've returned some values. 261 00:13:52,400 --> 00:13:54,980 We've also seen this question of scope here. 262 00:13:54,980 --> 00:13:58,430 Let me ask, what questions do we have so far on scope 263 00:13:58,430 --> 00:14:01,730 or on defining our own functions? 264 00:14:01,730 --> 00:14:04,988 AUDIENCE: What if the user enters a string or a character as an input? 265 00:14:04,988 --> 00:14:07,280 Is there any way to handle the errors that we will get? 266 00:14:07,280 --> 00:14:08,988 CARTER ZENKE: Yeah, really good question. 267 00:14:08,988 --> 00:14:11,630 So up until now, we've been kind of being a good user. 268 00:14:11,630 --> 00:14:14,450 We've been supplying numbers to the actual program here. 
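A small sketch of the scope idea: objects assigned at the top level exist after the program runs, while votes, which is only ever assigned inside get_votes, does not. The exists() checks stand in for looking at the Environment pane or typing the name in the console.

```r
get_votes <- function(prompt = "Enter votes: ") {
  votes <- as.integer(readline(prompt))  # `votes` lives only inside this call
  votes
}

mario <- get_votes("Mario: ")

exists("mario")  # assigned at the top level, so it can be found
exists("votes")  # never assigned outside the function, so it cannot
```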
269 00:14:14,450 --> 00:14:17,000 But we need to think defensively as programmers 270 00:14:17,000 --> 00:14:19,760 and think about what could happen if we entered in something 271 00:14:19,760 --> 00:14:21,540 that wasn't a number, for instance. 272 00:14:21,540 --> 00:14:23,040 So let's see what might happen here. 273 00:14:23,040 --> 00:14:26,240 I'll come back to my computer, and let's go back 274 00:14:26,240 --> 00:14:30,200 to our program called count.R. And let me close my environment, 275 00:14:30,200 --> 00:14:33,950 but now think a little more maliciously as a user. 276 00:14:33,950 --> 00:14:35,870 What could I do to break this program? 277 00:14:35,870 --> 00:14:38,960 Well, one thing I could do is enter in some value 278 00:14:38,960 --> 00:14:41,150 that I don't think the program expected to see. 279 00:14:41,150 --> 00:14:44,150 So if I click source, now, to run the program again, 280 00:14:44,150 --> 00:14:47,600 let me enter in something funny like duck for Mario, 281 00:14:47,600 --> 00:14:51,800 or quack for Peach, or cat for Bowser. 282 00:14:51,800 --> 00:14:54,030 And these are certainly not numbers. 283 00:14:54,030 --> 00:14:57,560 So if I hit enter now, oh, this is some pretty bad output. 284 00:14:57,560 --> 00:15:01,070 So what I see down below is total votes is now 285 00:15:01,070 --> 00:15:05,330 equal to NA, this value that means not available. 286 00:15:05,330 --> 00:15:07,670 And I see some warning messages. 287 00:15:07,670 --> 00:15:11,930 Now, if I look at this particular one-- in get_votes, prompt equals Mario, 288 00:15:11,930 --> 00:15:14,120 NAs introduced by coercion. 289 00:15:14,120 --> 00:15:15,360 Well, what does that mean? 290 00:15:15,360 --> 00:15:18,440 Well, we saw a little while ago that coercion is this process which 291 00:15:18,440 --> 00:15:21,470 would convert some storage mode to some other one, 292 00:15:21,470 --> 00:15:25,130 and it seems like we do that on line two, on as.integer. 
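The behavior can be reproduced without any typing by converting a fixed string. This sketch passes "duck" straight to as.integer, standing in for what readline would hand back; running it will print the coercion warning.

```r
votes <- as.integer("duck")      # warns: NAs introduced by coercion
is.na(votes)                     # the conversion produced a missing value

total <- sum(100L, votes, 120L)  # a single NA makes the whole sum NA
is.na(total)
```

This is why one bad entry was enough to turn the total into NA.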
293 00:15:25,130 --> 00:15:27,530 We convert some character string the user gave us 294 00:15:27,530 --> 00:15:31,700 via readline to some number, or some integer in particular. 295 00:15:31,700 --> 00:15:35,930 But what might happen if I did something like I did here, 296 00:15:35,930 --> 00:15:38,570 I gave cat instead of an actual number? 297 00:15:38,570 --> 00:15:41,592 Well, as.integer will say, one, I don't know 298 00:15:41,592 --> 00:15:44,550 what the heck you want me to do with that, so I'll give you NA instead. 299 00:15:44,550 --> 00:15:48,200 And it will also give me what's called a warning, telling me that, look, 300 00:15:48,200 --> 00:15:51,680 I couldn't do what you wanted me to do with the input you gave me. 301 00:15:51,680 --> 00:15:58,040 So this is why now we see this value NA as opposed to, let's say, cat or duck 302 00:15:58,040 --> 00:15:59,790 or quack instead. 303 00:15:59,790 --> 00:16:02,960 So what could we do to fix this? 304 00:16:02,960 --> 00:16:07,010 I think one thing we could do is try to catch this process. 305 00:16:07,010 --> 00:16:10,280 Like if we see inside this function that we actually 306 00:16:10,280 --> 00:16:14,360 got an NA value for votes, well, we could return something else entirely. 307 00:16:14,360 --> 00:16:15,390 We could start there. 308 00:16:15,390 --> 00:16:17,890 So let's go back to our program and make that happen for us. 309 00:16:17,890 --> 00:16:21,320 We could use what we saw last time called conditionals, where conditionals 310 00:16:21,320 --> 00:16:24,860 will just test for something and take some particular action because 311 00:16:24,860 --> 00:16:26,120 of that test. 312 00:16:26,120 --> 00:16:30,410 So, here, let's say-- let's assume the user enters in some bad value, 313 00:16:30,410 --> 00:16:33,317 like duck, and now votes is NA. 314 00:16:33,317 --> 00:16:35,150 Well, I don't want to do what we did before, 315 00:16:35,150 --> 00:16:37,310 which was return votes automatically. 
316 00:16:37,310 --> 00:16:40,400 I'd rather first ask, is votes NA? 317 00:16:40,400 --> 00:16:44,810 And if it is, let's go ahead and not return the actual NA value. 318 00:16:44,810 --> 00:16:48,770 Why don't we return something like 0, maybe, just to kick things off? 319 00:16:48,770 --> 00:16:50,270 If I say if now-- 320 00:16:50,270 --> 00:16:56,990 if votes is NA, well, then inside I could return some special value, 321 00:16:56,990 --> 00:17:00,290 like 0, saying that, look, we couldn't count your votes. 322 00:17:00,290 --> 00:17:04,520 But otherwise, let's say, we could go ahead and safely return votes. 323 00:17:04,520 --> 00:17:08,940 So if votes is not NA, we can go ahead and return votes instead. 324 00:17:08,940 --> 00:17:11,270 And I think this will be a little bit safer for us. 325 00:17:11,270 --> 00:17:14,089 If I go ahead and click on source now-- 326 00:17:14,089 --> 00:17:17,480 let me go ahead and type in duck for Mario, quack for Peach, 327 00:17:17,480 --> 00:17:20,720 and cat for Bowser, and-- 328 00:17:20,720 --> 00:17:24,035 so I seem to have gotten total votes being 0. 329 00:17:24,035 --> 00:17:25,160 That's a little bit better. 330 00:17:25,160 --> 00:17:26,960 It's no longer NA. 331 00:17:26,960 --> 00:17:30,800 We seem to have just not counted, like, cat, duck, or quack, 332 00:17:30,800 --> 00:17:32,750 but I still get these warnings. 333 00:17:32,750 --> 00:17:36,440 Now, these warnings we'll see in a little bit more depth in a future lecture. 334 00:17:36,440 --> 00:17:40,280 R does have warnings and errors that are more generally known as exceptions, 335 00:17:40,280 --> 00:17:44,927 but, for now, we can handle them using a function called suppress warnings. 336 00:17:44,927 --> 00:17:46,760 Suppress warnings allows me, the programmer, 337 00:17:46,760 --> 00:17:50,780 to say, look, I know something went wrong, but I'm handling it myself. 338 00:17:50,780 --> 00:17:51,920 I know how to do this. 
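The conditional described above might look like the sketch below, using is.na to test for the missing value. The layout is chosen so the if/else occupies lines three to seven of the function; the default prompt text is an assumption.

```r
get_votes <- function(prompt = "Enter votes: ") {
  votes <- as.integer(readline(prompt))  # NA, plus a warning, for input like "duck"
  if (is.na(votes)) {                    # lines 3 to 7: the conditional
    return(0)                            # uncountable input counts as 0 votes
  } else {
    return(votes)                        # otherwise hand back the integer
  }
}
```

The warning still appears at this stage, because it is raised by as.integer before the conditional runs.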
339 00:17:51,920 --> 00:17:56,360 So let's see if we can tell as.integer to not give us a warning 340 00:17:56,360 --> 00:17:58,610 anymore, because we're handling it a little bit later. 341 00:17:58,610 --> 00:18:00,200 Let's go back to RStudio here. 342 00:18:00,200 --> 00:18:04,940 And I could use, like we said, this function called suppress warnings, 343 00:18:04,940 --> 00:18:09,050 where suppress warnings takes as input a function that could give us a warning, 344 00:18:09,050 --> 00:18:11,280 in this case like as.integer. 345 00:18:11,280 --> 00:18:15,770 So now if I give as input this particular function-- like this, 346 00:18:15,770 --> 00:18:17,480 suppress warnings-- 347 00:18:17,480 --> 00:18:21,770 what I'm effectively saying is that as you take the user's input 348 00:18:21,770 --> 00:18:24,965 and you convert it to an integer, if you encounter a warning, 349 00:18:24,965 --> 00:18:26,090 don't give me that warning. 350 00:18:26,090 --> 00:18:29,030 Just kind of suppress it, keep it low, and I, myself, the programmer, 351 00:18:29,030 --> 00:18:31,800 will handle it later on as well. 352 00:18:31,800 --> 00:18:33,620 So let's try this now. 353 00:18:33,620 --> 00:18:38,510 I'll click source, and then I'll do 100 votes for Mario, 150 for Peach, 354 00:18:38,510 --> 00:18:40,520 and 120 for Bowser. 355 00:18:40,520 --> 00:18:42,320 And I think now we're back in action. 356 00:18:42,320 --> 00:18:43,980 Although that was actually-- that was some good input, 357 00:18:43,980 --> 00:18:45,230 so let's try the bad input. 358 00:18:45,230 --> 00:18:50,270 Let's go ahead and do duck for Mario, quack for Peach, and cat for Bowser. 359 00:18:50,270 --> 00:18:55,700 And now we see total votes being 0, but now no warnings being raised thanks 360 00:18:55,700 --> 00:18:57,133 to suppress warnings. 
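In code, the function is spelled suppressWarnings, and it wraps the call that may warn. A sketch, again with an assumed default prompt and a hypothetical variable name quiet for the demonstration:

```r
get_votes <- function(prompt = "Enter votes: ") {
  # suppressWarnings() evaluates the wrapped call and discards its warnings
  votes <- suppressWarnings(as.integer(readline(prompt)))
  if (is.na(votes)) {
    return(0)
  } else {
    return(votes)
  }
}

# The conversion still yields NA for non-numeric text, just silently
quiet <- suppressWarnings(as.integer("duck"))
is.na(quiet)
```

Strictly speaking, what suppressWarnings receives is the expression to evaluate (here the call to as.integer), and only warnings are silenced; the NA still has to be handled by the conditional.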
361 00:18:57,133 --> 00:18:59,300 So we'll see this in more depth in a future lecture, 362 00:18:59,300 --> 00:19:02,240 but, for now, just think about suppressing those warnings, 363 00:19:02,240 --> 00:19:04,820 kind of silencing them because we the programmer know 364 00:19:04,820 --> 00:19:08,150 how to handle those in our own code. 365 00:19:08,150 --> 00:19:13,250 There's one more improvement I see here, which is that this block from line three 366 00:19:13,250 --> 00:19:14,180 to line seven-- 367 00:19:14,180 --> 00:19:18,980 this if else-- could be simplified, could be converted to one line of code. 368 00:19:18,980 --> 00:19:22,400 And, in fact, if you have an if else statement where 369 00:19:22,400 --> 00:19:27,290 inside the if and inside the else you're simply returning one value or another, 370 00:19:27,290 --> 00:19:30,800 well, you could simplify this and use a function called ifelse, 371 00:19:30,800 --> 00:19:33,890 just like this, where the first argument to ifelse 372 00:19:33,890 --> 00:19:39,240 is the logical expression to test, in this case, is votes NA, just like this. 373 00:19:39,240 --> 00:19:41,120 And if that expression is true-- 374 00:19:41,120 --> 00:19:44,060 if votes is NA, well, the second argument 375 00:19:44,060 --> 00:19:47,180 will be the thing we return, the value we get back from ifelse, 376 00:19:47,180 --> 00:19:50,150 and then the third argument will be what we get back 377 00:19:50,150 --> 00:19:52,550 if this logical expression is false. 378 00:19:52,550 --> 00:19:54,360 It's the else in if else. 379 00:19:54,360 --> 00:19:56,640 So I'll say votes here instead. 380 00:19:56,640 --> 00:20:00,320 So now, to be clear, line three is doing exactly the same work 381 00:20:00,320 --> 00:20:04,190 as lines 5 through 9 but is much shorter, I would argue, more readable, 382 00:20:04,190 --> 00:20:06,920 and so I can now get rid of lines 5 through 9 383 00:20:06,920 --> 00:20:08,990 and shorten this function even more. 
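With ifelse, the conditional collapses to a single call on line three. The sketch below also applies the same call to a fixed set of inputs mirroring the "duck, 150, cat" run; the vector name entries is hypothetical.

```r
get_votes <- function(prompt = "Enter votes: ") {
  votes <- suppressWarnings(as.integer(readline(prompt)))
  ifelse(is.na(votes), 0, votes)  # test, value if TRUE, value if FALSE
}

# The same logic applied to fixed values: two bad entries and one good one
entries <- c(NA, 150L, NA)
cleaned <- ifelse(is.na(entries), 0, entries)
sum(cleaned)
```

Because the ifelse call is the last expression in the body, its result is also what get_votes returns.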
384 00:20:08,990 --> 00:20:13,640 And because R will return to me the last computed value, ifelse, 385 00:20:13,640 --> 00:20:17,220 whatever it returns, will be the return value of my function itself. 386 00:20:17,220 --> 00:20:20,040 So if votes is NA, ifelse will return 0, 387 00:20:20,040 --> 00:20:23,740 but so will my function, and same with votes as well. 388 00:20:23,740 --> 00:20:25,150 So let's try this again. 389 00:20:25,150 --> 00:20:29,490 I'll click source, and let me go ahead and say something like duck for Mario, 390 00:20:29,490 --> 00:20:34,080 but I will enter in maybe 150 for Peach, and cat for Bowser. 391 00:20:34,080 --> 00:20:37,200 And now we see our total votes is 150. 392 00:20:37,200 --> 00:20:41,340 And I think we've really simplified this function for ourselves here. 393 00:20:41,340 --> 00:20:44,010 Now, as we've done this, I think you're seeing 394 00:20:44,010 --> 00:20:48,090 the power of putting this functionality inside of a function. 395 00:20:48,090 --> 00:20:50,670 If I hadn't done this, if I had had to repeat 396 00:20:50,670 --> 00:20:54,690 this code over and over and over again, my code would have been much longer. 397 00:20:54,690 --> 00:20:58,350 You can imagine myself repeating that same conditional if else, if else, 398 00:20:58,350 --> 00:21:00,750 if else through all of my prompts to the user. 399 00:21:00,750 --> 00:21:05,397 But by converting that code into my very own function, I can modularize things. 400 00:21:05,397 --> 00:21:08,730 I can make things easier to maintain and update, which is why in the first place 401 00:21:08,730 --> 00:21:11,280 we would write functions like these. 402 00:21:11,280 --> 00:21:14,850 So we've seen how to write our very first function in R, 403 00:21:14,850 --> 00:21:17,790 to handle some errors our user could present us with. 404 00:21:17,790 --> 00:21:21,760 What other questions do we have about defining these functions?
405 00:21:21,760 --> 00:21:27,130 AUDIENCE: If we had the first version of our function get_votes 406 00:21:27,130 --> 00:21:32,710 that we were still not checking for NA values by coercion, 407 00:21:32,710 --> 00:21:37,930 would we need to actually store our computation in the votes 408 00:21:37,930 --> 00:21:41,958 object, or could we just return the value directly? 409 00:21:41,958 --> 00:21:43,750 CARTER ZENKE: Yeah, a really good question. 410 00:21:43,750 --> 00:21:46,630 I think something that gets at shortening this program even more. 411 00:21:46,630 --> 00:21:50,592 If we go back, rewind a little bit to maybe not handling these NA that 412 00:21:50,592 --> 00:21:53,050 could be introduced, but instead just returning, let's say, 413 00:21:53,050 --> 00:21:54,820 whatever the number the user gives us is, 414 00:21:54,820 --> 00:21:56,260 I could probably shorten it even more. 415 00:21:56,260 --> 00:21:59,052 So let me go back to RStudio and show you how that could look like. 416 00:21:59,052 --> 00:22:03,830 I will maybe get rid of this ifelse here, and I'll instead maybe do this. 417 00:22:03,830 --> 00:22:06,850 I'll go back to what we had before, which was assigning this object 418 00:22:06,850 --> 00:22:09,358 votes whatever number the user types in. 419 00:22:09,358 --> 00:22:12,400 Now, I think you were asking, could we just get rid of this object votes? 420 00:22:12,400 --> 00:22:16,720 Could we simply have this, as.integer, readline, given some prompt? 421 00:22:16,720 --> 00:22:18,820 I think we could, because as.integer will still 422 00:22:18,820 --> 00:22:21,490 return for us whatever number the user has typed in, 423 00:22:21,490 --> 00:22:23,950 and therefore-- because the last line of my function-- 424 00:22:23,950 --> 00:22:27,080 my function will instead return that value as well. 425 00:22:27,080 --> 00:22:29,060 So I'll click on source to run my program.
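The shortened version discussed in this answer might look like the sketch below, where the function body is a single expression and no intermediate votes object is stored. The ask parameter is an assumption added here so the example can run without a console; by default it is readline, which matches what the lecture describes.

```r
# get_votes with no intermediate object. The `ask` parameter is hypothetical:
# it defaults to readline, but can be swapped for a stand-in when testing.
get_votes <- function(prompt, ask = readline) {
  # The value of the last evaluated expression is the function's return value.
  as.integer(ask(prompt))
}

# Stand-in for a user who types 100 at the prompt.
always_100 <- function(prompt) "100"
mario <- get_votes("Mario: ", ask = always_100)
cat("Mario:", mario, "\n")
```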
426 00:22:29,060 --> 00:22:33,070 I'll type 100 for Mario, 150 for Peach, and 120 for Bowser, 427 00:22:33,070 --> 00:22:35,350 and now I see the same result we wanted. 428 00:22:35,350 --> 00:22:40,060 I would argue, though, because we want to keep this value and test its actual-- 429 00:22:40,060 --> 00:22:42,000 its value later on, like, was it NA or was it 430 00:22:42,000 --> 00:22:44,750 a number-- we might want to actually store it in a separate object 431 00:22:44,750 --> 00:22:47,350 and then test that value a little bit later, like we did here. 432 00:22:47,350 --> 00:22:50,740 But a great question, and a good optimization too. 433 00:22:50,740 --> 00:22:53,230 OK, so our program is better. 434 00:22:53,230 --> 00:22:55,300 It's certainly better than it was before, 435 00:22:55,300 --> 00:22:57,550 but there's still one thing I think that's missing, 436 00:22:57,550 --> 00:23:03,220 which is, if I click source now and I type in, maybe, quack for Mario, 437 00:23:03,220 --> 00:23:05,800 I've missed my chance now to enter Mario's votes. 438 00:23:05,800 --> 00:23:08,830 Wouldn't it be nice if instead my program could reprompt me 439 00:23:08,830 --> 00:23:13,300 every time to enter in a number for Mario, and it won't stop, let's say, 440 00:23:13,300 --> 00:23:16,780 until I do comply, I enter in the number for Mario's votes? 441 00:23:16,780 --> 00:23:20,250 Well, for that, we'll actually need some new structure, one called a loop. 442 00:23:20,250 --> 00:23:22,000 And in just a few minutes, we'll come back 443 00:23:22,000 --> 00:23:24,550 and talk about how to implement these loops in R code. 444 00:23:24,550 --> 00:23:26,230 We'll see you all in five. 445 00:23:26,230 --> 00:23:27,332 Well, we're back. 446 00:23:27,332 --> 00:23:29,290 And, as promised, we're going to learn together 447 00:23:29,290 --> 00:23:31,832 about these things called loops, these structures that let us 448 00:23:31,832 --> 00:23:34,570 repeat some code some number of times. 
449 00:23:34,570 --> 00:23:37,390 Now, for this, I brought along a friend, the CS50 duck debugger, 450 00:23:37,390 --> 00:23:40,690 which is great to talk to about my code, the illogic in my thoughts, 451 00:23:40,690 --> 00:23:42,670 but also great for thinking about loops. 452 00:23:42,670 --> 00:23:45,640 In particular, if you have a duck or any kind of object to squeeze, 453 00:23:45,640 --> 00:23:48,820 you could use that to think about how loops work underneath the hood. 454 00:23:48,820 --> 00:23:52,600 So let's go ahead and jump in and see what this duck can teach us about loops. 455 00:23:52,600 --> 00:23:55,660 So if you go back to my code over here in RStudio, 456 00:23:55,660 --> 00:23:59,320 I have a program that kind of simulates me squeezing this duck three times, 457 00:23:59,320 --> 00:24:02,182 for instance, like this. 458 00:24:02,182 --> 00:24:05,140 It's a bit more of a squeak than a quack, but we'll go with it for now. 459 00:24:05,140 --> 00:24:07,450 If I type source here-- click source-- 460 00:24:07,450 --> 00:24:11,230 I'll see quack, quack, quack, me squeezing this duck now 461 00:24:11,230 --> 00:24:16,360 three different times, so putting what we just did physically now into text. 462 00:24:16,360 --> 00:24:21,400 So let's visualize this code in terms of a flow chart and see what it looks like. 463 00:24:21,400 --> 00:24:24,640 Well, here, at the top of my program, I start it. 464 00:24:24,640 --> 00:24:29,170 I click source, my program begins, and the next step then is to say, quack. 465 00:24:29,170 --> 00:24:32,260 Every arrow, now, indicates some next step of my program. 466 00:24:32,260 --> 00:24:35,260 Well, after I say quack one time, I squeeze once, 467 00:24:35,260 --> 00:24:37,550 what will I do but say quack again? 468 00:24:37,550 --> 00:24:38,540 Just like this. 469 00:24:38,540 --> 00:24:40,490 And I'll quack again, just like this.
470 00:24:40,490 --> 00:24:43,900 And now I'm at the end of my program, I stop entirely. 471 00:24:43,900 --> 00:24:49,810 But let's say I want to quack my duck or squeeze it more than three times. 472 00:24:49,810 --> 00:24:54,730 If I have only this to work with, what might quickly become my problem? 473 00:24:54,730 --> 00:24:59,260 If I want to do this five times or 10 times, or even more, well, 474 00:24:59,260 --> 00:25:01,788 I'd probably need to do a lot of copying and pasting. 475 00:25:01,788 --> 00:25:04,330 If I come back to my code here to show you what I need to do, 476 00:25:04,330 --> 00:25:09,220 if I want to simulate squeezing this duck not just three times but 5 or 6 477 00:25:09,220 --> 00:25:13,780 or 10 or more, well, I need to copy line three, put it on line four, 478 00:25:13,780 --> 00:25:17,350 copy line four, put it on line five, and so on and so forth 479 00:25:17,350 --> 00:25:20,820 to repeat this code some number of times. 480 00:25:20,820 --> 00:25:23,353 Now, thankfully, we don't have to do this in R. 481 00:25:23,353 --> 00:25:26,520 And, in fact, as programmers, you should be looking out for cases like these 482 00:25:26,520 --> 00:25:30,640 and thinking, I could probably use a loop instead. 483 00:25:30,640 --> 00:25:34,740 So let's see what kind of loops R offers us, what kind of keywords 484 00:25:34,740 --> 00:25:39,390 we could use to make a loop and to repeat this code some number of times. 485 00:25:39,390 --> 00:25:44,070 Well, one of the first loops we have at our disposal is one called repeat. 486 00:25:44,070 --> 00:25:46,590 Repeat allows us to repeat whatever code is 487 00:25:46,590 --> 00:25:50,640 inside of its curly braces infinitely, however many times we want to. 
488 00:25:50,640 --> 00:25:55,050 So let me go ahead and go into my RStudio again, and I'll type repeat now, 489 00:25:55,050 --> 00:25:58,590 this keyword, and I will then inside of those curly braces 490 00:25:58,590 --> 00:26:02,490 put this function, cat, which will print to the screen quack. 491 00:26:02,490 --> 00:26:07,260 And before I run this code, let's visualize its flowchart 492 00:26:07,260 --> 00:26:09,210 to see how it might work. 493 00:26:09,210 --> 00:26:12,750 Well, here on my screen I have this program. 494 00:26:12,750 --> 00:26:15,540 I'm going to start that program, and then 495 00:26:15,540 --> 00:26:19,360 I'm going to quack or squeeze my duck, and then I'm going to follow the arrow 496 00:26:19,360 --> 00:26:21,560 and quack, just like this. 497 00:26:21,560 --> 00:26:24,378 And then I'm going to follow the arrow and quack just like this, 498 00:26:24,378 --> 00:26:26,170 and I'm going to go follow the arrow again. 499 00:26:26,170 --> 00:26:32,560 And I worry I'd be stuck here for a very long time, because it seems 500 00:26:32,560 --> 00:26:36,280 like our next step is always to go back and to quack and to quack and to quack 501 00:26:36,280 --> 00:26:36,880 again. 502 00:26:36,880 --> 00:26:41,860 So before we dive into fixing this, let's talk about a bit of vocabulary. 503 00:26:41,860 --> 00:26:44,860 Now, what we've created here is, in fact, a loop. 504 00:26:44,860 --> 00:26:48,190 We're repeating this code over and over and over again. 505 00:26:48,190 --> 00:26:53,830 Now, each time we repeat this set of code, we're calling that one iteration. 506 00:26:53,830 --> 00:26:57,100 So, in other words, when I loop again and again and again 507 00:26:57,100 --> 00:26:59,860 I'm iterating again and again and again. 508 00:26:59,860 --> 00:27:04,030 One iteration means one segment of my code, top to bottom, 509 00:27:04,030 --> 00:27:05,620 inside of that loop. 
510 00:27:05,620 --> 00:27:08,470 And I think what we've created here is something called 511 00:27:08,470 --> 00:27:11,050 an infinite loop, one that will never, ever end, 512 00:27:11,050 --> 00:27:14,620 because there's no condition telling us when to stop looping. 513 00:27:14,620 --> 00:27:17,980 So we'll need to figure out how to break out of this kind of loop 514 00:27:17,980 --> 00:27:20,230 and figure out what we could do to get out of it. 515 00:27:20,230 --> 00:27:22,780 Now, thankfully, R does offer us some keywords 516 00:27:22,780 --> 00:27:26,110 to do just that, so let's explore them now in R. 517 00:27:26,110 --> 00:27:27,580 I come back to RStudio. 518 00:27:27,580 --> 00:27:32,470 We'll want to introduce these two keywords here, break and next. 519 00:27:32,470 --> 00:27:35,650 Break symbolizes breaking out of some loop. 520 00:27:35,650 --> 00:27:39,610 When R encounters that break keyword, it will end that loop entirely, 521 00:27:39,610 --> 00:27:42,100 will stop wherever it is and end that loop. 522 00:27:42,100 --> 00:27:46,420 Next, on the other hand, says wherever you are in this iteration, 523 00:27:46,420 --> 00:27:49,540 go ahead and start the next iteration from the top. 524 00:27:49,540 --> 00:27:55,630 So let's try these out now in R. If I come back to my program, duck.R, 525 00:27:55,630 --> 00:27:59,470 I don't want to repeat this quack over and over and over again. 526 00:27:59,470 --> 00:28:01,270 But let's say I just-- 527 00:28:01,270 --> 00:28:04,360 maybe accidentally, I click source now, and, well, 528 00:28:04,360 --> 00:28:07,300 my computer is just stuck saying quack, quack, quack, quack 529 00:28:07,300 --> 00:28:09,040 over and over infinitely forever. 530 00:28:09,040 --> 00:28:11,710 I could, if I wanted to, exit this program. 531 00:28:11,710 --> 00:28:16,660 If I type control C, that means stop this program whether we're in a loop 532 00:28:16,660 --> 00:28:17,740 or we're not. 
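To make the difference between break and next concrete, here is a small sketch. The counter i, the quacks tally, and the rule of skipping even counts are all illustrative assumptions and are not part of the lecture's duck.R.

```r
i <- 0
quacks <- 0
repeat {
  i <- i + 1
  if (i > 6) {
    break   # end the loop entirely
  }
  if (i %% 2 == 0) {
    next    # skip the rest of this iteration, start the next one from the top
  }
  # Only reached when i is 1, 3, or 5.
  cat("quack", i, "\n")
  quacks <- quacks + 1
}
cat("Quacked", quacks, "times\n")
```

Because the loop contains a break that is eventually reached, it ends on its own and never needs to be interrupted by hand.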
533 00:28:17,740 --> 00:28:21,160 So that can save us, control C. But ideally, I 534 00:28:21,160 --> 00:28:24,640 should consider a stop condition before I go ahead 535 00:28:24,640 --> 00:28:27,370 and repeat something infinitely many times. 536 00:28:27,370 --> 00:28:30,820 Now, what could I do if I wanted to quack, let's say, three times? 537 00:28:30,820 --> 00:28:34,600 One thing I could do is think about counting, like, maybe on my fingers. 538 00:28:34,600 --> 00:28:37,450 If I want to quack three times, I could maybe start at three. 539 00:28:37,450 --> 00:28:42,310 And if I have three here, I could quack, I could go down to two and quack 540 00:28:42,310 --> 00:28:45,040 again, go down to one and quack again. 541 00:28:45,040 --> 00:28:47,050 And then finally, at 0, I'm done. 542 00:28:47,050 --> 00:28:49,120 I shouldn't squeeze my duck anymore. 543 00:28:49,120 --> 00:28:54,790 So one thing we could do is try to put this idea of counting now in code. 544 00:28:54,790 --> 00:28:58,360 Well, I could create an object to store the number of times 545 00:28:58,360 --> 00:29:00,460 I want to squeeze this duck. 546 00:29:00,460 --> 00:29:03,220 I could, by convention, call it i, and that 547 00:29:03,220 --> 00:29:06,560 will keep track of the number of times I want to iterate in this loop. 548 00:29:06,560 --> 00:29:10,930 So on line one, I'll say I'm going to assign this value i to be three. 549 00:29:10,930 --> 00:29:14,050 It's kind of similar to me holding up my hands and saying three fingers, 550 00:29:14,050 --> 00:29:15,010 for instance. 551 00:29:15,010 --> 00:29:19,930 And now, as I repeat this code, I don't want to repeat it infinitely. 552 00:29:19,930 --> 00:29:24,460 I want to have some condition under which I break out of this loop. 553 00:29:24,460 --> 00:29:27,070 And as we saw before with my fingers, maybe the condition 554 00:29:27,070 --> 00:29:33,040 is if i is equal to 0, well, at that point I want to break this loop. 
555 00:29:33,040 --> 00:29:35,950 I want to exit it and not loop anymore. 556 00:29:35,950 --> 00:29:41,260 But I shouldn't run this code just yet, because while I've set i equal 557 00:29:41,260 --> 00:29:44,380 to 3, like I have my fingers here, what I haven't done 558 00:29:44,380 --> 00:29:47,590 is made a mechanism for actually dropping fingers, going from 3 to 2, 559 00:29:47,590 --> 00:29:49,420 from 2 to 1, from 1 to 0. 560 00:29:49,420 --> 00:29:54,280 So maybe after I quack, I'll go ahead and adjust the value of i. 561 00:29:54,280 --> 00:29:58,520 I'll set it equal to i minus 1, just like this. 562 00:29:58,520 --> 00:30:03,220 And then let's say-- maybe in the case that i is 0, eventually 563 00:30:03,220 --> 00:30:05,590 we're going to break out of the loop, but if it's not, 564 00:30:05,590 --> 00:30:08,650 why don't I go ahead and go to the next iteration? 565 00:30:08,650 --> 00:30:12,610 When we see the next keyword, we'll stop our current iteration and go to the top 566 00:30:12,610 --> 00:30:16,370 again to repeat our code, top to bottom, just like this. 567 00:30:16,370 --> 00:30:20,150 So the flowchart for this looks a bit more like this. 568 00:30:20,150 --> 00:30:22,750 I'm going to start my program, and I'm first 569 00:30:22,750 --> 00:30:26,830 going to set i equal to 3, kind of like I did on my fingers here. 570 00:30:26,830 --> 00:30:31,600 Then I'm going to squeeze the duck, going to quack, subtract one from i, 571 00:30:31,600 --> 00:30:35,530 and ask a question, is i equal to 0? 572 00:30:35,530 --> 00:30:38,830 If it's not, well, I'll go back up and I'll squeeze the duck again. 573 00:30:38,830 --> 00:30:41,350 I'll subtract 1, and ask that same question. 574 00:30:41,350 --> 00:30:45,130 And if I ever get to ask that question and the result is true, 575 00:30:45,130 --> 00:30:47,960 well, then I'll stop my program. 576 00:30:47,960 --> 00:30:51,190 So let's visualize this here using some interactive stuff. 
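Put together, the countdown described above could be written roughly like this. It is a sketch of the repeat loop as narrated, with the explicit next kept in to mirror the walkthrough even though the loop would return to the top without it.

```r
# Start with three "fingers" held up.
i <- 3
repeat {
  cat("quack\n")
  # Drop one finger after each quack.
  i <- i - 1
  if (i == 0) {
    break   # no fingers left: stop looping
  } else {
    next    # otherwise go back to the top of the loop
  }
}
```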
577 00:30:51,190 --> 00:30:56,300 I have my iPad here that can count, let's say, from 3, just like this-- 578 00:30:56,300 --> 00:30:56,800 whoops-- 579 00:30:56,800 --> 00:30:59,390 3 down to 0, let's say. 580 00:30:59,390 --> 00:31:02,600 So, currently, when i is 3, what do I do? 581 00:31:02,600 --> 00:31:05,360 I squeeze my duck once, just like this. 582 00:31:05,360 --> 00:31:09,280 I'll then subtract one from i, where i is kind of this iPad here 583 00:31:09,280 --> 00:31:12,070 where I subtract 1 now, and now i is 2. 584 00:31:12,070 --> 00:31:15,700 Well, I'll go back up and I'll squeeze again, I'll subtract 1 from i. 585 00:31:15,700 --> 00:31:17,110 Now i is 1. 586 00:31:17,110 --> 00:31:20,300 I'll ask the question, is i equal to 0? 587 00:31:20,300 --> 00:31:22,220 It's not, so we'll go back up again. 588 00:31:22,220 --> 00:31:25,920 I'll squeeze it up one more time, and I'll subtract 1 again. 589 00:31:25,920 --> 00:31:29,510 And now i is equal to 0, so we'll stop our program. 590 00:31:29,510 --> 00:31:32,180 There will be no more squeezing of this duck. 591 00:31:32,180 --> 00:31:36,080 So this is one way to approach the problem of creating some loop 592 00:31:36,080 --> 00:31:39,320 and having it repeat a certain number of times. 593 00:31:39,320 --> 00:31:42,110 But R comes with other kinds of loops too. 594 00:31:42,110 --> 00:31:45,140 A repeat loop is great when you want to do something at least once. 595 00:31:45,140 --> 00:31:47,570 I want to squeeze this duck at least one time. 596 00:31:47,570 --> 00:31:50,750 But if I only want to do it if some condition is true 597 00:31:50,750 --> 00:31:54,570 or while some condition is true, I could use another kind of loop as well. 598 00:31:54,570 --> 00:31:57,020 This loop is called a while loop. 599 00:31:57,020 --> 00:32:03,060 A while loop lets us repeat some set of code while some condition is true. 600 00:32:03,060 --> 00:32:08,330 So let's see what that looks like now in R. 
I will remove what I currently have 601 00:32:08,330 --> 00:32:11,520 and, instead, implement this while loop. 602 00:32:11,520 --> 00:32:17,550 So if I want to make a while loop, I can use while, just like this, 603 00:32:17,550 --> 00:32:20,510 and I'll make a condition to repeat under. 604 00:32:20,510 --> 00:32:24,920 As long as this condition is true, I will repeat the code inside this 605 00:32:24,920 --> 00:32:26,360 while loop's curly braces. 606 00:32:26,360 --> 00:32:30,200 So I could say, maybe, while i is not equal to 0, 607 00:32:30,200 --> 00:32:33,800 I want to repeat whatever code is inside of this while loop. 608 00:32:33,800 --> 00:32:37,820 Well, I want to quack, so I'll say quack just like before, 609 00:32:37,820 --> 00:32:40,730 backslash n, make a new line, and then why 610 00:32:40,730 --> 00:32:44,450 don't I go ahead and add back this kind of helper object I had called i? 611 00:32:44,450 --> 00:32:46,520 i is assigned the value 3. 612 00:32:46,520 --> 00:32:51,180 And after I quack, well, I'll subtract one from i just like this. 613 00:32:51,180 --> 00:32:54,020 i is now assigned the value i minus 1. 614 00:32:54,020 --> 00:32:58,100 So a bit shorter than our repeat loop, but I'd argue 615 00:32:58,100 --> 00:32:59,690 they do the same exact thing now. 616 00:32:59,690 --> 00:33:02,930 So let's visualize what's happening in this program. 617 00:33:02,930 --> 00:33:07,620 Well, the first thing we do is we set i equal to 3, just like this. 618 00:33:07,620 --> 00:33:11,480 And then we ask the question-- before we do anything else, we ask the question, 619 00:33:11,480 --> 00:33:14,300 is i not equal to 0? 620 00:33:14,300 --> 00:33:18,140 If it's not equal to 0, if that is true, we're going to squeeze our duck 621 00:33:18,140 --> 00:33:20,330 and subtract one from i. 622 00:33:20,330 --> 00:33:24,890 And then we'll ask the question again, is i not equal to 0?
623 00:33:24,890 --> 00:33:28,520 And if ever in our loop that question is false, that is, 624 00:33:28,520 --> 00:33:33,290 this condition is no longer true, we will stop, exit our loop entirely. 625 00:33:33,290 --> 00:33:35,540 So let's visualize this now. 626 00:33:35,540 --> 00:33:37,790 i was first set to 3. 627 00:33:37,790 --> 00:33:39,950 So I'll set i here to 3. 628 00:33:39,950 --> 00:33:43,490 And now the difference is, before I do anything, 629 00:33:43,490 --> 00:33:45,020 I'm going to check this condition. 630 00:33:45,020 --> 00:33:49,490 Now is i equal to 0, or is i not equal to 0? 631 00:33:49,490 --> 00:33:52,940 Well i is 3 so, yes, i is not equal to 0. 632 00:33:52,940 --> 00:33:57,650 I'll go ahead and quack, just like this, and I'll subtract one from i. 633 00:33:57,650 --> 00:34:00,980 Now I'll go back to the top of my loop and ask that question again. 634 00:34:00,980 --> 00:34:03,800 Is i not equal to 0? 635 00:34:03,800 --> 00:34:05,810 Well, i is 2, so it's not equal to 0. 636 00:34:05,810 --> 00:34:08,870 I'll go ahead and squeeze my duck, subtract 1 from i, 637 00:34:08,870 --> 00:34:11,000 and then ask the question again. 638 00:34:11,000 --> 00:34:13,190 Is i not equal to 0? 639 00:34:13,190 --> 00:34:14,600 Yes, it's not equal to 0. 640 00:34:14,600 --> 00:34:16,730 I'll squeeze, subtract 1 from i. 641 00:34:16,730 --> 00:34:21,199 But now, when I ask the question, is i not equal to 0, 642 00:34:21,199 --> 00:34:23,870 well, i is not not equal to zero. 643 00:34:23,870 --> 00:34:28,370 In fact, it is 0, so I will exit my loop altogether, squeezing my duck now 644 00:34:28,370 --> 00:34:30,080 three times in total. 645 00:34:30,080 --> 00:34:31,409 So let's visualize this. 646 00:34:31,409 --> 00:34:33,320 I'll come back to RStudio now. 647 00:34:33,320 --> 00:34:36,230 And if I run this code by clicking source, 648 00:34:36,230 --> 00:34:44,060 I will have quacked exactly three times, counting down from three, two, and one.
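The while loop walked through above can be sketched as follows. The condition is checked before every pass through the body, including the first one.

```r
# Helper object that counts down the remaining quacks.
i <- 3
while (i != 0) {
  cat("quack\n")
  # Without this line, i would stay at 3 and the loop would never end.
  i <- i - 1
}
```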
649 00:34:44,060 --> 00:34:50,630 OK, so just as we have counted down, we could also imagine counting up. 650 00:34:50,630 --> 00:34:58,760 Maybe I start i at 1 and my condition now is to loop so long as i is less than 651 00:34:58,760 --> 00:34:59,660 or equal to 3. 652 00:34:59,660 --> 00:35:04,220 Like I could imagine one, two, three times, but not four. 653 00:35:04,220 --> 00:35:10,340 So I'll say, while i is less than or equal to 3, I want to keep quacking. 654 00:35:10,340 --> 00:35:14,570 But now I need to actually increase i as I go. 655 00:35:14,570 --> 00:35:18,990 On each iteration of my loop, on each run from the code top to bottom, 656 00:35:18,990 --> 00:35:21,570 I want to increase i by 1. 657 00:35:21,570 --> 00:35:25,590 And now let's visualize what this is doing in terms of a flow chart. 658 00:35:25,590 --> 00:35:29,460 Here, very similar idea, but we're starting now from 1. 659 00:35:29,460 --> 00:35:34,740 i equals 1, and we'll ask this question, is i less than or equal to 3? 660 00:35:34,740 --> 00:35:38,550 If it is, we'll squeeze our duck and add 1 to i. 661 00:35:38,550 --> 00:35:41,520 If it's not, as we go back and approach that question again 662 00:35:41,520 --> 00:35:44,760 at the top of our loop, if it's ever not the case that i is less than 663 00:35:44,760 --> 00:35:48,180 or equal to 3, we'll stop, we won't loop anymore. 664 00:35:48,180 --> 00:35:53,550 So, again, let's visualize this, but now i is first set to 1. 665 00:35:53,550 --> 00:35:58,890 So before we do anything, we ask the question, is i less than or equal to 3? 666 00:35:58,890 --> 00:35:59,700 It is. 667 00:35:59,700 --> 00:36:04,320 We'll squeeze our duck, add 1 now to i, and ask the question again. 668 00:36:04,320 --> 00:36:06,870 Is i less than or equal to 3? 669 00:36:06,870 --> 00:36:07,740 It is. 670 00:36:07,740 --> 00:36:09,750 I'll squeeze, add 1 to i. 671 00:36:09,750 --> 00:36:11,070 Now it's 3.
672 00:36:11,070 --> 00:36:13,260 Is 3 less than or equal to 3? 673 00:36:13,260 --> 00:36:16,290 Well, it's equal to, so I'll go ahead and squeeze, add 1 to i. 674 00:36:16,290 --> 00:36:20,670 And now it's 4, and 4 is not less than or equal to 3, 675 00:36:20,670 --> 00:36:24,480 so we'll go ahead and stop our loop and not loop anymore. 676 00:36:24,480 --> 00:36:28,830 Come back now to RStudio and I will show you what this looks like now. 677 00:36:28,830 --> 00:36:32,760 If I click source, I'll see quack, quack, quack. 678 00:36:32,760 --> 00:36:36,160 Again, quacking three separate times. 679 00:36:36,160 --> 00:36:39,510 So we've seen now two kinds of loops in R, 680 00:36:39,510 --> 00:36:43,140 one called a repeat loop and one called a while loop. 681 00:36:43,140 --> 00:36:47,640 Let me ask, what questions do we have about these loops so far? 682 00:36:47,640 --> 00:36:51,870 AUDIENCE: How could we decide when to use repeat 683 00:36:51,870 --> 00:36:55,038 or when to use while, it doesn't matter? 684 00:36:55,038 --> 00:36:56,830 CARTER ZENKE: Yeah, a really good question. 685 00:36:56,830 --> 00:36:59,790 So in general, as we'll see, a repeat loop 686 00:36:59,790 --> 00:37:02,850 tends to be good when you want to do something at least once-- 687 00:37:02,850 --> 00:37:06,180 you want to quack at least once, you want to prompt the user at least 688 00:37:06,180 --> 00:37:10,050 once-- and then check if you should repeat or not. 689 00:37:10,050 --> 00:37:13,322 A while loop is good if, at the very beginning, before you do anything else, 690 00:37:13,322 --> 00:37:15,030 you want to check some condition, and you 691 00:37:15,030 --> 00:37:18,160 want to repeat that code while some condition is true. 
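As a sketch of the counting-up version, together with a small illustration of the "check first" behavior just mentioned: the second loop and its object j are assumptions added here to show that a while loop whose condition starts out false never runs its body.

```r
# Counting up: quack while i is 1, 2, or 3.
i <- 1
while (i <= 3) {
  cat("quack\n")
  i <- i + 1
}

# Hypothetical contrast: the condition is false from the start,
# so the body runs zero times and j is left unchanged.
j <- 5
while (j <= 3) {
  cat("never printed\n")
  j <- j + 1
}
```

A repeat loop, by comparison, always runs its body at least once, because any stopping condition is only checked from inside the body.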
692 00:37:18,160 --> 00:37:19,938 So you could think of a repeat being like, 693 00:37:19,938 --> 00:37:22,230 do this once, but then check if you should do it again, 694 00:37:22,230 --> 00:37:24,930 whereas a while loop is more like, if our condition is true, 695 00:37:24,930 --> 00:37:27,800 we should be repeating this code over and over again. 696 00:37:27,800 --> 00:37:29,770 Really good question there. 697 00:37:29,770 --> 00:37:31,080 All right, let's keep going. 698 00:37:31,080 --> 00:37:36,960 And one more loop we have available to us in R is one called a for loop. 699 00:37:36,960 --> 00:37:42,510 A for loop lets us do some piece of code for each element in some list 700 00:37:42,510 --> 00:37:44,580 or vector of elements. 701 00:37:44,580 --> 00:37:49,770 So instead of now using the while keyword, I could use the for keyword. 702 00:37:49,770 --> 00:37:51,000 I could say for-- 703 00:37:51,000 --> 00:37:55,110 and it turns out that inside of the parentheses of a for loop 704 00:37:55,110 --> 00:37:56,670 I need a few different components. 705 00:37:56,670 --> 00:38:00,840 I need, still, some kind of helper object to keep track of each iteration, 706 00:38:00,840 --> 00:38:05,220 but I also need some vector of elements to do some piece of code 707 00:38:05,220 --> 00:38:07,717 for each vector in that element-- 708 00:38:07,717 --> 00:38:09,300 or each element in that vector, sorry. 709 00:38:09,300 --> 00:38:14,760 So I'll say for i in, and then have a vector here, let's say, 1, 2, and 3. 710 00:38:14,760 --> 00:38:19,830 And now I have the same as before, my body of this loop, where inside of it 711 00:38:19,830 --> 00:38:22,840 I want to quack, just like this. 712 00:38:22,840 --> 00:38:27,480 Now, notice there's no need for me now to increment or decrement 713 00:38:27,480 --> 00:38:31,680 i, to add or subtract 1, because this is all taken care of thanks 714 00:38:31,680 --> 00:38:32,730 to our for loop. 
715 00:38:32,730 --> 00:38:39,000 What the for loop will do is first set i equal to 1, and then it'll say quack. 716 00:38:39,000 --> 00:38:42,510 And then set i equal to 2, and then quack again. 717 00:38:42,510 --> 00:38:45,960 And then set i equal to 3, and quack again. 718 00:38:45,960 --> 00:38:48,990 But then, at the end of this vector, once there 719 00:38:48,990 --> 00:38:53,410 are no more elements to iterate over, well, our for loop is done. 720 00:38:53,410 --> 00:38:59,430 So i, as it iterates, kind of assumes the value of 1 and then 2 and then 3. 721 00:38:59,430 --> 00:39:02,610 And if I were to click source now, I would see quack, quack, 722 00:39:02,610 --> 00:39:05,760 and quack down here in my console. 723 00:39:05,760 --> 00:39:08,100 If I could simplify this just a little bit more, 724 00:39:08,100 --> 00:39:11,820 I could maybe make a vector that's going to be a little more dynamic than this. 725 00:39:11,820 --> 00:39:17,700 Like I could imagine myself typing in 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to quack 726 00:39:17,700 --> 00:39:20,820 10 times now, if I were to click source here. 727 00:39:20,820 --> 00:39:24,870 But that's going to get really tedious if I want to quack more than, let's say, 728 00:39:24,870 --> 00:39:26,080 three or four times. 729 00:39:26,080 --> 00:39:28,680 So what I could do instead is use our syntax 730 00:39:28,680 --> 00:39:32,410 to give me some vector that is between certain numbers. 731 00:39:32,410 --> 00:39:35,160 So 1 colon 10, for instance, would say, give me 732 00:39:35,160 --> 00:39:38,070 a vector that includes 1 through 10 inclusive, which 733 00:39:38,070 --> 00:39:39,690 I can show you in my console here. 734 00:39:39,690 --> 00:39:43,410 1 colon 10 gives me one through 10 inclusive. 735 00:39:43,410 --> 00:39:47,190 With this, could I actually change how many times I loop? 736 00:39:47,190 --> 00:39:49,680 If I click source, I'll see 10 quacks. 
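The two for loops described so far might be written like this sketch: first over a vector typed out by hand, then over a vector built with the colon syntax.

```r
# i takes the values 1, then 2, then 3; no manual increment is needed.
for (i in c(1, 2, 3)) {
  cat("quack\n")
}

# 1:10 builds the vector of integers from 1 through 10, inclusive.
print(1:10)

# Changing the end of the range changes how many times the loop runs.
for (i in 1:10) {
  cat("quack\n")
}
```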
737 00:39:49,680 --> 00:39:55,200 If I change this to 1 colon 3, well, now I'm able to see quack, quack, quack 738 00:39:55,200 --> 00:39:57,150 down here in my console. 739 00:39:57,150 --> 00:40:00,300 So a for loop is going to be the tool for you 740 00:40:00,300 --> 00:40:03,510 if you have some list, some vector elements to loop over, 741 00:40:03,510 --> 00:40:09,100 and then you want to do some piece of code for each of those elements there. 742 00:40:09,100 --> 00:40:13,920 OK, so now we've seen the three kinds of loops in R. We've seen repeat loops, 743 00:40:13,920 --> 00:40:16,920 we've seen while loops, and we've seen for loops. 744 00:40:16,920 --> 00:40:20,270 Our next step will be to apply these same loops to improve 745 00:40:20,270 --> 00:40:21,860 the design of our programs. 746 00:40:21,860 --> 00:40:24,230 We'll come back in five and do just that. 747 00:40:24,230 --> 00:40:25,670 See you all soon. 748 00:40:25,670 --> 00:40:26,930 Well, we're back. 749 00:40:26,930 --> 00:40:29,060 And, as promised, we're now going to explore 750 00:40:29,060 --> 00:40:33,690 how we could apply functions to make the design of our programs even better. 751 00:40:33,690 --> 00:40:39,860 So let's pick up where we last left off, writing this program called count.R. 752 00:40:39,860 --> 00:40:42,080 And we left off with this idea of wanting 753 00:40:42,080 --> 00:40:46,130 to reprompt the user any number of times until they 754 00:40:46,130 --> 00:40:50,360 comply with whatever kind of input we want, in this case, a number. 755 00:40:50,360 --> 00:40:54,500 So we saw just a little bit ago that a repeat loop is a great loop 756 00:40:54,500 --> 00:40:56,570 to use when you want to do something at least 757 00:40:56,570 --> 00:41:00,830 once and then check if you should do it again or break out of the loop. 758 00:41:00,830 --> 00:41:04,640 Now, in this case, I do want to prompt the user at least once. 
759 00:41:04,640 --> 00:41:07,160 I want to tell them to input some number at least once. 760 00:41:07,160 --> 00:41:09,368 And if they don't comply, well, then I'll loop again, 761 00:41:09,368 --> 00:41:11,330 but at least I want to do it once. 762 00:41:11,330 --> 00:41:14,780 So I could use a repeat loop here, and I could use it 763 00:41:14,780 --> 00:41:17,430 in the same way we just saw, using the repeat keyword 764 00:41:17,430 --> 00:41:20,350 followed by some curly braces, just like this. 765 00:41:20,350 --> 00:41:24,480 And then inside those braces-- inside of this loop's body-- 766 00:41:24,480 --> 00:41:28,330 I could be sure to prompt at least once, just like this. 767 00:41:28,330 --> 00:41:30,840 I will ask the user for some number of votes 768 00:41:30,840 --> 00:41:34,080 and store it in this object called votes. 769 00:41:34,080 --> 00:41:38,130 But, now, I don't want to run this code, because this will make an infinite loop. 770 00:41:38,130 --> 00:41:41,100 I'll be constantly asking the user to enter in some number of votes. 771 00:41:41,100 --> 00:41:45,270 So I need some condition under which I would break out of this loop. 772 00:41:45,270 --> 00:41:51,060 And I think that condition might be if the votes we receive is not NA-- 773 00:41:51,060 --> 00:41:55,800 if we get back some valid number of votes, that will be not equal to NA. 774 00:41:55,800 --> 00:42:00,120 If we do get some weird input like duck, though, that will be NA, in which case 775 00:42:00,120 --> 00:42:01,800 we should keep looping. 776 00:42:01,800 --> 00:42:04,650 So let me ask the question, is votes-- 777 00:42:04,650 --> 00:42:08,310 in this case, is it not equal to-- 778 00:42:08,310 --> 00:42:11,190 is votes not equal to NA? 779 00:42:11,190 --> 00:42:12,400 Just like this. 780 00:42:12,400 --> 00:42:15,690 And if it's not, well, we're going to break out of this loop. 781 00:42:15,690 --> 00:42:17,340 We're going to say, look, we're done.
782 00:42:17,340 --> 00:42:21,090 Votes is not NA, we don't need to ask the user any more. 783 00:42:21,090 --> 00:42:27,038 Alternatively, if votes is NA, we could continue on to the next iteration. 784 00:42:27,038 --> 00:42:28,830 Now, there's one improvement here, which is 785 00:42:28,830 --> 00:42:32,880 that technically, when we get to the bottom of this repeat loop, 786 00:42:32,880 --> 00:42:35,580 get to its last curly brace in the body here, 787 00:42:35,580 --> 00:42:38,550 it will automatically go back up to the top, 788 00:42:38,550 --> 00:42:41,010 do the next iteration from the top of its body. 789 00:42:41,010 --> 00:42:45,240 So I'd argue that this extra next here isn't really needed. 790 00:42:45,240 --> 00:42:48,390 We're going to go to the next iteration regardless if we don't break out 791 00:42:48,390 --> 00:42:50,280 of this loop altogether. 792 00:42:50,280 --> 00:42:53,270 And now, I think, we have a good loop going. 793 00:42:53,270 --> 00:42:55,020 I'm going to ask the user for their votes. 794 00:42:55,020 --> 00:42:59,070 If, in other words, this is a valid vote, a valid number, 795 00:42:59,070 --> 00:43:01,740 we're going to break out of the loop, not prompt them anymore. 796 00:43:01,740 --> 00:43:03,070 And what could we do? 797 00:43:03,070 --> 00:43:04,830 Well, now we don't need to check if votes 798 00:43:04,830 --> 00:43:08,760 is NA, because if we get down to line eight we know votes is not NA. 799 00:43:08,760 --> 00:43:13,000 I could simply return votes overall. 800 00:43:13,000 --> 00:43:15,090 So here, I think, is a better implementation, 801 00:43:15,090 --> 00:43:17,220 one that will prompt the user again and again 802 00:43:17,220 --> 00:43:19,740 until they enter some number of votes that we actually want. 803 00:43:19,740 --> 00:43:23,280 Let me go ahead and click source, and let me type 100 votes for Mario. 804 00:43:23,280 --> 00:43:27,420 But now maybe I'll type duck for Peach.
805 00:43:27,420 --> 00:43:28,620 And I'm re-prompted. 806 00:43:28,620 --> 00:43:31,230 OK, maybe I type quack for Peach. 807 00:43:31,230 --> 00:43:32,430 I'm re-prompted. 808 00:43:32,430 --> 00:43:33,600 Maybe I'll now comply. 809 00:43:33,600 --> 00:43:37,545 I'll say, OK, 150 votes for Peach, and now I move on to Bowser. 810 00:43:37,545 --> 00:43:38,920 So this seems to be working here. 811 00:43:38,920 --> 00:43:44,280 I'll go ahead and type 120, and now I'll see my total votes was 370. 812 00:43:44,280 --> 00:43:48,540 Now, one more improvement is that it seems to me a little extraneous 813 00:43:48,540 --> 00:43:54,300 to break out of this loop and then return, because a return actually 814 00:43:54,300 --> 00:43:57,600 signifies that, no matter where we are in our function, 815 00:43:57,600 --> 00:44:01,530 we're going to stop the function altogether and return the value we have. 816 00:44:01,530 --> 00:44:05,250 So it seems to me like I could move this return from line eight 817 00:44:05,250 --> 00:44:07,740 to inside of this if statement. 818 00:44:07,740 --> 00:44:12,420 And now if votes is not NA, if we have some valid number of votes, 819 00:44:12,420 --> 00:44:15,600 we'll not just break and then return, we'll go ahead and just simply return. 820 00:44:15,600 --> 00:44:18,780 Because a return would, by nature, break us out of the loop anyway. 821 00:44:18,780 --> 00:44:21,430 We're going to stop this function altogether. 822 00:44:21,430 --> 00:44:22,810 So let's try this again. 823 00:44:22,810 --> 00:44:24,870 I'll save my program, click source. 824 00:44:24,870 --> 00:44:28,860 I'll type 100 for Mario, 150 for Peach, 120 for Bowser. 825 00:44:28,860 --> 00:44:32,790 And now I think we're in good hands if we have good input. 826 00:44:32,790 --> 00:44:34,050 But if I type source-- 827 00:44:34,050 --> 00:44:39,180 let me go ahead and do duck for Mario, prompt it again, maybe quack for Mario, 828 00:44:39,180 --> 00:44:41,190 maybe 100 for Mario. 
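A minimal sketch of the re-prompting function as described, with the return moved inside the if. Two assumptions are worth flagging: the `read` parameter is hypothetical (added here so the prompt source can be swapped out; it defaults to base R's readline()), and suppressWarnings() is assumed to wrap the conversion so that input like "duck" doesn't print a warning. The spoken "not equal to NA" is written with is.na(), since comparing a value to NA with != itself yields NA rather than TRUE or FALSE.

```r
# Prompt until the input can be converted to an integer, then return it.
# `read` is a hypothetical parameter for illustration; by default it is readline().
get_votes <- function(prompt = "Enter votes: ", read = readline) {
  repeat {
    # as.integer() gives NA for text such as "duck"
    votes <- suppressWarnings(as.integer(read(prompt)))
    if (!is.na(votes)) {
      # return() ends the whole function, so no separate break is needed
      return(votes)
    }
    # Reaching the closing brace starts the next iteration automatically
  }
}
```

In an interactive session, `mario <- get_votes("Mario: ")` would keep asking until a number is typed.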
829 00:44:41,190 --> 00:44:45,820 Now I think we're doing well with invalid input as well. 830 00:44:45,820 --> 00:44:47,430 So pretty good. 831 00:44:47,430 --> 00:44:52,380 One other thing we could do, though, is think about these lines, 10 through 12. 832 00:44:52,380 --> 00:44:55,830 Well, it seems to me like, for each candidate that I have, 833 00:44:55,830 --> 00:44:58,380 I want to get some number of votes for them. 834 00:44:58,380 --> 00:45:00,600 Notice how I said "for each candidate." 835 00:45:00,600 --> 00:45:03,540 Well, if we want to do something for each candidate, 836 00:45:03,540 --> 00:45:07,170 for each item in some list or some vector, well, a for loop 837 00:45:07,170 --> 00:45:09,473 might be a great tool for us here. 838 00:45:09,473 --> 00:45:11,640 Why don't I go ahead and try to make a for loop now? 839 00:45:11,640 --> 00:45:16,380 I'll say for-- as we saw before, this helper object called i. 840 00:45:16,380 --> 00:45:23,180 For i in-- well, I want to prompt the user for every candidate that I have. 841 00:45:23,180 --> 00:45:27,860 And although we just saw for loops being used with numeric vectors-- vectors that 842 00:45:27,860 --> 00:45:31,040 include 1, 2, 3, 4, and so on-- we can also 843 00:45:31,040 --> 00:45:33,410 use for loops with non-numeric vectors. 844 00:45:33,410 --> 00:45:36,650 I could give a vector of the candidates that I have and know that a for loop 845 00:45:36,650 --> 00:45:39,650 will loop over each candidate in that vector. 846 00:45:39,650 --> 00:45:41,370 So I could do something like this. 847 00:45:41,370 --> 00:45:44,540 I could say, for i in a vector of my candidates-- 848 00:45:44,540 --> 00:45:48,200 Mario, Peach, and then Bowser-- 849 00:45:48,200 --> 00:45:51,800 and then I'll provide the body of this for loop. 850 00:45:51,800 --> 00:45:55,100 Now, what do I want to do in this loop? 851 00:45:55,100 --> 00:46:00,290 Well, on each loop, i will first be assigned some new element of my vector.
852 00:46:00,290 --> 00:46:03,530 First, it will be Mario, then Peach, then-- 853 00:46:03,530 --> 00:46:06,330 not Boswer-- Bowser, just like this. 854 00:46:06,330 --> 00:46:09,470 And then I want to, in this case, ask the user 855 00:46:09,470 --> 00:46:11,600 for some number of votes on each iteration, 856 00:46:11,600 --> 00:46:14,060 first for Mario, then for Peach, then for Bowser. 857 00:46:14,060 --> 00:46:18,100 So I could probably simply call get_votes just like this, 858 00:46:18,100 --> 00:46:22,120 and maybe store it in some object called votes, like that. 859 00:46:22,120 --> 00:46:26,610 But now the question is, how would I show the user the right prompt? 860 00:46:26,610 --> 00:46:31,100 Like I can't type in Mario here, because then Mario 861 00:46:31,100 --> 00:46:33,600 would show up with the prompt on every iteration of my loop. 862 00:46:33,600 --> 00:46:35,790 I need something more dynamic than that. 863 00:46:35,790 --> 00:46:39,120 One thing I could do is take advantage of how 864 00:46:39,120 --> 00:46:44,820 this object i is actually assigned the value of each element on each iteration. 865 00:46:44,820 --> 00:46:48,420 So on the first iteration, i will be equal to Mario. 866 00:46:48,420 --> 00:46:51,300 On the second iteration, i will be equal to Peach. 867 00:46:51,300 --> 00:46:54,150 On the third iteration, i will be equal to Bowser 868 00:46:54,150 --> 00:46:56,790 so I could use that to my advantage. 869 00:46:56,790 --> 00:46:59,070 I could take the candidate name, let's say, 870 00:46:59,070 --> 00:47:03,300 and maybe add in dynamically this colon space with, 871 00:47:03,300 --> 00:47:05,280 let's say, paste0, like we've seen before. 872 00:47:05,280 --> 00:47:06,660 I could say paste0-- 873 00:47:06,660 --> 00:47:11,700 I want to paste together the candidate's name followed by colon space. 874 00:47:11,700 --> 00:47:15,810 So now, on each iteration, i will first be equal to Mario. 
875 00:47:15,810 --> 00:47:19,380 We'll get votes for Mario by prompting the user for Mario's votes. 876 00:47:19,380 --> 00:47:22,110 Then, on the next iteration, i will be Peach, 877 00:47:22,110 --> 00:47:25,320 we'll prompt the user for Peach's votes, followed by a colon space. 878 00:47:25,320 --> 00:47:27,540 And then the same thing for Bowser. 879 00:47:27,540 --> 00:47:32,070 So I think I could get rid of, let's say, this code down below here. 880 00:47:32,070 --> 00:47:35,130 But what am I left with? 881 00:47:35,130 --> 00:47:40,920 Well, it seems like on line 14 I was summing up Mario, Peach, and Bowser, 882 00:47:40,920 --> 00:47:44,490 but those objects don't exist for me anymore. 883 00:47:44,490 --> 00:47:47,220 I only have this one value now called votes, which 884 00:47:47,220 --> 00:47:50,310 seems to get changed every iteration. 885 00:47:50,310 --> 00:47:54,660 At first it will be Mario's votes, the next iteration it will be Peach's votes, 886 00:47:54,660 --> 00:47:57,240 the next it will be Bowser's votes. 887 00:47:57,240 --> 00:48:02,400 What ideas do we have for how to solve this problem? 888 00:48:02,400 --> 00:48:06,420 Any ideas for how we could maybe count up these votes 889 00:48:06,420 --> 00:48:09,510 while we go through our loop? 890 00:48:09,510 --> 00:48:14,490 AUDIENCE: First I think we should put it inside the for loop or return the sum. 891 00:48:14,490 --> 00:48:15,990 CARTER ZENKE: Yeah, so, a good idea. 892 00:48:15,990 --> 00:48:18,270 We still want to return their sum, and 893 00:48:18,270 --> 00:48:21,180 you're thinking about trying to do this within the for loop. 894 00:48:21,180 --> 00:48:25,650 One thing that comes to mind is maybe trying to keep a running sum.
895 00:48:25,650 --> 00:48:29,550 That is, let's first get Mario's votes, add them to our total, 896 00:48:29,550 --> 00:48:33,750 then get Peach's votes, add those to our total, then get Bowser's votes, 897 00:48:33,750 --> 00:48:34,710 add those to our total. 898 00:48:34,710 --> 00:48:39,000 And at the end of our loop, we will have a total number of votes to count up. 899 00:48:39,000 --> 00:48:42,550 So let's see this in action in R. I'll come back to RStudio here. 900 00:48:42,550 --> 00:48:46,800 And if I can't have separate objects now for Mario, Peach, and Bowser, 901 00:48:46,800 --> 00:48:48,360 well, no problem. 902 00:48:48,360 --> 00:48:52,260 What I could do instead is start my count a little bit earlier. 903 00:48:52,260 --> 00:48:55,350 Maybe I'll set total initially equal to 0. 904 00:48:55,350 --> 00:48:58,740 So before I loop, I assume, well, there are 0 votes. 905 00:48:58,740 --> 00:49:02,310 But then, on each iteration of my loop, what will I do? 906 00:49:02,310 --> 00:49:05,760 I'll ask the user for some number of votes for the candidate, 907 00:49:05,760 --> 00:49:07,860 whether it's Mario, Peach, or Bowser. 908 00:49:07,860 --> 00:49:12,690 And then, down below, I'll add those votes to the total. 909 00:49:12,690 --> 00:49:19,080 I will update total to include the total plus the new votes we've received. 910 00:49:19,080 --> 00:49:21,708 So I think I could get rid of now line 16. 911 00:49:21,708 --> 00:49:24,250 And let's think through what this is doing line by line here. 912 00:49:24,250 --> 00:49:26,340 Well, first, total is 0. 913 00:49:26,340 --> 00:49:30,690 And if I go into my loop now, i will first be equal to Mario 914 00:49:30,690 --> 00:49:32,160 on this first iteration. 915 00:49:32,160 --> 00:49:34,350 So I'll prompt the user for Mario's votes 916 00:49:34,350 --> 00:49:36,330 and store them in this object called votes. 917 00:49:36,330 --> 00:49:37,740 Let's say it's 100. 
918 00:49:37,740 --> 00:49:42,330 On line 13, I take this object called total and update it. 919 00:49:42,330 --> 00:49:43,890 I add Mario's votes to it. 920 00:49:43,890 --> 00:49:47,790 If Mario had 100 votes, total will now be 100. 921 00:49:47,790 --> 00:49:48,900 Then I'll move on. 922 00:49:48,900 --> 00:49:52,650 i will then become Peach, and I'll ask for Peach's votes now. 923 00:49:52,650 --> 00:49:56,310 Well, if Peach's votes is 150, on line 13 924 00:49:56,310 --> 00:49:59,580 I'll again say 100, which is the current value of total, 925 00:49:59,580 --> 00:50:03,302 plus 150, that's the new value of total, so 250. 926 00:50:03,302 --> 00:50:05,010 And you can see how we're kind of keeping 927 00:50:05,010 --> 00:50:08,310 a running total of the votes as we go from candidate to candidate. 928 00:50:08,310 --> 00:50:10,080 We'll do the same for Bowser, and I think 929 00:50:10,080 --> 00:50:14,460 at the end of this we will have a total number of votes across every candidate. 930 00:50:14,460 --> 00:50:16,700 So let me go ahead and click source now. 931 00:50:16,700 --> 00:50:19,910 And I'll see, if I type in 100 for Mario, 150 for Peach, 932 00:50:19,910 --> 00:50:26,070 and 120 for Bowser, well, we still have our total, but now using this loop. 933 00:50:26,070 --> 00:50:30,080 So I'd argue we've made our program a little more efficient using these loops, 934 00:50:30,080 --> 00:50:33,810 and easier to read, easier to change as well. 935 00:50:33,810 --> 00:50:38,570 Now, what questions do we have about this program as we've written it? 936 00:50:38,570 --> 00:50:40,580 We've added in a few loops. 937 00:50:40,580 --> 00:50:43,370 We have a repeat loop and a for loop. 938 00:50:43,370 --> 00:50:47,060 What other questions do we have about this program? 939 00:50:47,060 --> 00:50:49,310 Seeing none so far, so let's keep going here.
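The running-sum loop walked through above might look roughly like this. To keep the sketch runnable without a console, get_votes() is replaced by a stand-in that looks up canned answers; that stand-in and the `canned` vector are hypothetical, and the real function would prompt with readline().

```r
# Hypothetical stand-in for get_votes(): returns a canned number per prompt
# instead of asking the user, so this sketch runs non-interactively.
canned <- c("Mario: " = 100, "Peach: " = 150, "Bowser: " = 120)
get_votes <- function(prompt) {
  canned[[prompt]]
}

total <- 0  # before looping, assume there are 0 votes
for (i in c("Mario", "Peach", "Bowser")) {
  # paste0() joins the candidate's name and ": " with no separator
  votes <- get_votes(paste0(i, ": "))
  # Running sum: add this candidate's votes to what we have so far
  total <- total + votes
}
cat("Total votes:", total, "\n")
```

Adding a fourth candidate would then only mean adding one more name to the vector, rather than another prompt line and another term in the sum.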
940 00:50:49,310 --> 00:50:51,440 And one thing that we can do with these loops 941 00:50:51,440 --> 00:50:53,930 is think about how we could apply them to other problems. 942 00:50:53,930 --> 00:50:56,570 So one problem we saw a little bit earlier 943 00:50:56,570 --> 00:50:59,120 was this problem of working with a table of data 944 00:50:59,120 --> 00:51:00,953 that had our candidates' votes in it. 945 00:51:00,953 --> 00:51:02,870 So, if you recall, we had a table that looked a bit 946 00:51:02,870 --> 00:51:05,180 like this, where for each candidate we had 947 00:51:05,180 --> 00:51:08,510 the number of votes they received at the polls, this physical location, 948 00:51:08,510 --> 00:51:11,300 and the number of votes they received in the mail. 949 00:51:11,300 --> 00:51:15,740 So it seems like Mario received 37 votes at the polls, the physical location, 950 00:51:15,740 --> 00:51:18,110 and 63 votes in the mail. 951 00:51:18,110 --> 00:51:21,920 But then our question was, well, how many votes did each candidate receive? 952 00:51:21,920 --> 00:51:25,670 That is, for each candidate, what was the sum of their votes? 953 00:51:25,670 --> 00:51:29,970 And then for each voting method, like poll or mail, well, 954 00:51:29,970 --> 00:51:33,330 how many votes did we receive overall in those columns too? 955 00:51:33,330 --> 00:51:37,370 So to visualize-- let me grab my clicker over here-- to visualize, 956 00:51:37,370 --> 00:51:40,130 let's say that we wanted to find Mario's total votes. 957 00:51:40,130 --> 00:51:43,198 Well, we would just sum up the row for Mario here. 958 00:51:43,198 --> 00:51:46,490 And then if we want to find the total number of votes we received at the poll 959 00:51:46,490 --> 00:51:50,700 or in the mail, we would sum up each value in these columns here. 960 00:51:50,700 --> 00:51:55,970 So notice how, again, we're saying for each candidate, or for each column. 961 00:51:55,970 --> 00:51:57,580 We want to sum up those votes.
962 00:51:57,580 --> 00:52:02,390 Well, we could probably use a for loop to accomplish this same task now. 963 00:52:02,390 --> 00:52:04,550 Let's go back to R and see how this could work. 964 00:52:04,550 --> 00:52:05,960 I'll come to RStudio. 965 00:52:05,960 --> 00:52:10,530 And let's make a new program, one that is called tabulate. 966 00:52:10,530 --> 00:52:11,780 Let me go ahead and actually-- 967 00:52:11,780 --> 00:52:13,072 I think I have it open already. 968 00:52:13,072 --> 00:52:16,550 I'll click on tabulate here, and I'll see a blank file called 969 00:52:16,550 --> 00:52:22,850 tabulate.R. Now, my goal is to read in this csv of votes that we have, 970 00:52:22,850 --> 00:52:25,070 one called votes.csv. 971 00:52:25,070 --> 00:52:30,290 So I'll use read.csv, and I'll try to open votes.csv. 972 00:52:30,290 --> 00:52:32,150 If you look in my File Explorer here, you'll 973 00:52:32,150 --> 00:52:35,600 see I do have a file called votes.csv. 974 00:52:35,600 --> 00:52:38,630 Now let me click source here to run this program, 975 00:52:38,630 --> 00:52:43,010 and I should now be able to view votes the data frame. 976 00:52:43,010 --> 00:52:47,630 So a similar thing to what we've seen earlier, but one thing is now different. 977 00:52:47,630 --> 00:52:52,700 Notice how in a prior lecture we saw that we had a column called candidates. 978 00:52:52,700 --> 00:52:56,840 Well, now what we've done is we've decided that the row names for this data 979 00:52:56,840 --> 00:52:58,610 frame are the candidates themselves. 980 00:52:58,610 --> 00:53:02,840 So Mario is the name of this first row, Peach is the name of the second, 981 00:53:02,840 --> 00:53:04,310 and Bowser is the third. 982 00:53:04,310 --> 00:53:08,180 This allows us to define our data frame as exclusively numbers. 983 00:53:08,180 --> 00:53:12,470 We could sum, like, 37, 63, 43, 107. 
984 00:53:12,470 --> 00:53:16,130 And, moreover, it allows us to better subset our data frame, 985 00:53:16,130 --> 00:53:18,210 as we'll see in just a bit. 986 00:53:18,210 --> 00:53:21,260 So let's say my goal, at first, is to sum up 987 00:53:21,260 --> 00:53:25,460 the number of votes for each candidate across both the poll and the mail. 988 00:53:25,460 --> 00:53:29,900 Well, in tabulate.R, I could start doing that by making a for loop, 989 00:53:29,900 --> 00:53:32,180 doing something for each candidate. 990 00:53:32,180 --> 00:53:39,440 So I could say, for candidate, let's say, in row names votes, 991 00:53:39,440 --> 00:53:43,310 just like this, and get a body for this loop. 992 00:53:43,310 --> 00:53:46,550 And what I've done here is I've decided that I 993 00:53:46,550 --> 00:53:49,370 no longer need to call this value i. 994 00:53:49,370 --> 00:53:50,390 I could call it candidate. 995 00:53:50,390 --> 00:53:52,140 I could call it really anything I want to, 996 00:53:52,140 --> 00:53:55,730 and I could use that inside of my loop here. 997 00:53:55,730 --> 00:53:59,630 The other thing I've decided is that instead of defining a list 998 00:53:59,630 --> 00:54:03,980 of the candidates-- in this case Mario, Peach, and Bowser-- 999 00:54:03,980 --> 00:54:06,410 I could be more dynamic than that. 1000 00:54:06,410 --> 00:54:09,470 I could decide to tell R that it should tell me 1001 00:54:09,470 --> 00:54:12,020 what my row names are, what my candidates are, 1002 00:54:12,020 --> 00:54:14,040 and allow it to iterate over those. 1003 00:54:14,040 --> 00:54:17,930 So I'll ask for the row names of the votes data frame. 1004 00:54:17,930 --> 00:54:20,330 If I actually look at them down in my console below, 1005 00:54:20,330 --> 00:54:24,290 I'll see that I get a vector of Mario, Peach, and Bowser.
1006 00:54:24,290 --> 00:54:28,010 So the same structure for our loop now, but different ways 1007 00:54:28,010 --> 00:54:32,210 of asking for a helper object to iterate with, and an actual vector 1008 00:54:32,210 --> 00:54:35,130 of, in this case, candidate names. 1009 00:54:35,130 --> 00:54:38,330 So now we have a loop to go over every candidate's name, 1010 00:54:38,330 --> 00:54:42,140 and our next goal is to find out how many votes each candidate received 1011 00:54:42,140 --> 00:54:44,600 across all of their columns here. 1012 00:54:44,600 --> 00:54:47,720 Now, the first thing to do might be to subset my data 1013 00:54:47,720 --> 00:54:52,640 frame, to figure out for each candidate which rows correspond to that candidate. 1014 00:54:52,640 --> 00:54:56,960 Now, we saw last time ways to subset data frames using the subset function. 1015 00:54:56,960 --> 00:55:01,070 But now that we actually have this row name being 1016 00:55:01,070 --> 00:55:05,570 equal to the candidate's name, we can make this even more efficient. 1017 00:55:05,570 --> 00:55:07,310 Let me visualize this for you here. 1018 00:55:07,310 --> 00:55:10,010 If we have our data frame called votes and I 1019 00:55:10,010 --> 00:55:12,590 want to find all of Mario's votes-- 1020 00:55:12,590 --> 00:55:16,690 well, if Mario is the row name for one of my rows in my data frame, 1021 00:55:16,690 --> 00:55:19,510 I could simply use the name Mario in the place 1022 00:55:19,510 --> 00:55:21,970 I would normally put the row's index. 1023 00:55:21,970 --> 00:55:23,590 For instance, like this. 1024 00:55:23,590 --> 00:55:27,430 If I say votes bracket Mario as the character string, 1025 00:55:27,430 --> 00:55:30,190 because Mario is the name of one of my row names, 1026 00:55:30,190 --> 00:55:33,760 I'll then get back the row corresponding to the name Mario. 1027 00:55:33,760 --> 00:55:36,790 And same thing with Peach, and same thing with Bowser. 
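Subsetting by row name, as just described, can be sketched like this. The data frame is built inline as a stand-in for reading votes.csv. Mario's and Peach's numbers are the ones read out in the lecture; Bowser's poll and mail split is made up here so that it adds to his 120 total, and the column names `poll` and `mail` are assumed.

```r
# Stand-in for read.csv("votes.csv") with candidates as row names.
# Assumption: Bowser's 50/70 split is invented; only his total of 120 is from the lecture.
votes <- data.frame(
  poll = c(37, 43, 50),
  mail = c(63, 107, 70),
  row.names = c("Mario", "Peach", "Bowser")
)

# A row name can go where a numeric row index would normally go.
# Leaving the slot after the comma empty means "all columns".
mario_row <- votes["Mario", ]
print(mario_row)

# The same row, selected by position instead of by name
first_row <- votes[1, ]
```

Selecting by name keeps working even if the rows are later reordered, which is not true of selecting by position.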
1028 00:55:36,790 --> 00:55:39,310 So we're very quickly now subsetting our data 1029 00:55:39,310 --> 00:55:43,870 to find each candidate's rows and their number of votes across all the columns 1030 00:55:43,870 --> 00:55:44,750 here. 1031 00:55:44,750 --> 00:55:46,300 Let's come back and try this out. 1032 00:55:46,300 --> 00:55:48,910 I will show you in the console that, indeed, 1033 00:55:48,910 --> 00:55:54,190 if I type votes bracket Mario and then comma space 1034 00:55:54,190 --> 00:55:59,140 to say I want all columns, but only the row associated with the name Mario, 1035 00:55:59,140 --> 00:56:01,840 well, I'll get back a single row from this data 1036 00:56:01,840 --> 00:56:04,900 frame that includes Mario's votes. 1037 00:56:04,900 --> 00:56:10,180 So if I can do this at least in my console now with particular names, 1038 00:56:10,180 --> 00:56:14,680 I bet I could do it in my for loop where candidate will stand in 1039 00:56:14,680 --> 00:56:16,430 for any given candidate's name. 1040 00:56:16,430 --> 00:56:18,800 First it will be Mario, then it will be Peach, 1041 00:56:18,800 --> 00:56:21,800 then it will be Bowser on each successive iteration. 1042 00:56:21,800 --> 00:56:26,600 So to subset this data, I could use votes, the data frame's name, 1043 00:56:26,600 --> 00:56:30,980 followed by brackets, followed by the row name, in this case candidate, 1044 00:56:30,980 --> 00:56:34,610 which updates on each iteration, and then comma space, 1045 00:56:34,610 --> 00:56:38,270 saying I want all columns for whatever row 1046 00:56:38,270 --> 00:56:43,710 corresponds to this candidate's name, on whatever iteration it is that we're on. 1047 00:56:43,710 --> 00:56:46,010 So now that I have this working for me, I 1048 00:56:46,010 --> 00:56:48,830 could probably put this into the function sum 1049 00:56:48,830 --> 00:56:51,950 to get back the total number of votes across all the columns.
1050 00:56:51,950 --> 00:56:55,040 If you give sum a data frame of one row, it 1051 00:56:55,040 --> 00:56:58,670 will sum up all the values in that given row. 1052 00:56:58,670 --> 00:57:04,220 OK, so now I seem to have the sum for each candidate's votes, 1053 00:57:04,220 --> 00:57:07,650 but I still need some place to store it to look at it later. 1054 00:57:07,650 --> 00:57:12,410 So one thing I could do is make this object called total votes, just 1055 00:57:12,410 --> 00:57:13,400 like this. 1056 00:57:13,400 --> 00:57:16,960 But what's the problem now? 1057 00:57:16,960 --> 00:57:22,810 If I were to run this code top to bottom, what might I lose? 1058 00:57:22,810 --> 00:57:26,470 And a question here is, what might the value of total votes 1059 00:57:26,470 --> 00:57:30,370 be if I were to look at it at the very end of my loop? 1060 00:57:30,370 --> 00:57:32,900 Any ideas here? 1061 00:57:32,900 --> 00:57:35,600 Why can't I just leave my code like this, 1062 00:57:35,600 --> 00:57:39,410 and what might be the last value of total votes, do you think? 1063 00:57:39,410 --> 00:57:43,510 AUDIENCE: OK, the last value will be the sum of votes 1064 00:57:43,510 --> 00:57:47,185 of the last candidate, the last one. 1065 00:57:47,185 --> 00:57:49,810 CARTER ZENKE: Yeah, the last candidate that we have in our list 1066 00:57:49,810 --> 00:57:52,030 would be the final value of total votes. 1067 00:57:52,030 --> 00:57:53,620 So let's actually test this out. 1068 00:57:53,620 --> 00:57:56,380 If I go back to my RStudio here-- 1069 00:57:56,380 --> 00:58:00,670 and why don't I run this code by clicking source? 1070 00:58:00,670 --> 00:58:04,730 And now let me check on the value of total votes, just like this. 1071 00:58:04,730 --> 00:58:07,510 Well, total votes seems to be 120. 1072 00:58:07,510 --> 00:58:08,800 Who had 120 votes? 1073 00:58:08,800 --> 00:58:11,020 Seems like it was Bowser. 
1074 00:58:11,020 --> 00:58:14,530 But why is total votes equal to Bowser's total votes? 1075 00:58:14,530 --> 00:58:17,960 Well, let's think about this going top to bottom through our loop. 1076 00:58:17,960 --> 00:58:22,580 First, candidate is equal to Mario, and we'll subset our data frame 1077 00:58:22,580 --> 00:58:23,870 to find Mario's votes. 1078 00:58:23,870 --> 00:58:27,890 We'll sum those up across all the columns and store it now in total votes. 1079 00:58:27,890 --> 00:58:30,110 But then we'll go on to the next iteration. 1080 00:58:30,110 --> 00:58:34,460 Candidate will next be Peach, and we'll subset our data frame to find 1081 00:58:34,460 --> 00:58:36,830 Peach's votes, sum them across all the columns, 1082 00:58:36,830 --> 00:58:41,660 and effectively overwrite Mario's votes with Peach's. 1083 00:58:41,660 --> 00:58:44,090 So now total votes is Peach's total votes. 1084 00:58:44,090 --> 00:58:46,190 But then when Bowser comes along, well, Bowser 1085 00:58:46,190 --> 00:58:47,643 will also overwrite Peach's votes. 1086 00:58:47,643 --> 00:58:49,310 At the end of our loop, what do we have? 1087 00:58:49,310 --> 00:58:54,090 Well, only one candidate's votes, and not all of them. 1088 00:58:54,090 --> 00:58:56,420 So it seems like we need some way of making 1089 00:58:56,420 --> 00:59:01,280 a vector of these actual total votes, and we could 1090 00:59:01,280 --> 00:59:03,980 do that using some new syntax in R. 1091 00:59:03,980 --> 00:59:07,850 One thing I could do is initially make an empty vector, 1092 00:59:07,850 --> 00:59:11,960 just like this, total votes, and set it equal to this, 1093 00:59:11,960 --> 00:59:13,910 c followed by some parentheses. 1094 00:59:13,910 --> 00:59:17,480 This is that same c function we saw earlier, but here, with nothing inside, it means the empty vector. 1095 00:59:17,480 --> 00:59:19,460 Nothing, at least at first.
1096 00:59:19,460 --> 00:59:24,800 And then in my loop I bet we could add to this vector so we get back at the end 1097 00:59:24,800 --> 00:59:29,300 not any single candidate's votes, but a whole vector of their votes. 1098 00:59:29,300 --> 00:59:31,070 Now, we'll need some new syntax for this, 1099 00:59:31,070 --> 00:59:33,830 and some new feature we haven't seen yet in R. 1100 00:59:33,830 --> 00:59:37,050 But let's visualize what we could do with that syntax. 1101 00:59:37,050 --> 00:59:40,280 So here is a visualization of the empty vector total votes. 1102 00:59:40,280 --> 00:59:43,020 There's nothing here, because this is an empty vector right now. 1103 00:59:43,020 --> 00:59:46,400 But if I wanted to add some new element to it-- 1104 00:59:46,400 --> 00:59:49,610 and not just add the element but give it some name too-- 1105 00:59:49,610 --> 00:59:50,960 I could certainly do that. 1106 00:59:50,960 --> 00:59:55,580 I could say total votes, bracket, and the name I want to give this element, 1107 00:59:55,580 --> 00:59:58,880 and then assign the value for that element. 1108 00:59:58,880 --> 01:00:02,660 So if I want to add to this vector total votes an element named 1109 01:00:02,660 --> 01:00:08,190 Mario that has the value 100, well, I could do it using this syntax here. 1110 01:00:08,190 --> 01:00:10,850 Well, what if I want to later add Peach's votes? 1111 01:00:10,850 --> 01:00:14,300 Let's imagine this is the next iteration of our for loop. 1112 01:00:14,300 --> 01:00:16,645 I would say, total votes, bracket, Peach. 1113 01:00:16,645 --> 01:00:18,770 And that would then make a new element to my vector 1114 01:00:18,770 --> 01:00:21,410 called Peach with the value 150. 1115 01:00:21,410 --> 01:00:23,750 And same, let's say, for Bowser. 1116 01:00:23,750 --> 01:00:26,390 On my next iteration, I will add Bowser's votes. 
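A minimal sketch of that named-element syntax, first by hand and then inside a loop over row names. The object name `total_votes` with an underscore is assumed from the spoken "total votes", `by_hand` is a name introduced only for this illustration, and the inline data frame stands in for votes.csv (Bowser's 50/70 split is invented; only his 120 total comes from the lecture).

```r
# By hand: start empty, then assign to a name inside brackets to add an element.
by_hand <- c()
by_hand["Mario"] <- 100
by_hand["Peach"] <- 150
by_hand["Bowser"] <- 120

# Stand-in for read.csv("votes.csv"); Bowser's split is an assumption.
votes <- data.frame(
  poll = c(37, 43, 50),
  mail = c(63, 107, 70),
  row.names = c("Mario", "Peach", "Bowser")
)

# In a loop: the element's name comes from the loop variable, so each
# candidate gets a new element instead of overwriting a single value.
total_votes <- c()
for (candidate in rownames(votes)) {
  total_votes[candidate] <- sum(votes[candidate, ])
}
print(total_votes)
```

Because each element is named, a single candidate's total can later be looked up with, for example, `total_votes["Peach"]`.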
1117 01:00:26,390 --> 01:00:28,820 And now, at the end of my loop, let's say, 1118 01:00:28,820 --> 01:00:33,920 I now have a vector of Mario, Peach, and Bowser's votes all together now. 1119 01:00:33,920 --> 01:00:35,280 So let's try it. 1120 01:00:35,280 --> 01:00:38,060 I'll come back to RStudio here, and let's try 1121 01:00:38,060 --> 01:00:42,470 using this process of adding named elements to our vectors. 1122 01:00:42,470 --> 01:00:47,540 Well, on line five I might say that I want to add to total votes 1123 01:00:47,540 --> 01:00:52,790 a new element whose name is, well, whatever the candidate's name is 1124 01:00:52,790 --> 01:00:53,720 on any iteration. 1125 01:00:53,720 --> 01:00:56,810 So I'll say, total votes, bracket, candidate-- 1126 01:00:56,810 --> 01:01:00,680 meaning that whatever the candidate's name is on this iteration, 1127 01:01:00,680 --> 01:01:03,770 I want to add a new element with that name-- 1128 01:01:03,770 --> 01:01:07,730 and I'll give it the value of the sum of their votes. 1129 01:01:07,730 --> 01:01:13,160 So if I click source, and now I go ahead and inspect the value of total votes 1130 01:01:13,160 --> 01:01:17,640 by typing in my console and hitting enter, I'll see a much better output. 1131 01:01:17,640 --> 01:01:20,970 I actually see each candidate's name in my vector, 1132 01:01:20,970 --> 01:01:26,970 and I see the value now that they were assigned, 100 for Mario, 150 for Peach, 1133 01:01:26,970 --> 01:01:29,910 and 120 for Bowser. 1134 01:01:29,910 --> 01:01:34,440 So what questions do we have about this program as it exists now? 1135 01:01:34,440 --> 01:01:40,530 AUDIENCE: Could we, instead of adding the total votes in a named vector, 1136 01:01:40,530 --> 01:01:45,513 could we add a new column to our votes data frame? 1137 01:01:45,513 --> 01:01:46,930 CARTER ZENKE: We absolutely could. 
1138 01:01:46,930 --> 01:01:52,620 So we could decide instead to make a new vector and add that as a column 1139 01:01:52,620 --> 01:01:54,060 to our data frame. 1140 01:01:54,060 --> 01:01:56,310 It just depends on what kind of output you want to do. 1141 01:01:56,310 --> 01:01:58,770 So, here, we wanted this output of a named vector, 1142 01:01:58,770 --> 01:02:03,540 but we could change this to, let's say, not supply a name to each element, 1143 01:02:03,540 --> 01:02:06,545 and instead just add some new element after element, much like this. 1144 01:02:06,545 --> 01:02:08,170 Let me show you that real briefly here. 1145 01:02:08,170 --> 01:02:09,540 I'll come back to RStudio. 1146 01:02:09,540 --> 01:02:12,930 And let's say I don't want to give it some name, 1147 01:02:12,930 --> 01:02:15,900 I just want to kind of add in successive elements here. 1148 01:02:15,900 --> 01:02:21,090 I could say total votes becomes the combination 1149 01:02:21,090 --> 01:02:27,160 of the current state of total votes, adding in this new element here. 1150 01:02:27,160 --> 01:02:30,030 So a little bit tricky to parse, but let's see what happens here. 1151 01:02:30,030 --> 01:02:34,260 I'll click source, and I'll show you the value of total votes 1152 01:02:34,260 --> 01:02:35,920 now, just like this. 1153 01:02:35,920 --> 01:02:37,140 And what do I get back? 1154 01:02:37,140 --> 01:02:40,957 Well, a total votes vector that is Mario's votes, then Peach's votes, 1155 01:02:40,957 --> 01:02:41,790 then Bowser's votes. 1156 01:02:41,790 --> 01:02:45,390 And I could, if I wanted to, add this as a column in my data frame. 1157 01:02:45,390 --> 01:02:49,350 I could say votes total, let's say, and then I could say, 1158 01:02:49,350 --> 01:02:53,260 make that the value of this vector here. 1159 01:02:53,260 --> 01:02:55,770 So now if I run source-- 1160 01:02:55,770 --> 01:03:00,870 so I click source, I should see that I have this new column called total. 
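The unnamed variant might look like the sketch below: each iteration combines the current vector with one new element using `c()`, and the finished vector is then stored as a `total` column. The data frame values are hypothetical, as before.

```r
# Hypothetical data: only the row totals (100, 150, 120) come from the lecture.
votes <- data.frame(
  poll = c(40, 70, 50),
  mail = c(60, 80, 70),
  row.names = c("Mario", "Peach", "Bowser")
)

total_votes <- c()

for (candidate in rownames(votes)) {
  # Combine the vector so far with this candidate's sum (no name attached).
  total_votes <- c(total_votes, sum(votes[candidate, ]))
}

# Add the finished vector to the data frame as a new column.
votes$total <- total_votes

print(votes)
```

Because the loop walks the rows in order, the elements of `total_votes` line up with the rows of `votes` when the column is added.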
1161 01:03:00,870 --> 01:03:04,770 And effectively what I've done here is-- if I remove this part first-- 1162 01:03:04,770 --> 01:03:09,390 is I've decided to start with this empty vector and then, on each iteration, 1163 01:03:09,390 --> 01:03:12,210 I want to take whatever is in that current vector 1164 01:03:12,210 --> 01:03:14,880 and simply append or add on some new element, which 1165 01:03:14,880 --> 01:03:17,460 will be the sum of the current candidate's votes. 1166 01:03:17,460 --> 01:03:20,820 So, on the first iteration, we'll add our very first element 1167 01:03:20,820 --> 01:03:22,320 to this empty vector here. 1168 01:03:22,320 --> 01:03:25,500 But then on the next iteration, we'll have a vector of one element 1169 01:03:25,500 --> 01:03:28,140 and we'll add in one more element, and on the third, 1170 01:03:28,140 --> 01:03:30,730 add the third, and the fourth, add the fourth, and so on. 1171 01:03:30,730 --> 01:03:35,740 So a good way to add vectors together using c() as well. 1172 01:03:35,740 --> 01:03:37,590 I hope that helps. 1173 01:03:37,590 --> 01:03:40,530 OK, so let's go back to what we had before here. 1174 01:03:40,530 --> 01:03:45,450 Let me do command Z a few times to go back to our named vector, 1175 01:03:45,450 --> 01:03:47,430 and let's see what else we could do. 1176 01:03:47,430 --> 01:03:51,270 So if I click source, I'll see total votes again, 1177 01:03:51,270 --> 01:03:52,740 exactly as you want it to be. 1178 01:03:52,740 --> 01:03:56,250 But what if I wanted to sum up the columns too to figure out, 1179 01:03:56,250 --> 01:03:58,830 for each voting method, how many votes did we receive? 1180 01:03:58,830 --> 01:04:03,610 That is, for each poll and mail column, how many votes were there in those? 1181 01:04:03,610 --> 01:04:05,640 Well, I could really just change my for loop. 1182 01:04:05,640 --> 01:04:09,970 Instead of using row names, I could iterate over column names.
1183 01:04:09,970 --> 01:04:12,810 So column names will tell me-- if I click on the console-- 1184 01:04:12,810 --> 01:04:18,000 column names will tell me, what columns do I have inside this data frame? 1185 01:04:18,000 --> 01:04:20,700 And I could then change my loop appropriately. 1186 01:04:20,700 --> 01:04:25,110 Instead of calling each column candidate on each iteration, 1187 01:04:25,110 --> 01:04:27,450 I could call it maybe, like, voting method 1188 01:04:27,450 --> 01:04:30,360 for the polling or the mail-in ballots, and then I 1189 01:04:30,360 --> 01:04:32,620 could change how I subset my data frame. 1190 01:04:32,620 --> 01:04:37,600 Instead of subsetting by row, I could subset now by column, like this. 1191 01:04:37,600 --> 01:04:41,640 And I could then update the name of each of these elements 1192 01:04:41,640 --> 01:04:44,910 to instead be the same name as the method we're counting. 1193 01:04:44,910 --> 01:04:48,720 So pretty much the same idea, same flow, but just a different 1194 01:04:48,720 --> 01:04:50,610 process across columns now. 1195 01:04:50,610 --> 01:04:53,730 If I click source, I'll see in total votes 1196 01:04:53,730 --> 01:04:58,470 that I've now counted up the total number of votes for each column. 1197 01:04:58,470 --> 01:05:03,210 OK, so it turns out that this method of doing things in R, 1198 01:05:03,210 --> 01:05:07,380 this kind of analysis-- applying some function for every row and for every 1199 01:05:07,380 --> 01:05:07,890 column-- 1200 01:05:07,890 --> 01:05:11,640 is so common in R that we actually have some family of functions 1201 01:05:11,640 --> 01:05:14,670 we can use to do that same analysis. 
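The column-wise version of the loop described above can be sketched as follows. The loop variable name `method` and the data values are assumptions for illustration. The `poll` and `mail` column names follow the lecture.

```r
# Hypothetical data: the individual values are invented for illustration.
votes <- data.frame(
  poll = c(40, 70, 50),
  mail = c(60, 80, 70),
  row.names = c("Mario", "Peach", "Bowser")
)

total_votes <- c()

for (method in colnames(votes)) {
  # Subset by column instead of by row, and name the element after the column.
  total_votes[method] <- sum(votes[, method])
}

print(total_votes)
```

The only changes from the row-wise loop are the vector being iterated over (`colnames` instead of `rownames`) and the position of the index inside the brackets.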
1202 01:05:14,670 --> 01:05:18,780 And as we move from this world of writing procedures-- that is, 1203 01:05:18,780 --> 01:05:21,180 specifying a loop like this and specifying everything 1204 01:05:21,180 --> 01:05:25,020 we should do inside that loop-- to relying more on functions, 1205 01:05:25,020 --> 01:05:28,920 we'll enter into this world called functional programming, where 1206 01:05:28,920 --> 01:05:31,500 in functional programming we can actually use functions 1207 01:05:31,500 --> 01:05:34,650 to do the work of iteration for us. 1208 01:05:34,650 --> 01:05:38,070 Now, one common hallmark of functional programming 1209 01:05:38,070 --> 01:05:42,670 is applying some function across these individual rows and individual columns. 1210 01:05:42,670 --> 01:05:45,660 So R gives us this function called apply. 1211 01:05:45,660 --> 01:05:49,770 And if I wanted to have the same result we just saw with our for loop but now 1212 01:05:49,770 --> 01:05:53,520 using this function, I could use the following syntax. 1213 01:05:53,520 --> 01:05:57,600 I could use this function called apply and give it three arguments. 1214 01:05:57,600 --> 01:06:00,430 The first one is the data frame to work with. 1215 01:06:00,430 --> 01:06:02,800 In this case, votes, as we see here. 1216 01:06:02,800 --> 01:06:05,380 The next one is one called MARGIN. 1217 01:06:05,380 --> 01:06:07,300 And MARGIN stands for-- 1218 01:06:07,300 --> 01:06:13,270 if I want to apply this function across all of the rows or all of the columns 1219 01:06:13,270 --> 01:06:17,140 here-- when MARGIN is 1, that means apply some function 1220 01:06:17,140 --> 01:06:18,820 across all of the rows. 1221 01:06:18,820 --> 01:06:23,980 When MARGIN is 2, that means apply this function across all of the columns here. 1222 01:06:23,980 --> 01:06:28,808 And then, finally, the third argument to apply is a function itself. 1223 01:06:28,808 --> 01:06:30,850 And this is a hallmark of functional programming. 
1224 01:06:30,850 --> 01:06:34,540 We can pass functions as input to other functions. 1225 01:06:34,540 --> 01:06:37,180 In this case, we're telling apply, here is the function 1226 01:06:37,180 --> 01:06:41,500 I want you to use to basically work across all of these rows 1227 01:06:41,500 --> 01:06:43,670 and all of these columns here. 1228 01:06:43,670 --> 01:06:46,750 So when MARGIN is equal to 1, what will happen? 1229 01:06:46,750 --> 01:06:49,990 When I apply the function sum, well, for every row 1230 01:06:49,990 --> 01:06:54,280 I will get back a sum of every element in that row. 1231 01:06:54,280 --> 01:06:57,520 When MARGIN, though, is 2, what will I get? 1232 01:06:57,520 --> 01:07:01,750 I'll get back the sum of every element inside each of these columns here, 1233 01:07:01,750 --> 01:07:05,300 storing it, let's say, at the bottom of our data frame here. 1234 01:07:05,300 --> 01:07:07,390 So let's see this in action. 1235 01:07:07,390 --> 01:07:10,990 I'll come back to RStudio here, and let's try to use these apply 1236 01:07:10,990 --> 01:07:14,890 functions instead of doing things more procedurally, typing a loop 1237 01:07:14,890 --> 01:07:17,260 and then everything we want to do inside of that loop. 1238 01:07:17,260 --> 01:07:20,260 I argue I could actually write all this code in terms 1239 01:07:20,260 --> 01:07:23,110 of a single function call using apply. 1240 01:07:23,110 --> 01:07:29,830 Well, I want to apply this function on a given data frame, votes, as we just saw. 1241 01:07:29,830 --> 01:07:33,940 I want to apply it across all of my rows, that is, for every candidate 1242 01:07:33,940 --> 01:07:36,070 that I have in my data frame. 1243 01:07:36,070 --> 01:07:39,700 And the function I want to apply is the sum function. 1244 01:07:39,700 --> 01:07:43,180 I want to take all of these rows and, for each row, 1245 01:07:43,180 --> 01:07:46,000 I want to sum up all of those values. 
1246 01:07:46,000 --> 01:07:51,040 Now, if I run this line of code on line two, what will I get? 1247 01:07:51,040 --> 01:07:54,700 The same exact result. I'll get, now, a named vector-- 1248 01:07:54,700 --> 01:07:59,050 Mario, Peach, and Bowser, these three elements here, with Mario being 100, 1249 01:07:59,050 --> 01:08:02,110 Peach being 150, and Bowser being 120. 1250 01:08:02,110 --> 01:08:06,250 Notice how, if I go back to my data frame, this is the same thing we had, 1251 01:08:06,250 --> 01:08:10,090 where for every row I've now found the sum, and apply 1252 01:08:10,090 --> 01:08:13,120 has returned to me the name of that row and the result 1253 01:08:13,120 --> 01:08:16,660 of summing all the values in that row. 1254 01:08:16,660 --> 01:08:17,979 Let's think about this too. 1255 01:08:17,979 --> 01:08:20,350 What if I changed MARGIN to 2? 1256 01:08:20,350 --> 01:08:25,090 Well, this would find me the sum of every individual column, all the values 1257 01:08:25,090 --> 01:08:26,890 within those individual columns. 1258 01:08:26,890 --> 01:08:29,890 Whereas before we had to change row names to column names 1259 01:08:29,890 --> 01:08:33,970 and change various other objects, now I can simply change 1 to 2 1260 01:08:33,970 --> 01:08:35,920 to work on columns here. 1261 01:08:35,920 --> 01:08:40,149 Let me click source, and now I'll see, if I go ahead and hit line two, 1262 01:08:40,149 --> 01:08:44,770 command enter, now I get back that same result, the names of each of my columns 1263 01:08:44,770 --> 01:08:49,660 and the result of summing up all of their values here. 1264 01:08:49,660 --> 01:08:54,430 OK, so we've seen a much better way now to approach this same problem. 
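A short sketch of the two `apply` calls just described, using a hypothetical `votes` data frame whose row totals match the lecture's numbers. The names `by_candidate` and `by_method` are illustrative and not taken from the lecture.

```r
# Hypothetical data: only the row totals (100, 150, 120) come from the lecture.
votes <- data.frame(
  poll = c(40, 70, 50),
  mail = c(60, 80, 70),
  row.names = c("Mario", "Peach", "Bowser")
)

# MARGIN = 1: apply sum across each row (one total per candidate).
by_candidate <- apply(votes, MARGIN = 1, FUN = sum)

# MARGIN = 2: apply sum down each column (one total per voting method).
by_method <- apply(votes, MARGIN = 2, FUN = sum)

print(by_candidate)
print(by_method)
```

Both results are named vectors: the names come from the row names when `MARGIN` is 1 and from the column names when `MARGIN` is 2.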
1265 01:08:54,430 --> 01:08:57,732 Instead of doing things procedurally-- writing a loop and saying exactly what 1266 01:08:57,732 --> 01:08:59,649 should happen in each iteration of that loop-- 1267 01:08:59,649 --> 01:09:03,130 I can rely on a function like apply to do a lot of that work for me. 1268 01:09:03,130 --> 01:09:06,040 And, moreover, I can pass a function as input 1269 01:09:06,040 --> 01:09:12,250 to apply for it to use on each iteration that it goes through in my data frame. 1270 01:09:12,250 --> 01:09:15,670 Now, let me ask here, having seen these apply functions 1271 01:09:15,670 --> 01:09:18,910 and how they work, what questions do we have about them? 1272 01:09:18,910 --> 01:09:23,470 AUDIENCE: My question, instead of using the procedural approach of, like, 1273 01:09:23,470 --> 01:09:28,330 sorting and un-sorting, are there any existing functions 1274 01:09:28,330 --> 01:09:32,950 that I can use on the data frame for sorting the data in the rows 1275 01:09:32,950 --> 01:09:33,479 or columns? 1276 01:09:33,479 --> 01:09:34,729 CARTER ZENKE: A good question. 1277 01:09:34,729 --> 01:09:37,270 So one thing you might want to do is sort your data. 1278 01:09:37,270 --> 01:09:40,283 R does come with a function called sort that can do just that. 1279 01:09:40,283 --> 01:09:41,950 Let me show you a little bit of it here. 1280 01:09:41,950 --> 01:09:46,810 If I come back to RStudio, let's say I want to store this vector 1281 01:09:46,810 --> 01:09:48,550 I'm given from apply. 1282 01:09:48,550 --> 01:09:52,220 I could call this vector something like total_votes, just like this. 1283 01:09:52,220 --> 01:09:54,940 And let me run line one and then line two. 1284 01:09:54,940 --> 01:10:00,800 Now I have total votes being this vector of named elements across my columns. 1285 01:10:00,800 --> 01:10:03,140 Let's say I wanted to sort these.
1286 01:10:03,140 --> 01:10:03,980 Let me say-- 1287 01:10:03,980 --> 01:10:08,150 I could use the sort function here, and I could type total_votes inside 1288 01:10:08,150 --> 01:10:08,900 as the input to sort. 1289 01:10:08,900 --> 01:10:13,220 And now, if I hit command enter on sort, I should see-- well, 1290 01:10:13,220 --> 01:10:17,090 it's kind of already in sorted order, at least going low to high-- 1291 01:10:17,090 --> 01:10:21,620 but now if I type question mark sort to see how I could change the order here, 1292 01:10:21,620 --> 01:10:25,700 I might see that sort has this parameter called decreasing, 1293 01:10:25,700 --> 01:10:27,890 which is initially false, which means that we're 1294 01:10:27,890 --> 01:10:29,810 going to count up instead of down. 1295 01:10:29,810 --> 01:10:34,580 But now if I want to sort going low to high or increasing order, 1296 01:10:34,580 --> 01:10:37,250 I could set decreasing-- 1297 01:10:37,250 --> 01:10:41,780 sorry, no, if I want to set the vector going from high to low, 1298 01:10:41,780 --> 01:10:45,260 let's say, instead of low to high, I could set decreasing equal to true. 1299 01:10:45,260 --> 01:10:49,100 And then we'll see this vector is now in sorted order. 1300 01:10:49,100 --> 01:10:52,160 I could do the same for every candidate that I have. 1301 01:10:52,160 --> 01:10:54,710 Let me update total votes across the columns. 1302 01:10:54,710 --> 01:10:57,930 Let me now on line three run sort. 1303 01:10:57,930 --> 01:11:01,840 And now I have my candidates in sorted order as well. 1304 01:11:01,840 --> 01:11:05,880 So a cool trick if you want to sort your data now using this sort function. 1305 01:11:05,880 --> 01:11:08,220 And you can change whether it goes up or down 1306 01:11:08,220 --> 01:11:10,690 using this decreasing parameter here. 1307 01:11:10,690 --> 01:11:12,510 So we've seen a lot today. 1308 01:11:12,510 --> 01:11:15,750 We've seen how to define our very own functions.
1309 01:11:15,750 --> 01:11:19,770 We've seen how to write our own loops to repeat code multiple times, 1310 01:11:19,770 --> 01:11:22,380 and we've seen how to combine these two ideas, 1311 01:11:22,380 --> 01:11:24,810 dipping our toes into functional programming. 1312 01:11:24,810 --> 01:11:27,768 That is, using functions to do the work of iteration for us. 1313 01:11:27,768 --> 01:11:30,810 When we come back next time, we'll actually see how to clean up our data, 1314 01:11:30,810 --> 01:11:33,690 how to tidy it to make analysis like these even easier. 1315 01:11:33,690 --> 01:11:35,430 All that and more next time. 1316 01:11:35,430 --> 01:11:37,340 See you soon. 1317 01:11:37,340 --> 01:11:39,000
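For reference, the `sort` step from the question-and-answer segment earlier in the lecture might be sketched like this. The data values and the names `ascending` and `descending` are assumptions for illustration. The `decreasing` parameter is the one described in the lecture.

```r
# Hypothetical data: only the row totals (100, 150, 120) come from the lecture.
votes <- data.frame(
  poll = c(40, 70, 50),
  mail = c(60, 80, 70),
  row.names = c("Mario", "Peach", "Bowser")
)

# One total per candidate, as a named vector.
total_votes <- apply(votes, MARGIN = 1, FUN = sum)

# decreasing defaults to FALSE, so this sorts from low to high.
ascending <- sort(total_votes)

# Setting decreasing = TRUE sorts from high to low instead.
descending <- sort(total_votes, decreasing = TRUE)

print(ascending)
print(descending)
```

The names travel with their values, so the sorted vectors still show which candidate each total belongs to.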