1 00:00:00,000 --> 00:00:05,210 [AUDIO LOGO] 2 00:00:05,210 --> 00:00:06,270 YULIA: Hi, everyone. 3 00:00:06,270 --> 00:00:09,690 My name is Yulia, and I am one of the preceptors here at CS50. 4 00:00:09,690 --> 00:00:14,060 And today we'll be taking a tour around R Markdown, which is another way 5 00:00:14,060 --> 00:00:16,020 to represent our R programs. 6 00:00:16,020 --> 00:00:19,220 So before, in lectures, while you were completing your problem sets, 7 00:00:19,220 --> 00:00:21,560 you might have seen something like this, which 8 00:00:21,560 --> 00:00:27,380 is called an R script, which is just some lines of code combined under a .R 9 00:00:27,380 --> 00:00:28,640 file. 10 00:00:28,640 --> 00:00:33,980 And what is going on here, and it is from the visualizing data week, 11 00:00:33,980 --> 00:00:36,980 is we're reading in some votes for the candidates. 12 00:00:36,980 --> 00:00:40,730 And then we're using a ggplot function to actually create 13 00:00:40,730 --> 00:00:43,020 a bar chart of those votes. 14 00:00:43,020 --> 00:00:47,390 And before we jump into R Markdown, let's just run these lines of code 15 00:00:47,390 --> 00:00:49,310 to see what they output. 16 00:00:49,310 --> 00:00:53,540 So I get an error that I can't find ggplot, 17 00:00:53,540 --> 00:00:56,880 and that's because I haven't imported my library yet. 18 00:00:56,880 --> 00:00:58,650 So let's do that together. 19 00:00:58,650 --> 00:01:03,990 I'm going to import library(ggplot2), and now when I run my code, 20 00:01:03,990 --> 00:01:12,000 I see it over here on the right, a nice bar chart of the number of votes 21 00:01:12,000 --> 00:01:15,340 that Bowser, Mario, and Peach received. 22 00:01:15,340 --> 00:01:19,590 But as you can tell, this is not a very efficient way 23 00:01:19,590 --> 00:01:22,480 to present this data to a user. 24 00:01:22,480 --> 00:01:25,060 And this is where R Markdown steps in.
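As a rough sketch, the script described above might look like the following. Only the file name votes.csv, the `votes` column, and the use of `ggplot` come from the walkthrough; the `candidate` column name and the `geom_col()` layer are assumptions made here for illustration.

```r
# votes.R: a sketch of the script described above, not the lecture's exact file.
# Assumption: votes.csv has a `candidate` column alongside the `votes` column.
library(ggplot2)  # without this line, R cannot find ggplot()

# Read the votes into a data frame
votes <- read.csv("votes.csv")

# Draw one bar per candidate, with bar height taken from the votes column
# (geom_col() is an assumed choice of layer for the bar chart)
ggplot(votes, aes(x = candidate, y = votes)) +
  geom_col()
```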
25 00:01:25,060 --> 00:01:28,230 So R Markdown is often used for cases where 26 00:01:28,230 --> 00:01:31,170 we want to [INAUDIBLE] data analysis or report where 27 00:01:31,170 --> 00:01:35,880 we combine our code, some text, and maybe some visualizations 28 00:01:35,880 --> 00:01:38,050 together under one file. 29 00:01:38,050 --> 00:01:42,270 So to do that let's actually jump into votes.Rmd file 30 00:01:42,270 --> 00:01:44,470 right here that I've already created. 31 00:01:44,470 --> 00:01:47,760 And before we actually get to the text and the code part, 32 00:01:47,760 --> 00:01:49,810 let's create our header. 33 00:01:49,810 --> 00:01:54,870 And to do that, I'm going to create some space for it using the three dashes. 34 00:01:54,870 --> 00:01:57,630 In between them, I'm going to set some parameters. 35 00:01:57,630 --> 00:02:03,810 The first one being the title, and I'm going to call it "votes analysis." 36 00:02:03,810 --> 00:02:10,060 The second one is author, which is my name. 37 00:02:10,060 --> 00:02:14,720 And the last one is the output format. 38 00:02:14,720 --> 00:02:19,540 So there are several ways in which you can Knit or sort of like render 39 00:02:19,540 --> 00:02:22,610 your R Markdown, one of them being HTML. 40 00:02:22,610 --> 00:02:24,740 And that's what we're going to talk about today. 41 00:02:24,740 --> 00:02:29,150 But you can also Knit it to a PDF or a Word document. 42 00:02:29,150 --> 00:02:33,020 It just might require installation of some other libraries prior to that. 43 00:02:33,020 --> 00:02:37,400 So I'm going to set it to HTML document right here. 44 00:02:37,400 --> 00:02:40,930 And now that our header is done, we can actually 45 00:02:40,930 --> 00:02:44,210 get to the body of our R Markdown. 46 00:02:44,210 --> 00:02:48,230 So as I mentioned before, it consists of some code and some text. 
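The header described above might look like this sketch at the top of votes.Rmd. The `title`, `author`, and `output` keys and the `html_document` value are standard R Markdown; the exact capitalization of the title is an assumption.

```rmd
---
# Title capitalization is assumed; the walkthrough only says "votes analysis"
title: "Votes Analysis"
author: "Yulia"
# html_document renders to HTML; pdf_document and word_document also exist
output: html_document
---
```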
47 00:02:48,230 --> 00:02:50,920 And while the code is the simple part, we already 48 00:02:50,920 --> 00:02:53,560 have it pre-written right here, how do we actually 49 00:02:53,560 --> 00:02:56,690 include some text into our file? 50 00:02:56,690 --> 00:03:00,760 Well, as I said, and as may be clear from the name, 51 00:03:00,760 --> 00:03:02,570 we use markdown language for it. 52 00:03:02,570 --> 00:03:07,120 And so, as you would do in other files or in other setups, 53 00:03:07,120 --> 00:03:10,970 we use hashtag for our titles. 54 00:03:10,970 --> 00:03:14,780 So for example, one hashtag will indicate the biggest title. 55 00:03:14,780 --> 00:03:17,180 And so I'm going to call it CS50R. 56 00:03:17,180 --> 00:03:21,130 And then two hashtags, you can think about it as a smaller title 57 00:03:21,130 --> 00:03:25,720 or a subtitle, and I'm going to call it "Data Visualizing" 58 00:03:25,720 --> 00:03:27,950 because that's the week that we're on. 59 00:03:27,950 --> 00:03:32,590 And then after that, I'm going to use three hashtags to indicate sections 60 00:03:32,590 --> 00:03:35,270 of my code that I will be writing. 61 00:03:35,270 --> 00:03:41,630 So the first section will be loading the necessary libraries. 62 00:03:41,630 --> 00:03:44,620 And the reason I need to do that is, remember 63 00:03:44,620 --> 00:03:48,880 just a minute ago in the console when we tried to run our ggplot function, 64 00:03:48,880 --> 00:03:52,600 it didn't recognize it because we didn't load the library. 65 00:03:52,600 --> 00:03:55,210 And while we already did in the console, and it 66 00:03:55,210 --> 00:03:57,550 might work when we run it just here right 67 00:03:57,550 --> 00:04:03,290 now, when we actually try to Knit it, our compiler won't recognize that. 68 00:04:03,290 --> 00:04:06,740 So we need to make sure we include it in the R Markdown, as well. 69 00:04:06,740 --> 00:04:10,710 And for this, I'm going to create our first code chunk. 
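The heading levels described above can be sketched as follows. The heading text follows the walkthrough, though the capitalization of the third heading is an assumption.

```rmd
<!-- One hashtag: the biggest title -->
# CS50R

<!-- Two hashtags: a subtitle -->
## Data Visualizing

<!-- Three hashtags: a section (capitalization assumed) -->
### Loading the Necessary Libraries
```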
70 00:04:10,710 --> 00:04:15,590 To do that, you can press Option-Command-I on your keyboard 71 00:04:15,590 --> 00:04:16,620 if you're using a Mac. 72 00:04:16,620 --> 00:04:21,260 And this creates this gray box where you can put your code, 73 00:04:21,260 --> 00:04:25,770 and it will actually be recognized as code and not just as plain text. 74 00:04:25,770 --> 00:04:31,700 So here, I'm going to import the library ggplot2. 75 00:04:31,700 --> 00:04:37,100 And a nice thing about R Markdown is that not only does everything run 76 00:04:37,100 --> 00:04:39,290 when you Knit it, but you can also run it 77 00:04:39,290 --> 00:04:45,990 as you are coding in R Markdown itself by clicking this Play button right here. 78 00:04:45,990 --> 00:04:50,450 And again, we just import library ggplot2. 79 00:04:50,450 --> 00:04:54,620 Now that we've imported our library, let's go back to votes.R 80 00:04:54,620 --> 00:04:59,210 and actually copy our first line of code, which is just 81 00:04:59,210 --> 00:05:02,210 reading in the CSV of votes.csv. 82 00:05:02,210 --> 00:05:06,890 So I'm going to title my next section "Loading the Data," 83 00:05:06,890 --> 00:05:13,430 and again, by pressing Option-Command-I, create another chunk of code, 84 00:05:13,430 --> 00:05:14,960 copying votes-- 85 00:05:14,960 --> 00:05:18,110 86 00:05:18,110 --> 00:05:21,020 assigning into votes, read.csv votes.csv. 87 00:05:21,020 --> 00:05:24,110 I can run this line by pressing Command-Return, 88 00:05:24,110 --> 00:05:32,300 and now we can see in our environment, we actually have right here, 89 00:05:32,300 --> 00:05:36,000 we actually see that we have our data frame. 90 00:05:36,000 --> 00:05:40,790 And not only can you run things in R Markdown, you can also output them, 91 00:05:40,790 --> 00:05:44,160 but you don't need to explicitly call the print function.
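The two code chunks described above might look like this in votes.Rmd. This is a sketch: the heading capitalization is assumed, and the trailing `votes` line is there only to illustrate output without an explicit `print()` call.

````rmd
### Loading the Necessary Libraries

```{r}
# Must be loaded inside the document so it is available when Knitting
library(ggplot2)
```

### Loading the Data

```{r}
# Read the CSV into a data frame named votes
votes <- read.csv("votes.csv")

# A bare variable name on the last line is output below the chunk
votes
```
````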
92 00:05:44,160 --> 00:05:47,210 In fact, if you just write the name of the variable 93 00:05:47,210 --> 00:05:50,180 that you would like to print out or output below, 94 00:05:50,180 --> 00:05:54,750 you can just write it on the last line of code, click the Play button, 95 00:05:54,750 --> 00:06:00,410 and it will be outputted right under that chunk of code. 96 00:06:00,410 --> 00:06:03,720 All right, so I'm going to delete this for now. 97 00:06:03,720 --> 00:06:08,600 And next, what we're going to do is actually display our bar plot 98 00:06:08,600 --> 00:06:10,260 from the previous file. 99 00:06:10,260 --> 00:06:16,310 So I'm going to title this section, "Displaying Data," 100 00:06:16,310 --> 00:06:26,150 create my chunk of code again, and then just copy over the ggplot function right 101 00:06:26,150 --> 00:06:28,070 here, perfect. 102 00:06:28,070 --> 00:06:36,350 And so ideally, if we run this, we should see our plot outputted right here 103 00:06:36,350 --> 00:06:39,390 on the bottom. 104 00:06:39,390 --> 00:06:41,910 So the same plot that we've seen before, but it kind of 105 00:06:41,910 --> 00:06:44,430 goes sequentially, which will be nice when we actually 106 00:06:44,430 --> 00:06:47,440 Knit it and see the outputs. 107 00:06:47,440 --> 00:06:50,590 And actually, before we proceed, let's do just that. 108 00:06:50,590 --> 00:06:54,270 So I'm going to click this button right here that says Knit, 109 00:06:54,270 --> 00:07:00,150 and it automatically Knits to HTML, which is my preferred output method. 110 00:07:00,150 --> 00:07:04,090 And so here, we can see all the titles that we've created before. 111 00:07:04,090 --> 00:07:10,560 So CS50R, Data Visualizing is our subtitle, and all the other parts 112 00:07:10,560 --> 00:07:11,920 are headings. 113 00:07:11,920 --> 00:07:15,670 And you can see that I deleted my last line of code that said votes.
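Knitting can also be approximated from the R console, which may help when the Knit button is not available. This sketch assumes the rmarkdown package is installed and that votes.Rmd is in the working directory. One difference worth keeping in mind: the Knit button in RStudio runs the document in a fresh R session, whereas `render()` called from the console evaluates it in the current session by default.

```r
# Render votes.Rmd to HTML from the console (assumes rmarkdown is installed
# and votes.Rmd sits in the current working directory).
rmarkdown::render("votes.Rmd", output_format = "html_document")
```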
114 00:07:15,670 --> 00:07:18,120 So I actually don't see the data frame outputted, 115 00:07:18,120 --> 00:07:21,190 but I see this line of code that actually loaded it in. 116 00:07:21,190 --> 00:07:25,530 And if I scroll to the very bottom, I can actually see the plot 117 00:07:25,530 --> 00:07:31,150 that I've seen before in my app. 118 00:07:31,150 --> 00:07:35,960 All right, let's go back and actually talk about some other things. 119 00:07:35,960 --> 00:07:40,780 So, so far, I've just been adding some titles and some chunks of code, 120 00:07:40,780 --> 00:07:43,910 but let's actually try to add in some text. 121 00:07:43,910 --> 00:07:52,240 And a nice thing about R Markdown is that it recognizes all characters as text, 122 00:07:52,240 --> 00:07:54,800 unless you put them in a code chunk. 123 00:07:54,800 --> 00:07:57,970 So if I start typing something like, Here I 124 00:07:57,970 --> 00:08:05,560 am going to include electoral data-- 125 00:08:05,560 --> 00:08:11,570 let's be fancy --for Mario and his friends. 126 00:08:11,570 --> 00:08:14,130 Nice thing about it, it's going to be recognized as text. 127 00:08:14,130 --> 00:08:18,150 It's not going to try and look for a variable called Mario or friends. 128 00:08:18,150 --> 00:08:20,970 This is just plain text, and that's it. 129 00:08:20,970 --> 00:08:23,250 And let's add some features to it. 130 00:08:23,250 --> 00:08:29,340 So for example, what if I wanted to put "electoral data" in italics? 131 00:08:29,340 --> 00:08:35,750 I can do that by putting underscores before and after it. 132 00:08:35,750 --> 00:08:38,630 And maybe I wanted to bold "Mario," and for that, 133 00:08:38,630 --> 00:08:46,010 I can use these two asterisks before and after to make it look bolder. 134 00:08:46,010 --> 00:08:50,040 So let's Save it and Knit it just to see what it outputs.
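The sentence described above, with its emphasis markers, might be typed like this. Nothing in it is evaluated as R code, since it sits outside any code chunk.

```rmd
<!-- Underscores give italics; double asterisks give bold -->
Here I am going to include _electoral data_ for **Mario** and his friends.
```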
135 00:08:50,040 --> 00:08:56,520 And right here, I can see that my text is included, in fact, as plain text, 136 00:08:56,520 --> 00:08:59,550 and my "Mario and his friends" are bolded. 137 00:08:59,550 --> 00:09:02,240 "Electoral data" is in fact in italics. 138 00:09:02,240 --> 00:09:03,540 That's nice. 139 00:09:03,540 --> 00:09:09,490 And let's go back to our code and add in some extra things. 140 00:09:09,490 --> 00:09:13,590 So in the beginning, I mentioned that R Markdown is helpful for data analysis, 141 00:09:13,590 --> 00:09:14,280 right? 142 00:09:14,280 --> 00:09:19,020 Well, we loaded our data, we created a bar chart, we wrote in some text, 143 00:09:19,020 --> 00:09:22,530 but let's actually add in a conclusion in the end. 144 00:09:22,530 --> 00:09:26,580 For that, I'm again going to create a section using three hashtags, called 145 00:09:26,580 --> 00:09:27,490 "Conclusion." 146 00:09:27,490 --> 00:09:35,010 And here, I'm going to write something like, In the end, 147 00:09:35,010 --> 00:09:41,330 Mario received X votes. 148 00:09:41,330 --> 00:09:51,530 Peach received X votes and Bowser received X votes. 149 00:09:51,530 --> 00:09:53,990 And I'm leaving it as X because I'm not really 150 00:09:53,990 --> 00:09:55,820 sure how many votes it got, right? 151 00:09:55,820 --> 00:09:59,090 I'm looking at the bar chart, and I can kind of tell, 152 00:09:59,090 --> 00:10:01,020 but I don't know the precise numbers. 153 00:10:01,020 --> 00:10:04,550 So here comes another nifty thing about R Markdown, 154 00:10:04,550 --> 00:10:10,080 is that you don't actually need a whole code chunk to call for some variable. 155 00:10:10,080 --> 00:10:11,490 You can actually do it inline. 156 00:10:11,490 --> 00:10:17,160 And to do that, you can use two backticks, and write "r" inside of them. 157 00:10:17,160 --> 00:10:21,240 So now this chunk will be recognized as R code.
158 00:10:21,240 --> 00:10:26,180 And inside of here, I'm going to call votes, which is my data frame. 159 00:10:26,180 --> 00:10:31,290 I'm going to grab column votes and then grab the first value of it. 160 00:10:31,290 --> 00:10:35,450 And I'm just going to do the same for Peach and Bowser. 161 00:10:35,450 --> 00:10:42,100 So votes, dollar sign, votes, second index. 162 00:10:42,100 --> 00:10:51,150 And for Bowser, it will be votes, dollar sign, votes, third index. 163 00:10:51,150 --> 00:10:55,140 Let's make sure that we have all the syntax correct, 164 00:10:55,140 --> 00:10:57,990 and let's actually try to Knit it, fingers crossed. 165 00:10:57,990 --> 00:11:02,700 OK, and right here on the very, very bottom, 166 00:11:02,700 --> 00:11:06,250 you can see that Mario, in fact, received 100 votes. 167 00:11:06,250 --> 00:11:10,450 Peach received 200 votes and Bowser received 150 votes. 168 00:11:10,450 --> 00:11:13,920 Even though I did not type in those explicit numbers, 169 00:11:13,920 --> 00:11:18,450 R Markdown recognizes that I'm actually looking inside my data frame 170 00:11:18,450 --> 00:11:22,110 and trying to grab those values from there. 171 00:11:22,110 --> 00:11:26,470 Let's go back and add in some other things. 172 00:11:26,470 --> 00:11:30,960 So not only can you grab a specific value, you can use some functions 173 00:11:30,960 --> 00:11:36,730 and compute values inside these tiny R inline code chunks. 174 00:11:36,730 --> 00:11:42,160 So maybe I want to know how many total votes were cast in this election. 175 00:11:42,160 --> 00:11:49,740 So I can say something like, The total number of votes 176 00:11:49,740 --> 00:11:54,520 was, again, using my inline R code chunk. 177 00:11:54,520 --> 00:11:57,750 And I'm going to call a sum function, and I'm just 178 00:11:57,750 --> 00:12:05,130 going to sum up all the votes that appear in my votes column.
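The inline expressions described above might be written as in this sketch. It assumes the rows of votes.csv are ordered Mario, Peach, Bowser, which is what the indices in the walkthrough imply, and that the data frame `votes` was created in an earlier chunk.

```rmd
### Conclusion

<!-- Each `r ...` span is evaluated when Knitting and replaced by its value.
     Assumes row 1 is Mario, row 2 is Peach, row 3 is Bowser. -->
In the end, Mario received `r votes$votes[1]` votes.
Peach received `r votes$votes[2]` votes and Bowser received `r votes$votes[3]` votes.

<!-- Functions work inline too: sum() adds up the whole votes column -->
The total number of votes was `r sum(votes$votes)`.
```

Indexing by position is brittle if the CSV is ever reordered; filtering by a candidate name column would be sturdier, but that column's name is not given in the walkthrough.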
179 00:12:05,130 --> 00:12:10,980 And I'm going to make them bolded, just so that the numbers 180 00:12:10,980 --> 00:12:13,000 stand out a little bit more. 181 00:12:13,000 --> 00:12:17,580 And I'm going to do that by using two asterisks before and after, just like we 182 00:12:17,580 --> 00:12:24,480 did on the top and right here, as well. 183 00:12:24,480 --> 00:12:31,750 Save it, and if I Knit it, I should see all of these numbers on the very bottom. 184 00:12:31,750 --> 00:12:34,060 So you can see that my numbers are now bolded. 185 00:12:34,060 --> 00:12:39,750 And I also summed up all the votes in the very end. 186 00:12:39,750 --> 00:12:45,220 Let's go back and see just one more thing that we can do with R Markdown. 187 00:12:45,220 --> 00:12:51,930 So before, when we Knitted, we would see, for example, both the plot 188 00:12:51,930 --> 00:12:52,922 and the code. 189 00:12:52,922 --> 00:12:54,880 But what if I just want to see the plot, right? 190 00:12:54,880 --> 00:12:58,380 I'm not really interested in how I got to the plot itself 191 00:12:58,380 --> 00:13:01,180 but rather just seeing the visualization. 192 00:13:01,180 --> 00:13:04,620 So to do that, you can actually change the specific settings 193 00:13:04,620 --> 00:13:07,000 of that code chunk. 194 00:13:07,000 --> 00:13:15,110 And so right here, what I'm going to say is, echo equals false. 195 00:13:15,110 --> 00:13:20,220 Essentially, what it will do, is it will say, OK, I want to see the output of it. 196 00:13:20,220 --> 00:13:21,970 I want to see the plot, but I don't really 197 00:13:21,970 --> 00:13:25,700 want to see the code in my final Knitted file. 198 00:13:25,700 --> 00:13:30,370 And so if I Knit it again, and I go here, 199 00:13:30,370 --> 00:13:32,300 I see "Displaying Data," which is a good sign. 200 00:13:32,300 --> 00:13:33,290 This is where my section is. 201 00:13:33,290 --> 00:13:35,030 I see the plot, and I don't see the code.
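The chunk setting described above might look like this sketch. The `echo` option is a standard knitr chunk option; the plotting code itself is assumed, including the `candidate` column name and the `geom_col()` layer, and it relies on `library(ggplot2)` and `votes` from earlier chunks.

````rmd
### Displaying Data

```{r, echo=FALSE}
# echo=FALSE hides this code in the Knitted file but keeps the plot.
# Assumed plotting code: the `candidate` column and geom_col() are illustrative.
ggplot(votes, aes(x = candidate, y = votes)) +
  geom_col()
```
````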
202 00:13:35,030 --> 00:13:36,560 That is what I wanted. 203 00:13:36,560 --> 00:13:40,670 I didn't want to see those few lines of code with all the details. 204 00:13:40,670 --> 00:13:44,900 I just want to see the final result. OK, that's cool. 205 00:13:44,900 --> 00:13:47,870 And maybe let's add in one more thing. 206 00:13:47,870 --> 00:13:51,460 So you notice how I have different sections, my conclusion, 207 00:13:51,460 --> 00:13:55,780 displaying the data, loading the data, loading the libraries. 208 00:13:55,780 --> 00:14:00,370 And it can get really tedious having so many different headings, 209 00:14:00,370 --> 00:14:03,080 and maybe you have like pages and pages of things. 210 00:14:03,080 --> 00:14:06,280 So a cool thing about R Markdown: you can automatically 211 00:14:06,280 --> 00:14:09,790 create a table of contents that will be generated up 212 00:14:09,790 --> 00:14:12,370 top, with entries that are actually hyperlinks. 213 00:14:12,370 --> 00:14:14,330 And you can click on each of those sections, 214 00:14:14,330 --> 00:14:17,280 and it will automatically take you to that part. 215 00:14:17,280 --> 00:14:21,870 So to do that, I'm actually going to make some changes to our header. 216 00:14:21,870 --> 00:14:29,030 So I'm going to say that my output is HTML document, 217 00:14:29,030 --> 00:14:33,230 but inside that HTML document, I also want to include a table of contents, 218 00:14:33,230 --> 00:14:35,010 or TOC for short. 219 00:14:35,010 --> 00:14:37,040 And I'm going to set it to true. 220 00:14:37,040 --> 00:14:40,200 So now I actually want to have my table of contents. 221 00:14:40,200 --> 00:14:46,530 And if I Knit it, I see, on the very top, my bullet points. 222 00:14:46,530 --> 00:14:51,630 And it cascades in the same way in which you assign hashtags to those titles. 223 00:14:51,630 --> 00:14:55,100 So you can see that since CS50R had only one hashtag, it 224 00:14:55,100 --> 00:14:56,490 means it's the biggest one.
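The header change described above might look like this sketch. Nesting `toc: true` under `html_document` is the standard R Markdown form; the title capitalization is an assumption, as before.

```rmd
---
# Title capitalization is assumed
title: "Votes Analysis"
author: "Yulia"
output:
  html_document:
    # Generate a clickable table of contents from the headings
    toc: true
---
```

Note that `html_document` is now a key with its own indented options rather than a plain value, so the indentation matters.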
225 00:14:56,490 --> 00:14:57,380 It's on top. 226 00:14:57,380 --> 00:15:01,920 Data Visualizing is kind of like my big theme of this week, 227 00:15:01,920 --> 00:15:03,840 so it comes as the second subtitle. 228 00:15:03,840 --> 00:15:06,170 And after that, I just have my sections where 229 00:15:06,170 --> 00:15:10,680 I actually broke down my program into parts and actually wrote some code. 230 00:15:10,680 --> 00:15:15,680 And so if we click on conclusion, it takes us to the very, very end. 231 00:15:15,680 --> 00:15:21,680 So to summarize, we've seen some cool things 232 00:15:21,680 --> 00:15:25,250 we can do with R Markdown, including text, headers, 233 00:15:25,250 --> 00:15:29,990 generating a table of contents, as well as figuring out what lines of code 234 00:15:29,990 --> 00:15:35,250 you want to show in your final Knitted file, and what lines of code 235 00:15:35,250 --> 00:15:36,330 you don't want to show. 236 00:15:36,330 --> 00:15:38,720 And there are lots of other things that you 237 00:15:38,720 --> 00:15:42,200 can do with R Markdown, including different features 238 00:15:42,200 --> 00:15:46,790 and other different markdown styles that you can include, 239 00:15:46,790 --> 00:15:49,140 which we definitely don't have time to cover today, 240 00:15:49,140 --> 00:15:52,040 but I encourage you to look into it yourself. 241 00:15:52,040 --> 00:15:56,140 I hope this was helpful, and I hope you learned something new today. 242 00:15:56,140 --> 00:15:58,000