1 00:00:00,000 --> 00:00:01,930 2 00:00:01,930 --> 00:00:02,930 SPEAKER 1: Hello, world. 3 00:00:02,930 --> 00:00:07,300 This is the CS50x educator workshop, our session on CS50's tools 4 00:00:07,300 --> 00:00:08,830 for submitting and grading. 5 00:00:08,830 --> 00:00:12,370 This session will be led by CS50's own Brian Yu, who has been instrumental 6 00:00:12,370 --> 00:00:14,830 in the development and deployment of these tools 7 00:00:14,830 --> 00:00:17,120 to CS50's students and teachers. 8 00:00:17,120 --> 00:00:19,225 Brian, the floor is yours. 9 00:00:19,225 --> 00:00:20,350 BRIAN YU: Thanks very much. 10 00:00:20,350 --> 00:00:22,850 Really great to see everyone here today, and looking forward 11 00:00:22,850 --> 00:00:27,260 to talking with you all about CS50's tools for submitting and grading, 12 00:00:27,260 --> 00:00:29,410 which is going to be our topic today. 13 00:00:29,410 --> 00:00:31,540 And in particular today, we're going to be 14 00:00:31,540 --> 00:00:36,670 looking at four tools that are available for you to use, for your students' use, 15 00:00:36,670 --> 00:00:41,920 potentially, as you teach CS50 or one of the other CS50 courses. 16 00:00:41,920 --> 00:00:44,870 You certainly don't need to use any or all of these tools, 17 00:00:44,870 --> 00:00:47,080 but we do make them available, and the goal of today 18 00:00:47,080 --> 00:00:50,430 is to show you what these tools are, how they might be useful, 19 00:00:50,430 --> 00:00:54,220 and to give you a sense for what you can do with them. 20 00:00:54,220 --> 00:01:00,460 We'll go through check50, submit50, submit.cs50.io, and compare50. 21 00:01:00,460 --> 00:01:04,280 I thought we would start, just so I can get a sense for all of you 22 00:01:04,280 --> 00:01:08,230 and which tools you've already used, I'm going to paste into the chat 23 00:01:08,230 --> 00:01:10,000 a link to a poll. 
24 00:01:10,000 --> 00:01:14,170 If you wouldn't mind filling out that poll just to give us a sense for which 25 00:01:14,170 --> 00:01:16,790 of these tools have you used before. 26 00:01:16,790 --> 00:01:20,427 So if you've used all of the tools, you can click on all of those four tools, 27 00:01:20,427 --> 00:01:23,260 but if you've only used some of them, you can just click on the ones 28 00:01:23,260 --> 00:01:25,160 that you have used. 29 00:01:25,160 --> 00:01:28,360 But go ahead and click on that link that I just posted in the chat, 30 00:01:28,360 --> 00:01:33,660 and let us know which of these tools you have used previously. 31 00:01:33,660 --> 00:01:36,650 It looks like a couple people have submitted the poll so far, 32 00:01:36,650 --> 00:01:42,070 and most people have used check50, and maybe some people have used submit50. 33 00:01:42,070 --> 00:01:44,560 submit.cs50.io fewer people have used. 34 00:01:44,560 --> 00:01:46,630 That's a web application that we'll see soon 35 00:01:46,630 --> 00:01:50,620 that helps you integrate submit50 and check50 together. 36 00:01:50,620 --> 00:01:55,060 And it looks like a couple people have used compare50, as well, which 37 00:01:55,060 --> 00:01:59,410 is the tool that you can use to check code for similarity, which we'll 38 00:01:59,410 --> 00:02:01,530 talk about a little bit later today. 39 00:02:01,530 --> 00:02:03,280 So regardless of which of these categories 40 00:02:03,280 --> 00:02:06,520 you fall in, regardless of whether you've used all of these tools before 41 00:02:06,520 --> 00:02:09,880 or none of these tools before, hopefully today will be an opportunity 42 00:02:09,880 --> 00:02:12,970 to learn a little something new about each of these tools and how 43 00:02:12,970 --> 00:02:17,170 you can potentially put them into use inside of your classroom. 
44 00:02:17,170 --> 00:02:20,500 So with that, let's go ahead and get started 45 00:02:20,500 --> 00:02:25,750 with the first of the tools, which is check50. 46 00:02:25,750 --> 00:02:28,090 And all of these tools you can find documentation 47 00:02:28,090 --> 00:02:32,380 for at this URL, cs50.readthedocs.io, which 48 00:02:32,380 --> 00:02:35,890 will include more information about all of the tools, how to use them, 49 00:02:35,890 --> 00:02:37,210 and more documentation there. 50 00:02:37,210 --> 00:02:39,138 And at any point today, if you have questions 51 00:02:39,138 --> 00:02:41,930 about anything that I'm saying or the tools that I'm talking about, 52 00:02:41,930 --> 00:02:44,290 do feel free to raise your virtual hand using 53 00:02:44,290 --> 00:02:47,740 that blue raise hand feature that David pointed out a little bit earlier today. 54 00:02:47,740 --> 00:02:49,990 I'm keeping an eye on that so I can call on you if you 55 00:02:49,990 --> 00:02:52,210 have questions about anything going on. 56 00:02:52,210 --> 00:02:55,240 And you're also certainly welcome to ask questions 57 00:02:55,240 --> 00:02:59,560 in the chat window, too, where I might see them, or other CS50 staff, 58 00:02:59,560 --> 00:03:01,510 or other people that are just participants 59 00:03:01,510 --> 00:03:05,230 here today might be able to see your questions and help to answer those, 60 00:03:05,230 --> 00:03:05,930 as well. 61 00:03:05,930 --> 00:03:08,890 So raising your hand or asking in the chat, 62 00:03:08,890 --> 00:03:13,310 but definitely feel free to ask any questions that you might have. 63 00:03:13,310 --> 00:03:15,940 So let's go ahead and begin with the first of the four tools 64 00:03:15,940 --> 00:03:19,330 that we're going to talk about today, which is check50. 65 00:03:19,330 --> 00:03:22,090 check50, it looks like, most of you have used before. 
66 00:03:22,090 --> 00:03:27,220 check50 is a command-line tool for running automated tests 67 00:03:27,220 --> 00:03:28,910 on students' code. 68 00:03:28,910 --> 00:03:33,250 So when you run check50, we're going to run a series of automated tests 69 00:03:33,250 --> 00:03:37,030 that we tend to call checks that will check students' code 70 00:03:37,030 --> 00:03:41,020 to make sure that that code behaves in a certain way that meets the problem 71 00:03:41,020 --> 00:03:44,930 specification for a particular problem, for instance. 72 00:03:44,930 --> 00:03:49,000 So what does it actually look like when you run check50, for example? 73 00:03:49,000 --> 00:03:52,720 What's going to happen is inside of the terminal, inside of CS50 IDE 74 00:03:52,720 --> 00:03:59,770 or elsewhere, you can run check50 followed by a submission slug, which 75 00:03:59,770 --> 00:04:04,210 is just some unique identifier that describes which problem we 76 00:04:04,210 --> 00:04:08,530 are referring to, which problem we would like to check for correctness. 77 00:04:08,530 --> 00:04:12,220 So here, for example, we're running check50 followed by the unique 78 00:04:12,220 --> 00:04:16,600 identifier cs50/problems/2020/x/cash. 79 00:04:16,600 --> 00:04:20,350 In other words, we're checking a CS50 problem, one of them from this year. 80 00:04:20,350 --> 00:04:23,500 We version all of our checks, so that if problems change from one year 81 00:04:23,500 --> 00:04:25,900 to another, you can be sure that you're checking 82 00:04:25,900 --> 00:04:27,850 the most recent version of the problem. 83 00:04:27,850 --> 00:04:31,000 You can go back and check previous versions if you'd like to. 84 00:04:31,000 --> 00:04:33,700 And then the name of this particular problem is cash. 85 00:04:33,700 --> 00:04:38,320 And if you've taken CS50x or started it, you might find this familiar from week 86 00:04:38,320 --> 00:04:43,550 one from problem set one in the course. 
87 00:04:43,550 --> 00:04:47,270 So when you run check50 followed by a unique identifier, 88 00:04:47,270 --> 00:04:49,453 we're going to upload the student's code, 89 00:04:49,453 --> 00:04:51,370 and once the student's code is uploaded, we're 90 00:04:51,370 --> 00:04:54,760 going to run a series of automated checks on that code, 91 00:04:54,760 --> 00:04:57,790 making sure that it's producing the correct output given 92 00:04:57,790 --> 00:04:59,750 particular input, for example. 93 00:04:59,750 --> 00:05:02,380 And as a result of all of that, you'll end up 94 00:05:02,380 --> 00:05:05,470 seeing something like this, where we see a series of all 95 00:05:05,470 --> 00:05:07,720 of the results of these individual checks 96 00:05:07,720 --> 00:05:11,770 with a green smiley face for any of the checks that have passed successfully 97 00:05:11,770 --> 00:05:15,220 and a red frown face for any of the checks that did not pass, 98 00:05:15,220 --> 00:05:20,170 where the student's code did not do what it was expected to do. 99 00:05:20,170 --> 00:05:23,620 This tool, then, can be useful for both students and for teachers. 100 00:05:23,620 --> 00:05:26,110 For students, students can use this tool to be 101 00:05:26,110 --> 00:05:28,240 able to see does their code actually work? 102 00:05:28,240 --> 00:05:31,477 And as students are working through trying to solve a problem, 103 00:05:31,477 --> 00:05:33,310 they might check to see, all right, it looks 104 00:05:33,310 --> 00:05:36,640 like I'm handling most of these cases, but maybe there 105 00:05:36,640 --> 00:05:40,510 are some corner cases that are less clear, that we're 106 00:05:40,510 --> 00:05:42,370 not quite handling appropriately. 107 00:05:42,370 --> 00:05:46,420 So students can learn from that in order to make their code more accurate. 
108 00:05:46,420 --> 00:05:49,090 And in addition to that, you as the teacher 109 00:05:49,090 --> 00:05:52,180 can use this tool to facilitate the grading process, 110 00:05:52,180 --> 00:05:55,270 for being able to quickly run check50 on a submission to see 111 00:05:55,270 --> 00:05:57,760 does a student's submission work or not? 112 00:05:57,760 --> 00:06:00,800 So another tool that is available to you. 113 00:06:00,800 --> 00:06:02,920 In addition to just this text-based output 114 00:06:02,920 --> 00:06:06,160 to see which checks did or did not pass, you also 115 00:06:06,160 --> 00:06:08,830 have the ability to see this in web-based format. 116 00:06:08,830 --> 00:06:11,980 Any time a student runs check50, they're given a URL 117 00:06:11,980 --> 00:06:14,470 that will take them to a page that displays 118 00:06:14,470 --> 00:06:17,590 the results of their check50 check in a little bit more detail, 119 00:06:17,590 --> 00:06:20,230 showing them all of the checks that passed or failed, 120 00:06:20,230 --> 00:06:24,250 along with some log describing what it is that the check did 121 00:06:24,250 --> 00:06:27,760 and what potentially went wrong if the student didn't 122 00:06:27,760 --> 00:06:30,460 pass the correctness check. 123 00:06:30,460 --> 00:06:33,290 This URL is shareable, such that if a student wants 124 00:06:33,290 --> 00:06:35,420 to share their results with you, for example, 125 00:06:35,420 --> 00:06:38,480 they can share the URL with their check50 results with you, 126 00:06:38,480 --> 00:06:41,660 and you can then open up that URL to see what the student did, 127 00:06:41,660 --> 00:06:44,720 and you can then do a comparison to maybe identify 128 00:06:44,720 --> 00:06:49,310 where a student might have gone wrong or where they potentially made a mistake. 129 00:06:49,310 --> 00:06:52,940 For any of the checks that did not pass in this sort of environment, 130 00:06:52,940 --> 00:06:55,010 there is a view that looks something like this. 
131 00:06:55,010 --> 00:06:57,410 If a check didn't pass, students will generally 132 00:06:57,410 --> 00:07:00,710 see, on the left-hand side, the expected output-- what 133 00:07:00,710 --> 00:07:04,570 it is that check50 thought that the program should produce as output. 134 00:07:04,570 --> 00:07:07,280 And on the right-hand side, students will see their own output-- 135 00:07:07,280 --> 00:07:09,230 what their program actually did. 136 00:07:09,230 --> 00:07:11,210 And by seeing this side by side, this can 137 00:07:11,210 --> 00:07:13,130 help students more visually to get a sense 138 00:07:13,130 --> 00:07:14,900 for what should their code have done? 139 00:07:14,900 --> 00:07:16,520 What did their code do? 140 00:07:16,520 --> 00:07:18,890 And therefore, what could they potentially do to fix it? 141 00:07:18,890 --> 00:07:20,910 And it looks like in this case, for example, 142 00:07:20,910 --> 00:07:24,890 where the student is working on Mario, the actual output 143 00:07:24,890 --> 00:07:30,770 that the student produced has one fewer row than the code was actually 144 00:07:30,770 --> 00:07:32,610 supposed to have. 145 00:07:32,610 --> 00:07:36,770 So that can be a nice visual indicator to tell students 146 00:07:36,770 --> 00:07:40,210 what they might have done wrong. 147 00:07:40,210 --> 00:07:43,260 So that, then, is the web results for check50. 148 00:07:43,260 --> 00:07:45,650 And before I move on, I'll just pause here for a moment. 149 00:07:45,650 --> 00:07:48,250 I know most people have used check50 already before, 150 00:07:48,250 --> 00:07:52,520 but questions about anything so far? 151 00:07:52,520 --> 00:07:54,120 Let's go ahead and go to Joseph. 152 00:07:54,120 --> 00:07:56,080 I see you have your hand raised. 153 00:07:56,080 --> 00:07:56,650 JOSEPH: Yes. 154 00:07:56,650 --> 00:08:02,080 Quick question regarding academic honesty. 
155 00:08:02,080 --> 00:08:06,130 One of the skills we would really like to teach the students 156 00:08:06,130 --> 00:08:11,110 is the ability to problem solve and use Google and search, find a code, 157 00:08:11,110 --> 00:08:13,660 or find it in documentation. 158 00:08:13,660 --> 00:08:19,450 How do you balance that with the tools that check for academic honesty 159 00:08:19,450 --> 00:08:21,790 as a goal for the class? 160 00:08:21,790 --> 00:08:23,890 BRIAN YU: We will talk a little bit about tools 161 00:08:23,890 --> 00:08:26,380 for checking for academic honesty a little bit 162 00:08:26,380 --> 00:08:29,660 later in the session when we explore some of the software in order to do so. 163 00:08:29,660 --> 00:08:32,740 164 00:08:32,740 --> 00:08:37,809 check50 is supposed to be an indicator for students when they're 165 00:08:37,809 --> 00:08:39,460 nearing completion on a problem. 166 00:08:39,460 --> 00:08:41,840 When they've been working on a problem for some time, 167 00:08:41,840 --> 00:08:46,547 they can then run check50 to be able to see if there was something 168 00:08:46,547 --> 00:08:49,630 that they were missing, for example, or if there was some change that they 169 00:08:49,630 --> 00:08:51,640 still needed to make. 170 00:08:51,640 --> 00:08:53,830 It's not, for example, telling the student 171 00:08:53,830 --> 00:08:56,320 exactly what line of code they should be adding 172 00:08:56,320 --> 00:08:58,810 or exactly what the solution to the problem is. 173 00:08:58,810 --> 00:09:03,490 It's more of a feedback mechanism that students can use. 174 00:09:03,490 --> 00:09:06,560 Academic honesty is certainly something that we think about. 175 00:09:06,560 --> 00:09:10,420 We want to make sure that the work that students are submitting is their own. 176 00:09:10,420 --> 00:09:13,120 We do have some other tools for that that we'll talk 177 00:09:13,120 --> 00:09:17,067 about later today in this session, too. 
178 00:09:17,067 --> 00:09:18,275 Other questions about things? 179 00:09:18,275 --> 00:09:21,120 180 00:09:21,120 --> 00:09:23,700 Lana, yeah, if you'd like to ask a question. 181 00:09:23,700 --> 00:09:26,270 LANA: I have a question. 182 00:09:26,270 --> 00:09:30,340 Can you please explain briefly the architecture of this check application, 183 00:09:30,340 --> 00:09:33,070 if it's possible? 184 00:09:33,070 --> 00:09:36,260 How it's working underneath? 185 00:09:36,260 --> 00:09:37,810 BRIAN YU: Yeah, certainly. 186 00:09:37,810 --> 00:09:41,260 In fact, we'll get to that a little bit later, but in short, what's happening 187 00:09:41,260 --> 00:09:45,880 is that we're running a cluster of servers 188 00:09:45,880 --> 00:09:48,700 that are going to download students' code from GitHub, 189 00:09:48,700 --> 00:09:50,890 which we're using to store the students' code. 190 00:09:50,890 --> 00:09:54,760 On those servers, we're going to then run a series of automated checks 191 00:09:54,760 --> 00:09:56,440 that are also hosted on GitHub. 192 00:09:56,440 --> 00:10:00,942 I'll show you some diagrams of what that architecture looks like soon, too, 193 00:10:00,942 --> 00:10:03,900 just so you get a better sense for how all of that is working together. 194 00:10:03,900 --> 00:10:08,700 195 00:10:08,700 --> 00:10:11,310 A couple of other things about check50. 196 00:10:11,310 --> 00:10:14,850 Importantly, you can use it inside of any of CS50's tools. 197 00:10:14,850 --> 00:10:18,300 Yesterday, Kareem introduced you to CS50 IDE in addition 198 00:10:18,300 --> 00:10:21,420 to CS50 Sandbox and CS50 Lab. 199 00:10:21,420 --> 00:10:24,320 check50 is installed in all of those different environments. 200 00:10:24,320 --> 00:10:27,510 You can run check50 in the IDE, for example, or in the Lab 201 00:10:27,510 --> 00:10:32,070 or in the Sandbox, but you don't need to be using those particular environments 202 00:10:32,070 --> 00:10:34,170 in order to use check50. 
203 00:10:34,170 --> 00:10:38,070 So you can install check50 locally onto your own computer 204 00:10:38,070 --> 00:10:41,100 if you would like to to run check50 just on your own computer 205 00:10:41,100 --> 00:10:44,400 without needing to use any of CS50's environments. 206 00:10:44,400 --> 00:10:47,220 You can do so just by having Python 3 installed, 207 00:10:47,220 --> 00:10:51,090 and then running this command that you see here-- pip3 install check50-- 208 00:10:51,090 --> 00:10:54,090 and that will install check50 onto your own computer. 209 00:10:54,090 --> 00:10:57,300 And that's true of CS50's other open-source command-line tools, 210 00:10:57,300 --> 00:11:01,350 as well, for tools like style50 or submit50 211 00:11:01,350 --> 00:11:04,140 and compare50, which we'll see a little bit later today. 212 00:11:04,140 --> 00:11:08,760 You can install them the same way just by installing them locally, using pip3 213 00:11:08,760 --> 00:11:12,750 in order to get access to those tools on your own computer or your own server, 214 00:11:12,750 --> 00:11:15,590 as well, if you would like to. 215 00:11:15,590 --> 00:11:17,480 When students don't pass a check, they're 216 00:11:17,480 --> 00:11:19,160 given certain types of feedback. 217 00:11:19,160 --> 00:11:21,860 In particular, they're given feedback, as we talked about, as 218 00:11:21,860 --> 00:11:25,490 to what it is the program did-- what the actual output of their program was, 219 00:11:25,490 --> 00:11:26,660 for example. 220 00:11:26,660 --> 00:11:30,000 They're also given information about what their program should have done, 221 00:11:30,000 --> 00:11:33,380 so what we expected their program to do that they didn't do. 222 00:11:33,380 --> 00:11:37,220 And then in addition to that, we've added support for occasional hints 223 00:11:37,220 --> 00:11:40,880 that we can provide to students to guide students in the right direction. 
224 00:11:40,880 --> 00:11:43,880 This is born, really, out of the fact that there are many errors that we 225 00:11:43,880 --> 00:11:46,640 see in students' code that are quite common, 226 00:11:46,640 --> 00:11:50,150 that have a common cause that we see happen again and again and again, 227 00:11:50,150 --> 00:11:54,740 usually due to not considering some particular case that might take place 228 00:11:54,740 --> 00:11:58,470 or forgetting about one particular part of the problem, for example. 229 00:11:58,470 --> 00:12:01,790 So if you've ever solved CS50's cash problem in problem set one, 230 00:12:01,790 --> 00:12:04,760 for example, you might remember that an important part 231 00:12:04,760 --> 00:12:07,250 of making sure the program is bug-free is 232 00:12:07,250 --> 00:12:10,100 being sure to round numbers to the nearest integer, 233 00:12:10,100 --> 00:12:12,920 because if you don't round numbers to the nearest integer, 234 00:12:12,920 --> 00:12:15,650 you might end up with some fractional number of cents 235 00:12:15,650 --> 00:12:20,120 that ends up causing bugs later on in the program, for instance. 236 00:12:20,120 --> 00:12:24,770 So when students forget to do that, we can offer that as a potential hint 237 00:12:24,770 --> 00:12:29,760 to students, suggesting did you forget to round to the nearest cent? 238 00:12:29,760 --> 00:12:32,540 Just to guide students in the same way that a human teaching 239 00:12:32,540 --> 00:12:35,880 fellow might guide students, as well. 240 00:12:35,880 --> 00:12:38,390 So that's something that we make available. 241 00:12:38,390 --> 00:12:40,280 We'll also allow you to customize. 242 00:12:40,280 --> 00:12:43,970 We'll talk in a moment about how you can write your own checks if you would like 243 00:12:43,970 --> 00:12:46,040 to, for your own problems, for example. 
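The rounding mistake and the hint it triggers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `coins_for` and `hint_for` are hypothetical helper names, not part of check50, and the example uses Python floats rather than the C code students actually submit. It shows why skipping the rounding step can change the answer, and how a check could recognize that specific wrong answer and attach a hint.

```python
def coins_for(cents):
    """Greedy count of quarters, dimes, nickels, and pennies for a whole number of cents."""
    count = 0
    for coin in (25, 10, 5, 1):
        count += cents // coin
        cents %= coin
    return count


def hint_for(dollars, actual):
    """Return a hint if `actual` is the answer a program gets when it forgets to round.

    Hypothetical helper: the hint is offered only when the student's output is
    wrong AND equals what truncating (instead of rounding) would produce.
    """
    expected = coins_for(round(dollars * 100))   # correct: round to the nearest cent
    truncated = coins_for(int(dollars * 100))    # common bug: drop the fraction
    if actual != expected and actual == truncated:
        return "did you forget to round to the nearest cent?"
    return None


# 4.35 * 100 is slightly below 435 in binary floating point, so int() gives 434.
print(int(4.35 * 100), round(4.35 * 100))
print(hint_for(4.35, 22))
```

With an input of 4.35, a truncating program counts coins for 434 cents and a rounding program counts coins for 435 cents, so the two answers differ; any other wrong answer gets no hint, since the cause is unknown.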
244 00:12:46,040 --> 00:12:49,520 And in those cases, it's up to you to decide what sort of hint 245 00:12:49,520 --> 00:12:51,295 you would like to provide, if any. 246 00:12:51,295 --> 00:12:52,670 You don't have to provide a hint. 247 00:12:52,670 --> 00:12:56,090 You can just show the expected and actual output, for example. 248 00:12:56,090 --> 00:12:59,570 If you'd like to offer some guidance as to why a check might have failed, 249 00:12:59,570 --> 00:13:02,900 you can add some logic to add these sorts of hints 250 00:13:02,900 --> 00:13:07,480 into your check50 checks, as well. 251 00:13:07,480 --> 00:13:09,970 As I mentioned in response to a question previously, 252 00:13:09,970 --> 00:13:12,490 all of the data that check50 cares about is 253 00:13:12,490 --> 00:13:16,750 stored on GitHub, which we use for a number of CS50's tools. 254 00:13:16,750 --> 00:13:20,230 We use GitHub as the place where we store students' code any time that they 255 00:13:20,230 --> 00:13:23,680 run check50, and it's also the place where the correctness checks 256 00:13:23,680 --> 00:13:25,490 themselves are stored. 257 00:13:25,490 --> 00:13:29,720 So if you want to see what it is that we are checking for in check50, 258 00:13:29,720 --> 00:13:31,720 you can find those correctness checks on GitHub, 259 00:13:31,720 --> 00:13:34,130 and I'll show you where in just a moment. 260 00:13:34,130 --> 00:13:37,820 And then if you would like to write your own checks to check your own problems, 261 00:13:37,820 --> 00:13:40,420 for example, you can put those checks on GitHub, 262 00:13:40,420 --> 00:13:45,130 and check50 will be able to find them and access them, as well. 263 00:13:45,130 --> 00:13:49,666 Ignacio, do you have a question about what we've talked about so far? 264 00:13:49,666 --> 00:13:53,055 IGNACIO: Just a simple question. 265 00:13:53,055 --> 00:14:02,902 I'm working with a colleague that would like me to translate the CS code. 
266 00:14:02,902 --> 00:14:10,330 [INAUDIBLE] And I see that check sent just message in English. 267 00:14:10,330 --> 00:14:14,385 There is a way to translate this to Portuguese, too? 268 00:14:14,385 --> 00:14:15,010 BRIAN YU: Yeah. 269 00:14:15,010 --> 00:14:16,840 I didn't quite catch all the details of the question, 270 00:14:16,840 --> 00:14:19,007 but I think you were asking about translating checks 271 00:14:19,007 --> 00:14:20,620 into other languages. 272 00:14:20,620 --> 00:14:21,880 Definitely possible. 273 00:14:21,880 --> 00:14:24,820 All of the check output is configurable in terms 274 00:14:24,820 --> 00:14:26,920 of what it is that the message is saying and what 275 00:14:26,920 --> 00:14:29,140 it is that the checks are checking for. 276 00:14:29,140 --> 00:14:32,470 I'll show you in a moment what that configuration looks like. 277 00:14:32,470 --> 00:14:36,610 But that is something that if you would like to create some new check based 278 00:14:36,610 --> 00:14:40,270 on our existing checks in order to translate checks from one language 279 00:14:40,270 --> 00:14:43,700 into another, that's definitely something that you can do. 280 00:14:43,700 --> 00:14:45,700 And when we get to talking about writing checks, 281 00:14:45,700 --> 00:14:48,960 you'll be able to see an example of what it is that that looks like. 282 00:14:48,960 --> 00:14:52,010 283 00:14:52,010 --> 00:14:54,580 We've talked about how checks are stored on GitHub, 284 00:14:54,580 --> 00:14:58,610 but let's talk about now where on GitHub those checks are actually stored, 285 00:14:58,610 --> 00:15:01,430 so that you can find our checks if you're looking for them, 286 00:15:01,430 --> 00:15:04,490 and you can also write your own checks if you'd like to do so. 
287 00:15:04,490 --> 00:15:06,470 So when you run check50, we've talked about how 288 00:15:06,470 --> 00:15:10,490 you run check50 followed by a slug or some unique identifier 289 00:15:10,490 --> 00:15:13,370 to describe what problem you would like to check. 290 00:15:13,370 --> 00:15:16,850 That slug is divided into multiple parts. 291 00:15:16,850 --> 00:15:21,140 The first part of the slug represents the GitHub repository 292 00:15:21,140 --> 00:15:22,850 where the code is stored. 293 00:15:22,850 --> 00:15:25,970 And GitHub repository, if unfamiliar, you can think of as, like, 294 00:15:25,970 --> 00:15:29,180 a folder that's stored on the cloud on GitHub that's 295 00:15:29,180 --> 00:15:32,600 going to keep track of all of the data that stores all the checks, 296 00:15:32,600 --> 00:15:33,450 for example. 297 00:15:33,450 --> 00:15:36,860 So CS50/problems is the name of the GitHub repository 298 00:15:36,860 --> 00:15:40,250 that we use to store all of our own check50 checks. 299 00:15:40,250 --> 00:15:45,560 You can find it yourself by going to github.com/cs50/problems to find all 300 00:15:45,560 --> 00:15:47,490 of those there. 301 00:15:47,490 --> 00:15:51,180 Within a GitHub repository, you can divide a repository 302 00:15:51,180 --> 00:15:55,930 into branches for different versions of that repository, for example. 303 00:15:55,930 --> 00:15:58,080 And so the next part of the submission slug here, 304 00:15:58,080 --> 00:16:03,910 2020/x, that represents a branch on that repository. 305 00:16:03,910 --> 00:16:07,980 So it represents a branch of the CS50/problems repository. 306 00:16:07,980 --> 00:16:11,790 And the way that we have generally structured our problems repository 307 00:16:11,790 --> 00:16:14,880 is to have one branch for each offering of the class. 
308 00:16:14,880 --> 00:16:19,950 So in CS50x 2020, the branch is 2020/x, but last year we 309 00:16:19,950 --> 00:16:23,430 had a branch that was 2019/x, for example, just 310 00:16:23,430 --> 00:16:28,050 so we can keep different versions of the class separate on different branches. 311 00:16:28,050 --> 00:16:32,340 And then finally, the last part of the check50 submission slug 312 00:16:32,340 --> 00:16:35,370 is usually the name of the problem, but what that really represents 313 00:16:35,370 --> 00:16:37,680 is a folder on that branch. 314 00:16:37,680 --> 00:16:41,400 So there's a folder on the branch called cash, inside of which 315 00:16:41,400 --> 00:16:46,980 are all of the checks that we're going to run whenever a student runs check50 316 00:16:46,980 --> 00:16:49,230 for this particular problem. 317 00:16:49,230 --> 00:16:53,970 So this three-part hierarchy is how we construct a check50 submission slug. 318 00:16:53,970 --> 00:16:58,170 The first part is the GitHub repository where those checks are stored, 319 00:16:58,170 --> 00:17:00,750 the second part is the branch on that repository where 320 00:17:00,750 --> 00:17:02,790 you can find the checks, and the third part 321 00:17:02,790 --> 00:17:06,180 is the folder on that branch where all of the checks 322 00:17:06,180 --> 00:17:07,829 are ultimately going to be stored. 323 00:17:07,829 --> 00:17:10,560 What this means is that you can create your own checks 324 00:17:10,560 --> 00:17:14,067 if you would like to by pushing to a repository of your own. 325 00:17:14,067 --> 00:17:15,900 And then what you really just need to change 326 00:17:15,900 --> 00:17:21,359 is you need to change cs50/problems, our repository, to your own repository, 327 00:17:21,359 --> 00:17:22,200 for example. 328 00:17:22,200 --> 00:17:24,510 Then change the branch and the folder to match 329 00:17:24,510 --> 00:17:29,700 where it is that you have stored all of these individual checks. 
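The three-part hierarchy described above can be sketched as a small Python function. This is a minimal illustration, not how check50 itself resolves slugs: `split_slug` is a hypothetical name, and because a branch name such as 2020/x contains a slash, the sketch assumes the caller already knows which branches exist on the repository.

```python
def split_slug(slug, known_branches):
    """Split a slug into (repository, branch, folder).

    Assumes the repository is always "owner/name" and that `known_branches`
    lists the branches that exist on it (both are assumptions of this sketch).
    """
    parts = slug.split("/")
    if len(parts) < 4:
        raise ValueError(f"slug too short: {slug}")
    repository = "/".join(parts[:2])
    rest = parts[2:]
    # Try the longest possible branch name first, leaving at least one
    # component for the folder.
    for i in range(len(rest) - 1, 0, -1):
        branch = "/".join(rest[:i])
        if branch in known_branches:
            return repository, branch, "/".join(rest[i:])
    raise ValueError(f"no known branch matches slug: {slug}")


print(split_slug("cs50/problems/2020/x/cash", {"2020/x", "2019/x"}))
```

For the slug from the talk, this yields the repository cs50/problems, the branch 2020/x, and the folder cash. Pointing check50 at your own checks then amounts to swapping in your own repository, branch, and folder.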
330 00:17:29,700 --> 00:17:31,450 So let's go ahead and talk about that now, 331 00:17:31,450 --> 00:17:33,802 too, this process of writing checks. 332 00:17:33,802 --> 00:17:36,260 Now, I should first mention that you never have to do this. 333 00:17:36,260 --> 00:17:41,520 We have written check50 checks already for you for all of CS50's problems. 334 00:17:41,520 --> 00:17:43,640 So if you are teaching CS50x, and you're just 335 00:17:43,640 --> 00:17:46,742 using the problems that we offer in the course, 336 00:17:46,742 --> 00:17:48,950 our check50 checks have already been written for you, 337 00:17:48,950 --> 00:17:52,190 and you can just use them by running check50 followed 338 00:17:52,190 --> 00:17:54,710 by the appropriate submission slug. 339 00:17:54,710 --> 00:17:58,040 Those unique identifiers are located in the problem set 340 00:17:58,040 --> 00:18:02,390 specification in the instructions for each of the individual problems. 341 00:18:02,390 --> 00:18:05,420 Many teachers will choose to add to CS50's curriculum, 342 00:18:05,420 --> 00:18:08,908 adding a problem of their own, for example, or wanting 343 00:18:08,908 --> 00:18:12,200 to add additional checks to our problems to check for different types of things 344 00:18:12,200 --> 00:18:15,610 or to customize it for their particular classroom. 345 00:18:15,610 --> 00:18:19,040 So some teachers will choose to write checks of their own 346 00:18:19,040 --> 00:18:21,680 for usage on their own problems, for example. 347 00:18:21,680 --> 00:18:25,070 Now I'll show you how you can actually go about doing that. 348 00:18:25,070 --> 00:18:28,730 Ultimately, what you will do is push the required files 349 00:18:28,730 --> 00:18:32,870 to a GitHub repository of your own, following that format that we 350 00:18:32,870 --> 00:18:35,930 talked about on the previous slide. 351 00:18:35,930 --> 00:18:40,010 And the files that you'll need in order to configure check50 are twofold. 
352 00:18:40,010 --> 00:18:45,050 The first file you'll need is a file called .cs50.yml. 353 00:18:45,050 --> 00:18:48,110 This is a file using the YAML language, which 354 00:18:48,110 --> 00:18:52,370 is just a language that makes it easy to configure CS50's 355 00:18:52,370 --> 00:18:54,650 tools in a human-readable format. 356 00:18:54,650 --> 00:18:57,200 It's used in other places, as well. 357 00:18:57,200 --> 00:18:59,960 What we're specifying here is we're specifying 358 00:18:59,960 --> 00:19:06,732 which files we want to collect in order to run the check50 correctness checks. 359 00:19:06,732 --> 00:19:09,440 And so in general, if maybe a student's in a folder where they've 360 00:19:09,440 --> 00:19:11,720 got a lot of different files, but only one of them 361 00:19:11,720 --> 00:19:15,140 is one that you care about checking, you don't need to collect all of the files. 362 00:19:15,140 --> 00:19:19,460 You really only need to collect the one file that corresponds to the problem 363 00:19:19,460 --> 00:19:20,720 that you're trying to check. 364 00:19:20,720 --> 00:19:25,830 And so what we're saying here in this file is that we're saying !exclude "*", 365 00:19:25,830 --> 00:19:27,950 which means, by default, exclude everything. 366 00:19:27,950 --> 00:19:31,520 Don't include any of the files from the student's current folder 367 00:19:31,520 --> 00:19:35,360 in the code that is uploaded when a student runs check50. 368 00:19:35,360 --> 00:19:39,660 But on the line immediately following that, !require "cash.c", we're saying, 369 00:19:39,660 --> 00:19:44,510 all right, but cash.c, that is a file that needs to be present in order 370 00:19:44,510 --> 00:19:46,930 for us to run check50. 371 00:19:46,930 --> 00:19:48,810 It's a file that we must collect. 
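To make the effect of those two rules concrete, here is a rough Python model of the file selection. It is a sketch under assumptions, not check50's actual implementation: `files_to_collect` is a hypothetical function, the rules are assumed to apply in order with later rules overriding earlier ones, and files are assumed to be included unless a rule excludes them. The YAML in the comment shows how the two rules might sit in .cs50.yml; the surrounding `check50:` and `files:` keys are likewise an assumption, since the talk only reads out the two rules.

```python
from fnmatch import fnmatch

# The rules from the talk, roughly as they might appear in .cs50.yml:
#
#   check50:
#     files:
#       - !exclude "*"
#       - !require "cash.c"
#
# The same rules as plain data: (action, pattern), applied top to bottom.
RULES = [("exclude", "*"), ("require", "cash.c")]


def files_to_collect(present, rules):
    """Return the sorted list of files that would be collected.

    Raises FileNotFoundError if a required file is not present.
    """
    included = {name: True for name in present}  # assumed default: include
    for action, pattern in rules:
        if action == "require" and pattern not in present:
            raise FileNotFoundError(f"required file missing: {pattern}")
        for name in present:
            if fnmatch(name, pattern):
                # "exclude" drops the file; "include" and "require" keep it.
                included[name] = action != "exclude"
    return sorted(name for name, keep in included.items() if keep)


print(files_to_collect(["cash.c", "mario.c", "notes.txt"], RULES))
```

In this model a folder holding cash.c, mario.c, and notes.txt yields only cash.c, and a folder without cash.c is rejected before any checks run.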
372 00:19:48,810 --> 00:19:53,300 So if a student has 10 different files, what this configuration is going to say 373 00:19:53,300 --> 00:19:58,160 is ignore the other nine, but only collect cash.c as the file 374 00:19:58,160 --> 00:20:02,070 that we care about collecting in order to run check50. 375 00:20:02,070 --> 00:20:07,140 So this file just specifies what files we want to collect from the student, 376 00:20:07,140 --> 00:20:10,680 and then in the second file, we actually write the checks. 377 00:20:10,680 --> 00:20:16,710 All of the checks are written in Python in a file called __init__.py. 378 00:20:16,710 --> 00:20:18,960 And the way that we've structured these checks 379 00:20:18,960 --> 00:20:23,850 is that each of the checks that you run is really just a Python function. 380 00:20:23,850 --> 00:20:26,140 You have one function for each check that you want. 381 00:20:26,140 --> 00:20:29,850 So if you've ever seen check50 running, like, 12 checks on a student's code, 382 00:20:29,850 --> 00:20:34,886 for example, that's really just 12 functions inside of this __init__.py 383 00:20:34,886 --> 00:20:36,920 file. 384 00:20:36,920 --> 00:20:38,210 What goes into this function? 385 00:20:38,210 --> 00:20:39,470 What does it do? 386 00:20:39,470 --> 00:20:42,890 The function begins with what's known as a Python decorator. 387 00:20:42,890 --> 00:20:45,110 OK if you're not familiar with what that is. 388 00:20:45,110 --> 00:20:47,420 In short, it's a line of code that, in this case, 389 00:20:47,420 --> 00:20:52,790 is going to tell check50 that this function represents a check50 check. 390 00:20:52,790 --> 00:20:55,550 So we put that line at the top to say that this function is 391 00:20:55,550 --> 00:20:57,770 going to be a check50 check. 
392 00:20:57,770 --> 00:21:00,320 And then immediately below the function is 393 00:21:00,320 --> 00:21:03,920 what's known as a Python doc string, just some comment enclosed 394 00:21:03,920 --> 00:21:08,690 in triple quotation marks that describes what it is the function is doing. 395 00:21:08,690 --> 00:21:12,230 In this case, that means describing what it is that this correctness 396 00:21:12,230 --> 00:21:14,180 check is checking for. 397 00:21:14,180 --> 00:21:17,690 So here, this correctness check is checking that a particular input-- 398 00:21:17,690 --> 00:21:19,760 an input of 0.15-- 399 00:21:19,760 --> 00:21:23,580 will produce an output of 2 in this case. 400 00:21:23,580 --> 00:21:26,930 And this text that appears inside of this description 401 00:21:26,930 --> 00:21:31,100 is what the student will see next to their smiley face or frown face 402 00:21:31,100 --> 00:21:35,370 to indicate whether or not the correctness check passed or failed. 403 00:21:35,370 --> 00:21:38,783 So this description ends up being visible to students, as well. 404 00:21:38,783 --> 00:21:41,450 So an answer to the question a little bit earlier about possibly 405 00:21:41,450 --> 00:21:46,040 translating these check results into other languages, all that would take, 406 00:21:46,040 --> 00:21:48,890 then, is replacing this English description 407 00:21:48,890 --> 00:21:51,020 with a description in some other language 408 00:21:51,020 --> 00:21:55,010 to be able to see check results in a different language. 409 00:21:55,010 --> 00:21:56,720 And then inside the body of the function, 410 00:21:56,720 --> 00:21:59,420 you can include any Python logic you want. 411 00:21:59,420 --> 00:22:04,190 Anything that Python could check for these correctness checks in check50 412 00:22:04,190 --> 00:22:05,660 can check for, as well. 
413 00:22:05,660 --> 00:22:07,790 We have added some built-in functions just 414 00:22:07,790 --> 00:22:10,580 to make some common operations easier to handle 415 00:22:10,580 --> 00:22:14,640 some common cases for things that you might want to check for, for example. 416 00:22:14,640 --> 00:22:17,870 So here, we're saying go ahead and when you run check50, 417 00:22:17,870 --> 00:22:20,420 run the program ./cash. 418 00:22:20,420 --> 00:22:25,250 Then provide as standard input the value 0.15. 419 00:22:25,250 --> 00:22:30,560 Then expect as the output of that program, standard output, the number 2. 420 00:22:30,560 --> 00:22:34,140 Then expect that the program will exit with a status code of 0. 421 00:22:34,140 --> 00:22:37,370 In other words, make sure the program exits successfully 422 00:22:37,370 --> 00:22:39,400 without any problems. 423 00:22:39,400 --> 00:22:41,720 So just by chaining these functions together, 424 00:22:41,720 --> 00:22:46,010 run ./cash provides some input, expect some output, expect an exit code, 425 00:22:46,010 --> 00:22:50,720 that's all you need to do to construct a check50 check that provides some input 426 00:22:50,720 --> 00:22:53,210 and is looking for some particular output. 427 00:22:53,210 --> 00:22:57,740 And if a student's code doesn't provide the number 2 as output, for example, 428 00:22:57,740 --> 00:23:00,740 then the student will not pass this particular check. 429 00:23:00,740 --> 00:23:03,200 So you can add these checks in functions one 430 00:23:03,200 --> 00:23:05,990 after another to chain together a whole bunch of checks 431 00:23:05,990 --> 00:23:12,800 that will automatically run on students' code any time that check50 is executed. 432 00:23:12,800 --> 00:23:15,860 Questions then about writing checks? 433 00:23:15,860 --> 00:23:21,230 About the syntax, about how you would go about writing your own check50 checks? 434 00:23:21,230 --> 00:23:23,318 Yeah, Ahmaud, question? 
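The shape of such a check (a decorator, a docstring that students see, and a chain of run, stdin, stdout, and exit) can be imitated with the standard library alone. The sketch below is not check50's code: `check`, `Run`, `CHECKS`, and `run_checks` are hypothetical stand-ins, and a tiny Python one-liner plays the role of the compiled ./cash so the example runs anywhere.

```python
import subprocess
import sys

CHECKS = []  # hypothetical registry of check functions


def check(func):
    """Stand-in decorator: registers the function as a check."""
    CHECKS.append(func)
    return func


class Run:
    """Stand-in for a chainable run -> stdin -> stdout -> exit helper."""

    def __init__(self, argv):
        self.argv = argv
        self.lines = []
        self.result = None

    def stdin(self, line):
        self.lines.append(line)  # queue one line of standard input
        return self

    def _finish(self):
        if self.result is None:  # run the program once, on first inspection
            self.result = subprocess.run(
                self.argv,
                input="".join(line + "\n" for line in self.lines),
                capture_output=True,
                text=True,
                timeout=10,
            )
        return self.result

    def stdout(self, expected):
        actual = self._finish().stdout
        if actual.strip() != expected:
            raise AssertionError(f"expected {expected!r}, got {actual!r}")
        return self

    def exit(self, code):
        actual = self._finish().returncode
        if actual != code:
            raise AssertionError(f"expected exit code {code}, got {actual}")
        return self


# Toy replacement for a compiled ./cash: reads dollars, prints the coin count.
CASH = [sys.executable, "-c", (
    "cents = round(float(input()) * 100)\n"
    "coins = 0\n"
    "for coin in (25, 10, 5, 1):\n"
    "    coins += cents // coin\n"
    "    cents %= coin\n"
    "print(coins)\n"
)]


@check
def test015():
    """input of 0.15 yields output of 2"""
    Run(CASH).stdin("0.15").stdout("2").exit(0)


def run_checks():
    """Run every registered check; map its description to pass or fail."""
    results = {}
    for func in CHECKS:
        try:
            func()
            results[func.__doc__] = True
        except AssertionError:
            results[func.__doc__] = False
    return results


print(run_checks())
```

Each method returns the same object, which is what lets the calls read as one sentence: run the program, feed it input, expect output, expect an exit code.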
435 00:23:23,318 --> 00:23:24,110 AHMAUD: Hey, Brian. 436 00:23:24,110 --> 00:23:25,850 A couple of questions. 437 00:23:25,850 --> 00:23:29,380 I'll start with the least relevant. 438 00:23:29,380 --> 00:23:32,030 What tool are you currently using to switch 439 00:23:32,030 --> 00:23:36,890 your camera with the presentation? 440 00:23:36,890 --> 00:23:39,620 BRIAN YU: The tool that we use for these presentations is 441 00:23:39,620 --> 00:23:42,038 Open Broadcaster or OBS. 442 00:23:42,038 --> 00:23:45,080 Arturo or someone else might be able to paste a link to that in the chat, 443 00:23:45,080 --> 00:23:46,520 as well. 444 00:23:46,520 --> 00:23:47,450 AHMAUD: OK, beautiful. 445 00:23:47,450 --> 00:23:47,950 Thanks. 446 00:23:47,950 --> 00:23:55,250 Second question, regarding check50, when the student runs check50, 447 00:23:55,250 --> 00:24:00,390 where is the actual process happening? 448 00:24:00,390 --> 00:24:03,857 Is it happening in his IDE, or it's happening somewhere else? 449 00:24:03,857 --> 00:24:05,690 BRIAN YU: It's happening externally, so it's 450 00:24:05,690 --> 00:24:08,180 happening on a server that's pre-configured 451 00:24:08,180 --> 00:24:10,220 with a standard environment. 452 00:24:10,220 --> 00:24:14,510 And we do that just to make sure there's consistency across different platforms. 453 00:24:14,510 --> 00:24:17,780 If a student is using a different version of Python 454 00:24:17,780 --> 00:24:20,362 or has different packages installed, for example, their code 455 00:24:20,362 --> 00:24:23,570 might behave differently than if they were to run it on a different computer. 456 00:24:23,570 --> 00:24:25,487 So just to make sure everything is consistent, 457 00:24:25,487 --> 00:24:28,910 we always run check50 in the same environment. 458 00:24:28,910 --> 00:24:37,143 AHMAUD: OK, are we able to change this configuration? 459 00:24:37,143 --> 00:24:38,810 BRIAN YU: There are a couple of options. 
460 00:24:38,810 --> 00:24:42,100 You can, if you would like to, run check50 locally such 461 00:24:42,100 --> 00:24:44,492 that you're not running it on our external server. 462 00:24:44,492 --> 00:24:46,450 There's a command-line argument you can provide 463 00:24:46,450 --> 00:24:51,850 to check50, which is just --local that will let you run check50 locally 464 00:24:51,850 --> 00:24:54,040 without connecting to the internet. 465 00:24:54,040 --> 00:24:56,590 That can be helpful if you want to run it 466 00:24:56,590 --> 00:25:00,190 in an environment that has some particular configuration as determined 467 00:25:00,190 --> 00:25:01,510 by you. 468 00:25:01,510 --> 00:25:07,360 We also allow you to specify, inside of the YAML file before, dependencies. 469 00:25:07,360 --> 00:25:09,400 If you have Python packages that you would 470 00:25:09,400 --> 00:25:15,280 like to install before you run check50, you can specify those dependencies, 471 00:25:15,280 --> 00:25:17,690 as well. 472 00:25:17,690 --> 00:25:21,880 AHMAUD: Now, to understand everything, if I am running check50 473 00:25:21,880 --> 00:25:26,650 in the local IDE, which is the darker version, 474 00:25:26,650 --> 00:25:35,130 will it be by default still connecting to the internet to check on your, 475 00:25:35,130 --> 00:25:38,880 or I need to do the --local thing. 476 00:25:38,880 --> 00:25:42,360 BRIAN YU: The offline IDE is actually a little bit outdated. 477 00:25:42,360 --> 00:25:44,610 It's not the most recent version of the IDE, which 478 00:25:44,610 --> 00:25:47,250 I think Kareem mentioned in yesterday's session, 479 00:25:47,250 --> 00:25:52,080 as well, so it's probably currently using an old version of check50. 
480 00:25:52,080 --> 00:25:56,100 But in short, if you're upgraded to the latest version of check50, 481 00:25:56,100 --> 00:25:59,700 by default, it will always try to upload code 482 00:25:59,700 --> 00:26:04,140 so that we can run it on our servers, but you can always 483 00:26:04,140 --> 00:26:10,152 use the --local command in order to allow for running that command locally. 484 00:26:10,152 --> 00:26:11,750 AHMAUD: OK, thank you. 485 00:26:11,750 --> 00:26:14,120 BRIAN YU: Yeah, of course. 486 00:26:14,120 --> 00:26:15,270 Other questions? 487 00:26:15,270 --> 00:26:18,250 Let's go to Shefket, if I'm pronouncing that right. 488 00:26:18,250 --> 00:26:19,790 SPEAKER 7: OK, thank you. 489 00:26:19,790 --> 00:26:21,650 Just a short question. 490 00:26:21,650 --> 00:26:26,390 If I'm using virtual machines, virtual systems in general, 491 00:26:26,390 --> 00:26:28,190 I may have any restriction? 492 00:26:28,190 --> 00:26:35,330 Or I can apply in which I want to use these tools? 493 00:26:35,330 --> 00:26:38,460 BRIAN YU: So long as the ends are connected to the internet, 494 00:26:38,460 --> 00:26:41,570 they should be able to run check50 in order to upload the code to GitHub 495 00:26:41,570 --> 00:26:44,410 and then poll for the results to come back. 496 00:26:44,410 --> 00:26:48,070 But if they're offline, you can also run check50 locally 497 00:26:48,070 --> 00:26:50,820 as I was describing in response to the previous question, as well. 498 00:26:50,820 --> 00:26:53,255 So both are potentially options. 499 00:26:53,255 --> 00:26:54,480 SPEAKER 7: OK, thank you. 500 00:26:54,480 --> 00:26:55,563 BRIAN YU: Yeah, of course. 501 00:26:55,563 --> 00:26:57,455 502 00:26:57,455 --> 00:26:58,580 We'll go ahead and move on. 503 00:26:58,580 --> 00:27:02,330 So a couple of other things to note just about writing checks that we support. 
504 00:27:02,330 --> 00:27:04,370 One is check dependencies, where you can 505 00:27:04,370 --> 00:27:07,430 have certain checks that are based on previous checks 506 00:27:07,430 --> 00:27:10,220 or that rely upon the passage of other checks. 507 00:27:10,220 --> 00:27:12,230 So in our problems, for example, we generally 508 00:27:12,230 --> 00:27:15,410 require that students' code compile first, 509 00:27:15,410 --> 00:27:19,910 and only if it compiles will we bother to run any of the other correctness 510 00:27:19,910 --> 00:27:21,560 checks, for example. 511 00:27:21,560 --> 00:27:23,420 These checks support custom help messages 512 00:27:23,420 --> 00:27:26,870 as I described before, too, where you can provide some additional information 513 00:27:26,870 --> 00:27:28,920 to the student if you would like to. 514 00:27:28,920 --> 00:27:32,180 And then, as I also mentioned in response to a question, 515 00:27:32,180 --> 00:27:34,470 custom Python packages are supported as well. 516 00:27:34,470 --> 00:27:36,440 So if you're doing an assignment in Python that 517 00:27:36,440 --> 00:27:39,530 requires the installation of particular packages, 518 00:27:39,530 --> 00:27:42,590 you can install those into check50 so that we 519 00:27:42,590 --> 00:27:47,070 will install those packages prior to running students' code, for example. 520 00:27:47,070 --> 00:27:49,550 So hopefully that flexibility allows for a lot 521 00:27:49,550 --> 00:27:52,520 of different types of checks for a lot of different types of problems. 522 00:27:52,520 --> 00:27:54,270 And for more detail about all of this, you 523 00:27:54,270 --> 00:27:58,070 can go to cs50.readthedocs.io to read up more about check50 524 00:27:58,070 --> 00:27:59,950 and what that syntax is like. 525 00:27:59,950 --> 00:28:04,880 And you can, of course, go to our GitHub, github.com/cs50/problems, 526 00:28:04,880 --> 00:28:07,520 to see all of our existing correctness checks.
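The check dependencies mentioned above, where correctness checks only run if the compile check passes, can be sketched as a small runner. Everything here is a hypothetical stand-in written for illustration (the `check` decorator factory, the `CHECKS` list, and `run_checks`), and it assumes a dependency is always registered before the checks that rely on it.

```python
CHECKS = []  # (function, dependency) pairs in registration order; hypothetical


def check(dependency=None):
    """Stand-in decorator factory: optionally name a check that must pass first."""
    def register(func):
        CHECKS.append((func, dependency))
        return func
    return register


def run_checks():
    """Return {check name: "pass" | "fail" | "skip"}."""
    results = {}
    for func, dependency in CHECKS:
        if dependency is not None and results.get(dependency.__name__) != "pass":
            results[func.__name__] = "skip"  # prerequisite failed or was skipped
            continue
        try:
            func()
            results[func.__name__] = "pass"
        except AssertionError:
            results[func.__name__] = "fail"
    return results


@check()
def compiles():
    """cash.c compiles"""
    # Pretend the compiler reported an error so the dependency fails.
    raise AssertionError("compilation failed")


@check(compiles)
def test015():
    """input of 0.15 yields output of 2"""
    pass  # never reached in this run, because compiles failed


print(run_checks())
```

Skipping dependent checks keeps the student's report focused on the first thing to fix, rather than listing a dozen failures that all trace back to one compile error.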
527 00:28:07,520 --> 00:28:09,680 Many teachers, when designing their own, will 528 00:28:09,680 --> 00:28:13,340 choose to look at ours first and model them off of the correctness checks 529 00:28:13,340 --> 00:28:15,020 that we have already created. 530 00:28:15,020 --> 00:28:20,580 So all of those are options to you for check50. 531 00:28:20,580 --> 00:28:24,570 Now, before I move on from check50, any final questions about anything 532 00:28:24,570 --> 00:28:28,140 related to check50 before we go on to tool number two for today? 533 00:28:28,140 --> 00:28:31,070 534 00:28:31,070 --> 00:28:32,460 Final things on check50. 535 00:28:32,460 --> 00:28:35,710 Feel free to either raise your hand or ask in the chat if you have a question. 536 00:28:35,710 --> 00:28:39,070 537 00:28:39,070 --> 00:28:42,070 All right, we can certainly come back to it if there are more questions, 538 00:28:42,070 --> 00:28:44,737 but I think for now, we'll go ahead and move on to the next tool 539 00:28:44,737 --> 00:28:48,010 that we're going to talk about, which is submit50. 540 00:28:48,010 --> 00:28:53,240 submit50 is a command-line tool for submitting students' work. 541 00:28:53,240 --> 00:28:55,490 Different teachers will choose to do this differently. 542 00:28:55,490 --> 00:28:59,200 Some teachers have their own systems from their schools, 543 00:28:59,200 --> 00:29:02,710 for example, that require students to submit their work in a particular way 544 00:29:02,710 --> 00:29:05,350 using some LMS, for example. 545 00:29:05,350 --> 00:29:08,770 submit50 is a tool that we have written to make it easy for students 546 00:29:08,770 --> 00:29:12,880 to submit their own work, and for you then to be able to collect that work 547 00:29:12,880 --> 00:29:14,110 and view the results. 548 00:29:14,110 --> 00:29:19,210 It integrates quite nicely with the other CS50 tools, as well. 
549 00:29:19,210 --> 00:29:22,930 The way that submit50 works is very similar to the way that check50 itself 550 00:29:22,930 --> 00:29:24,070 works. 551 00:29:24,070 --> 00:29:27,610 You can run submit50 followed by a submission slug, 552 00:29:27,610 --> 00:29:32,500 like cs50/problems/2020/x/cash, same submission slug that we were using 553 00:29:32,500 --> 00:29:34,417 before for check50. 554 00:29:34,417 --> 00:29:36,250 When that happens, students will be prompted 555 00:29:36,250 --> 00:29:38,720 to sign in with their GitHub username and password. 556 00:29:38,720 --> 00:29:42,700 We use GitHub again for storing all of a student's submissions. 557 00:29:42,700 --> 00:29:47,770 And from there, the students will be told what files will be submitted. 558 00:29:47,770 --> 00:29:51,660 So let's see, cash is the name of the file that's going to be submitted. 559 00:29:51,660 --> 00:29:54,580 If there are files that won't be submitted because they were excluded, 560 00:29:54,580 --> 00:29:56,560 students will see that, as well. 561 00:29:56,560 --> 00:29:58,900 Students will then be prompted to agree to the course's 562 00:29:58,900 --> 00:30:03,550 policy on academic honesty by typing Y or yes to indicate that they've 563 00:30:03,550 --> 00:30:05,920 agreed with that policy and are keeping in mind 564 00:30:05,920 --> 00:30:08,620 that the code that they are submitting should be their own. 565 00:30:08,620 --> 00:30:11,920 So students can type Y or yes in response to that question 566 00:30:11,920 --> 00:30:14,410 to be able to say, yes, I agree to the policy. 567 00:30:14,410 --> 00:30:17,860 Students' code is then uploaded and then it is submitted, 568 00:30:17,860 --> 00:30:21,130 and students can then go to submit.cs50.io, 569 00:30:21,130 --> 00:30:23,170 which we'll talk about in just a second, to view 570 00:30:23,170 --> 00:30:27,910 the results of their submission. 
571 00:30:27,910 --> 00:30:31,850 Question in the chat-- can check50 be used for other languages? 572 00:30:31,850 --> 00:30:34,330 Yes, it can. 573 00:30:34,330 --> 00:30:37,480 check50, we've built in some functions that 574 00:30:37,480 --> 00:30:40,930 make it easy to work with languages like C and like Python, 575 00:30:40,930 --> 00:30:44,950 just because most of CS50's problems are in C or Python. 576 00:30:44,950 --> 00:30:49,330 But anything that a Python program could do check50 577 00:30:49,330 --> 00:30:52,660 can check for, because we're really just running a Python function. 578 00:30:52,660 --> 00:30:56,380 For SQL, for example, you can, from Python, run SQL queries 579 00:30:56,380 --> 00:30:57,860 and get back the results. 580 00:30:57,860 --> 00:31:02,290 So you could run a check50 check that is executing a SQL query on a SQLite 581 00:31:02,290 --> 00:31:04,930 database, for example, getting back those results 582 00:31:04,930 --> 00:31:07,197 and verifying that the results are correct 583 00:31:07,197 --> 00:31:09,280 or that the correct number of rows have come back, 584 00:31:09,280 --> 00:31:12,790 or anything along those lines. 585 00:31:12,790 --> 00:31:16,320 I see a question from Charlene, if you'd like to ask a question. 586 00:31:16,320 --> 00:31:17,110 CHARLENE: Lovely. 587 00:31:17,110 --> 00:31:27,490 I've just realized with CS50t that you used the submissions via Google Forms. 588 00:31:27,490 --> 00:31:28,840 Is there any reason for that? 589 00:31:28,840 --> 00:31:39,490 And can CS50w, for example, submissions be submitted via Forms, as well? 590 00:31:39,490 --> 00:31:41,655 I just noticed that there's a difference there. 591 00:31:41,655 --> 00:31:42,280 BRIAN YU: Yeah.
592 00:31:42,280 --> 00:31:46,840 In general, we tend to use submit50 for submission of code 593 00:31:46,840 --> 00:31:49,500 in particular, because submit50 uploads code 594 00:31:49,500 --> 00:31:52,720 to GitHub, which makes it very easy to do commenting on code, which 595 00:31:52,720 --> 00:31:54,440 I'll show you in a moment, as well. 596 00:31:54,440 --> 00:31:56,500 So any time students are submitting code, 597 00:31:56,500 --> 00:31:59,410 we'll generally opt to use submit50. 598 00:31:59,410 --> 00:32:04,180 But in some of our courses, CS50t among them, some of the assignments 599 00:32:04,180 --> 00:32:06,400 are text-based where we're just asking a question 600 00:32:06,400 --> 00:32:09,580 and expecting students to write a paragraph, for example. 601 00:32:09,580 --> 00:32:11,590 You certainly could use submit50 for this, 602 00:32:11,590 --> 00:32:14,080 where you would have students open up a text file, 603 00:32:14,080 --> 00:32:17,830 write their response in a text file, and then run submit50 to upload 604 00:32:17,830 --> 00:32:20,090 that text file to GitHub. 605 00:32:20,090 --> 00:32:23,980 But in the case of a class like CS50t, where we assume much less technical 606 00:32:23,980 --> 00:32:27,580 experience-- we're not expecting students to know how to use the command 607 00:32:27,580 --> 00:32:29,090 line, for example-- 608 00:32:29,090 --> 00:32:33,340 Google Forms just makes it a little bit easier to submit text-based responses. 609 00:32:33,340 --> 00:32:38,950 So among our courses, you'll often find that for our lead-in courses into CS50, 610 00:32:38,950 --> 00:32:41,520 where students aren't really doing as much programming, 611 00:32:41,520 --> 00:32:44,590 we'll often use Google Forms just for text-based responses. 612 00:32:44,590 --> 00:32:47,340 But generally, for anything that has to do with code, 613 00:32:47,340 --> 00:32:49,240 we'll more often use submit50. 614 00:32:49,240 --> 00:32:50,437 CHARLENE: Lovely, thank you. 
615 00:32:50,437 --> 00:32:51,520 BRIAN YU: Yeah, of course. 616 00:32:51,520 --> 00:32:54,730 617 00:32:54,730 --> 00:32:59,170 So again, when students run submit50, all of that goes onto GitHub. 618 00:32:59,170 --> 00:33:03,440 Every student gets a repository for their own submissions. 619 00:33:03,440 --> 00:33:05,560 It's located in a GitHub organization called 620 00:33:05,560 --> 00:33:10,970 me50, and the name of the repository is the student's GitHub username. 621 00:33:10,970 --> 00:33:16,200 So if you go to, for example, github.com/me50/ your own GitHub 622 00:33:16,200 --> 00:33:19,660 username, if you've ever taken one of CS50's courses, 623 00:33:19,660 --> 00:33:23,650 you'll probably find that you have a me50 repository that has everything 624 00:33:23,650 --> 00:33:27,310 that you've ever submitted to CS50's courses. 625 00:33:27,310 --> 00:33:29,450 So that repository stores everything. 626 00:33:29,450 --> 00:33:33,340 And the way that we divide that up is that we have one branch per problem, 627 00:33:33,340 --> 00:33:36,370 so any time you submit to a different submission slug, 628 00:33:36,370 --> 00:33:39,715 we'll end up pushing that code to a different branch. 629 00:33:39,715 --> 00:33:42,410 630 00:33:42,410 --> 00:33:49,040 Question in the chat-- do you collect community-written check50 extensions? 631 00:33:49,040 --> 00:33:52,110 check50 is designed to support extensions, though, to date, 632 00:33:52,110 --> 00:33:54,170 most people have just been using the tools 633 00:33:54,170 --> 00:33:56,240 that are built into check50 itself. 634 00:33:56,240 --> 00:33:58,460 You can find more details about how that works 635 00:33:58,460 --> 00:34:02,180 and what extensions are like on check50's GitHub repository. 636 00:34:02,180 --> 00:34:08,199 It's all open-source, and it's all available at github.com/cs50/check50. 637 00:34:08,199 --> 00:34:09,199 So all of that is there.
638 00:34:09,199 --> 00:34:12,400 And I know that people in the past have been able to run check50 639 00:34:12,400 --> 00:34:13,818 to test Java code, as well. 640 00:34:13,818 --> 00:34:16,360 I think generally, they're not writing additional extensions, 641 00:34:16,360 --> 00:34:18,732 they're just writing checks that run Java, 642 00:34:18,732 --> 00:34:21,440 but Java is definitely something that you can check with check50. 643 00:34:21,440 --> 00:34:25,989 644 00:34:25,989 --> 00:34:29,679 All right, when a student submits code via submit50 and it gets 645 00:34:29,679 --> 00:34:32,590 pushed to GitHub, ultimately, where you the teacher 646 00:34:32,590 --> 00:34:36,489 can then view that information is via this web application that a few of you 647 00:34:36,489 --> 00:34:37,900 had used, but not too many-- 648 00:34:37,900 --> 00:34:39,880 submit.cs50.io. 649 00:34:39,880 --> 00:34:44,080 So submit.cs50.io is a web application where 650 00:34:44,080 --> 00:34:46,750 you can view all of your students' submissions, 651 00:34:46,750 --> 00:34:49,780 as well as all of their scores on those submissions. 652 00:34:49,780 --> 00:34:52,000 So I'll go ahead and demonstrate for you what 653 00:34:52,000 --> 00:34:54,340 this entire workflow might look like. 654 00:34:54,340 --> 00:34:57,730 Let's imagine a student who's working on some assignment, 655 00:34:57,730 --> 00:35:00,460 and that student runs submit50. 656 00:35:00,460 --> 00:35:03,280 The student runs submit50 inside of their IDE, 657 00:35:03,280 --> 00:35:07,060 but it could also be on their computer or elsewhere. 658 00:35:07,060 --> 00:35:10,300 When that happens, we take the student's submission, 659 00:35:10,300 --> 00:35:12,490 and we upload that submission to GitHub. 
660 00:35:12,490 --> 00:35:15,700 So what submit50 does is it takes the student's submission, 661 00:35:15,700 --> 00:35:18,850 and it pushes their code to a GitHub repository 662 00:35:18,850 --> 00:35:22,510 unique to that student, where every student has a different GitHub 663 00:35:22,510 --> 00:35:23,860 repository. 664 00:35:23,860 --> 00:35:28,420 Then what GitHub will do is that any time GitHub receives a new submission, 665 00:35:28,420 --> 00:35:31,270 GitHub will notify submit.cs50.io. 666 00:35:31,270 --> 00:35:35,080 It will tell submit.cs50.io that this student has 667 00:35:35,080 --> 00:35:39,760 a new submission for a particular problem, and we on submit.cs50.io 668 00:35:39,760 --> 00:35:42,640 will then download the student's code from GitHub, 669 00:35:42,640 --> 00:35:46,090 and we will automatically run check50 on that submission, 670 00:35:46,090 --> 00:35:49,860 automatically running all of the correctness checks for that problem 671 00:35:49,860 --> 00:35:52,810 and getting back some results for which checks passed 672 00:35:52,810 --> 00:35:56,470 and which checks did not pass, so that we can store those results 673 00:35:56,470 --> 00:35:58,420 and then present them to you. 674 00:35:58,420 --> 00:36:02,320 So all in all, as soon as a student submits via submit50, 675 00:36:02,320 --> 00:36:05,540 it kicks off this workflow, uploading the code to GitHub, 676 00:36:05,540 --> 00:36:09,640 us getting that code on submit.cs50.io, running the correctness checks, 677 00:36:09,640 --> 00:36:13,270 and then storing the result of those correctness checks 678 00:36:13,270 --> 00:36:17,160 inside of submit.cs50.io. 679 00:36:17,160 --> 00:36:20,150 So that makes it much easier from the perspective of someone who 680 00:36:20,150 --> 00:36:23,640 is running the course, if you're the teacher of the class, for example. 
681 00:36:23,640 --> 00:36:29,640 You don't need to worry about downloading all of your students' code 682 00:36:29,640 --> 00:36:33,720 and running check50 on all of your submissions. 683 00:36:33,720 --> 00:36:38,250 You can just have submit.cs50.io take care of this process automatically. 684 00:36:38,250 --> 00:36:42,090 Any time a student submits, you'll be able to see the results 685 00:36:42,090 --> 00:36:44,860 of that automated correctness checking. 686 00:36:44,860 --> 00:36:48,700 So how is it that you can actually go about collecting these responses 687 00:36:48,700 --> 00:36:50,710 and viewing all students' work? 688 00:36:50,710 --> 00:36:55,840 Well, if you go to this URL, submit.cs50.io/courses/new, 689 00:36:55,840 --> 00:37:00,520 that will allow you to create a new course on submit.cs50.io. 690 00:37:00,520 --> 00:37:02,440 And generally speaking, you'll see a window 691 00:37:02,440 --> 00:37:05,740 that looks a little something like this where you can name the course. 692 00:37:05,740 --> 00:37:09,100 Most teachers will name the course after, like, the year of the course, 693 00:37:09,100 --> 00:37:12,160 and if they're teaching multiple, different classes in the same year, 694 00:37:12,160 --> 00:37:14,590 maybe assign each one a different, unique name just 695 00:37:14,590 --> 00:37:17,060 to help keep different things separate. 696 00:37:17,060 --> 00:37:21,200 So you give a name to your new course, and then once you create the course, 697 00:37:21,200 --> 00:37:27,460 you'll be presented with a page that looks a little something like this. 698 00:37:27,460 --> 00:37:33,620 And on that page, you as the teacher will see this link here, 699 00:37:33,620 --> 00:37:36,940 which is the invitation link for the course. 700 00:37:36,940 --> 00:37:41,080 That invitation is how you add new students to the course. 
701 00:37:41,080 --> 00:37:43,690 So by taking that link and sharing it with students, 702 00:37:43,690 --> 00:37:47,350 students will then be able to click on that URL, and when they do, 703 00:37:47,350 --> 00:37:53,680 they'll be prompted to join your course on submit.cs50.io. 704 00:37:53,680 --> 00:37:56,620 What that will do is it will give you the teacher access 705 00:37:56,620 --> 00:37:58,210 to the students' submissions. 706 00:37:58,210 --> 00:38:02,320 So recall, again, that every student has their own me50 repository. 707 00:38:02,320 --> 00:38:04,507 By default, that repository is private because we 708 00:38:04,507 --> 00:38:06,340 don't want just anyone on the internet to be 709 00:38:06,340 --> 00:38:08,950 able to go to that me50 repository and be 710 00:38:08,950 --> 00:38:11,620 able to see all of the student's work. 711 00:38:11,620 --> 00:38:14,230 But when a student clicks on your invitation 712 00:38:14,230 --> 00:38:18,160 link for your submit.cs50.io course, you will automatically 713 00:38:18,160 --> 00:38:22,000 be granted access to that student's me50 repository 714 00:38:22,000 --> 00:38:26,660 so that you can see all of the work that they have submitted. 715 00:38:26,660 --> 00:38:29,770 So that's what the invitation link there is used for. 716 00:38:29,770 --> 00:38:34,300 Beneath that, you can specify all of these submission slugs 717 00:38:34,300 --> 00:38:37,590 that you care about collecting for the course. 718 00:38:37,590 --> 00:38:41,990 So submission slugs again are composed of those various, different parts, 719 00:38:41,990 --> 00:38:45,160 like a repository, then a branch, then the name of the problem. 720 00:38:45,160 --> 00:38:48,700 And the me50 repository contains all of the submissions 721 00:38:48,700 --> 00:38:52,180 that a student has across any course within CS50's ecosystem 722 00:38:52,180 --> 00:38:53,960 that they've taken, for example.
723 00:38:53,960 --> 00:38:57,370 And so if you only want to collect particular slugs, 724 00:38:57,370 --> 00:39:01,900 you can specify here which submission slugs you want to track, 725 00:39:01,900 --> 00:39:04,810 or even which prefixes of submission slugs 726 00:39:04,810 --> 00:39:06,590 that you want to catch, for example. 727 00:39:06,590 --> 00:39:11,770 So if you only want to collect submission slugs from the 2020x version 728 00:39:11,770 --> 00:39:17,590 of CS50x, then you can specify, like, cs50/problems/2020/x in the submission 729 00:39:17,590 --> 00:39:20,860 slugs area there to be sure that you're only collecting 730 00:39:20,860 --> 00:39:24,790 those particular submissions. 731 00:39:24,790 --> 00:39:26,350 That can all be specified there. 732 00:39:26,350 --> 00:39:29,470 And there are some other settings further down below, as well, 733 00:39:29,470 --> 00:39:33,170 where you can, for example, add additional teachers to your course. 734 00:39:33,170 --> 00:39:35,110 So, for example, if you're teaching a course, 735 00:39:35,110 --> 00:39:37,870 and that course has TAs that are assisting you 736 00:39:37,870 --> 00:39:40,150 in the process of working with students, as well, 737 00:39:40,150 --> 00:39:43,630 you can provide staff-specific invitation links 738 00:39:43,630 --> 00:39:47,410 that will give the staff access to all of their students' repositories 739 00:39:47,410 --> 00:39:50,220 as well. 740 00:39:50,220 --> 00:39:54,990 That, then, is how you can configure a submit.cs50.io course. 741 00:39:54,990 --> 00:39:57,820 And from there, if you notice at the top of the page, 742 00:39:57,820 --> 00:39:59,970 you'll notice a button that says Submissions. 743 00:39:59,970 --> 00:40:02,520 By clicking on that button, you'll be able to see 744 00:40:02,520 --> 00:40:07,020 all of the submissions from any of your students inside of this course. 745 00:40:07,020 --> 00:40:11,640 And what that usually looks like is a little something like this.
746 00:40:11,640 --> 00:40:13,770 Along the top, you'll see all of the problems 747 00:40:13,770 --> 00:40:15,455 to which students have submitted. 748 00:40:15,455 --> 00:40:17,580 So in this case, this is an example from last fall, 749 00:40:17,580 --> 00:40:21,720 where students submitted to cs50/problems/2019/fall/cash, 750 00:40:21,720 --> 00:40:22,770 for example. 751 00:40:22,770 --> 00:40:25,950 And I can see that, OK, this is the number of students that have submitted, 752 00:40:25,950 --> 00:40:28,740 and then I'll see a list of all of the submissions 753 00:40:28,740 --> 00:40:31,170 that have been made to that branch. 754 00:40:31,170 --> 00:40:33,570 So here, I'm seeing, for example, one student 755 00:40:33,570 --> 00:40:36,590 who looks like they submitted five minutes ago. 756 00:40:36,590 --> 00:40:39,480 And importantly, beneath the time of their submission, 757 00:40:39,480 --> 00:40:43,940 you'll see the automated results from check50 and style50. 758 00:40:43,940 --> 00:40:47,520 So you'll see the score that check50 provided for the submission. 759 00:40:47,520 --> 00:40:52,470 It looks like, in this case, the student got 10 out of the 11 checks correct. 760 00:40:52,470 --> 00:40:56,550 Then they'll also see a style score, rated on a scale from 0 to 1, 761 00:40:56,550 --> 00:41:01,180 1 meaning all of the lines or 100% of the lines are correctly styled, 762 00:41:01,180 --> 00:41:06,150 and 0 meaning none of the lines are correctly styled. 763 00:41:06,150 --> 00:41:09,900 Questions about how all of this works? 764 00:41:09,900 --> 00:41:11,850 I see Oleg has a question. 765 00:41:11,850 --> 00:41:14,050 OLEG: Thank you very much for waiting. 766 00:41:14,050 --> 00:41:20,520 I have a question regarding the grade that is being displayed in [INAUDIBLE].. 
767 00:41:20,520 --> 00:41:24,660 I have written checks on my own, and then I've read all the documentation, 768 00:41:24,660 --> 00:41:28,440 but I haven't found the instruction of how we can 769 00:41:28,440 --> 00:41:34,350 apply the gradings for our own tasks. 770 00:41:34,350 --> 00:41:35,920 Is it possible? 771 00:41:35,920 --> 00:41:37,380 BRIAN YU: Absolutely possible. 772 00:41:37,380 --> 00:41:42,870 All students need to do is run submit50 with your submission slug. 773 00:41:42,870 --> 00:41:45,900 So if they run submit50 with your submission slug, 774 00:41:45,900 --> 00:41:50,340 that will automatically run check50 on those submissions, 775 00:41:50,340 --> 00:41:54,670 assuming that you've enabled check50 in the .cs50 YAML file from before. 776 00:41:54,670 --> 00:41:58,680 So back when I was talking about how to configure check50, 777 00:41:58,680 --> 00:42:03,390 you saw the .cs50 YAML file where we had, like, check50, and then specified what 778 00:42:03,390 --> 00:42:04,560 files to collect. 779 00:42:04,560 --> 00:42:08,940 That will just need to be there so that we know which files we're expecting to 780 00:42:08,940 --> 00:42:11,280 check, and we know to run check50. 781 00:42:11,280 --> 00:42:15,640 But so long as that file is there and configured to enable check50, 782 00:42:15,640 --> 00:42:20,550 we will automatically run check50 whenever a new student submits. 783 00:42:20,550 --> 00:42:22,770 OLEG: And as a result, the grade should be 784 00:42:22,770 --> 00:42:27,010 displayed, because this styling is being displayed, but the grade is not there. 785 00:42:27,010 --> 00:42:29,582 BRIAN YU: Yeah, the style is run automatically, too. 786 00:42:29,582 --> 00:42:31,290 If the correctness score doesn't show up, 787 00:42:31,290 --> 00:42:35,070 it probably means that your check50 for that submission slug 788 00:42:35,070 --> 00:42:37,430 isn't correctly configured.
789 00:42:37,430 --> 00:42:41,220 And if you're ever having any trouble with configuration for check50, 790 00:42:41,220 --> 00:42:44,250 you can always email this email address that I've pasted into the chat 791 00:42:44,250 --> 00:42:47,970 here, sysadmins@cs50.harvard.edu. 792 00:42:47,970 --> 00:42:51,360 When you email that email address, just include a screenshot of the area 793 00:42:51,360 --> 00:42:54,060 that you're seeing, and give us the name of the submission slug 794 00:42:54,060 --> 00:42:56,520 that you're using, and the team can take a look 795 00:42:56,520 --> 00:43:00,210 and help you out with getting all of that configuration set up. 796 00:43:00,210 --> 00:43:01,430 OLEG: Thank you. 797 00:43:01,430 --> 00:43:02,690 BRIAN YU: Yeah, of course. 798 00:43:02,690 --> 00:43:05,210 Let's go now to John. 799 00:43:05,210 --> 00:43:09,090 John, if you'd like to ask a question. 800 00:43:09,090 --> 00:43:20,180 JOHN: Is there a way to gain greater control or custom rules for style50? 801 00:43:20,180 --> 00:43:22,550 Is that in the docs? 802 00:43:22,550 --> 00:43:25,970 BRIAN YU: Style50, ultimately, in terms of its implementation, 803 00:43:25,970 --> 00:43:30,390 is really a wrapper around existing linting tools. 804 00:43:30,390 --> 00:43:35,330 So for C, for example, we're using the tool AStyle to style the code, 805 00:43:35,330 --> 00:43:40,520 and for Python, I believe we're using autopep8 or something that 806 00:43:40,520 --> 00:43:43,770 checks it against the PEP8 standard Python style guide.
807 00:43:43,770 --> 00:43:46,760 So if you're just using style50 alone, it's 808 00:43:46,760 --> 00:43:51,110 going to be using those particular tools in their standard configuration, 809 00:43:51,110 --> 00:43:54,200 but the entire tool is open-source such that if you 810 00:43:54,200 --> 00:43:58,760 wanted to change the particular flags that we're providing to these styling 811 00:43:58,760 --> 00:44:02,290 tools, for example, or even swap it out entirely with a different tool, 812 00:44:02,290 --> 00:44:04,040 that's something that you can do, as well. 813 00:44:04,040 --> 00:44:08,420 And I'll go ahead and paste the link to the style50 repository 814 00:44:08,420 --> 00:44:12,910 where you can see exactly how that's working for any of the files. 815 00:44:12,910 --> 00:44:15,800 And if you scroll down in the README to the section 816 00:44:15,800 --> 00:44:19,940 about adding a new language, that will show you 817 00:44:19,940 --> 00:44:25,400 what it looks like to add some custom styling rules to style50. 818 00:44:25,400 --> 00:44:27,740 In short, you just need to implement a function called 819 00:44:27,740 --> 00:44:31,028 style that takes as argument the code, and then determines how 820 00:44:31,028 --> 00:44:32,570 it is that you want to style the code. 821 00:44:32,570 --> 00:44:34,610 And you can see the examples for how we've 822 00:44:34,610 --> 00:44:38,290 done that for C and Python and a few other languages already. 823 00:44:38,290 --> 00:44:39,800 JOHN: Awesome, thank you. 824 00:44:39,800 --> 00:44:41,500 BRIAN YU: Yeah, of course. 825 00:44:41,500 --> 00:44:43,240 Let's go back to Ahmaud. 826 00:44:43,240 --> 00:44:44,110 AHMAUD: OK.
827 00:44:44,110 --> 00:44:49,870 I'm wondering, regarding the Scratch, the Scratch is 828 00:44:49,870 --> 00:44:52,630 a little bit different from other problem sets, 829 00:44:52,630 --> 00:44:57,520 so how is submit50 different in the case of Scratch, 830 00:44:57,520 --> 00:45:02,890 and what kind of customizations are we allowed to do here? 831 00:45:02,890 --> 00:45:05,210 BRIAN YU: Yeah, it's a good point. 832 00:45:05,210 --> 00:45:09,793 So when students are working on Scratch, at least in the context of CS50, 833 00:45:09,793 --> 00:45:12,960 at this point in time, they haven't yet been introduced to the command line, 834 00:45:12,960 --> 00:45:16,780 so we wouldn't expect them to run submit50 from the command line. 835 00:45:16,780 --> 00:45:20,590 So instead, we also have a front-end uploading interface 836 00:45:20,590 --> 00:45:25,870 built into submit.cs50.io where students can directly upload a file to submit, 837 00:45:25,870 --> 00:45:28,840 rather than running submit50, so there are multiple ways 838 00:45:28,840 --> 00:45:30,670 you can potentially submit work. 839 00:45:30,670 --> 00:45:33,580 But the front end of uploading the file is really 840 00:45:33,580 --> 00:45:36,680 doing the same thing that submit50 itself is doing, 841 00:45:36,680 --> 00:45:39,010 which is to say just pushing students' code 842 00:45:39,010 --> 00:45:42,280 to a branch of their me50 repository. 843 00:45:42,280 --> 00:45:46,090 So Scratch works in that way, where we just have students directly upload 844 00:45:46,090 --> 00:45:50,090 their Scratch project to submit50. 845 00:45:50,090 --> 00:45:53,110 And then in terms of running the automated correctness checks, 846 00:45:53,110 --> 00:45:55,390 this is another case where, with Scratch, it's 847 00:45:55,390 --> 00:45:58,390 not quite as simple as other programs where we're testing 848 00:45:58,390 --> 00:46:00,700 an input and testing an output.
849 00:46:00,700 --> 00:46:02,800 Because in our Scratch requirements, we say 850 00:46:02,800 --> 00:46:06,070 you need to have at least one sprite that isn't a cat, 851 00:46:06,070 --> 00:46:09,070 and you need to have at least one loop and at least one condition, 852 00:46:09,070 --> 00:46:10,270 for example. 853 00:46:10,270 --> 00:46:14,140 It turns out that the file format that Scratch uses, 854 00:46:14,140 --> 00:46:18,130 the .sb3 file format for the latest version of Scratch, 855 00:46:18,130 --> 00:46:24,820 is really just a zipped up package that includes a .json file that contains all 856 00:46:24,820 --> 00:46:27,400 of the details about the students' Scratch submission. 857 00:46:27,400 --> 00:46:31,360 So what we do when we grade Scratch projects is that we, in check50, 858 00:46:31,360 --> 00:46:36,210 are parsing that .json file and are searching through it to find particular 859 00:46:36,210 --> 00:46:37,460 things that we're looking for. 860 00:46:37,460 --> 00:46:43,010 So we're looking to see do they use a loop somewhere in that configuration? 861 00:46:43,010 --> 00:46:45,970 Do they use a sound somewhere, for example? 862 00:46:45,970 --> 00:46:52,210 A fun little tidbit is that they need to use a sprite that is not a cat, 863 00:46:52,210 --> 00:46:56,080 for example, and we were unsure for a little bit of time how to do that. 864 00:46:56,080 --> 00:46:58,567 We found that, at least in a previous version of the check, 865 00:46:58,567 --> 00:47:00,400 one of the ways that we could check for that 866 00:47:00,400 --> 00:47:03,160 is that we could check to see if any sprite did not 867 00:47:03,160 --> 00:47:06,250 have the ability to meow, because in Scratch by default, 868 00:47:06,250 --> 00:47:08,510 the cat has a meow sound. 
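A minimal Python sketch of the kind of inspection described here, not CS50's actual check. It assumes the archive stores its data in a file named project.json with a "targets" list, where each target has "isStage", "blocks" (each block carrying an "opcode"), and "sounds" entries with a "name", and that the default cat sound is named "Meow". The function names and the build_sample_sb3 helper are hypothetical and exist only for illustration.

```python
import io
import json
import zipfile

# Opcodes assumed to represent Scratch's loop blocks.
LOOP_OPCODES = {"control_repeat", "control_forever", "control_repeat_until"}


def load_project(sb3_bytes):
    """Return the parsed project.json stored inside an .sb3 archive."""
    with zipfile.ZipFile(io.BytesIO(sb3_bytes)) as archive:
        return json.loads(archive.read("project.json"))


def uses_loop(project):
    """True if any target (stage or sprite) contains a loop block."""
    for target in project["targets"]:
        for block in target["blocks"].values():
            # Some entries may not be dicts, so guard before reading the opcode.
            if isinstance(block, dict) and block.get("opcode") in LOOP_OPCODES:
                return True
    return False


def has_non_meowing_sprite(project):
    """True if some sprite (not the stage) has no sound named "Meow"."""
    for target in project["targets"]:
        if target.get("isStage"):
            continue
        sound_names = {sound["name"] for sound in target.get("sounds", [])}
        if "Meow" not in sound_names:
            return True
    return False


def build_sample_sb3(targets):
    """Demo helper: pack a minimal project.json into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("project.json", json.dumps({"targets": targets}))
    return buffer.getvalue()
```

A real check would read the student's uploaded .sb3 file from disk rather than building one in memory, and would likely look at more than these two properties.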
869 00:47:08,510 --> 00:47:11,110 So in one of the original versions of check50, at least, 870 00:47:11,110 --> 00:47:12,880 we would search through the students' code 871 00:47:12,880 --> 00:47:14,922 and make sure they had a sprite that didn't meow, 872 00:47:14,922 --> 00:47:19,210 and that was how we knew that they had a non-cat sprite somewhere in their code. 873 00:47:19,210 --> 00:47:22,150 All of that is open-source if you'd like to take a look at our checks 874 00:47:22,150 --> 00:47:24,150 for how we check through Scratch, and I'm 875 00:47:24,150 --> 00:47:26,150 happy to talk through that file format because I 876 00:47:26,150 --> 00:47:28,608 know it's a little bit confusing the first time you see it. 877 00:47:28,608 --> 00:47:31,140 878 00:47:31,140 --> 00:47:32,440 Shefket, question. 879 00:47:32,440 --> 00:47:36,900 SPEAKER 7: OK, thank you. 880 00:47:36,900 --> 00:47:40,760 Following the question of Ahmaud, does it-- 881 00:47:40,760 --> 00:47:44,280 any tool established to check for security projects or files 882 00:47:44,280 --> 00:47:45,850 submitted there? 883 00:47:45,850 --> 00:47:50,340 And the second one, for plagiarism in general. 884 00:47:50,340 --> 00:47:53,410 BRIAN YU: Yeah, so for checking security projects. 885 00:47:53,410 --> 00:47:56,985 If students are writing, I don't know, security-related projects 886 00:47:56,985 --> 00:47:59,860 or doing cryptography assignments-- is that what you're referring to? 887 00:47:59,860 --> 00:48:03,920 If so, check50 is designed to be as customizable as you want. 888 00:48:03,920 --> 00:48:06,580 As I mentioned a couple of times, each check 889 00:48:06,580 --> 00:48:10,510 itself is just a Python function, and so anything 890 00:48:10,510 --> 00:48:14,290 that you could write a Python function to do you could 891 00:48:14,290 --> 00:48:16,725 write a check50 check to check for it.
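The idea that a check is just a function, and that raising an exception is how a failure gets reported, can be sketched with a toy runner. This is not check50's real API (check50 supplies its own decorator and failure exception); CheckFailure, run_checks, and the two example checks are hypothetical names used only to show the pattern.

```python
class CheckFailure(Exception):
    """Raised by a check to signal that the submission failed it."""


def check_output_nonempty(output):
    """Fail if the program printed nothing."""
    if not output.strip():
        raise CheckFailure("expected some output")


def check_output_is_sorted(output):
    """Fail if the whitespace-separated integers are not in ascending order."""
    numbers = [int(token) for token in output.split()]
    if numbers != sorted(numbers):
        raise CheckFailure("expected numbers in ascending order")


def run_checks(checks, output):
    """Run each check; map function name to True (passed) or False (failed)."""
    results = {}
    for check in checks:
        try:
            check(output)
            results[check.__name__] = True
        except CheckFailure:
            results[check.__name__] = False
    return results
```

Because a check is an ordinary function, it can test anything expressible in Python, for instance that a cryptography assignment decrypts a known ciphertext correctly.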
892 00:48:16,725 --> 00:48:18,850 And ultimately, what that function just needs to do 893 00:48:18,850 --> 00:48:22,480 is, for example, raise an exception if the check fails, 894 00:48:22,480 --> 00:48:25,930 and that is how check50 will know whether the check has passed or failed. 895 00:48:25,930 --> 00:48:28,490 So regardless of the domain in which the project falls, 896 00:48:28,490 --> 00:48:30,940 there is likely a way to write check50 checks to be 897 00:48:30,940 --> 00:48:34,950 able to integrate nicely with that. 898 00:48:34,950 --> 00:48:36,220 SPEAKER 7: Thank you, Brian. 899 00:48:36,220 --> 00:48:37,512 BRIAN YU: Yeah, of course. 900 00:48:37,512 --> 00:48:38,470 Was there another hand? 901 00:48:38,470 --> 00:48:41,480 I thought I saw another hand, but maybe it went down. 902 00:48:41,480 --> 00:48:48,805 Other questions about things related to check50 or submit50 or submit.cs50.io? 903 00:48:48,805 --> 00:48:53,860 904 00:48:53,860 --> 00:48:55,620 Patrick? 905 00:48:55,620 --> 00:48:56,525 PATRICK: Yes, hi. 906 00:48:56,525 --> 00:49:01,800 I had a question regarding weighting the marks when doing grading. 907 00:49:01,800 --> 00:49:07,920 Is it possible to somehow make some checks more important than others? 908 00:49:07,920 --> 00:49:10,680 BRIAN YU: All we do is we run all of the checks 909 00:49:10,680 --> 00:49:15,710 and store the results of those checks, and what submit.cs50.io's interface 910 00:49:15,710 --> 00:49:19,080 will show you is the number of checks that were passed out 911 00:49:19,080 --> 00:49:21,150 of the total number of checks.
912 00:49:21,150 --> 00:49:26,650 That being said, you do have access to all of the raw check results, as well, 913 00:49:26,650 --> 00:49:30,120 such that you could request via API call to get 914 00:49:30,120 --> 00:49:32,190 the results for all of the correctness checks 915 00:49:32,190 --> 00:49:35,790 to be able to see individually the check50 results for which checks 916 00:49:35,790 --> 00:49:38,110 are passing and failing, for example. 917 00:49:38,110 --> 00:49:40,800 And so based on that, you could then decide 918 00:49:40,800 --> 00:49:45,000 if you want to weight certain checks as more valuable than other checks. 919 00:49:45,000 --> 00:49:47,640 And the unique identifier we use for those checks 920 00:49:47,640 --> 00:49:50,950 is just the name of the function for each of the checks, 921 00:49:50,950 --> 00:49:54,450 because each check is really just a Python function. 922 00:49:54,450 --> 00:49:56,500 In fact, the easiest way to do this, for example, 923 00:49:56,500 --> 00:49:58,560 if you're running check50 locally, is that you 924 00:49:58,560 --> 00:50:01,260 can adjust check50's output format. 925 00:50:01,260 --> 00:50:05,130 Via command-line argument, if you look at check50 itself, 926 00:50:05,130 --> 00:50:08,220 by default, check50 will output the smiley 927 00:50:08,220 --> 00:50:10,870 faces with the names of the individual checks. 928 00:50:10,870 --> 00:50:14,730 But you can also have check50 configured to output in machine-readable mode, 929 00:50:14,730 --> 00:50:18,570 where it's really just going to output some JSON data with all of the checks 930 00:50:18,570 --> 00:50:21,660 and whether they passed or failed, along with an identifier for each 931 00:50:21,660 --> 00:50:23,640 of the checks based on their function name.
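One way to turn that machine-readable output into a weighted grade is sketched below in Python. The report shape is an assumption for illustration: a top-level "results" list whose entries carry a "name" (the check's function name) and a "passed" flag. The weighted_score function and the check names in the usage example are hypothetical; compare the field names against the JSON your installed check50 actually emits before relying on this.

```python
import json


def weighted_score(report_json, weights, default_weight=1.0):
    """Return earned weight divided by total weight, as a value in [0, 1].

    weights maps a check's function name to its weight; checks that are
    not listed count with default_weight.
    """
    report = json.loads(report_json)
    earned = 0.0
    total = 0.0
    for result in report["results"]:
        weight = weights.get(result["name"], default_weight)
        total += weight
        # Only an explicit pass earns credit; a false or null value earns none.
        if result["passed"]:
            earned += weight
    return earned / total if total else 0.0
```

For example, with three checks where only a check named prints_hello fails and that check is given weight 3, the score is 2 out of 5 rather than 2 out of 3.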
932 00:50:23,640 --> 00:50:25,770 And so using that information, if you wanted 933 00:50:25,770 --> 00:50:28,890 to weight certain checks as more valuable than others, 934 00:50:28,890 --> 00:50:31,890 you could just multiply the values of particular checks 935 00:50:31,890 --> 00:50:33,830 in order to get the results you want. 936 00:50:33,830 --> 00:50:38,780 937 00:50:38,780 --> 00:50:40,050 Other things here? 938 00:50:40,050 --> 00:50:44,680 939 00:50:44,680 --> 00:50:48,670 All right, a few other things I'll note about submit.cs50.io. 940 00:50:48,670 --> 00:50:53,030 One is the comments link on the right-hand side here. 941 00:50:53,030 --> 00:50:59,480 So I talked about check50, style50, as well as, now, comments. 942 00:50:59,480 --> 00:51:02,770 One of the advantages of using GitHub for all of our submissions 943 00:51:02,770 --> 00:51:07,630 is that it makes it very easy to comment on a student's submission. 944 00:51:07,630 --> 00:51:09,700 When you click on 0 comments, for example, 945 00:51:09,700 --> 00:51:13,870 you'll be taken to a page in GitHub, so this is now GitHub's user interface 946 00:51:13,870 --> 00:51:17,440 and not our own, where you'll see the student's code. 947 00:51:17,440 --> 00:51:19,870 If you want to comment on an individual line, 948 00:51:19,870 --> 00:51:22,690 you can use GitHub's inline commenting abilities, 949 00:51:22,690 --> 00:51:26,390 where, next to an individual line, I could, for example, 950 00:51:26,390 --> 00:51:30,190 click on the plus button to say I'd like to add a comment to this line. 951 00:51:30,190 --> 00:51:32,260 When you click on that, a little text field 952 00:51:32,260 --> 00:51:35,280 will open up where you can then write a comment to them. 953 00:51:35,280 --> 00:51:37,600 And because this is all integrated into GitHub, 954 00:51:37,600 --> 00:51:40,420 you get all of the nice GitHub commenting features for free.
955 00:51:40,420 --> 00:51:43,150 So when you add this comment, students are automatically 956 00:51:43,150 --> 00:51:46,732 notified by email of that new comment, so they'll be able to see it. 957 00:51:46,732 --> 00:51:49,690 And they can also start a threaded discussion, where a student can then 958 00:51:49,690 --> 00:51:52,690 reply to the comment, and you can reply to it, as well, in order 959 00:51:52,690 --> 00:51:56,200 to have a conversation about students' code right inside of GitHub 960 00:51:56,200 --> 00:51:58,570 interface itself. 961 00:51:58,570 --> 00:52:00,970 This feature, we hope, will help to make it easier 962 00:52:00,970 --> 00:52:03,190 for you to provide feedback on students' code 963 00:52:03,190 --> 00:52:05,770 by centralizing into one place where you can find 964 00:52:05,770 --> 00:52:09,310 the results of the correctness checks, the results of the style checks, 965 00:52:09,310 --> 00:52:12,460 and also the place where you can leave feedback on students' code, 966 00:52:12,460 --> 00:52:14,680 as well, by seeing all the code in one place, 967 00:52:14,680 --> 00:52:19,600 and then being able to provide inline comments on it, too. 968 00:52:19,600 --> 00:52:23,560 So that is a brief overview of now submit.cs50.io, 969 00:52:23,560 --> 00:52:25,950 in addition to submit50 and check50. 970 00:52:25,950 --> 00:52:28,930 I'll pause before moving into our final tool of the day 971 00:52:28,930 --> 00:52:30,460 to take a few questions. 972 00:52:30,460 --> 00:52:33,190 973 00:52:33,190 --> 00:52:35,180 There is a teacher's email list? 974 00:52:35,180 --> 00:52:40,610 Yes, there is a teacher's email list-- teachers@list.cs50.harvard.edu. 975 00:52:40,610 --> 00:52:44,000 We will send some emails to give you a way to sign up for that list 976 00:52:44,000 --> 00:52:46,500 if you're not already on it. 977 00:52:46,500 --> 00:52:49,640 Arturo, if you'd like to ask a question. 978 00:52:49,640 --> 00:52:51,670 ARTURO: Hey. 
979 00:52:51,670 --> 00:52:56,540 Is there any way to create a batch? 980 00:52:56,540 --> 00:52:57,510 The problem is this-- 981 00:52:57,510 --> 00:53:00,420 in this community, the internet is very slow, 982 00:53:00,420 --> 00:53:04,340 and if I want the students to go through CS50, 983 00:53:04,340 --> 00:53:07,380 create the GitHub and everything, all of them [INAUDIBLE] 984 00:53:07,380 --> 00:53:09,720 be able to do it at once? 985 00:53:09,720 --> 00:53:14,100 So is there any way to do a batch submission 986 00:53:14,100 --> 00:53:20,490 where do they give me their software, their programs, and upload everything 987 00:53:20,490 --> 00:53:25,680 at once so it can be checked or they can go through CS50? 988 00:53:25,680 --> 00:53:29,725 I don't know if I expressed the question well. 989 00:53:29,725 --> 00:53:32,400 BRIAN YU: Yeah, all a submission is as far 990 00:53:32,400 --> 00:53:36,450 as CS50's submit tools are concerned is that a submission is just 991 00:53:36,450 --> 00:53:40,660 a push to a particular branch of a GitHub repository. 992 00:53:40,660 --> 00:53:43,590 But because we don't expect that first-year computer science 993 00:53:43,590 --> 00:53:48,470 students have any familiarity with GitHub, submit50 as a command-line tool 994 00:53:48,470 --> 00:53:51,990 is really just a wrapper around a bunch of git commands 995 00:53:51,990 --> 00:53:55,140 that are pushing students' code to GitHub, without the students needing 996 00:53:55,140 --> 00:53:58,560 to know what git or GitHub or repositories or branches or commits 997 00:53:58,560 --> 00:53:59,460 are at all. 998 00:53:59,460 --> 00:54:01,680 We just abstract all of that away from students 999 00:54:01,680 --> 00:54:03,780 so they don't have to worry about it.
1000 00:54:03,780 --> 00:54:07,860 But because it's just pushing something to a branch, 1001 00:54:07,860 --> 00:54:11,580 while we don't have a script that automates it already, 1002 00:54:11,580 --> 00:54:13,770 it's something that anyone theoretically could 1003 00:54:13,770 --> 00:54:17,550 write to write a script that just pushes a whole bunch of submissions to GitHub, 1004 00:54:17,550 --> 00:54:21,540 because all that you need to do to submit, as far as CS50 is concerned, 1005 00:54:21,540 --> 00:54:26,430 is push that code to the corresponding problem branch of the student's GitHub 1006 00:54:26,430 --> 00:54:27,540 repository. 1007 00:54:27,540 --> 00:54:29,910 So we don't have a tool itself to do a bulk upload, 1008 00:54:29,910 --> 00:54:32,880 but it would be possible to be able to create it. 1009 00:54:32,880 --> 00:54:36,467 And if you are interested in doing that and reach out to the sysadmins, 1010 00:54:36,467 --> 00:54:38,550 we can talk you through what that might look like. 1011 00:54:38,550 --> 00:54:45,080 1012 00:54:45,080 --> 00:54:50,150 Other questions about any of these tools now? 1013 00:54:50,150 --> 00:54:51,940 I thought I saw another hand raised. 1014 00:54:51,940 --> 00:54:54,482 Maybe it went down, maybe your question was already answered. 1015 00:54:54,482 --> 00:54:59,600 1016 00:54:59,600 --> 00:55:03,290 In that case, let's finally move on to the last tool 1017 00:55:03,290 --> 00:55:06,560 that we'll be talking about today, which is compare50. 1018 00:55:06,560 --> 00:55:09,450 A couple people today have already asked me about academic honesty 1019 00:55:09,450 --> 00:55:10,910 and what's involved there. 1020 00:55:10,910 --> 00:55:13,370 compare50 is a tool designed for this purpose. 1021 00:55:13,370 --> 00:55:18,050 compare50 is a command-line tool for detecting similarity 1022 00:55:18,050 --> 00:55:19,490 within students' code. 
1023 00:55:19,490 --> 00:55:22,970 So ultimately, compare50 is a tool that will 1024 00:55:22,970 --> 00:55:25,370 take a whole bunch of students' submissions, 1025 00:55:25,370 --> 00:55:28,340 and when you run compare50, compare50 will 1026 00:55:28,340 --> 00:55:32,870 examine all pairs of submissions looking for pairs of submissions 1027 00:55:32,870 --> 00:55:38,840 that are unusually similar and might be an instance of plagiarism, for example. 1028 00:55:38,840 --> 00:55:41,890 It will try to highlight those very similar submissions, 1029 00:55:41,890 --> 00:55:45,750 those pairs of submissions that share a lot in common, 1030 00:55:45,750 --> 00:55:49,970 and try to draw that to your attention so that you can see it, as well. 1031 00:55:49,970 --> 00:55:53,870 The way that you might install compare50 is the same way 1032 00:55:53,870 --> 00:55:55,730 you might install other CS50 tools. 1033 00:55:55,730 --> 00:56:00,870 You can run pip3 install compare50 on your computer, for example. 1034 00:56:00,870 --> 00:56:04,010 And then the way we generally structure compare50 1035 00:56:04,010 --> 00:56:07,330 is to have you create one folder for each student, 1036 00:56:07,330 --> 00:56:09,800 and inside of that folder, you put all of the code 1037 00:56:09,800 --> 00:56:14,578 that you want to check for similarity against other student submissions. 1038 00:56:14,578 --> 00:56:17,120 And so what you might do, then, is inside your terminal here, 1039 00:56:17,120 --> 00:56:20,330 I've shown you an example just to show you what it might look like. 1040 00:56:20,330 --> 00:56:24,650 I can run compare50 * to say, check all of the submissions in this current 1041 00:56:24,650 --> 00:56:25,520 folder.
1042 00:56:25,520 --> 00:56:28,670 It is going to compare all of the submissions against each other, 1043 00:56:28,670 --> 00:56:33,440 score them, and then output a web page that I can visit in order 1044 00:56:33,440 --> 00:56:37,280 to see the results of compare50 itself. 1045 00:56:37,280 --> 00:56:39,470 So what does that ultimately look like? 1046 00:56:39,470 --> 00:56:42,920 Well, if I click on that URL at the bottom, where 1047 00:56:42,920 --> 00:56:47,880 it says visit this particular HTML page, what you will see 1048 00:56:47,880 --> 00:56:51,840 is a page that looks like this, where, along the left-hand side, 1049 00:56:51,840 --> 00:56:56,060 you will see all of the top matches for pairs of submissions. 1050 00:56:56,060 --> 00:56:59,540 And by default, we output something like the 50 top matches, 1051 00:56:59,540 --> 00:57:01,670 but that's configurable, where you can decide which 1052 00:57:01,670 --> 00:57:03,950 top matches you actually want to see. 1053 00:57:03,950 --> 00:57:08,600 And we score each of them based on how similar those submissions actually are, 1054 00:57:08,600 --> 00:57:11,000 where 10 means they're virtually identical, it's 1055 00:57:11,000 --> 00:57:15,200 the highest possible score, down to 1, meaning 1056 00:57:15,200 --> 00:57:17,750 they're basically not similar at all. 1057 00:57:17,750 --> 00:57:22,190 So we'll score them from 1 to 10 based on how similar these submissions happen 1058 00:57:22,190 --> 00:57:23,610 to be. 1059 00:57:23,610 --> 00:57:25,980 And then on the right-hand side of this window, 1060 00:57:25,980 --> 00:57:32,240 you'll see a graph, some visualization of clusters of students that 1061 00:57:32,240 --> 00:57:34,400 happen to have submitted similar work. 
1062 00:57:34,400 --> 00:57:36,960 So oftentimes, in cases of collaboration, 1063 00:57:36,960 --> 00:57:40,250 it might not just be two students that are collaborating with each other that 1064 00:57:40,250 --> 00:57:44,120 might cross a line potentially, but it could be three or four students working 1065 00:57:44,120 --> 00:57:47,360 in a cluster that are all sharing code, for example, where that might 1066 00:57:47,360 --> 00:57:49,640 have crossed some line of academic honesty 1067 00:57:49,640 --> 00:57:51,410 that you might want to be mindful of. 1068 00:57:51,410 --> 00:57:54,170 And what we've tried to do on the right-hand side of this window 1069 00:57:54,170 --> 00:57:57,320 is demonstrate that visualization just to show you 1070 00:57:57,320 --> 00:57:59,810 how these clusters are connected. 1071 00:57:59,810 --> 00:58:02,510 And then along the bottom, you'll see a bit of a slider, 1072 00:58:02,510 --> 00:58:08,690 and that slider indicates the threshold for how strong of a match 1073 00:58:08,690 --> 00:58:12,300 you want to actually be able to view on this page. 1074 00:58:12,300 --> 00:58:15,680 So right now, the threshold is set at the minimum, 1, just at the far left 1075 00:58:15,680 --> 00:58:20,910 of that slider, so we're seeing all 50 of the top matches in this case. 1076 00:58:20,910 --> 00:58:23,910 But maybe that's more matches than we actually want to see, 1077 00:58:23,910 --> 00:58:26,690 so we might drag that threshold to the right 1078 00:58:26,690 --> 00:58:30,500 to increase the threshold for how similar we want something to be 1079 00:58:30,500 --> 00:58:32,370 before we actually pay attention to it. 
1080 00:58:32,370 --> 00:58:36,080 So by dragging that threshold, you can watch, and what you'll see 1081 00:58:36,080 --> 00:58:39,090 is that the list of submissions begins to narrow down, 1082 00:58:39,090 --> 00:58:41,090 and on the right-hand side, you'll see now we're 1083 00:58:41,090 --> 00:58:46,340 left with only the submissions that have a score above, like, a 4, for example. 1084 00:58:46,340 --> 00:58:50,180 And we see here, we have one pair of two submissions at that bottom cluster 1085 00:58:50,180 --> 00:58:52,850 down below, and up a little bit higher, you 1086 00:58:52,850 --> 00:58:55,340 see a cluster of three students who all appear 1087 00:58:55,340 --> 00:58:58,310 to have similar code to one another. 1088 00:58:58,310 --> 00:59:02,330 Those clusters are color-coded such that you can see on the right-hand side 1089 00:59:02,330 --> 00:59:04,820 the colors that correspond to each cluster, 1090 00:59:04,820 --> 00:59:07,850 and then on the left-hand side next to each of the pairs, 1091 00:59:07,850 --> 00:59:10,940 you'll also see the color that corresponds with that cluster, 1092 00:59:10,940 --> 00:59:13,790 just to make it visually easier for you the teacher to be 1093 00:59:13,790 --> 00:59:16,730 able to see where it is these clusters might exist 1094 00:59:16,730 --> 00:59:20,870 and what scores they're getting in terms of how similar these submissions happen 1095 00:59:20,870 --> 00:59:22,390 to be. 1096 00:59:22,390 --> 00:59:25,080 So when you then go into a particular submission 1097 00:59:25,080 --> 00:59:28,560 and click on the submission to view it in more detail, what you will then see 1098 00:59:28,560 --> 00:59:32,020 is a side-by-side comparison of the two students. 1099 00:59:32,020 --> 00:59:35,790 So here, we're seeing one student on the left and another student on the right, 1100 00:59:35,790 --> 00:59:39,210 and looking at where it is that these two submissions are similar or not.
1101 00:59:39,210 --> 00:59:43,860 And we allow you different modes for checking the similarity of submissions. 1102 00:59:43,860 --> 00:59:46,710 So right now, we're in what's known as text mode that 1103 00:59:46,710 --> 00:59:49,740 is literally just looking at the text of the submission, 1104 00:59:49,740 --> 00:59:52,680 and looking for sequences of characters that happen 1105 00:59:52,680 --> 00:59:54,390 to match between the two submissions. 1106 00:59:54,390 --> 00:59:58,020 So what you're seeing highlighted are sequences 1107 00:59:58,020 --> 01:00:03,870 of characters that are the same, both with the student on the left 1108 01:00:03,870 --> 01:00:05,720 and with the student on the right. 1109 01:00:05,720 --> 01:00:09,590 So in that case, you're seeing any of the similarities between these two 1110 01:00:09,590 --> 01:00:12,860 submissions that are highlighted, just to make it a little bit more obvious 1111 01:00:12,860 --> 01:00:15,320 to you, the teacher, who's looking at all of this, 1112 01:00:15,320 --> 01:00:17,750 but there are other modes of comparison that you 1113 01:00:17,750 --> 01:00:21,320 can jump into, as well, if you want to view this data in different ways. 1114 01:00:21,320 --> 01:00:23,870 In particular, something that we sometimes will see 1115 01:00:23,870 --> 01:00:26,840 is that a student will copy another student's code, 1116 01:00:26,840 --> 01:00:29,960 but then change all the variable names, for example, 1117 01:00:29,960 --> 01:00:36,710 or add some spaces, such that the text of the submission is now different. 1118 01:00:36,710 --> 01:00:38,310 The variable names have changed. 1119 01:00:38,310 --> 01:00:41,480 They're not going to match big sections of code 1120 01:00:41,480 --> 01:00:45,680 to be identical in terms of the text, even though structurally, these two 1121 01:00:45,680 --> 01:00:48,200 programs are identical, save for just some changes 1122 01:00:48,200 --> 01:00:50,900 in what the names of the variables are. 
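Why renaming variables defeats a purely textual comparison, while a structural comparison still matches, can be illustrated with a short Python sketch. This is not compare50's algorithm; it swaps in Python's standard tokenize module as a stand-in, replacing every identifier with a placeholder and discarding comments and line breaks so that only the program's shape remains. The normalize function is a hypothetical name.

```python
import io
import keyword
import tokenize

# Token types that carry no structural information for this comparison.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER}


def normalize(source):
    """Return the token sequence of Python source with identifiers erased."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            # Keep block structure, but not the exact indentation width.
            tokens.append(tokenize.tok_name[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Every variable or function name becomes the same placeholder.
            tokens.append("ID")
        else:
            tokens.append(tok.string)
    return tokens


original = "total = 0\nfor n in nums:\n    total += n\n"
renamed = "s = 0  # sum\nfor x in values:\n        s += x\n"
```

Here original and renamed differ as text, yet normalize(original) and normalize(renamed) produce the same token sequence, which is the kind of match a structure-oriented view is meant to surface.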
1123 01:00:50,900 --> 01:00:54,470 So in addition to a text-based pass, you'll see on the left-hand side 1124 01:00:54,470 --> 01:00:57,440 that I can switch what mode I'm currently looking at. 1125 01:00:57,440 --> 01:01:01,910 I can move from text mode, for example, into structure mode, where 1126 01:01:01,910 --> 01:01:06,137 now, in structure mode, I'm comparing not just the text of the submission, 1127 01:01:06,137 --> 01:01:07,970 but the overall structure of the submission. 1128 01:01:07,970 --> 01:01:11,195 And here, we see that these two submissions are structurally identical. 1129 01:01:11,195 --> 01:01:13,820 They might have changed or varied in some of the variable names 1130 01:01:13,820 --> 01:01:16,880 or in some of the spacing, but in terms of how the program overall 1131 01:01:16,880 --> 01:01:21,847 is structured, these two submissions are basically the same. 1132 01:01:21,847 --> 01:01:23,930 So we allow you to switch between those modes just 1133 01:01:23,930 --> 01:01:27,080 to get different views into the information that you're looking at 1134 01:01:27,080 --> 01:01:31,040 and bring to your attention different things that might catch your eye. 1135 01:01:31,040 --> 01:01:35,020 compare50 is just going to provide to you all of these top matches, 1136 01:01:35,020 --> 01:01:37,070 but then it's usually up to you, the human, 1137 01:01:37,070 --> 01:01:39,920 to actually go through and decide for any given pair 1138 01:01:39,920 --> 01:01:44,680 if it actually crossed some line or not, for example. 1139 01:01:44,680 --> 01:01:46,920 Ahmaud, question about this. 1140 01:01:46,920 --> 01:01:47,810 AHMAUD: Yes. 1141 01:01:47,810 --> 01:01:50,640 From your experience, Brian, you and the team, 1142 01:01:50,640 --> 01:01:57,290 what would be a fair threshold for compare50? 1143 01:01:57,290 --> 01:02:01,430 Since there will be similarities, after all, if students 1144 01:02:01,430 --> 01:02:05,920 are submitting the same problem set. 
1145 01:02:05,920 --> 01:02:07,710 I know this can be different. 1146 01:02:07,710 --> 01:02:10,638 For example, hello is going to be very similar, 1147 01:02:10,638 --> 01:02:12,680 but for others, they're going to be less similar. 1148 01:02:12,680 --> 01:02:20,085 But what would be a fair threshold for similarities? 1149 01:02:20,085 --> 01:02:22,460 BRIAN YU: I'm reluctant to give an exact number because I 1150 01:02:22,460 --> 01:02:25,190 think it's going to vary a lot based on the pair of submissions 1151 01:02:25,190 --> 01:02:27,410 and based on the particular problem. 1152 01:02:27,410 --> 01:02:29,690 Certainly, with the problems we assign, there 1153 01:02:29,690 --> 01:02:32,400 are multiple ways to solve these problems. 1154 01:02:32,400 --> 01:02:36,470 Oftentimes, you're going to find similarities just by chance. 1155 01:02:36,470 --> 01:02:39,500 Or just naturally, because of common approaches to the problem, 1156 01:02:39,500 --> 01:02:42,680 multiple submissions are going to look similar to one another. 1157 01:02:42,680 --> 01:02:44,600 Generally, what we're looking for, though, 1158 01:02:44,600 --> 01:02:47,840 is compare50 will score the submissions for us, 1159 01:02:47,840 --> 01:02:51,950 and then we'll go through in order and look for ones where something 1160 01:02:51,950 --> 01:02:54,740 stands out as particularly suspicious. 1161 01:02:54,740 --> 01:02:58,340 Oftentimes, that's something like a comment written 1162 01:02:58,340 --> 01:03:01,370 in English that happens to be identically worded 1163 01:03:01,370 --> 01:03:04,310 between one submission and another in exactly the same place 1164 01:03:04,310 --> 01:03:07,970 where it would be very unlikely for that thing to happen by chance. 1165 01:03:07,970 --> 01:03:11,120 Or we'll look for a couple of such suspicious items that happen 1166 01:03:11,120 --> 01:03:14,155 multiple times throughout a submission.
1167 01:03:14,155 --> 01:03:16,280 This is also something that we'll occasionally look 1168 01:03:16,280 --> 01:03:18,470 at across multiple submissions, too. 1169 01:03:18,470 --> 01:03:21,350 If we notice unusual similarities in one problem, 1170 01:03:21,350 --> 01:03:25,610 and then we notice unusual similarities in a problem the following week 1171 01:03:25,610 --> 01:03:28,070 between the same pair of students, then that 1172 01:03:28,070 --> 01:03:29,990 might be additional indication, as well. 1173 01:03:29,990 --> 01:03:33,230 And compare50 is not designed to make these determinations for you, 1174 01:03:33,230 --> 01:03:35,780 it's just designed to highlight to you the things that you 1175 01:03:35,780 --> 01:03:39,500 might want to pay attention to such that you can then go in and use your, 1176 01:03:39,500 --> 01:03:42,860 hopefully, better human judgment to be able to make those assessments, 1177 01:03:42,860 --> 01:03:44,720 as well. 1178 01:03:44,720 --> 01:03:47,540 If two students submitted the identical code, then, 1179 01:03:47,540 --> 01:03:52,190 as someone asked in the chat, you would just literally see a perfect match, 1180 01:03:52,190 --> 01:03:55,250 where structurally, textually, in terms of the exact match, 1181 01:03:55,250 --> 01:03:59,770 they have all the exact, same text. 1182 01:03:59,770 --> 01:04:01,560 Oleg, question? 1183 01:04:01,560 --> 01:04:03,510 OLEG: Yes. 1184 01:04:03,510 --> 01:04:07,920 As for now, the documentation says that this tool, compare50, 1185 01:04:07,920 --> 01:04:10,742 is available locally. 1186 01:04:10,742 --> 01:04:11,700 You have to install it. 1187 01:04:11,700 --> 01:04:16,830 And at the same time, I remember you were telling that at some point, 1188 01:04:16,830 --> 01:04:19,080 it might become a web version. 1189 01:04:19,080 --> 01:04:21,682 Do you have any updates on that? 1190 01:04:21,682 --> 01:04:23,140 BRIAN YU: No updates at the moment. 
1191 01:04:23,140 --> 01:04:26,100 It's certainly something that we're thinking about. 1192 01:04:26,100 --> 01:04:30,690 For now, given some of the processing required in order 1193 01:04:30,690 --> 01:04:34,440 to get compare50 to run, and we also want to be mindful about things like privacy 1194 01:04:34,440 --> 01:04:37,770 with student submissions, so at the moment, 1195 01:04:37,770 --> 01:04:40,980 we're only currently making compare50 available as a command-line 1196 01:04:40,980 --> 01:04:43,037 tool for usage. 1197 01:04:43,037 --> 01:04:45,870 Thinking about allowing a web version is something we thought about, 1198 01:04:45,870 --> 01:04:49,760 but no current plans, or nothing active there at the moment. 1199 01:04:49,760 --> 01:04:54,230 OLEG: So for this summer session, it's fair to say that it won't be online. 1200 01:04:54,230 --> 01:04:55,980 BRIAN YU: For this summer, certainly, yes. 1201 01:04:55,980 --> 01:04:58,620 It's just going to be available as a command-line tool. 1202 01:04:58,620 --> 01:05:01,510 You can find the source code for that command-line tool here. 1203 01:05:01,510 --> 01:05:03,840 I've just pasted the GitHub link in the chat. 1204 01:05:03,840 --> 01:05:06,180 It's entirely open-source so you can see 1205 01:05:06,180 --> 01:05:08,010 how exactly it's doing that comparison. 1206 01:05:08,010 --> 01:05:11,470 You can add to it if you would like to, as well. 1207 01:05:11,470 --> 01:05:13,950 OLEG: Thank you. 1208 01:05:13,950 --> 01:05:16,320 BRIAN YU: Let's go to Joseph now. 1209 01:05:16,320 --> 01:05:18,390 Question from Joseph? 1210 01:05:18,390 --> 01:05:21,420 JOSEPH: This was what I was trying to ask earlier. 1211 01:05:21,420 --> 01:05:28,635 Does the tool look in the internet-- for example, Stack Overflow-- or scours 1212 01:05:28,635 --> 01:05:31,710 a huge database to compare code? 1213 01:05:31,710 --> 01:05:36,270 Or is it just limited to work that is submitted by other students?
1214 01:05:36,270 --> 01:05:42,060 Because my thinking is English papers, when they check for plagiarism, 1215 01:05:42,060 --> 01:05:44,790 they go through the internet. 1216 01:05:44,790 --> 01:05:48,370 But with this case, you are kind of encouraged 1217 01:05:48,370 --> 01:05:52,620 to be resourceful and search and use Google 1218 01:05:52,620 --> 01:05:55,992 because, at work, for example, we're always using Google. 1219 01:05:55,992 --> 01:05:58,200 There is always a new problem that you have to solve, 1220 01:05:58,200 --> 01:06:01,790 and Google is almost better than documentation. 1221 01:06:01,790 --> 01:06:04,350 So that's my question, how do you balance the two? 1222 01:06:04,350 --> 01:06:07,620 Because you want to give them a skill that is marketable. 1223 01:06:07,620 --> 01:06:10,810 When they go to work, they could write real programs, 1224 01:06:10,810 --> 01:06:14,310 and that's what we do every day when we're writing real programs. 1225 01:06:14,310 --> 01:06:15,600 BRIAN YU: Yeah, absolutely. 1226 01:06:15,600 --> 01:06:19,080 It's a balance that we have to work with students to make sure they understand. 1227 01:06:19,080 --> 01:06:23,160 Generally, our philosophy has been that if you are looking up 1228 01:06:23,160 --> 01:06:30,870 how to solve some narrowly defined problem that is part of the submission 1229 01:06:30,870 --> 01:06:32,070 that you're working on-- 1230 01:06:32,070 --> 01:06:35,250 for example, in Caesar if you're looking for how do 1231 01:06:35,250 --> 01:06:38,940 I check if a character is uppercase or not, and you borrow a snippet of code 1232 01:06:38,940 --> 01:06:44,970 for how you use the isupper function in C to check if a character is uppercase, 1233 01:06:44,970 --> 01:06:47,040 that we consider to be reasonable. 
1234 01:06:47,040 --> 01:06:49,710 You're looking and borrowing a snippet of code 1235 01:06:49,710 --> 01:06:52,150 that is not solving the entire problem for you, 1236 01:06:52,150 --> 01:06:54,570 but is just helping you to solve a piece of it. 1237 01:06:54,570 --> 01:06:56,790 What we tell students they shouldn't be doing 1238 01:06:56,790 --> 01:07:00,510 is Googling for solutions to the problem set itself. 1239 01:07:00,510 --> 01:07:02,790 If they were to Google, for example, how do 1240 01:07:02,790 --> 01:07:07,710 I write C code to perform a Caesar cipher on a string, 1241 01:07:07,710 --> 01:07:10,860 or to rotate a string by a certain number of characters, 1242 01:07:10,860 --> 01:07:13,590 that would probably be crossing some line, where they're really 1243 01:07:13,590 --> 01:07:18,690 taking the entirety of the solution and using that as their own submission. 1244 01:07:18,690 --> 01:07:20,880 And so we have an academic honesty policy 1245 01:07:20,880 --> 01:07:24,830 that delineates the types of behaviors that we consider to be reasonable 1246 01:07:24,830 --> 01:07:27,330 and that we consider to be not reasonable. 1247 01:07:27,330 --> 01:07:30,060 In general, we think that borrowing a snippet of code, 1248 01:07:30,060 --> 01:07:34,410 that seems very reasonable, but finding a snippet of code that 1249 01:07:34,410 --> 01:07:38,880 solves the entire problem, that we consider to not be reasonable, 1250 01:07:38,880 --> 01:07:39,580 for example. 1251 01:07:39,580 --> 01:07:42,420 But there certainly is a bit of a judgment call. 1252 01:07:42,420 --> 01:07:46,410 As for what compare50 itself does, compare50 1253 01:07:46,410 --> 01:07:51,388 is only going to search through the data that you give to it. 1254 01:07:51,388 --> 01:07:53,180 So if you only give it student submissions, 1255 01:07:53,180 --> 01:07:56,400 it will only compare those student submissions against each other. 
1256 01:07:56,400 --> 01:07:58,260 But what we will generally do as teachers 1257 01:07:58,260 --> 01:08:05,720 is look up a solution to particular problem sets online and use that code, 1258 01:08:05,720 --> 01:08:09,560 and we'll download that and use compare50 to check against it, as well. 1259 01:08:09,560 --> 01:08:12,560 And there's a way that you can do that that I'll talk about in a moment, 1260 01:08:12,560 --> 01:08:18,410 too, that's supported by compare50. 1261 01:08:18,410 --> 01:08:20,585 Let's go now to Ramon. 1262 01:08:20,585 --> 01:08:21,710 Ramon, you have a question? 1263 01:08:21,710 --> 01:08:24,350 1264 01:08:24,350 --> 01:08:27,920 RAMON: You have talked about check50 and submit50, 1265 01:08:27,920 --> 01:08:30,710 so I have a question for certain p set submission 1266 01:08:30,710 --> 01:08:34,640 and the course's rigor, which is to what extent 1267 01:08:34,640 --> 01:08:38,149 are we allowed to include additional instructions on the p sets 1268 01:08:38,149 --> 01:08:43,100 in order to help students who are taking CS50 in another language, considering 1269 01:08:43,100 --> 01:08:46,970 the fact that [INAUDIBLE] require students 1270 01:08:46,970 --> 01:08:51,200 to have some prior knowledge of certain English words, for example? 1271 01:08:51,200 --> 01:09:06,520 Furthermore, [AUDIO OUT] submission code requires knowledge of English words 1272 01:09:06,520 --> 01:09:09,680 like red, blue, green, and many others. 1273 01:09:09,680 --> 01:09:14,439 So to what extent can we modify the p set and the distribution code? 1274 01:09:14,439 --> 01:09:18,670 And more importantly, how can we maintain the course's rigor 1275 01:09:18,670 --> 01:09:21,130 while providing these additional instructions, 1276 01:09:21,130 --> 01:09:25,359 considering that one wants to produce a translation of the course that is 1277 01:09:25,359 --> 01:09:28,052 closest possible to the original one? 1278 01:09:28,052 --> 01:09:29,260 BRIAN YU: Excellent question. 
1279 01:09:29,260 --> 01:09:31,540 To answer the first part of that, you are absolutely 1280 01:09:31,540 --> 01:09:36,310 allowed to adapt the course's material by adding to the specifications, 1281 01:09:36,310 --> 01:09:39,950 by editing the specifications to your liking. 1282 01:09:39,950 --> 01:09:45,463 The course is available for you to be able to modify if you would like to. 1283 01:09:45,463 --> 01:09:47,380 That's certainly something you can do if you'd 1284 01:09:47,380 --> 01:09:49,630 like to change the problem set specifications for work 1285 01:09:49,630 --> 01:09:52,870 with your own students, translating it to a different language, for example. 1286 01:09:52,870 --> 01:09:54,670 All definitely OK. 1287 01:09:54,670 --> 01:09:57,730 In terms of maintaining the rigor of the class, 1288 01:09:57,730 --> 01:10:00,670 I would generally suggest that if you're translating the problem 1289 01:10:00,670 --> 01:10:05,560 sets from one language to another, certainly fine to explain 1290 01:10:05,560 --> 01:10:07,243 what these individual terms mean. 1291 01:10:07,243 --> 01:10:09,910 In terms of pixels, if you're explaining red and green and blue, 1292 01:10:09,910 --> 01:10:13,310 you can translate that, certainly, but so long as you are not, 1293 01:10:13,310 --> 01:10:15,880 for example, giving away what code they would 1294 01:10:15,880 --> 01:10:19,640 write in order to solve a particular part of the image filtering problem, 1295 01:10:19,640 --> 01:10:20,860 for example. 1296 01:10:20,860 --> 01:10:25,210 I think you can largely preserve the core of the problem solving process 1297 01:10:25,210 --> 01:10:27,610 while just translating the instructions and the guidance 1298 01:10:27,610 --> 01:10:30,190 to make it a little bit more accessible. 1299 01:10:30,190 --> 01:10:32,410 But you're certainly welcome to adapt the course 1300 01:10:32,410 --> 01:10:35,425 materials to whatever you think would be best for your own students. 
1301 01:10:35,425 --> 01:10:38,140 1302 01:10:38,140 --> 01:10:41,340 And a question now from Ohm, if you'd like to ask a question, Ohm. 1303 01:10:41,340 --> 01:10:44,070 1304 01:10:44,070 --> 01:10:44,820 OHM: Hello, Brian. 1305 01:10:44,820 --> 01:10:47,010 I'm Ohm from India. 1306 01:10:47,010 --> 01:10:55,440 My question is when I go to create a new post in submit.cs50.io, 1307 01:10:55,440 --> 01:10:58,750 it always tells failed to create GitHub teams. 1308 01:10:58,750 --> 01:10:59,835 Why? 1309 01:10:59,835 --> 01:11:01,710 BRIAN YU: If you're ever seeing an error when 1310 01:11:01,710 --> 01:11:04,350 trying to create a course on submit.cs50.io, 1311 01:11:04,350 --> 01:11:07,230 that can sometimes be due to a problem on GitHub's end. 1312 01:11:07,230 --> 01:11:10,350 There's some GitHub configuration that needs to happen in order 1313 01:11:10,350 --> 01:11:11,940 to perform that process. 1314 01:11:11,940 --> 01:11:15,630 If you go ahead and just email the email that I sent in the chat here, 1315 01:11:15,630 --> 01:11:20,910 sysadmins@cs50.harvard.edu, that will just let us know about the issue, 1316 01:11:20,910 --> 01:11:27,450 and hopefully we'll be able to get back to you shortly when it's resolved. 1317 01:11:27,450 --> 01:11:28,510 OHM: Thanks, Brian. 1318 01:11:28,510 --> 01:11:29,593 BRIAN YU: Yeah, of course. 1319 01:11:29,593 --> 01:11:31,980 1320 01:11:31,980 --> 01:11:33,990 Other questions about compare50? 1321 01:11:33,990 --> 01:11:35,640 I'm looking through the chat. 1322 01:11:35,640 --> 01:11:38,460 How does compare50 check for similar logic, 1323 01:11:38,460 --> 01:11:41,040 like when variable names are changed? 
1324 01:11:41,040 --> 01:11:43,050 Without getting too technical, in short what 1325 01:11:43,050 --> 01:11:46,020 we do is we parse the student's submission 1326 01:11:46,020 --> 01:11:50,340 and get a sense for what the various, different parts of that submission 1327 01:11:50,340 --> 01:11:53,580 are-- like, here is a curly brace, here is a variable name, 1328 01:11:53,580 --> 01:11:55,790 here is the start of an if statement. 1329 01:11:55,790 --> 01:11:59,610 And using that tokenization process, as it's called, 1330 01:11:59,610 --> 01:12:02,640 we can then do a comparison while ignoring 1331 01:12:02,640 --> 01:12:04,890 what the exact names of those variables are, 1332 01:12:04,890 --> 01:12:07,350 just to check the structure of the program overall. 1333 01:12:07,350 --> 01:12:10,820 1334 01:12:10,820 --> 01:12:15,020 Other questions about compare50? 1335 01:12:15,020 --> 01:12:21,710 1336 01:12:21,710 --> 01:12:26,630 All right, a couple other features that I'll mention about compare50. 1337 01:12:26,630 --> 01:12:29,720 One is distribution code that you can specify. 1338 01:12:29,720 --> 01:12:32,330 Oftentimes, if you are releasing a problem that 1339 01:12:32,330 --> 01:12:35,330 already has some code written for students, if you were to just 1340 01:12:35,330 --> 01:12:39,350 naively perform all of these comparisons, 1341 01:12:39,350 --> 01:12:42,170 you would end up finding matches across all of your students 1342 01:12:42,170 --> 01:12:45,080 because all of them are using the same distribution code. 1343 01:12:45,080 --> 01:12:48,620 compare50 will allow you to specify what the distribution code is 1344 01:12:48,620 --> 01:12:51,980 for a particular problem, such that when we're performing these comparisons, 1345 01:12:51,980 --> 01:12:55,400 we're not going to bother comparing the distribution code, as well.
1346 01:12:55,400 --> 01:12:58,160 We're only comparing the code the students actually 1347 01:12:58,160 --> 01:13:01,130 wrote, just to hopefully make the results of compare50 1348 01:13:01,130 --> 01:13:05,080 a little more accurate and a little bit more relevant to you. 1349 01:13:05,080 --> 01:13:08,890 compare50 also has support for archived submissions, such 1350 01:13:08,890 --> 01:13:11,830 that if you have students that have taken your course in the past, 1351 01:13:11,830 --> 01:13:13,660 or if you have online submissions that you 1352 01:13:13,660 --> 01:13:15,400 want to check students' code against, you 1353 01:13:15,400 --> 01:13:18,730 can store a whole bunch of archived submissions to a problem, 1354 01:13:18,730 --> 01:13:22,000 and then compare your current students not only against each other, 1355 01:13:22,000 --> 01:13:24,980 but also against any of the submissions inside of that archive. 1356 01:13:24,980 --> 01:13:26,910 So we do this year after year after year, 1357 01:13:26,910 --> 01:13:29,830 too, where when students are taking the course this year, 1358 01:13:29,830 --> 01:13:32,560 we can compare their submissions against all of the submissions 1359 01:13:32,560 --> 01:13:34,490 from prior years, as well. 1360 01:13:34,490 --> 01:13:36,730 So you can store these archive submissions 1361 01:13:36,730 --> 01:13:39,220 to be able to do these comparisons. 1362 01:13:39,220 --> 01:13:43,720 And in addition to that, compare50 is open-source and extensible, such 1363 01:13:43,720 --> 01:13:46,510 that if you wanted to add to compare50, we 1364 01:13:46,510 --> 01:13:49,300 already do a fair amount of pre-processing 1365 01:13:49,300 --> 01:13:51,250 and various, different types of passes to be 1366 01:13:51,250 --> 01:13:53,740 able to try and produce useful results. 
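The distribution-code and archive ideas described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how compare50 itself works: the `student_lines`, `overlap`, and `rank_pairs` helpers, the line-level matching, and the sample submissions are all hypothetical. The sketch strips lines that appear in the distribution code, then scores each current submission against every other current submission and against each archived one.

```python
from itertools import combinations, product


def student_lines(submission, distro):
    """Return non-blank lines of a submission that do not appear in the distribution code."""
    distro_lines = {line.strip() for line in distro.splitlines()}
    return [
        line.strip()
        for line in submission.splitlines()
        if line.strip() and line.strip() not in distro_lines
    ]


def overlap(a, b):
    """Fraction of a's lines that also appear in b (0.0 if a is empty)."""
    if not a:
        return 0.0
    b_set = set(b)
    return sum(line in b_set for line in a) / len(a)


def rank_pairs(current, archive, distro):
    """Score current-vs-current and current-vs-archive pairs, highest score first."""
    cur = {name: student_lines(src, distro) for name, src in current.items()}
    arc = {name: student_lines(src, distro) for name, src in archive.items()}
    pairs = list(combinations(cur.items(), 2)) + list(product(cur.items(), arc.items()))
    results = [(overlap(a, b), name_a, name_b) for (name_a, a), (name_b, b) in pairs]
    results.sort(reverse=True)
    return results


# Hypothetical sample data: starter code shared by everyone, plus submissions.
distro = """
#include <stdio.h>
int main(void)
{
}
"""
alice = """
#include <stdio.h>
int main(void)
{
    int total = 0;
    total += 50;
    puts("done");
}
"""
bob = """
#include <stdio.h>
int main(void)
{
    puts("hello");
}
"""

results = rank_pairs({"alice": alice, "bob": bob}, {"archived_2019": alice}, distro)
for score, name_a, name_b in results:
    print(f"{score:.2f}  {name_a}  {name_b}")
```

Without the distribution-code filter, alice and bob would share four of their lines purely because of the starter code. With it, only the lines each student actually wrote are compared, and the archived copy of alice's work surfaces as the top match.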
1367 01:13:53,740 --> 01:13:56,920 If you want to extend it to add an additional pass for looking 1368 01:13:56,920 --> 01:14:00,560 for something in particular that suits your needs as a teacher, 1369 01:14:00,560 --> 01:14:02,870 you can absolutely extend compare50, as well. 1370 01:14:02,870 --> 01:14:06,070 You can see how we've done it already inside of compare50's source 1371 01:14:06,070 --> 01:14:08,560 code, which is, again, all open-source. 1372 01:14:08,560 --> 01:14:10,360 We've designed it such that you can easily 1373 01:14:10,360 --> 01:14:14,200 add additional passes to compare50, too, if you 1374 01:14:14,200 --> 01:14:17,207 want to add functionality to deal with a particular use case 1375 01:14:17,207 --> 01:14:18,040 that you might have. 1376 01:14:18,040 --> 01:14:20,600 1377 01:14:20,600 --> 01:14:24,710 And that just about covers the four tools that I wanted to introduce to you 1378 01:14:24,710 --> 01:14:29,143 all today-- check50, submit50, submit.cs50.io, and compare50. 1379 01:14:29,143 --> 01:14:32,060 Before we wrap, though, I do want to leave an opportunity for anyone 1380 01:14:32,060 --> 01:14:35,130 to ask questions about any of these tools 1381 01:14:35,130 --> 01:14:37,838 if you still have any questions about any of the tools here, 1382 01:14:37,838 --> 01:14:40,130 or any of the tools that Kareem talked about yesterday, 1383 01:14:40,130 --> 01:14:44,500 or anything that you've seen in the workshop so far. 1384 01:14:44,500 --> 01:14:45,000