SPEAKER 1: Hello, world. This is the CS50x educator workshop, our session on CS50's tools for submitting and grading. This session will be led by CS50's own Brian Yu, who has been instrumental in the development and deployment of these tools to CS50's students and teachers. Brian, the floor is yours. BRIAN YU: Thanks very much. Really great to see everyone here today, and looking forward to talking with you all about CS50's tools for submitting and grading, which is going to be our topic today. And in particular today, we're going to be looking at four tools that are available for you to use, and for your students' use, potentially, as you teach CS50 or one of the other CS50 courses. You certainly don't need to use any or all of these tools, but we do make them available, and the goal of today is to show you what these tools are, how they might be useful, and to give you a sense for what you can do with them. We'll go through check50, submit50, submit.cs50.io, and compare50. I thought we would start, just so I can get a sense of which tools you all have already used, by pasting into the chat a link to a poll. If you wouldn't mind filling out that poll, just to give us a sense of which of these tools you have used before. So if you've used all of the tools, you can click on all four of those tools, but if you've only used some of them, you can just click on the ones that you have used. But go ahead and click on that link that I just posted in the chat, and let us know which of these tools you have used previously. It looks like a couple people have submitted the poll so far, and most people have used check50, and maybe some people have used submit50. submit.cs50.io, fewer people have used. That's a web application that we'll see soon that helps you integrate submit50 and check50 together. And it looks like a couple people have used compare50, as well, which is the tool that you can use to check code for similarity, which we'll talk about a little bit later today. So regardless of which of these categories you fall in, regardless of whether you've used all of these tools before or none of these tools before, hopefully today will be an opportunity to learn a little something new about each of these tools and how you can potentially put them into use inside of your classroom. So with that, let's go ahead and get started with the first of the tools, which is check50. All of these tools you can find documentation for at this URL, cs50.readthedocs.io, which includes more information about all of the tools and how to use them. And at any point today, if you have questions about anything that I'm saying or the tools that I'm talking about, do feel free to raise your virtual hand using that blue raise hand feature that David pointed out a little bit earlier today. I'm keeping an eye on that so I can call on you if you have questions about anything going on. And you're also certainly welcome to ask questions in the chat window, too, where I might see them, or other CS50 staff, or other people who are just participants here today, might be able to see your questions and help to answer those, as well. So raise your hand or ask in the chat, but definitely feel free to ask any questions that you might have. So let's go ahead and begin with the first of the four tools that we're going to talk about today, which is check50, which, it looks like, most of you have used before.
check50 is a command-line tool for running automated tests on students' code. So when you run check50, we're going to run a series of automated tests that we tend to call checks, which will check students' code to make sure that that code behaves in a certain way that meets the problem specification for a particular problem, for instance. So what does it actually look like when you run check50, for example? What's going to happen is inside of the terminal, inside of CS50 IDE or elsewhere, you can run check50 followed by a submission slug, which is just some unique identifier that describes which problem we are referring to, which problem we would like to check for correctness. So here, for example, we're running check50 followed by the unique identifier cs50/problems/2020/x/cash. In other words, we're checking a CS50 problem, one of them from this year. We version all of our checks, so that if problems change from one year to another, you can be sure that you're checking the most recent version of the problem. You can go back and check previous versions if you'd like to. And then the name of this particular problem is cash. And if you've taken CS50x or started it, you might find this familiar from week one, from problem set one in the course. So when you run check50 followed by a unique identifier, we're going to upload the student's code, and once the student's code is uploaded, we're going to run a series of automated checks on that code, making sure that it's producing the correct output given particular input, for example. And as a result of all of that, you'll end up seeing something like this, where we see a series of all of the results of these individual checks, with a green smiley face for any of the checks that have passed successfully and a red frown face for any of the checks that did not pass, where the student's code did not do what it was expected to do. This tool, then, can be useful for both students and for teachers. For students, students can use this tool to be able to see whether their code actually works. And as students are working through trying to solve a problem, they might check to see, all right, it looks like I'm handling most of these cases, but maybe there are some corner cases that are less clear, that we're not quite handling appropriately. So students can learn from that in order to make their code more accurate. And in addition to that, you as the teacher can use this tool to facilitate the grading process, being able to quickly run check50 on a submission to see whether a student's submission works or not. So another tool that is available to you. In addition to just this text-based output to see which checks did or did not pass, you also have the ability to see this in a web-based format. Any time a student runs check50, they're given a URL that will take them to a page that displays the results of their check50 check in a little bit more detail, showing them all of the checks that passed or failed, along with some log describing what it is that the check did and what potentially went wrong if the student didn't pass the correctness check. This URL is shareable, such that if a student wants to share their results with you, for example, they can share the URL with their check50 results with you, and you can then open up that URL to see what the student did, and you can then do a comparison to maybe identify where a student might have gone wrong or where they potentially made a mistake.
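As a quick recap in command form, here is that example invocation again, using the slug from above; the code is uploaded, the checks run in a standard environment, and the results appear in the terminal along with that shareable URL to the web view.

```
# Check the "cash" problem against CS50's published checks for CS50x 2020.
check50 cs50/problems/2020/x/cash
```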
For any of the checks that did not pass in this sort of environment, there is a view that looks something like this. If a check didn't pass, students will generally see, on the left-hand side, the expected output-- what it is that check50 thought that the program should produce as output. And on the right-hand side, students will see their own output-- what their program actually did. And by seeing this side by side, this can help students more visually to get a sense for what should their code have done? What did their code do? And therefore, what could they potentially do to fix it? And it looks like in this case, for example, where the student is working on Mario, the actual output that the student produced has one fewer row than the code was actually supposed to have. So that can be a nice visual indicator to tell students what they might have done wrong. So that, then, is the web results for check50. And before I move on, I'll just pause here for a moment. I know most people have used check50 already before, but questions about anything so far? Let's go ahead and go to Joseph. I see you have your hand raised. JOSEPH: Yes. Quick question regarding academic honesty. One of the skills we would really like to teach the students is the ability to problem solve and use Google and search, find code, or find it in documentation. How do you balance that with the tools that check for academic honesty as a goal for the class? BRIAN YU: We will talk a little bit about tools for checking for academic honesty a little bit later in the session when we explore some of the software in order to do so. check50 is supposed to be an indicator for students when they're nearing completion on a problem. When they've been working on a problem for some time, they can then run check50 to be able to see if there was something that they were missing, for example, or if there was some change that they still needed to make. It's not, for example, telling the student exactly what line of code they should be adding or exactly what the solution to the problem is. It's more of a feedback mechanism that students can use. Academic honesty is certainly something that we think about. We want to make sure that the work that students are submitting is their own. We do have some other tools for that that we'll talk about later today in this session, too. Other questions about things? Lana, yeah, if you'd like to ask a question. LANA: I have a question. Can you please explain briefly the architecture of this check application, if it's possible? How is it working underneath? BRIAN YU: Yeah, certainly. In fact, we'll get to that a little bit later, but in short, what's happening is that we're running a cluster of servers that are going to download students' code from GitHub, which is where we store the students' code. On those servers, we're going to then run a series of automated checks that are also hosted on GitHub. I'll show you some diagrams of what that architecture looks like soon, too, just so you get a better sense for how all of that is working together. A couple of other things about check50. Importantly, you can use it inside of any of CS50's tools. Yesterday, Kareem introduced you to CS50 IDE in addition to CS50 Sandbox and CS50 Lab. check50 is installed in all of those different environments. You can run check50 in the IDE, for example, or in the Lab or in the Sandbox, but you don't need to be using those particular environments in order to use check50.
So you can install check50 locally onto your own computer if you would like, to run check50 just on your own computer without needing to use any of CS50's environments. You can do so just by having Python 3 installed, and then running this command that you see here-- pip3 install check50-- and that will install check50 onto your own computer. And that's true of CS50's other open-source command-line tools, as well, for tools like style50, submit50, and compare50, which we'll see a little bit later today. You can install them the same way, just by installing them locally, using pip3 in order to get access to those tools on your own computer or your own server, as well, if you would like to. When students don't pass a check, they're given certain types of feedback. In particular, they're given feedback, as we talked about, as to what it is the program did-- what the actual output of their program was, for example. They're also given information about what their program should have done, so what we expected their program to do that they didn't do. And then in addition to that, we've added support for occasional hints that we can provide to students to guide students in the right direction. This is born, really, out of the fact that there are many errors that we see in students' code that are quite common, that have a common cause that we see happen again and again and again, usually due to not considering some particular case that might take place or forgetting about one particular part of the problem, for example. So if you've ever solved CS50's cash problem in problem set one, for example, you might remember that an important part of making sure the program is bug-free is being sure to round numbers to the nearest integer, because if you don't round numbers to the nearest integer, you might end up with some fractional number of cents that ends up causing bugs later on in the program, for instance. So when students forget to do that, we can offer that as a potential hint to students, suggesting, did you forget to round to the nearest cent? Just to guide students in the same way that a human teaching fellow might guide students, as well. So that's something that we make available. We'll also allow you to customize. We'll talk in a moment about how you can write your own checks if you would like to, for your own problems, for example. And in those cases, it's up to you to decide what sort of hint you would like to provide, if any. You don't have to provide a hint. You can just show the expected and actual output, for example. If you'd like to offer some guidance as to why a check might have failed, you can add some logic to add these sorts of hints into your check50 checks, as well. As I mentioned in response to a question previously, all of the data that check50 cares about is stored on GitHub, which we use for a number of CS50's tools. We use GitHub as the place where we store students' code any time that they run check50, and it's also the place where the correctness checks themselves are stored. So if you want to see what it is that we are checking for in check50, you can find those correctness checks on GitHub, and I'll show you where in just a moment. And then if you would like to write your own checks to check your own problems, for example, you can put those checks on GitHub, and check50 will be able to find them and access them, as well. Ignacio, do you have a question about what we've talked about so far? IGNACIO: Just a simple question.
I'm working with a colleague who would like me to translate the CS50 course. [INAUDIBLE] And I see that check50 just sends its messages in English. Is there a way to translate this to Portuguese, too? BRIAN YU: Yeah. I didn't quite catch all the details of the question, but I think you were asking about translating checks into other languages. Definitely possible. All of the check output is configurable in terms of what it is that the message is saying and what it is that the checks are checking for. I'll show you in a moment what that configuration looks like. But that is something that if you would like to create some new check based on our existing checks in order to translate checks from one language into another, that's definitely something that you can do. And when we get to talking about writing checks, you'll be able to see an example of what it is that that looks like. We've talked about how checks are stored on GitHub, but let's talk about now where on GitHub those checks are actually stored, so that you can find our checks if you're looking for them, and you can also write your own checks if you'd like to do so. So when you run check50, we've talked about how you run check50 followed by a slug or some unique identifier to describe what problem you would like to check. That slug is divided into multiple parts. The first part of the slug represents the GitHub repository where the code is stored. And a GitHub repository, if unfamiliar, you can think of as, like, a folder that's stored on the cloud on GitHub that's going to keep track of all of the data, all of the checks, for example. So CS50/problems is the name of the GitHub repository that we use to store all of our own check50 checks. You can find it yourself by going to github.com/cs50/problems to find all of those there. Within a GitHub repository, you can divide a repository into branches for different versions of that repository, for example. And so the next part of the submission slug here, 2020/x, that represents a branch on that repository. So it represents a branch of the CS50/problems repository. And the way that we have generally structured our problems repository is to have one branch for each offering of the class. So in CS50x 2020, the branch is 2020/x, but last year we had a branch that was 2019/x, for example, just so we can keep different versions of the class separate on different branches. And then finally, the last part of the check50 submission slug is usually the name of the problem, but what that really represents is a folder on that branch. So there's a folder on the branch called cash, inside of which are all of the checks that we're going to run whenever a student runs check50 for this particular problem. So this three-part hierarchy is how we construct a check50 submission slug. The first part is the GitHub repository where those checks are stored, the second part is the branch on that repository where you can find the checks, and the third part is the folder on that branch where all of the checks are ultimately going to be stored. What this means is that you can create your own checks if you would like to by pushing to a repository of your own. And then what you really just need to change is you need to change cs50/problems, our repository, to your own repository, for example. Then change the branch and the folder to match where it is that you have stored all of these individual checks. So let's go ahead and talk about that now, too, this process of writing checks.
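To make that mapping concrete, here is the example slug from before, annotated part by part:

```
# cs50/problems  ->  the GitHub repository (github.com/cs50/problems)
# 2020/x         ->  the branch on that repository
# cash           ->  the folder on that branch containing the checks
check50 cs50/problems/2020/x/cash
```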
Now, I should first mention that you never have to do this. We have written check50 checks already for you for all of CS50's problems. So if you are teaching CS50x, and you're just using the problems that we offer in the course, our check50 checks have already been written for you, and you can just use them by running check50 followed by the appropriate submission slug. Those unique identifiers are located in the problem set specification, in the instructions for each of the individual problems. Many teachers will choose to add to CS50's curriculum, adding a problem of their own, for example, or wanting to add additional checks to our problems to check for different types of things or to customize it for their particular classroom. So some teachers will choose to write checks of their own for usage on their own problems, for example. Now I'll show you how you can actually go about doing that. Ultimately, what you will do is push the required files to a GitHub repository of your own, following that format that we talked about on the previous slide. And the files that you'll need in order to configure check50 are twofold. The first file you'll need is a file called .cs50.yml. This is a file using the YAML language, which is just a language that makes it easy to configure CS50's tools in a human-readable format. It's used in other places, as well. What we're specifying here is we're specifying which files we want to collect in order to run the check50 correctness checks. And so in general, if maybe a student's in a folder where they've got a lot of different files, but only one of them is one that you care about checking, you don't need to collect all of the files. You really only need to collect the one file that corresponds to the problem that you're trying to check. And so what we're saying here in this file is that we're saying !exclude "*", which means, by default, exclude everything. Don't include any of the files from the student's current folder in the code that is uploaded when a student runs check50. But on the line immediately following that, !require "cash.c", we're saying, all right, but cash.c, that is a file that needs to be present in order for us to run check50. It's a file that we must collect. So if a student has 10 different files, what this configuration is going to say is ignore the other nine, but only collect cash.c as the file that we care about collecting in order to run check50. So this file just specifies what files we want to collect from the student, and then in the second file, we actually write the checks. All of the checks are written in Python in a file called __init__.py. And the way that we've structured these checks is that each of the checks that you run is really just a Python function. You have one function for each check that you want. So if you've ever seen check50 running, like, 12 checks on a student's code, for example, that's really just 12 functions inside of this __init__.py file. What goes into this function? What does it do? The function begins with what's known as a Python decorator. It's OK if you're not familiar with what that is. In short, it's a line of code that, in this case, is going to tell check50 that this function represents a check50 check. So we put that line at the top to say that this function is going to be a check50 check. And then immediately below the function definition is what's known as a Python docstring, just some comment enclosed in triple quotation marks that describes what it is the function is doing.
In this case, that means describing what it is that this correctness check is checking for. So here, this correctness check is checking that a particular input-- an input of 0.15-- will produce an output of 2 in this case. And this text that appears inside of this description is what the student will see next to their smiley face or frown face to indicate whether or not the correctness check passed or failed. So this description ends up being visible to students, as well. So in answer to the question a little bit earlier about possibly translating these check results into other languages, all that would take, then, is replacing this English description with a description in some other language to be able to see check results in a different language. And then inside the body of the function, you can include any Python logic you want. Anything that Python could check for, these correctness checks in check50 can check for, as well. We have added some built-in functions just to make some common operations easier, to handle some common cases for things that you might want to check for, for example. So here, we're saying that when you run check50, go ahead and run the program ./cash. Then provide as standard input the value 0.15. Then expect as the output of that program, standard output, the number 2. Then expect that the program will exit with a status code of 0. In other words, make sure the program exits successfully without any problems. So just by chaining these functions together, running ./cash, providing some input, expecting some output, expecting an exit code, that's all you need to do to construct a check50 check that provides some input and is looking for some particular output. And if a student's code doesn't provide the number 2 as output, for example, then the student will not pass this particular check. So you can add these checks in functions one after another to chain together a whole bunch of checks that will automatically run on students' code any time that check50 is executed. Questions then about writing checks? About the syntax, about how you would go about writing your own check50 checks? Yeah, Ahmaud, question? AHMAUD: Hey, Brian. A couple of questions. I'll start with the least relevant. What tool are you currently using to switch your camera with the presentation? BRIAN YU: The tool that we use for these presentations is Open Broadcaster, or OBS. Arturo or someone else might be able to paste a link to that in the chat, as well. AHMAUD: OK, beautiful. Thanks. Second question, regarding check50. When the student runs check50, where is the actual process happening? Is it happening in his IDE, or is it happening somewhere else? BRIAN YU: It's happening externally, so it's happening on a server that's pre-configured with a standard environment. And we do that just to make sure there's consistency across different platforms. If a student is using a different version of Python or has different packages installed, for example, their code might behave differently than if they were to run it on a different computer. So just to make sure everything is consistent, we always run check50 in the same environment. AHMAUD: OK, are we able to change this configuration? BRIAN YU: There are a couple of options. You can, if you would like to, run check50 locally such that you're not running it on our external server. There's a command-line argument you can provide to check50, which is just --local, that will let you run check50 locally without connecting to the internet.
That can be helpful if you want to run it in an environment that has some particular configuration as determined by you. We also allow you to specify, inside of the YAML file from before, dependencies. If you have Python packages that you would like to install before you run check50, you can specify those dependencies, as well. AHMAUD: Now, to understand everything, if I am running check50 in the local IDE, which is the darker version, will it by default still be connecting to the internet to check on your servers, or do I need to do the --local thing? BRIAN YU: The offline IDE is actually a little bit outdated. It's not the most recent version of the IDE, which I think Kareem mentioned in yesterday's session, as well, so it's probably currently using an old version of check50. But in short, if you're upgraded to the latest version of check50, by default, it will always try to upload code so that we can run it on our servers, but you can always use the --local command in order to allow for running that command locally. AHMAUD: OK, thank you. BRIAN YU: Yeah, of course. Other questions? Let's go to Shefket, if I'm pronouncing that right. SPEAKER 7: OK, thank you. Just a short question. If I'm using virtual machines, virtual systems in general, will I have any restrictions? Or can I use these tools in whichever environment I want? BRIAN YU: So long as the machines are connected to the internet, they should be able to run check50 in order to upload the code to GitHub and then poll for the results to come back. But if they're offline, you can also run check50 locally, as I was describing in response to the previous question, as well. So both are potentially options. SPEAKER 7: OK, thank you. BRIAN YU: Yeah, of course. We'll go ahead and move on. So a couple of other things to note just about writing checks that we support. One is check dependencies, where you can have certain checks that are based on previous checks or that rely upon the passage of other checks. So in our problems, for example, we generally require that students' code compile first, and only if it compiles will we bother to run any of the other correctness checks, for example. These checks support custom help messages, as I described before, too, where you can provide some additional information to the student if you would like to. And then, as I also mentioned in response to a question, custom Python packages are supported as well. So if you're doing an assignment in Python that requires the installation of particular packages, you can specify those in check50 so that we will install those packages prior to running students' code, for example. So hopefully that flexibility allows for a lot of different types of checks for a lot of different types of problems. And for more detail about all of this, you can go to cs50.readthedocs.io to read up more about check50 and what that syntax is like. And you can, of course, go to our GitHub, github.com/cs50/problems, to see all of our existing correctness checks. Many teachers, when designing their own, will choose to look at ours first and model them off of the correctness checks that we have already created. So all of those are options to you for check50. Now, before I move on from check50, any final questions about anything related to check50 before we go on to tool number two for today? Final things on check50. Feel free to either raise your hand or ask in the chat if you have a question.
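Before moving on, here is a compact sketch of the two files described above, for a hypothetical cash checks folder, starting with the .cs50.yml. This is a minimal example based on the two directives discussed earlier; see cs50.readthedocs.io for the authoritative format.

```
# cash/.cs50.yml
check50:
  files:
    - !exclude "*"       # by default, collect nothing from the student's folder
    - !require "cash.c"  # but cash.c must be present, and it will be uploaded
```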
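And here is a sketch of the accompanying __init__.py, tying together the pieces described above: a compilation check, a dependent input/output check, and a check that offers a hint. The check50.c.compile helper and the help= keyword on check50.Failure are my assumptions about the API; the run/stdin/stdout/exit chaining follows the example discussed earlier.

```
# cash/__init__.py
import check50
import check50.c


@check50.check()
def compiles():
    """cash.c compiles"""
    # Assumes the check50.c helper; lcs50=True is assumed to link the CS50 library.
    check50.c.compile("cash.c", lcs50=True)


@check50.check(compiles)  # a check dependency: runs only if "compiles" passed
def test_015():
    """input of 0.15 yields output of 2"""
    check50.run("./cash").stdin("0.15").stdout("2").exit(0)


@check50.check(compiles)
def test_420():
    """input of 4.2 yields output of 18"""
    output = check50.run("./cash").stdin("4.2").stdout()
    if "18" not in output:
        # Fail the check with a hint; help= is assumed to be how hints are attached.
        raise check50.Failure(
            "expected 18, not " + output.strip(),
            help="Did you forget to round the number of cents to the nearest integer?",
        )
```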
All right, we can certainly come back to it if there are more questions, but I think for now, we'll go ahead and move on to the next tool that we're going to talk about, which is submit50. submit50 is a command-line tool for submitting students' work. Different teachers will choose to do this differently. Some teachers have their own systems from their schools, for example, that require students to submit their work in a particular way using some LMS, for example. submit50 is a tool that we have written to make it easy for students to submit their own work, and for you then to be able to collect that work and view the results. It integrates quite nicely with the other CS50 tools, as well. The way that submit50 works is very similar to the way that check50 itself works. You can run submit50 followed by a submission slug, like cs50/problems/2020/x/cash, the same submission slug that we were using before for check50. When that happens, students will be prompted to sign in with their GitHub username and password. We use GitHub again for storing all of a student's submissions. And from there, the students will be told what files will be submitted. So let's see, cash is the name of the file that's going to be submitted. If there are files that won't be submitted because they were excluded, students will see that, as well. Students will then be prompted to agree to the course's policy on academic honesty by typing Y or yes to indicate that they've agreed with that policy and are keeping in mind that the code that they are submitting should be their own. So students can type Y or yes in response to that question to be able to say, yes, I agree to the policy. Students' code is then uploaded and then it is submitted, and students can then go to submit.cs50.io, which we'll talk about in just a second, to view the results of their submission. Question in the chat-- can check50 be used for other languages? Yes, it can. check50, we've built in some functions that make it easy to work with languages like C and like Python, just because most of CS50's problems are in C or Python. But anything that a Python program could do, check50 can check for, because we're really just running a Python function. For SQL, for example, you can, from Python, run SQL queries and get back the results. So you could run a check50 check that is executing a SQL query on a SQLite database, for example, getting back those results and verifying that the results are correct or that the correct number of rows have come back, or anything along those lines. I see a question from Charlene, if you'd like to ask a question. CHARLENE: Lovely. I've just realized with CS50t that you take submissions via Google Forms. Is there any reason for that? And can submissions for CS50w, for example, be submitted via Forms, as well? I just noticed that there's a difference there. BRIAN YU: Yeah. In general, we tend to use submit50 for submission of code in particular, because submit50 uploads code to GitHub, which makes it very easy to do commenting on code, which I'll show you in a moment, as well. So any time students are submitting code, we'll generally opt to use submit50. But in some of our courses, CS50t among them, some of the assignments are text-based, where we're just asking a question and expecting students to write a paragraph, for example. You certainly could use submit50 for this, where you would have students open up a text file, write their response in a text file, and then run submit50 to upload that text file to GitHub.
But in the case of a class like CS50t, where we assume much less technical experience-- we're not expecting students to know how to use the command line, for example-- Google Forms just makes it a little bit easier to submit text-based responses. So among our courses, you'll often find that for our lead-in courses into CS50, where students aren't really doing as much programming, we'll often use Google Forms just for text-based responses. But generally, for anything that has to do with code, we'll more often use submit50. CHARLENE: Lovely, thank you. BRIAN YU: Yeah, of course. So again, when students run submit50, all of that goes onto GitHub. Every student gets a repository for their own submissions. It's located in a GitHub organization called me50, and the name of the repository is the student's GitHub username. So if you go to, for example, github.com/me50/ followed by your own GitHub username, if you've ever taken one of CS50's courses, you'll probably find that you have a me50 repository that has everything that you've ever submitted to CS50's courses. So that repository stores everything. And the way that we divide that up is that we have one branch per problem, so any time you submit to a different submission slug, we'll end up pushing that code to a different branch. Question in the chat-- do you collect community-written check50 extensions? check50 is designed to support extensions, though, to date, most people have just been using the tools that are built into check50 itself. You can find more details about how that works and what extensions are like on check50's GitHub repository. It's all open-source, and it's all available at github.com/cs50/check50. So all of that is there. And I know that people in the past have been able to run check50 to test Java code, as well. I think generally, they're not writing additional extensions, they're just writing checks that run Java, but Java is definitely something that you can check with check50. All right, when a student submits code via submit50 and it gets pushed to GitHub, ultimately, where you the teacher can then view that information is via this web application that a few of you have used, but not too many-- submit.cs50.io. So submit.cs50.io is a web application where you can view all of your students' submissions, as well as all of their scores on those submissions. So I'll go ahead and demonstrate for you what this entire workflow might look like. Let's imagine a student who's working on some assignment, and that student runs submit50. The student runs submit50 inside of their IDE, but it could also be on their computer or elsewhere. When that happens, we take the student's submission, and we upload that submission to GitHub. So what submit50 does is it takes the student's submission, and it pushes their code to a GitHub repository unique to that student, where every student has a different GitHub repository. Then what GitHub will do is that any time GitHub receives a new submission, GitHub will notify submit.cs50.io. It will tell submit.cs50.io that this student has a new submission for a particular problem, and we on submit.cs50.io will then download the student's code from GitHub, and we will automatically run check50 on that submission, automatically running all of the correctness checks for that problem and getting back some results for which checks passed and which checks did not pass, so that we can store those results and then present them to you.
So all in all, as soon as a student submits via submit50, it kicks off this workflow, uploading the code to GitHub, us getting that code on submit.cs50.io, running the correctness checks, and then storing the result of those correctness checks inside of submit.cs50.io. So that makes it much easier from the perspective of someone who is running the course, if you're the teacher of the class, for example. You don't need to worry about downloading all of your students' code and running check50 on all of your submissions. You can just have submit.cs50.io take care of this process automatically. Any time a student submits, you'll be able to see the results of that automated correctness checking. So how is it that you can actually go about collecting these responses and viewing all students' work? Well, if you go to this URL, submit.cs50.io/courses/new, that will allow you to create a new course on submit.cs50.io. And generally speaking, you'll see a window that looks a little something like this, where you can name the course. Most teachers will name the course after, like, the year of the course, and if they're teaching multiple, different classes in the same year, maybe assign each one a different, unique name just to help keep different things separate. So you give a name to your new course, and then once you create the course, you'll be presented with a page that looks a little something like this. And on that page, you as the teacher will see this link here, which is the invitation link for the course. That invitation is how you add new students to the course. So by taking that link and sharing it with students, students will then be able to click on that URL, and when they do, they'll be prompted to join your course on submit.cs50.io. What that will do is it will give you, the teacher, access to the students' submissions. So recall, again, that every student has their own me50 repository. By default, that repository is private, because we don't want just anyone on the internet to be able to go to that me50 repository and be able to see all of the student's work. But when a student clicks on your invitation link for your submit.cs50.io course, you will automatically be granted access to that student's me50 repository so that you can see all of the work that they have submitted. So that's what the invitation link there is used for. Beneath that, you can specify all of the submission slugs that you care about collecting for the course. So submission slugs, again, are composed of those various, different parts, like a repository, then a branch, then the name of the problem. And the me50 repository contains all of the submissions that a student has across any course within CS50's ecosystem that they've taken, for example. And so if you only want to collect particular slugs, you can specify here which submission slugs you want to track, or even which prefixes of submission slugs you want to catch, for example. So if you only want to collect submission slugs from the 2020 version of CS50x, then you can specify, like, cs50/problems/2020/x in the submission slugs area there to be sure that you're only collecting those particular submissions. That can all be specified there. And there are some other settings further down below, as well, where you can, for example, add additional teachers to your course.
So, for example, if you're teaching a course, and that course has TAs that are assisting you in the process of working with students, as well, you can provide staff-specific invitation links that will give the staff access to all of their students' repositories, as well. That, then, is how you can configure a submit.cs50.io course. And from there, if you notice at the top of the page, you'll notice a button that says Submissions. By clicking on that button, you'll be able to see all of the submissions from any of your students inside of this course. And what that usually looks like is a little something like this. Along the top, you'll see all of the problems to which students have submitted. So in this case, this is an example from last fall, where students submitted to cs50/problems/2019/fall/cash, for example. And I can see that, OK, this is the number of students that have submitted, and then I'll see a list of all of the submissions that have been made to that branch. So here, I'm seeing, for example, one student who looks like they submitted five minutes ago. And importantly, beneath the time of their submission, you'll see the automated results from check50 and style50. So you'll see the score that check50 provided for the submission. It looks like, in this case, the student got 10 out of the 11 checks correct. Then they'll also see a style score, rated on a scale from 0 to 1, 1 meaning all of the lines or 100% of the lines are correctly styled, and 0 meaning none of the lines are correctly styled. Questions about how all of this works? I see Oleg has a question. OLEG: Thank you very much for waiting. I have a question regarding the grade that is being displayed in [INAUDIBLE]. I have written checks on my own, and I've read all the documentation, but I haven't found instructions for how we can apply the grading to our own tasks. Is it possible? BRIAN YU: Absolutely possible. All students need to do is run submit50 with your submission slug. So if they run submit50 with your submission slug, that will automatically run check50 on those submissions, assuming that you've enabled check50 in the .cs50.yml file from before. So back when I was talking about how to configure check50, you saw the .cs50.yml file where we had, like, check50, and then specified what files to collect. That will just need to be there so that we know which files we're expecting to check, and we know to run check50. But so long as that file is there and configured to enable check50, we will automatically run check50 whenever a new student submits. OLEG: And as a result, the grade should be displayed, because the style score is being displayed, but the grade is not there. BRIAN YU: Yeah, the style is run automatically, too. If the correctness score doesn't show up, it probably means that your check50 for that submission slug isn't correctly configured. And if you're ever having any trouble with configuration for check50, you can always email the email address that I've pasted into the chat here, sysadmins@cs50.harvard.edu. When you email that email address, just include a screenshot of what you're seeing, and give us the name of the submission slug that you're using, and the team can take a look and help you out with getting all of that configuration set up. OLEG: Thank you. BRIAN YU: Yeah, of course. Let's go now to John. John, if you'd like to ask a question. JOHN: Is there a way to gain greater control or add custom rules for style50? Is that in the docs?
BRIAN YU: style50, ultimately, in terms of its implementation, is really a wrapper around existing linting tools. So for C, for example, we're using the tool AStyle to style the code, and for Python, I believe we're using autopep8 or something that checks it against the PEP 8 standard Python style guide. So if you're just using style50 alone, it's going to be using those particular tools in their standard configuration, but the entire tool is open-source, such that if you wanted to change the particular flags that we're providing to these styling tools, for example, or even swap it out entirely with a different tool, that's something that you can do, as well. And I'll go ahead and paste the link to the style50 repository where you can see exactly how that's working for any of the files. And if you scroll down in the README to the section about adding a new language, that will show you what it looks like to add some custom styling rules to style50. In short, you just need to implement a function called style that takes as argument the code, and then determines how it is that you want to style the code. And you can see the examples for how we've done that for C and Python and a few other languages already. JOHN: Awesome, thank you. BRIAN YU: Yeah, of course. Let's go back to Ahmaud. AHMAUD: OK. I'm wondering, regarding Scratch. Scratch is a little bit different from other problem sets, so how is submit50 different in the case of Scratch, and what kind of customizations are we allowed to do here? BRIAN YU: Yeah, it's a good point. So when students are working on Scratch, at least in the context of CS50, at this point in time, they haven't yet been introduced to the command line, so we wouldn't expect them to run submit50 from the command line. So instead, we also have a front-end uploading interface built into submit.cs50.io where students can directly upload a file to submit, rather than running submit50, so there are multiple ways you can potentially submit work. But the front-end upload of the file is really doing the same thing that submit50 itself is doing, which is to say just pushing students' code to a branch of their me50 repository. So Scratch works in that way, where we just have students directly upload their Scratch project to submit.cs50.io. And then in terms of running the automated correctness checks, this is another case where, with Scratch, it's not quite as simple as other programs where we're testing an input and testing an output. Because in our Scratch requirements, we say you need to have at least one sprite that isn't a cat, and you need to have at least one loop and at least one condition, for example. It turns out that the file format that Scratch uses, the .sb3 file format for the latest version of Scratch, is really just a zipped-up package that includes a .json file that contains all of the details about the student's Scratch submission. So what we do when we grade Scratch projects is that we, in check50, are parsing that .json file and are searching through it to find particular things that we're looking for. So we're looking to see, do they use a loop somewhere in that configuration? Do they use a sound somewhere, for example? A fun little tidbit is that they need to use a sprite that is not a cat, for example, and we were unsure for a little bit of time how to do that.
We found that, at least in a previous version of the check, one of the ways that we could check for that is that we could check to see if any sprite did not have the ability to meow, because in Scratch, by default, the cat has a meow sound. So in one of the original versions of check50, at least, we would search through the students' code and make sure they had a sprite that didn't meow, and that was how we knew that they had a non-cat sprite somewhere in their code. All of that is open-source if you'd like to take a look at our checks for how we check through Scratch, and I'm happy to talk through that file format because I know it's a little bit confusing the first time you see it. Shefket, question. SPEAKER 7: OK, thank you. Following the question of Ahmaud, is there any tool established to check security projects or files submitted there? And the second one, for plagiarism in general. BRIAN YU: Yeah, so for checking security projects. If students are writing, I don't know, security-related projects or doing cryptography assignments-- is that what you're referring to? If so, check50 is designed to be as customizable as you want. As I mentioned a couple of times, each check itself is just a Python function, and so anything that you could write a Python function to do, you could write a check50 check to check for. And ultimately, what that function just needs to do is, for example, raise an exception if the check fails, and that is how check50 will know whether the check has passed or failed. So regardless of the domain in which the project falls, there is likely a way to write check50 checks to be able to integrate nicely with that. SPEAKER 7: Thank you, Brian. BRIAN YU: Yeah, of course. Was there another hand? I thought I saw another hand, but maybe it went down. Other questions about things related to check50 or submit50 or submit.cs50.io? Patrick? PATRICK: Yes, hi. I had a question regarding weighting the marks when doing grading. Is it possible to somehow make some checks more important than others? BRIAN YU: All we do is we run all of the checks and store the results of those checks, and what submit.cs50.io's interface will show you is the number of checks that were passed out of the total number of checks. That being said, you do have access to all of the raw check results, as well, such that you could request via API call the results for all of the correctness checks to be able to see individually the check50 results for which checks are passing and failing, for example. And so based on that, you could then decide if you want to weight certain checks as more valuable than other checks. And the unique identifier we use for those checks is just the name of the function for each of the checks, because each check is really just a Python function. In fact, the easiest way to do this, for example, if you're running check50 locally, is that you can adjust check50's output format. Via command-line argument, if you look at check50 itself, by default, check50 will output the smiley faces with the names of the individual checks. But you can also have check50 configured to output in machine-readable mode, where it's really just going to output some .json data with all of the checks and whether they passed or failed, along with an identifier for each of the checks based on their function name.
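Returning to the Scratch checks described a moment ago, here is a rough sketch of what inspecting an .sb3 file might look like: the file is a zip archive containing project.json, which a check can search for the blocks it cares about. The field names used here ("targets", "blocks", "opcode", "isStage") reflect my understanding of the Scratch 3 project format and should be treated as assumptions.

```
# Peek inside a Scratch .sb3 file and look for loops, sounds, and sprites.
import json
import zipfile

with zipfile.ZipFile("project.sb3") as archive:
    project = json.loads(archive.read("project.json"))

# Collect every block's opcode across the stage and all sprites.
opcodes = [
    block["opcode"]
    for target in project["targets"]
    for block in target.get("blocks", {}).values()
    if isinstance(block, dict)  # some entries are bare variable references
]

has_loop = any(op in ("control_repeat", "control_forever", "control_repeat_until") for op in opcodes)
has_sound = any(op.startswith("sound_") for op in opcodes)
sprites = [t for t in project["targets"] if not t.get("isStage")]

print(f"sprites: {len(sprites)}, uses a loop: {has_loop}, plays a sound: {has_sound}")
```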
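And as a sketch of consuming that machine-readable output, something like the following could apply per-check weights. The -o json flag and the result fields ("name", "passed") reflect my recollection of check50's JSON output, so verify them against your own output before relying on them; the weights themselves are hypothetical.

```
# Run check50 locally in machine-readable mode and compute a weighted score.
import json
import subprocess

completed = subprocess.run(
    ["check50", "--local", "-o", "json", "cs50/problems/2020/x/cash"],
    capture_output=True, text=True
)
results = json.loads(completed.stdout)["results"]

# Hypothetical weights, keyed by each check's function name; unlisted checks count as 1.
weights = {"test_015": 2.0}

earned = sum(weights.get(r["name"], 1.0) for r in results if r["passed"])
possible = sum(weights.get(r["name"], 1.0) for r in results)
print(f"weighted score: {earned}/{possible}")
```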
And so using that information, if you wanted to weight certain checks as more valuable than others, you could just multiply the values of particular checks in order to get the results you want. Other things here? All right, a few other things I'll note about submit.cs50.io. One is the comments link on the right-hand side here. So I talked about check50, style50, as well as, now, comments. One of the advantages of using GitHub for all of our submissions is that it makes it very easy to comment on a student submission. When you click on 0 comments, for example, you'll be taken to a page in GitHub, so this is now GitHub's user interface and not our own, where you'll see the student's code. If you want to comment on an individual line, you can use GitHub's inline commenting abilities, where, next to an individual line, I could, for example, click on the plus button to say I'd like to add a comment to this line. When you click on that, a little text field will open up where you can then write a comment to them. And because this is all integrated into GitHub, you get all of the nice GitHub commenting features for free. So when you add this comment, students are automatically notified by email of that new comment, so they'll be able to see it. And they can also start a threaded discussion, where a student can then reply to the comment, and you can reply to it, as well, in order to have a conversation about students' code right inside of the GitHub interface itself. This feature, we hope, will help to make it easier for you to provide feedback on students' code by centralizing into one place the results of the correctness checks, the results of the style checks, and also the place where you can leave feedback on students' code, as well, by seeing all the code in one place, and then being able to provide inline comments on it, too. So that is a brief overview of now submit.cs50.io, in addition to submit50 and check50. I'll pause before moving into our final tool of the day to take a few questions. Is there a teachers' email list? Yes, there is a teachers' email list-- teachers@list.cs50.harvard.edu. We will send some emails to give you a way to sign up for that list if you're not already on it. Arturo, if you'd like to ask a question. ARTURO: Hey. Is there any way to create a batch? The problem is this-- in this community, the internet is very slow, and if I want the students to go through CS50, create the GitHub and everything, will all of them [INAUDIBLE] be able to do it at once? So is there any way to do a batch submission, where they give me their software, their programs, and I upload everything at once so it can be checked or they can go through CS50? I don't know if I expressed the question well. BRIAN YU: Yeah, as far as CS50's submit tools are concerned, a submission is just a push to a particular branch of a GitHub repository. But because we don't expect that first-year computer science students have any familiarity with GitHub, submit50 as a command-line tool is really just a wrapper around a bunch of git commands that push students' code to Git, without the students needing to know what git or GitHub or repositories or branches or commits are at all. We just abstract all of that away from students so they don't have to worry about it.
But because it's just pushing something to a branch, while we don't have a script that automates it already, it's something that anyone could theoretically write, a script that just pushes a whole bunch of submissions to GitHub, because all that you need to do to submit, as far as CS50 is concerned, is push that code to the corresponding problem branch of the student's GitHub repository. So we don't have a tool itself to do a bulk upload, but it would be possible to create one. And if you are interested in doing that, reach out to the sysadmins, and we can talk you through what that might look like. Other questions about any of these tools now? I thought I saw another hand raised. Maybe it went down, maybe your question was already answered. In that case, let's finally move on to the last tool that we'll be talking about today, which is compare50. A couple people today have already asked me about academic honesty and what's involved there. compare50 is a tool designed for this purpose. compare50 is a command-line tool for detecting similarity within students' code. So ultimately, compare50 is a tool that will take a whole bunch of students' submissions, and when you run compare50, compare50 will examine all pairs of submissions, looking for pairs of submissions that are unusually similar and might be an instance of plagiarism, for example. It will try to highlight those very similar submissions, those pairs of submissions that share a lot in common, and try to draw that to your attention so that you can see it, as well. The way that you might install compare50 is the same way you might install other CS50 tools. You can run pip3 install compare50 on your computer, for example. And then the way we generally structure compare50 is to have you create one folder for each student, and inside of that folder, you put all of the code that you want to check for similarity against other student submissions. And so what you might do, then, is, inside your terminal here, I've shown you an example just to show you what it might look like. I can run compare50 * to say, check all of the submissions in this current folder. It is going to compare all of the submissions against each other, score them, and then output a web page that I can visit in order to see the results of compare50 itself. So what does that ultimately look like? Well, if I click on that URL at the bottom, where it says visit this particular HTML page, what you will see is a page that looks like this, where, along the left-hand side, you will see all of the top matches for pairs of submissions. And by default, we output something like the 50 top matches, but that's configurable, where you can decide how many top matches you actually want to see. And we score each of them based on how similar those submissions actually are, where 10 means they're virtually identical, it's the highest possible score, down to 1, meaning they're basically not similar at all. So we'll score them from 1 to 10 based on how similar these submissions happen to be. And then on the right-hand side of this window, you'll see a graph, some visualization of clusters of students that happen to have submitted similar work.
So oftentimes, in cases of collaboration, it might not just be two students collaborating with each other who might cross a line, potentially, but it could be three or four students working in a cluster that are all sharing code, for example, where that might have crossed some line of academic honesty that you might want to be mindful of. And what we've tried to do on the right-hand side of this window is demonstrate that visualization just to show you how these clusters are connected. And then along the bottom, you'll see a bit of a slider, and that slider indicates the threshold for how strong of a match you want to actually be able to view on this page. So right now, the threshold is set at the minimum, 1, just at the far left of that slider, so we're seeing all 50 of the top matches in this case. But maybe that's more matches than we actually want to see, so we might drag that threshold to the right to increase the threshold for how similar we want something to be before we actually pay attention to it. So by dragging that threshold, you can watch, and what you'll see is that the list of submissions begins to narrow down, and on the right-hand side, you'll see now we're left with only the submissions that have a score above, like, a 4, for example. And we see here, we have one pair of two submissions at that bottom cluster down below, and up a little bit higher, you see a cluster of three students who all appear to have similar code to one another. Those clusters are color-coded such that you can see on the right-hand side the colors that correspond to each cluster, and then on the left-hand side next to each of the pairs, you'll also see the color that corresponds with that cluster, just to make it visually easier for you the teacher to be able to see where it is these clusters might exist and what scores they're getting in terms of how similar these submissions happen to be. So when you then go into a particular submission and click on the submission to view it in more detail, what you will then see is a side-by-side comparison of the two students. So here, we're seeing one student on the left and another student on the right, and looking at where it is that these two submissions are similar or not. And we allow you different modes for checking the similarity of submissions. So right now, we're in what's known as text mode, which is literally just looking at the text of the submission, and looking for sequences of characters that happen to match between the two submissions. So what you're seeing highlighted are sequences of characters that are the same, both with the student on the left and with the student on the right. So in that case, you're seeing any of the similarities between these two submissions that are highlighted, just to make it a little bit more obvious to you, the teacher, who's looking at all of this, but there are other modes of comparison that you can jump into, as well, if you want to view this data in different ways. In particular, something that we sometimes will see is that a student will copy another student's code, but then change all the variable names, for example, or add some spaces, such that the text of the submission is now different. The variable names have changed. Big sections of code are not going to match as identical in terms of the text, even though structurally, these two programs are identical, save for just some changes in what the names of the variables are.
So in addition to a text-based pass, you'll see on the left-hand side that I can switch what mode I'm currently looking at. I can move from text mode, for example, into structure mode, where now, in structure mode, I'm comparing not just the text of the submission, but the overall structure of the submission. And here, we see that these two submissions are structurally identical. They might have changed or varied in some of the variable names or in some of the spacing, but in terms of how the program overall is structured, these two submissions are basically the same. So we allow you to switch between those modes just to get different views into the information that you're looking at and to bring to your attention different things that might catch your eye. compare50 is just going to provide you with all of these top matches, but then it's usually up to you, the human, to actually go through and decide for any given pair whether it actually crossed some line or not, for example. Ahmaud, question about this. AHMAUD: Yes. From your experience, Brian, you and the team, what would be a fair threshold for compare50? Since there will be similarities, after all, if students are submitting the same problem set. I know this can be different. For example, hello is going to be very similar, but for others, they're going to be less similar. But what would be a fair threshold for similarities? BRIAN YU: I'm reluctant to give an exact number because I think it's going to vary a lot based on the pair of submissions and based on the particular problem. Certainly, with the problems we assign, there are multiple ways to solve these problems. Oftentimes, you're going to find similarities just by chance. Or just naturally, because of common approaches to the problem, multiple submissions are going to look similar to one another. Generally, though, compare50 will score the submissions for us, and then we'll go through them in order and look for ones where something stands out as particularly suspicious. Oftentimes, that's something like a comment written in English that happens to be identically worded between one submission and another in exactly the same place, where it would be very unlikely for that to happen by chance. Or we'll look for a couple of such suspicious items that happen multiple times throughout a submission. This is also something that we'll occasionally look at across multiple submissions, too. If we notice unusual similarities in one problem, and then we notice unusual similarities in a problem the following week between the same pair of students, then that might be additional indication, as well. And compare50 is not designed to make these determinations for you; it's just designed to highlight the things that you might want to pay attention to, such that you can then go in and use your, hopefully, better human judgment to make those assessments, as well. If two students submitted identical code, then, as someone asked in the chat, you would just literally see a perfect match, where structurally, textually, in terms of an exact match, they have all the exact same text. Oleg, question? OLEG: Yes. As of now, the documentation says that this tool, compare50, is available locally. You have to install it. And at the same time, I remember you telling us that at some point, there might be a web version. Do you have any updates on that? BRIAN YU: No updates at the moment. It's certainly something that we're thinking about.
For now, given some of the processing required in order to get compare50 to run, and because we also want to be mindful about things like privacy with student submissions, at the moment we're only making compare50 available as a command-line tool. A web version is something we've thought about, but there are no current plans, nothing active there at the moment. OLEG: So for this summer session, it's fair to say that it won't be online. BRIAN YU: For this summer, certainly, yes. It's just going to be available as a command-line tool. You can find the source code for that command-line tool here. I've just pasted the GitHub link in the chat. It's entirely open source, so you can see exactly how it's doing that comparison. You can add to it if you would like to, as well. OLEG: Thank you. BRIAN YU: Let's go to Joseph now. Question from Joseph? JOSEPH: This was what I was trying to ask earlier. Does the tool look on the internet-- for example, Stack Overflow-- or scour a huge database to compare code? Or is it just limited to work that is submitted by other students? Because my thinking is, with English papers, when they check for plagiarism, they go through the internet. But in this case, you are kind of encouraged to be resourceful and search and use Google because, at work, for example, we're always using Google. There is always a new problem that you have to solve, and Google is almost better than documentation. So that's my question, how do you balance the two? Because you want to give them a skill that is marketable. When they go to work, they can write real programs, and that's what we do every day when we're writing real programs. BRIAN YU: Yeah, absolutely. It's a balance that we have to work with students to make sure they understand. Generally, our philosophy has been that if you are looking up how to solve some narrowly defined problem that is part of the submission you're working on-- for example, in Caesar, if you're looking up how to check whether a character is uppercase or not, and you borrow a snippet of code showing how to use the isupper function in C to check whether a character is uppercase-- that we consider to be reasonable. You're looking up and borrowing a snippet of code that is not solving the entire problem for you, but is just helping you to solve a piece of it. What we tell students they shouldn't be doing is Googling for solutions to the problem set itself. If they were to Google, for example, how to write C code to perform a Caesar cipher on a string, or to rotate a string by a certain number of characters, that would probably be crossing some line, where they're really taking the entirety of the solution and using it as their own submission. And so we have an academic honesty policy that delineates the types of behaviors that we consider to be reasonable and those we consider to be not reasonable. In general, we think that borrowing a snippet of code seems very reasonable, but finding a snippet of code that solves the entire problem, that we consider to not be reasonable, for example. But there certainly is a bit of a judgment call. As for what compare50 itself does, compare50 is only going to search through the data that you give to it. So if you only give it student submissions, it will only compare those student submissions against each other.
But what we will generally do as teachers is look up solutions to particular problem sets that have been posted online, download that code, and use compare50 to check against it, as well. And there's a way that you can do that, which I'll talk about in a moment, that's supported by compare50. Let's go now to Ramon. Ramon, you have a question? RAMON: You have talked about check50 and submit50, so I have a question about p set submissions and the course's rigor, which is to what extent are we allowed to include additional instructions on the p sets in order to help students who are taking CS50 in another language, considering the fact that [INAUDIBLE] require students to have some prior knowledge of certain English words, for example? Furthermore, [AUDIO OUT] submission code requires knowledge of English words like red, blue, green, and many others. So to what extent can we modify the p set and the distribution code? And more importantly, how can we maintain the course's rigor while providing these additional instructions, considering that one wants to produce a translation of the course that is as close as possible to the original? BRIAN YU: Excellent question. To answer the first part of that, you are absolutely allowed to adapt the course's material by adding to the specifications, by editing the specifications to your liking. The course is available for you to modify if you would like to. That's certainly something you can do if you'd like to change the problem set specifications for working with your own students, translating them into a different language, for example. All definitely OK. In terms of maintaining the rigor of the class, I would generally suggest that if you're translating the problem sets from one language to another, it's certainly fine to explain what these individual terms mean. In terms of pixels, if you're explaining red and green and blue, you can translate that, certainly, so long as you are not, for example, giving away what code they would write in order to solve a particular part of the image filtering problem. I think you can largely preserve the core of the problem-solving process while just translating the instructions and the guidance to make it a little bit more accessible. But you're certainly welcome to adapt the course materials to whatever you think would be best for your own students. And a question now from Ohm, if you'd like to ask a question, Ohm. OHM: Hello, Brian. I'm Ohm from India. My question is, when I go to create a new course in submit.cs50.io, it always says failed to create GitHub teams. Why? BRIAN YU: If you're ever seeing an error when trying to create a course on submit.cs50.io, that can sometimes be due to a problem on GitHub's end. There's some GitHub configuration that needs to happen in order to perform that process. If you go ahead and just email the address that I sent in the chat here, sysadmins@cs50.harvard.edu, that will let us know about the issue, and hopefully we'll be able to get back to you shortly once it's resolved. OHM: Thanks, Brian. BRIAN YU: Yeah, of course. Other questions about compare50? I'm looking through the chat. How does compare50 check for similar logic, like when variable names are changed? Without getting too technical, in short, what we do is we parse the student's submission and get a sense for what the various different parts of that submission are-- like, here is a curly brace, here is a variable name, here is the start of an if statement.
And using that tokenization process, as it's called, we can then do a comparison while ignoring what the exact names of those variables are, just to check the structure of the program overall. Other questions about compare50? All right, a couple other features that I'll mention about compare50. One is distribution code, which you can specify. Oftentimes, if you are releasing a problem that already has some code written for students, and you were to just naively perform all of these comparisons, you would end up finding matches across all of your students, because all of them are using the same distribution code. compare50 will allow you to specify what the distribution code is for a particular problem, such that when we're performing these comparisons, we're not going to bother comparing the distribution code. We're only comparing the code the students actually wrote, just to hopefully make the results of compare50 a little more accurate and a little bit more relevant to you. compare50 also has support for archived submissions, such that if you have students who have taken your course in the past, or if you have online submissions that you want to check students' code against, you can store a whole bunch of archived submissions to a problem and then compare your current students not only against each other, but also against any of the submissions inside of that archive. So we do this year after year, too, where when students are taking the course this year, we can compare their submissions against all of the submissions from prior years, as well. So you can store these archived submissions to be able to do these comparisons. And in addition to that, compare50 is open source and extensible. We already do a fair amount of pre-processing and several different types of passes to try to produce useful results, and if you want to extend compare50 with an additional pass that looks for something in particular that suits your needs as a teacher, you can absolutely do that, as well. You can see how we've done it already inside of compare50's source code, which is, again, all open source. We've designed it such that you can easily add additional passes to compare50, too, if you want to add functionality to deal with a particular use case that you might have. And that just about covers the four tools that I wanted to introduce to you all today-- check50, submit50, submit.cs50.io, and compare50. Before we wrap, though, I did want to leave an opportunity for anyone to ask questions about any of these tools if you still have any questions about any of the tools here, or any of the tools that Kareem talked about yesterday, or anything that you've seen in the workshop so far.
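(For reference, a rough command-line sketch of the distribution-code and archive features described above. The folder names here are hypothetical, and the option spellings shown, -d/--distro, -a/--archive, and --list, should be confirmed against compare50 --help and the documentation at cs50.readthedocs.io.)

    # compare current submissions while ignoring shared distribution code,
    # and also compare against archived submissions from prior years
    # (submissions/, distro/, and archive/ are hypothetical folder names)
    compare50 submissions/* -d distro/* -a archive/*

    # list the comparison passes that are available (for example, the
    # text- and structure-based passes discussed earlier)
    compare50 --list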