WEBVTT 00:00:00.000 --> 00:00:02.950 align:middle line:90% [MUSIC PLAYING] 00:00:02.950 --> 00:00:04.755 align:middle line:90% DAVID MALAN: This is CS50. 00:00:04.755 --> 00:00:07.677 align:middle line:90% [MUSIC PLAYING] 00:00:07.677 --> 00:00:08.760 align:middle line:90% DAVID MALAN: Hello, world. 00:00:08.760 --> 00:00:10.260 align:middle line:90% This is the CS50 podcast. 00:00:10.260 --> 00:00:11.370 align:middle line:90% My name is David Malan. 00:00:11.370 --> 00:00:12.787 align:middle line:90% BRIAN YU: And my name is Brian Yu. 00:00:12.787 --> 00:00:16.379 align:middle line:84% And today, we thought we'd discuss academic honesty in CS50. 00:00:16.379 --> 00:00:19.230 align:middle line:84% And so every year in CS50, we always have some number of cases 00:00:19.230 --> 00:00:22.320 align:middle line:84% of academic dishonesty where some number of students 00:00:22.320 --> 00:00:27.180 align:middle line:84% submit work that isn't their own, either by copying homework from a friend 00:00:27.180 --> 00:00:29.880 align:middle line:84% or by looking something up online and using a solution they 00:00:29.880 --> 00:00:32.229 align:middle line:90% find online as part of their solution. 00:00:32.229 --> 00:00:35.070 align:middle line:84% And so this is something that CS50 has had to deal with for years 00:00:35.070 --> 00:00:38.100 align:middle line:84% now in terms of how best to address this type of situation, 00:00:38.100 --> 00:00:40.593 align:middle line:84% and how best to prevent academic dishonesty in general. 00:00:40.593 --> 00:00:43.260 align:middle line:84% DAVID MALAN: Indeed this was-- when I first took over the course 00:00:43.260 --> 00:00:47.790 align:middle line:84% myself back in 2007, it was really an end of semester process. 
00:00:47.790 --> 00:00:50.430 align:middle line:84% After the teaching fellows would evaluate students' work 00:00:50.430 --> 00:00:52.305 align:middle line:84% and provide feedback throughout the semester, 00:00:52.305 --> 00:00:54.600 align:middle line:84% I would finally, all too often by semester end, 00:00:54.600 --> 00:00:58.260 align:middle line:84% carve out some time in order to then cross compare all of the submissions 00:00:58.260 --> 00:01:02.340 align:middle line:84% from that semester looking for statistically unlikely similarities 00:01:02.340 --> 00:01:03.660 align:middle line:90% between students' work. 00:01:03.660 --> 00:01:06.540 align:middle line:84% Indeed, what a student might sometimes unfortunately do 00:01:06.540 --> 00:01:09.570 align:middle line:84% is copy the work of another student, or lean too heavily 00:01:09.570 --> 00:01:12.330 align:middle line:84% on some resource online, copying more than a reasonable number 00:01:12.330 --> 00:01:13.350 align:middle line:90% of lines of code. 00:01:13.350 --> 00:01:16.290 align:middle line:84% And so by cross comparing all submissions with software 00:01:16.290 --> 00:01:19.080 align:middle line:84% itself, do we then notice which lines of code 00:01:19.080 --> 00:01:23.610 align:middle line:84% are in both student A's and student B's work, and then conclude ultimately, 00:01:23.610 --> 00:01:25.890 align:middle line:84% that statistically this was unlikely to happen by chance. 00:01:25.890 --> 00:01:27.735 align:middle line:84% BRIAN YU: Now, how exactly do you draw those conclusions? 00:01:27.735 --> 00:01:30.180 align:middle line:84% Because I'm thinking about a programming language like C, 00:01:30.180 --> 00:01:32.770 align:middle line:84% there are only so many parts of the language. 00:01:32.770 --> 00:01:34.830 align:middle line:90% There are for loops and there are conditions. 
00:01:34.830 --> 00:01:37.230 align:middle line:84% And probably everyone's solutions to similar problems 00:01:37.230 --> 00:01:39.120 align:middle line:90% have these sorts of elements. 00:01:39.120 --> 00:01:41.120 align:middle line:84% So what exactly do you look for in this process? 00:01:41.120 --> 00:01:42.578 align:middle line:90% DAVID MALAN: Yeah, it's quite fair. 00:01:42.578 --> 00:01:44.502 align:middle line:84% If we relied on this kind of cross comparison 00:01:44.502 --> 00:01:46.710 align:middle line:84% for programs like Hello, World, everyone would appear 00:01:46.710 --> 00:01:48.640 align:middle line:90% to have written exactly the same code. 00:01:48.640 --> 00:01:52.140 align:middle line:84% But as soon as we get into CS50's second and third weeks 00:01:52.140 --> 00:01:55.470 align:middle line:84% where the programs they write in C tend to get a little longer, 00:01:55.470 --> 00:01:58.620 align:middle line:84% there does end up being more opportunity for creativity, 00:01:58.620 --> 00:02:02.192 align:middle line:84% for different stylizations by students. 00:02:02.192 --> 00:02:03.900 align:middle line:84% And so students' code does start to drift. 00:02:03.900 --> 00:02:05.858 align:middle line:84% Even though at the end of the day the solutions 00:02:05.858 --> 00:02:09.000 align:middle line:84% might still be using for loops and while loops and conditions and so forth, 00:02:09.000 --> 00:02:11.250 align:middle line:84% students might format their code slightly differently. 00:02:11.250 --> 00:02:13.620 align:middle line:84% They might write slightly different comments. 00:02:13.620 --> 00:02:17.680 align:middle line:84% And so what tends to happen over time, as the programs exceed 00:02:17.680 --> 00:02:21.750 align:middle line:84% maybe 10, 20, 30 lines of code, is that there is enough variation. 
00:02:21.750 --> 00:02:24.300 align:middle line:84% And indeed, unfortunately, what we often notice 00:02:24.300 --> 00:02:27.527 align:middle line:84% is not even necessarily that the code is identical, because as you note, 00:02:27.527 --> 00:02:29.610 align:middle line:84% that in and of itself might just be a coincidence. 00:02:29.610 --> 00:02:31.767 align:middle line:84% Especially when nowadays we have 800 students, 00:02:31.767 --> 00:02:34.350 align:middle line:84% it is absolutely going to be the case that two students write, 00:02:34.350 --> 00:02:36.397 align:middle line:90% by chance, very similar code. 00:02:36.397 --> 00:02:38.730 align:middle line:84% But unfortunately, the kinds of things we tend to notice 00:02:38.730 --> 00:02:41.940 align:middle line:84% are when students have the same typographical errors, 00:02:41.940 --> 00:02:44.460 align:middle line:84% or they use precisely the same variable names, 00:02:44.460 --> 00:02:47.850 align:middle line:84% or they make precisely the same mistake in precisely the same location. 00:02:47.850 --> 00:02:50.160 align:middle line:84% And at that point, our instincts start to kick in 00:02:50.160 --> 00:02:52.830 align:middle line:84% and we look at code like this and start to realize, 00:02:52.830 --> 00:02:55.650 align:middle line:84% while this may have happened by chance, at scale, 00:02:55.650 --> 00:02:58.200 align:middle line:84% the fact that it happened in this line and in this line 00:02:58.200 --> 00:03:00.420 align:middle line:84% and in this line between two students' code is 00:03:00.420 --> 00:03:04.440 align:middle line:84% just more likely than not better explained by some deliberate act. 
00:03:04.440 --> 00:03:07.950 align:middle line:84% BRIAN YU: So at Harvard at least, when there are cases of academic dishonesty, 00:03:07.950 --> 00:03:10.680 align:middle line:84% they're usually referred to some administrative body, which 00:03:10.680 --> 00:03:12.870 align:middle line:84% now is called the Honor Council here at Harvard. 00:03:12.870 --> 00:03:15.203 align:middle line:84% And I think you've pointed out and a couple other people 00:03:15.203 --> 00:03:19.200 align:middle line:84% have pointed out that CS50, though it is the largest course at the university, 00:03:19.200 --> 00:03:24.772 align:middle line:84% does refer far more people to the Honor Council than any other class on campus. 00:03:24.772 --> 00:03:27.480 align:middle line:84% Do you think that has to do with something about computer science 00:03:27.480 --> 00:03:28.980 align:middle line:90% or introduction to computer science? 00:03:28.980 --> 00:03:30.428 align:middle line:90% Or why do you think that might be? 00:03:30.428 --> 00:03:31.470 align:middle line:90% DAVID MALAN: No, I don't. 00:03:31.470 --> 00:03:34.170 align:middle line:84% And that's certainly an unfortunate distinction that we've long had, 00:03:34.170 --> 00:03:37.200 align:middle line:84% save for one or two years where there were issues in other departments. 00:03:37.200 --> 00:03:40.080 align:middle line:84% No, I don't think that computer science students are any less honest 00:03:40.080 --> 00:03:41.820 align:middle line:90% than their classmates in other fields. 00:03:41.820 --> 00:03:44.670 align:middle line:84% I don't think students in CS50 are any less honest than students 00:03:44.670 --> 00:03:46.290 align:middle line:90% in other computer science courses. 
00:03:46.290 --> 00:03:51.180 align:middle line:84% I think it really boils down to, one, you and I and educators in computer science 00:03:51.180 --> 00:03:54.720 align:middle line:84% are perhaps somewhat uniquely positioned with tools-- 00:03:54.720 --> 00:03:57.110 align:middle line:84% with software tools via which to detect it. 00:03:57.110 --> 00:03:59.422 align:middle line:84% And in a large introductory course like CS50, 00:03:59.422 --> 00:04:02.130 align:middle line:84% I think it's important not only out of fairness to those students 00:04:02.130 --> 00:04:06.360 align:middle line:84% who are behaving honestly throughout the term, but also because one of our goals 00:04:06.360 --> 00:04:08.430 align:middle line:84% should be, in this course, to teach students 00:04:08.430 --> 00:04:10.530 align:middle line:84% the ethical application of computer science. 00:04:10.530 --> 00:04:14.610 align:middle line:84% That we should be holding students to those same expectations as 00:04:14.610 --> 00:04:17.279 align:middle line:84% are prescribed in great detail in the course's syllabus. 00:04:17.279 --> 00:04:23.160 align:middle line:84% And so I think it's really a function of our, one, looking for it 00:04:23.160 --> 00:04:27.240 align:middle line:84% and, two, following through on it that really ends up explaining the large numbers. 00:04:27.240 --> 00:04:30.240 align:middle line:84% BRIAN YU: Yeah, so I'm looking here at the data from past years in CS50, 00:04:30.240 --> 00:04:32.865 align:middle line:84% and it does seem that there's also a fair amount of fluctuation 00:04:32.865 --> 00:04:36.060 align:middle line:84% in terms of what percentage of students in the course end 00:04:36.060 --> 00:04:37.800 align:middle line:90% up being referred to the Honor Council. 00:04:37.800 --> 00:04:40.380 align:middle line:84% Like, in 2009 for example, it looks like nobody 00:04:40.380 --> 00:04:42.030 align:middle line:90% was referred to the Honor Council. 
00:04:42.030 --> 00:04:46.680 align:middle line:84% And in other years like 2010, 2012, there's like 1% or 2% of students. 00:04:46.680 --> 00:04:51.112 align:middle line:84% But in other years like 2015, it's up to 5%, 2016 is up to 10%. 00:04:51.112 --> 00:04:53.070 align:middle line:84% What do you think accounts for that fluctuation 00:04:53.070 --> 00:04:55.903 align:middle line:84% because that's a pretty big difference between one year and another? 00:04:55.903 --> 00:04:58.900 align:middle line:84% DAVID MALAN: Yeah, there really has been as you say, from 0% to 10% 00:04:58.900 --> 00:04:59.910 align:middle line:90% depending on the year. 00:04:59.910 --> 00:05:01.620 align:middle line:90% I think it's a few things. 00:05:01.620 --> 00:05:03.870 align:middle line:84% Part of it I think is just a function of how much time 00:05:03.870 --> 00:05:06.450 align:middle line:90% I or we put into the process. 00:05:06.450 --> 00:05:13.042 align:middle line:84% I think the year in 2009 when there were 0%, I did look for worrisome instances 00:05:13.042 --> 00:05:15.750 align:middle line:84% at that particular year, but admittedly in retrospect, I probably 00:05:15.750 --> 00:05:17.875 align:middle line:84% spent less time that year than the subsequent year. 00:05:17.875 --> 00:05:20.010 align:middle line:84% Because the subsequent year it went up to 2%. 00:05:20.010 --> 00:05:21.930 align:middle line:84% With that said, it might have been by chance, 00:05:21.930 --> 00:05:24.720 align:middle line:84% just a group of students who exhibited this pattern of behavior 00:05:24.720 --> 00:05:26.590 align:middle line:90% with far less frequency than others. 00:05:26.590 --> 00:05:28.870 align:middle line:84% So I think that's certainly possible as well. 
00:05:28.870 --> 00:05:33.090 align:middle line:84% But I think the uptick in more recent years for instance, 10% in 2016 00:05:33.090 --> 00:05:36.450 align:middle line:84% and roughly 4% or 5% then, which is where 00:05:36.450 --> 00:05:38.940 align:middle line:84% we've been rather in equilibrium the past few years, 00:05:38.940 --> 00:05:43.380 align:middle line:84% I think is also a function of just how much time we invest in it. 00:05:43.380 --> 00:05:45.570 align:middle line:84% So back in 2008, and for a few years thereafter, 00:05:45.570 --> 00:05:47.760 align:middle line:84% it was only me who was engaged in this process. 00:05:47.760 --> 00:05:49.230 align:middle line:90% I would run the software by myself. 00:05:49.230 --> 00:05:51.390 align:middle line:84% I would look at students' submissions side by side. 00:05:51.390 --> 00:05:54.015 align:middle line:84% And I would ultimately decide which to refer forward 00:05:54.015 --> 00:05:55.140 align:middle line:90% to Harvard's Honor Council. 00:05:55.140 --> 00:05:57.390 align:middle line:84% And then ultimately, document all those cases. 00:05:57.390 --> 00:06:00.900 align:middle line:84% But in more recent years have we involved more of CS50's senior staff 00:06:00.900 --> 00:06:01.560 align:middle line:90% in the process. 00:06:01.560 --> 00:06:06.180 align:middle line:84% The upside of which is that we can now, one, analyze the submissions roughly 00:06:06.180 --> 00:06:07.370 align:middle line:90% on a week to week basis. 00:06:07.370 --> 00:06:09.870 align:middle line:84% The upside of which is that we can provide the Honor Council 00:06:09.870 --> 00:06:11.800 align:middle line:90% with details far more quickly. 
00:06:11.800 --> 00:06:14.160 align:middle line:84% Students themselves, while it's never a pleasant 00:06:14.160 --> 00:06:16.893 align:middle line:84% process, at least know sooner rather than later, rather 00:06:16.893 --> 00:06:18.810 align:middle line:84% than getting to the entire end of the semester 00:06:18.810 --> 00:06:22.177 align:middle line:84% and then realizing just how many or how often they crossed some line. 00:06:22.177 --> 00:06:24.510 align:middle line:84% But two, the fact that we have multiple human eyes on it 00:06:24.510 --> 00:06:27.360 align:middle line:84% means that we do allocate more time week to week 00:06:27.360 --> 00:06:31.230 align:middle line:84% on each of the individual submissions and the crossways comparisons thereof. 00:06:31.230 --> 00:06:33.360 align:middle line:84% The upside though of those multiple humans, 00:06:33.360 --> 00:06:37.110 align:middle line:84% we now have two or three of us who ultimately vote on whether or not 00:06:37.110 --> 00:06:40.160 align:middle line:84% a case should move forward to the Honor Council, is that I at least, 00:06:40.160 --> 00:06:43.410 align:middle line:84% and hopefully all of us, have much more comfort in sending a case to the Honor 00:06:43.410 --> 00:06:46.410 align:middle line:84% Council because not one pair of eyes, but two or three 00:06:46.410 --> 00:06:50.160 align:middle line:84% have all adjudicated it to be a clear indication of a line 00:06:50.160 --> 00:06:51.150 align:middle line:90% having been crossed. 00:06:51.150 --> 00:06:52.770 align:middle line:84% BRIAN YU: Can you tell me a little more about that process? 00:06:52.770 --> 00:06:54.060 align:middle line:84% You've talked about now that there are now 00:06:54.060 --> 00:06:56.435 align:middle line:84% a couple of eyes that are all looking at the submissions, 00:06:56.435 --> 00:06:58.810 align:middle line:84% but you've also talked about software being involved too. 
00:06:58.810 --> 00:07:01.470 align:middle line:84% So what is the interplay there between the role that software 00:07:01.470 --> 00:07:04.320 align:middle line:84% plays in trying to detect this sort of thing and the role 00:07:04.320 --> 00:07:06.750 align:middle line:84% that people play in trying to detect academic dishonesty? 00:07:06.750 --> 00:07:08.542 align:middle line:84% DAVID MALAN: Yeah, I should first emphasize 00:07:08.542 --> 00:07:10.500 align:middle line:84% that it is not software that is ultimately 00:07:10.500 --> 00:07:13.560 align:middle line:84% disciplining students or referring them to Harvard's Honor Council. 00:07:13.560 --> 00:07:16.650 align:middle line:84% It is rather just a tool that we use as a first pass. 00:07:16.650 --> 00:07:19.650 align:middle line:84% Given that we have some, nowadays, 800 students, each of whom 00:07:19.650 --> 00:07:23.080 align:middle line:84% are submitting 10 homework problems over the course of the semester. 00:07:23.080 --> 00:07:26.760 align:middle line:84% This is a big O of-- n squared problem times 10 or so. 00:07:26.760 --> 00:07:29.970 align:middle line:84% So it's a huge number of comparisons that need to be made, 00:07:29.970 --> 00:07:33.490 align:middle line:84% and it just wouldn't be practically done by hand or by eye alone. 00:07:33.490 --> 00:07:36.660 align:middle line:84% So what we do is run software that literally cross compares 00:07:36.660 --> 00:07:39.300 align:middle line:84% every submission against every other submission 00:07:39.300 --> 00:07:42.630 align:middle line:84% sometimes, within the current year or even, based on our archives, 00:07:42.630 --> 00:07:46.680 align:middle line:84% against recent prior years as well which explodes the problem even more. 
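As a rough sketch of the arithmetic behind the "n squared" remark above: with the 800 students and 10 problems quoted in the conversation, and assuming every pair of submissions is compared once per problem, the count of pairs can be worked out as below. The function name is hypothetical, and prior-year archives are left out, so the real workload would be larger.

```python
from math import comb


def pairwise_comparisons(students: int, problems: int) -> int:
    """Count unordered pairs of submissions, summed over all problems.

    Assumes one submission per student per problem and that each pair
    is compared exactly once (an assumption, not the course's procedure).
    """
    return comb(students, 2) * problems


# 800 students give 800 * 799 / 2 = 319,600 pairs per problem.
print(pairwise_comparisons(800, 10))  # 3196000
```

Roughly 3.2 million comparisons for a single year is why a first pass by software is needed before any human review.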
00:07:46.680 --> 00:07:49.050 align:middle line:84% And what we get out of that software based process 00:07:49.050 --> 00:07:54.150 align:middle line:84% is a list from top to bottom of pairs of submissions that the software considers 00:07:54.150 --> 00:07:55.810 align:middle line:90% worrisome, or at least similar. 00:07:55.810 --> 00:07:59.340 align:middle line:84% And then we, the humans, typically go through the top 50 or the top 100 00:07:59.340 --> 00:08:02.820 align:middle line:84% matches on that list and use our human eyes and our own experience 00:08:02.820 --> 00:08:05.430 align:middle line:84% and our instincts to decide, ah, this just happened by chance 00:08:05.430 --> 00:08:08.222 align:middle line:84% or, oh, as you said, this is a relatively short program like Hello, 00:08:08.222 --> 00:08:09.180 align:middle line:90% World or Mario. 00:08:09.180 --> 00:08:11.910 align:middle line:84% This is just bound to happen at that point in the semester. 00:08:11.910 --> 00:08:14.890 align:middle line:84% But certainly as the problems get more sophisticated 00:08:14.890 --> 00:08:19.230 align:middle line:84% and the code gets longer is it more clear to multiple humans that, hmm, 00:08:19.230 --> 00:08:21.990 align:middle line:84% looks like something's awry here, especially when it is again, 00:08:21.990 --> 00:08:24.810 align:middle line:84% the same variable names or the same comments or worse, 00:08:24.810 --> 00:08:28.350 align:middle line:84% the same comments with typographical or grammatical errors 00:08:28.350 --> 00:08:30.960 align:middle line:84% in exactly the same place. Odds are that's much more 00:08:30.960 --> 00:08:34.080 align:middle line:84% likely to indicate copy paste than it is two students independently 00:08:34.080 --> 00:08:36.780 align:middle line:84% in their own rooms, on their own laptops literally writing 00:08:36.780 --> 00:08:38.293 align:middle line:90% in the same place the same errors. 
00:08:38.293 --> 00:08:39.210 align:middle line:90% BRIAN YU: Makes sense. 00:08:39.210 --> 00:08:42.539 align:middle line:84% And it's also interesting that depending on the type of software 00:08:42.539 --> 00:08:46.410 align:middle line:84% that you use, in the same way that a compiler can take a C program 00:08:46.410 --> 00:08:48.550 align:middle line:84% and figure out what is the structure of the program 00:08:48.550 --> 00:08:51.020 align:middle line:84% and compare the structure of a program to another, 00:08:51.020 --> 00:08:54.640 align:middle line:84% that these sorts of comparison programs can do the same thing. 00:08:54.640 --> 00:08:56.940 align:middle line:84% They can take two pieces of code, and even 00:08:56.940 --> 00:08:59.700 align:middle line:84% if they might use slightly different variable names, 00:08:59.700 --> 00:09:02.280 align:middle line:84% can still look at the structure of the program as a whole 00:09:02.280 --> 00:09:04.350 align:middle line:84% and try and compare them against each other 00:09:04.350 --> 00:09:06.540 align:middle line:84% to do some more sophisticated comparisons. 00:09:06.540 --> 00:09:09.270 align:middle line:84% DAVID MALAN: Yeah, and thanks to some of CS50's team members, Chad 00:09:09.270 --> 00:09:12.030 align:middle line:84% and [? Yella ?] and Kareem, we now have our own tool, Compare50, 00:09:12.030 --> 00:09:13.620 align:middle line:90% which automates this process for us. 00:09:13.620 --> 00:09:15.780 align:middle line:84% And you can perhaps, given your experience in the space, 00:09:15.780 --> 00:09:18.270 align:middle line:84% speak a little more perhaps to the algorithmics underneath the hood? 00:09:18.270 --> 00:09:19.980 align:middle line:84% BRIAN YU: Yeah, it is really Chad and [? Yella ?] 00:09:19.980 --> 00:09:21.810 align:middle line:84% and Kareem that were doing a lot of the work there. 
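One way to picture the structural comparison described above, where two programs still match despite different variable names, is to replace every identifier with a placeholder before comparing. The sketch below is only an illustration under that assumption; the keyword list, the regular expression, and the `normalize` function are hypothetical and are not taken from Compare50. It also ignores comments and string literals.

```python
import re

# A small, incomplete set of C keywords, kept as-is during normalization.
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "for", "while", "do",
    "if", "else", "return", "break", "continue",
}

# Identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def normalize(source: str) -> list[str]:
    """Tokenize C-like code, replacing non-keyword identifiers with "ID"."""
    tokens = []
    for token in TOKEN.findall(source):
        if IDENTIFIER.fullmatch(token) and token not in C_KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(token)
    return tokens


first = "for (int i = 0; i < n; i++) { total += i; }"
second = "for (int j = 0; j < count; j++) { sum += j; }"
print(normalize(first) == normalize(second))  # True
```

Because whitespace and names are discarded, renaming variables or reformatting does not change the token sequence, while changing a literal or an operator does.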
00:09:21.810 --> 00:09:24.227 align:middle line:84% But algorithmically, it's sort of an interesting challenge 00:09:24.227 --> 00:09:26.460 align:middle line:84% to figure out how to do these sorts of comparisons. 00:09:26.460 --> 00:09:29.205 align:middle line:84% Because even though it might seem like a computer 00:09:29.205 --> 00:09:31.080 align:middle line:84% is obviously going to be able to do it faster 00:09:31.080 --> 00:09:32.740 align:middle line:84% than people are going to be able to do it, 00:09:32.740 --> 00:09:34.615 align:middle line:84% it's still a lot of work even for a computer. 00:09:34.615 --> 00:09:38.010 align:middle line:84% Especially, if you consider like 800 students in the class being compared 00:09:38.010 --> 00:09:41.350 align:middle line:84% against all of the other students, plus all of the students who have ever taken 00:09:41.350 --> 00:09:44.570 align:middle line:84% CS50 before, not only for one problem, but for all of the problems 00:09:44.570 --> 00:09:45.380 align:middle line:90% in the course. 00:09:45.380 --> 00:09:47.640 align:middle line:84% That's a lot of work for any computer to do. 00:09:47.640 --> 00:09:50.850 align:middle line:84% And so there is a lot of interesting algorithmic efficiencies 00:09:50.850 --> 00:09:53.070 align:middle line:84% that have been put into the software in order 00:09:53.070 --> 00:09:54.570 align:middle line:90% to make it work a little bit better. 00:09:54.570 --> 00:09:57.630 align:middle line:84% Trying to take advantage of things you actually learn about in CS50. 00:09:57.630 --> 00:10:01.173 align:middle line:84% Things like hashing in order to store data inside of a hash table 00:10:01.173 --> 00:10:03.090 align:middle line:84% so you can very quickly look up whether or not 00:10:03.090 --> 00:10:06.472 align:middle line:84% you've seen a particular pattern of characters in a file before. 
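The hash-table idea mentioned above can be sketched as follows: record every run of k characters from each submission in a dictionary, so that asking whether a pattern has been seen before is a single lookup rather than a scan of every other file. This is a minimal sketch under stated assumptions; the function names, the choice of k = 8, and the whitespace stripping are all hypothetical and do not describe how Compare50 actually works.

```python
from collections import defaultdict


def kgrams(text: str, k: int) -> set[str]:
    """Return every run of k characters, ignoring all whitespace."""
    compact = "".join(text.split())
    return {compact[i:i + k] for i in range(len(compact) - k + 1)}


def shared_kgram_counts(
    submissions: dict[str, str], k: int = 8
) -> dict[tuple[str, str], int]:
    """Count how many distinct k-grams each pair of submissions shares."""
    # Hash table: k-gram -> names of the submissions that contain it.
    index = defaultdict(set)
    for name, text in submissions.items():
        for gram in kgrams(text, k):
            index[gram].add(name)

    counts = defaultdict(int)
    for names in index.values():
        ordered = sorted(names)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                counts[(first, second)] += 1
    return dict(counts)


submissions = {
    "a": "int total = 0;",
    "b": "int total = 0 ;",
    "c": 'printf("hi");',
}
print(shared_kgram_counts(submissions))  # {('a', 'b'): 4}
```

Only pairs that actually share a pattern ever appear in the result, which avoids comparing every file against every other file character by character.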
00:10:06.472 --> 00:10:09.680 align:middle line:84% Those sort of data structures all come into play if you start to think about, 00:10:09.680 --> 00:10:12.405 align:middle line:84% how do you try and solve this problem in a way that's efficient? 00:10:12.405 --> 00:10:13.430 align:middle line:90% DAVID MALAN: Yeah. 00:10:13.430 --> 00:10:16.880 align:middle line:84% And besides software, certainly our own policies have evolved over time. 00:10:16.880 --> 00:10:19.790 align:middle line:84% So you know for instance, that in a few weeks' time, 00:10:19.790 --> 00:10:22.910 align:middle line:84% we'll be presenting at a computer science education conference called 00:10:22.910 --> 00:10:25.640 align:middle line:84% CSEIT a recent paper that a few of us worked on 00:10:25.640 --> 00:10:28.550 align:middle line:84% based on our experience with issues of academic dishonesty 00:10:28.550 --> 00:10:29.850 align:middle line:90% over the past few years. 00:10:29.850 --> 00:10:32.390 align:middle line:84% And it's perhaps worth noting that software aside, 00:10:32.390 --> 00:10:37.010 align:middle line:84% I think one of the more noteworthy policy changes 00:10:37.010 --> 00:10:40.100 align:middle line:84% we introduced some years ago was CS50's so-called Regret Clause. 00:10:40.100 --> 00:10:42.680 align:middle line:84% Which was just a single sentence that we added to the course's 00:10:42.680 --> 00:10:45.890 align:middle line:84% syllabus that encouraged students to come forward 00:10:45.890 --> 00:10:49.130 align:middle line:84% if within 72 hours of submitting some work, 00:10:49.130 --> 00:10:51.710 align:middle line:84% they realized that, oh, they had indeed crossed some line. 00:10:51.710 --> 00:10:54.530 align:middle line:84% They had copied unduly from some resource online. 
00:10:54.530 --> 00:10:57.140 align:middle line:84% They had copied some portion of code from a classmate 00:10:57.140 --> 00:10:59.060 align:middle line:84% or otherwise, somehow or other, crossed the line 00:10:59.060 --> 00:11:02.157 align:middle line:84% that was prescribed in the course's syllabus as being not reasonable. 00:11:02.157 --> 00:11:04.490 align:middle line:84% And what we committed to doing in writing in the course's 00:11:04.490 --> 00:11:06.527 align:middle line:84% syllabus was there would still be penalty 00:11:06.527 --> 00:11:08.360 align:middle line:84% and there would still be consequence, but it 00:11:08.360 --> 00:11:12.692 align:middle line:84% would be limited for instance, to our zeroing the problem or the problem 00:11:12.692 --> 00:11:14.150 align:middle line:90% set that the student had submitted. 00:11:14.150 --> 00:11:18.020 align:middle line:84% And we committed not to escalating the matter to Harvard's Honor Council. 00:11:18.020 --> 00:11:21.050 align:middle line:84% The hope was that we could actually turn what had historically 00:11:21.050 --> 00:11:24.860 align:middle line:84% been purely punitive processes whereby we detect some transgression, 00:11:24.860 --> 00:11:27.170 align:middle line:84% we refer it to the Honor Council, and 00:11:27.170 --> 00:11:31.550 align:middle line:84% thereafter the student is penalized in some way, the most extreme outcome of which 00:11:31.550 --> 00:11:35.270 align:middle line:84% might actually be required time off from Harvard University itself. 00:11:35.270 --> 00:11:37.310 align:middle line:84% We wanted to create a window of opportunity 00:11:37.310 --> 00:11:40.790 align:middle line:84% where students after some sleep, some thought, some reflection, 00:11:40.790 --> 00:11:42.170 align:middle line:90% can actually own up to a mistake. 
00:11:42.170 --> 00:11:44.300 align:middle line:84% Because for so many years, so many of our cases 00:11:44.300 --> 00:11:48.350 align:middle line:84% were truly involving students who at 2:00 AM, 3:00 AM, 4:00 AM are 00:11:48.350 --> 00:11:52.370 align:middle line:84% under very little sleep, under a significant amount of stress, 00:11:52.370 --> 00:11:55.700 align:middle line:84% and with a deadline not only in CS50, but perhaps some other course looming, 00:11:55.700 --> 00:11:59.270 align:middle line:84% made some poor decision to take the quick way out 00:11:59.270 --> 00:12:02.620 align:middle line:84% to just copy and paste someone else's work and submit it as their own. 00:12:02.620 --> 00:12:06.470 align:middle line:84% And even if they've decided or realized a day or two later, wow, 00:12:06.470 --> 00:12:08.270 align:middle line:90% really didn't mean to do that. 00:12:08.270 --> 00:12:11.530 align:middle line:84% Really shouldn't have done that, we had never described 00:12:11.530 --> 00:12:14.090 align:middle line:84% a well-documented process for how they should handle that 00:12:14.090 --> 00:12:15.340 align:middle line:90% and how they could own up. 00:12:15.340 --> 00:12:18.800 align:middle line:84% And so this Regret Clause was meant to help ideally chip away 00:12:18.800 --> 00:12:20.910 align:middle line:84% at the total number of cases we were seeing. 00:12:20.910 --> 00:12:23.345 align:middle line:84% But ultimately, help students meet us halfway 00:12:23.345 --> 00:12:25.220 align:middle line:84% so that it becomes more of a teachable moment 00:12:25.220 --> 00:12:27.200 align:middle line:90% if you will and not just punitive. 00:12:27.200 --> 00:12:30.973 align:middle line:84% BRIAN YU: So I remember when I first took CS50, in fall 2015 it was, 00:12:30.973 --> 00:12:33.140 align:middle line:84% I remember seeing the Regret Clause in the syllabus. 00:12:33.140 --> 00:12:34.535 align:middle line:90% And I remember being a little surprised. 
00:12:34.535 --> 00:12:36.560 align:middle line:84% Because it wasn't something I had seen before. 00:12:36.560 --> 00:12:39.050 align:middle line:84% It's not something that many other classes do. 00:12:39.050 --> 00:12:41.750 align:middle line:84% Not really anything that I was familiar with. 00:12:41.750 --> 00:12:44.360 align:middle line:84% So I'm curious about where the policy came from? 00:12:44.360 --> 00:12:46.190 align:middle line:90% Was it inspired by any other policy? 00:12:46.190 --> 00:12:49.140 align:middle line:84% Or where did you start to find your way to this idea? 00:12:49.140 --> 00:12:52.012 align:middle line:84% And what was the process like for bringing this into the course? 00:12:52.012 --> 00:12:53.720 align:middle line:84% DAVID MALAN: Yeah, it was really inspired 00:12:53.720 --> 00:12:58.520 align:middle line:84% by having, for almost 10 years, watched the number of cases 00:12:58.520 --> 00:13:03.300 align:middle line:84% come through CS50 and watching the circumstances that ultimately explain 00:13:03.300 --> 00:13:03.800 align:middle line:90% them. 00:13:03.800 --> 00:13:07.220 align:middle line:84% Again, these late night poor decisions under a great stress. 00:13:07.220 --> 00:13:10.340 align:middle line:84% And it just felt like we, the teachers of the course, should be doing 00:13:10.340 --> 00:13:14.360 align:middle line:84% or could be doing a more proactive job at trying to tackle this problem. 00:13:14.360 --> 00:13:18.050 align:middle line:84% And not just looking to detect it, but looking to teach students how to one, 00:13:18.050 --> 00:13:19.680 align:middle line:90% ideally avoid it altogether. 00:13:19.680 --> 00:13:25.100 align:middle line:84% But two, even if they do cross some line how to address the situation then. 00:13:25.100 --> 00:13:28.280 align:middle line:84% And yet, it was not with great ease that we rolled this out. 
00:13:28.280 --> 00:13:31.760 align:middle line:84% There were absolutely some sensitivities on campus among administrators, 00:13:31.760 --> 00:13:36.770 align:middle line:84% among the university's Honor Council, who had long-standing processes when 00:13:36.770 --> 00:13:39.800 align:middle line:84% it came to issues of academic dishonesty, not only for CS50, 00:13:39.800 --> 00:13:41.060 align:middle line:90% but all courses at Harvard. 00:13:41.060 --> 00:13:44.420 align:middle line:84% The upside of course, is that by having a central body, Harvard's Honor 00:13:44.420 --> 00:13:48.230 align:middle line:84% Council, adjudicate all of these cases, you have uniform processes. 00:13:48.230 --> 00:13:50.930 align:middle line:84% You hopefully have more equitable outcomes overall. 00:13:50.930 --> 00:13:53.240 align:middle line:84% And there was great concern initially in some circles 00:13:53.240 --> 00:13:56.480 align:middle line:84% that we were now doing something more on our own internally. 00:13:56.480 --> 00:14:00.380 align:middle line:84% And so it only debuted after quite a few conversations with Harvard's Honor 00:14:00.380 --> 00:14:02.690 align:middle line:84% Council and administration so that we could ultimately 00:14:02.690 --> 00:14:05.570 align:middle line:84% get folks comfortable with what, at the time, was an experiment, 00:14:05.570 --> 00:14:08.622 align:middle line:84% but now is an ongoing six-year policy for us at least. 00:14:08.622 --> 00:14:10.580 align:middle line:84% BRIAN YU: All right so now six years in, the policy 00:14:10.580 --> 00:14:11.720 align:middle line:90% has been around for a little while. 00:14:11.720 --> 00:14:14.240 align:middle line:84% Do you feel like it's done what you expected it to do? 00:14:14.240 --> 00:14:17.330 align:middle line:84% How does it compare to what your original goals and objectives were 00:14:17.330 --> 00:14:19.970 align:middle line:84% for what the policy would do for the class and for students? 
00:14:19.970 --> 00:14:22.340 align:middle line:84% DAVID MALAN: Yeah, so we hoped that it would actually 00:14:22.340 --> 00:14:24.440 align:middle line:84% chip away at the total number of cases that we 00:14:24.440 --> 00:14:26.270 align:middle line:84% were referring to Harvard's Honor Council, 00:14:26.270 --> 00:14:28.700 align:middle line:90% but it did not, in fact, do that. 00:14:28.700 --> 00:14:31.520 align:middle line:84% Interestingly enough, the number of cases 00:14:31.520 --> 00:14:34.580 align:middle line:84% we have referred to the Honor Council since has been roughly the same 00:14:34.580 --> 00:14:40.040 align:middle line:84% or even higher in some years than prior to the Regret Clause's introduction. 00:14:40.040 --> 00:14:42.800 align:middle line:84% We had, wonderfully, a nontrivial number 00:14:42.800 --> 00:14:45.290 align:middle line:84% of students avail themselves of this clause 00:14:45.290 --> 00:14:48.080 align:middle line:84% most years. So in the Regret Clause's first year, 00:14:48.080 --> 00:14:51.710 align:middle line:84% 2014, we had 19 students come forward under this clause, 00:14:51.710 --> 00:14:55.310 align:middle line:84% reach out to me and the course's heads, generally by way of an email first. 00:14:55.310 --> 00:14:58.220 align:middle line:84% After which we would then schedule time to chat. 00:14:58.220 --> 00:15:00.530 align:middle line:84% And I would chat with these 19 students one on one 00:15:00.530 --> 00:15:03.860 align:middle line:84% and better understand what had happened and what they had done. 00:15:03.860 --> 00:15:06.140 align:middle line:84% Better understand what circumstances had led 00:15:06.140 --> 00:15:09.620 align:middle line:84% to them having made whatever decision it was we were then discussing.
00:15:09.620 --> 00:15:12.560 align:middle line:84% And then ultimately, explicitly tell them, all right, 00:15:12.560 --> 00:15:14.090 align:middle line:90% let's consider the matter behind us, 00:15:14.090 --> 00:15:16.160 align:middle line:84% after zeroing the particular work in question, 00:15:16.160 --> 00:15:18.820 align:middle line:84% to reassure them that this was indeed the end of that process, 00:15:18.820 --> 00:15:21.820 align:middle line:84% but the beginning, hopefully, of a healthier approach to future problem 00:15:21.820 --> 00:15:22.520 align:middle line:90% sets. 00:15:22.520 --> 00:15:25.220 align:middle line:90% And we would then encourage them to-- 00:15:25.220 --> 00:15:29.150 align:middle line:84% and discuss with them ways of better managing their time, 00:15:29.150 --> 00:15:30.612 align:middle line:90% better managing their stress. 00:15:30.612 --> 00:15:32.570 align:middle line:84% In some cases, too, it came to light that there 00:15:32.570 --> 00:15:34.210 align:middle line:90% were extenuating circumstances. 00:15:34.210 --> 00:15:36.770 align:middle line:84% Students struggling with issues at home, with their family, 00:15:36.770 --> 00:15:40.080 align:middle line:84% with relationships, with other courses, issues of mental health. 00:15:40.080 --> 00:15:43.700 align:middle line:84% And so what was a pleasant revelation to us 00:15:43.700 --> 00:15:47.570 align:middle line:84% was that we were able, more proactively than had been possible in the past, 00:15:47.570 --> 00:15:49.860 align:middle line:84% to connect students with support resources on campus, 00:15:49.860 --> 00:15:51.830 align:middle line:84% whether academic in the case of tutoring, 00:15:51.830 --> 00:15:54.560 align:middle line:84% or perhaps health in the way of mental health.
00:15:54.560 --> 00:15:57.380 align:middle line:84% So that too seemed to be a positive outcome of the experience, 00:15:57.380 --> 00:16:00.860 align:middle line:84% that we were able to connect up to 19 students 00:16:00.860 --> 00:16:03.360 align:middle line:84% that first year with other resources on campus. 00:16:03.360 --> 00:16:04.610 align:middle line:90% And thereafter it fluctuated. 00:16:04.610 --> 00:16:06.860 align:middle line:90% In 2015, we had 26 students. 00:16:06.860 --> 00:16:09.080 align:middle line:90% In 2016, we had seven students. 00:16:09.080 --> 00:16:11.990 align:middle line:84% Then it went back up in 2017 to 18 students. 00:16:11.990 --> 00:16:14.240 align:middle line:84% And I think this variation is partly just a function 00:16:14.240 --> 00:16:16.230 align:middle line:90% of messaging on our part, on my part. 00:16:16.230 --> 00:16:18.710 align:middle line:84% How much time we spend in lectures and in emails 00:16:18.710 --> 00:16:22.160 align:middle line:84% during the semester reminding students of the policy's availability. 00:16:22.160 --> 00:16:25.630 align:middle line:84% I also suspect that there's some ebb and flow based on the current-- 00:16:25.630 --> 00:16:26.570 align:middle line:90% the given year. 00:16:26.570 --> 00:16:31.010 align:middle line:84% If more students in this class know that a student in the previous year 00:16:31.010 --> 00:16:34.650 align:middle line:84% might have invoked this clause, there just might be broader awareness of it. 00:16:34.650 --> 00:16:39.450 align:middle line:84% But it's been a good number of students, I think, every semester. 00:16:39.450 --> 00:16:42.650 align:middle line:84% However, the fact that we didn't see a downturn in the number of cases 00:16:42.650 --> 00:16:44.520 align:middle line:90% we referred, too, was also a surprise.
00:16:44.520 --> 00:16:48.140 align:middle line:84% In fact, in the first year of the Regret Clause's existence, 00:16:48.140 --> 00:16:51.500 align:middle line:84% it turned out that most, if not all, of the students 00:16:51.500 --> 00:16:55.010 align:middle line:84% that invoked the Regret Clause did not even appear on our radar 00:16:55.010 --> 00:16:57.822 align:middle line:84% when we ran our software-based cross-comparisons of their work. 00:16:57.822 --> 00:16:59.780 align:middle line:84% Which suggested that had they not come forward, 00:16:59.780 --> 00:17:02.960 align:middle line:84% we actually would not have noticed and they would not 00:17:02.960 --> 00:17:05.720 align:middle line:84% have been connected, ideally, with these resources. 00:17:05.720 --> 00:17:09.079 align:middle line:90% And so that too was a bit of a surprise. 00:17:09.079 --> 00:17:11.150 align:middle line:84% These students invoking the Regret Clause, 00:17:11.150 --> 00:17:15.020 align:middle line:84% I daresay, composed a different demographic of students 00:17:15.020 --> 00:17:16.790 align:middle line:84% that we hadn't previously identified. 00:17:16.790 --> 00:17:19.670 align:middle line:84% Students who had indeed crossed some lines in many cases, 00:17:19.670 --> 00:17:23.780 align:middle line:84% but who had not been connected with or been 00:17:23.780 --> 00:17:27.589 align:middle line:84% offered some teachable moment that might actually help them course-correct. 00:17:27.589 --> 00:17:30.720 align:middle line:84% And I should note too, that of the 19 students, 26 students, and so forth, 00:17:30.720 --> 00:17:32.762 align:middle line:84% not all of them had indeed crossed some lines. 00:17:32.762 --> 00:17:35.623 align:middle line:84% In several cases each year, students were unnecessarily worried.
00:17:35.623 --> 00:17:38.540 align:middle line:84% And so I would simply reassure them and thank them for coming forward, 00:17:38.540 --> 00:17:41.520 align:middle line:84% but not to worry, you've navigated the waters properly. 00:17:41.520 --> 00:17:43.520 align:middle line:84% BRIAN YU: Yeah, it's really interesting that now 00:17:43.520 --> 00:17:45.270 align:middle line:84% by reaching this other demographic, you've 00:17:45.270 --> 00:17:48.770 align:middle line:84% been able to have these sorts of chats that otherwise may not 00:17:48.770 --> 00:17:51.770 align:middle line:84% have been able to happen and connect them with other kinds of resources. 00:17:51.770 --> 00:17:54.410 align:middle line:84% I'm curious as to what kinds of advice you 00:17:54.410 --> 00:17:58.340 align:middle line:84% give to students who have difficulty with time management and stress? 00:17:58.340 --> 00:18:00.800 align:middle line:84% Because I think this is not a problem unique to CS50 00:18:00.800 --> 00:18:03.740 align:middle line:84% or to other computer science classes, or just to school in general, 00:18:03.740 --> 00:18:05.100 align:middle line:90% or even outside of school. 00:18:05.100 --> 00:18:08.760 align:middle line:84% Like, time management, stress, managing these things and making good decisions 00:18:08.760 --> 00:18:09.420 align:middle line:90% is-- 00:18:09.420 --> 00:18:10.160 align:middle line:90% it's challenging. 00:18:10.160 --> 00:18:13.260 align:middle line:84% And something that I'm sure many students and other people face. 00:18:13.260 --> 00:18:14.510 align:middle line:90% DAVID MALAN: Yeah, absolutely. 00:18:14.510 --> 00:18:16.880 align:middle line:84% To be honest, it's fairly straightforward things. 00:18:16.880 --> 00:18:20.270 align:middle line:84% It's things that we even put in the course's syllabus or FAQs often. 00:18:20.270 --> 00:18:22.850 align:middle line:84% For instance, in a programming class like ours, start early.
00:18:22.850 --> 00:18:26.750 align:middle line:84% You have nearly seven days from start to finish for each programming assignment. 00:18:26.750 --> 00:18:29.210 align:middle line:84% And the key to avoiding a lot of the stress 00:18:29.210 --> 00:18:32.180 align:middle line:84% is to just start early, so that when you do invariably hit a wall 00:18:32.180 --> 00:18:35.867 align:middle line:84% or encounter some bug that you just can't quite see, you can go to sleep, 00:18:35.867 --> 00:18:37.700 align:middle line:84% you can go for a run, you can take a shower. 00:18:37.700 --> 00:18:39.920 align:middle line:84% You can take a break from it and come back to it 00:18:39.920 --> 00:18:43.220 align:middle line:84% some hours or even a couple of days later and have that perspective. 00:18:43.220 --> 00:18:47.790 align:middle line:84% I mean, even I have found in the real world that I do not produce good code when I, 00:18:47.790 --> 00:18:48.920 align:middle line:90% myself, am under stress. 00:18:48.920 --> 00:18:50.240 align:middle line:90% It's no fun. 00:18:50.240 --> 00:18:51.890 align:middle line:90% It doesn't yield correct results. 00:18:51.890 --> 00:18:56.840 align:middle line:84% And so it's really about helping students realize that it is a relatively simple fix. 00:18:56.840 --> 00:19:01.340 align:middle line:84% They just really need to take charge and commit themselves to that. 00:19:01.340 --> 00:19:04.310 align:middle line:84% Besides that, it's often a matter of referring students to and reminding 00:19:04.310 --> 00:19:07.370 align:middle line:84% them of the many resources that the course offers on campus, whether it's 00:19:07.370 --> 00:19:11.780 align:middle line:84% the course's lectures, or sections, or office hours, or notes or tutorials, 00:19:11.780 --> 00:19:14.443 align:middle line:84% or any number of online and in-person resources.
00:19:14.443 --> 00:19:17.360 align:middle line:84% And just reminding them that you need to meet the course halfway 00:19:17.360 --> 00:19:19.047 align:middle line:90% and take advantage of these resources. 00:19:19.047 --> 00:19:20.880 align:middle line:84% And it's no surprise that you are struggling 00:19:20.880 --> 00:19:24.350 align:middle line:84% if you're not availing yourself of at least some of these resources. 00:19:24.350 --> 00:19:25.910 align:middle line:84% BRIAN YU: Yeah, actually it's always incredible to me 00:19:25.910 --> 00:19:28.910 align:middle line:84% when, on our problem set forms, we ask students, like, on what day 00:19:28.910 --> 00:19:30.217 align:middle line:90% did you start the problem set? 00:19:30.217 --> 00:19:33.050 align:middle line:84% And so many students respond like the day of the deadline or the day 00:19:33.050 --> 00:19:36.350 align:middle line:84% before the deadline for a project that we wrote with the expectation 00:19:36.350 --> 00:19:38.810 align:middle line:84% that it will take students a week to complete it. 00:19:38.810 --> 00:19:41.600 align:middle line:84% And students are trying to do it like day of or day before. 00:19:41.600 --> 00:19:45.160 align:middle line:84% It always amazes me the number of cases where that ends up happening. 00:19:45.160 --> 00:19:48.410 align:middle line:84% DAVID MALAN: Yeah, so I think the more we can send that message even before we 00:19:48.410 --> 00:19:50.720 align:middle line:84% get to the point of a student having a Regret 00:19:50.720 --> 00:19:53.450 align:middle line:90% Clause conversation, the better.
00:19:53.450 --> 00:19:57.020 align:middle line:84% I should note though too, that another surprise effect of the Regret Clause 00:19:57.020 --> 00:20:02.130 align:middle line:84% was not only that the number of cases we referred didn't go down, 00:20:02.130 --> 00:20:05.090 align:middle line:84% but rather that in at least one year they went significantly up. 00:20:05.090 --> 00:20:10.230 align:middle line:84% In 2016, as you noted, we had 10% of the course's student body. 00:20:10.230 --> 00:20:14.020 align:middle line:84% So this is 10% of the students taking CS50 referred 00:20:14.020 --> 00:20:15.950 align:middle line:90% to the university's Honor Council. 00:20:15.950 --> 00:20:17.670 align:middle line:90% But to be honest, that too was, 00:20:17.670 --> 00:20:20.990 align:middle line:84% as I think our numbers since have been, partly a reflection of our feeling 00:20:20.990 --> 00:20:24.530 align:middle line:84% that when we do detect what appears to be a straightforward 00:20:24.530 --> 00:20:28.880 align:middle line:84% case of academic dishonesty, plagiarism of some sort, duplication of code, 00:20:28.880 --> 00:20:33.020 align:middle line:84% these days, I think I personally am even more comfortable referring the case 00:20:33.020 --> 00:20:37.010 align:middle line:84% than I was in years past because we have given students an opportunity 00:20:37.010 --> 00:20:38.740 align:middle line:90% to meet us halfway and reach out. 00:20:38.740 --> 00:20:41.030 align:middle line:84% And indeed, as you know, in every one of the course's 00:20:41.030 --> 00:20:44.570 align:middle line:84% problem sets this year, on the form via which they submitted their work, 00:20:44.570 --> 00:20:48.200 align:middle line:84% we asked them to check a checkbox to acknowledge their understanding 00:20:48.200 --> 00:20:50.010 align:middle line:90% of the clause's availability.
00:20:50.010 --> 00:20:52.670 align:middle line:84% And so at that point, if we are reminding students 00:20:52.670 --> 00:20:55.160 align:middle line:84% each week that it's available and they are not thereafter 00:20:55.160 --> 00:20:57.980 align:middle line:84% taking advantage of it, it seems quite reasonable, 00:20:57.980 --> 00:21:00.230 align:middle line:84% I think, for the course to move forward with the more 00:21:00.230 --> 00:21:02.510 align:middle line:84% traditional punitive process involving the Honor 00:21:02.510 --> 00:21:06.363 align:middle line:84% Council to investigate whether indeed the line had been crossed. 00:21:06.363 --> 00:21:07.280 align:middle line:90% BRIAN YU: I'm curious. 00:21:07.280 --> 00:21:09.710 align:middle line:84% So we often talk now about, like, the line being crossed 00:21:09.710 --> 00:21:11.400 align:middle line:90% and what it means to cross the line. 00:21:11.400 --> 00:21:16.250 align:middle line:84% I'm curious about how you see this in the context of programming assignments 00:21:16.250 --> 00:21:18.500 align:middle line:84% in particular. Like, if you're writing an essay 00:21:18.500 --> 00:21:22.250 align:middle line:84% and you copy a sentence, that seems like very clearly copying. 00:21:22.250 --> 00:21:24.980 align:middle line:84% But in the case of code, if you copy a line of code 00:21:24.980 --> 00:21:27.890 align:middle line:84% you see on Stack Overflow, for example, if you're looking up like, 00:21:27.890 --> 00:21:31.015 align:middle line:84% how do I solve this particular problem, and you incorporate a line of code, 00:21:31.015 --> 00:21:32.960 align:middle line:90% that might not be crossing a line. 00:21:32.960 --> 00:21:36.660 align:middle line:84% So how do you think about where the line is in the context of a programming 00:21:36.660 --> 00:21:37.160 align:middle line:90% assignment? 00:21:37.160 --> 00:21:39.110 align:middle line:90% And how to teach that kind of thing?
00:21:39.110 --> 00:21:40.430 align:middle line:84% DAVID MALAN: Yeah, it's a really good question. 00:21:40.430 --> 00:21:42.222 align:middle line:84% And it's a common question, because I think 00:21:42.222 --> 00:21:45.620 align:middle line:84% there's a perception among folks both in the software 00:21:45.620 --> 00:21:49.850 align:middle line:84% world and non-software world that this notion of academic dishonesty 00:21:49.850 --> 00:21:52.370 align:middle line:84% in a programming class itself is incompatible with the idea 00:21:52.370 --> 00:21:53.000 align:middle line:90% of programming. 00:21:53.000 --> 00:21:55.190 align:middle line:90% And I do very much disagree with that. 00:21:55.190 --> 00:21:57.920 align:middle line:84% The lines that we prescribe to students, both in broad strokes 00:21:57.920 --> 00:22:01.550 align:middle line:84% and in very precise bullets in the course's syllabus, essentially 00:22:01.550 --> 00:22:04.268 align:middle line:84% try to teach students to be reasonable, so to speak. 00:22:04.268 --> 00:22:05.310 align:middle line:90% And what might that mean? 00:22:05.310 --> 00:22:07.393 align:middle line:84% Well, early in the semester in CS50, we, of course, 00:22:07.393 --> 00:22:11.360 align:middle line:84% have students, in C and later in Python, implement Mario's Pyramid. 00:22:11.360 --> 00:22:15.570 align:middle line:84% So a sort of pyramid-like structure just using some ASCII art to paint that 00:22:15.570 --> 00:22:16.070 align:middle line:90% picture. 00:22:16.070 --> 00:22:18.440 align:middle line:84% And it involves ultimately like a couple of for loops. 00:22:18.440 --> 00:22:21.380 align:middle line:84% It would be unreasonable for students to go off and Google or look 00:22:21.380 --> 00:22:24.770 align:middle line:84% on Stack Overflow for something like, how print Mario's Pyramid. 00:22:24.770 --> 00:22:27.500 align:middle line:84% That would be a search for the outright solution to the problem.
00:22:27.500 --> 00:22:30.350 align:middle line:84% And surely it is not our intent to assess you 00:22:30.350 --> 00:22:32.750 align:middle line:84% on your ability to Google a solution like that as opposed 00:22:32.750 --> 00:22:34.310 align:middle line:90% to crafting it yourself. 00:22:34.310 --> 00:22:37.400 align:middle line:84% However, it would be very reasonable, for instance, to Google something 00:22:37.400 --> 00:22:43.670 align:middle line:84% like, how write nested for loops in C. Or how print spaces in C. 00:22:43.670 --> 00:22:47.870 align:middle line:84% Because it's actually not obvious to students, one, how you can actually 00:22:47.870 --> 00:22:50.402 align:middle line:84% have two loops, one nested inside of the other, 00:22:50.402 --> 00:22:51.860 align:middle line:90% using different counting variables. 00:22:51.860 --> 00:22:54.900 align:middle line:84% And two, how to print what would appear to be blank spaces on the screen, 00:22:54.900 --> 00:22:57.420 align:middle line:84% not quite appreciating that it's actually just the space bar. 00:22:57.420 --> 00:22:59.253 align:middle line:84% So I think it's very reasonable for students, 00:22:59.253 --> 00:23:02.480 align:middle line:84% and it is allowed in the course's syllabus, to look for short snippets, 00:23:02.480 --> 00:23:03.680 align:middle line:90% so to speak, of code. 00:23:03.680 --> 00:23:06.170 align:middle line:84% Where a snippet itself is one line, a few lines, 00:23:06.170 --> 00:23:08.510 align:middle line:84% but it is not the essence of the problem. 00:23:08.510 --> 00:23:10.640 align:middle line:84% And so indeed when we do find that students 00:23:10.640 --> 00:23:12.920 align:middle line:84% have crossed the line, what has happened is 00:23:12.920 --> 00:23:15.000 align:middle line:84% we notice some curiosity about their code.
00:23:15.000 --> 00:23:17.660 align:middle line:84% It's maybe very similar to another student's code 00:23:17.660 --> 00:23:19.760 align:middle line:84% or it suggests a technique that we haven't 00:23:19.760 --> 00:23:23.210 align:middle line:84% taught in the class or some syntax that's not consistent with what we 00:23:23.210 --> 00:23:25.170 align:middle line:90% know students have seen in the class. 00:23:25.170 --> 00:23:29.030 align:middle line:84% And so we ourselves might Google certain key phrases or portions of code 00:23:29.030 --> 00:23:30.740 align:middle line:90% or comments that we see in their code. 00:23:30.740 --> 00:23:34.610 align:middle line:84% And sure enough, it too often leads us to the very same GitHub repository 00:23:34.610 --> 00:23:39.020 align:middle line:84% or Reddit post, wherein someone else has posted exactly that same code 00:23:39.020 --> 00:23:40.580 align:middle line:90% that the student has copy-pasted. 00:23:40.580 --> 00:23:42.980 align:middle line:84% And so there too, the kinds of cases we are referring 00:23:42.980 --> 00:23:44.990 align:middle line:84% are not the many, many, many students 00:23:44.990 --> 00:23:47.785 align:middle line:84% who very reasonably use these kinds of digital resources, 00:23:47.785 --> 00:23:49.910 align:middle line:84% but the ones who use these resources and then take 00:23:49.910 --> 00:23:53.930 align:middle line:84% shortcuts to submission, as by just copying and pasting many lines of code 00:23:53.930 --> 00:23:55.380 align:middle line:90% that they see.
00:23:55.380 --> 00:23:57.407 align:middle line:84% BRIAN YU: So other than the Regret Clause now, 00:23:57.407 --> 00:23:59.240 align:middle line:84% which we've talked about for a little while, 00:23:59.240 --> 00:24:01.550 align:middle line:84% have there been any other things you've thought about doing 00:24:01.550 --> 00:24:03.467 align:middle line:84% or things you have done to the course in terms 00:24:03.467 --> 00:24:07.490 align:middle line:84% of thinking about how to either address academic dishonesty when it happens 00:24:07.490 --> 00:24:09.225 align:middle line:90% or to try to prevent it beforehand? 00:24:09.225 --> 00:24:10.100 align:middle line:90% DAVID MALAN: We have. 00:24:10.100 --> 00:24:13.852 align:middle line:84% So a couple of years ago, we introduced the course's Brink Clause, so to speak, 00:24:13.852 --> 00:24:17.060 align:middle line:84% which was a couple of sentences inspired by a colleague of ours at Princeton, 00:24:17.060 --> 00:24:20.780 align:middle line:84% Chris Moretti, who gave us some really inspiring language that 00:24:20.780 --> 00:24:26.900 align:middle line:84% encouraged students, in the course's syllabus, to write us late at night 00:24:26.900 --> 00:24:30.320 align:middle line:84% just as they felt themselves being on the brink of making a poor decision. 00:24:30.320 --> 00:24:33.080 align:middle line:84% That is to say, even when you and I and most of the course's staff 00:24:33.080 --> 00:24:36.710 align:middle line:84% might be asleep and a student might be working late at night on their work, 00:24:36.710 --> 00:24:38.720 align:middle line:84% it would be reasonable to assume that they 00:24:38.720 --> 00:24:44.270 align:middle line:84% could get a response to a request for an extension, for instance.
00:24:44.270 --> 00:24:46.160 align:middle line:84% And so what this Brink Clause prescribed was 00:24:46.160 --> 00:24:49.100 align:middle line:84% a mechanism for students to send that note to say, listen, 00:24:49.100 --> 00:24:50.970 align:middle line:90% I really feel like I'm in a bad place. 00:24:50.970 --> 00:24:55.640 align:middle line:84% And I worry I'm about to make a poor decision, as by copying and pasting 00:24:55.640 --> 00:24:57.235 align:middle line:90% too many lines of code online. 00:24:57.235 --> 00:24:59.360 align:middle line:84% I'd like to discuss this tomorrow. And indeed, that's 00:24:59.360 --> 00:25:02.120 align:middle line:90% what the syllabus asked them to do. 00:25:02.120 --> 00:25:03.752 align:middle line:90% Go to sleep, don't submit your work. 00:25:03.752 --> 00:25:05.210 align:middle line:90% We'll figure it out in the morning. 00:25:05.210 --> 00:25:08.690 align:middle line:84% And just inviting students to write us and meet us halfway 00:25:08.690 --> 00:25:11.420 align:middle line:84% under that sort of duress was the intent of the clause. 00:25:11.420 --> 00:25:15.260 align:middle line:84% Unfortunately, when it was invoked some number of times 00:25:15.260 --> 00:25:18.050 align:middle line:84% that first year, based on the wording of the emails, 00:25:18.050 --> 00:25:20.160 align:middle line:84% based on the conversations we had with students, 00:25:20.160 --> 00:25:25.070 align:middle line:84% it really devolved into a backdoor to just extensions. 00:25:25.070 --> 00:25:27.455 align:middle line:84% We did not believe, ironically, that most 00:25:27.455 --> 00:25:29.330 align:middle line:84% of the students who were invoking this clause 00:25:29.330 --> 00:25:33.320 align:middle line:84% were actually on the brink of doing something academically dishonest. 00:25:33.320 --> 00:25:36.050 align:middle line:84% They were simply on the brink of not meeting the deadline.
00:25:36.050 --> 00:25:39.970 align:middle line:84% And so we ended up removing the clause from the course's syllabus, 00:25:39.970 --> 00:25:40.970 align:middle line:90% ultimately. 00:25:40.970 --> 00:25:45.920 align:middle line:84% I'm glad we did try it, but this was one example of a measure that, at least 00:25:45.920 --> 00:25:48.830 align:middle line:84% for us, in our context, in our implementation, failed. 00:25:48.830 --> 00:25:52.850 align:middle line:84% But I do think more compelling has been what we introduced a few years ago 00:25:52.850 --> 00:25:55.880 align:middle line:84% in the spirit of the Regret Clause, but whereby we actually 00:25:55.880 --> 00:25:57.480 align:middle line:90% initiate the conversations. 00:25:57.480 --> 00:25:59.993 align:middle line:84% So it's not infrequently been the case that when 00:25:59.993 --> 00:26:02.660 align:middle line:84% we've cross-compared so many students' submissions, there are 00:26:02.660 --> 00:26:04.940 align:middle line:84% a few cases that seem a little worrisome, 00:26:04.940 --> 00:26:07.250 align:middle line:84% but it definitely doesn't seem like it's over the line. 00:26:07.250 --> 00:26:10.663 align:middle line:84% We certainly wouldn't refer them to the Honor Council on that basis. 00:26:10.663 --> 00:26:13.580 align:middle line:84% But we realized that this then is an opportunity for us to maybe 00:26:13.580 --> 00:26:17.480 align:middle line:84% go chat with those students now and say, hey, listen, you appeared on our radar. 00:26:17.480 --> 00:26:20.160 align:middle line:84% We think it's because of the similarities between your code 00:26:20.160 --> 00:26:21.410 align:middle line:90% and maybe some other student's. 00:26:21.410 --> 00:26:24.590 align:middle line:84% And we would leave the other student anonymously out of it. 00:26:24.590 --> 00:26:27.920 align:middle line:84% But we would then ask the student, how did you get your code to this point?
00:26:27.920 --> 00:26:30.230 align:middle line:84% Walk us through the process and let's figure out 00:26:30.230 --> 00:26:34.550 align:middle line:84% how you came so close to what we worried was crossing a line, 00:26:34.550 --> 00:26:36.450 align:middle line:84% so that you can just avoid it moving forward. 00:26:36.450 --> 00:26:38.200 align:middle line:84% And so these interventional conversations, 00:26:38.200 --> 00:26:41.068 align:middle line:84% as we describe them internally, I hope have actually 00:26:41.068 --> 00:26:43.610 align:middle line:84% gone a long way toward just helping students navigate the waters. 00:26:43.610 --> 00:26:45.440 align:middle line:84% Even if they don't cross those lines, they at least now 00:26:45.440 --> 00:26:48.470 align:middle line:84% are being more conscious and thoughtful about what it is they're doing. 00:26:48.470 --> 00:26:50.803 align:middle line:84% BRIAN YU: And what do you usually gather from those sorts 00:26:50.803 --> 00:26:51.830 align:middle line:90% of interventional chats? 00:26:51.830 --> 00:26:55.040 align:middle line:84% Like, what sort of actions do you find that students are taking? 00:26:55.040 --> 00:26:57.313 align:middle line:84% Does it seem like there's some teachable moment there 00:26:57.313 --> 00:26:58.730 align:middle line:90% that you're helping students with? 00:26:58.730 --> 00:27:01.105 align:middle line:84% DAVID MALAN: I think so, because not infrequently would it 00:27:01.105 --> 00:27:03.890 align:middle line:84% be the case that two students were indeed working reasonably 00:27:03.890 --> 00:27:06.410 align:middle line:90% on the homework assignment together. 00:27:06.410 --> 00:27:10.310 align:middle line:84% But they were perhaps asking each other a few too many questions about code. 00:27:10.310 --> 00:27:13.250 align:middle line:84% It wasn't necessarily entirely in pseudocode or in English, 00:27:13.250 --> 00:27:14.210 align:middle line:90% their conversations.
00:27:14.210 --> 00:27:17.300 align:middle line:84% And maybe one was being shown the other's code, 00:27:17.300 --> 00:27:20.123 align:middle line:84% which is allowed in some circumstances per the syllabus. 00:27:20.123 --> 00:27:21.540 align:middle line:90% But maybe a little too frequently. 00:27:21.540 --> 00:27:24.980 align:middle line:84% And so as such, their work was just sort of, over time, 00:27:24.980 --> 00:27:27.620 align:middle line:90% converging to become one and the same. 00:27:27.620 --> 00:27:30.995 align:middle line:84% And so given that we would have these chats within a week of them having done 00:27:30.995 --> 00:27:33.620 align:middle line:84% that, it was usually pretty obvious to students like, oh, let's 00:27:33.620 --> 00:27:34.970 align:middle line:90% not do that again. 00:27:34.970 --> 00:27:36.958 align:middle line:90% And they would recalibrate their approach. 00:27:36.958 --> 00:27:38.750 align:middle line:84% BRIAN YU: So it seems like all in all, CS50 00:27:38.750 --> 00:27:41.810 align:middle line:84% has tried a lot with the Regret Clause, with the Brink Clause, 00:27:41.810 --> 00:27:44.420 align:middle line:84% with these interventional chats that you've had with students. 00:27:44.420 --> 00:27:48.025 align:middle line:84% A lot that CS50 has done with regard to the issue of academic dishonesty 00:27:48.025 --> 00:27:50.150 align:middle line:84% and trying to create teachable moments out of that. 00:27:50.150 --> 00:27:52.850 align:middle line:84% And trying to work within the university and with students 00:27:52.850 --> 00:27:55.410 align:middle line:90% on how to improve that situation. 00:27:55.410 --> 00:27:58.970 align:middle line:84% What do you think are the lessons to be taken away for other courses?
00:27:58.970 --> 00:28:01.280 align:middle line:84% What can other classes do, either in computer science 00:28:01.280 --> 00:28:04.940 align:middle line:84% or outside of computer science, based on the lessons 00:28:04.940 --> 00:28:06.980 align:middle line:84% that you and the course overall have learned 00:28:06.980 --> 00:28:10.820 align:middle line:84% from these years of working with these issues of academic dishonesty? 00:28:10.820 --> 00:28:13.520 align:middle line:84% DAVID MALAN: I think one takeaway has been just clarity. 00:28:13.520 --> 00:28:18.227 align:middle line:84% Our policy in the course's syllabus is not short, but it is detailed. 00:28:18.227 --> 00:28:20.060 align:middle line:84% And that's the result of a lot of situations 00:28:20.060 --> 00:28:22.070 align:middle line:84% having arisen over the years, a lot of conversations 00:28:22.070 --> 00:28:23.362 align:middle line:90% having happened over the years. 00:28:23.362 --> 00:28:27.470 align:middle line:84% And so I am glad that we do document so clearly for students 00:28:27.470 --> 00:28:31.100 align:middle line:84% where the lines are and what our expectations of students are. 00:28:31.100 --> 00:28:34.640 align:middle line:84% Toward that end too, I think it has been a good thing that we've introduced 00:28:34.640 --> 00:28:36.530 align:middle line:90% these interventional conversations. 00:28:36.530 --> 00:28:42.500 align:middle line:84% A course may not be as involved in the mechanics of the process as we are; 00:28:42.500 --> 00:28:45.500 align:middle line:84% they're not necessarily running software to cross-compare submissions.
00:28:45.500 --> 00:28:47.333 align:middle line:84% But when something does appear on the radar, 00:28:47.333 --> 00:28:49.291 align:middle line:84% if a teaching fellow or teaching assistant does 00:28:49.291 --> 00:28:52.375 align:middle line:84% notice some curiosity in the student's code, it's dissimilar to their code 00:28:52.375 --> 00:28:55.010 align:middle line:84% last week or it's a little too similar to another student's, I 00:28:55.010 --> 00:28:58.160 align:middle line:84% think just being comfortable reaching out proactively to those students, 00:28:58.160 --> 00:29:01.175 align:middle line:84% not to impugn them, but rather to say, listen, we have some concerns. 00:29:01.175 --> 00:29:03.050 align:middle line:84% We don't feel you've crossed a line, but we'd 00:29:03.050 --> 00:29:05.510 align:middle line:84% like to better understand what you've done and how you did this. 00:29:05.510 --> 00:29:08.135 align:middle line:84% So that we can steer you in the right direction moving forward. 00:29:08.135 --> 00:29:12.652 align:middle line:84% That too seems a very straightforward, healthy and teachable opportunity. 00:29:12.652 --> 00:29:14.360 align:middle line:84% And as for the Regret Clause, I certainly 00:29:14.360 --> 00:29:16.280 align:middle line:84% think it's worth trying in other classes. 00:29:16.280 --> 00:29:19.040 align:middle line:84% I think it certainly is completely reasonable 00:29:19.040 --> 00:29:21.890 align:middle line:84% that a course, whether ours or anyone else's, 00:29:21.890 --> 00:29:26.480 align:middle line:84% just clearly defines what steps students should take when they find themselves 00:29:26.480 --> 00:29:27.560 align:middle line:90% in certain situations. 00:29:27.560 --> 00:29:29.780 align:middle line:84% And prior to the Regret Clause it was ill-defined. 
00:29:29.780 --> 00:29:34.040 align:middle line:84% What should a student do if they make a poor decision, especially late at night 00:29:34.040 --> 00:29:38.760 align:middle line:84% and then they do actually regret it the next day or some number of hours later? 00:29:38.760 --> 00:29:40.200 align:middle line:90% There was no well-defined process. 00:29:40.200 --> 00:29:42.110 align:middle line:84% And while technically, there was nothing stopping a student 00:29:42.110 --> 00:29:44.540 align:middle line:84% from coming forward and turning themselves in, 00:29:44.540 --> 00:29:46.850 align:middle line:84% I can certainly appreciate the trepidation 00:29:46.850 --> 00:29:49.760 align:middle line:84% that a student might have with taking that on not knowing 00:29:49.760 --> 00:29:50.990 align:middle line:90% what the outcome might be. 00:29:50.990 --> 00:29:57.140 align:middle line:84% Especially, if they assume it might even be time off from the University itself. 00:29:57.140 --> 00:29:59.390 align:middle line:84% So I think the fact that we've sort of clarified 00:29:59.390 --> 00:30:02.330 align:middle line:84% how to conduct oneself before you get to that point, 00:30:02.330 --> 00:30:06.220 align:middle line:84% after you get to that point, and after we have detected as much, 00:30:06.220 --> 00:30:08.297 align:middle line:84% is just only fair to students in the class. 00:30:08.297 --> 00:30:10.630 align:middle line:84% BRIAN YU: I think there are a lot of very useful lessons 00:30:10.630 --> 00:30:14.560 align:middle line:84% there in terms of what classes can start to do about this sort of issue. 00:30:14.560 --> 00:30:17.560 align:middle line:84% Certainly, if any of you are interested in learning more about this, 00:30:17.560 --> 00:30:19.540 align:middle line:84% we've actually written a paper, the two of us 00:30:19.540 --> 00:30:23.340 align:middle line:84% along with Doug Lloyd on CS50's team about academic honesty in CS50. 
00:30:23.340 --> 00:30:25.090 align:middle line:84% So we can provide a link to that if you're 00:30:25.090 --> 00:30:28.600 align:middle line:84% interested in reading more about the policy and about the Regret Clause 00:30:28.600 --> 00:30:31.840 align:middle line:84% and about other interventions that CS50 has made on these sorts of issues. 00:30:31.840 --> 00:30:32.673 align:middle line:90% DAVID MALAN: Indeed. 00:30:32.673 --> 00:30:34.720 align:middle line:84% The title is Teaching Academic Honesty In CS50. 00:30:34.720 --> 00:30:36.470 align:middle line:84% If you want to Google something like that. 00:30:36.470 --> 00:30:39.512 align:middle line:84% And if you're more interested in the software side of things and the cross 00:30:39.512 --> 00:30:44.860 align:middle line:84% comparison of submissions, if you go to github.com/cs50/compare50 you'll be 00:30:44.860 --> 00:30:47.830 align:middle line:84% able to play around with the open source software there as well. 00:30:47.830 --> 00:30:49.940 align:middle line:84% BRIAN YU: Certainly, if you have any feedback about today's podcast 00:30:49.940 --> 00:30:52.420 align:middle line:84% or suggestions for future podcast ideas, you can always 00:30:52.420 --> 00:30:55.210 align:middle line:90% reach us at cs50.harvard.edu. 00:30:55.210 --> 00:30:57.090 align:middle line:90% DAVID MALAN: This was CS50.