1 00:00:00,000 --> 00:00:02,950 [MUSIC PLAYING] 2 00:00:02,950 --> 00:00:04,755 DAVID MALAN: This is CS50. 3 00:00:04,755 --> 00:00:07,677 [MUSIC PLAYING] 4 00:00:07,677 --> 00:00:08,760 DAVID MALAN: Hello, world. 5 00:00:08,760 --> 00:00:10,260 This is the CS50 podcast. 6 00:00:10,260 --> 00:00:11,370 My name is David Malan. 7 00:00:11,370 --> 00:00:12,787 BRIAN YU: And my name is Brian Yu. 8 00:00:12,787 --> 00:00:16,379 And today, we thought we'd discuss academic honesty in CS50. 9 00:00:16,379 --> 00:00:19,230 And so every year in CS50, we always have some number of cases 10 00:00:19,230 --> 00:00:22,320 of academic dishonesty where some number of students 11 00:00:22,320 --> 00:00:27,180 submit work that isn't their own, either by copying homework from a friend 12 00:00:27,180 --> 00:00:29,880 or by looking something up online and using a solution they 13 00:00:29,880 --> 00:00:32,229 find online as part of their solution. 14 00:00:32,229 --> 00:00:35,070 And so this is something that CS50 has had to deal with for years 15 00:00:35,070 --> 00:00:38,100 now in terms of how best to address this type of situation, 16 00:00:38,100 --> 00:00:40,593 and how best to prevent academic dishonesty in general. 17 00:00:40,593 --> 00:00:43,260 DAVID MALAN: Indeed, this was-- when I first took over the course 18 00:00:43,260 --> 00:00:47,790 myself back in 2007, it was really an end-of-semester process. 19 00:00:47,790 --> 00:00:50,430 After the teaching fellows would evaluate students' work 20 00:00:50,430 --> 00:00:52,305 and provide feedback throughout the semester, 21 00:00:52,305 --> 00:00:54,600 I would finally, all too often by semester's end, 22 00:00:54,600 --> 00:00:58,260 carve out some time in order to then cross-compare all of the submissions 23 00:00:58,260 --> 00:01:02,340 from that semester, looking for statistically unlikely similarities 24 00:01:02,340 --> 00:01:03,660 between students' work.
25 00:01:03,660 --> 00:01:06,540 Indeed, what a student might sometimes unfortunately do 26 00:01:06,540 --> 00:01:09,570 is copy the work of another student or lean too heavily 27 00:01:09,570 --> 00:01:12,330 on some resource online, copying more than a reasonable number 28 00:01:12,330 --> 00:01:13,350 of lines of code. 29 00:01:13,350 --> 00:01:16,290 And so by cross-comparing all submissions with software 30 00:01:16,290 --> 00:01:19,080 itself, we then notice which lines of code 31 00:01:19,080 --> 00:01:23,610 are in both student A's and student B's work, and then conclude, ultimately, 32 00:01:23,610 --> 00:01:25,890 that statistically this was unlikely to happen. 33 00:01:25,890 --> 00:01:27,735 BRIAN YU: Now, how exactly do you draw those conclusions? 34 00:01:27,735 --> 00:01:30,180 Because if I'm thinking about a programming language like C, 35 00:01:30,180 --> 00:01:32,770 there are only so many parts of the language. 36 00:01:32,770 --> 00:01:34,830 There are for loops and there are conditions. 37 00:01:34,830 --> 00:01:37,230 And everyone's solutions to similar problems 38 00:01:37,230 --> 00:01:39,120 probably have these sorts of elements. 39 00:01:39,120 --> 00:01:41,120 So what exactly do you look for in this process? 40 00:01:41,120 --> 00:01:42,578 DAVID MALAN: Yeah, that's quite fair. 41 00:01:42,578 --> 00:01:44,502 If we relied on this kind of cross-comparison 42 00:01:44,502 --> 00:01:46,710 for programs like Hello, World, everyone would appear 43 00:01:46,710 --> 00:01:48,640 to have written exactly the same code. 44 00:01:48,640 --> 00:01:52,140 But as soon as we get into CS50's second and third weeks, 45 00:01:52,140 --> 00:01:55,470 where the programs students write in C tend to get a little longer, 46 00:01:55,470 --> 00:01:58,620 there does end up being more opportunity for creativity, 47 00:01:58,620 --> 00:02:02,192 for different stylistic choices by students. 48 00:02:02,192 --> 00:02:03,900 And so students' code does start to drift.
49 00:02:03,900 --> 00:02:05,858 Even though, at the end of the day, the solutions 50 00:02:05,858 --> 00:02:09,000 might still be using for loops and while loops and conditions and so forth, 51 00:02:09,000 --> 00:02:11,250 students might format their code slightly differently. 52 00:02:11,250 --> 00:02:13,620 They might write slightly different comments. 53 00:02:13,620 --> 00:02:17,680 And so what tends to happen over time, as the programs exceed 54 00:02:17,680 --> 00:02:21,750 maybe 10, 20, 30 lines of code, is that there's enough variation. 55 00:02:21,750 --> 00:02:24,300 And indeed, unfortunately, what we often notice 56 00:02:24,300 --> 00:02:27,527 is not even necessarily that the code is identical, because, as you note, 57 00:02:27,527 --> 00:02:29,610 that in and of itself might just be a coincidence, 58 00:02:29,610 --> 00:02:31,767 especially when, nowadays, we have 800 students. 59 00:02:31,767 --> 00:02:34,350 It is absolutely going to be the case that two students write, 60 00:02:34,350 --> 00:02:36,397 by chance, very similar code. 61 00:02:36,397 --> 00:02:38,730 But unfortunately, the kinds of things we tend to notice 62 00:02:38,730 --> 00:02:41,940 are when students have the same typographical errors, 63 00:02:41,940 --> 00:02:44,460 or they use precisely the same variable names, 64 00:02:44,460 --> 00:02:47,850 or they make precisely the same mistake in precisely the same location. 65 00:02:47,850 --> 00:02:50,160 And at that point, our instincts start to kick in, 66 00:02:50,160 --> 00:02:52,830 and we look at code like this and start to realize 67 00:02:52,830 --> 00:02:55,650 that, while this may have happened by chance, at scale 68 00:02:55,650 --> 00:02:58,200 the fact that it happened in this line and in this line 69 00:02:58,200 --> 00:03:00,420 and in this line between two students' code is, 70 00:03:00,420 --> 00:03:04,440 more likely than not, better explained by some deliberate act.
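As a rough illustration of why a shared typo or an oddly named variable is more telling than a shared for loop, here is a minimal Python sketch. It is an assumption for illustration, not CS50's actual tool: it scores every pair of submissions by the lines they have in common, weighting each shared line by how rare it is across the whole class. The function names and the 1 / (f - 1) weighting, where f is the number of submissions containing the line, are hypothetical choices.

```python
from collections import Counter
from itertools import combinations


def normalize_lines(source: str) -> set[str]:
    """Return the set of non-blank lines, with surrounding whitespace stripped."""
    return {line.strip() for line in source.splitlines() if line.strip()}


def rarity_weighted_overlap(submissions: dict[str, str]) -> list[tuple[float, str, str]]:
    """Score every pair of submissions by the lines they share.

    A shared line found in only two submissions adds 1.0; a line found in
    f submissions adds 1 / (f - 1), so class-wide boilerplate adds very little.
    """
    lines = {name: normalize_lines(src) for name, src in submissions.items()}
    # How many submissions contain each distinct line.
    frequency = Counter(line for line_set in lines.values() for line in line_set)
    scores = []
    for a, b in combinations(sorted(lines), 2):
        shared = lines[a] & lines[b]
        # Every shared line appears in at least two submissions, so f - 1 >= 1.
        score = sum(1.0 / (frequency[line] - 1) for line in shared)
        scores.append((score, a, b))
    # Most suspicious pairs first, ready for humans to review from the top.
    scores.sort(reverse=True)
    return scores
```

With this weighting, a misspelled string that only two students share counts for a full point, while a line such as `#include <stdio.h>` that everyone writes counts for almost nothing, mirroring the intuition that identical mistakes in identical places are what stand out.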
71 00:03:04,440 --> 00:03:07,950 BRIAN YU: So, at Harvard at least, when there are cases of academic dishonesty, 72 00:03:07,950 --> 00:03:10,680 they're usually referred to some administrative body, which 73 00:03:10,680 --> 00:03:12,870 now is called the Honor Council here at Harvard. 74 00:03:12,870 --> 00:03:15,203 And I think you've pointed out, and a couple of other people 75 00:03:15,203 --> 00:03:19,200 have pointed out, that CS50, though it is the largest course at the university, 76 00:03:19,200 --> 00:03:24,772 does refer far more people to the Honor Council than any other class on campus. 77 00:03:24,772 --> 00:03:27,480 Do you think that has to do with something about computer science 78 00:03:27,480 --> 00:03:28,980 or introduction to computer science? 79 00:03:28,980 --> 00:03:30,428 Or why do you think that might be? 80 00:03:30,428 --> 00:03:31,470 DAVID MALAN: No, I don't. 81 00:03:31,470 --> 00:03:34,170 And that's certainly an unfortunate distinction that we've long had, 82 00:03:34,170 --> 00:03:37,200 save for one or two years where there were issues in other departments. 83 00:03:37,200 --> 00:03:40,080 No, I don't think that computer science students are any less honest 84 00:03:40,080 --> 00:03:41,820 than their classmates in other fields. 85 00:03:41,820 --> 00:03:44,670 I don't think students in CS50 are any less honest than students 86 00:03:44,670 --> 00:03:46,290 in other computer science courses. 87 00:03:46,290 --> 00:03:51,180 I think it really boils down to, one, you and I and educators in computer science 88 00:03:51,180 --> 00:03:54,720 are perhaps somewhat uniquely positioned with tools-- 89 00:03:54,720 --> 00:03:57,110 with software tools via which to detect it.
90 00:03:57,110 --> 00:03:59,422 And in a large introductory course like CS50, 91 00:03:59,422 --> 00:04:02,130 I think it's important to look for it, not only out of fairness to those students 92 00:04:02,130 --> 00:04:06,360 who are behaving honestly throughout the term, but also because one of our goals 93 00:04:06,360 --> 00:04:08,430 in this course should be to teach students 94 00:04:08,430 --> 00:04:10,530 the ethical application of computer science, 95 00:04:10,530 --> 00:04:14,610 that we should be holding students to the same expectations as 96 00:04:14,610 --> 00:04:17,279 are prescribed in great detail in the course's syllabus. 97 00:04:17,279 --> 00:04:23,160 And so I think it's really a function of, one, our looking for it, 98 00:04:23,160 --> 00:04:27,240 and, two, our following through on it that really ends up explaining the large numbers. 99 00:04:27,240 --> 00:04:30,240 BRIAN YU: Yeah, so I'm looking here at the data from past years in CS50, 100 00:04:30,240 --> 00:04:32,865 and it does seem that there's also a fair amount of fluctuation 101 00:04:32,865 --> 00:04:36,060 in terms of what percentage of students in the course end 102 00:04:36,060 --> 00:04:37,800 up being referred to the Honor Council. 103 00:04:37,800 --> 00:04:40,380 Like, in 2009, for example, it looks like nobody 104 00:04:40,380 --> 00:04:42,030 was referred to the Honor Council. 105 00:04:42,030 --> 00:04:46,680 And in other years, like 2010 and 2012, it's about 1% or 2% of students. 106 00:04:46,680 --> 00:04:51,112 But in other years, like 2015, it's up to 5%, and 2016 is up to 10%. 107 00:04:51,112 --> 00:04:53,070 What do you think accounts for that fluctuation, 108 00:04:53,070 --> 00:04:55,903 because that's a pretty big difference between one year and another? 109 00:04:55,903 --> 00:04:58,900 DAVID MALAN: Yeah, it really has ranged, as you say, from 0% to 10%, 110 00:04:58,900 --> 00:04:59,910 depending on the year. 111 00:04:59,910 --> 00:05:01,620 I think it's a few things.
112 00:05:01,620 --> 00:05:03,870 Part of it, I think, is just a function of how much time 113 00:05:03,870 --> 00:05:06,450 I or we put into the process. 114 00:05:06,450 --> 00:05:13,042 I think in 2009, when there were 0%, I did look for worrisome instances 115 00:05:13,042 --> 00:05:15,750 that particular year, but, admittedly, in retrospect, I probably 116 00:05:15,750 --> 00:05:17,875 spent less time that year than the subsequent year, 117 00:05:17,875 --> 00:05:20,010 because the subsequent year it went up to 2%. 118 00:05:20,010 --> 00:05:21,930 With that said, it might have been, by chance, 119 00:05:21,930 --> 00:05:24,720 just a group of students who exhibited this pattern of behavior 120 00:05:24,720 --> 00:05:26,590 with far less frequency than others. 121 00:05:26,590 --> 00:05:28,870 So I think that's certainly possible as well. 122 00:05:28,870 --> 00:05:33,090 But I think the uptick in more recent years, for instance, 10% in 2016 123 00:05:33,090 --> 00:05:36,450 and roughly 4% or 5% since then, which is where 124 00:05:36,450 --> 00:05:38,940 we've been, rather in equilibrium, the past few years, 125 00:05:38,940 --> 00:05:43,380 is also a function of just how much time we invest in it. 126 00:05:43,380 --> 00:05:45,570 So back in 2008, and for a few years thereafter, 127 00:05:45,570 --> 00:05:47,760 it was only me who was engaged in this process. 128 00:05:47,760 --> 00:05:49,230 I would run the software by myself. 129 00:05:49,230 --> 00:05:51,390 I would look at students' submissions side by side. 130 00:05:51,390 --> 00:05:54,015 And I would ultimately decide which to refer forward 131 00:05:54,015 --> 00:05:55,140 to Harvard's Honor Council. 132 00:05:55,140 --> 00:05:57,390 And then, ultimately, I would document all those cases. 133 00:05:57,390 --> 00:06:00,900 But in more recent years, we have involved more of CS50's senior staff 134 00:06:00,900 --> 00:06:01,560 in the process.
135 00:06:01,560 --> 00:06:06,180 The upside of which is that we can now, one, analyze the submissions roughly 136 00:06:06,180 --> 00:06:07,370 on a week-to-week basis. 137 00:06:07,370 --> 00:06:09,870 The upside of that is that we can provide the Honor Council 138 00:06:09,870 --> 00:06:11,800 with the details far more quickly. 139 00:06:11,800 --> 00:06:14,160 And students themselves, while it's never a pleasant 140 00:06:14,160 --> 00:06:16,893 process, at least know sooner rather than later, rather 141 00:06:16,893 --> 00:06:18,810 than getting to the very end of the semester 142 00:06:18,810 --> 00:06:22,177 and then realizing just how many times or how often they crossed some line. 143 00:06:22,177 --> 00:06:24,510 But, two, the fact that we have multiple human eyes on it 144 00:06:24,510 --> 00:06:27,360 means that we do allocate more time week to week 145 00:06:27,360 --> 00:06:31,230 to each of the individual submissions and the cross-comparisons thereof. 146 00:06:31,230 --> 00:06:33,360 The upside, though, of those multiple humans, 147 00:06:33,360 --> 00:06:37,110 and we now have two or three of us who ultimately vote on whether or not 148 00:06:37,110 --> 00:06:40,160 a case should move forward to the Honor Council, is that I, at least, 149 00:06:40,160 --> 00:06:43,410 and hopefully all of us, have much more comfort in sending a case to the Honor 150 00:06:43,410 --> 00:06:46,410 Council, because not one pair of eyes, but two or three, 151 00:06:46,410 --> 00:06:50,160 have all adjudicated it to be a clear indication of a line 152 00:06:50,160 --> 00:06:51,150 having been crossed. 153 00:06:51,150 --> 00:06:52,770 BRIAN YU: Can you tell me a little more about that process? 154 00:06:52,770 --> 00:06:54,060 You've talked about how there are now 155 00:06:54,060 --> 00:06:56,435 a couple of pairs of eyes looking at the submissions, 156 00:06:56,435 --> 00:06:58,810 but you've also talked about software being involved too.
157 00:06:58,810 --> 00:07:01,470 So what is the interplay there between the role that software 158 00:07:01,470 --> 00:07:04,320 plays in trying to detect this sort of thing and the role 159 00:07:04,320 --> 00:07:06,750 that people play in trying to detect academic dishonesty? 160 00:07:06,750 --> 00:07:08,542 DAVID MALAN: Yeah, I should first emphasize 161 00:07:08,542 --> 00:07:10,500 that it is not software that is ultimately 162 00:07:10,500 --> 00:07:13,560 disciplining students or referring them to Harvard's Honor Council. 163 00:07:13,560 --> 00:07:16,650 It is rather just a tool that we use as a first pass, 164 00:07:16,650 --> 00:07:19,650 given that we have, nowadays, some 800 students, each of whom 165 00:07:19,650 --> 00:07:23,080 is submitting 10 homework problems over the course of the semester. 166 00:07:23,080 --> 00:07:26,760 This is a big O of n squared problem, times 10 or so. 167 00:07:26,760 --> 00:07:29,970 So it's a huge number of comparisons that need to be made, 168 00:07:29,970 --> 00:07:33,490 and it just couldn't practically be done by hand or by eye alone. 169 00:07:33,490 --> 00:07:36,660 So what we do is run software that literally cross-compares 170 00:07:36,660 --> 00:07:39,300 every submission against every other submission, 171 00:07:39,300 --> 00:07:42,630 sometimes within the current year or even, based on our archives, 172 00:07:42,630 --> 00:07:46,680 against recent prior years as well, which explodes the problem even more. 173 00:07:46,680 --> 00:07:49,050 And what we get out of that software-based process 174 00:07:49,050 --> 00:07:54,150 is a list, from top to bottom, of pairs of submissions that the software considers 175 00:07:54,150 --> 00:07:55,810 worrisomely similar.
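For a sense of scale, comparing n submissions pairwise takes n(n - 1) / 2 comparisons per problem. This small Python sketch multiplies that out using the figures from the conversation (800 students, 10 problems). The function name is hypothetical, and the 4,000 archived prior-year submissions per problem in the second call is a made-up number, included only to show how quickly archives enlarge the problem.

```python
from math import comb


def pairwise_comparisons(students: int, problems: int, archived_per_problem: int = 0) -> int:
    """Count the pairwise comparisons needed across a whole semester.

    For each problem, every current submission is compared with every other
    one (students choose 2) and, optionally, with each archived prior-year
    submission (students * archived_per_problem).
    """
    per_problem = comb(students, 2) + students * archived_per_problem
    return per_problem * problems


# 800 students and 10 problems, current year only: 319,600 pairs per problem.
print(pairwise_comparisons(800, 10))  # 3196000
# Hypothetically adding 4,000 archived submissions per problem.
print(pairwise_comparisons(800, 10, archived_per_problem=4000))  # 35196000
```

Even the current-year count alone is in the millions, which is why the software produces a ranked list as a first pass and humans review only the top of it.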
176 00:07:55,810 --> 00:07:59,340 And then we, the humans, typically go through the top 50 or the top 100 177 00:07:59,340 --> 00:08:02,820 matches on that list and use our human eyes and our own experience 178 00:08:02,820 --> 00:08:05,430 and our instincts to decide, ah, this just happened by chance, 179 00:08:05,430 --> 00:08:08,222 or, oh, as you said, this is a relatively short program like Hello, 180 00:08:08,222 --> 00:08:09,180 World or Mario. 181 00:08:09,180 --> 00:08:11,910 This is just bound to happen at that point in the semester. 182 00:08:11,910 --> 00:08:14,890 But certainly, as the problems get more sophisticated 183 00:08:14,890 --> 00:08:19,230 and the code gets longer, it is more clear to multiple humans that, hmm, 184 00:08:19,230 --> 00:08:21,990 it looks like something's awry here, especially when it is, again, 185 00:08:21,990 --> 00:08:24,810 the same variable names or the same comments or, worse, 186 00:08:24,810 --> 00:08:28,350 the same comments with typographical or grammatical errors 187 00:08:28,350 --> 00:08:30,960 in exactly the same place. Odds are that's much more 188 00:08:30,960 --> 00:08:34,080 likely to indicate copy-paste than it is two students, independently, 189 00:08:34,080 --> 00:08:36,780 in their own rooms, on their own laptops, literally writing 190 00:08:36,780 --> 00:08:38,293 the same errors in the same place. 191 00:08:38,293 --> 00:08:39,210 BRIAN YU: Makes sense. 192 00:08:39,210 --> 00:08:42,539 And it's also interesting that, depending on the type of software 193 00:08:42,539 --> 00:08:46,410 that you use, in the same way that a compiler can take a C program 194 00:08:46,410 --> 00:08:48,550 and figure out what the structure of the program is, 195 00:08:48,550 --> 00:08:51,020 these sorts of comparison programs can do the same thing 196 00:08:51,020 --> 00:08:54,640 and compare the structure of one program to another.
197 00:08:54,640 --> 00:08:56,940 They can take two pieces of code, and even 198 00:08:56,940 --> 00:08:59,700 if they use slightly different variable names, 199 00:08:59,700 --> 00:09:02,280 they can still look at the structure of the program as a whole 200 00:09:02,280 --> 00:09:04,350 and try to compare them against each other 201 00:09:04,350 --> 00:09:06,540 to do some more sophisticated comparisons. 202 00:09:06,540 --> 00:09:09,270 DAVID MALAN: Yeah, and thanks to some of CS50's team members, Chad 203 00:09:09,270 --> 00:09:12,030 and [? Yella ?] and Kareem, we now have our own tool, Compare50, 204 00:09:12,030 --> 00:09:13,620 which automates this process for us. 205 00:09:13,620 --> 00:09:15,780 And perhaps, given your experience in this space, you can 206 00:09:15,780 --> 00:09:18,270 speak a little more to the algorithmics under the hood? 207 00:09:18,270 --> 00:09:19,980 BRIAN YU: Yeah, it was really Chad and [? Yella ?] 208 00:09:19,980 --> 00:09:21,810 and Kareem who were doing a lot of the work there. 209 00:09:21,810 --> 00:09:24,227 But algorithmically, it's sort of an interesting challenge 210 00:09:24,227 --> 00:09:26,460 to figure out how to do these sorts of comparisons. 211 00:09:26,460 --> 00:09:29,205 Because even though it might seem like a computer 212 00:09:29,205 --> 00:09:31,080 is obviously going to be able to do it faster 213 00:09:31,080 --> 00:09:32,740 than people are going to be able to do it, 214 00:09:32,740 --> 00:09:34,615 it's still a lot of work, even for a computer, 215 00:09:34,615 --> 00:09:38,010 especially if you consider some 800 students in the class being compared 216 00:09:38,010 --> 00:09:41,350 against all of the other students, plus all of the students who have ever taken 217 00:09:41,350 --> 00:09:44,570 CS50 before, not only for one problem, but for all of the problems 218 00:09:44,570 --> 00:09:45,380 in the course. 219 00:09:45,380 --> 00:09:47,640 That's a lot of work for any computer to do.
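The idea of comparing structure rather than names can be sketched with a generic technique: tokenize the source, drop comments, replace every identifier and literal with a placeholder, and then compare the resulting token sequences. The Python below is a minimal sketch under those assumptions, using the standard library's `difflib.SequenceMatcher`. It is not how Compare50 itself is implemented, the function names are hypothetical, and its C keyword list is deliberately incomplete.

```python
import re
from difflib import SequenceMatcher

# Deliberately small subset of C keywords; any other name (variables, and
# also functions such as printf) is treated as an interchangeable identifier.
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "bool", "if", "else", "for",
    "while", "do", "return", "break", "continue", "const", "struct",
}

# Line comments, block comments, string literals, identifiers, integers,
# or any other single non-space character.
TOKEN = re.compile(r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S', re.S)


def structural_tokens(source: str) -> list[str]:
    """Reduce C source to a token sequence that ignores names, literals and comments."""
    tokens = []
    for tok in TOKEN.findall(source):
        if tok.startswith(("//", "/*")):
            continue  # comments carry no structure
        if tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in C_KEYWORDS else "ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        elif tok[0] == '"':
            tokens.append("STR")
        else:
            tokens.append(tok)  # operators and punctuation
    return tokens


def structural_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the two programs' token sequences."""
    return SequenceMatcher(None, structural_tokens(a), structural_tokens(b)).ratio()
```

For example, a loop that sums into `total` with index `i` and the same loop summing into `sum` with index `j`, plus an extra comment, reduce to identical token sequences, so their similarity is 1.0. The trade-off is that unrelated programs with the same shape can also look alike, which is one reason human review still follows the software pass.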
220 00:09:47,640 --> 00:09:50,850 And so there are a lot of interesting algorithmic efficiencies 221 00:09:50,850 --> 00:09:53,070 that have been put into the software in order 222 00:09:53,070 --> 00:09:54,570 to make it work a little bit better, 223 00:09:54,570 --> 00:09:57,630 taking advantage of things you actually learn about in CS50, 224 00:09:57,630 --> 00:10:01,173 things like hashing, in order to store data inside of a hash table 225 00:10:01,173 --> 00:10:03,090 so you can very quickly look up whether or not 226 00:10:03,090 --> 00:10:06,472 you've seen a particular pattern of characters in a file before. 227 00:10:06,472 --> 00:10:09,680 Those sorts of data structures all come into play if you start to think about 228 00:10:09,680 --> 00:10:12,405 how you can try to solve this problem in a way that's efficient. 229 00:10:12,405 --> 00:10:13,430 DAVID MALAN: Yeah. 230 00:10:13,430 --> 00:10:16,880 And besides software, certainly our own policies have evolved over time. 231 00:10:16,880 --> 00:10:19,790 So, as you know, for instance, in a few weeks' time, 232 00:10:19,790 --> 00:10:22,910 we'll be presenting at a computer science education conference called 233 00:10:22,910 --> 00:10:25,640 SIGCSE a recent paper that a few of us worked on, 234 00:10:25,640 --> 00:10:28,550 based on our experience with issues of academic dishonesty 235 00:10:28,550 --> 00:10:29,850 over the past few years. 236 00:10:29,850 --> 00:10:32,390 And it's perhaps worth noting that, software aside, 237 00:10:32,390 --> 00:10:37,010 I think one of the more noteworthy policy changes 238 00:10:37,010 --> 00:10:40,100 we introduced some years ago was CS50's so-called Regret Clause.
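Hashing to spot a repeated "pattern of characters" can be illustrated with k-gram fingerprinting: slide a window of k characters over each submission and record every window in a hash table that maps it to the submissions containing it. Below is a minimal Python sketch, assuming whitespace is stripped first and k = 5 (an arbitrary choice); the function names are hypothetical. Python's `dict` is itself a hash table, so each lookup is constant time on average. This is a generic technique, not a description of Compare50's internals.

```python
from collections import defaultdict
from itertools import combinations


def kgrams(text: str, k: int) -> set[str]:
    """All length-k substrings of text, after removing whitespace."""
    compact = "".join(text.split())
    return {compact[i:i + k] for i in range(len(compact) - k + 1)}


def shared_kgram_counts(submissions: dict[str, str], k: int = 5) -> dict[tuple[str, str], int]:
    """Count, for each pair of submissions, how many distinct k-grams they share."""
    # Hash table from each k-gram to the set of submissions that contain it.
    index: defaultdict[str, set[str]] = defaultdict(set)
    for name, text in submissions.items():
        for gram in kgrams(text, k):
            index[gram].add(name)
    # Every k-gram seen in two or more submissions credits each such pair once.
    counts: defaultdict[tuple[str, str], int] = defaultdict(int)
    for owners in index.values():
        for pair in combinations(sorted(owners), 2):
            counts[pair] += 1
    return dict(counts)
```

Production tools usually keep only a subset of fingerprints (the winnowing algorithm used by MOSS is one well-known example) and ignore k-grams that appear in many submissions, since boilerplate shared by the whole class would otherwise generate a large number of meaningless pairs.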
239 00:10:40,100 --> 00:10:42,680 This was just a single sentence that we added to the course's 240 00:10:42,680 --> 00:10:45,890 syllabus that encouraged students to come forward 241 00:10:45,890 --> 00:10:49,130 if, within 72 hours of submitting some work, 242 00:10:49,130 --> 00:10:51,710 they realized that, oh, they had indeed crossed some line. 243 00:10:51,710 --> 00:10:54,530 They had copied unduly from some resource online. 244 00:10:54,530 --> 00:10:57,140 They had copied some portion of code from a classmate 245 00:10:57,140 --> 00:10:59,060 or had otherwise somehow crossed the line 246 00:10:59,060 --> 00:11:02,157 that was prescribed in the course's syllabus as being not reasonable. 247 00:11:02,157 --> 00:11:04,490 And what we committed to doing, in writing in the course's 248 00:11:04,490 --> 00:11:06,527 syllabus, was that there would still be a penalty 249 00:11:06,527 --> 00:11:08,360 and there would still be consequences, but they 250 00:11:08,360 --> 00:11:12,692 would be limited, for instance, to our zeroing the problem or the problem 251 00:11:12,692 --> 00:11:14,150 set that the student had submitted. 252 00:11:14,150 --> 00:11:18,020 And we committed to not escalating the matter to Harvard's Honor Council. 253 00:11:18,020 --> 00:11:21,050 The hope was that we could actually change what had historically 254 00:11:21,050 --> 00:11:24,860 been purely punitive processes, whereby we detect some transgression, 255 00:11:24,860 --> 00:11:27,170 we refer it to the Honor Council, and 256 00:11:27,170 --> 00:11:31,550 thereafter the student is penalized in some way, the most extreme outcome of which 257 00:11:31,550 --> 00:11:35,270 might actually be required time off from Harvard University itself. 258 00:11:35,270 --> 00:11:37,310 We wanted to create a window of opportunity 259 00:11:37,310 --> 00:11:40,790 where students, after some sleep, some thought, some reflection, 260 00:11:40,790 --> 00:11:42,170 can actually own up to a mistake.
261 00:11:42,170 --> 00:11:44,300 Because for so many years, so many of our cases 262 00:11:44,300 --> 00:11:48,350 truly involved students who, at 2:00 AM, 3:00 AM, 4:00 AM, 263 00:11:48,350 --> 00:11:52,370 running on very little sleep, under a significant amount of stress, 264 00:11:52,370 --> 00:11:55,700 and with a deadline not only in CS50 but perhaps in some other course looming, 265 00:11:55,700 --> 00:11:59,270 made some poor decision to take the quick way out, 266 00:11:59,270 --> 00:12:02,620 to just copy and paste someone else's work and submit it as their own. 267 00:12:02,620 --> 00:12:06,470 And even if they decided or realized a day or two later, wow, 268 00:12:06,470 --> 00:12:08,270 I really didn't mean to do that, 269 00:12:08,270 --> 00:12:11,530 I really shouldn't have done that, we had never described 270 00:12:11,530 --> 00:12:14,090 a well-documented process for how they should handle that 271 00:12:14,090 --> 00:12:15,340 and how they could own up. 272 00:12:15,340 --> 00:12:18,800 And so this Regret Clause was meant to, ideally, help chip away 273 00:12:18,800 --> 00:12:20,910 at the total number of cases we were seeing, 274 00:12:20,910 --> 00:12:23,345 but, ultimately, to help students meet us halfway 275 00:12:23,345 --> 00:12:25,220 so that it becomes more of a teachable moment, 276 00:12:25,220 --> 00:12:27,200 if you will, and not just punitive. 277 00:12:27,200 --> 00:12:30,973 BRIAN YU: So I remember, when I first took CS50 in fall 2015, 278 00:12:30,973 --> 00:12:33,140 seeing the Regret Clause in the syllabus. 279 00:12:33,140 --> 00:12:34,535 And I remember being a little surprised, 280 00:12:34,535 --> 00:12:36,560 because it wasn't something I had seen before. 281 00:12:36,560 --> 00:12:39,050 It's not something that many other classes do. 282 00:12:39,050 --> 00:12:41,750 It's not really anything that I was familiar with. 283 00:12:41,750 --> 00:12:44,360 So I'm curious about where the policy came from.
284 00:12:44,360 --> 00:12:46,190 Was it inspired by any other policy? 285 00:12:46,190 --> 00:12:49,140 Or how did you find your way to this idea? 286 00:12:49,140 --> 00:12:52,012 And what was the process like for bringing this into the course? 287 00:12:52,012 --> 00:12:53,720 DAVID MALAN: Yeah, it was really inspired 288 00:12:53,720 --> 00:12:58,520 by having, for almost 10 years, watched the number of cases 289 00:12:58,520 --> 00:13:03,300 come through CS50 and watched the circumstances that ultimately explained 290 00:13:03,300 --> 00:13:03,800 them: 291 00:13:03,800 --> 00:13:07,220 again, these late-night poor decisions under great stress. 292 00:13:07,220 --> 00:13:10,340 And it just felt like we, the teachers of the course, should be doing 293 00:13:10,340 --> 00:13:14,360 or could be doing a more proactive job of trying to tackle this problem, 294 00:13:14,360 --> 00:13:18,050 not just looking to detect it, but looking to teach students how to, one, 295 00:13:18,050 --> 00:13:19,680 ideally avoid it altogether, 296 00:13:19,680 --> 00:13:25,100 but, two, even if they do cross some line, how to address the situation then. 297 00:13:25,100 --> 00:13:28,280 And yet, it was not with great ease that we rolled this out. 298 00:13:28,280 --> 00:13:31,760 There were absolutely some sensitivities on campus among administrators, 299 00:13:31,760 --> 00:13:36,770 among the university's Honor Council, who had longstanding processes when 300 00:13:36,770 --> 00:13:39,800 it came to issues of academic dishonesty, not only for CS50, 301 00:13:39,800 --> 00:13:41,060 but for all courses at Harvard. 302 00:13:41,060 --> 00:13:44,420 The upside, of course, is that by having a central body, Harvard's Honor 303 00:13:44,420 --> 00:13:48,230 Council, adjudicate all of these cases, you have uniform processes. 304 00:13:48,230 --> 00:13:50,930 You hopefully have more equitable outcomes overall.
305 00:13:50,930 --> 00:13:53,240 And there was great concern, initially, in some circles 306 00:13:53,240 --> 00:13:56,480 that we were now doing something more on our own, internally. 307 00:13:56,480 --> 00:14:00,380 And so it only debuted after quite a few conversations with Harvard's Honor 308 00:14:00,380 --> 00:14:02,690 Council and administration, so that we could ultimately 309 00:14:02,690 --> 00:14:05,570 get folks comfortable with what, at the time, was an experiment, 310 00:14:05,570 --> 00:14:08,622 but now is an ongoing, six-year policy, for us at least. 311 00:14:08,622 --> 00:14:10,580 BRIAN YU: All right, so now, six years in, the policy 312 00:14:10,580 --> 00:14:11,720 has been around for a little while. 313 00:14:11,720 --> 00:14:14,240 Do you feel like it's done what you expected it to do? 314 00:14:14,240 --> 00:14:17,330 How does it compare to what your original goals and objectives were 315 00:14:17,330 --> 00:14:19,970 for what the policy would do for the class and for students? 316 00:14:19,970 --> 00:14:22,340 DAVID MALAN: Yeah, so we hoped that it would actually 317 00:14:22,340 --> 00:14:24,440 chip away at the total number of cases that we 318 00:14:24,440 --> 00:14:26,270 were referring to Harvard's Honor Council, 319 00:14:26,270 --> 00:14:28,700 but it did not, in fact, do that. 320 00:14:28,700 --> 00:14:31,520 Interestingly enough, the number of cases 321 00:14:31,520 --> 00:14:34,580 we have referred to the Honor Council since has been roughly the same 322 00:14:34,580 --> 00:14:40,040 or even higher in some years than prior to the Regret Clause's introduction. 323 00:14:40,040 --> 00:14:42,800 We have, wonderfully, had a nontrivial number 324 00:14:42,800 --> 00:14:45,290 of students avail themselves of this clause.
325 00:14:45,290 --> 00:14:48,080 In most years, that is. So, in the clause's first year, 326 00:14:48,080 --> 00:14:51,710 2014, we had 19 students come forward under this clause 327 00:14:51,710 --> 00:14:55,310 and reach out to me and the course's heads, generally by way of an email first, 328 00:14:55,310 --> 00:14:58,220 after which we would then schedule time to chat. 329 00:14:58,220 --> 00:15:00,530 And I would chat with these 19 students one-on-one 330 00:15:00,530 --> 00:15:03,860 and better understand what had happened and what they had done, 331 00:15:03,860 --> 00:15:06,140 better understand what circumstances had led 332 00:15:06,140 --> 00:15:09,620 to their having made whatever decision it was we were then discussing. 333 00:15:09,620 --> 00:15:12,560 And then, ultimately, I would explicitly tell them, all right, 334 00:15:12,560 --> 00:15:14,090 let's consider the matter behind us, 335 00:15:14,090 --> 00:15:16,160 after zeroing the particular work in question, 336 00:15:16,160 --> 00:15:18,820 to reassure them that this was indeed the end of that process, 337 00:15:18,820 --> 00:15:21,820 but the beginning, hopefully, of a healthier approach to future problem 338 00:15:21,820 --> 00:15:22,520 sets. 339 00:15:22,520 --> 00:15:25,220 And we would then encourage them to-- 340 00:15:25,220 --> 00:15:29,150 and discuss with them ways of better managing their time, 341 00:15:29,150 --> 00:15:30,612 better managing their stress. 342 00:15:30,612 --> 00:15:32,570 In some cases, too, it came to light that there 343 00:15:32,570 --> 00:15:34,210 were extenuating circumstances: 344 00:15:34,210 --> 00:15:36,770 students struggling with issues at home, with their family, 345 00:15:36,770 --> 00:15:40,080 with relationships, with other courses, issues of mental health.
346 00:15:40,080 --> 00:15:43,700 And so what was a pleasant revelation to us 347 00:15:43,700 --> 00:15:47,570 was that we were able, more proactively than had been possible in the past, 348 00:15:47,570 --> 00:15:49,860 to connect students with support resources on campus, 349 00:15:49,860 --> 00:15:51,830 whether academic, in the case of tutoring, 350 00:15:51,830 --> 00:15:54,560 or perhaps health, in the way of mental health. 351 00:15:54,560 --> 00:15:57,380 So that, too, seemed to be a positive outcome of the experience: 352 00:15:57,380 --> 00:16:00,860 that we were able to connect up to 19 students 353 00:16:00,860 --> 00:16:03,360 that first year with other resources on campus. 354 00:16:03,360 --> 00:16:04,610 And thereafter, it fluctuated. 355 00:16:04,610 --> 00:16:06,860 In 2015, we had 26 students. 356 00:16:06,860 --> 00:16:09,080 In 2016, we had seven students. 357 00:16:09,080 --> 00:16:11,990 Then it went back up in 2017 to 18 students. 358 00:16:11,990 --> 00:16:14,240 And I think this variation is partly just a function 359 00:16:14,240 --> 00:16:16,230 of messaging on our part, on my part: 360 00:16:16,230 --> 00:16:18,710 how much time we spend in lectures and in emails 361 00:16:18,710 --> 00:16:22,160 during the semester reminding students of the policy's availability. 362 00:16:22,160 --> 00:16:25,630 I also suspect that there's some ebb and flow based on the current-- 363 00:16:25,630 --> 00:16:26,570 the given year. 364 00:16:26,570 --> 00:16:31,010 If more students in this class know that a student in the previous year 365 00:16:31,010 --> 00:16:34,650 might have invoked this clause, there just might be broader awareness of it. 366 00:16:34,650 --> 00:16:39,450 But it's been a good number of students, I think, every semester. 367 00:16:39,450 --> 00:16:42,650 However, the fact that we didn't see a downturn in the number of cases 368 00:16:42,650 --> 00:16:44,520 we referred was also a surprise.
369 00:16:44,520 --> 00:16:48,140 In fact, in the first year of the Regret Clause's existence, 370 00:16:48,140 --> 00:16:51,500 it turned out that most, if not all, of the students 371 00:16:51,500 --> 00:16:55,010 who invoked the Regret Clause did not even appear on our radar 372 00:16:55,010 --> 00:16:57,822 when we ran our software-based cross comparisons of their work. 373 00:16:57,822 --> 00:16:59,780 This suggested that, had they not come forward, 374 00:16:59,780 --> 00:17:02,960 we actually would not have noticed, and they would not 375 00:17:02,960 --> 00:17:05,720 have been connected, ideally, with these resources. 376 00:17:05,720 --> 00:17:09,079 And so that too was a bit of a surprise. 377 00:17:09,079 --> 00:17:11,150 These students invoking the Regret Clause, 378 00:17:11,150 --> 00:17:15,020 I dare say, composed a different demographic of students 379 00:17:15,020 --> 00:17:16,790 that we hadn't previously identified: 380 00:17:16,790 --> 00:17:19,670 students who had indeed crossed some lines in many cases, 381 00:17:19,670 --> 00:17:23,780 but who had not been connected with or been 382 00:17:23,780 --> 00:17:27,589 offered some teachable moment that might actually help them course correct. 383 00:17:27,589 --> 00:17:30,720 And I should note, too, that of the 19 students, 26 students, and so forth, 384 00:17:30,720 --> 00:17:32,762 not all of them had indeed crossed some lines. 385 00:17:32,762 --> 00:17:35,623 In several cases each year, students were unnecessarily worried. 386 00:17:35,623 --> 00:17:38,540 And so I would simply reassure them, thank them for coming forward, 387 00:17:38,540 --> 00:17:41,520 and tell them not to worry: they'd navigated the waters properly.
388 00:17:41,520 --> 00:17:43,520 BRIAN YU: Yeah, it's really interesting that now, 389 00:17:43,520 --> 00:17:45,270 by reaching this other demographic, you've 390 00:17:45,270 --> 00:17:48,770 been able to have these sorts of chats that otherwise may not 391 00:17:48,770 --> 00:17:51,770 have been able to happen, and connect them with other kinds of resources. 392 00:17:51,770 --> 00:17:54,410 I'm curious what kinds of advice you 393 00:17:54,410 --> 00:17:58,340 give to students who find difficulty with time management and stress. 394 00:17:58,340 --> 00:18:00,800 Because I think this is not a problem unique to CS50 395 00:18:00,800 --> 00:18:03,740 or to other computer science classes, or even to school in general; 396 00:18:03,740 --> 00:18:05,100 it comes up outside of school, too. 397 00:18:05,100 --> 00:18:08,760 Like, time management, stress, managing these things and making good decisions 398 00:18:08,760 --> 00:18:09,420 is-- 399 00:18:09,420 --> 00:18:10,160 it's challenging. 400 00:18:10,160 --> 00:18:13,260 And it's something that I'm sure many students and other people face. 401 00:18:13,260 --> 00:18:14,510 DAVID MALAN: Yeah, absolutely. 402 00:18:14,510 --> 00:18:16,880 To be honest, it's fairly straightforward things. 403 00:18:16,880 --> 00:18:20,270 It's things that we often even put in the course's syllabus or FAQs. 404 00:18:20,270 --> 00:18:22,850 For instance, in a programming class like ours, start early. 405 00:18:22,850 --> 00:18:26,750 You have nearly seven days from start to finish for each programming assignment. 406 00:18:26,750 --> 00:18:29,210 And the key to avoiding a lot of the stress 407 00:18:29,210 --> 00:18:32,180 is to just start early, so that when you invariably do hit a wall 408 00:18:32,180 --> 00:18:35,867 or encounter some bug that you just can't quite see, you can go to sleep, 409 00:18:35,867 --> 00:18:37,700 you can go for a run, you can take a shower.
410 00:18:37,700 --> 00:18:39,920 You can take a break from it and come back to it 411 00:18:39,920 --> 00:18:43,220 some hours or even a couple of days later and have that perspective. 412 00:18:43,220 --> 00:18:47,790 I mean, even I have found in the real world that I do not produce good code when I 413 00:18:47,790 --> 00:18:48,920 myself am under stress. 414 00:18:48,920 --> 00:18:50,240 It's no fun. 415 00:18:50,240 --> 00:18:51,890 It doesn't yield correct results. 416 00:18:51,890 --> 00:18:56,840 And so it's really about helping students realize that it is a relatively simple fix. 417 00:18:56,840 --> 00:19:01,340 They just really need to take charge and commit themselves to that. 418 00:19:01,340 --> 00:19:04,310 Besides that, it's often a matter of referring students and reminding 419 00:19:04,310 --> 00:19:07,370 them of the many resources that the course offers on campus, whether it's 420 00:19:07,370 --> 00:19:11,780 the course's lectures, or sections, or office hours, or notes, or tutorials, 421 00:19:11,780 --> 00:19:14,443 or any number of online and in-person resources. 422 00:19:14,443 --> 00:19:17,360 And just reminding them, you need to meet the course halfway 423 00:19:17,360 --> 00:19:19,047 and take advantage of these resources. 424 00:19:19,047 --> 00:19:20,880 And it's no surprise that you are struggling 425 00:19:20,880 --> 00:19:24,350 if you're not availing yourself of at least some of these resources. 426 00:19:24,350 --> 00:19:25,910 BRIAN YU: Yeah, actually, it's always incredible to me 427 00:19:25,910 --> 00:19:28,910 when, on our problem set forms, we always ask students, like, on what day 428 00:19:28,910 --> 00:19:30,217 did you start the problem set? 429 00:19:30,217 --> 00:19:33,050 And so many students respond, like, the day of the deadline or the day 430 00:19:33,050 --> 00:19:36,350 before the deadline, for a project that we wrote with the expectation 431 00:19:36,350 --> 00:19:38,810 that it would take students a week to complete.
432 00:19:38,810 --> 00:19:41,600 And students are trying to do it, like, the day of or the day before. 433 00:19:41,600 --> 00:19:45,160 It always amazes me, the number of cases where that ends up happening. 434 00:19:45,160 --> 00:19:48,410 DAVID MALAN: Yeah, so I think the more we can send that message, even before we 435 00:19:48,410 --> 00:19:50,720 get to the point of a student having this Regret 436 00:19:50,720 --> 00:19:53,450 Clause conversation, the better. 437 00:19:53,450 --> 00:19:57,020 I should note, too, though, that another surprising effect of the Regret Clause 438 00:19:57,020 --> 00:20:02,130 was not only that the number of cases we referred didn't go down, 439 00:20:02,130 --> 00:20:05,090 but rather that in at least one year it went significantly up. 440 00:20:05,090 --> 00:20:10,230 In 2016, as you noted, we referred 10% of the course's student body. 441 00:20:10,230 --> 00:20:14,020 So this is 10% of the students taking CS50 referred 442 00:20:14,020 --> 00:20:15,950 to the university's Honor Council. 443 00:20:15,950 --> 00:20:17,670 But to be honest, that number, 444 00:20:17,670 --> 00:20:20,990 and I think our numbers since, have been partly a reflection of our feeling 445 00:20:20,990 --> 00:20:24,530 that when we do detect what appears to be a straightforward 446 00:20:24,530 --> 00:20:28,880 case of academic dishonesty, plagiarism of some sort, duplication of code, 447 00:20:28,880 --> 00:20:33,020 these days, I think I personally am even more comfortable referring the case 448 00:20:33,020 --> 00:20:37,010 than I was in years past, because we have given students an opportunity 449 00:20:37,010 --> 00:20:38,740 to meet us halfway and reach out.
450 00:20:38,740 --> 00:20:41,030 And indeed, as you know, in every one of the course's 451 00:20:41,030 --> 00:20:44,570 problem sets this year, on the form via which they submitted their work, 452 00:20:44,570 --> 00:20:48,200 we asked them to check a checkbox to acknowledge their understanding 453 00:20:48,200 --> 00:20:50,010 of the clause's availability. 454 00:20:50,010 --> 00:20:52,670 And so at that point, if we are reminding students 455 00:20:52,670 --> 00:20:55,160 each week that it's available and they are still not 456 00:20:55,160 --> 00:20:57,980 taking advantage of it, it seems quite reasonable, 457 00:20:57,980 --> 00:21:00,230 I think, for the course to move forward with the more 458 00:21:00,230 --> 00:21:02,510 traditional punitive process involving the Honor 459 00:21:02,510 --> 00:21:06,363 Council to investigate whether the line had indeed been crossed. 460 00:21:06,363 --> 00:21:07,280 BRIAN YU: I'm curious. 461 00:21:07,280 --> 00:21:09,710 So we often talk now about, like, the line being crossed 462 00:21:09,710 --> 00:21:11,400 and what it means to cross the line. 463 00:21:11,400 --> 00:21:16,250 I'm curious about how you see this in the context of programming assignments 464 00:21:16,250 --> 00:21:18,500 in particular. Like, if you're writing an essay 465 00:21:18,500 --> 00:21:22,250 and you copy a sentence, that seems very clearly like copying. 466 00:21:22,250 --> 00:21:24,980 But in the case of code, if you copy a line of code 467 00:21:24,980 --> 00:21:27,890 you see on Stack Overflow, for example, if you're looking up, like, 468 00:21:27,890 --> 00:21:31,015 how do I solve this particular problem, and you incorporate a line of code, 469 00:21:31,015 --> 00:21:32,960 that might not be crossing a line. 470 00:21:32,960 --> 00:21:36,660 So how do you think about where the line is in the context of a programming 471 00:21:36,660 --> 00:21:37,160 assignment?
472 00:21:37,160 --> 00:21:39,110 And how do you teach that kind of thing? 473 00:21:39,110 --> 00:21:40,430 DAVID MALAN: Yeah, it's a really good question. 474 00:21:40,430 --> 00:21:42,222 And it's a common question, because I think 475 00:21:42,222 --> 00:21:45,620 there's a perception among folks, both in the software 476 00:21:45,620 --> 00:21:49,850 world and the non-software world, that this notion of academic dishonesty 477 00:21:49,850 --> 00:21:52,370 in a programming class is itself incompatible with the idea 478 00:21:52,370 --> 00:21:53,000 of programming. 479 00:21:53,000 --> 00:21:55,190 And I do very much disagree with that. 480 00:21:55,190 --> 00:21:57,920 The lines that we prescribe to students, both in broad strokes 481 00:21:57,920 --> 00:22:01,550 and in very precise bullets in the course's syllabus, essentially 482 00:22:01,550 --> 00:22:04,268 try to teach students to be reasonable, so to speak. 483 00:22:04,268 --> 00:22:05,310 And what might that mean? 484 00:22:05,310 --> 00:22:07,393 Well, early in the semester in CS50, we, of course, 485 00:22:07,393 --> 00:22:11,360 have students implement Mario's pyramid in C, and later in Python: 486 00:22:11,360 --> 00:22:15,570 a sort of pyramid-like structure, just using some ASCII art to paint that 487 00:22:15,570 --> 00:22:16,070 picture. 488 00:22:16,070 --> 00:22:18,440 And it ultimately involves, like, a couple of for loops. 489 00:22:18,440 --> 00:22:21,380 It would be unreasonable for students to go off and Google or look 490 00:22:21,380 --> 00:22:24,770 on Stack Overflow for something like, how to print Mario's pyramid. 491 00:22:24,770 --> 00:22:27,500 That would be a search for the outright solution to the problem. 492 00:22:27,500 --> 00:22:30,350 And surely it is not our intent to assess you 493 00:22:30,350 --> 00:22:32,750 on your ability to Google a solution like that, as opposed 494 00:22:32,750 --> 00:22:34,310 to crafting it yourself.
495 00:22:34,310 --> 00:22:37,400 However, it would be very reasonable, for instance, to Google something 496 00:22:37,400 --> 00:22:43,670 like, how to write nested for loops in C, or how to print spaces in C. 497 00:22:43,670 --> 00:22:47,870 Because it's actually not obvious to students, one, how you can actually 498 00:22:47,870 --> 00:22:50,402 have two loops, one nested inside of the other, 499 00:22:50,402 --> 00:22:51,860 using different counting variables, 500 00:22:51,860 --> 00:22:54,900 and two, how to print what would appear to be blank spaces on the screen, 501 00:22:54,900 --> 00:22:57,420 not quite appreciating that it's actually just the SPACEBAR. 502 00:22:57,420 --> 00:22:59,253 So I think it's very reasonable for students, 503 00:22:59,253 --> 00:23:02,480 and it is allowed in the course's syllabus, to look for short snippets, 504 00:23:02,480 --> 00:23:03,680 so to speak, of code, 505 00:23:03,680 --> 00:23:06,170 where a snippet itself is one line or a few lines, 506 00:23:06,170 --> 00:23:08,510 but it is not the essence of the problem. 507 00:23:08,510 --> 00:23:10,640 And so indeed, when we do find that students 508 00:23:10,640 --> 00:23:12,920 have crossed the line, what has happened is 509 00:23:12,920 --> 00:23:15,000 we've noticed some curiosity about their code. 510 00:23:15,000 --> 00:23:17,660 Maybe it's very similar to another student's code, 511 00:23:17,660 --> 00:23:19,760 or it suggests a technique that we haven't 512 00:23:19,760 --> 00:23:23,210 taught in the class, or some syntax that's not consistent with what we 513 00:23:23,210 --> 00:23:25,170 know students have seen in the class. 514 00:23:25,170 --> 00:23:29,030 And so we ourselves might Google certain key phrases or portions of code 515 00:23:29,030 --> 00:23:30,740 or comments that we see in their code.
516 00:23:30,740 --> 00:23:34,610 And sure enough, it too often leads us to the very same GitHub repository 517 00:23:34,610 --> 00:23:39,020 or Reddit post, wherein someone else has posted exactly that same code 518 00:23:39,020 --> 00:23:40,580 that the student has copied and pasted. 519 00:23:40,580 --> 00:23:42,980 And so there, too, the kinds of cases we are referring 520 00:23:42,980 --> 00:23:44,990 are not those of the many, many, many students 521 00:23:44,990 --> 00:23:47,785 who very reasonably use these kinds of digital resources, 522 00:23:47,785 --> 00:23:49,910 but those of the ones who use these resources and then take 523 00:23:49,910 --> 00:23:53,930 shortcuts to submission by just copying and pasting many lines of code 524 00:23:53,930 --> 00:23:55,380 that they see. 525 00:23:55,380 --> 00:23:57,407 BRIAN YU: So other than the Regret Clause now, 526 00:23:57,407 --> 00:23:59,240 which we've talked about for a little while, 527 00:23:59,240 --> 00:24:01,550 have there been any other things you've thought about doing 528 00:24:01,550 --> 00:24:03,467 or things you have done in the course in terms 529 00:24:03,467 --> 00:24:07,490 of thinking about how to either address academic dishonesty when it happens 530 00:24:07,490 --> 00:24:09,225 or to try to prevent it beforehand? 531 00:24:09,225 --> 00:24:10,100 DAVID MALAN: We have. 532 00:24:10,100 --> 00:24:13,852 So a couple of years ago, we introduced the course's Brink Clause, so to speak, 533 00:24:13,852 --> 00:24:17,060 which was a couple of sentences inspired by a colleague of ours at Princeton, 534 00:24:17,060 --> 00:24:20,780 Chris Moretti, who gave us some really inspiring language that 535 00:24:20,780 --> 00:24:26,900 encouraged students, in the course's syllabus, to write us late at night 536 00:24:26,900 --> 00:24:30,320 just as they felt themselves being on the brink of making a poor decision.
537 00:24:30,320 --> 00:24:33,080 That is to say, even when you and I and most of the course's staff 538 00:24:33,080 --> 00:24:36,710 might be asleep and a student might be working late at night on their work, 539 00:24:36,710 --> 00:24:38,720 it would be reasonable to assume that they 540 00:24:38,720 --> 00:24:44,270 could get a response to a request for an extension, for instance. 541 00:24:44,270 --> 00:24:46,160 And so what this Brink Clause prescribed was 542 00:24:46,160 --> 00:24:49,100 a mechanism for students to send that note to say, listen, 543 00:24:49,100 --> 00:24:50,970 I really feel like I'm in a bad place, 544 00:24:50,970 --> 00:24:55,640 and I worry I'm about to make a poor decision, such as by copying and pasting 545 00:24:55,640 --> 00:24:57,235 too many lines of code from online. 546 00:24:57,235 --> 00:24:59,360 I'd like to discuss this tomorrow. And indeed, that's 547 00:24:59,360 --> 00:25:02,120 what the syllabus asked them to do. 548 00:25:02,120 --> 00:25:03,752 Go to sleep, don't submit your work. 549 00:25:03,752 --> 00:25:05,210 We'll figure it out in the morning. 550 00:25:05,210 --> 00:25:08,690 And just inviting students to write us and meet us halfway 551 00:25:08,690 --> 00:25:11,420 under that sort of duress was the intent of the clause. 552 00:25:11,420 --> 00:25:15,260 Unfortunately, when it was invoked some number of times 553 00:25:15,260 --> 00:25:18,050 that first year, based on the wording of the emails, 554 00:25:18,050 --> 00:25:20,160 based on the conversations we had with students, 555 00:25:20,160 --> 00:25:25,070 it really devolved into just a backdoor to extensions. 556 00:25:25,070 --> 00:25:27,455 We did not believe, ironically, that most 557 00:25:27,455 --> 00:25:29,330 of the students who were invoking this clause 558 00:25:29,330 --> 00:25:33,320 were actually on the brink of doing something academically dishonest. 559 00:25:33,320 --> 00:25:36,050 They were simply on the brink of not meeting the deadline.
560 00:25:36,050 --> 00:25:39,970 And so we ended up removing the clause from the course's syllabus, 561 00:25:39,970 --> 00:25:40,970 ultimately. 562 00:25:40,970 --> 00:25:45,920 I'm glad we did try it, but this was one example of a measure that, at least 563 00:25:45,920 --> 00:25:48,830 for us, in our context, in our implementation, failed. 564 00:25:48,830 --> 00:25:52,850 But I do think more compelling has been what we introduced a few years ago, 565 00:25:52,850 --> 00:25:55,880 in the spirit of the Regret Clause, but whereby we actually 566 00:25:55,880 --> 00:25:57,480 initiate the conversations. 567 00:25:57,480 --> 00:25:59,993 So it's not infrequently been the case, when 568 00:25:59,993 --> 00:26:02,660 we've cross compared so many students' submissions, that there are 569 00:26:02,660 --> 00:26:04,940 a few cases that seem a little worrisome, 570 00:26:04,940 --> 00:26:07,250 but that definitely don't seem like they're over the line. 571 00:26:07,250 --> 00:26:10,663 We certainly wouldn't refer them to the Honor Council on that basis. 572 00:26:10,663 --> 00:26:13,580 But we realized that this then is an opportunity for us to maybe 573 00:26:13,580 --> 00:26:17,480 go chat with those students now and say, hey, listen, you appeared on our radar. 574 00:26:17,480 --> 00:26:20,160 We think it's because of the similarities between your code 575 00:26:20,160 --> 00:26:21,410 and maybe some other student's. 576 00:26:21,410 --> 00:26:24,590 And we would keep the other student anonymous and out of it. 577 00:26:24,590 --> 00:26:27,920 But we would then ask the student, how did you get your code to this point? 578 00:26:27,920 --> 00:26:30,230 Walk us through the process and let's figure out 579 00:26:30,230 --> 00:26:34,550 how you came so close to what we worried was crossing a line, 580 00:26:34,550 --> 00:26:36,450 so that you can just avoid it moving forward.
581 00:26:36,450 --> 00:26:38,200 And so these interventional conversations, 582 00:26:38,200 --> 00:26:41,068 as we describe them internally, I hope, have actually 583 00:26:41,068 --> 00:26:43,610 gone a long way toward helping students navigate the waters. 584 00:26:43,610 --> 00:26:45,440 Even if they don't cross those lines, they at least now 585 00:26:45,440 --> 00:26:48,470 are being more conscious and thoughtful about what it is they're doing. 586 00:26:48,470 --> 00:26:50,803 BRIAN YU: And what do you usually gather from those sorts 587 00:26:50,803 --> 00:26:51,830 of interventional chats? 588 00:26:51,830 --> 00:26:55,040 Like, what sorts of actions do you find that students are taking? 589 00:26:55,040 --> 00:26:57,313 Does it seem like there's some teachable moment there 590 00:26:57,313 --> 00:26:58,730 that you're helping students with? 591 00:26:58,730 --> 00:27:01,105 DAVID MALAN: I think so, because not infrequently would it 592 00:27:01,105 --> 00:27:03,890 be the case that two students were indeed working reasonably 593 00:27:03,890 --> 00:27:06,410 on the homework assignment together, 594 00:27:06,410 --> 00:27:10,310 but they were perhaps asking each other a few too many questions about code. 595 00:27:10,310 --> 00:27:13,250 It wasn't necessarily entirely in pseudocode or in English, 596 00:27:13,250 --> 00:27:14,210 their conversations. 597 00:27:14,210 --> 00:27:17,300 And maybe one was being shown the other's code, 598 00:27:17,300 --> 00:27:20,123 which is allowed in some circumstances per the syllabus, 599 00:27:20,123 --> 00:27:21,540 but maybe a little too frequently. 600 00:27:21,540 --> 00:27:24,980 And so, as such, their work was just sort of, over time, 601 00:27:24,980 --> 00:27:27,620 converging to become one and the same.
602 00:27:27,620 --> 00:27:30,995 And so, given that we would have these chats within a week of them having done 603 00:27:30,995 --> 00:27:33,620 that, it was usually pretty obvious to students, like, oh, let's 604 00:27:33,620 --> 00:27:34,970 not do that again. 605 00:27:34,970 --> 00:27:36,958 And they'd recalibrate their approach. 606 00:27:36,958 --> 00:27:38,750 BRIAN YU: So it seems like, all in all, CS50 607 00:27:38,750 --> 00:27:41,810 has tried a lot, with the Regret Clause, with the Brink Clause, 608 00:27:41,810 --> 00:27:44,420 with these interventional chats that you've had with students. 609 00:27:44,420 --> 00:27:48,025 CS50 has done a lot with regard to the issue of academic dishonesty, 610 00:27:48,025 --> 00:27:50,150 trying to create teachable moments out of that, 611 00:27:50,150 --> 00:27:52,850 and trying to work within the university and with students 612 00:27:52,850 --> 00:27:55,410 on how to improve that situation. 613 00:27:55,410 --> 00:27:58,970 What do you think are the lessons to be taken away for other courses? 614 00:27:58,970 --> 00:28:01,280 What can other classes, either in computer science 615 00:28:01,280 --> 00:28:04,940 or outside of computer science, do based on the lessons 616 00:28:04,940 --> 00:28:06,980 that you and the course overall have learned 617 00:28:06,980 --> 00:28:10,820 from these years of working with these issues of academic dishonesty? 618 00:28:10,820 --> 00:28:13,520 DAVID MALAN: I think one takeaway has been just clarity. 619 00:28:13,520 --> 00:28:18,227 Our policy in the course's syllabus is not short, but it is detailed. 620 00:28:18,227 --> 00:28:20,060 And that's the result of a lot of situations 621 00:28:20,060 --> 00:28:22,070 having arisen over the years, a lot of conversations 622 00:28:22,070 --> 00:28:23,362 having happened over the years.
623 00:28:23,362 --> 00:28:27,470 And so I am glad that we document so clearly for students 624 00:28:27,470 --> 00:28:31,100 where the lines are and what our expectations of students are. 625 00:28:31,100 --> 00:28:34,640 Toward that end, too, I think it has been a good thing that we've introduced 626 00:28:34,640 --> 00:28:36,530 these interventional conversations. 627 00:28:36,530 --> 00:28:42,500 Even if a course is not as involved in the mechanics of the process as we are, 628 00:28:42,500 --> 00:28:45,500 and isn't necessarily running software to cross compare submissions, 629 00:28:45,500 --> 00:28:47,333 when something does appear on the radar, 630 00:28:47,333 --> 00:28:49,291 if a teaching fellow or teaching assistant does 631 00:28:49,291 --> 00:28:52,375 notice some curiosity in a student's code, say it's dissimilar to their code 632 00:28:52,375 --> 00:28:55,010 from last week or it's a little too similar to another student's, I 633 00:28:55,010 --> 00:28:58,160 think it helps to just be comfortable reaching out proactively to those students, 634 00:28:58,160 --> 00:29:01,175 not to impugn them, but rather to say, listen, we have some concerns. 635 00:29:01,175 --> 00:29:03,050 We don't feel you've crossed a line, but we'd 636 00:29:03,050 --> 00:29:05,510 like to better understand what you've done and how you did this, 637 00:29:05,510 --> 00:29:08,135 so that we can steer you in the right direction moving forward. 638 00:29:08,135 --> 00:29:12,652 That too seems a very straightforward, healthy, and teachable opportunity. 639 00:29:12,652 --> 00:29:14,360 And as for the Regret Clause, I certainly 640 00:29:14,360 --> 00:29:16,280 think it's worth trying in other classes.
641 00:29:16,280 --> 00:29:19,040 I think it certainly is completely reasonable 642 00:29:19,040 --> 00:29:21,890 that a course, whether ours or anyone else's, 643 00:29:21,890 --> 00:29:26,480 just clearly define what steps students should take when they find themselves 644 00:29:26,480 --> 00:29:27,560 in certain situations. 645 00:29:27,560 --> 00:29:29,780 And prior to the Regret Clause, that was ill-defined. 646 00:29:29,780 --> 00:29:34,040 What should a student do if they make a poor decision, especially late at night, 647 00:29:34,040 --> 00:29:38,760 and then they actually regret it the next day or some number of hours later? 648 00:29:38,760 --> 00:29:40,200 There was no well-defined process. 649 00:29:40,200 --> 00:29:42,110 And while, technically, there was nothing stopping a student 650 00:29:42,110 --> 00:29:44,540 from coming forward and turning themselves in, 651 00:29:44,540 --> 00:29:46,850 I can certainly appreciate the trepidation 652 00:29:46,850 --> 00:29:49,760 that a student might have with taking that on, not knowing 653 00:29:49,760 --> 00:29:50,990 what the outcome might be, 654 00:29:50,990 --> 00:29:57,140 especially if they assume it might even be time off from the university itself. 655 00:29:57,140 --> 00:29:59,390 So I think the fact that we've sort of clarified 656 00:29:59,390 --> 00:30:02,330 how to conduct oneself before you get to that point, 657 00:30:02,330 --> 00:30:06,220 after you get to that point, and after we have detected as much, 658 00:30:06,220 --> 00:30:08,297 is only fair to students in the class. 659 00:30:08,297 --> 00:30:10,630 BRIAN YU: I think there are a lot of very useful lessons 660 00:30:10,630 --> 00:30:14,560 there in terms of what classes can start to do about this sort of issue.
661 00:30:14,560 --> 00:30:17,560 Certainly, if any of you are interested in learning more about this, 662 00:30:17,560 --> 00:30:19,540 we've actually written a paper, the two of us, 663 00:30:19,540 --> 00:30:23,340 along with Doug Lloyd on CS50's team, about academic honesty in CS50. 664 00:30:23,340 --> 00:30:25,090 So we can provide a link to that if you're 665 00:30:25,090 --> 00:30:28,600 interested in reading more about the policy, about the Regret Clause, 666 00:30:28,600 --> 00:30:31,840 and about other interventions that CS50 has made on these sorts of issues. 667 00:30:31,840 --> 00:30:32,673 DAVID MALAN: Indeed. 668 00:30:32,673 --> 00:30:34,720 The title is Teaching Academic Honesty in CS50, 669 00:30:34,720 --> 00:30:36,470 if you want to Google something like that. 670 00:30:36,470 --> 00:30:39,512 And if you're more interested in the software side of things and the cross 671 00:30:39,512 --> 00:30:44,860 comparison of submissions, if you go to github.com/cs50/compare50, you'll be 672 00:30:44,860 --> 00:30:47,830 able to play around with the open source software there as well. 673 00:30:47,830 --> 00:30:49,940 BRIAN YU: Certainly, if you have any feedback about today's podcast 674 00:30:49,940 --> 00:30:52,420 or suggestions for future podcast ideas, you can always 675 00:30:52,420 --> 00:30:55,210 reach us at cs50.harvard.edu. 676 00:30:55,210 --> 00:30:57,090 DAVID MALAN: This was CS50.