1 00:00:00,000 --> 00:00:03,140 2 00:00:03,140 --> 00:00:04,935 SPEAKER: This is CS50. 3 00:00:04,935 --> 00:00:08,337 4 00:00:08,337 --> 00:00:09,420 DAVID MALAN: Hello, world. 5 00:00:09,420 --> 00:00:11,090 This is the CS50 Podcast. 6 00:00:11,090 --> 00:00:14,342 My name is David Malan, and I'm here again with CS50's own Brian Yu. 7 00:00:14,342 --> 00:00:15,300 BRIAN YU: Hi, everyone. 8 00:00:15,300 --> 00:00:16,580 Good to be back again. 9 00:00:16,580 --> 00:00:20,990 So CS50 is taught primarily in C, which is the programming language that we teach 10 00:00:20,990 --> 00:00:22,490 in the course at the very beginning. 11 00:00:22,490 --> 00:00:24,365 And a lot of students and people that we talk 12 00:00:24,365 --> 00:00:27,800 to very often ask, why C, why is that the programming language 13 00:00:27,800 --> 00:00:31,132 that you've chosen for CS50 as the language to start out with. 14 00:00:31,132 --> 00:00:32,840 And I know that a couple of years ago you 15 00:00:32,840 --> 00:00:35,578 answered a Quora post that was also asking the same question. 16 00:00:35,578 --> 00:00:38,120 But I thought I'd take today as an opportunity on the podcast 17 00:00:38,120 --> 00:00:40,310 to ask about that, push you a little bit on it, 18 00:00:40,310 --> 00:00:43,280 and just discuss that, the pros and cons of the programming language 19 00:00:43,280 --> 00:00:44,510 choice for this intro class. 20 00:00:44,510 --> 00:00:48,420 DAVID MALAN: Sure, in fact allow me to encourage folks to Google Quora CS50 21 00:00:48,420 --> 00:00:51,855 Why C, because, I daresay, it's a very thoughtful response out there 22 00:00:51,855 --> 00:00:53,730 that actually took a couple hours to compose. 23 00:00:53,730 --> 00:00:56,270 So I [INAUDIBLE] every point that I wanted us to hit. 24 00:00:56,270 --> 00:00:57,620 But it is a fair question. 
25 00:00:57,620 --> 00:01:00,380 Honestly, this is one of those religious questions even that comes up all 26 00:01:00,380 --> 00:01:03,422 the time, especially when chatting with other teachers who teach computer 27 00:01:03,422 --> 00:01:07,790 science who might be using Python or Java or C++ or C or something else. 28 00:01:07,790 --> 00:01:11,300 It's kind of the go to language when you meet someone new in computer science 29 00:01:11,300 --> 00:01:12,410 to talk about languages. 30 00:01:12,410 --> 00:01:15,650 So here we are, another FAQ. 31 00:01:15,650 --> 00:01:18,500 So I think it's only fair to disclaim that some of it 32 00:01:18,500 --> 00:01:23,352 is probably my own upbringing when I took CS50 myself in 1996. 33 00:01:23,352 --> 00:01:26,060 Professor Brian Kernighan was teaching the class that year at Harvard. 34 00:01:26,060 --> 00:01:29,510 And the course was under him and his predecessors 35 00:01:29,510 --> 00:01:32,840 primarily in C. In fact, even more of the course was spent in C. 36 00:01:32,840 --> 00:01:36,030 And so when I took the class, that was what I cut my own teeth on, 37 00:01:36,030 --> 00:01:37,110 so to speak. 38 00:01:37,110 --> 00:01:43,130 But with that said, with the benefit of hindsight, I do think pedagogically 39 00:01:43,130 --> 00:01:46,640 I have a more thoughtful answer these days, having now taught CS50 itself 40 00:01:46,640 --> 00:01:50,030 and having deliberately chosen to stay with that language. 41 00:01:50,030 --> 00:01:53,780 I think that it just has so many pedagogical upsides 42 00:01:53,780 --> 00:01:56,660 despite some of its simultaneous pedagogical downsides. 43 00:01:56,660 --> 00:01:58,160 So it's a relatively small language. 44 00:01:58,160 --> 00:02:00,650 And even within a handful of weeks in CS50, 45 00:02:00,650 --> 00:02:03,170 can we expose students to most of the language. 
46 00:02:03,170 --> 00:02:06,050 We leave out just a few more advanced or more arcane 47 00:02:06,050 --> 00:02:08,930 topics, like unions or function pointers, 48 00:02:08,930 --> 00:02:11,420 and a couple of other syntactic features as well. 49 00:02:11,420 --> 00:02:14,240 But by the end of the semester have students seen most of what 50 00:02:14,240 --> 00:02:16,060 constitutes C. 51 00:02:16,060 --> 00:02:18,263 Two, it's about as close to the hardware as you 52 00:02:18,263 --> 00:02:20,930 can get before you have to creep into what's called the assembly 53 00:02:20,930 --> 00:02:23,480 language, which if you think C is scary, assembly language is 54 00:02:23,480 --> 00:02:27,578 way scarier than that just syntactically and just how much control you, 55 00:02:27,578 --> 00:02:30,620 the programmer, have to exercise over what's going on inside the machine. 56 00:02:30,620 --> 00:02:34,700 That's a little too low level, I think, for students' first exposure 57 00:02:34,700 --> 00:02:38,570 to programming, whether they're a major or nonmajor. 
58 00:02:38,570 --> 00:02:42,830 But I think it's also, at the tail end of CS50's time with C, 59 00:02:42,830 --> 00:02:47,060 it's just so wonderfully empowering to go on one day from-- or one week 60 00:02:47,060 --> 00:02:49,610 having implemented, say, your own hash table-- a particularly 61 00:02:49,610 --> 00:02:52,760 sophisticated data structure in C-- and probably having spent hours 62 00:02:52,760 --> 00:02:56,070 and hours getting that data structure to, not only be correct, 63 00:02:56,070 --> 00:03:00,290 but to be performant as well, using as efficiently as possible 64 00:03:00,290 --> 00:03:02,660 your computer's CPU cycles and memory and then 65 00:03:02,660 --> 00:03:05,360 the next week, literally-- at least in the way CS50 66 00:03:05,360 --> 00:03:09,440 is structured-- to go to Python, for instance, most recently 67 00:03:09,440 --> 00:03:11,210 or years ago in PHP, 68 00:03:11,210 --> 00:03:13,760 reimplementing the entirety of that data structure 69 00:03:13,760 --> 00:03:17,240 in just one line using a dictionary or dict in Python 70 00:03:17,240 --> 00:03:19,430 or using an associative array in PHP. 71 00:03:19,430 --> 00:03:22,570 And I think that really speaks to the takeaways of abstraction 72 00:03:22,570 --> 00:03:24,320 that we want students to have in a course. 73 00:03:24,320 --> 00:03:26,528 You understand the underlying implementation details. 74 00:03:26,528 --> 00:03:28,280 But once you do, you can stipulate. 75 00:03:28,280 --> 00:03:29,060 I got that. 76 00:03:29,060 --> 00:03:34,730 Now I'm going to operate at and write code in a higher level of abstraction. 77 00:03:34,730 --> 00:03:38,500 BRIAN YU: So you talk a lot about the pedagogical upsides of C. 78 00:03:38,500 --> 00:03:40,890 What are the pedagogical downsides on the other hand? 79 00:03:40,890 --> 00:03:43,930 DAVID MALAN: Yeah, pointers is probably the go to answer here. 80 00:03:43,930 --> 00:03:46,130 With great power comes great responsibility. 
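For a sense of how much pointer bookkeeping sits behind the hash table mentioned a moment ago, here is a minimal sketch in C of a chained hash table that stores words. It is not the course's actual problem set code: the names `table_insert`, `table_contains`, and `table_clear`, the bucket count, and the djb2-style hash are all assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 64 // arbitrary size picked for this sketch

// Each bucket is a singly linked list of nodes.
typedef struct node
{
    char *word;
    struct node *next;
} node;

static node *table[BUCKETS];

// djb2-style string hash, reduced to a bucket index.
static unsigned int hash(const char *word)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *) word; *p != '\0'; p++)
    {
        h = h * 33 + *p;
    }
    return (unsigned int) (h % BUCKETS);
}

// Returns true if word is already stored.
bool table_contains(const char *word)
{
    for (node *n = table[hash(word)]; n != NULL; n = n->next)
    {
        if (strcmp(n->word, word) == 0)
        {
            return true;
        }
    }
    return false;
}

// Stores a copy of word; returns false only if memory runs out.
bool table_insert(const char *word)
{
    if (table_contains(word))
    {
        return true;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL)
    {
        free(n);
        return false;
    }
    strcpy(n->word, word);

    // Prepend the new node to its bucket's list.
    unsigned int i = hash(word);
    n->next = table[i];
    table[i] = n;
    return true;
}

// Frees every node; forgetting this step is a memory leak.
void table_clear(void)
{
    for (int i = 0; i < BUCKETS; i++)
    {
        node *n = table[i];
        while (n != NULL)
        {
            node *next = n->next;
            free(n->word);
            free(n);
            n = next;
        }
        table[i] = NULL;
    }
}
```

Every allocation, copy, and free above is the programmer's job in C, which is the work a built-in dictionary type takes over in a higher level language.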
81 00:03:46,130 --> 00:03:51,490 And admittedly, we do have to delve or we choose to delve deeply into pointers 82 00:03:51,490 --> 00:03:54,460 and memory management more generally, the upsides of which-- 83 00:03:54,460 --> 00:03:55,210 let me stipulate-- 84 00:03:55,210 --> 00:03:57,910 I think are that we can have even more interesting conversations 85 00:03:57,910 --> 00:04:02,043 about security and the implications for management of one's own data 86 00:04:02,043 --> 00:04:04,960 that you just can't have as effectively in higher level languages that 87 00:04:04,960 --> 00:04:06,520 protect you from yourself. 88 00:04:06,520 --> 00:04:10,870 But the downside, pedagogically and just technologically, 89 00:04:10,870 --> 00:04:13,930 is that students really do trip over their code 90 00:04:13,930 --> 00:04:16,930 ever more so because in C, if unfamiliar, 91 00:04:16,930 --> 00:04:20,250 you have the ability to actually touch specific addresses in memory. 92 00:04:20,250 --> 00:04:23,050 So if your program has the illusion of, say, 2 gigabytes of space, 93 00:04:23,050 --> 00:04:24,850 you can touch every byte of memory there, 94 00:04:24,850 --> 00:04:26,980 for better or for worse, even if you're not supposed to. 95 00:04:26,980 --> 00:04:29,260 And the result of touching memory that you're not supposed to 96 00:04:29,260 --> 00:04:31,510 is often what's called a seg fault or segmentation 97 00:04:31,510 --> 00:04:34,900 fault or, more mechanically, just your program freezes or crashes 98 00:04:34,900 --> 00:04:37,090 or behaves unpredictably in some way. 99 00:04:37,090 --> 00:04:39,680 And that is super, super common. 100 00:04:39,680 --> 00:04:44,290 But I think that it gives you thereafter so much more 101 00:04:44,290 --> 00:04:47,650 of an appreciation for what a computer or a language or a runtime 102 00:04:47,650 --> 00:04:48,880 is doing for you. 103 00:04:48,880 --> 00:04:52,180 Java's references become more of an aha moment. 
104 00:04:52,180 --> 00:04:53,050 Oh, this is nice. 105 00:04:53,050 --> 00:04:56,980 I don't have to worry about this feature, this risk anymore. 106 00:04:56,980 --> 00:04:59,980 Python and other languages, Swift and Objective C, 107 00:04:59,980 --> 00:05:01,860 can do memory management for you. 108 00:05:01,860 --> 00:05:03,610 But you know what that means, and you know 109 00:05:03,610 --> 00:05:05,620 what's going on underneath the hood. 110 00:05:05,620 --> 00:05:08,950 And I just think that's a much better place to be in most any domain-- 111 00:05:08,950 --> 00:05:12,710 to understand the underlying plumbing at least at some point 112 00:05:12,710 --> 00:05:16,510 and then build on top of it even if you don't go into those weeds as much 113 00:05:16,510 --> 00:05:17,058 anymore. 114 00:05:17,058 --> 00:05:18,850 BRIAN YU: So just to play devil's advocate, 115 00:05:18,850 --> 00:05:21,310 a lot of what you do at the beginning of CS50 116 00:05:21,310 --> 00:05:25,400 is first creating these abstractions and then diving beneath them later. 117 00:05:25,400 --> 00:05:27,477 You start by introducing like that string, 118 00:05:27,477 --> 00:05:29,560 we're just going to call it a string, before later 119 00:05:29,560 --> 00:05:32,185 in the third or fourth week of the course revealing, OK, here's 120 00:05:32,185 --> 00:05:34,763 what's actually happening in an underlying sense. 
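To make the string abstraction concrete: the CS50 library defines `string` as an alias for `char *`, a pointer to the first character of a sequence that ends with a `'\0'` byte. The sketch below redeclares that alias locally so it compiles without the library; `count_chars` is a hypothetical helper written for this illustration, not a library function.

```c
#include <stddef.h>

// Same alias the CS50 library provides, redeclared here so the sketch
// compiles without cs50.h.
typedef char *string;

// Hypothetical helper: counts characters by walking memory until the
// terminating '\0' byte, which is all that marks the end of a C string.
size_t count_chars(string s)
{
    size_t n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

Nothing in the type records the length, so a string that lacks its terminating byte sends a loop like this one past the end of the allocation, which is exactly the kind of bug the early training wheels hide.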
121 00:05:34,763 --> 00:05:37,930 And you might imagine that you could take the same approach with programming 122 00:05:37,930 --> 00:05:41,320 languages, introducing a higher level programming language like Python 123 00:05:41,320 --> 00:05:44,132 first and saying, well, here's the easy way 124 00:05:44,132 --> 00:05:46,840 to do it without having to worry about all the underlying details 125 00:05:46,840 --> 00:05:50,230 and then later on saying, let's now take a look at a language like C 126 00:05:50,230 --> 00:05:52,543 and figure out what's actually happening in Python 127 00:05:52,543 --> 00:05:54,460 when you're trying to run something like this. 128 00:05:54,460 --> 00:05:57,550 So how do you make these decisions about why 129 00:05:57,550 --> 00:06:01,658 do C first and then transition to Python as opposed to the other way around? 130 00:06:01,658 --> 00:06:04,450 DAVID MALAN: So one I should note that it's not so much I, it's we. 131 00:06:04,450 --> 00:06:07,330 So you're just as guilty as I am for teaching it this way. 132 00:06:07,330 --> 00:06:07,630 BRIAN YU: This is true. 133 00:06:07,630 --> 00:06:10,755 DAVID MALAN: Two, that would have been an amazing moment in a Law and Order 134 00:06:10,755 --> 00:06:14,780 episode, where on cross you get me to contradict myself like this. 135 00:06:14,780 --> 00:06:16,985 And that's actually a really thoughtful challenge. 136 00:06:16,985 --> 00:06:20,110 And I don't think in all the chats I've personally ever had about languages 137 00:06:20,110 --> 00:06:22,448 that someone framed it quite like that, to be honest. 138 00:06:22,448 --> 00:06:24,490 And for those listening, and this is not planned. 139 00:06:24,490 --> 00:06:26,560 I did not see this question in advance. 
140 00:06:26,560 --> 00:06:30,160 So I think my pushback would be-- 141 00:06:30,160 --> 00:06:34,280 it's I think harder to justify, at least in the context of one course-- 142 00:06:34,280 --> 00:06:37,310 or it would be harder for me to justify in the context of one course-- 143 00:06:37,310 --> 00:06:40,090 going from a higher level language, like starting with Python, 144 00:06:40,090 --> 00:06:43,420 and then diving more deeply into C insofar 145 00:06:43,420 --> 00:06:47,230 as it probably doesn't really solve problems for students in the same way. 146 00:06:47,230 --> 00:06:51,025 Most of our students and most students, I daresay, 147 00:06:51,025 --> 00:06:52,900 who are studying to become software engineers 148 00:06:52,900 --> 00:06:55,192 or who just want to apply programming in other domains, 149 00:06:55,192 --> 00:06:56,980 they're not going to have to bother with C 150 00:06:56,980 --> 00:06:59,140 and its lower level optimizations and the performance 151 00:06:59,140 --> 00:07:00,250 that you can squeeze out of it. 152 00:07:00,250 --> 00:07:02,125 That's arguably, at least in the real world-- 153 00:07:02,125 --> 00:07:06,460 I daresay-- more of a specialized field or specialized use case for language 154 00:07:06,460 --> 00:07:10,700 where you really do want to be as close to the hardware as possible. 155 00:07:10,700 --> 00:07:14,770 And so I worry that that approach, that sequencing of things 156 00:07:14,770 --> 00:07:18,400 would invite more of a who cares or why are we doing this now. 
157 00:07:18,400 --> 00:07:23,950 Whereas, if we can start from the older, the lower level language 158 00:07:23,950 --> 00:07:26,740 and just give ourselves enough training wheels early 159 00:07:26,740 --> 00:07:29,530 on that we don't have to trip over some of the harder language 160 00:07:29,530 --> 00:07:33,190 features like pointers and what a string actually is underneath the hood, 161 00:07:33,190 --> 00:07:36,940 we can peel those off but still be in that original language 162 00:07:36,940 --> 00:07:39,500 and then step from that level to the next. 163 00:07:39,500 --> 00:07:44,830 So I do concede that there's a bit of built in contradiction here. 164 00:07:44,830 --> 00:07:47,110 But to me it feels like the nicer balance-- 165 00:07:47,110 --> 00:07:50,680 start with the smaller, the more primitive language, 166 00:07:50,680 --> 00:07:52,750 layer on some training wheels, take them off. 167 00:07:52,750 --> 00:07:54,910 And then after taking those training wheels off, 168 00:07:54,910 --> 00:07:58,090 motivate a fundamentally different approach, different language 169 00:07:58,090 --> 00:07:58,990 altogether. 170 00:07:58,990 --> 00:08:00,782 That narrative just makes more sense to me. 171 00:08:00,782 --> 00:08:03,198 BRIAN YU: Yeah, and I can definitely sympathize with that. 172 00:08:03,198 --> 00:08:06,040 Once you've motivated Python and you've started to experience that, 173 00:08:06,040 --> 00:08:09,760 it's very difficult to imagine more situations 174 00:08:09,760 --> 00:08:11,980 where you would want to use C, especially 175 00:08:11,980 --> 00:08:14,720 for students for which CS50 is their only programming class 176 00:08:14,720 --> 00:08:16,140 that they end up taking for it. 177 00:08:16,140 --> 00:08:17,110 DAVID MALAN: Well, and that, too, honestly. 
178 00:08:17,110 --> 00:08:19,210 I really do want students to emerge from CS50, 179 00:08:19,210 --> 00:08:22,780 whether they're taking it here on campus or just online at their own pace, 180 00:08:22,780 --> 00:08:27,472 really not just being a programmer or a software developer, 181 00:08:27,472 --> 00:08:30,430 but really thinking of themselves as a budding computer scientist, someone 182 00:08:30,430 --> 00:08:33,370 who can think thoughtfully and critically about these very same issues 183 00:08:33,370 --> 00:08:35,412 so that ideally some of our students can actually 184 00:08:35,412 --> 00:08:36,880 have this same religious debate. 185 00:08:36,880 --> 00:08:38,838 And maybe they'll fall on the other side of it. 186 00:08:38,838 --> 00:08:41,110 But at least they understand the primitives involved 187 00:08:41,110 --> 00:08:43,450 and the trade offs involved with which to come up 188 00:08:43,450 --> 00:08:45,670 with that opinion themselves. 189 00:08:45,670 --> 00:08:47,920 But your question about pedagogical downsides, too-- 190 00:08:47,920 --> 00:08:49,420 I mean, there are other side effects. 191 00:08:49,420 --> 00:08:51,753 I daresay-- and you know this better than anyone lately, 192 00:08:51,753 --> 00:08:55,540 as you try to develop new problem sets for CS50 this coming year-- 193 00:08:55,540 --> 00:09:01,920 that it's really hard to do cool things in C for some definition of cool. 194 00:09:01,920 --> 00:09:05,638 If we want to do more modern techniques that [? pertain ?] 
to software 195 00:09:05,638 --> 00:09:08,680 features that students might see on their laptops and desktops and phones 196 00:09:08,680 --> 00:09:12,460 these days, like image manipulation or sound manipulation or features that 197 00:09:12,460 --> 00:09:14,500 exist in a lot of today's most popular apps, 198 00:09:14,500 --> 00:09:19,870 it is arguably harder to do that in C because you don't have off the shelf 199 00:09:19,870 --> 00:09:23,230 libraries left and right that can solve these problems for you. 200 00:09:23,230 --> 00:09:27,760 And even if there are, their APIs, their usage 201 00:09:27,760 --> 00:09:29,980 might be a little more sophisticated than students 202 00:09:29,980 --> 00:09:33,670 are ready for after just one or two weeks of C. Case in point, 203 00:09:33,670 --> 00:09:37,240 I have long wanted us to be able to introduce file I/O earlier 204 00:09:37,240 --> 00:09:39,880 in the semester so that, in week one of the course, 205 00:09:39,880 --> 00:09:42,490 we could actually have students read data from a file. 206 00:09:42,490 --> 00:09:45,280 Now we could absolutely provide in CS50's own library 207 00:09:45,280 --> 00:09:47,440 one or more functions that do this. 208 00:09:47,440 --> 00:09:51,250 But as much as we've discussed internally how best to do this, 209 00:09:51,250 --> 00:09:56,590 it's not quite as simple as just having one function, like a get file or read 210 00:09:56,590 --> 00:09:59,740 file, that hands you back an array of all lines. 211 00:09:59,740 --> 00:10:02,440 Long story short, we start to trip over some limitations of C 212 00:10:02,440 --> 00:10:06,100 where we might need a second function to help navigate things to close 213 00:10:06,100 --> 00:10:08,800 the file, for instance, or we might need a specialized data type 214 00:10:08,800 --> 00:10:10,218 to represent the data from a file. 
215 00:10:10,218 --> 00:10:12,010 At which point it just feels to me like now 216 00:10:12,010 --> 00:10:14,110 we're putting on too many training wheels 217 00:10:14,110 --> 00:10:16,960 and we're trying to use the language for what it's not really 218 00:10:16,960 --> 00:10:19,990 best for, at least in an introductory programming context. 219 00:10:19,990 --> 00:10:21,100 So we pay that price. 220 00:10:21,100 --> 00:10:24,520 And there's a reason that CS50's early p sets are fairly textual 221 00:10:24,520 --> 00:10:28,460 and not nearly as cool I think as some of the mid or late p 222 00:10:28,460 --> 00:10:29,252 sets in the course. 223 00:10:29,252 --> 00:10:31,252 BRIAN YU: Yeah, I definitely see that challenge. 224 00:10:31,252 --> 00:10:33,670 And for listeners who might not be too familiar with how 225 00:10:33,670 --> 00:10:37,270 C differs from other programming languages with regards to files 226 00:10:37,270 --> 00:10:40,060 and file manipulation, different programming languages 227 00:10:40,060 --> 00:10:41,590 use different abstractions. 228 00:10:41,590 --> 00:10:45,400 So in a language like Python, for example, with one line of code, 229 00:10:45,400 --> 00:10:47,770 you can open up a file and read it so that you 230 00:10:47,770 --> 00:10:51,160 have the entire contents of the file that you can very easily manipulate 231 00:10:51,160 --> 00:10:54,700 and begin to work with, whereas in C there's multiple steps involved 232 00:10:54,700 --> 00:10:56,110 of first opening up the file. 233 00:10:56,110 --> 00:10:58,840 And you have to decide how much memory you're going to allocate. 234 00:10:58,840 --> 00:11:02,200 And you have to do a separate call to read the data from the file 235 00:11:02,200 --> 00:11:02,890 into memory. 236 00:11:02,890 --> 00:11:05,860 So it is a little bit trickier to be able to write 237 00:11:05,860 --> 00:11:09,373 problems that are easily accessible in C that deal with files for instance. 
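The steps just described (open the file, decide how much memory to allocate, then make a separate call to read) might look roughly like the sketch below. `read_whole_file` is a hypothetical helper, not part of the CS50 library. It also assumes that seeking to the end of a file opened in binary mode reports the file's size, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: reads an entire file into a heap buffer.
// Returns NULL on failure; the caller must free() the result.
char *read_whole_file(const char *path, long *length)
{
    // Step 1: open the file.
    FILE *file = fopen(path, "rb");
    if (file == NULL)
    {
        return NULL;
    }

    // Step 2: decide how much memory to allocate by measuring the file.
    if (fseek(file, 0, SEEK_END) != 0)
    {
        fclose(file);
        return NULL;
    }
    long size = ftell(file);
    if (size < 0)
    {
        fclose(file);
        return NULL;
    }
    rewind(file);

    // One extra byte for the terminating '\0'.
    char *buffer = malloc((size_t) size + 1);
    if (buffer == NULL)
    {
        fclose(file);
        return NULL;
    }

    // Step 3: a separate call copies the data from the file into memory.
    size_t got = fread(buffer, 1, (size_t) size, file);
    fclose(file);
    if (got != (size_t) size)
    {
        free(buffer);
        return NULL;
    }

    buffer[size] = '\0';
    *length = size;
    return buffer;
}
```

Even this small helper leaves the caller responsible for checking for `NULL` and calling `free`, which is part of why a single beginner-friendly function is hard to offer in C.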
238 00:11:09,373 --> 00:11:11,290 DAVID MALAN: Yeah, and honestly, another issue 239 00:11:11,290 --> 00:11:15,180 we've been wrestling with this summer is portability of code. 240 00:11:15,180 --> 00:11:18,430 Well, we have always in CS50 used a Linux or a Unix environment, even 241 00:11:18,430 --> 00:11:19,390 before my time. 242 00:11:19,390 --> 00:11:22,720 And that's the architecture on which we've standardized for students. 243 00:11:22,720 --> 00:11:25,600 However, in recent years, the past 10 plus years 244 00:11:25,600 --> 00:11:30,010 have we focused on making CS50's materials and software as portable as 245 00:11:30,010 --> 00:11:34,480 possible, all of our open source tools, including CS50's own library for C. 246 00:11:34,480 --> 00:11:39,250 But unfortunately with C do you run into some architecture-specific details 247 00:11:39,250 --> 00:11:41,230 like the size of various data types. 248 00:11:41,230 --> 00:11:44,800 It varies by machine sometimes what an int is 249 00:11:44,800 --> 00:11:49,488 or how many bytes or bits it takes up, a double or float or a long long. 250 00:11:49,488 --> 00:11:51,280 And so case in point, most recently have we 251 00:11:51,280 --> 00:11:53,770 been wanting to deprecate-- that is, kill off-- 252 00:11:53,770 --> 00:11:55,900 a function that for many years in CS50's library 253 00:11:55,900 --> 00:11:59,200 was called get long long, because in recent years 254 00:11:59,200 --> 00:12:01,570 we've migrated our own cloud architecture 255 00:12:01,570 --> 00:12:04,060 and students' laptops have essentially evolved into being 256 00:12:04,060 --> 00:12:07,300 64-bit machines instead of 32-bit machines, 257 00:12:07,300 --> 00:12:09,710 which has had the side effect, if you will, 258 00:12:09,710 --> 00:12:14,020 of making longs twice as long as an int. 259 00:12:14,020 --> 00:12:18,920 So instead of 32 bits, they are 64 bits on most platforms like Linux 260 00:12:18,920 --> 00:12:21,130 and Mac OS, but not Windows. 
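One way to see the variation being described is to ask the compiler directly. The sketch below reports the width in bits of the types mentioned; `bits_in_long` and `print_widths` are hypothetical names. The last line prints `int64_t` from `<stdint.h>`, a fixed-width type that is one common way around the problem. That suggestion is an addition here, not something raised in the conversation.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

// Width of a long on the machine that compiled this code.
size_t bits_in_long(void)
{
    return sizeof(long) * CHAR_BIT;
}

// Prints the width of each type. No expected output is shown because
// the values are exactly what differs between platforms.
void print_widths(void)
{
    printf("int:       %zu bits\n", sizeof(int) * CHAR_BIT);
    printf("long:      %zu bits\n", bits_in_long());
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
    printf("float:     %zu bits\n", sizeof(float) * CHAR_BIT);
    printf("double:    %zu bits\n", sizeof(double) * CHAR_BIT);

    // int64_t is 64 bits wherever it is available.
    printf("int64_t:   %zu bits\n", sizeof(int64_t) * CHAR_BIT);
}
```

The C standard only promises minimum widths (at least 32 bits for `long`, at least 64 for `long long`), which is why the same source code can behave differently from one machine to the next.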
261 00:12:21,130 --> 00:12:25,060 On Windows, a long is still 4 bytes, or 32 bits, which 262 00:12:25,060 --> 00:12:28,570 means if you were to work on CS50's problem sets or programming assignments 263 00:12:28,570 --> 00:12:34,420 on your own Windows PC and you are using the CS50 library and you use a long, 264 00:12:34,420 --> 00:12:39,242 you might actually only be getting 4 bytes of storage, not 8, which 265 00:12:39,242 --> 00:12:41,200 means you could actually have errors on Windows 266 00:12:41,200 --> 00:12:44,660 that you wouldn't with the exact same code on Mac OS or Linux. 267 00:12:44,660 --> 00:12:47,560 So even those kinds of frustrations come with C. 268 00:12:47,560 --> 00:12:50,650 But however, it's a less common case I think for students 269 00:12:50,650 --> 00:12:53,320 to be working on CS50's p sets on their own Windows computers 270 00:12:53,320 --> 00:13:00,520 and not using our environment like CS50 sandbox or IDE or cli50 these days. 271 00:13:00,520 --> 00:13:03,493 BRIAN YU: So I also wanted to ask about CS50's problem sets 272 00:13:03,493 --> 00:13:05,410 and whether C is the right language for these, 273 00:13:05,410 --> 00:13:08,600 whether you could imagine doing these sorts of problems in other languages. 274 00:13:08,600 --> 00:13:11,410 So Game of 15, for example, which is the problem that we 275 00:13:11,410 --> 00:13:15,220 assign in the third week of the class approximately. 276 00:13:15,220 --> 00:13:18,340 And we ask students to implement one of those 15 puzzles 277 00:13:18,340 --> 00:13:20,040 with the sliding tiles. 278 00:13:20,040 --> 00:13:22,867 If you were to implement that outside of the context of the course, 279 00:13:22,867 --> 00:13:24,450 is C the right language choice for it? 280 00:13:24,450 --> 00:13:26,170 Or would you use some other language for that? 
281 00:13:26,170 --> 00:13:27,430 DAVID MALAN: So I think I should first stipulate 282 00:13:27,430 --> 00:13:29,980 that C is rarely the right language for any of the problems 283 00:13:29,980 --> 00:13:32,980 that we might do in problem sets because the reality is you can probably 284 00:13:32,980 --> 00:13:36,750 implement most of CS50's problem sets more easily, more quickly, 285 00:13:36,750 --> 00:13:41,900 more pleasantly in languages besides C. With that said, Game of 15 286 00:13:41,900 --> 00:13:45,500 essentially boils down to a two dimensional array 287 00:13:45,500 --> 00:13:49,230 of numbers, the goal being to arrange the numbers from top to bottom, left 288 00:13:49,230 --> 00:13:52,460 to right, from 1 to 15, leaving just one blank space in the bottom right hand 289 00:13:52,460 --> 00:13:53,110 corner. 290 00:13:53,110 --> 00:13:54,860 And honestly, that is a problem that could 291 00:13:54,860 --> 00:13:56,892 be solved pretty readily in almost any language 292 00:13:56,892 --> 00:13:59,600 because you're pretty much just navigating a two dimensional data 293 00:13:59,600 --> 00:14:02,750 structure and looking up, down, left, right to figure out 294 00:14:02,750 --> 00:14:04,940 if a move is valid for instance. 295 00:14:04,940 --> 00:14:08,773 I think the GUI, which is a very simplistic GUI that we implement right 296 00:14:08,773 --> 00:14:10,940 now, could probably be done a little more pleasantly 297 00:14:10,940 --> 00:14:14,030 in other languages, especially ones that lend themselves to graphics. 298 00:14:14,030 --> 00:14:16,760 So we are not even using graphics in Game of 15. 299 00:14:16,760 --> 00:14:19,940 We are just using ANSI control characters so as 300 00:14:19,940 --> 00:14:24,290 to clear the screen instantly and create the illusion of animation. 
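The move check described above, looking up, down, left, and right in a two dimensional array, can be sketched as follows. This is not the problem set's distribution code: `move_tile`, the `DIM` constant, and the choice of 0 to represent the blank are assumptions made for illustration.

```c
#include <stdbool.h>

#define DIM 4 // 4x4 board for the classic 15 puzzle

// Slides `tile` into the blank space (stored as 0) if the two are
// adjacent. Returns false and leaves the board unchanged otherwise.
bool move_tile(int board[DIM][DIM], int tile)
{
    // Row and column offsets for up, down, left, right.
    const int d_row[] = {-1, 1, 0, 0};
    const int d_col[] = {0, 0, -1, 1};

    // The blank itself is not a movable tile.
    if (tile <= 0)
    {
        return false;
    }

    for (int row = 0; row < DIM; row++)
    {
        for (int col = 0; col < DIM; col++)
        {
            if (board[row][col] != tile)
            {
                continue;
            }

            // Found the tile: check its four neighbors for the blank.
            for (int k = 0; k < 4; k++)
            {
                int r = row + d_row[k];
                int c = col + d_col[k];
                bool in_bounds = r >= 0 && r < DIM && c >= 0 && c < DIM;
                if (in_bounds && board[r][c] == 0)
                {
                    board[r][c] = tile;
                    board[row][col] = 0;
                    return true;
                }
            }
            return false;
        }
    }

    // The tile is not on the board at all.
    return false;
}
```

The bounds check matters more in C than in most languages, because reading `board[-1][0]` would not raise an error; it would simply touch whatever memory happens to be there.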
301 00:14:24,290 --> 00:14:28,340 And that's about as good as you can get with C until you use a proper library, 302 00:14:28,340 --> 00:14:30,770 like the ncurses library, which I'll add as an aside, 303 00:14:30,770 --> 00:14:34,550 we used to use years ago for fancier game interfaces. 304 00:14:34,550 --> 00:14:37,250 But I think Game of 15 is perfectly fine in C. 305 00:14:37,250 --> 00:14:42,680 But you could have a better UI more easily using other languages. 306 00:14:42,680 --> 00:14:45,950 BRIAN YU: And what about some of CS50's other problem sets? 307 00:14:45,950 --> 00:14:49,700 So one that comes after Game of 15 is usually resize, 308 00:14:49,700 --> 00:14:51,980 which is an image manipulation problem where 309 00:14:51,980 --> 00:14:55,590 students are asked to take a bitmap image and scale it up, make it bigger, 310 00:14:55,590 --> 00:14:57,250 or make it smaller, for example? 311 00:14:57,250 --> 00:15:00,050 DAVID MALAN: Yes, so with anything involving images, 312 00:15:00,050 --> 00:15:02,090 most any other language is going to be easier 313 00:15:02,090 --> 00:15:04,220 because odds are you can import some library 314 00:15:04,220 --> 00:15:08,333 and do what we are doing with resize, resizing images, with one line of code. 315 00:15:08,333 --> 00:15:11,000 So yes, if I were in the real world to be implementing a program 316 00:15:11,000 --> 00:15:14,270 to resize images that I couldn't just download myself, 317 00:15:14,270 --> 00:15:16,610 like ImageMagick, which is a free open source tool 318 00:15:16,610 --> 00:15:20,670 chain for image manipulation, I would probably just use Python for that. 319 00:15:20,670 --> 00:15:24,020 Though, I gather Java is pretty good at image manipulation as well. 320 00:15:24,020 --> 00:15:27,648 But most languages have libraries built in that would make that easier. 321 00:15:27,648 --> 00:15:29,690 But I would probably reach for Python these days. 
322 00:15:29,690 --> 00:15:31,070 BRIAN YU: Actually, that reminds me of another question 323 00:15:31,070 --> 00:15:32,320 that I was meaning to ask you. 324 00:15:32,320 --> 00:15:35,150 So you mentioned that C is the programming language that 325 00:15:35,150 --> 00:15:37,550 is taught in CS50 but was also taught in CS50 326 00:15:37,550 --> 00:15:40,252 before you started teaching CS50, whereas Python, that's 327 00:15:40,252 --> 00:15:41,210 a little bit different. 328 00:15:41,210 --> 00:15:43,002 Because a couple of years ago the class was 329 00:15:43,002 --> 00:15:45,590 taught using PHP in the latter half of the course. 330 00:15:45,590 --> 00:15:49,240 And recently, you transitioned the class to being taught in Python instead. 331 00:15:49,240 --> 00:15:50,990 I was curious about that decision process. 332 00:15:50,990 --> 00:15:52,270 Was Python the obvious choice? 333 00:15:52,270 --> 00:15:54,270 Were there other languages you were considering? 334 00:15:54,270 --> 00:15:56,720 Or what was the thought process that went into figuring that out? 335 00:15:56,720 --> 00:15:58,595 DAVID MALAN: Oh, also a question that I think 336 00:15:58,595 --> 00:16:00,200 we answered on Quora some time ago. 337 00:16:00,200 --> 00:16:01,310 This too is an FAQ. 338 00:16:01,310 --> 00:16:04,250 So for many years, we did use PHP. 339 00:16:04,250 --> 00:16:08,120 And that was the change we made early on in 2007 when I took over the course, 340 00:16:08,120 --> 00:16:11,960 whereas in recent years prior the course had been introducing students 341 00:16:11,960 --> 00:16:13,280 to a bit of Ruby-- 342 00:16:13,280 --> 00:16:17,467 I believe not for web programming, but just as a follow on language to C. 343 00:16:17,467 --> 00:16:18,800 At the time, I didn't know Ruby. 344 00:16:18,800 --> 00:16:19,815 I did know PHP. 345 00:16:19,815 --> 00:16:22,190 PHP was certainly very much in vogue for web programming. 
346 00:16:22,190 --> 00:16:24,740 And that was my pedagogical goal, to introduce students 347 00:16:24,740 --> 00:16:28,350 at the tail end of the semester to web applications, 348 00:16:28,350 --> 00:16:31,970 including some SQL and JavaScript and HTML and CSS, all of which 349 00:16:31,970 --> 00:16:36,620 of course were popular by then as well. 350 00:16:36,620 --> 00:16:38,810 And what I liked about PHP at the time was frankly 351 00:16:38,810 --> 00:16:40,610 its documentation was just stellar. 352 00:16:40,610 --> 00:16:43,127 Every page is standardized on PHP's website. 353 00:16:43,127 --> 00:16:45,710 It gives you really good examples that allow you to dive right 354 00:16:45,710 --> 00:16:47,437 in and use functions quickly. 355 00:16:47,437 --> 00:16:48,770 And it was just very accessible. 356 00:16:48,770 --> 00:16:52,310 And syntactically PHP is just so similar to C, at least 357 00:16:52,310 --> 00:16:53,990 as we present it to students. 358 00:16:53,990 --> 00:16:56,510 Pretty much the only salient difference is like dollar signs 359 00:16:56,510 --> 00:16:58,250 need to be prefixed to variable names. 360 00:16:58,250 --> 00:17:00,680 But for the most part, almost every other construct 361 00:17:00,680 --> 00:17:03,200 is the same as in C. That was a really nice stepping stone, 362 00:17:03,200 --> 00:17:04,970 to go from C to PHP. 363 00:17:04,970 --> 00:17:13,220 But the time came by mid 2010s that PHP was definitely falling out of favor. 364 00:17:13,220 --> 00:17:14,720 Python was becoming more popular. 365 00:17:14,720 --> 00:17:18,869 Ruby was definitely more popular, or at least was gaining more steam. 366 00:17:18,869 --> 00:17:20,780 So the first derivative of these languages 367 00:17:20,780 --> 00:17:23,317 was probably higher than PHP was at the time. 
368 00:17:23,317 --> 00:17:25,609 And so at some point, it just felt like the right thing 369 00:17:25,609 --> 00:17:29,622 to do in terms of just trends in so far as half of CS50 students 370 00:17:29,622 --> 00:17:31,580 are never going to take, we know statistically, 371 00:17:31,580 --> 00:17:33,050 another computer science course before. 372 00:17:33,050 --> 00:17:35,092 But they are going to go off to some other domain 373 00:17:35,092 --> 00:17:39,112 and apply programming principles in whatever areas of interest to them. 374 00:17:39,112 --> 00:17:42,320 We wanted them to have something that was a little more familiar and trending 375 00:17:42,320 --> 00:17:43,160 at the time. 376 00:17:43,160 --> 00:17:46,287 However, pedagogically that argument could 377 00:17:46,287 --> 00:17:47,870 have been made even a few years prior. 378 00:17:47,870 --> 00:17:51,517 What really put me over the edge was just changes in industry paradigms 379 00:17:51,517 --> 00:17:52,850 when it came to web programming. 380 00:17:52,850 --> 00:17:57,020 So it was becoming increasingly common not to take PHP's approach 381 00:17:57,020 --> 00:17:59,990 of having every file serve as its own controller, 382 00:17:59,990 --> 00:18:04,510 so to speak-- so index dot PHP and foo dot PHP and bar dot PHP and baz dot 383 00:18:04,510 --> 00:18:07,850 PHP, but rather to start routing all incoming HTTP 384 00:18:07,850 --> 00:18:10,940 requests through an incoming controller of sorts. 385 00:18:10,940 --> 00:18:13,700 Maybe it's all requests go through index dot PHP 386 00:18:13,700 --> 00:18:17,630 and you somehow implement within that and more files your actual routes. 
387 00:18:17,630 --> 00:18:20,810 And it got to the point where in CS50's problem set seven 388 00:18:20,810 --> 00:18:24,680 at the time, CS50 [INAUDIBLE], we were going to provide students 389 00:18:24,680 --> 00:18:30,050 finally with a routing framework so that we mimic this industry standard 390 00:18:30,050 --> 00:18:34,070 paradigm where all of your requests are coming in through a central controller. 391 00:18:34,070 --> 00:18:39,290 And at that point, we were so adding on to PHP 392 00:18:39,290 --> 00:18:43,220 things that just existed, not only in PHP frameworks, 393 00:18:43,220 --> 00:18:46,700 but in other languages' frameworks, like Flask for Python, 394 00:18:46,700 --> 00:18:50,760 that it felt like we were no longer using PHP for what it was 395 00:18:50,760 --> 00:18:54,890 so wonderfully pedagogically helpful for, which was early on the one 396 00:18:54,890 --> 00:18:59,450 to one mapping between text files, something dot PHP, and routes 397 00:18:59,450 --> 00:19:00,300 in a web page. 398 00:19:00,300 --> 00:19:02,630 It was very nice and simple that if a student wants 399 00:19:02,630 --> 00:19:07,090 to implement a foo route, they implement foo dot PHP, and a bar route, 400 00:19:07,090 --> 00:19:08,300 bar dot PHP. 401 00:19:08,300 --> 00:19:11,630 But the reality is that is done more centrally in many frameworks 402 00:19:11,630 --> 00:19:12,510 these days. 403 00:19:12,510 --> 00:19:14,750 And so that is what finally put me over the edge. 404 00:19:14,750 --> 00:19:19,130 If we were going to so abstract away some of the underlying design 405 00:19:19,130 --> 00:19:21,950 features of PHP with our own mini framework, 406 00:19:21,950 --> 00:19:25,310 we might as well just adopt altogether a framework. 407 00:19:25,310 --> 00:19:28,150 And at that point, that opened the gates to any number of languages 408 00:19:28,150 --> 00:19:30,447 and Python won out for us.
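To make the contrast between one file per route and a central controller concrete, here is a minimal sketch in Python. It is not CS50's mini framework and not Flask's implementation; the names ROUTES, route, and dispatch are hypothetical, chosen only to illustrate how a single routing table can stand in for separate foo and bar files.

```python
# Hypothetical front controller: every request path goes through dispatch(),
# which looks up a handler in one central routing table.
ROUTES = {}


def route(path):
    """Register a handler function for a path (illustrative decorator)."""
    def register(handler):
        ROUTES[path] = handler
        return handler
    return register


@route("/foo")
def foo():
    # Plays the role that a standalone foo file would have played.
    return "foo page"


@route("/bar")
def bar():
    return "bar page"


def dispatch(path):
    """Central controller: map an incoming path to its handler."""
    handler = ROUTES.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()
```

Calling dispatch("/foo") runs the foo handler, and an unregistered path falls through to the 404 string. Frameworks such as Flask expose a similar decorator-based idea, though their internals handle far more than this sketch does.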
409 00:19:30,447 --> 00:19:33,530 BRIAN YU: Yeah, Python definitely seems to have gained a lot of popularity 410 00:19:33,530 --> 00:19:34,030 recently. 411 00:19:34,030 --> 00:19:37,130 I was just looking at last year's GitHub Octoverse 412 00:19:37,130 --> 00:19:40,490 report, which is the report where they take all of the GitHub repositories 413 00:19:40,490 --> 00:19:44,810 that are online and do some analysis and do some looking at the data there. 414 00:19:44,810 --> 00:19:49,910 And it looks like, since 2014, Python went from the fourth place most common 415 00:19:49,910 --> 00:19:53,690 programming language for repositories on GitHub to the third place 416 00:19:53,690 --> 00:19:55,880 now, just behind JavaScript and Java. 417 00:19:55,880 --> 00:19:58,010 Although I guess on the flip side, since 2014, 418 00:19:58,010 --> 00:20:02,480 C seems to have gone down from seventh place down to ninth place. 419 00:20:02,480 --> 00:20:05,870 So I guess as a follow up question to bring things back to why C, 420 00:20:05,870 --> 00:20:10,190 it looks like we eventually made the transition to Python because of changes 421 00:20:10,190 --> 00:20:13,670 in industry trends and because Python was getting more popular. 422 00:20:13,670 --> 00:20:16,410 In the longer term, maybe years down the line, 423 00:20:16,410 --> 00:20:19,865 do you see the course eventually moving away from C as a programming language? 424 00:20:19,865 --> 00:20:20,990 DAVID MALAN: Good question. 425 00:20:20,990 --> 00:20:21,750 Maybe. 426 00:20:21,750 --> 00:20:23,500 I mean, at some point, it's probably going 427 00:20:23,500 --> 00:20:25,997 to be a little clinging to the past. 428 00:20:25,997 --> 00:20:27,830 I don't know where that inflection point is. 429 00:20:27,830 --> 00:20:29,630 It doesn't feel like it's on the horizon. 430 00:20:29,630 --> 00:20:32,010 I mean, C is still quite popular. 431 00:20:32,010 --> 00:20:33,860 It's been with us for decades now.
432 00:20:33,860 --> 00:20:36,860 It's the foundation in some sense of many other languages 433 00:20:36,860 --> 00:20:39,410 either in terms of its syntactic inspiration or just 434 00:20:39,410 --> 00:20:40,610 the underlying software. 435 00:20:40,610 --> 00:20:44,030 Like the interpreters and the compilers we use are with high probability 436 00:20:44,030 --> 00:20:48,120 written in C, maybe C++ for performance reasons and so forth. 437 00:20:48,120 --> 00:20:52,372 So it's really not going away, I think, anytime soon, whereas in industry, 438 00:20:52,372 --> 00:20:54,080 when it comes to web programming-- again, 439 00:20:54,080 --> 00:20:56,705 the domain that we like to explore at the end of the semester-- 440 00:20:56,705 --> 00:20:59,570 I think there's a more rapid rate of change 441 00:20:59,570 --> 00:21:03,110 when it comes to languages being in or out of vogue or frameworks certainly 442 00:21:03,110 --> 00:21:04,820 that are in or out of vogue. 443 00:21:04,820 --> 00:21:08,090 And I should mention, too, that with Python the other characteristic that 444 00:21:08,090 --> 00:21:12,110 put me over the edge mentally when it came to making the final call 445 00:21:12,110 --> 00:21:16,220 was that Python is just wonderfully well rounded, too. 446 00:21:16,220 --> 00:21:19,280 It can be used and is used quite popularly for web programming 447 00:21:19,280 --> 00:21:21,620 as with Flask or Django or other frameworks. 448 00:21:21,620 --> 00:21:24,050 But it's also very commonly used for scripting, 449 00:21:24,050 --> 00:21:26,540 for smaller applications, for command line programs, 450 00:21:26,540 --> 00:21:29,250 for data science applications, data analysis. 451 00:21:29,250 --> 00:21:31,310 So there's any number of different use cases. 452 00:21:31,310 --> 00:21:35,390 And PHP was, to its credit, designed for the web. 
453 00:21:35,390 --> 00:21:38,778 But it wasn't really intended to be for data science, 454 00:21:38,778 --> 00:21:40,570 for command line applications and so forth. 455 00:21:40,570 --> 00:21:41,630 You can use it for that. 456 00:21:41,630 --> 00:21:43,382 And indeed, we did for many years. 457 00:21:43,382 --> 00:21:45,590 But it just wasn't really designed with that in mind, 458 00:21:45,590 --> 00:21:50,210 whereas Python is generally pretty commonly found 459 00:21:50,210 --> 00:21:51,410 in both of those domains. 460 00:21:51,410 --> 00:21:54,860 So it just felt like the more versatile tool to have in your toolkit, 461 00:21:54,860 --> 00:21:55,618 so to speak. 462 00:21:55,618 --> 00:21:58,160 BRIAN YU: So you've talked about some of the qualms with PHP. 463 00:21:58,160 --> 00:22:01,540 And you mentioned with C, it's difficult to teach file I/O early on because it 464 00:22:01,540 --> 00:22:02,630 is a little complicated. 465 00:22:02,630 --> 00:22:03,380 What about Python? 466 00:22:03,380 --> 00:22:06,290 Any qualms you have with Python as a programming language or things you wish 467 00:22:06,290 --> 00:22:06,915 were different? 468 00:22:06,915 --> 00:22:08,060 DAVID MALAN: Yes! 469 00:22:08,060 --> 00:22:13,430 I think its documentation is not good, certainly 470 00:22:13,430 --> 00:22:15,500 in comparison with something like PHP. 471 00:22:15,500 --> 00:22:17,510 I think Python's documentation really does 472 00:22:17,510 --> 00:22:21,140 assume a more comfortable audience, which I think is to a fault. 473 00:22:21,140 --> 00:22:23,510 I think that the language itself would benefit 474 00:22:23,510 --> 00:22:27,560 from just much more user friendly documentation 475 00:22:27,560 --> 00:22:32,330 and frankly more thorough documentation and more standardization of formatting 476 00:22:32,330 --> 00:22:32,990 thereof. 
477 00:22:32,990 --> 00:22:35,360 Right now a lot of the Python libraries essentially 478 00:22:35,360 --> 00:22:37,620 have a paragraph per function. 479 00:22:37,620 --> 00:22:40,130 But it doesn't sometimes make super clear, especially 480 00:22:40,130 --> 00:22:43,852 to a less comfortable student, exactly what all of the arguments are 481 00:22:43,852 --> 00:22:46,310 and what their definitions are, what the return types are, 482 00:22:46,310 --> 00:22:49,460 what exceptions might be raised from a certain function, 483 00:22:49,460 --> 00:22:53,690 whereas PHP and Java and a few other languages' ecosystems 484 00:22:53,690 --> 00:22:56,360 have done I think a much better job at standardizing 485 00:22:56,360 --> 00:22:58,280 how the documentation is presented. 486 00:22:58,280 --> 00:23:02,390 It's perhaps a bit more verbose with [INAUDIBLE] lists or the formatting 487 00:23:02,390 --> 00:23:04,070 with which the information's presented. 488 00:23:04,070 --> 00:23:06,650 But for newbies, I think that's especially important and just 489 00:23:06,650 --> 00:23:10,730 pragmatically helpful to be able to skim documentation in some uniform fashion 490 00:23:10,730 --> 00:23:13,060 and not as in Python, as is too often the case, 491 00:23:13,060 --> 00:23:15,230 cross reference one function with another 492 00:23:15,230 --> 00:23:18,290 because it happens to have the same signature or the same arguments 493 00:23:18,290 --> 00:23:19,490 as this other function. 494 00:23:19,490 --> 00:23:22,280 I'd much rather the documentation just duplicate that information 495 00:23:22,280 --> 00:23:24,680 and maybe put it in one canonical place on the server 496 00:23:24,680 --> 00:23:26,960 but have it imported in multiple places. 497 00:23:26,960 --> 00:23:31,110 I think the documentation could be much, much improved.
498 00:23:31,110 --> 00:23:32,870 BRIAN YU: And speaking of documentation, I 499 00:23:32,870 --> 00:23:36,110 know that the canonical reference for C documentation 500 00:23:36,110 --> 00:23:39,410 for functions in the libraries are the man pages 501 00:23:39,410 --> 00:23:43,170 that describe, as a manual, what the function does and how it works. 502 00:23:43,170 --> 00:23:45,485 But honestly, I found those to be somewhat arcane, too, 503 00:23:45,485 --> 00:23:47,360 especially for students that are just getting 504 00:23:47,360 --> 00:23:49,370 started with C as their first introduction 505 00:23:49,370 --> 00:23:52,573 to programming and understanding what the functions do and how they work. 506 00:23:52,573 --> 00:23:54,740 They're not the easiest things in the world to read. 507 00:23:54,740 --> 00:23:56,390 DAVID MALAN: Oh, yes, those are perhaps arguably worse. 508 00:23:56,390 --> 00:23:58,310 So you asked me only about Python. 509 00:23:58,310 --> 00:24:00,282 You didn't ask me to impugn C as well. 510 00:24:00,282 --> 00:24:02,990 But yes, and that's actually been an issue for us for many years. 511 00:24:02,990 --> 00:24:07,820 And we for many years tried to assuage this concern with our own website, 512 00:24:07,820 --> 00:24:09,500 which was called CS50 reference. 513 00:24:09,500 --> 00:24:11,810 And what we essentially did one summer some years 514 00:24:11,810 --> 00:24:15,260 ago was a number of our teaching fellows, or TFs, 515 00:24:15,260 --> 00:24:19,610 kindly sat down and wrote more user or beginner friendly versions 516 00:24:19,610 --> 00:24:21,020 of various man pages. 517 00:24:21,020 --> 00:24:23,390 And we encourage students to use this web based tool 518 00:24:23,390 --> 00:24:25,370 to look up information in man pages.
519 00:24:25,370 --> 00:24:28,790 Just this summer, though, have we switched to a new tool, which 520 00:24:28,790 --> 00:24:32,450 is CS50's own adaptation of man pages that you 521 00:24:32,450 --> 00:24:36,980 would find on a typical Linux or Mac OS system at man dot CS50 dot io. 522 00:24:36,980 --> 00:24:40,790 So these are those raw, arguably arcane man pages. 523 00:24:40,790 --> 00:24:44,780 But what we've begun to do is, using some fancy regular expressions 524 00:24:44,780 --> 00:24:50,960 or pattern matching, have we begun to overlay little hints on top 525 00:24:50,960 --> 00:24:53,810 of key words or phrases that try to explain, 526 00:24:53,810 --> 00:24:56,720 similar in spirit to help 50, another of CS50's tools, 527 00:24:56,720 --> 00:25:00,890 what it is some piece of language or some piece of English 528 00:25:00,890 --> 00:25:02,630 actually means in those man pages. 529 00:25:02,630 --> 00:25:04,100 And it remains to be seen if this is better. 530 00:25:04,100 --> 00:25:05,975 But this is more consistent with what we have 531 00:25:05,975 --> 00:25:09,590 done most recently with help 50, which is we do not hide arcane error 532 00:25:09,590 --> 00:25:10,710 messages from students. 533 00:25:10,710 --> 00:25:13,230 We want them to see the raw output for better or for worse 534 00:25:13,230 --> 00:25:15,980 that they would see in the real world with any number of the tools 535 00:25:15,980 --> 00:25:16,730 that we use. 536 00:25:16,730 --> 00:25:21,570 But we do want to lower the bar to them understanding those error messages. 537 00:25:21,570 --> 00:25:25,610 So as in the case of help 50, a command line tool that parses programs' output 538 00:25:25,610 --> 00:25:30,170 and then translates it to more human friendly explanations thereof, 539 00:25:30,170 --> 00:25:34,280 man dot CS50 dot io does the same thing by showing the student first and foremost 540 00:25:34,280 --> 00:25:36,680 the actual if arcane man page.
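The idea of matching patterns in a tool's raw output and attaching a friendlier explanation can be sketched in a few lines of Python. This is only an illustration of the technique, not help50's actual code: the HINTS table, the explain function, and the hint wording are all assumptions made up for this example.

```python
import re

# Hypothetical table of (pattern, friendly hint) pairs. The patterns target
# the general shape of common compiler messages; the hints are invented here.
HINTS = [
    (re.compile(r"use of undeclared identifier '(\w+)'"),
     "It looks like you're using '{0}' before declaring it."),
    (re.compile(r"expected ';'"),
     "You may be missing a semicolon at the end of a line."),
]


def explain(output):
    """Return a friendly hint for the first matching pattern, else None."""
    for pattern, hint in HINTS:
        match = pattern.search(output)
        if match:
            # Fill captured groups (such as the identifier name) into the hint.
            return hint.format(*match.groups())
    return None
```

The raw message is never hidden in this design: a caller would print the original output first and the hint afterward, which matches the stated goal of lowering the bar without sweeping the details away.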
541 00:25:36,680 --> 00:25:39,140 But we collapse some of the sections that 542 00:25:39,140 --> 00:25:42,230 probably aren't germane to the question the student's trying to answer. 543 00:25:42,230 --> 00:25:46,940 We put little yellow on hover attributes essentially, 544 00:25:46,940 --> 00:25:49,730 or JavaScript tricks, to show them more information 545 00:25:49,730 --> 00:25:53,240 about certain phrases within the documentation 546 00:25:53,240 --> 00:25:55,760 so as to chip away at that problem. 547 00:25:55,760 --> 00:25:59,000 It would be a non-trivial undertaking to do the same for Python. 548 00:25:59,000 --> 00:25:59,940 We're not there yet. 549 00:25:59,940 --> 00:26:01,220 I'm not sure if we will. 550 00:26:01,220 --> 00:26:03,690 But I think there are opportunities in both. 551 00:26:03,690 --> 00:26:06,380 And again, to its credit, I think PHP out of the box 552 00:26:06,380 --> 00:26:07,918 does this very well as a language. 553 00:26:07,918 --> 00:26:09,710 BRIAN YU: You mentioned C's error messages. 554 00:26:09,710 --> 00:26:12,350 And that reminded me of another qualm I have with Python. 555 00:26:12,350 --> 00:26:14,930 Just how long the tracebacks when you get 556 00:26:14,930 --> 00:26:17,000 an error message can be that oftentimes, when 557 00:26:17,000 --> 00:26:20,460 students are writing using frameworks like Flask in this class 558 00:26:20,460 --> 00:26:23,790 during their work with Python, when they have an error in their code, 559 00:26:23,790 --> 00:26:26,768 they get this incredibly long traceback that references so many 560 00:26:26,768 --> 00:26:30,060 files that they haven't even written and haven't looked at and haven't touched.
561 00:26:30,060 --> 00:26:32,018 And it can just be difficult when they're first 562 00:26:32,018 --> 00:26:35,480 getting started to look at that and understand what their actual error is 563 00:26:35,480 --> 00:26:38,600 and where they made a mistake as opposed to all of the different places 564 00:26:38,600 --> 00:26:40,410 where the exceptions are happening in the Python code. 565 00:26:40,410 --> 00:26:41,600 DAVID MALAN: Yeah, absolutely. 566 00:26:41,600 --> 00:26:43,350 And the rule of thumb would be, well, look 567 00:26:43,350 --> 00:26:47,380 for the line in that traceback that mentions your file and your file 568 00:26:47,380 --> 00:26:47,880 number. 569 00:26:47,880 --> 00:26:49,790 But we actually do consistent with help 50, 570 00:26:49,790 --> 00:26:52,250 with man dot CS50 dot io, actually do try 571 00:26:52,250 --> 00:26:55,650 these days, the past couple of years, to help students see through that noise, 572 00:26:55,650 --> 00:26:56,660 if you will. 573 00:26:56,660 --> 00:27:01,143 So if you so much in Python as have a single line import CS50 at the top, 574 00:27:01,143 --> 00:27:03,060 you don't actually have to call any functions. 575 00:27:03,060 --> 00:27:09,650 We, doing some technique called monkey patching, importing other libraries 576 00:27:09,650 --> 00:27:11,930 and tweaking certain code settings, are we 577 00:27:11,930 --> 00:27:16,663 able to for those tracebacks highlight in yellow for students only those lines 578 00:27:16,663 --> 00:27:18,830 that pertain to their code, leaving black and white 579 00:27:18,830 --> 00:27:21,770 the lines of code that are unrelated to theirs, for the same purpose. 580 00:27:21,770 --> 00:27:22,970 We want them to see everything. 581 00:27:22,970 --> 00:27:24,220 We want them to see the noise.
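A rough sketch of the highlighting idea, using only Python's standard traceback module, might look like the following. It is not the CS50 library's implementation, and it skips the monkey patching step entirely; highlight_traceback and its user_file parameter are hypothetical names, and the ANSI color codes assume a terminal that supports them.

```python
import traceback

YELLOW = "\033[33m"  # ANSI escape for yellow text (terminal support assumed)
RESET = "\033[0m"


def highlight_traceback(exc, user_file):
    """Format exc's traceback, coloring only frames that come from user_file."""
    lines = []
    for frame in traceback.extract_tb(exc.__traceback__):
        text = f'  File "{frame.filename}", line {frame.lineno}, in {frame.name}'
        if frame.filename == user_file:
            # Only the student's own frames are colored; the rest stay plain.
            text = YELLOW + text + RESET
        lines.append(text)
    lines.append(f"{type(exc).__name__}: {exc}")
    return "\n".join(lines)
```

Every frame is still printed, so the noise stays visible; the color simply points at the frames the student wrote. A real tool would more likely hook this into sys.excepthook so that it applies automatically on import, which is left out of this sketch.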
582 00:27:24,220 --> 00:27:27,350 But we want to help them see through the noise as by highlighting, 583 00:27:27,350 --> 00:27:30,482 much like help 50 and man dot CS50 dot io, portions 584 00:27:30,482 --> 00:27:31,940 that are actually relevant to them. 585 00:27:31,940 --> 00:27:34,398 So we hope, too, that this strikes that right pedagogical 586 00:27:34,398 --> 00:27:37,212 balance without sweeping too many details under the rug. 587 00:27:37,212 --> 00:27:39,170 BRIAN YU: And I guess that kind of gets us back 588 00:27:39,170 --> 00:27:42,470 to the why C of letting people feel the noise without sweeping 589 00:27:42,470 --> 00:27:43,610 the details under the rug. 590 00:27:43,610 --> 00:27:45,410 DAVID MALAN: Yeah, so I'd like to think-- 591 00:27:45,410 --> 00:27:48,380 and the more we chat I'm reassured to hear that I think 592 00:27:48,380 --> 00:27:50,390 we're being self consistent throughout. 593 00:27:50,390 --> 00:27:54,152 But this is absolutely thematic. 594 00:27:54,152 --> 00:27:55,860 There have been, though, I'll admit, some 595 00:27:55,860 --> 00:27:59,810 psets in the past where honestly we probably shoehorned in C, 596 00:27:59,810 --> 00:28:01,620 where it just wasn't really very pleasant. 597 00:28:01,620 --> 00:28:03,328 So case in point, a few years ago, we had 598 00:28:03,328 --> 00:28:05,880 a pset called Server, where students actually needed 599 00:28:05,880 --> 00:28:09,300 to implement the guts of their own web server-- so not a web application, 600 00:28:09,300 --> 00:28:13,650 not a website per se, but an actual program written in C that listens 601 00:28:13,650 --> 00:28:17,010 for HTTP requests and responds to them. 602 00:28:17,010 --> 00:28:21,150 The exercise, though, ended up really becoming quite the tedious exercise 603 00:28:21,150 --> 00:28:23,980 in parsing strings in C, which wasn't really 604 00:28:23,980 --> 00:28:26,230 the intent because we do that of course very early on.
605 00:28:26,230 --> 00:28:28,650 And we, too, have CS50 students parse strings 606 00:28:28,650 --> 00:28:30,990 and understand char stars and representation 607 00:28:30,990 --> 00:28:33,360 of strings as arrays of characters. 608 00:28:33,360 --> 00:28:35,280 But this one came later in the semester. 609 00:28:35,280 --> 00:28:39,240 And while it was a wonderful transition, I think, conceptually from C 610 00:28:39,240 --> 00:28:41,460 to web programming by actually implementing 611 00:28:41,460 --> 00:28:45,750 in C the foundation for web programs, it really 612 00:28:45,750 --> 00:28:50,160 was incredibly tedious to parse HTTP request lines. 613 00:28:50,160 --> 00:28:52,050 That is the sort of thing that is much more 614 00:28:52,050 --> 00:28:55,080 pleasant to do in almost any language, Python among them. 615 00:28:55,080 --> 00:28:58,690 In C it really is a little frustrating. 616 00:28:58,690 --> 00:29:02,760 And so that I felt like now we're using the wrong language even 617 00:29:02,760 --> 00:29:04,778 for the right problem sets. 618 00:29:04,778 --> 00:29:06,570 BRIAN YU: Even in Python I feel like having 619 00:29:06,570 --> 00:29:10,570 to parse the HTTP requests on your own would be a tedious task. 620 00:29:10,570 --> 00:29:13,508 And I'm glad that there are frameworks out there like Flask and Django 621 00:29:13,508 --> 00:29:15,300 that just handle a lot of the stuff for you 622 00:29:15,300 --> 00:29:18,570 and make it easier just to focus on the more interesting parts of web 623 00:29:18,570 --> 00:29:20,070 application design and development. 624 00:29:20,070 --> 00:29:20,730 DAVID MALAN: Oh, yeah, absolutely. 625 00:29:20,730 --> 00:29:22,110 And that's why we use Flask. 626 00:29:22,110 --> 00:29:23,410 We don't implement Flask.
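For a sense of why parsing a request line is more pleasant outside of C, here is a small Python sketch. It handles only the first line of a request, in the common "method, target, version" shape, and is not a full HTTP parser; parse_request_line is a hypothetical helper written for this illustration.

```python
def parse_request_line(line):
    """Split a request line such as 'GET /cat.html?q=1 HTTP/1.1' into parts."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError("malformed request line")
    method, target, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError("unsupported protocol version")
    # Separate the path from an optional query string.
    path, _, query = target.partition("?")
    return method, path, query, version
```

Each step here (trimming the line ending, splitting on spaces, splitting off the query) is a single call in Python, whereas in C each one involves manual pointer arithmetic and buffer management, which is the tedium described above.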
627 00:29:23,410 --> 00:29:25,745 But, in another case years ago, we had a pset inspired 628 00:29:25,745 --> 00:29:28,620 by Eric Roberts of Stanford called Breakout, which is that old school 629 00:29:28,620 --> 00:29:30,630 game where you can bounce a ball off a paddle 630 00:29:30,630 --> 00:29:33,630 and you try to break these bricks that are colorful in the sky. 631 00:29:33,630 --> 00:29:36,870 So it's actually non-trivial to do that in C, especially 632 00:29:36,870 --> 00:29:39,540 in our environment when you don't necessarily 633 00:29:39,540 --> 00:29:42,630 have a proper C GUE, a Graphical User Environment, 634 00:29:42,630 --> 00:29:46,678 like something called X Windows, because our most recent incarnations 635 00:29:46,678 --> 00:29:48,720 of CS50, of course, have students using web IDEs. 636 00:29:48,720 --> 00:29:49,750 637 00:29:49,750 --> 00:29:52,740 So in a nutshell, back in the day when we were still 638 00:29:52,740 --> 00:29:56,520 using a virtual machine with students, the so-called CS50 Appliance, which 639 00:29:56,520 --> 00:30:00,060 they ran locally in their Macs and PCs, we actually used C and some code 640 00:30:00,060 --> 00:30:03,720 that Eric Roberts had written, which itself was written in Java because Java 641 00:30:03,720 --> 00:30:06,960 allows you to do graphics a little more easily in a way that was also 642 00:30:06,960 --> 00:30:08,430 cross platform. 643 00:30:08,430 --> 00:30:10,980 We had this bridge called the Stanford Portable Library, 644 00:30:10,980 --> 00:30:15,270 or SPL, that allowed students to implement this really cool interactive, 645 00:30:15,270 --> 00:30:21,810 animated game called Breakout in C by using a nice pedagogically simplified 646 00:30:21,810 --> 00:30:22,310 library. 647 00:30:22,310 --> 00:30:25,060 But when we finally transitioned to more of a web based environment, 648 00:30:25,060 --> 00:30:26,400 that actually complicated that.
649 00:30:26,400 --> 00:30:30,355 So it's been a few years since we've used Breakout as a game. 650 00:30:30,355 --> 00:30:32,480 BRIAN YU: On the other hand, are there any problems 651 00:30:32,480 --> 00:30:35,820 that you think fit really cleanly in C where the solution in C 652 00:30:35,820 --> 00:30:37,670 is very elegant, very nice? 653 00:30:37,670 --> 00:30:40,615 DAVID MALAN: Mario, I think, lends itself wonderfully to C, 654 00:30:40,615 --> 00:30:42,740 although of course you can probably implement Mario 655 00:30:42,740 --> 00:30:48,080 in like two lines of Python code instead of six to 10 or so in C. 656 00:30:48,080 --> 00:30:50,840 But I think some of the early psets in CS50 657 00:30:50,840 --> 00:30:54,440 could certainly be done pretty easily in C, like Caesar, which 658 00:30:54,440 --> 00:30:58,010 is a problem set on cryptography where students have to encrypt or decrypt 659 00:30:58,010 --> 00:30:58,910 information. 660 00:30:58,910 --> 00:31:01,580 That's nice because it lends itself to pretty simple user input. 661 00:31:01,580 --> 00:31:04,455 You type in a string, and then you iterate over that string character 662 00:31:04,455 --> 00:31:07,970 by character and then do a bit of arithmetic using ASCII or Unicode 663 00:31:07,970 --> 00:31:11,638 values to convert it to the ciphertext or conversely to the plaintext. 664 00:31:11,638 --> 00:31:14,180 So that works perfectly well in C. And even though, yeah, you 665 00:31:14,180 --> 00:31:18,110 could save a few lines in Python, C is perfectly fine for those kinds 666 00:31:18,110 --> 00:31:19,220 of string manipulations. 667 00:31:19,220 --> 00:31:22,820 Game of 15, where we began the story, I think works perfectly fine in C, 668 00:31:22,820 --> 00:31:25,670 even if the GUE might not be as pretty as you might 669 00:31:25,670 --> 00:31:27,575 get more easily with another language.
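The Caesar arithmetic described above (iterate character by character, shift each letter by a key, wrap around the alphabet) can be sketched as follows. This is an illustrative version, not the problem set's reference solution; the function name caesar and the choice to leave non-letters untouched are assumptions.

```python
def caesar(text, key):
    """Shift each ASCII letter in text by key positions, wrapping at 26."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            # Work relative to 'A' or 'a' so that case is preserved.
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            # Spaces, digits, and punctuation pass through unchanged.
            out.append(ch)
    return "".join(out)
```

With a key of 3, caesar("Hello, World!", 3) gives "Khoor, Zruog!", and calling it again with a key of -3 recovers the plaintext. The same loop translates almost line for line into C, where the character arithmetic works directly on char values.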
670 00:31:27,575 --> 00:31:30,200 The spell checking assignment that we do where we give students 671 00:31:30,200 --> 00:31:33,740 like 140,000 words in a big text file, and they have to load it into memory 672 00:31:33,740 --> 00:31:36,890 and implement their own hash table or some other data structure, 673 00:31:36,890 --> 00:31:41,150 that is particularly well suited for C when performance matters to you. 674 00:31:41,150 --> 00:31:44,840 You want to use as few CPU cycles as possible, as little RAM as possible, 675 00:31:44,840 --> 00:31:47,150 because with the power and capabilities of something 676 00:31:47,150 --> 00:31:52,700 like Python comes a non-zero amount of overhead CPU cycles and bytes of RAM 677 00:31:52,700 --> 00:31:56,060 that are wasted on the overhead or the abstractions that it provides. 678 00:31:56,060 --> 00:32:00,200 And in fact, pedagogically what's nice about doing that pset in C 679 00:32:00,200 --> 00:32:03,890 and then reimplementing it a week or two later in Python in lecture 680 00:32:03,890 --> 00:32:06,890 is that students can see, oh my god, you can implement the spell 681 00:32:06,890 --> 00:32:10,970 checker in just a few lines of Python despite 20 hours of coding 682 00:32:10,970 --> 00:32:12,870 in C on that same problem. 683 00:32:12,870 --> 00:32:15,230 But when you compare their performance side by side, 684 00:32:15,230 --> 00:32:17,510 you see that generally the Python version 685 00:32:17,510 --> 00:32:20,690 takes a non-trivial amount of more time to actually run. 686 00:32:20,690 --> 00:32:23,420 And so that's one of the goals even of that. 687 00:32:23,420 --> 00:32:26,420 Of course, I wouldn't want to implement something like CS50 finance in C 688 00:32:26,420 --> 00:32:29,780 or really anything for the web in C. While you can absolutely do it, 689 00:32:29,780 --> 00:32:33,710 it's entirely unpleasant or way more time consuming than you'd want.
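To show what "a few lines of Python" can mean for the spell checker, here is a sketch built on Python's built-in set, which is itself a hash table. It is not the version shown in lecture; the function names load, check, and misspellings, and the simple word-splitting regular expression, are assumptions for illustration.

```python
import re


def load(lines):
    """Build the dictionary as a set of lowercase words."""
    return {line.strip().lower() for line in lines if line.strip()}


def check(word, dictionary):
    """Case-insensitive membership test, an average O(1) hash lookup."""
    return word.lower() in dictionary


def misspellings(text, dictionary):
    """Return the words in text, in order, that are not in the dictionary."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if not check(w, dictionary)]
```

Usage might look like opening the word list with open(path) and passing the file object to load, since iterating a file yields its lines. The hash table that takes hours to build by hand in C is hidden inside set here, which is exactly the abstraction whose CPU and memory overhead the side by side comparison is meant to expose.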
690 00:32:33,710 --> 00:32:37,340 So I think finance has always lent itself to PHP in the past 691 00:32:37,340 --> 00:32:40,227 or more recently Python, though it is one 692 00:32:40,227 --> 00:32:43,310 that could be implemented in any number of other languages or frameworks-- 693 00:32:43,310 --> 00:32:45,890 Ruby, C sharp, Java, or others. 694 00:32:45,890 --> 00:32:48,890 BRIAN YU: Now that you mention it, the speller assignment really I think 695 00:32:48,890 --> 00:32:51,265 lends itself well to C because, whereas in Python there's 696 00:32:51,265 --> 00:32:55,410 really just one way to do that right, in C there are so many ways. 697 00:32:55,410 --> 00:32:57,608 And it's been really cool to see students online 698 00:32:57,608 --> 00:33:00,650 that are all tackling the misspellings problem in various different ways, 699 00:33:00,650 --> 00:33:03,770 trying to figure out how to optimize it and make it perform the best. 700 00:33:03,770 --> 00:33:07,115 And we have this online scoreboard where students online can compete 701 00:33:07,115 --> 00:33:08,990 against each other to see how quickly they're 702 00:33:08,990 --> 00:33:11,360 able to spell check these documents by writing 703 00:33:11,360 --> 00:33:13,460 their most efficient implementation in C. 704 00:33:13,460 --> 00:33:15,410 And it's fun to watch people compete in that 705 00:33:15,410 --> 00:33:18,200 and try and figure out what the best way to do it is. 706 00:33:18,200 --> 00:33:21,170 I mean, so suffice it to say, without any discussion about language, 707 00:33:21,170 --> 00:33:23,390 everyone brings to the table his or her own, I think, 708 00:33:23,390 --> 00:33:25,408 preferences or religious beliefs.
709 00:33:25,408 --> 00:33:27,950 DAVID MALAN: I'd like to think, though, that C has definitely 710 00:33:27,950 --> 00:33:30,500 helped us pedagogically solve some problems, 711 00:33:30,500 --> 00:33:32,900 or it has allowed us to implement a particular vision we 712 00:33:32,900 --> 00:33:37,565 have for the trajectory we want students to have from day 0 to day n minus 1. 713 00:33:37,565 --> 00:33:40,580 It's certainly not without its downsides or costs 714 00:33:40,580 --> 00:33:44,210 in terms of cognitive difficulties or just actual technical bugs 715 00:33:44,210 --> 00:33:45,188 that students run into. 716 00:33:45,188 --> 00:33:47,480 But I'd like to think, certainly after this many years, 717 00:33:47,480 --> 00:33:51,480 that we have enough success stories of students exiting from the course. 718 00:33:51,480 --> 00:33:54,130 And even though they might never want to touch C again, 719 00:33:54,130 --> 00:33:56,630 that's fine because truly, at the end of the semester, where 720 00:33:56,630 --> 00:34:00,740 we end, for instance, here with the so-called CS50 fair, a campus 721 00:34:00,740 --> 00:34:02,930 wide exhibition of students' final projects, 722 00:34:02,930 --> 00:34:05,870 every year for like 11 years now of CS50 fairs 723 00:34:05,870 --> 00:34:10,520 have I genuinely been struck at how many students go off and implement 724 00:34:10,520 --> 00:34:13,040 something, create something, build something 725 00:34:13,040 --> 00:34:17,690 on topics, on languages, using tools that we ourselves didn't teach them 726 00:34:17,690 --> 00:34:18,590 in the course.
727 00:34:18,590 --> 00:34:21,500 And I would really like to think that is largely attributable, not 728 00:34:21,500 --> 00:34:25,429 to the course per se, but to the scaffolding that the course happens 729 00:34:25,429 --> 00:34:28,400 to provide to students and the takeaways that they hopefully have, 730 00:34:28,400 --> 00:34:32,210 which is that they understand some underlying implementation details that 731 00:34:32,210 --> 00:34:35,270 are perhaps unique to a course like this or the relatively few courses 732 00:34:35,270 --> 00:34:39,440 out there that start you at this level of abstraction that's pretty close to 733 00:34:39,440 --> 00:34:41,960 but just above the level of hardware but exit 734 00:34:41,960 --> 00:34:45,741 with more modern, with more powerful, higher level languages 735 00:34:45,741 --> 00:34:48,949 and techniques that allow a student who three months prior had never touched 736 00:34:48,949 --> 00:34:52,400 a line of code before to make his or her own iPhone application, web 737 00:34:52,400 --> 00:34:53,030 application, 738 00:34:53,030 --> 00:34:54,929 or any number of other tools. 739 00:34:54,929 --> 00:34:58,598 So I like to think we're doing something right here along the way. 740 00:34:58,598 --> 00:35:00,890 BRIAN YU: Yeah, that's also amazing to see at the fair. 741 00:35:00,890 --> 00:35:05,720 DAVID MALAN: But do feel free, as always, out there to agree or disagree. 742 00:35:05,720 --> 00:35:08,720 Feel free to chime in with comments of your own perspective on languages 743 00:35:08,720 --> 00:35:10,700 or your own experiences positive or negative 744 00:35:10,700 --> 00:35:14,670 with C, with CS50, with Python, PHP, and any number of other languages. 745 00:35:14,670 --> 00:35:17,960 And indeed, we invite you to contribute your own ideas 746 00:35:17,960 --> 00:35:19,340 for the podcast moving forward.
747 00:35:19,340 --> 00:35:22,040 Just as this week was inspired by a past Quora post, 748 00:35:22,040 --> 00:35:25,430 last week's was inspired by a Facebook poll on machine learning. 749 00:35:25,430 --> 00:35:28,742 We'd love to hear what topics are of interest to folks the most. 750 00:35:28,742 --> 00:35:31,700 BRIAN YU: If you have ideas or you have feedback about today's episode, 751 00:35:31,700 --> 00:35:35,640 you can feel free to reach both of us at podcast@cs50.harvard.edu. 752 00:35:35,640 --> 00:35:37,390 DAVID MALAN: Thanks so much for tuning in. 753 00:35:37,390 --> 00:35:39,840 This was the CS50 Podcast.