1 00:00:00,000 --> 00:00:02,000 [Valgrind] 2 00:00:02,000 --> 00:00:05,000 [Nate Hardison, Harvard University] 3 00:00:05,000 --> 00:00:07,000 This is CS50, CS50.TV] 4 00:00:07,000 --> 00:00:10,000 Some of the most difficult bugs in C programs 5 00:00:10,000 --> 00:00:13,000 come from the mismanagement of memory. 6 00:00:13,000 --> 00:00:15,000 There are a huge number of ways to screw things up, 7 00:00:15,000 --> 00:00:17,000 including allocating the wrong amount of memory, 8 00:00:17,000 --> 00:00:20,000 forgetting to initialize variables, 9 00:00:20,000 --> 00:00:23,000 writing before or after the end of a buffer, 10 00:00:23,000 --> 00:00:25,000 and freeing keep memory multiple times. 11 00:00:25,000 --> 00:00:28,000 The symptoms range from intermittent crashes 12 00:00:28,000 --> 00:00:30,000 to mysteriously overwritten values, 13 00:00:30,000 --> 00:00:34,000 often at places and times far removed from the original error. 14 00:00:34,000 --> 00:00:37,000 Tracing the observed problem back to the underlying root cause 15 00:00:37,000 --> 00:00:39,000 can be challenging, 16 00:00:39,000 --> 00:00:42,000 but fortunately there's a helpful program called Valgrind 17 00:00:42,000 --> 00:00:44,000 that can do a lot to help. 18 00:00:44,000 --> 00:00:47,000 >> You run a program under Valgrind to enable 19 00:00:47,000 --> 00:00:50,000 extensive checking of heap memory allocations and accesses. 20 00:00:50,000 --> 00:00:53,000 When Valgrind detects a problem, it gives you immediate, 21 00:00:53,000 --> 00:00:56,000 direct information that allows you to 22 00:00:56,000 --> 00:00:58,000 more easily find and fix the problem. 23 00:00:58,000 --> 00:01:01,000 Valgrind also reports on less deadly memory issues, 24 00:01:01,000 --> 00:01:04,000 such as memory leaks, allocating heap memory, 25 00:01:04,000 --> 00:01:07,000 and forgetting to free it. 26 00:01:07,000 --> 00:01:10,000 Like our compiler, Clang, in our debugger, GDB, 27 00:01:10,000 --> 00:01:14,000 Valgrind is free software, and it is installed on the appliance. 28 00:01:14,000 --> 00:01:16,000 Valgrind runs on your binary executable, 29 00:01:16,000 --> 00:01:20,000 not your .c or .h source code files, 30 00:01:20,000 --> 00:01:23,000 so be sure you have compiled an up-to-date copy of your program 31 00:01:23,000 --> 00:01:25,000 using Clang or Make. 32 00:01:25,000 --> 00:01:28,000 Then, running your program under Valgrind can be 33 00:01:28,000 --> 00:01:32,000 as simple as just prefixing the standard program command with the word Valgrind, 34 00:01:32,000 --> 00:01:35,000 which starts up Valgrind and runs the program inside of it. 35 00:01:35,000 --> 00:01:38,000 When starting, Valgrind does some complex 36 00:01:38,000 --> 00:01:41,000 jiggering to configure the executable for the memory checks, 37 00:01:41,000 --> 00:01:44,000 so it can take a bit to get up and running. 38 00:01:44,000 --> 00:01:48,000 The program will then execute normally, be it much more slowly, 39 00:01:48,000 --> 00:01:52,000 and when it finishes, Valgrind will print a summary of its memory usage. 40 00:01:52,000 --> 00:01:58,000 If all goes well, it will look something like this: 41 00:01:58,000 --> 00:02:01,000 In this case, ./clean_program 42 00:02:01,000 --> 00:02:04,000 is the path to the program I want to run. 43 00:02:04,000 --> 00:02:06,000 And while this one doesn't take any arguments, 44 00:02:06,000 --> 00:02:09,000 if it did I'd just tack them on to the end of the command as usual. 45 00:02:09,000 --> 00:02:12,000 Clean program is just a silly little program I created 46 00:02:12,000 --> 00:02:15,000 that allocates space for a block of ints on the heap, 47 00:02:15,000 --> 00:02:19,000 put some values inside of them, and frees the whole block. 48 00:02:19,000 --> 00:02:23,000 This is what you're shooting for, no errors and no leaks. 49 00:02:23,000 --> 00:02:27,000 >> Another important metric is the total number of bytes allocated. 50 00:02:27,000 --> 00:02:32,000 Depending on the program, if your allocations are in the megabytes or higher, 51 00:02:32,000 --> 00:02:34,000 you're probably doing something wrong. 52 00:02:34,000 --> 00:02:37,000 Are you unnecessarily storing duplicates? 53 00:02:37,000 --> 00:02:40,000 Are you using the heap for storage, when it would be better to use the stack? 54 00:02:40,000 --> 00:02:43,000 So, memory errors can be truly evil. 55 00:02:43,000 --> 00:02:46,000 The more overt ones cause spectacular crashes, 56 00:02:46,000 --> 00:02:49,000 but even then it can still be hard to pinpoint 57 00:02:49,000 --> 00:02:51,000 what exactly led to the crash. 58 00:02:51,000 --> 00:02:54,000 More insidiously, a program with a memory error 59 00:02:54,000 --> 00:02:56,000 can still compile cleanly 60 00:02:56,000 --> 00:02:58,000 and can still seem to work correctly 61 00:02:58,000 --> 00:03:01,000 because you managed to get lucky most of the time. 62 00:03:01,000 --> 00:03:04,000 After several "successful outcomes," 63 00:03:04,000 --> 00:03:07,000 you might just think that a crash is a fluke of the computer, 64 00:03:07,000 --> 00:03:10,000 but the computer is never wrong. 65 00:03:10,000 --> 00:03:13,000 >> Running Valgrind can help you track down the cause of visible memory errors 66 00:03:13,000 --> 00:03:18,000 as well as find lurking errors you don't even yet know about. 67 00:03:18,000 --> 00:03:22,000 Each time Valgrind detects a problem, it prints information about what it observed. 68 00:03:22,000 --> 00:03:24,000 Each item is fairly terse-- 69 00:03:24,000 --> 00:03:27,000 the source line of the offending instruction, what the issue is, 70 00:03:27,000 --> 00:03:30,000 and a little info about the memory involved-- 71 00:03:30,000 --> 00:03:34,000 but often it's enough information to direct your attention to the right place. 72 00:03:34,000 --> 00:03:37,000 Here is an example of Valgrind running on a buggy program 73 00:03:37,000 --> 00:03:40,000 that does an invalid read of heap memory. 74 00:03:40,000 --> 00:03:49,000 We see no errors or warnings in compilation. 75 00:03:49,000 --> 00:03:53,000 Uh-oh, the error summary says that there are two errors-- 76 00:03:53,000 --> 00:03:56,000 two invalid reads of size 4--bytes, that is. 77 00:03:56,000 --> 00:04:01,000 Both bad reads occurred in the main function of invalid_read.c, 78 00:04:01,000 --> 00:04:04,000 the first on line 16 and the second on line 19. 79 00:04:04,000 --> 00:04:06,000 Let's look at the code. 80 00:04:06,000 --> 00:04:11,000 Looks like the first call to printf tries to read one int past the end of our memory block. 81 00:04:11,000 --> 00:04:13,000 If we look back at Valgrind's output, 82 00:04:13,000 --> 00:04:16,000 we see that Valgrind told us exactly that. 83 00:04:16,000 --> 00:04:19,000 The address we're trying to read starts 0 bytes 84 00:04:19,000 --> 00:04:22,000 past the end of the block of size 16 bytes-- 85 00:04:22,000 --> 00:04:25,000 four 32-bit ints that we allocated. 86 00:04:25,000 --> 00:04:29,000 That is, the address we were trying to read starts right at the end of our block, 87 00:04:29,000 --> 00:04:32,000 just as we see in our bad printf call. 88 00:04:32,000 --> 00:04:36,000 Now, invalid reads might not seem like that big of a deal, 89 00:04:36,000 --> 00:04:39,000 but if you're using that data to control the flow of your program-- 90 00:04:39,000 --> 00:04:42,000 for example, as part of an if statement or loop-- 91 00:04:42,000 --> 00:04:45,000 then things can silently go bad. 92 00:04:45,000 --> 00:04:47,000 Watch how I can run the invalid_read program 93 00:04:47,000 --> 00:04:50,000 and nothing out of the ordinary happens. 94 00:04:50,000 --> 00:04:52,000 Scary, huh? 95 00:04:52,000 --> 00:04:56,000 >> Now, let's look at some more kinds of errors that you might encounter in your code, 96 00:04:56,000 --> 00:04:59,000 and we'll see how Valgrind detects them. 97 00:04:59,000 --> 00:05:01,000 We just saw an example of an invalid_read, 98 00:05:01,000 --> 00:05:04,000 so now let's check out an invalid_write. 99 00:05:04,000 --> 00:05:09,000 Again, no errors or warnings in compilation. 100 00:05:09,000 --> 00:05:12,000 Okay, Valgrind says that there are two errors in this program-- 101 00:05:12,000 --> 00:05:15,000 and invalid_write and an invalid_read. 102 00:05:15,000 --> 00:05:18,000 Let's check out this code. 103 00:05:18,000 --> 00:05:21,000 Looks like we've got an instance of the classic strlen plus one bug. 104 00:05:21,000 --> 00:05:24,000 The code doesn't malloc an extra byte of space 105 00:05:24,000 --> 00:05:26,000 for the /0 character, 106 00:05:26,000 --> 00:05:30,000 so when str copy went to write it at ssubstrlen "cs50 rocks!" 107 00:05:30,000 --> 00:05:33,000 it wrote 1 byte past the end of our block. 108 00:05:33,000 --> 00:05:36,000 The invalid_read comes when we make our call to printf. 109 00:05:36,000 --> 00:05:40,000 Printf ends up reading invalid memory when it reads the /0 character 110 00:05:40,000 --> 00:05:43,000 as it looks at the end of this E string it's printing. 111 00:05:43,000 --> 00:05:45,000 But none of this escaped Valgrind. 112 00:05:45,000 --> 00:05:48,000 We see that it caught the invalid_write as part of the str copy 113 00:05:48,000 --> 00:05:51,000 on line 11 of main, and the invalid_read is part of printf. 114 00:05:51,000 --> 00:05:54,000 Rock on, Valgrind. 115 00:05:54,000 --> 00:05:57,000 Again, this might not seem like a big deal. 116 00:05:57,000 --> 00:06:00,000 We can run this program over and over outside of Valgrind 117 00:06:00,000 --> 00:06:03,000 and not see any error symptoms. 118 00:06:03,000 --> 00:06:06,000 >> However, let's look at a slight variation of this to see 119 00:06:06,000 --> 00:06:09,000 how things can get really bad. 120 00:06:09,000 --> 00:06:14,000 So, granted, we are abusing things more than just a bit in this code. 121 00:06:14,000 --> 00:06:17,000 We're only allocating space on the heap for two strings 122 00:06:17,000 --> 00:06:19,000 the length of cs50 rocks, 123 00:06:19,000 --> 00:06:22,000 this time, remembering the /0 character. 124 00:06:22,000 --> 00:06:25,000 But then we throw in a super-long string into the memory block 125 00:06:25,000 --> 00:06:27,000 that S is pointing to. 126 00:06:27,000 --> 00:06:30,000 What effect will that have on the memory block that T points to? 127 00:06:30,000 --> 00:06:34,000 Well, if T points to memory that's just adjacent to S, 128 00:06:34,000 --> 00:06:37,000 coming just after it, 129 00:06:37,000 --> 00:06:39,000 then we might have written over part of T. 130 00:06:39,000 --> 00:06:41,000 Let's run this code. 131 00:06:41,000 --> 00:06:43,000 Look at what happened. 132 00:06:43,000 --> 00:06:47,000 The strings we stored in our heap blocks both appeared to have printed out correctly. 133 00:06:47,000 --> 00:06:49,000 Nothing seems wrong at all. 134 00:06:49,000 --> 00:06:52,000 However, let's go back into our code and 135 00:06:52,000 --> 00:06:55,000 comment out the line where we copy cs50 rocks 136 00:06:55,000 --> 00:06:59,000 into the second memory block, pointed to by t. 137 00:06:59,000 --> 00:07:02,000 Now, when we run this code we should 138 00:07:02,000 --> 00:07:06,000 only see the contents of the first memory block print out. 139 00:07:06,000 --> 00:07:09,000 Whoa, even though we didn't str copy 140 00:07:09,000 --> 00:07:12,000 any characters into the second heap block, the one pointed to by T, 141 00:07:12,000 --> 00:07:15,000 we get a print out. 142 00:07:15,000 --> 00:07:18,000 Indeed, the string we stuffed into our first block 143 00:07:18,000 --> 00:07:21,000 overran the first block and into the second block, 144 00:07:21,000 --> 00:07:23,000 making everything seem normal. 145 00:07:23,000 --> 00:07:26,000 Valgrind, though, tells us the true story. 146 00:07:26,000 --> 00:07:28,000 There we go. 147 00:07:28,000 --> 00:07:32,000 All of those invalid reads and writes. 148 00:07:32,000 --> 00:07:36,000 >> Let's look at an example of another kind of error. 149 00:07:36,000 --> 00:07:39,000 Here we do something rather unfortunate. 150 00:07:39,000 --> 00:07:41,000 We grab space for an int on the heap, 151 00:07:41,000 --> 00:07:45,000 and we initialize an int pointer--p--to point to that space. 152 00:07:45,000 --> 00:07:48,000 However, while our pointer is initialized, 153 00:07:48,000 --> 00:07:52,000 the data that it's pointing to just has whatever junk is in that part of the heap. 154 00:07:52,000 --> 00:07:55,000 So when we load that data into int i, 155 00:07:55,000 --> 00:07:57,000 we technically initialize i, 156 00:07:57,000 --> 00:08:00,000 but we do so with junk data. 157 00:08:00,000 --> 00:08:03,000 The call to assert, which is a handy debugging macro 158 00:08:03,000 --> 00:08:06,000 defined in the aptly named assert library, 159 00:08:06,000 --> 00:08:09,000 will abort the program if its test condition fails. 160 00:08:09,000 --> 00:08:11,000 That is, if i is not 0. 161 00:08:11,000 --> 00:08:14,000 Depending on what was in the heap space, pointed to by p, 162 00:08:14,000 --> 00:08:18,000 this program might work sometimes and fail at other times. 163 00:08:18,000 --> 00:08:20,000 If it works, we're just getting lucky. 164 00:08:20,000 --> 00:08:24,000 The compiler won't catch this error, but Valgrind sure will. 165 00:08:24,000 --> 00:08:28,000 There we see the error stemming from our use of that junk data. 166 00:08:28,000 --> 00:08:32,000 >> When you allocate heap memory but don't deallocate it or free it, 167 00:08:32,000 --> 00:08:34,000 that is called a leak. 168 00:08:34,000 --> 00:08:37,000 For a small, short-lived program that runs and immediately exits, 169 00:08:37,000 --> 00:08:39,000 leaks are fairly harmless, 170 00:08:39,000 --> 00:08:42,000 but for a project of larger size and/or longevity, 171 00:08:42,000 --> 00:08:46,000 even a small leak can compound into something major. 172 00:08:46,000 --> 00:08:49,000 For CS50, we do expect you to 173 00:08:49,000 --> 00:08:51,000 take care of freeing all of the heap memory that you allocate, 174 00:08:51,000 --> 00:08:54,000 since we want you to build the skills to properly handle the manual process 175 00:08:54,000 --> 00:08:56,000 required by C. 176 00:08:56,000 --> 00:08:59,000 To do so, your program should have an exact 177 00:08:59,000 --> 00:09:03,000 one-to-one correspondence between malloc and free calls. 178 00:09:03,000 --> 00:09:06,000 Fortunately, Valgrind can help you with memory leaks too. 179 00:09:06,000 --> 00:09:09,000 Here is a leaky program called leak.c that allocates 180 00:09:09,000 --> 00:09:13,000 space on the heap, writes to it, but doesn't free it. 181 00:09:13,000 --> 00:09:16,000 We compile it with Make and run it under Valgrind, 182 00:09:16,000 --> 00:09:18,000 and we see that, while we have no memory errors, 183 00:09:18,000 --> 00:09:20,000 we do have one leak. 184 00:09:20,000 --> 00:09:23,000 There are 16 bytes definitely lost, 185 00:09:23,000 --> 00:09:27,000 meaning that the pointer to that memory wasn't in scope when the program exited. 186 00:09:27,000 --> 00:09:30,000 Now, Valgrind doesn't give us a ton of information about the leak, 187 00:09:30,000 --> 00:09:35,000 but if we follow this little note that it gives down towards the bottom of its report 188 00:09:35,000 --> 00:09:38,000 to rerun with --leak-check=full 189 00:09:38,000 --> 00:09:41,000 to see the full details of leaked memory, 190 00:09:41,000 --> 00:09:44,000 we'll get more information. 191 00:09:44,000 --> 00:09:46,000 Now, in the heap summary, 192 00:09:46,000 --> 00:09:50,000 Valgrind tells us where the memory that was lost was initially allocated. 193 00:09:50,000 --> 00:09:52,000 Just as we know from looking in the source code, 194 00:09:52,000 --> 00:09:55,000 Valgrind informs us that we leaked the memory 195 00:09:55,000 --> 00:09:58,000 allocated with a call to malloc on line 8 of leak.c 196 00:09:58,000 --> 00:10:00,000 in the main function. 197 00:10:00,000 --> 00:10:02,000 Pretty nifty. 198 00:10:02,000 --> 00:10:04,000 >> Valgrind categorizes leaks using these terms: 199 00:10:04,000 --> 00:10:07,000 Definitely lost--this is heap allocated memory 200 00:10:07,000 --> 00:10:10,000 to which the program no longer has a pointer. 201 00:10:10,000 --> 00:10:14,000 Valgrind knows that you once had the pointer but have since lost track of it. 202 00:10:14,000 --> 00:10:17,000 This memory is definitely leaked. 203 00:10:17,000 --> 00:10:20,000 Indirectly lost--this is heap allocated memory 204 00:10:20,000 --> 00:10:24,000 to which the only pointers to it also are lost. 205 00:10:24,000 --> 00:10:27,000 For example, if you lost your pointer to the first node of a linked list, 206 00:10:27,000 --> 00:10:30,000 then the first node itself would be definitely lost, 207 00:10:30,000 --> 00:10:34,000 while any subsequent nodes would be indirectly lost. 208 00:10:34,000 --> 00:10:37,000 Possibly lost--this is heap allocated memory 209 00:10:37,000 --> 00:10:41,000 to which Valgrind cannot be sure whether there is a pointer or not. 210 00:10:41,000 --> 00:10:44,000 Still reachable is heap allocated memory 211 00:10:44,000 --> 00:10:47,000 to which the program still has a pointer at exit, 212 00:10:47,000 --> 00:10:50,000 which typically means that a global variable points to it. 213 00:10:50,000 --> 00:10:53,000 To check for these leaks, you'll also have to include the option 214 00:10:53,000 --> 00:10:55,000 --still-reachable=yes 215 00:10:55,000 --> 00:10:58,000 in your invocation of Valgrind. 216 00:10:58,000 --> 00:11:01,000 >> These different cases might require different strategies for cleaning them up, 217 00:11:01,000 --> 00:11:05,000 but leaks should be eliminated. 218 00:11:05,000 --> 00:11:08,000 Unfortunately, fixing leaks can be hard to do, 219 00:11:08,000 --> 00:11:11,000 since incorrect calls to free can blow up your program. 220 00:11:11,000 --> 00:11:14,000 For example, if we look at invalid_free.c, 221 00:11:14,000 --> 00:11:18,000 we see an example of bad memory deallocation. 222 00:11:18,000 --> 00:11:21,000 What should be a single call to free the entire block 223 00:11:21,000 --> 00:11:24,000 of memory pointed to by int_block, 224 00:11:24,000 --> 00:11:27,000 has instead become an attempt to free each int-sized section 225 00:11:27,000 --> 00:11:29,000 of the memory individually. 226 00:11:29,000 --> 00:11:32,000 This will fail catastrophically. 227 00:11:32,000 --> 00:11:34,000 Boom! What an error. 228 00:11:34,000 --> 00:11:36,000 This is definitely not good. 229 00:11:36,000 --> 00:11:39,000 If you're stuck with this kind of error, though, and you don't know where to look, 230 00:11:39,000 --> 00:11:41,000 fall back on your new best friend. 231 00:11:41,000 --> 00:11:44,000 You guessed it--Valgrind. 232 00:11:44,000 --> 00:11:47,000 Valgrind, as always, knows exactly what's up. 233 00:11:47,000 --> 00:11:50,000 The alloc and free counts don't match up. 234 00:11:50,000 --> 00:11:52,000 We've got 1 alloc and 4 frees. 235 00:11:52,000 --> 00:11:55,000 And Valgrind also tells us where the first bad free call-- 236 00:11:55,000 --> 00:11:58,000 the one that triggered the blowup--is coming from-- 237 00:11:58,000 --> 00:12:00,000 line 16. 238 00:12:00,000 --> 00:12:03,000 As you see, bad calls to free are really bad, 239 00:12:03,000 --> 00:12:05,000 so we recommend letting your program leak 240 00:12:05,000 --> 00:12:08,000 while you're working on getting the functionality correct. 241 00:12:08,000 --> 00:12:12,000 Start looking for leaks only after your program is working properly, 242 00:12:12,000 --> 00:12:14,000 without any other errors. 243 00:12:14,000 --> 00:12:16,000 >> And that's all we've got for this video. 244 00:12:16,000 --> 00:12:18,000 Now what are you waiting for? 245 00:12:18,000 --> 00:12:21,000 Go run Valgrind on your programs right now. 246 00:12:21,000 --> 00:12:25,000 My name is Nate Hardison. This is CS50. [CS50.TV]