1 00:00:00,000 --> 00:00:13,000 2 00:00:13,000 --> 00:00:15,890 >> ROB BOWDEN: I'm Rob, and let's get cracking. 3 00:00:15,890 --> 00:00:19,390 So remember from the pset spec that we're going to be needing to use the 4 00:00:19,390 --> 00:00:20,890 crypt function. 5 00:00:20,890 --> 00:00:26,330 For the man page, we have two hash define _xopensource. 6 00:00:26,330 --> 00:00:28,290 Don't worry about why we need to do that. 7 00:00:28,290 --> 00:00:31,550 And also hash include unistd.h. 8 00:00:31,550 --> 00:00:35,920 >> So once that's out of the way, let's get to the actual program. 9 00:00:35,920 --> 00:00:39,570 First thing we need to do is make sure the user entered a valid encrypted 10 00:00:39,570 --> 00:00:41,520 password at the command line. 11 00:00:41,520 --> 00:00:46,050 Remember that the program is supposed to be run like dot slash crack, and 12 00:00:46,050 --> 00:00:48,120 then encrypted string. 13 00:00:48,120 --> 00:00:52,990 >> So here we're checking to make sure that argc to two if we want to 14 00:00:52,990 --> 00:00:54,380 continue with the program. 15 00:00:54,380 --> 00:00:58,830 If argc is not two, that means either the user didn't enter an encrypted 16 00:00:58,830 --> 00:01:02,560 password at the command line, or they entered more than just the encrypted 17 00:01:02,560 --> 00:01:05,379 password at the command line, in which case we don't know what to do with the 18 00:01:05,379 --> 00:01:07,660 command line arguments. 19 00:01:07,660 --> 00:01:11,390 >> So if argc was two, we can continue. 20 00:01:11,390 --> 00:01:14,160 And here, we're going to declare a variable encrypted. 21 00:01:14,160 --> 00:01:17,650 That's just going to alias the original argv1 so that throughout this 22 00:01:17,650 --> 00:01:20,690 program, we don't have to call it argv1, which then you have to think 23 00:01:20,690 --> 00:01:22,950 about what that actually meant. 24 00:01:22,950 --> 00:01:27,180 >> So finally, we want to validate that the encrypted password the user 25 00:01:27,180 --> 00:01:30,840 entered could have actually been an encrypted password. 26 00:01:30,840 --> 00:01:35,120 Per the man page of crypt, the encrypted password must be 13 27 00:01:35,120 --> 00:01:36,440 characters long. 28 00:01:36,440 --> 00:01:41,500 Up here, notice that we hash defined encrypt length as 13. 29 00:01:41,500 --> 00:01:46,140 So we're just making sure that the string length of the encrypted 30 00:01:46,140 --> 00:01:49,090 password is 13. 31 00:01:49,090 --> 00:01:52,280 >> And if it's not, we want to exit the program. 32 00:01:52,280 --> 00:01:56,470 So once that's out of the way, we can now actually try to find what the 33 00:01:56,470 --> 00:02:00,410 password that gave the encrypted password was. 34 00:02:00,410 --> 00:02:04,870 Here, we want to grab the salt from the encrypted password. 35 00:02:04,870 --> 00:02:08,930 Remember, per the man page, that the first two characters of an encrypted 36 00:02:08,930 --> 00:02:10,590 string, like here-- 37 00:02:10,590 --> 00:02:12,770 50ZPJ and so on-- 38 00:02:12,770 --> 00:02:16,170 the first two characters give us the salt that was used 39 00:02:16,170 --> 00:02:18,080 in the crypt function. 40 00:02:18,080 --> 00:02:21,740 >> And here, we see that the salt was ha. 41 00:02:21,740 --> 00:02:27,610 So we want to copy the first two characters, salt length being hash 42 00:02:27,610 --> 00:02:30,230 defined as two. 43 00:02:30,230 --> 00:02:35,970 We have to copy the first two characters into this array, salt. 44 00:02:35,970 --> 00:02:39,340 Notice that we need salt length plus one, since we still need a null 45 00:02:39,340 --> 00:02:42,440 terminator at the end of our salt. 46 00:02:42,440 --> 00:02:46,940 >> Then we're going to declare this array, guest, of size max length plus 47 00:02:46,940 --> 00:02:51,930 one, where max length is hash defined as eight, since the maximum password 48 00:02:51,930 --> 00:02:55,090 is eight characters long. 49 00:02:55,090 --> 00:02:59,860 And we're going to use this to iterate over all possible strings that could 50 00:02:59,860 --> 00:03:01,430 be valid passwords. 51 00:03:01,430 --> 00:03:07,720 So if the valid characters in a password were just a, b, and c, then 52 00:03:07,720 --> 00:03:14,970 we would iterate over a, b, c, aa, ba, ca, and so on, until 53 00:03:14,970 --> 00:03:16,690 we get to see cccccccc-- 54 00:03:16,690 --> 00:03:19,600 eight c's. 55 00:03:19,600 --> 00:03:23,620 >> And if we have not down a valid password, then we need to say that the 56 00:03:23,620 --> 00:03:26,590 encrypted string wasn't valid to begin with. 57 00:03:26,590 --> 00:03:29,970 So now, we reach this while 1 loop. 58 00:03:29,970 --> 00:03:33,100 Notice that means it's an infinite loop. 59 00:03:33,100 --> 00:03:36,430 >> Notice there are no break statement inside of this infinite loop. 60 00:03:36,430 --> 00:03:38,570 There are only return statements. 61 00:03:38,570 --> 00:03:41,210 So we never actually expect to exit the loop. 62 00:03:41,210 --> 00:03:44,750 We only expect to exit the program. 63 00:03:44,750 --> 00:03:48,220 I've added this print statement to the top of this loop to just print out 64 00:03:48,220 --> 00:03:51,790 what our current guess at what the password is. 65 00:03:51,790 --> 00:03:53,630 >> Now, what is this loop doing? 66 00:03:53,630 --> 00:03:58,330 It's looping over all possible strings that could be valid passwords. 67 00:03:58,330 --> 00:04:02,700 The first thing we're going to do is take our current guess for what the 68 00:04:02,700 --> 00:04:03,920 password is. 69 00:04:03,920 --> 00:04:07,230 We'll take the salt that we grabbed from the encrypted string, and we're 70 00:04:07,230 --> 00:04:09,850 going to encrypt the guess. 71 00:04:09,850 --> 00:04:14,760 This will give us an encrypted guess, which we're going to compare against 72 00:04:14,760 --> 00:04:18,810 the encrypted string that the user entered at the command line. 73 00:04:18,810 --> 00:04:23,030 >> If they are the same, in which case string comparable will return zero, if 74 00:04:23,030 --> 00:04:28,050 they're the same, then guess was the password that generated the encrypted 75 00:04:28,050 --> 00:04:33,520 string, in which case we can print that as our password and return. 76 00:04:33,520 --> 00:04:37,520 But if they weren't the same, that means our guess was incorrect. 77 00:04:37,520 --> 00:04:43,250 >> And we want to iterate to the next valid guess. 78 00:04:43,250 --> 00:04:46,410 So that's what this while loop is trying to do. 79 00:04:46,410 --> 00:04:51,760 It's going to iterate our guess to the next valid guess. 80 00:04:51,760 --> 00:04:56,080 Notice that when we say that a particular character in our guess has 81 00:04:56,080 --> 00:05:01,770 reached the max symbol, which up here is hash defined as a tilde, since 82 00:05:01,770 --> 00:05:05,710 that's the largest ASCII value character that a user can enter at the 83 00:05:05,710 --> 00:05:11,210 keyboard, when the character reaches the max symbol, then we want to send 84 00:05:11,210 --> 00:05:17,150 it back to the minimum symbol, which is a space, again the lowest ASCII 85 00:05:17,150 --> 00:05:20,800 value symbol that a user can enter at the keyboard. 86 00:05:20,800 --> 00:05:22,940 >> So we're going to set that to the minimum symbol. 87 00:05:22,940 --> 00:05:25,720 And then we're going to go on to the next character. 88 00:05:25,720 --> 00:05:28,730 So how are our guesses going to iterate? 89 00:05:28,730 --> 00:05:33,685 Well, if the valid characters are a, b, and c, then if we started with a, 90 00:05:33,685 --> 00:05:36,630 it'll iterate to b, it'll iterate to c. 91 00:05:36,630 --> 00:05:44,360 c is our max symbol, so we'll set c back to a, the minimum symbol. 92 00:05:44,360 --> 00:05:48,100 And then we'll iterate index to the next character. 93 00:05:48,100 --> 00:05:53,920 >> So if the original guess was c, the next character is going to be the null 94 00:05:53,920 --> 00:05:55,560 terminator. 95 00:05:55,560 --> 00:06:00,670 Down here, notice that if the character that we now want to 96 00:06:00,670 --> 00:06:04,690 increment was the null terminator, then we're going to set it to the 97 00:06:04,690 --> 00:06:06,260 minimum symbol. 98 00:06:06,260 --> 00:06:11,431 So if the guess was c, then our new guess is going to be aa. 99 00:06:11,431 --> 00:06:16,050 And if our original guess was cccc, then our new guess 100 00:06:16,050 --> 00:06:18,380 is going to be aaaaa. 101 00:06:18,380 --> 00:06:24,430 >> So whenever we reach the maximum string of a given length, then we're 102 00:06:24,430 --> 00:06:29,090 going to implement to the minimum string of the next length, which will 103 00:06:29,090 --> 00:06:34,420 just be all characters of the minimum symbol. 104 00:06:34,420 --> 00:06:36,970 Now, what is this check doing here? 105 00:06:36,970 --> 00:06:42,780 Well, if index moved from the eighth character to the nine character-- 106 00:06:42,780 --> 00:06:46,460 so we add eight c's as our previous guess-- 107 00:06:46,460 --> 00:06:51,270 then index is going to focus on the last null terminator of our guess 108 00:06:51,270 --> 00:06:57,990 array, which is not meant to actually be used in our password. 109 00:06:57,990 --> 00:07:03,530 >> So if we are focused on that last null terminator, then we haven't found a 110 00:07:03,530 --> 00:07:07,750 password that's valid using just eight characters, which means there is no 111 00:07:07,750 --> 00:07:10,550 valid password that encrypts to the given string. 112 00:07:10,550 --> 00:07:13,520 And we have to print that, saying we couldn't find a valid 113 00:07:13,520 --> 00:07:16,100 password, and return. 114 00:07:16,100 --> 00:07:20,280 So this while loop is going to iterate over all possible strings. 115 00:07:20,280 --> 00:07:24,640 >> If it finds any that encrypts to the expected encrypted string, it'll 116 00:07:24,640 --> 00:07:26,190 return that password. 117 00:07:26,190 --> 00:07:29,610 And it it doesn't find anything, then it will return, printing that it 118 00:07:29,610 --> 00:07:31,910 wasn't able to find anything. 119 00:07:31,910 --> 00:07:39,220 Now, notice that iterating over all possible strings is probably going to 120 00:07:39,220 --> 00:07:40,420 take a while. 121 00:07:40,420 --> 00:07:43,590 Let's actually see how long that takes. 122 00:07:43,590 --> 00:07:47,230 >> Let's make crack. 123 00:07:47,230 --> 00:07:51,050 Well, oops-- it says undefined reference to crypt. 124 00:07:51,050 --> 00:07:55,330 So remember, for the p sets spec and also the man page for crypt that we 125 00:07:55,330 --> 00:07:58,130 need to link in crypt. 126 00:07:58,130 --> 00:08:01,130 Now, the default make command doesn't know that you 127 00:08:01,130 --> 00:08:03,010 want to use that function. 128 00:08:03,010 --> 00:08:09,680 >> So let's copy this client command and just add on to the end 129 00:08:09,680 --> 00:08:13,300 of it, linking crypt. 130 00:08:13,300 --> 00:08:14,820 Now, it compiles. 131 00:08:14,820 --> 00:08:23,880 So let's run crack on a given encrypted string-- 132 00:08:23,880 --> 00:08:25,130 so Caesar's. 133 00:08:25,130 --> 00:08:28,690 134 00:08:28,690 --> 00:08:30,790 So that was pretty fast. 135 00:08:30,790 --> 00:08:33,230 >> Notice that this ended on 13. 136 00:08:33,230 --> 00:08:38,240 Well, Caesar's encrypted password happens to be 13. 137 00:08:38,240 --> 00:08:41,650 So let's try another password. 138 00:08:41,650 --> 00:08:45,830 Let's take Hirschhorn's encrypted password and try cracking that. 139 00:08:45,830 --> 00:08:51,750 140 00:08:51,750 --> 00:08:55,110 >> So notice we've already reached three characters. 141 00:08:55,110 --> 00:08:58,660 And we're iterating over all possible three-character strings. 142 00:08:58,660 --> 00:09:01,420 That means we've already finish iterating over all possible one and 143 00:09:01,420 --> 00:09:04,660 two character strings. 144 00:09:04,660 --> 00:09:09,180 Now, it looks like this is going to take a while before we reach the 145 00:09:09,180 --> 00:09:10,580 four-character strings. 146 00:09:10,580 --> 00:09:14,680 It might take a couple of minutes. 147 00:09:14,680 --> 00:09:16,055 >> It didn't take a couple of minutes. 148 00:09:16,055 --> 00:09:18,450 We're on the four-character strings. 149 00:09:18,450 --> 00:09:22,800 But now, we need to iterate over all possible four-character strings, which 150 00:09:22,800 --> 00:09:26,000 that might take maybe 10 minutes. 151 00:09:26,000 --> 00:09:28,720 And then when we reach five character strings, we need to iterate over all 152 00:09:28,720 --> 00:09:31,450 of those, which might take a couple hours. 153 00:09:31,450 --> 00:09:34,080 And we need to iterate over all possible six-character strings, which 154 00:09:34,080 --> 00:09:36,560 might take a couple days and so on. 155 00:09:36,560 --> 00:09:41,380 >> So it could take a potentially very long time to iterate over all possible 156 00:09:41,380 --> 00:09:44,850 eight-character and fewer strings. 157 00:09:44,850 --> 00:09:50,600 So notice that this isn't necessarily a very efficient algorithm for finding 158 00:09:50,600 --> 00:09:51,860 a password. 159 00:09:51,860 --> 00:09:54,540 You might think that there are better ways. 160 00:09:54,540 --> 00:10:02,230 For example, the password zyx!32ab probably isn't a very common password, 161 00:10:02,230 --> 00:10:06,440 whereas the password 12345 is probably a lot more common. 162 00:10:06,440 --> 00:10:13,570 >> So one way of trying to find a password more quickly is to just look 163 00:10:13,570 --> 00:10:15,560 at passwords that are more common. 164 00:10:15,560 --> 00:10:20,480 So for example, we can try to read words from a dictionary and try all of 165 00:10:20,480 --> 00:10:24,860 those words as our password guesses. 166 00:10:24,860 --> 00:10:29,210 Now, maybe a password isn't that simple. 167 00:10:29,210 --> 00:10:32,600 Maybe the user was somewhat clever and try appending a number to 168 00:10:32,600 --> 00:10:34,220 the end of a word. 169 00:10:34,220 --> 00:10:37,000 >> So maybe their password was password1. 170 00:10:37,000 --> 00:10:41,520 So you can try iterating over all words in the dictionary with a one 171 00:10:41,520 --> 00:10:43,210 appended to the end of it. 172 00:10:43,210 --> 00:10:47,360 And then maybe after doing that, you'll append a two to the end of it. 173 00:10:47,360 --> 00:10:50,240 >> Or maybe the user is trying to be even more clever, and they want their 174 00:10:50,240 --> 00:10:54,980 password to be "hacker," but they're going to replace all instances of e's 175 00:10:54,980 --> 00:10:56,600 with threes. 176 00:10:56,600 --> 00:10:58,440 So you could do this too. 177 00:10:58,440 --> 00:11:02,100 Iterate over all words in the dictionary but replace characters that 178 00:11:02,100 --> 00:11:04,790 look like numbers with those numbers. 179 00:11:04,790 --> 00:11:09,670 >> So this way, you might catch even more passwords that are pretty common. 180 00:11:09,670 --> 00:11:14,690 But in the end, the only way you can capture all passwords is to brute 181 00:11:14,690 --> 00:11:17,340 force iterate over all possible strings. 182 00:11:17,340 --> 00:11:22,100 So in the end, you do need to iterate over all strings from one character to 183 00:11:22,100 --> 00:11:28,110 eight characters, which might take a very long time, but you need to do it. 184 00:11:28,110 --> 00:11:30,024 >> My name is Rob Bowden. 185 00:11:30,024 --> 00:11:31,425 And this is Crack. 186 00:11:31,425 --> 00:11:36,533