Problem 2-7: Crack

This is CS50 AP 1516. Harvard University.

Table of Contents

Objectives
Recommended Reading
Academic Honesty
- Reasonable
- Not Reasonable
Assessment
Passwords Et Cetera

Questions? Feel free to head to CS50 on Reddit, CS50 on StackExchange, or the CS50 Facebook group.

Objectives

Become better acquainted with functions and libraries.
Dabble in cryptanalysis.
Play the role of an adversary.
Think more critically about data security.

This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.

Reasonable

Communicating with classmates about problems in English (or some other spoken language).
Discussing the course’s material with others in order to understand it better.
Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.
Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.
Reviewing past years' quizzes, tests, and solutions thereto.
Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.
Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.
Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.
Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.
Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

Not Reasonable

Accessing a solution to some problem prior to (re-)submitting your own.
Asking a classmate to see his or her solution to a problem before (re-)submitting your own.
Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.
Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.
Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.
Looking at another individual’s work during a quiz or test.
Paying or offering to pay an individual for work that you may submit as (part of) your own.
Providing or making available solutions to problems to individuals who might take this course in the future.
Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.
Searching for or soliciting outright solutions to problems online or elsewhere.
Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).
Submitting (after possibly modifying) the work of another individual beyond allowed snippets.
Submitting the same or similar work to this course that you have submitted or will submit to another.
Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.
Viewing another’s solution to a problem and basing your own solution on it.

Assessment

Your work on this problem set will be evaluated along four axes primarily.

Scope: To what extent does your code implement the features required by our specification?
Correctness: To what extent is your code consistent with our specifications and free of bugs?
Design: To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?
Style: To what extent is your code readable (i.e., commented and indented with variables aptly named)?

To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.

Passwords Et Cetera

On most, if not all, systems running Linux or UNIX is a file called /etc/passwd. By design, this file is meant to contain usernames and passwords, along with other account-related details (e.g., paths to users' home directories and shells). Also by (poor) design, this file is typically world-readable. Thankfully, the passwords therein aren’t stored "in the clear" but are instead encrypted using a "one-way hash function." When a user logs into these systems by typing a username and password, the latter is encrypted with the very same hash function, and the result is compared against the username’s entry in /etc/passwd. If the two ciphertexts match, the user is allowed in. If you’ve ever forgotten some password, you may have been told that "I can’t look up your password, but I can change it for you." It could be that person doesn’t know how. But, odds are they just can’t if a one-way hash function’s involved.

Even though passwords in /etc/passwd are encrypted, the crypto involved is not terribly strong. Quite often are adversaries, upon obtaining files like this one, able to guess (and check) users' passwords or crack them using brute force (i.e., trying all possible passwords). Only in recent years have (most) system administrators stopped storing passwords in /etc/passwd, instead using /etc/shadow, which is (supposed to be) readable only by root. (Take a look at /etc/passwd in CS50 IDE, for instance; wherever you see x a password once was.) Below, though, are some username:ciphertext pairs from an outdated (fake) system.

andi:HALRCq0IBXEPM
caesar:50zPJlUFIYY0o
eli:50MxVjGD7EfY6
hdan:50z2Htq2DN2qs
jason:50CMVwEqJXRUY
john:50TGdEyijNDNY
levatich:50QykIulIPuKI
rob:50q.zrL5e0Sak
skroob:50Bpa7n/23iug
zamyla:HAYRs6vZAb4wo

Crack these passwords, each of which has been encrypted with C’s DES-based (not MD5-based) crypt function. Specifically, write, in crack.c, a program that accepts a single command-line argument: an encrypted password. (In case you test your code with other ciphertexts, know that command-line arguments with certain characters (e.g., ?) must be enclosed in single or double quotes; those quotation marks will not end up in argv itself.) If your program is executed without any command-line arguments or with more than one command-line argument, your program should complain and exit immediately, with main returning any non-zero int (thereby signifying an error that our own tests can detect). Otherwise, your program must proceed to crack the given password, ideally as quickly as possible, ultimately printing to standard output the password in the clear followed by \n, nothing more, nothing less, with main returning 0. The underlying design of this program is entirely up to you, but you must explain each and every one of your design decisions, including any implications for performance and accuracy, with profuse comments throughout your source code. Your program must be designed in such a way that it could crack all of the passwords above, even if said cracking might take quite a while. That is to say, it’s okay if your code might take several minutes or days or longer to run. What we demand of you is correctness, not necessarily optimal performance. Your program should certainly work on inputs other than these as well; hard-coding into your program the solutions to the above is not acceptable.

Your program must behave per the below; underlined is some sample input.

username@ide50:~/workspace/unit2 $ ./crack 50Bpa7n/23iug
12345

Assume that users' passwords, as plaintext, are composed of printable ASCII characters and are no longer than eight characters long. As for their ciphertexts, you’d best pull up the "man page" (i.e., manual) for crypt by executing

man crypt

in a terminal window so that you know how the function works. In particular, make sure you understand its use of a "salt." (According to the man page, a salt "is used to perturb the algorithm in one of 4096 different ways," but why might that be useful?) As implied by that man page, you’ll likely want to put

#define _XOPEN_SOURCE
#include <unistd.h>

at the top of your file. Moreover, you’ll want to link with -lcrypt, as by compiling not with make but with:

clang -o crack crack.c -lcrypt

You might also want to read up on C’s support for file I/O, as there’s quite a number of English words in /usr/share/dict/words in CS50 IDE that might (or might not) save your program some time. If that file seems to be missing, you can install it with:

sudo apt-get install -y wamerican

By design, /etc/passwd entrusts the security of passwords to an assumption: that adversaries lack the computational resources with which to crack those passwords. Once upon a time, that may have been true. Perhaps some still do. But when it comes to security, assumptions are dangerous. May that this problem set make that claim all the more real.

We should note that this problem set is no invitation to seek out other passwords to crack. Do not conflate these Hacker Editions with "black hat" editions. We hope, though, that by understanding better the design of today’s systems, you might one day build better systems yourself. Besides acquainting you further with C, this problem set urges you to start questioning designs, as vulnerabilities (if not regrets) often result from poor ones.

If you’d like to play with the staff’s own implementation of crack, well, sorry! :-) Where’d be the fun in that?

This was Problem 2-7.