# #functions

Recall that a hash function maps a (potentially infinite) domain of inputs to a finite range of outputs. A "perfect" hash function, meanwhile, is one that maps a (finite) set of inputs to a range of distinct outputs without any collisions.

Answer the below in `functions.md`.

## Questions

1. (2 points.) Consider the hash function below.

``````int hash(char *s)
{
if (s >= 'A' && s <= 'Z')
{
return s - 'A';
}
else
{
return s - 'a';
}
}``````

Why might a hash table, if implemented using this hash function, prove to be slow? Assume that `s` is an entirely alphabetical string of non-zero length.

2. (4 points) Recall that a string is just a sequence of characters, each of which is a sequence of bits. In ASCII, `A` is thus `01000001`, `AB` is thus `0100000101000010`, and `ABC` is thus `010000010100001001000011`. Suppose that a hash function, given a string in ASCII as input, outputs the (decimal) integer represented by its sequence of bits. Given `A` as input, this hash function would output 65; given `AB` as input, this hash function would output `16706`; and given `ABC` as input, this hash function would output `4276803`. Why is this hash function "perfect" by definition but imperfect in practice if implemented in C?

3. (4 points.) The input to a hash function could also be a file. After all, a file is just a sequence of bits, much like a string. As it turns out, `check50` uses hashes to check the correctness of `recover` for Problem Set 4. Rather than compare the 50 files outputted by your implementation of `recover` against the original 50 JPEGs from `card.raw` (as is possible, recall, with programs like `diff`), `check50` instead hashes each of your 50 files, the output of which is a 64-digit hexadecimal string (i.e., 256 bits), comparing each such hash against 64-digit hexadecimal strings hard-coded into the source code of `check50`'s checks. If curious, you (and anyone on the internet!) can see them atop https://github.com/cs50/checks/blob/master/cs50/2017/fall/recover/check50/__init__.py, though no need to understand the Python code therein. Why might we be storing those 50 hashes in that repository instead of the original 50 JPEGs themselves?

4. (4 points.) Recall that a trie iteratively hashes each of the letters in words in order to index into arrays. And even a hash table might hash all of the letters in words in order to index into an array. Why, though, is lookup of words said to be in O(1) for a trie but O(n) for a hash table?

## Debrief

1. Which resources, if any, did you find helpful in answering this problem’s questions?

2. About how long did you spend on this problem’s questions?