CS50 Test 2016: Answer Key

Abstraction

Answers

Stacks and queues are abstractions in that they describe, at a high level, how a collection of data should behave without prescribing low-level implementation details. In particular, a stack is characterized by its support for operations like push and pop, which behave in a last-in-first-out (LIFO) fashion, while a queue is characterized by its support for operations like enqueue and dequeue, which behave in a first-in-first-out (FIFO) fashion. Not prescribed, though, is how those operations are implemented. Both abstractions could be implemented underneath the hood, for instance, with an array, a linked list, or another data structure altogether.
A block like is an abstraction insofar as it describes at a high level only what the block will do (display text on the screen), not how it will do it (toggle patterns of pixels from white to black).
A string is an abstraction insofar as it only represents a sequence of characters, each of which can be accessed via syntax like []; it does not specify how that sequence is implemented at a lower level. Only when it’s revealed that a string is just a char * (and not, say, a linked list of characters underneath the hood) is that abstraction broken, at which point it becomes clear those same characters can also be accessed via contiguous addresses in memory.

Bleep

Answers

Because only 4 bytes are allocated for word, 5-letter words like puppy and 6-letter words like kitten will overflow that buffer, with their fifth letters' (non-zero) ASCII codes ending up in the memory allocated for bleep. The value of bleep will thus be non-zero (i.e., not false), and so bleep will effectively be true, in which case those words are bleeped.
The program can be fixed by allocating more bytes for word, provided the user never inputs more than that many characters (plus one for a trailing \0) or by dynamically allocating enough memory for word, as does CS50’s own get_string.

Insecurity

Answers

Because the format stores the password in the file itself (as plaintext, no less), an adversary could simply open the file with an ordinary text editor in order to see the password. An 8-character password is also relatively short, and an adversary could write a program to crack (i.e., guess and check) it.

#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    char password[8];
}
FILEHEADER;

int main(int argc, char *argv[])
{
    FILE *inptr = fopen(argv[1], "r");
    if (inptr == NULL)
    {
        return 1;
    }
    FILE *outptr = fopen(argv[2], "w");
    if (outptr == NULL)
    {
        fclose(inptr);
        return 1;
    }
    fseek(inptr, sizeof(FILEHEADER), SEEK_SET);
    char c;
    while (fread(&c, sizeof(char), 1, inptr) == 1)
    {
        fwrite(&c, sizeof(char), 1, outptr);
    }
    fclose(outptr);
    fclose(inptr);
}

Reading Someone Else’s Code

Answers

Because get_char has to return a char, per its return type, NULL (a pointer) isn’t an option. (It’s conventional to choose some maximal value like CHAR_MAX, which is already defined as a constant, as a sentinel value indicating error.)
We try to read two so that we can detect whether the user actually inputted two (or more) characters. The second char is a "canary" of sorts that will be assigned a value if the user has inputted two (or more) characters, in which case sscanf will return 2 for which our condition can check.
Those variables are declared as static so that they are local (i.e., confined) to cs50.c, inaccessible from files that might call the library’s functions. Were they not declared as static, any program that links against the library might be able to alter their values, potentially breaking the library’s garbage collection.
Each invocation of get_string that allocates memory for a string adds that memory’s address to a (dynamically resized) array before returning the address to a caller. After main has completed or exit is called, teardown iterates over that array, freeing the memory.

Restructuring Strings

Answers

The length of a string can be determined in O(1) time instead of O(n), where n is the length of a string.
A string can be no more than 255 characters long, since its length is stored as an 8-bit value.

#include <string.h>

size_t strlen(const char *s)
{
    if (s == NULL)
    {
        return 0;
    }
    return (size_t) s[0];
}

Answers

If Medium used an array, they would have to decide on its length in advance. If a user ends up reading zero or few posts, Medium might end up wasting memory. If a user ends up reading many posts, Medium might have to reallocate the array dynamically, which would cost time. And if the array weren’t sorted (which itself would take time), searching the array could be slow.
If Medium used a linked list, they would have to search it linearly in order to check wheher a user has already read some post. If a user ends up reading many posts, searching that list could be slow.
If Medium used a trie, they might end up wasting quite a bit of memory as each node in the trie might have quite a few NULL pointers, particularly since posts' ID tend to be pseudorandom alphanumeric strings, in which case space-saving common prefixes would be rare.
Because Medium uses a Bloom filter, they risk false positives (but not false negatives). A Bloom filter is essentially an array of bits, each initialized to 0, some of which are then set to 1 via multiple hash functions. When a user reads some post, Medium hashes that post’s ID with multiple hash functions. It uses the outputs of those hash functions as indices into the array, setting each of the bits at those indices to 1. It’s possible, though, that one or more other IDs might collectively hash to those same indices, in which case Medium will conclude that a user has read a post that he or she hasn’t.

Tradeoffs

Answers

1. Lists of values can be searched with either linear search or binary search.
2. Linear search is easy to implement and does not require random access.
3. Binary search is asymptotically faster (provided the list is sorted).
1. A dictionary of words can be implemented with either a hash table or trie.
2. A hash table tends to use less memory (since a trie tends to have lots of NULL pointers).
3. A trie is asymptotically faster, since the running time of lookups doesn’t depend on the number of words already in the dictionary.

Trusting `clang`

Answers

You could change:

if(c == 'n')
        return('\n');

to:

if(c == 'n' || c == 'N')
        return('\n');

A Trojan horse might already be compiled into your installation of clang, the code for which has since been removed from clang's open source. And even though the source code for clang might not contain a Trojan horse, the compiler with which you compile clang yourself might already be compromised, in which case it could inject a Trojan horse into your own compilation.

What’s Upword?

Answers

Yes.
No.

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

bool upword(const char *word)
{
    if (word == NULL)
    {
        return false;
    }
    for (int i = 0, n = strlen(word); i < n; i++)
    {
        if (!isalpha(word[i]))
        {
            return false;
        }
        if (i < n - 1 && tolower(word[i]) > tolower(word[i+1]))
        {
            return false;
        }
    }
    return true;
}

Z

Answers

function main()
{
    $a <- get()
    $b <- get()
    if (not($a < $b))
    {
        if (not($b < $a))
        {
            print("equal")
        }
    }
}

function main()
{
    recurse(get())
}

function recurse($i)
{
    if (-1 < $i)
    {
        print($i)
        print("\n")
        recurse(add($i, -1))
    }
}