Emoji

Recall that emoji are just Unicode characters, patterns of bits that humans have decided shall represent ideograms.

Whereas ASCII uses just one byte to represent each of its characters, Unicode's common encodings (e.g., UTF-8) use as many as four. Each emoji is defined by its "code point," a number, often written in hexadecimal, that uniquely identifies it. For instance, the code point for "grinning face" is U+1F600, otherwise known as 0x1F600. 😀

It turns out you can programmatically generate emoji! Try compiling and running the program below in CS50 IDE, wherein wchar_t represents a "wide character," which you can think of as a 4-byte character (or, really, number).

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");
    wchar_t pumpkin = 0x1F383;
    printf("%lc\n", pumpkin);
}

Boo! You should see a jack-o-lantern! 🎃

Answer the questions below in emoji.md.

Questions

  1. (1 point.) How many bytes are necessary to represent a jack-o-lantern in Unicode?

  2. (2 points.) Why can a jack-o-lantern not be represented in C with a char?

  3. (8 points.) Complete the implementation of get_emoji in such a way that it prompts the user for a string and returns its corresponding value as an emoji (aka wchar_t). If the user’s input is not formatted as U+ followed only by one or more hexadecimal digits, re-prompt the user with prompt, just as, e.g., get_int does when its input is not actually an int.

    You’re welcome but not expected to use strtol, declared in stdlib.h, though keep in mind that strtol won’t recognize U+ as a prefix. Alternatively, keep in mind that a hexadecimal digit represents 4 bits and that hexadecimal values have a 1s place, a 16s place, a 256s place, and so on.

    Here’s a full emoji list whose codes you can use to test your implementation, though keep in mind that some of Unicode’s newest emoji (e.g., "nauseated face") might not display in CS50 IDE, even if your code is correct.

    You can compile the emoji program below with:

    CFLAGS="$CFLAGS -Wno-format-security" make emoji

    #include <cs50.h>
    #include <ctype.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <wchar.h>
    
    typedef wchar_t emoji;
    
    emoji get_emoji(string prompt);
    
    int main(void)
    {
        // Set locale according to environment variables
        setlocale(LC_ALL, "");
    
        // Prompt user for code point
        emoji c = get_emoji("Code point: ");
    
        // Print character
        printf("%lc\n", c);
    }
    
    emoji get_emoji(string prompt)
    {
        // TODO
    }

Debrief

  1. Which resources, if any, did you find helpful in answering this problem’s questions?

  2. About how long did you spend on this problem’s questions?