Andrew Sellergren
To prepare a string for Huffman coding, we need to analyze the frequencies of letters in it like so:
As before, we want to use a small number of bits for “E” because it appears more frequently than the other letters. Huffman proposes doing so by considering a forest of trees, each of which represents a letter:
Now, we begin to combine these one-node trees starting with the two with the smallest frequencies (“B” and “C” in this case):
We now have four trees whose root nodes have the frequency values 0.2 (the combination of the frequencies of “B” and “C”), 0.15, 0.2, and 0.45. The two smallest values are 0.15 and 0.2, so we combine those two. It doesn’t matter which 0.2 value we pick, so we’ll pick the one that represents the “B”-“C” tree:
We do this a few more times to end up with a single tree:
A Huffman node is simple enough to represent in C:
typedef struct node
{
char symbol;
float frequency;
struct node* left;
struct node* right;
}
node;
How many bits do we need to flip to convert “A” to “a”? Just one, the third from the left:
A 01000001
a 01100001
Previously, we did this conversion using if statements and addition. However, we could also have done it using a single operator, the bitwise “or” operator. Recall that the logical “or” returns true if either of its operands is true. Similarly, the bitwise “or” operator returns 1 if either of its operands is 1. Consider then the result of applying this operator between the following two binary numbers:
01000001
| 00100000
In the leftmost place, 0 “or” 0 gives us 0. In the next place to the right, 1 “or” 0 returns 1. And so on until we get:
01100001
01000001 | 00100000
. The bitwise “or” operator is represented as |
, not to be confused with the other “or” operator ||
. Actually, we could even write this in C as 'A' | 32
.To go from “a” to “A,” we need to turn a 1 bit to a 0. We simply invert our mask and use the bitwise “and” operator:
01100001
& 11011111
&
, returns 1 only if both operands are 1.Take a look at a program which converts a user-provided letter to lowercase:
/****************************************************************************
* tolower.c
*
* David J. Malan
* malan@harvard.edu
*
* Converts an uppercase character to lowercase.
*
* Demonstrates bitwise operators.
***************************************************************************/
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
int main(void)
{
// prompt user for an uppercase character
char c;
do
{
printf("Uppercase character please: ");
c = GetChar();
}
while (c < 'A' || c > 'Z');
// print number in lowercase
printf("%c\n", c | 0x20);
// that's all folks
return 0;
}
00100000
or 32. To see how 0x20 represents 00100000
, remember that 2 in binary is 0010 and 0 in binary is 0000. Since each hexadecimal digit represents 4 bits, we simply put those two binary numbers side by side and we get 00100000
. Hexadecimal is convenient for representing very large numbers. Try for yourself to understand how 0xdf is a hexadecimal representation of the other mask we discussed above, 11011111
.As another demonstration of bitwise operators, check out binary.c
:
/****************************************************************************
* binary.c
*
* David J. Malan
* malan@harvard.edu
*
* Displays a number in binary.
*
* Demonstrates bitwise operators.
***************************************************************************/
#include <cs50.h>
#include <stdio.h>
int main(void)
{
// prompt user for number
int n;
do
{
printf("Non-negative integer please: ");
n = GetInt();
}
while (n < 0);
// print number in binary
for (int i = sizeof(int) * 8 - 1; i >= 0; i--)
{
int mask = 1 << i;
if (n & mask)
printf("1");
else
printf("0");
}
printf("\n");
// that's all folks
return 0;
}
<<
, the left bitshift operator. This operator shifts the bits of the left operand by a number equal to the right operand. So we take the integer 1 (represented in binary as 31 0’s followed by a 1) and shift its bits to the left by 31 on the first iteration of the loop. This gives us the integer represented in binary as a 1 followed by 31 0’s. Incidentally, bitshifting a number to the left has the effect of doubling it since each digit represents a power of 2.n & mask
will be true if n
has a 1 in that same digit’s place. Otherwise, it will be false. If it’s true, we print out a 1 for that digit’s place and if it’s false, we print out a 0 for that digit’s place.bool
. Unfortunately, a bool
is represented by 8 bits in C when we really only need 1 bit. If you wanted to decrease the amount of memory required to store your array, you might use bitwise operators to manipulate the individual bits in a byte.^
. It returns true if one and only one of its operands is a 1. There are also the “not” (~
) and right bitshift (>>
) operators among others.The flags of Germany (top) and France (bottom) lend themselves quite naturally to compression because they have a lot of repetition.:
clang
or make
commands, your computer has actually been carrying out four steps to translate your C source code to binary:
Starting with the simplest of programs:
#include <stdio.h>
int main(void)
{
printf("hello, world");
return 0;
}
#
symbol are executed during the pre-processing stage. In the program above, the contents of stdio.h
are copied and pasted into our source code.movl
, pushl
, and subl
. Your CPU (e.g. an Intel processor) is built to understand these instructions. To see this assembly language, you can run clang
on your C source code with the -S
flag.printf
needs to be linked with our program.You open up your web browser and navigate to facebook.com. What’s actually happening? Your computer is communicating with Facebook’s servers via HTTP, hypertext transfer protocol. HTTP is merely a set of conventions which dictate how browsers and servers should interact with each other, much like the set of conventions which dictate how humans interact with humans, e.g. by shaking hands and greeting each other. When your web browser navigates to facebook.com, it’s effectively extending its hand for a handshake along with a message like so:
GET / HTTP/1.1
Host: www.facebook.com
GET /
denotes that we’re requesting the homepage. HTTP/1.1
alerts Facebook to the fact that we’re speaking version 1.1 of HTTP. The Host
parameter indicates that the particular website we’re asking to access.
The very first line of code in an HTML file is the following:
<!DOCTYPE HTML>
HTML is composed of tags which begin and end with angle brackets. The root tag is <html>
. Within that, there are the <head>
and <body>
tags. Within the <head>
tag is the <title>
tag, which is responsible for the text that appears in the tab in your browser:
<title id="pageTitle">Welcome to Facebook - Log In, Sign Up or Learn More</title>
As with other programming languages, we’ll write the “hello, world” version of HTML:
<!DOCTYPE html>
<html>
<head>
<title>hello, world</title>
</head>
<body>
hello, world
</body>
</html>
<html>
is the start tag and </html>
is the end tag. In HTML, you have to tell the browser when to start doing something and when to stop doing something. If you want text to be bold, you have to tell the browser to start making text bold and to stop making text bold.hello.html
, we can drag it into a browser to see the text “hello, world” rendered. This will also be displayed in the tab, as per the <title>
tag.We can spruce up our HTML by linking to two files written in two other languages, CSS and JavaScript:
<!DOCTYPE html>
<html>
<head>
<link href="styles.css" rel="stylesheet"/>
<script src="scripts.js"></script>
<title>hello, world</title>
</head>
<body>
hello, world
</body>
</html>
<link>
tag has two attributes, href
and rel
, which modify its behavior. This tag tells the browser to include a CSS file named styles.css
which dictates certain aesthetics. You can think of the <link>
tag as the functional equivalent of the #include
directive in C. Here, the <script>
tag pulls in JavaScript code.To make some of our text bold, we use the <b>
tag:
<!DOCTYPE html>
<html>
<head>
<title>hello, world</title>
</head>
<body>
hello, <b>world</b>
</body>
</html>
Let’s try to add a link to another webpage:
<!DOCTYPE html>
<html>
<head>
<title>hello, world</title>
</head>
<body>
hello, <b>world</b>
my favorite website is youtube.com
</body>
</html>
<br/>
tag. This tag is an example of a tag that doesn’t come in a pair.Unfortunately, “youtube.com” isn’t clickable. To make it actually lead to YouTube, we need to use the <a>
tag:
<!DOCTYPE html>
<html>
<head>
<title>hello, world</title>
</head>
<body>
hello, <b>world</b>
my favorite website is
<a href="http://www.youtube.com">youtube.com</a>
</body>
</html>
href
attribute to something malicious and the clickable text to something innocuous.