string
data type is one that we provided to you with the CS50 Library and it’s really just a synonym for char *
. The *
implies that this data type is a pointer, or a memory address. A memory address is simply a number that designates a particular byte among the billions that make up your RAM.Consider this simple example:
#include <cs50.h>
#include <stdio.h>
int main(void)
{
printf("Give me a string: ");
string s = GetString();
printf("Give me another string: ");
string t = GetString();
if (s == t)
printf("You typed the same thing!\n");
else
printf("You typed something different!\n");
return 0;
}
s
, it contains a garbage value from when its memory was being used by another function or program. GetString
does the work of finding memory in which to put the string the user types. Then it returns the address of this memory (specifically the address of othe first character in the string) to be stored in s
. Given this address of the first char
, the rest of the string can be found by iterating until the null terminator character \0
is encountered. Considering this, strlen
is pretty simple: it takes a pointer, iterates to the null terminator, and counts the number of steps it takes.s
might be 123 whereas the value of t
might be 456. Thus, we when we compare s
and t
we’re really comparing 123 and 456. In general, we don’t care what these values are, so when we represent pointers pictorially, we usually just draw an arrow to the memory that they point to.Instead of the ==
operator, we really should use the strcmp
function:
/****************************************************************************
* compare2.c
*
* David J. Malan
* malan@harvard.edu
*
* Compares two strings.
*
* Demonstrates strings as pointers to characters.
***************************************************************************/
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
// get line of text
printf("Say something: ");
char* s = GetString();
// get another line of text
printf("Say something: ");
char* t = GetString();
// try to compare strings
if (s != NULL && t != NULL)
{
if (strcmp(s, t) == 0)
printf("You typed the same thing!\n");
else
printf("You typed different things!\n");
}
return 0;
}
strcmp
returns 0 if two strings are equal, and -1 or 1 depending on the alphabetical ordering of the strings. This is useful in searching and sorting strings.Now that we know how to compare strings, let’s try copying them:
/****************************************************************************
* copy1.c
*
* David J. Malan
* malan@harvard.edu
*
* Tries and fails to copy two strings.
*
* Demonstrates strings as pointers to arrays.
***************************************************************************/
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// get line of text
printf("Say something: ");
string s = GetString();
if (s == NULL)
return 1;
// try (and fail) to copy string
string t = s;
// change "copy"
printf("Capitalizing copy...\n");
if (strlen(t) > 0)
t[0] = toupper(t[0]);
// print original and "copy"
printf("Original: %s\n", s);
printf("Copy: %s\n", t);
return 0;
}
s == NULL
because when we ask the operating system for memory, it might say no. For example, maybe we asked the operating system for trillions of bytes that it just doesn’t have. GetString
returns the special value NULL
if something has gone wrong. NULL
is actually a pointer that represents the memory address 0. The operating system will never use this memory, it is reserved for signifying an error.s
to a new variable t
does not actually make a copy of s
. To demonstrate this, the rest of the program purports to capitalize the first letter of only the copy, but if we run the program we see that both the original and the copy are capitalized. Thinking of s
and t
pictorially, they are both arrows to the same memory.To fix this problem, we need to create a new chunk of memory in which we can store a true copy of the string which s
points to:
/****************************************************************************
* copy2.c
*
* David J. Malan
* malan@harvard.edu
*
* Copies a string.
*
* Demonstrates strings as pointers to arrays.
***************************************************************************/
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// get line of text
printf("Say something: ");
char* s = GetString();
if (s == NULL)
return 1;
// allocate enough space for copy
char* t = malloc((strlen(s) + 1) * sizeof(char));
if (t == NULL)
return 1;
// copy string
int n = strlen(s);
for (int i = 0; i < n; i++)
t[i] = s[i];
t[n] = '\0';
// change copy
printf("Capitalizing copy...\n");
if (strlen(t) > 0)
t[0] = toupper(t[0]);
// print original and copy
printf("Original: %s\n", s);
printf("Copy: %s\n", t);
// free memory
free(s);
free(t);
return 0;
}
malloc
, short for “memory allocation.” This function takes a single argument, the number of bytes we’re asking the operating system for. In this case, we calculate the number of bytes to be strlen(s) + 1
, enough to store all of the characters of s
plus one additional byte for the null terminator character. sizeof
is a function which returns the size in bytes of a data type. Although a char
will almost always be 1 byte, we want to make sure our program works on any systems. If s
holds the string “hello,” then we ask for 6 bytes. Assuming all goes well, malloc
returns the address of the first byte of a chunk of memory. If all does not go well, malloc
returns NULL
, which is why we check for this special value.s
, we actually iterate over the characters of s
and copy them one by one to t
. We have to make sure to add the \0
at the end of t
. Alternatively, we could iterate to i <= n
instead of i < n
so that the \0
from s
would also be copied.t
. This time, however, it works! The original remains uncapitalized.free
to return the memory used by s
and t
to the operating system. Interestingly, this has no effect on the contents of the memory. They will still contain the strings as before. Similarly, when you drag a file into the trash can on your computer, it has no effect on the contents of the file: they’re still stored on your hard disk somewhere. When you empty the trash can on your computer, the contents of the file remain, but the operating system deletes its record of where the contents are acctually stored. Still, the file itself can actually be recovered by some forensic methods that we’ll make use of in a future problem set. If you really want to make sure that the contents of the file go way, you need to overwrite the 0’s and 1’s on the hard drive. Choosing the “Secure Empty Trash” option on a Mac does this for you.char*
, but this is equivalent to char *
.Thus far, we have accessed the characters in a string using square bracket notation. This program demonstrates an alternative way to do so:
/****************************************************************************
* pointers.c
*
* David J. Malan
* malan@harvard.edu
*
* Prints a given string one character per line.
*
* Demonstrates pointer arithmetic.
***************************************************************************/
#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// get line of text
char* s = GetString();
if (s == NULL)
return 1;
// print string, one character per line
for (int i = 0, n = strlen(s); i < n; i++)
printf("%c\n", *(s+i));
// free string
free(s);
return 0;
}
*(s+i)
evaluate to? Let’s consider the first iteration of the loop, when i
is 0. Then *(s+i)
is really just *s
. In this context, *
is the dereferencing operator, which means “go to the memory address of.” Thus, if s
points to the string “hello,” then *s
is the letter “h.” On the second iteration of the loop, i
is 1 and the expression is *(s+1)
, which means “go to the memory address 1 byte after s
.” Functionally, *(s+i)
is equivalent to s[i]
.free
any memory allocated by GetString
or malloc
! If you allocate memory within a loop, you have to free it within the loop since the variable won’t exist outside the scope of the loop. Soon we’ll introduce a program called Valgrind that will help you track down memory leaks in your code.char*
, the underlying data type is a single-byte char
so it makes sense that adding 1 would advance to the next char
. However, if s
were an int[]
, the example above would still work. C is smart enough to evaluate i
to 4 bytes instead of 1 because the underlying data type is a four-byte int
.Recall our buggy swap
function:
void swap(int a, int b)
{
int tmp = a;
a = b;
b = tmp;
}
Now that we know about pointers, we can fix this function quite easily:
void swap(int* a, int* b)
{
int tmp = *a;
*a = *b;
*b = tmp;
}
a
and b
are now pointers, not integers. Our function has changed so that when we call swap
, we have to pass in addresses, not integers. So if x
and y
are stored at memory addresses 123 and 124, respectively, then a
and b
will have the values 123 and 124, respectively.*a
, we’re getting the contents of memory at address a
and assigning them to tmp
. *a
evaluates to 1 (the value of x
) so this is what gets stored in tmp
. The second line of the function reads as “take the contents of memory at b
and assign them to the contents of memory at a
.” Thus x
becomes 2. Finally, we assign to the contents of memory at b
the value of tmp
. Thus, y
becomes 1.The final working version is as follows:
/****************************************************************************
* swap.c
*
* David J. Malan
* malan@harvard.edu
*
* Swaps two variables' values.
*
* Demonstrates passing by reference.
***************************************************************************/
#include <stdio.h>
// function prototype
void swap(int* a, int* b);
int main(void)
{
int x = 1;
int y = 2;
printf("x is %d\n", x);
printf("y is %d\n", y);
printf("Swapping...\n");
swap(&x, &y);
printf("Swapped!\n");
printf("x is %d\n", x);
printf("y is %d\n", y);
return 0;
}
/**
* Swap arguments' values.
*/
void swap(int* a, int* b)
{
int tmp = *a;
*a = *b;
*b = tmp;
}
swap
. Previously, we wrote swap(x, y)
, but as we said, we have to pass memory addresses, not integers to swap
now. &
is the “address-of” operator, so swap(&x, &y)
means “pass the addresses of x
and y
to swap
.”free
. The heap is where GetString
and malloc
get their memory from. The memory itself is not intrinsically different from the memory on the stack; it’s simply in a different location which follows different conventions.Consider the following simple program:
int main(void)
{
int* x;
int* y;
x = malloc(sizeof(int));
*x = 42;
*y = 13;
y = x;
*y = 13;
}
malloc
returns memory addresses, so we can assign its return value to x
. Writing *x = 42
places the integer 42 in the memory which x
points to. *y = 13
is problematic, though. We haven’t initialized y
, so it still contains some garbage value. When we try to access that garbage value as a memory address, we’ll probably get a segmentation fault.