1 00:00:07,360 --> 00:00:09,360 Let's talk about arrays. 2 00:00:09,360 --> 00:00:12,780 So why would we ever want to use arrays? 3 00:00:12,780 --> 00:00:17,210 Well let's say you have a program that needs to store 5 student IDs. 4 00:00:17,210 --> 00:00:21,270 It might seem reasonable to have 5 separate variables. 5 00:00:21,270 --> 00:00:24,240 For reasons we'll see in a bit, we'll start counting from 0. 6 00:00:24,240 --> 00:00:30,700 The variables we'll have will be int id0, int id1, and so on. 7 00:00:30,700 --> 00:00:34,870 Any logic we want to perform on a student ID will need to be copied and pasted 8 00:00:34,870 --> 00:00:36,870 for each of these student IDs. 9 00:00:36,870 --> 00:00:39,710 If we want to check which students happen to be in CS50, 10 00:00:39,710 --> 00:00:43,910 we'll first need to check if id0 represents the student in the course. 11 00:00:43,910 --> 00:00:48,070 Then to do the same for the next student, we'll need to copy and paste the code for id0 12 00:00:48,070 --> 00:00:54,430 and replace all occurrences of id0 with id1 and so on for id2, 3, and 4. 13 00:00:54,430 --> 00:00:57,560 >> As soon as you hear that we need to copy and paste, 14 00:00:57,560 --> 00:01:00,440 you should start thinking that there is a better solution. 15 00:01:00,440 --> 00:01:05,360 Now what if you realize you don't need 5 student IDs but rather 7? 16 00:01:05,360 --> 00:01:09,570 You need to go back into your source code and add in an id5, an id6, 17 00:01:09,570 --> 00:01:14,260 and copy and paste the logic for checking if the IDs belong to the class for these 2 new IDs. 18 00:01:14,260 --> 00:01:19,600 There is nothing connecting all these IDs together, and so there is no way of asking 19 00:01:19,600 --> 00:01:22,040 the program to do this for IDs 0 through 6. 20 00:01:22,040 --> 00:01:26,120 Well now you realize you have 100 student IDs. 21 00:01:26,120 --> 00:01:30,770 It's starting to seem less than ideal to need to separately declare each of these IDs, 22 00:01:30,770 --> 00:01:33,760 and copy and paste any logic for those new IDs. 23 00:01:33,760 --> 00:01:38,380 But maybe we are determined, and we do it for all 100 students. 24 00:01:38,380 --> 00:01:42,240 But what if you don't know how many students there actually are? 25 00:01:42,240 --> 00:01:47,320 There are just some n students and your program has to ask the user what that n is. 26 00:01:47,320 --> 00:01:50,250 Uh oh. This isn't going to work very well. 27 00:01:50,250 --> 00:01:53,820 Your program only works for some constant number of students. 28 00:01:53,820 --> 00:01:57,520 >> Solving all of these problems is the beauty of arrays. 29 00:01:57,520 --> 00:01:59,930 So what is an array? 30 00:01:59,930 --> 00:02:04,480 In some programming languages an array type might be able to do a bit more, 31 00:02:04,480 --> 00:02:09,960 but here we'll focus on the basic array data structure just as you'll see it in C. 32 00:02:09,960 --> 00:02:14,030 An array is just a big block of memory. That's it. 33 00:02:14,030 --> 00:02:17,770 When we say we have an array of 10 integers, that just means we have some block 34 00:02:17,770 --> 00:02:20,740 of memory that is large enough to hold 10 separate integers. 35 00:02:29,930 --> 00:02:33,410 Assuming that an integer is 4 bytes, this means that an array of 10 integers 36 00:02:33,410 --> 00:02:37,180 is a continuous block of 40 bytes in memory. 37 00:02:42,660 --> 00:02:46,280 Even when you use multidimensional arrays, which we won't go in to here, 38 00:02:46,280 --> 00:02:49,200 it's still just a big block of memory. 39 00:02:49,200 --> 00:02:51,840 The multidimensional notation is just a convenience. 40 00:02:51,840 --> 00:02:55,640 If you have a 3 by 3 multidimensional array of integers, 41 00:02:55,640 --> 00:03:00,650 then your program will really just treat this as a big block of 36 bytes. 42 00:03:00,650 --> 00:03:05,460 The total number of integers is 3 times 3, and each integer takes up 4 bytes. 43 00:03:05,460 --> 00:03:07,750 >> Let's take a look at a basic example. 44 00:03:07,750 --> 00:03:10,660 We can see here 2 different ways of declaring arrays. 45 00:03:15,660 --> 00:03:18,580 We'll have to comment 1 of them out for the program to compile 46 00:03:18,580 --> 00:03:20,900 since we declare x twice. 47 00:03:20,900 --> 00:03:25,140 We'll take a look at some of the differences between these 2 types of declarations in a bit. 48 00:03:25,140 --> 00:03:28,560 Both of these lines declare an array of size N, 49 00:03:28,560 --> 00:03:30,740 where we have #define N as 10. 50 00:03:30,740 --> 00:03:34,460 We could just as easily have asked the user for a positive integer 51 00:03:34,460 --> 00:03:37,250 and used that integer as a number of elements in our array. 52 00:03:37,250 --> 00:03:41,960 Like our student ID example before, this is kind of like declaring 10 completely separate 53 00:03:41,960 --> 00:03:49,000 imaginary variables; x0, x1, x2, and so on up to xN-1. 54 00:03:57,270 --> 00:04:00,840 Ignoring the lines where we declare the array, notice the square brackets intact 55 00:04:00,840 --> 00:04:02,090 inside the for loops. 56 00:04:02,090 --> 00:04:09,660 When we write something like x[3], which I'll just read as x bracket 3, 57 00:04:09,660 --> 00:04:13,090 you can think of it like asking for the imaginary x3. 58 00:04:13,090 --> 00:04:17,519 Notice than with an array of size N, this means that the number inside of the brackets, 59 00:04:17,519 --> 00:04:22,630 which we'll call the index, can be anything from 0 to N-1, 60 00:04:22,630 --> 00:04:25,660 which is a total of N indices. 61 00:04:25,660 --> 00:04:28,260 >> To think about how this actually works 62 00:04:28,260 --> 00:04:31,260 remember that the array is a big block of memory. 63 00:04:31,260 --> 00:04:37,460 Assuming that an integer is 4 bytes, the entire array x is a 40 byte block of memory. 64 00:04:37,460 --> 00:04:41,360 So x0 refers to the very first 4 bytes of the block. 65 00:04:45,810 --> 00:04:49,230 X[1] refers to the next 4 bytes and so on. 66 00:04:49,230 --> 00:04:53,760 This means that the start of x is all the program ever needs to keep track of. 67 00:04:55,660 --> 00:04:59,840 If you want to use x[400], then the program knows that this is equivalent 68 00:04:59,840 --> 00:05:03,460 to just 1,600 bytes after the start of x. 69 00:05:03,460 --> 00:05:08,780 Where'd we get 1,600 bytes from? It's just 400 times 4 bytes per integer. 70 00:05:08,780 --> 00:05:13,170 >> Before moving on, it's very important to realize that in C 71 00:05:13,170 --> 00:05:17,080 there is no enforcement of the index that we use in the array. 72 00:05:17,080 --> 00:05:23,180 Our big block is only 10 integers long, but nothing will yell at us if we write x[20] 73 00:05:23,180 --> 00:05:26,060 or even x[-5]. 74 00:05:26,060 --> 00:05:28,240 The index doesn't even have to be a number. 75 00:05:28,240 --> 00:05:30,630 It can be any arbitrary expression. 76 00:05:30,630 --> 00:05:34,800 In the program we use the variable i from the for loop to index into the array. 77 00:05:34,800 --> 00:05:40,340 This is a very common pattern, looping from i=0 to the length of the array, 78 00:05:40,340 --> 00:05:43,350 and then using i as the index for the array. 79 00:05:43,350 --> 00:05:46,160 In this way you effectively loop over the entire array, 80 00:05:46,160 --> 00:05:50,600 and you can either assign to each spot in the array or use it for some calculation. 81 00:05:50,600 --> 00:05:53,920 >> In the first for loop, i starts at 0, 82 00:05:53,920 --> 00:05:58,680 and so it will assign to the 0 spot in the array, the value 0 times 2. 83 00:05:58,680 --> 00:06:04,370 Then i increments, and we assign the first spot in the array the value 1 times 2. 84 00:06:04,370 --> 00:06:10,170 Then i increments again and so on up until we assign to position N-1 in the array 85 00:06:10,170 --> 00:06:13,370 the value N-1 times 2. 86 00:06:13,370 --> 00:06:17,810 So we've created an array with the first 10 even numbers. 87 00:06:17,810 --> 00:06:21,970 Maybe evens would have been a bit better name for the variable than x, 88 00:06:21,970 --> 00:06:24,760 but that would have given things away. 89 00:06:24,760 --> 00:06:30,210 The second for loop then just prints the values that we have already stored inside of the array. 90 00:06:30,210 --> 00:06:33,600 >> Let's try running the program with both types of array declarations 91 00:06:33,600 --> 00:06:36,330 and take a look at the output of the program. 92 00:06:51,450 --> 00:06:57,020 As far as we can see, the program behaves the same way for both types of declarations. 93 00:06:57,020 --> 00:07:02,230 Let's also take a look at what happens if we change the first loop to not stop at N 94 00:07:02,230 --> 00:07:05,040 but rather say 10,000. 95 00:07:05,040 --> 00:07:07,430 Way beyond the end of the array. 96 00:07:14,700 --> 00:07:17,210 Oops. Maybe you've seen this before. 97 00:07:17,210 --> 00:07:20,440 A segmentation fault means your program has crashed. 98 00:07:20,440 --> 00:07:24,430 You start seeing these when you touch areas of memory you shouldn't be touching. 99 00:07:24,430 --> 00:07:27,870 Here we are touching 10,000 places beyond the start of x, 100 00:07:27,870 --> 00:07:31,920 which evidently is a place in memory we shouldn't be touching. 101 00:07:31,920 --> 00:07:37,690 So most of us probably wouldn't accidentally put 10,000 instead of N, 102 00:07:37,690 --> 00:07:42,930 but what if we do something more subtle like say write less than or equal to N 103 00:07:42,930 --> 00:07:46,830 in the for loop condition as opposed to less than N. 104 00:07:46,830 --> 00:07:50,100 Remember that an array only has indices from 0 to N-1, 105 00:07:50,100 --> 00:07:54,510 which means that index N is beyond the end of the array. 106 00:07:54,510 --> 00:07:58,050 The program might not crash in this case, but it's still an error. 107 00:07:58,050 --> 00:08:01,950 In fact, this error is so common that it has it's own name, 108 00:08:01,950 --> 00:08:03,970 an off by 1 error. 109 00:08:03,970 --> 00:08:05,970 >> That's it for the basics. 110 00:08:05,970 --> 00:08:09,960 So what are the major differences between the 2 types of array declarations? 111 00:08:09,960 --> 00:08:13,960 One difference is where the big block of memory goes. 112 00:08:13,960 --> 00:08:17,660 In the first declaration, which I'll call the bracket-array type, 113 00:08:17,660 --> 00:08:20,300 though this is by no means a conventional name, 114 00:08:20,300 --> 00:08:22,480 it will go on the stack. 115 00:08:22,480 --> 00:08:27,450 Whereas in the second, which I'll call the pointer-array type, it will go on the heap. 116 00:08:27,450 --> 00:08:32,480 This means that when the function returns, the bracket array will automatically be deallocated, 117 00:08:32,480 --> 00:08:36,419 whereas as you must explicitily call free on the pointer array 118 00:08:36,419 --> 00:08:38,010 or else you have a memory leak. 119 00:08:38,010 --> 00:08:42,750 Additionally, the bracket array isn't actually a variable. 120 00:08:42,750 --> 00:08:45,490 This is important. It's just a symbol. 121 00:08:45,490 --> 00:08:49,160 You can think of it as a constant that the compiler chooses for you. 122 00:08:49,160 --> 00:08:52,970 This means that we can't do something like x++ with the bracket type, 123 00:08:52,970 --> 00:08:56,240 though this is perfectly valid with the pointer type. 124 00:08:56,240 --> 00:08:58,270 >> The pointer type is a variable. 125 00:08:58,270 --> 00:09:01,510 For the pointer type, we have 2 separate blocks of memory. 126 00:09:01,510 --> 00:09:06,060 The variable x itself is stored in the stack and is just a single pointer, 127 00:09:06,060 --> 00:09:08,620 but the big block of memory is stored on the heap. 128 00:09:08,620 --> 00:09:11,010 The variable x on the stack just stores the address 129 00:09:11,010 --> 00:09:14,010 of the big block of memory on the heap. 130 00:09:14,010 --> 00:09:17,370 One implication of this is with the size of operator. 131 00:09:17,370 --> 00:09:22,480 If you ask for the size of the bracket array, it will give you the size of the big block of memory, 132 00:09:22,480 --> 00:09:24,620 something like 40 bytes, 133 00:09:24,620 --> 00:09:26,920 but if you ask for the size of the pointer type of array, 134 00:09:26,920 --> 00:09:32,740 it will give you the size of the variable x itself, which on the appliance is likely just 4 bytes. 135 00:09:32,740 --> 00:09:36,530 Using the pointer-array type, it is impossible to directly ask for 136 00:09:36,530 --> 00:09:38,530 the size of the big block of memory. 137 00:09:38,530 --> 00:09:42,530 This isn't usually much of a restriction since we very rarely want the size 138 00:09:42,530 --> 00:09:46,980 of the big block of memory, and we can usually calculate it if we need it. 139 00:09:46,980 --> 00:09:51,490 >> Finally, the bracket array happens to provide us with a shortcut for initializing an array. 140 00:09:51,490 --> 00:09:56,130 Let's see how we could write the first 10 even integers using the shortcut initilization. 141 00:10:11,220 --> 00:10:14,470 With the pointer array, there isn't a way to do a shortcut like this. 142 00:10:14,470 --> 00:10:18,120 This is just an introduction to what you can do with arrays. 143 00:10:18,120 --> 00:10:20,990 They show up in almost every program you write. 144 00:10:20,990 --> 00:10:24,390 Hopefully you can now see a better way of doing the student IDs example 145 00:10:24,390 --> 00:10:26,710 from the beginning of the video. 146 00:10:26,710 --> 00:10:29,960 >> My name is Rob Bowden, and this is CS50.