[MUSIC PLAYING]

DOUG LLOYD: All right. Working with single variables is pretty fun. But what if we want to work with a lot of variables, and we don't want a bunch of different names flying around our code? In this case, arrays are going to come in really handy. Arrays are a fundamental data structure in just about any programming language you will use, and they're really, really useful, particularly, as we'll see, in CS50.

We use arrays to hold values of the same data type at contiguous memory locations. That is to say, an array lets us group a bunch of integers, characters, or floats together in memory, right next to each other, and work with them without having to give each one its own unique name, which can get cumbersome after a little while.

Now, one way to think about arrays is to picture your local post office for a second. So step away from programming, close your eyes, and visualize your local post office. In most post offices, there's a large bank of post office boxes on the wall.
An array is a giant block of contiguous memory, the same way that the mail bank in your post office is a large space on the wall. An array is partitioned into small, identically sized blocks of space, each of which is called an element, in the same way that the wall of the post office has been partitioned into small, identically sized blocks of space, which we call PO boxes. Each element of the array can store a certain amount of data, just as each post office box can hold a certain amount of mail.

Each element of the array holds a value of the same data type, such as int or char, just as your post office box only fits things of a similar type, such as letters or small packages. Lastly, we can access each element of the array directly by its index number, just as we can get to our post office box by knowing its box number. Hopefully, that analogy helps you get your head around the idea of arrays by relating them to something you're probably already familiar with.

In C, the elements of an array are indexed starting from 0, not from 1. And this is really important.
In fact, C's array indexing, which always starts at 0, is the reason why we in CS50, and computer scientists in general, frequently count from 0. So if an array consists of n elements, the first element of that array is located at index 0, and the last element is located at index n minus 1. Again, if there are n elements in our array, the last index is n minus 1.

So if our array has 50 elements, the first element is located at index 0, and the last element is located at index 49. Unfortunately, or fortunately, depending on your perspective, C is very lenient here. It will not prevent you from going out of bounds of your array. You could try to access index -3 of your array, or index 59, even though your array only has 50 elements. That won't stop your program from compiling (at most, the compiler may warn you), but the behavior is undefined. At run time, you might encounter a dreaded segmentation fault if you access memory outside the bounds of what you asked for, or your program might silently read or overwrite some other data. So do be careful.

What does an array declaration look like? How do we code an array into existence, like we code any other variable? There are three parts to an array declaration: a type, a name, and a size.
This is very similar to a variable declaration, which is just a type and a name; the size is the part that's special to arrays, because we are getting a bunch of variables at the same time.

The type is what kind of variable you want each element of the array to be. Do you want an array of integers? Then your data type should be int. Do you want an array of doubles or floats? Your data type should be double or float. The name is what you want to call your array. What do you want to name this giant bank of integers or floats or chars or doubles, or whatever you have? Pretty self-explanatory.

Lastly, the size, which goes inside square brackets, is how many elements you would like your array to contain. How many integers do you want? How many floats do you want?

So, for example, int student_grades[40]; declares an array called student_grades, which consists of 40 integers. Here's another example: double menu_prices[8]; creates an array called menu_prices, with room in memory for eight doubles.
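To make the declaration syntax and the 0 to n minus 1 index range concrete, here is a minimal C sketch. The array names follow the two examples above; the ARRAY_LENGTH macro and the set_first_and_last_grade function are hypothetical helpers added for illustration, not something from the video.

```c
#include <stddef.h>

// The two example declarations: room for 40 ints and 8 doubles.
int student_grades[40];
double menu_prices[8];

// Hypothetical helper: number of elements in an array whose declaration is in scope.
// It gives the wrong answer for an array that was passed into a function as a parameter.
#define ARRAY_LENGTH(arr) (sizeof(arr) / sizeof((arr)[0]))

// Hypothetical helper: with n elements, the valid indices run from 0 to n - 1.
void set_first_and_last_grade(int first, int last)
{
    size_t n = ARRAY_LENGTH(student_grades);  // 40
    student_grades[0] = first;                // first element is at index 0
    student_grades[n - 1] = last;             // last element is at index 39, not 40
}
```

Calling set_first_and_last_grade(90, 75) stores 90 at student_grades[0] and 75 at student_grades[39]; index 40 would already be out of bounds.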
If you think of every element of an array of some data type (for example, a single element of an array of type int) the same way you would think of any other variable of that type, all the familiar operations that we discussed previously in the Operations video will make sense. So here, we could declare an array of Booleans called truthtable, with room for 10 Booleans: bool truthtable[10];

And then, just as we could assign a value to any other variable of type bool, we could write truthtable[2] = false;. The number in square brackets indicates which element of the truth table we mean: here, the third element, because remember, we're counting from 0. So truthtable[2] = false; sets the third element to false, just as we could assign false to any other Boolean variable.

We can also use elements in conditions: if (truthtable[7] == true), which is to say, if the eighth element of truthtable is true, maybe we want to print a message to the user with printf("TRUE!\n");. What stops us from writing truthtable[10] = true;? Well, nothing does, but it's pretty dangerous, because remember, we have an array of 10 Booleans. So the highest valid index is 9.
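Here is a small C sketch of the element operations just described, assuming a global bool truthtable[10]. The function name demo_truthtable is hypothetical. Note that bool, true, and false come from <stdbool.h>.

```c
#include <stdbool.h>
#include <stdio.h>

// Room for 10 Booleans; the valid indices are 0 through 9.
bool truthtable[10];

// Hypothetical demo: each element behaves like an ordinary bool variable.
void demo_truthtable(void)
{
    truthtable[2] = false;  // third element, since we count from 0
    truthtable[7] = true;   // eighth element

    // Same test as in the video; if (truthtable[7]) would be equivalent.
    if (truthtable[7] == true)
    {
        printf("TRUE!\n");
    }

    // truthtable[10] = true;  // out of bounds: index 10 does not exist
}
```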
This program will compile, but writing to truthtable[10] means writing to whatever happens to sit in memory just past the end of the array. The behavior is undefined: we might suffer a segmentation fault, we might quietly corrupt some other data, or we might appear to get away with it. So what I'm doing here compiles, but it's definitely not a good move.

Now, when you declare and initialize an array at the same time, there's a special syntax you can use to fill the array with its starting values. It can get cumbersome to declare an array of size 100 and then have to say, element 0 equals this; element 1 equals this; element 2 equals that. What's the point, right?

If it's a small array, you could do something like this: bool truthtable[3] = { false, true, true }; that is, an open curly brace, then a comma-separated list of the elements you want to put in the array, then a close curly brace and a semicolon. This creates an array of size three called truthtable, with elements false, true, and true. In fact, this initialization syntax produces exactly the same array as declaring bool truthtable[3]; and then assigning each element individually: truthtable[0] = false;, truthtable[1] = true;, and truthtable[2] = true;.

Similarly, we could fill an array by iterating over all of its elements with a loop, which is a strongly recommended at-home exercise.
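The claim that the initializer list and the element-by-element assignments give the same array can be checked with a short C sketch. The function name initializer_matches_assignments is hypothetical.

```c
#include <stdbool.h>

// Hypothetical check: returns true if both ways of filling a 3-element bool array agree.
bool initializer_matches_assignments(void)
{
    // Declared and initialized in a single statement.
    bool with_initializer[3] = { false, true, true };

    // Declared first, then filled one element at a time.
    bool by_element[3];
    by_element[0] = false;
    by_element[1] = true;
    by_element[2] = true;

    // Compare the two arrays element by element.
    for (int i = 0; i < 3; i++)
    {
        if (with_initializer[i] != by_element[i])
        {
            return false;
        }
    }
    return true;
}
```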
How do you create an array of 100 integers where every element of the array is its own index? For example, in an array of 100 integers, in the first element we want to put 0, in the second element we want to put 1, in the third element we want to put 2, and so on. That's a really good exercise to do at home.

Here, it doesn't look like much has changed: bool truthtable[] = { false, true, true }; But notice that this time, I've omitted the number between the square brackets. If you're using this special initialization syntax to create an array, you don't need to indicate the size of the array beforehand. The compiler is smart enough to know that you want an array of size three, because you put three elements to the right of the equals sign. If you had put four, it would have given you a truth table of size four, and so on.

Arrays are not restricted to a single dimension, which is pretty cool. You can have as many size specifiers as you wish. So for example, if you want to create a board for the game Battleship, which, if you've ever played it, is played with pegs on a 10 by 10 grid, you could create an array like this.
You could say bool battleship[10][10];, that is, square bracket 10, close square bracket, square bracket 10, close square bracket.

And then, you can choose to interpret this in your mind as a 10 by 10 grid of cells. In memory, though, it's laid out as 100 contiguous elements, one row of 10 after another, just like a single-dimensional array of 100 bools. The same goes for three dimensions, or four, or five: the total number of elements is all of the size specifiers multiplied together, and they're all stored contiguously.

But in terms of organization, visualization, and human perception, it can be a lot easier to work with a grid if you're working on a game like tic-tac-toe or Battleship. It's a great abstraction: instead of having to think about a tic-tac-toe board as a line of nine squares or a Battleship board as a line of 100 squares, you can think of a three by three grid or a 10 by 10 grid, which is probably a lot easier to perceive.

Now, something really important about arrays. We can treat each individual element of the array as a variable. We saw that earlier when we were assigning the value true to certain Booleans or testing them in conditionals. But we can't treat entire arrays themselves as variables.
We cannot, for example, assign one array to another using the assignment operator. It's not legal C.

What we would be trying to do in that example is copy one array into another. If we want to do that, we need to use a loop to copy over each individual element, one at a time. I know it's a little time-consuming.

So for example, suppose an array called foo holds the values 1, 2, 3, 4, and 5, and we write bar = foo;. Would this work? Well, no, it wouldn't, right? We're trying to assign foo to bar, and because they're arrays, that's not legal C, as we just described.

Instead, if we want to copy the contents of foo into bar, which is what we're trying to do here, we would need a syntax like this: for (int j = 0; j < 5; j++) { bar[j] = foo[j]; } The for loop starts with j equal to 0 and runs while j is less than 5, incrementing j on every iteration and assigning the elements one by one. This would result in bar also containing 1, 2, 3, 4, 5, but we have to do it this slow element-by-element way instead of just copying the entire array at once. In some other, more modern programming languages, you can, in fact, use that simple equals syntax. But in C, unfortunately, we're not allowed to do that.
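Here is the copy described above as a compilable C sketch, with foo holding 1 through 5 as in the example. The function names are hypothetical. The memcpy version is an alternative the video doesn't mention: memcpy from <string.h> copies a block of bytes, and sizeof(foo) gives the size of the whole array in bytes because foo's declaration is in scope.

```c
#include <string.h>

int foo[5] = { 1, 2, 3, 4, 5 };
int bar[5];

// bar = foo;  // does not compile: an array can't be assigned as a whole

// Hypothetical helper: copy foo into bar one element at a time.
void copy_with_loop(void)
{
    for (int j = 0; j < 5; j++)
    {
        bar[j] = foo[j];
    }
}

// Hypothetical helper: the same copy using the standard library's memcpy.
void copy_with_memcpy(void)
{
    memcpy(bar, foo, sizeof(foo));  // copies 5 * sizeof(int) bytes
}
```

The loop makes each step explicit, which matches the video's point; memcpy does the same byte-for-byte copy in one call, as long as both arrays have the same element type and bar is at least as large as foo.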
Now, there's one other thing I want to mention about arrays that can be a little bit tricky the first time you work with them. We discussed in the video about variable scope that most variables in C, when you pass them to functions, are passed by value. Do you remember what it means to pass something by value? It means we're making a copy of the variable that's being passed in. The callee, the function that's receiving the variable, doesn't get the variable itself. It gets its own local copy of it to work with.

Arrays, however, do not follow this rule. Rather, they are effectively passed by reference: the callee actually receives the array, not its own local copy of it. And if you think about it, this makes sense. If arrays are really large, copying an array of 100 or 1,000 or 10,000 elements takes so much time and effort that it's not worth it for a function to receive a copy, do some work with it, and then just be done with the copy, which it doesn't need to keep around anymore.

Because arrays are so bulky and cumbersome, we just pass them by reference. We just trust the function not to break anything. So it does actually get the array.
It doesn't get its own local copy of it.

So what does this mean, then, when the callee manipulates elements of the array? What happens? For now, we'll gloss over exactly why this happens, why arrays are passed by reference and everything else is passed by value. But I promise you, we will return and give you the answer in a later video.

Here's one more exercise for you before we wrap things up on arrays. I'll make this caveat up front: the code here isn't particularly good style. There are no comments, which is pretty bad form, but that's only because I wanted to fit everything on the screen.

At the top, you can see two function declarations, for set_array and set_int. set_array apparently takes an array of four integers as its input, and set_int apparently takes a single integer as its input. Neither of them has an output: the return type of each one is void. In main, we have a couple of lines of code. We declare an integer variable called a and assign it the value 10. We declare an array of four integers called b and assign it the elements 0, 1, 2, and 3, respectively. Then we have a call to set_int and a call to set_array.
The definitions of set_array and set_int are down below, at the bottom.

And so, again, I ask you the question: what gets printed out here at the end of main? There's a printf call that prints out two integers: the contents of a and the contents of b[0]. Pause the video here and take a minute. Can you figure out what this program will print at the end?

Hopefully, if you recall the distinction between passing by value and passing by reference, this problem wasn't too tricky for you. And the answer you would have found is this. If you're not really sure why that's the case, take a second, go back, and review what I was just discussing about passing arrays by reference versus passing other variables by value, and hopefully, it'll make a little more sense.

I'm Doug Lloyd, and this is CS50.
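If you'd like to experiment with the exercise, here is a compilable C sketch of the program as described. The transcript doesn't show the bodies of set_array and set_int, so the ones below (each assigning 22) are hypothetical, as is the name demo standing in for main. Under these assumptions, set_int changes only its local copy of a, while set_array writes into the caller's array b. (Strictly speaking, set_array receives the address of b's first element, which is why arrays behave as if they were passed by reference.)

```c
#include <stdio.h>

// Declarations at the top, as in the exercise.
void set_array(int array[4]);
void set_int(int x);

// Hypothetical stand-in for the exercise's main.
void demo(void)
{
    int a = 10;
    int b[4] = { 0, 1, 2, 3 };

    set_int(a);    // passes a copy of a's value
    set_array(b);  // passes the array itself (its address)

    printf("%d %d\n", a, b[0]);
}

// Hypothetical bodies: the real definitions aren't shown in the transcript.
void set_array(int array[4])
{
    array[0] = 22;  // modifies the caller's array
}

void set_int(int x)
{
    x = 22;  // modifies only this function's local copy
}
```

With these particular bodies, demo() prints 10 22: a is unchanged, while b[0] has changed.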