[MUSIC PLAYING] SPEAKER: Well, hello, one and all. And welcome to our short on data frames. Now data frames are a convenient way in R to store data in terms of rows and columns. So I have here a table-- really a data frame that stores data in rows and columns. I have here two columns, one called name and one called distance. And each of my rows represents, in this case, a man-made spacecraft exploring the furthest reaches of outer space. So I have here Voyager 1, which is a probe that is currently about 163 astronomical units away from Earth, Voyager 2, which is 136 astronomical units away from Earth, and Pioneer 10, which is, right now, about 80 astronomical units away from Earth. So let's say I wanted to take a table like this and actually create it in R for manipulation, to use in my programs, and so on. We'll see how we can do just that using a function called data.frame. So I have here a program called spacecraft.R. And my goal, first and foremost, is to create that same table we just saw, now here in R. Like I said before, we can use a function built into R called data.frame. It can conveniently create data frames for us based on some given vectors we provide as input. So I'll type here data.frame. And data.frame takes as input any number of named arguments where the names for those arguments become our column names. And the value for each of those arguments becomes the values to fill in that particular column. So for instance, if we recall, we had a column named name just like this. If I wanted to fill in that column with, let's say, some vector of information, I could do so just like this. I could say name equals and, then, provide, as input, some given vector to fill in the values for this column here. So I could type, for instance, Voyager 1, followed by Voyager 2, followed by Pioneer 10. So this is the data that should fill in that first column of information. I'll separate these arguments with a comma here.
And I'll say my next column was distance. And it seemed like Voyager 1 was 163 au away from Earth. Voyager 2 was 136 au, or astronomical units, away from Earth. And Pioneer 10 was about 80. So this, I think, is my data frame. Let me go ahead and store it in an object called spacecraft. If I run, let's say, line 1 here, and go ahead and print out spacecraft by just viewing the object, I should now see down below that I have myself a table with two columns-- one called name, one called distance. And each of the rows seems to be exactly what we had in that table earlier as well. So this is my data frame thanks to data.frame. Now, notice here that we have those columns. But we also have these numbers on the left-hand side, like 1, 2 and 3. These are the row names that data.frame automatically provides for us when creating some new data frame. But more on those in a little bit. So certainly, I can view this data frame by typing out its name, in this case, spacecraft. But I might also want to access individual columns. Well, we've seen we can do that using the dollar sign syntax here, spacecraft, dollar sign, name. And that gives me, in this case, the vector that I actually gave as input for the name column. So I'll run line 8 here. And we'll see Voyager 1, Voyager 2, and Pioneer 10. This is the vector that composes the first column of our spacecraft data frame. Same thing, in fact, for the distance column. Spacecraft, dollar sign, distance will give me the vector corresponding, in this case, to the distance column. And notice, too, that these vectors are of different types. Distance is a numeric vector. Name is a character string vector. But they can both live in the same data frame. What I still can't do, though, is combine different data types in the same vector. So distance is strictly numeric, and name is strictly characters. But they can be combined into the same data frame despite being different types. So let's see how else we can try to access columns of our data frame.
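The steps so far can be sketched as the short program below. It is a minimal reconstruction of what is described, not the exact contents of spacecraft.R, and it assumes R 4.0 or later, where data.frame keeps character vectors as character columns by default.

```r
# Build the data frame from two vectors.
# Each argument name (name, distance) becomes a column name.
spacecraft <- data.frame(
  name = c("Voyager 1", "Voyager 2", "Pioneer 10"),
  distance = c(163, 136, 80)
)

# Typing the object's name prints the table, including the
# automatic row names 1, 2, 3 on the left-hand side.
spacecraft

# The dollar sign extracts one column as a plain vector.
spacecraft$name      # character vector
spacecraft$distance  # numeric vector
```

Each column keeps its own type, so a character column and a numeric column can sit side by side in the one data frame.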
Well, one way we can do so is by not using this syntax but by using, let's say, the indexes, the indices of our various columns. If I type spacecraft, bracket 1, you might think-- well, maybe a few things. But you might think, for instance, that maybe this is referring to the first column of spacecraft. And you would be correct. But we'll get a bit of an unexpected result here. I'll run line 11. And we'll see that I do see that first column of spacecraft, but I see a few other hints here that this isn't quite a vector. I see the column name, which is still name. And I see those row names, which correspond to our data frame. So it seems like, when I have a data frame like spacecraft and I do something like bracket 1 or bracket 2, some index within these individual brackets here, well, I get back a subset of that data frame-- the column or columns that I asked for. But the end result is still a data frame. This is not a vector. It is still a data frame but one of only one column. So let's say I wanted the vector instead for that first column. I can access that using spacecraft, bracket bracket 1. These double brackets give me access to the vector composing that first column, just like this. And we'll see this is, in fact, a vector. So be careful when you write your programs. If you have a data frame, recall that single bracket notation with this number in here will give you access to, in this case, a subset of your data frame and not a vector itself. Let's try something like this, though. Maybe I want to access some particular column. Well, I can do so using some syntax we're probably familiar with by now. I can use bracket, comma, space, 1. And that will give me access to, in this case, the first vector that I have in my data frame, in this case Voyager 1, Voyager 2, and Pioneer 10. So this is a similar way of accessing information to spacecraft, bracket bracket 1, and to spacecraft, dollar sign, name, if we wanted to get things by name here.
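The difference between single brackets, double brackets, and the comma form can be sketched as follows. The variable names first_as_frame, first_as_vector, and first_by_comma are hypothetical, introduced here only to hold the results, and the data frame is rebuilt so the example runs on its own.

```r
spacecraft <- data.frame(
  name = c("Voyager 1", "Voyager 2", "Pioneer 10"),
  distance = c(163, 136, 80)
)

# Single brackets keep the data frame structure: one column, still a table.
first_as_frame <- spacecraft[1]

# Double brackets pull out the vector that makes up the column.
first_as_vector <- spacecraft[[1]]

# Leaving the row position empty and asking for column 1 also gives the vector.
first_by_comma <- spacecraft[, 1]

is.data.frame(first_as_frame)   # TRUE
is.data.frame(first_as_vector)  # FALSE

# All three vector-style accesses agree with the dollar sign syntax.
identical(first_as_vector, spacecraft$name)
identical(first_by_comma, spacecraft$name)
```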
Going down below, what if we wanted rows from our spacecraft data frame? Well, I could do this. I could do spacecraft, bracket, and then 1, comma, space. And this gives me access to, in this case, the very first row of my data frame. So spacecraft, bracket 1, comma, space, that'll give me access to the first row. And in general, with this bracket syntax, when I have a comma in the middle, I can actually ask for some particular value at some particular location. With bracket 1, comma, 1, for instance, I get the first row and the first column, which is, in this case, Voyager 1. So suffice to say, there are a lot of ways to access data in data frames like these. And you'll get familiar with the syntax as you practice it more and see what the results might be. Now let's look at these row names in particular. So I might want to just play around with them and see what they can do for me just to get a sense of what data frames can do. If we look back at our spacecraft data frame, notice how it did automatically give us row names like these. But maybe you want to set the row names yourself. Well, you can do that within the function data.frame. In fact, one of the arguments to data.frame is one called row.names. And I can give a vector as input to this particular argument here, maybe, in this case, the names of our spacecraft. So notice how I've removed that column we called name. I'm now instead using this argument row.names and giving it the same vector of all of the names of our spacecraft here. Let's see the result. I'll go ahead and fill in this comma here, and I'll run line 1 and then line 6. And we still have a data frame, but it looks a little bit different. Notice how I only have one column-- distance. And on the left-hand side of my data frame, I see these values. Well, these are the row names-- Voyager 1, Voyager 2, and Pioneer 10. So by default, R will give you an ascending sequence of numbers. But I can override that if I want to using row.names.
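Row access, single-cell access, and custom row names can be sketched like this. The names first_row, first_cell, and spacecraft_named are hypothetical, chosen so the original table and the row-named one can be compared side by side; in the walkthrough itself the spacecraft object is simply redefined.

```r
spacecraft <- data.frame(
  name = c("Voyager 1", "Voyager 2", "Pioneer 10"),
  distance = c(163, 136, 80)
)

# A row index before the comma, nothing after it: the whole first row.
first_row <- spacecraft[1, ]

# A row index and a column index: one value at that location.
first_cell <- spacecraft[1, 1]

# Supplying row.names replaces the default 1, 2, 3 labels.
# The names are no longer a column, so only distance remains.
spacecraft_named <- data.frame(
  distance = c(163, 136, 80),
  row.names = c("Voyager 1", "Voyager 2", "Pioneer 10")
)

rownames(spacecraft)        # the default labels, stored as "1" "2" "3"
rownames(spacecraft_named)  # the spacecraft names
names(spacecraft_named)     # only the distance column
```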
Well, of course, spacecraft$name, the name column within spacecraft, no longer exists. If I go ahead and I run line 6, I'll get back NULL. There is no column named name in the spacecraft data frame. But if I wanted to now, I could make use of these row names to access some particular row that I want. I could type spacecraft, bracket, and then the row name like Voyager 1, followed by a comma. And then, if I go ahead and hit Enter on this, I should see that I get back the particular row I was looking for, in this case, 163, which corresponds to the distance column. So this is a way of accessing information from our data frames. Why don't we go ahead and add another column here to see what this gives us more precisely. I can add a column, maybe one like type, so we can record what type each of these is. So each of these is a probe: probe, probe, probe. So here I am now, adding a new column called type. And I'll go ahead and rerun line 1 to update my data frame. Here is what it looks like. Now, I have those row names with two columns. If I now use the row name to access my data frame, I'll get back, in this case, that particular row that I'm looking for in my data frame, with both of those columns now involved as well. So this was our brief foray into data frames, creating them from scratch using this function called data.frame, as well as accessing them in various ways, which you'll get the hang of as you practice more. This, then, was our short on data frames, and we'll see you next time.
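For reference, the row-name lookups from this last part can be sketched as below. The names missing_column, voyager_distance, and voyager_row are hypothetical, used only to hold each result; the data frame itself follows the walkthrough.

```r
# Data frame with row names and a single distance column.
spacecraft <- data.frame(
  distance = c(163, 136, 80),
  row.names = c("Voyager 1", "Voyager 2", "Pioneer 10")
)

# There is no name column any more, so this is NULL.
missing_column <- spacecraft$name

# A row name before the comma selects that row.
# With only one column, the result is just the value itself.
voyager_distance <- spacecraft["Voyager 1", ]

# Rebuild the data frame with a second column called type.
spacecraft <- data.frame(
  distance = c(163, 136, 80),
  type = c("probe", "probe", "probe"),
  row.names = c("Voyager 1", "Voyager 2", "Pioneer 10")
)

# With two columns, the same lookup returns a one-row data frame.
voyager_row <- spacecraft["Voyager 1", ]
voyager_row
```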