7.1 Repeat an operation on a matrix using apply
Video duration: 4m
To help you avoid loops, R has many built-in functions, such as apply, lapply, sapply and others, that iterate over data quickly and can be faster than an equivalent loop. To illustrate this, let's first look at apply, the most basic of these functions.
First we build a matrix, which we will call theMatrix, from the numbers 1 through 9, giving a nice 3 by 3 square matrix. Looking at it, we see that R fills it column by column: going down, the first column is 1, 2, 3, then 4, 5, 6, and then 7, 8, 9.
Suppose we want the sum of each column, with each column summed independently. We call apply and first give it the matrix we want to work with, and that is a very important point: apply expects a matrix. Not a data frame, not a list, not a vector, a matrix. If you use apply on a data frame, it first converts it into a matrix. So if your data frame has some character columns and some numeric columns, everything is turned into character so that it fits in one single matrix.
The second argument is the margin, which tells apply whether to work over rows or columns. Since we want to sum the columns, we set it to 2. The third argument is the function to apply; here we want sum, so we literally pass sum. Running this gives 6, 15, 24. Summing the rows is very similar: we call apply on the matrix with the margin set to 1, because rows are the first margin, and again pass sum. This gives 12, 15, 18. It is very simple to do; apply literally just iterates over the data.
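A minimal R sketch of the steps above. It assumes the matrix is built with matrix(1:9, nrow = 3), since the transcript does not show the exact call. The data frame df is a hypothetical example added only to show how apply coerces mixed column types to character.

```r
# Assumed construction: the numbers 1 through 9 in 3 rows.
# matrix() fills column by column, so column 1 is 1, 2, 3.
theMatrix <- matrix(1:9, nrow = 3)

# MARGIN = 2 applies sum() to each column independently
colTotals <- apply(theMatrix, 2, sum)
colTotals  # [1]  6 15 24

# MARGIN = 1 applies sum() to each row, since rows are the first margin
rowTotals <- apply(theMatrix, 1, sum)
rowTotals  # [1] 12 15 18

# Hypothetical data frame mixing a numeric and a character column
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# apply() converts df to a matrix first, so both columns arrive as character
colTypes <- apply(df, 2, class)
colTypes  # both x and y report "character"
```

Note that the numeric column x is reported as character: apply never sees the original integer type.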
In the column case, apply iterates over each column, and as far as the user knows, that could happen in order, out of order or even in parallel. apply itself won't run in parallel, though there are packages that offer a parallel version; for all intents and purposes, the order doesn't matter.
To show how trivial an example this was, R has built-in functions that do exactly this: colSums of the matrix gives back 6, 15, 24, and rowSums of the matrix gives back 12, 15, 18. So while this was a very trivial example, apply is a very useful function to have in your back pocket.
This worked terrifically on nice, clean data, but sometimes you will have missing data, so let's insert a missing value into one element of the matrix and see what happens. We assign NA to the element of theMatrix in the second row, first column, and looking at the matrix we can see the NA there. Now let's compute the row sums: we call apply on the matrix, over the rows, with sum. The result contains an NA, because one value in the second row is NA, so that row's sum can't be computed. This is very important: if your data contains NAs, you don't want R to silently skip over them, add up 5 and 8 and ignore the NA. An NA means a lot, and you can't just throw it out automatically.
However, sum has an optional setting for this, which we also cover in another lesson. We call apply on the matrix with margin 1, to say we're working over the rows, and sum as the function. Any further arguments to apply are passed on to that function, and conveniently sum has an argument called na.rm, which we set to TRUE. That removes any NA values first and then sums the remaining numbers. Running this, we get 12, 13, 18.
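The sketch below, again assuming the matrix comes from matrix(1:9, nrow = 3), checks the built-in shortcuts on clean data, then inserts the NA and shows how an extra argument given to apply (na.rm = TRUE) is forwarded to sum. The names rowNA and rowClean are illustrative.

```r
theMatrix <- matrix(1:9, nrow = 3)

# Built-in shortcuts matching apply(theMatrix, 2, sum) and apply(theMatrix, 1, sum)
colSums(theMatrix)  # [1]  6 15 24
rowSums(theMatrix)  # [1] 12 15 18

# Insert a missing value in the second row, first column
theMatrix[2, 1] <- NA

# A single NA makes that row's sum NA
rowNA <- apply(theMatrix, 1, sum)
rowNA  # [1] 12 NA 18

# Extra arguments after FUN are passed on to sum(), so na.rm = TRUE drops the NA first
rowClean <- apply(theMatrix, 1, sum, na.rm = TRUE)
rowClean  # [1] 12 13 18
```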
To confirm this, we can compute rowSums of the matrix, which at first also returns that NA, but rowSums has the same na.rm argument for removing NA values. With it, we again get 12, 13, 18. Using apply on a matrix is a great way to avoid writing a loop, and it can speed up your code, although for simple sums the dedicated colSums and rowSums functions are faster still.
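To confirm the result, and to show the loop that apply saves us from writing, here is a sketch that computes the same NA-aware row sums three ways: rowSums with na.rm, apply, and an explicit for loop. It assumes the same matrix(1:9, nrow = 3) construction; the variable names are illustrative.

```r
theMatrix <- matrix(1:9, nrow = 3)
theMatrix[2, 1] <- NA

# rowSums returns NA for the affected row unless na.rm is set
rowSums(theMatrix)  # [1] 12 NA 18
viaRowSums <- rowSums(theMatrix, na.rm = TRUE)

# apply forwards na.rm to sum()
viaApply <- apply(theMatrix, 1, sum, na.rm = TRUE)

# The equivalent explicit loop that apply lets us avoid
viaLoop <- numeric(nrow(theMatrix))  # preallocate one slot per row
for (i in seq_len(nrow(theMatrix))) {
  viaLoop[i] <- sum(theMatrix[i, ], na.rm = TRUE)
}

viaRowSums  # [1] 12 13 18
```

All three give 12, 13, 18; the apply and rowSums versions simply hide the indexing and preallocation that the loop spells out.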