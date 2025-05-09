Welcome back, everybody. So in recent videos, we've taken a look at X and Y scatter plots, and we've been able to tell visually, just by looking at the data, whether there were trends or correlations in them. But sometimes you may have to quantify more specifically how related two variables are. Fortunately, there is a name for that: it's called the correlation coefficient. Now I know that sounds kind of scary, but all I'm gonna show you in this video is that it's just a number.

It's a simple number that represents how related two variables are. So let's take a look at a couple of examples. We're gonna jump right in. Let's get started. Alright?

So basically, this correlation coefficient, sometimes called the linear correlation coefficient, is given by the lowercase letter r, and it is just a number between negative one and positive one. So you can kind of imagine it on a number line like this. And it measures two things: the direction and the strength of the correlation between two variables.
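The video never shows how r is actually calculated, so here is a minimal sketch, assuming the usual (Pearson) formula for the linear correlation coefficient. The function name `pearson_r` and the `hours` and `scores` data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much each varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]    # hypothetical x values
scores = [2, 4, 5, 4, 5]   # hypothetical y values
r = pearson_r(hours, scores)
print(round(r, 2))  # 0.77
```

Whatever data you feed in, the result lands somewhere between -1 and 1, which is the number line described above.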

In fact, let's just jump right into our example so I can show you how this all works. So we have these three correlation coefficients that are given to us over here, 0.13, 0.64, and -0.96. They're all numbers between negative one and positive one, and they always will be. All we have to do is we're given these three numbers, and we just have to match them up to their appropriate graphs. Alright?

So let's go ahead and get started here. So I mentioned that r measures the direction and strength. So let's talk about the direction. The direction of correlation is always going to match the sign of that r value. So, for example, a couple of videos ago, we talked about how positive correlation shows an upward trend like this.

Those are gonna be positive r values. So basically, whenever the data trends upward like this, your r value is gonna be positive, and vice versa. Whenever you have something that slopes downward like this, that's negative correlation. Those r values will be negative. Alright.

So let's take a look at our three numbers. You'll notice that these two are positive and this one is negative. So we just have to find which one of these graphs shows a downward-sloping trend in the data. And if you take a look, you should see that there's only one, and it's this left one over here. So automatically, just by looking at the graphs, we can figure out that this one is the -0.96.

So sometimes you can tell right off the bat, just by looking at the signs of the r values. Alright? Okay. So that's the direction. But let's talk about the strength of that r value.

So we say that correlation is strong whenever the points are tightly clustered around a line that cuts through a lot of the data points. For example, in that left graph, you can see that the data points are pretty much all lined up. There's a little bit of wiggle room here, but you can almost draw a line that cuts through most of those data points, and they're really, really tightly clustered around it. So this is what we would say is strong correlation, and this is how we can visually tell whether two variables are strongly correlated. Now, when this happens, the r value is going to be close to either negative one or positive one, depending on whether it slopes down or up.

Alright. So clearly, we can see here that this slopes downward. Therefore, it's negative. But because the data points are tightly packed, it's going to be an r value that's very, very close to -1. Alright.

So basically what happens here is that as you get closer to either 1 or -1, that correlation gets stronger and stronger. On the opposite side, as you get closer to 0, the correlation gets weaker. And when you have r values that are close to zero, it means that there's little or no linear correlation, and the data points are kind of scattered around everywhere. Alright? So let's take a look at our remaining two values.

We've got 0.13 and 0.64. So both of them are positive numbers, but which one do you think is going to represent each one of these two graphs? Well, if I take a look at the second graph over here and compare it to the third one, you can see that these data points are much more loosely scattered around. I can't really draw any line that cuts through most of the data points, so there's no real correlation anywhere here. So which of the values do you think it's going to be?

Well, it's going to be the 0.13, and this is what we would say is no correlation, as we've seen before. So no correlation is just when you have r values that are really, really close to zero. Alright? Therefore, just by default, this final r value over here is going to be 0.64. So we can still see that there's somewhat of a trend line that cuts through most of these points.

But unlike that left graph, those data points don't really cluster tightly around that line. So we say that this is sort of weak correlation. Now there's no general consensus on what counts as strong versus weak, or what the cutoff is. But generally, anything beyond 0.8 in size, meaning above 0.8 or below -0.8, is considered strong here. So somewhere around 0.8 would be a good boundary for that.
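The labels used in this example can be sketched as a small helper. This is only an illustration: the 0.8 cutoff is the rough boundary mentioned in the video, while the 0.2 cutoff for "no correlation" and the function name `describe` are assumptions made here, since there is no agreed standard.

```python
def describe(r, strong_cutoff=0.8, none_cutoff=0.2):
    """Label an r value by direction and strength (cutoffs are rough conventions)."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)                      # strength ignores the sign
    if size < none_cutoff:             # assumed cutoff for "close to zero"
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if size >= strong_cutoff else "weak"
    return f"{strength} {direction} correlation"

for r in (0.13, 0.64, -0.96):
    print(r, "->", describe(r))
# 0.13 -> no correlation
# 0.64 -> weak positive correlation
# -0.96 -> strong negative correlation
```

Notice that the strength check uses the size of r, so -0.96 counts as strong even though it is negative.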

Alright? That's really all there is to it, folks. There's one final point I wanna make here, which is that when you look at the r values, don't confuse them with the slopes of those lines. So you might be thinking, well, I've got this negative value over here, but because the slope of this line is kind of shallow, the r value must be kind of small. Just be really, really careful about this, because how steep that best fit line is does not affect the size of r at all. Only the direction of the slope, up or down, shows up in r, and it shows up as the sign.

So notice how the trend in the left graph is shallower than the trend line in the third graph, and yet its r value, -0.96, is much closer to negative one. It all has to do with how tightly those data points are packed against that best fit line. Alright? So even though the third graph is steeper, it actually has a weaker r value, 0.64. Alright? So that's really all there is to it, folks.

Thanks for watching. Let's get some practice.