5.7 Make scatterplots with ggplot2
Video duration: 5m
The workhorse of statistical graphics is the scatterplot. It's a two-dimensional display that really captures the essence of the data. In this case we're going to look at the price of a diamond versus the carat of a diamond. So let's initialize our plot using ggplot diamonds. This time I'm not saying that data equals diamonds, because diamonds is the first argument, so it is assumed to go with data. This time, unlike with histograms, I am putting the aes call inside the ggplot function. I could have put it in the geom layer, such as geom_point, but it can also go in ggplot. Depending on the type of plot you're building up, it could be easier to put it in ggplot or easier to put it in the geom; it really depends on what you need, and you have the flexibility to do both. So in this case, aes, x equals carat and y equals price. Close aes, close ggplot.
Now, while we have all the information, we still need to add a layer. So we'll add geom_point, open and closed parentheses, because remember, that is a function call. Running this builds up an attractive-looking scatterplot of the diamonds data. Already this is looking much, much more attractive than the base scatterplot. The points are filled in, we have this nice crosshatching, we have the attractive gray background, good axes, everything about this is really nice.
Oftentimes when building up a plot you'll want to make a base plot and hold on to it, or keep modifying it little by little. So rather than having to type this out again and again, let's save the first part of it, the ggplot call, to a variable. We do this as if we were assigning any other variable. So g gets ggplot diamonds, aes, x equals carat, y equals price, close aes, close ggplot, and run it. If we were to run this variable, we would get an error: there are no layers in the plot. There's not a lot of information in here; it is just some structural information about the ggplot.
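The steps narrated so far can be sketched roughly as follows. This is a minimal sketch that assumes the ggplot2 package is installed; the variable name `p` is introduced here only for illustration and does not appear in the video. The error described when running `g` on its own reflects the ggplot2 version used in the recording; more recent releases are understood to draw an empty panel instead, so the sketch does not rely on either behavior.

```r
library(ggplot2)  # provides ggplot(), aes(), geom_point() and the diamonds data

# One-shot version: diamonds is passed positionally, so "data =" is omitted.
ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

# Save the base plot (data plus aesthetic mappings, but no layers yet).
g <- ggplot(diamonds, aes(x = carat, y = price))

# Adding the point layer to the saved object gives the same scatterplot.
# "p" is a hypothetical name used only in this sketch.
p <- g + geom_point()
p
```

Keeping the mappings in `ggplot()` means every layer added to `g` later inherits `x = carat` and `y = price` without repeating them.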
But notice it did try drawing something and then erased our previous graph. That's important to note. In RStudio you can go back to the previous graph using this back arrow. And then we go back to our nonexistent second plot, because it gave back an error. To actually plot a graph you can do g plus geom_point. And that's all that would be needed. So let's run this and see that we get the same plot again.
Commonly when working with graphs, you want more information than just an x and y axis. Perhaps we want to color code the chart. Our diamonds data has a lot of information, including the color of the diamond. So why don't we color code the plot according to the color of the diamond? So we start at our base plot, which is g, add a layer, geom_point, and here we'll add yet another aesthetic: we'll say aes, color is mapped to the color of the diamond. Running this will take our scatterplot and give it nice, beautiful colors to go along with the data. As you can see here, the points are automatically color coded and a legend is automatically created, telling us that this color goes with D, this with E, this with F, and so on. It is a fantastic way to visualize a graph. Making the same plot with base graphics would take a lot more effort, tens of lines of code just to get this right, whereas in ggplot, this one line of code did it all for us automatically. Once graphs get complicated, a lot of effort can be saved.
And color is not the only aesthetic we can work with. There's also size and shape. So why don't we take a look at a few of those? We'll start off with our base graph and do g plus geom_point, we'll leave color as it is, and we will map shape to clarity. Running this we get a warning message, but we get two new graphs here. What happened here is that clarity has more than six levels, and ggplot doesn't really like plotting more than six different shapes. It believes that's not good because the human eye can only tell so many apart.
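A rough sketch of the two calls described here, assuming ggplot2 is installed. The names `p_color` and `p_clarity` are hypothetical and used only so each plot can be referred to; the base plot `g` is rebuilt so the block runs on its own. The exact wording of the warning depends on the ggplot2 version.

```r
library(ggplot2)

# Base plot as saved earlier in the video.
g <- ggplot(diamonds, aes(x = carat, y = price))

# Map the diamond's color grade to point color; the legend is generated
# automatically. (p_color is a hypothetical name for this sketch.)
p_color <- g + geom_point(aes(color = color))
p_color

# Keep the color mapping and additionally map shape to clarity.
# clarity has eight levels, so drawing this plot triggers a warning that
# the default shape palette handles at most six discrete values.
p_clarity <- g + geom_point(aes(color = color, shape = clarity))
p_clarity
```

Because the extra aesthetics are set inside `geom_point()`, they apply only to that layer, while `x` and `y` are still inherited from `g`.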
So why don't we copy and paste this, and try shape with cut, and see what that looks like. Now ggplot didn't complain; it gave us our graph, mapping color to color and cut to shape. Now it looks like the legend is a little thrown off, askew; that depends on how big you make the plot, and there are some controls for the legend. We'll look really closely here; in fact, why don't we expand this graph using the zoom feature. And see how you can see the different shapes in here: you have pluses, triangles, squares, all sorts of different ways to see the data. Mapping aesthetics is a very powerful tool in ggplot and can be used to great effect.
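The final plot might look something like the sketch below, again assuming ggplot2 is installed and rebuilding `g` so the block is self-contained. `p_cut` is a hypothetical name. The `theme(legend.position = "bottom")` line is not from the video; it is shown only as one possible example of the legend controls mentioned in passing. Note that cut is stored as an ordered factor, so newer ggplot2 releases may print an advisory about using shapes for an ordinal variable even though the six-shape limit is not exceeded.

```r
library(ggplot2)

g <- ggplot(diamonds, aes(x = carat, y = price))

# cut has five levels, which fits within the default shape palette.
# (p_cut is a hypothetical name for this sketch.)
p_cut <- g + geom_point(aes(color = color, shape = cut))
p_cut

# Assumption, not shown in the video: one way to adjust the legend is to
# move it below the panel so it does not crowd a small plot window.
p_cut + theme(legend.position = "bottom")
```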