5.5 Get familiar with ggplot2
<v Voiceover>Ggplot2</v> represents a great advance in statistical plotting with R. It is a statistical package written by Hadley Wickham that facilitates the easy creation of beautiful graphics. While it makes life a lot easier once you learn it, it does have a bit of a steep learning curve, and you have to get used to the different way things are done.

So first things first: if you have a new R session and you haven't loaded ggplot2 yet, you need to load it. So you type in require(ggplot2). For us, it was already loaded, so things are good to go.

The core of ggplot2 is the ggplot function, that's ggplot. Now, this function can take a number of different arguments. Usually the main argument it takes is the data, so that's the first argument. We can work with the other arguments in a little bit. All this does is initialize a plot. In order to actually draw something, you need some sort of layer: points, a histogram, lines, or something else. These take the form geom, underscore, the type of layer. That could be geom_point, geom_histogram, or geom_line, and there are many others.

Each of these functions, and the ggplot function itself, can take a number of arguments. The most important of these is the aesthetic mapping, and the value fed to it is a call to the aes function. This function takes a set of mappings: for instance, you have to tell it which variable should be mapped to the x axis, which variable should be mapped to the y axis, and so on. You can map the axes, and you can also map shape, size, and color. There's a lot to learn here, and throughout this section and this entire series, you will see a lot of ggplot plotting.

Another important point to remember is that a ggplot graphic is built up layer by layer. That is, you have the base part of it, ggplot, with the data passed in, and that is separated from the layers by a plus sign. So maybe we're making a scatterplot, in which case you would add geom_point, and in there you would have some sort of aesthetic mapping of x and y. Everything I've written here so far is just pseudo code. We'll see real code in a little bit, but the main points to get across are that layers are separated by plus signs, that a data set is needed and it has to be a data frame, and that different geoms, such as points, lines, or histograms, are used to make the graph itself by mapping variables to physical dimensions of the graph.
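The pseudo code described above might look roughly like the following in R. This is only a minimal sketch: the data frame df and its columns height and weight are made up for illustration and do not come from the video.

```r
# Load ggplot2. The video uses require(ggplot2); library() does the same job
# but stops with an error if the package is not installed.
library(ggplot2)

# Hypothetical data frame, invented for this sketch.
df <- data.frame(
  height = c(150, 160, 170, 180, 190),
  weight = c(52, 61, 68, 80, 88)
)

# ggplot() with only the data initializes a plot; nothing is drawn yet.
base <- ggplot(df)

# A layer is added with "+". aes() maps columns to the x and y axes.
scatter <- base + geom_point(aes(x = height, y = weight))

# Further layers are added the same way, here a line through the same points.
both <- scatter + geom_line(aes(x = height, y = weight))

# Printing the object draws the plot.
print(both)
```

Because the plot is an ordinary R object, each stage (base, scatter, both) can be stored and printed on its own, which makes it easy to see what each added layer contributes.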