5.3 Make scatterplots with base graphics
The scatter plot is probably the most ubiquitous statistical graphic that exists. It is simply a plot of two variables, one along the x-axis and one along the y-axis, where each point represents one pair of x and y values. For this example, we will look at the price of a diamond versus the carat of a diamond.

So to build a scatter plot, we use the simple plot function. In its simplest form, we can provide this function with an x-variable and a y-variable. In this case, it will be diamonds$carat for the x-axis and diamonds$price for the y-axis. Running that shows a scatter plot where the carat is on the x-axis and the price is on the y-axis, and you see a trend where, as the carat gets bigger, the price gets higher.

Now this way of writing a scatter plot is easy enough, but there is another way. Again using the plot function, this time, instead of specifying each variable separately, we will say that we want to see price versus carat, written price ~ carat, and that these data are coming from the diamonds data set. What this is saying is put price on the y-axis and carat on the x-axis. This notation using a tilde is called a formula, and we will learn a lot more about that in the section on modeling. For now, let's see how this plots. And the plot looks identical, because nothing changed. It's showing the exact same thing with just a different way of writing it.

In order to change the title, we can run this plot again. I'm going to copy and paste it, and this time I will add the main argument, main = "price vs carat". Running that gives the same plot, now with that title.
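The three calls described in the voiceover can be sketched roughly as follows. This is a minimal reconstruction from the narration, not the exact code shown on screen. It assumes the diamonds data set is the one shipped with the ggplot2 package, which the narration does not state, and that ggplot2 is installed.

```r
# Assumption: diamonds is the data set bundled with ggplot2.
# Only the data is taken from ggplot2; all plotting below is base graphics.
data(diamonds, package = "ggplot2")

# Form 1: pass the x-variable and the y-variable separately.
plot(diamonds$carat, diamonds$price)

# Form 2: formula notation, read as "price versus carat".
# The left side of the tilde goes on the y-axis, the right side on the x-axis.
plot(price ~ carat, data = diamonds)

# Same plot with a title supplied through the main argument.
plot(price ~ carat, data = diamonds, main = "price vs carat")
```

One small difference to expect between the first two forms: the default axis labels follow what was typed, so the first call labels the axes diamonds$carat and diamonds$price, while the formula version labels them carat and price. The points themselves are the same.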