18.2 Fit and assess ARIMA models
Video duration: 5m
Perhaps the most commonly used form of time series model is an ARIMA model. That stands for Auto-Regressive Integrated Moving Average. There are a few components to this and we will tackle them slowly. The integrated part of ARIMA, that is, the I, is how you difference the data. That is, subtracting yesterday's value from today's, the day before's from yesterday's, and so on. That's a very common technique for making a time series stationary. And stationarity means that its mean over time is roughly the same and its variance over time is roughly the same and there are no big changes as you progress through time. That's what the integrated part stands for.
Now the AR and the MA parts are components about the linear dependence of the time series on its own past. Let's look at the formula. This may seem complex, but fortunately there are computer programs to solve it for us. The left-hand side is the auto-regressive part. X of T is today's value, X of T minus one is yesterday's value, until you get to X of T minus P, which is the value from P units ago. This is the part of the time series that depends on itself. Each of those phis is the coefficient for one of the lagged values. The right-hand side is the moving average component of an ARIMA model. Each of the Z T's is presumed to be white noise, that is, essentially random noise. Those are the residuals, essentially. Those are the errors. And what we're saying is that today's value depends on today's error and on previous errors. And the thetas are the coefficients for the moving average component of the ARIMA model.
Fitting an ARIMA can be difficult, because you have to figure out exactly what you want to use. How many lags do you need? Do you want to use two lags, do you want to use three lags? And that's just for the AR component. What about for the MA component? Zero lags, one lag, two lags? It's something that really takes a lot of effort to figure out. Fortunately, there is a function that can do a lot of that for us.
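Before turning to that function, here is one way to write out the formula described above, since the slide itself is not reproduced in this transcript. This is a sketch of the standard ARMA(p, q) form that matches the spoken description (autoregressive terms on the left, moving average terms on the right). The difference symbol \nabla is an assumed notation, not something taken from the video.

```latex
% Differencing (the "I" in ARIMA): today's value minus yesterday's.
% Differencing twice means applying the same operation to the differenced series.
\[
  \nabla X_t = X_t - X_{t-1}
\]

% ARMA(p, q): lagged values of the series on the left,
% current and lagged white-noise errors on the right.
\[
  X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p}
    = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}
\]
```

Here p is the number of AR lags, q is the number of MA lags, and in an ARIMA model the series X is the data after it has been differenced d times.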
That function is in the forecast package, and it is called auto.arima. So we will create a variable: usBest gets auto.arima(x=us). There are other arguments we can use, but this will go through and fit a number of candidate models, trying to find the best one. So we type the name of the model to print it. It shows an ARIMA(2,2,1) model. That means an AR component of 2, an MA component of 1, and a data set that was differenced twice. The general terminology for an ARIMA model is ARIMA(p, d, q), or ARMA(p, q) if you're not differencing, where p is the number of lags for the AR component, d is the number of differences, and q is the number of lags for the MA component. In this case, the p is 2, the q is 1, and the d for differences is also 2.
To see how good a model this is, we can check the ACF and the PACF of the residuals. So we call acf on the residuals of usBest. And we see that for the residuals, which are the error terms, none of the lags appears to be significant. None of them crosses the blue line. So it looks like we've done a fairly good job. Or actually, the function auto.arima did a pretty good job, 'cause it pretty much did all the work for us. And we will also check the PACF of the residuals. And here we see, yet again, that none of the lags is significant. So it seems like we're doing a fairly good job. And remember, the model returns a series of coefficients. So if we want to grab them, just like we would do with an lm, we could do coef of usBest. And here we see the coefficients.
Many people use time series to make predictions. Perhaps you're predicting the value of a stock or a bond. That is easy to do with the predict function. You call predict, you use the model usBest, and you say n.ahead equals 5. That will predict five units into the future. In this case, those units are years. And we will say, give us standard errors. We run this, and the first element returned is the point predictions. And the second element returned is the standard errors for each of those point predictions. Notice the standard errors get larger.
That's because, as you go further into the future, you have more uncertainty. Now, instead of using predict, we could use the forecast function from the forecast package, which I recommend, 'cause that's a really great package. Say theForecast gets forecast. The object will be our model, usBest. And we'll say five units into the future. Now, we can look at that, or we can plot it, because it builds a really nice plot automatically. So while it may not be as pretty as a ggplot, it does it all automatically for you. You have the time series, and then you have the prediction. And you can see the thick blue line consists of the predictions for the next five years, and the two shaded grey bands represent the confidence intervals around this prediction.
While fitting an ARIMA model can be a bit of an art, and takes a lot of understanding of your data and of how to work these models, the auto.arima function can make your life significantly easier. It's very important to choose the right p and the right q for the ARMA components and the right number of differences. That will make your predictions that much stronger.
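The steps narrated in this video can be collected into one short script. This is only a sketch under stated assumptions: the forecast package is installed, and because the us series from the video is not defined in this section, R's built-in WWWusage series stands in for it. The name usPred is also just an illustrative choice. The orders and numbers you get will therefore differ from the ARIMA(2,2,1) model described in the video.

```r
# Sketch of the fit / assess / predict workflow described in the video.
# Assumption: install.packages("forecast") has already been run.
library(forecast)

# Stand-in data, NOT the series used in the video
us <- WWWusage

# Let auto.arima try a number of candidate models and keep the best one.
# (The video passes the series as x=us; it is given positionally here.)
usBest <- auto.arima(us)
usBest

# Assess the fit: look for lags that cross the significance lines
acf(usBest$residuals)
pacf(usBest$residuals)

# Extract the coefficients, just as with an lm
coef(usBest)

# Point predictions and their standard errors, five units ahead
usPred <- predict(usBest, n.ahead = 5, se.fit = TRUE)
usPred$pred
usPred$se

# The same horizon with forecast(), which also builds the plot
theForecast <- forecast(object = usBest, h = 5)
plot(theForecast)
```

To follow the video exactly, replace the stand-in assignment with the us series built earlier in the lesson and leave the rest unchanged.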