I’m working on a 95-page report. It’s a forecasting project. Fun, fun. Since I’m busy working on that, I don’t really have time for a real post, so here is the section I’m currently working on: ARIMA. Notice that my final ARIMA model is more accurate than the one the $5,000 piece of software selected as the “best” model.
We start by looking at the ACFs, and it’s obvious we need to take first differences. This results in the following error ACFs:
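If you want to follow along, the sample ACF (and the usual ±1.96/√n significance bounds for judging spikes) takes only a few lines. The series below is a toy 12-month pattern, not the report’s data:

```python
import math

def acf(x, max_lag):
    """Sample autocorrelation: r_k = sum((x_t - m)(x_{t+k} - m)) / sum((x_t - m)^2)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

# toy monthly series: a repeating 12-month pattern (stand-in for the real data)
series = [float(t % 12) for t in range(48)]
r = acf(series, 13)
bound = 1.96 / math.sqrt(len(series))  # rough significance bound for an ACF spike
print(r[12] > bound)  # strong positive spike at the seasonal lag
```

A spike poking outside ±1.96/√n is the usual rule of thumb for “significant” in these charts.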
With first differences, the seasonality of the data series stands out like crazy. So instead we try seasonal 12th differences, and that results in the following error ACFs:
Ok, now it’s clear that we should use 12th differences, and we should also put 1st differences back in. So in the next error ACF chart, we see 1st and 12th differences together:
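In code terms, “1st and 12th differences together” just means applying a lag-1 difference and then a lag-12 difference (the order doesn’t matter). A minimal sketch on a made-up trend-plus-seasonal series:

```python
def difference(x, lag=1):
    """Lag-k differencing: y_t = x_t - x_{t-lag}; drops the first `lag` points."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

# made-up series: linear trend plus a 12-month seasonal pattern
y = [(t % 12) + 0.1 * t for t in range(60)]

d = difference(difference(y, 1), 12)  # first differences, then seasonal differences
print(len(d), max(abs(v) for v in d))  # deterministic trend and seasonality vanish (max ~ 0)
```

Each differencing pass shortens the series, which is why real analyses need a reasonable number of points before stacking both.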
Hmmm. A single significant negative spike at period 1. It looks like we now need to add in a moving average (the MA in ARIMA). I can’t tell if it’s seasonal or not, so we’ll put in a seasonal MA for now…:
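That lone lag-1 spike is the textbook MA(1) signature: for x_t = e_t + θ·e_{t-1}, the theoretical autocorrelation is θ/(1+θ²) at lag 1 and zero beyond. A quick simulation (synthetic data, hypothetical θ) shows the pattern:

```python
import math, random

theta = -0.6  # hypothetical MA coefficient; negative, so the lag-1 spike is negative
rho1 = theta / (1 + theta ** 2)  # theoretical lag-1 autocorrelation of an MA(1)

# simulate a long MA(1) series and check the sample lag-1 autocorrelation
random.seed(42)
e = [random.gauss(0, 1) for _ in range(50001)]
x = [e[t] + theta * e[t - 1] for t in range(1, 50001)]
n = len(x)
m = sum(x) / n
r1 = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1)) / sum((v - m) ** 2 for v in x)
print(round(rho1, 3), round(r1, 3))  # sample value lands close to the theoretical one
```

A seasonal MA(1) produces the same cut-off shape, just at lag 12 instead of lag 1, which is exactly why it’s hard to tell them apart from one spike.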
Well, that didn’t seem to help much, did it? Perhaps we should also put in a non-seasonal MA.
Now it looks like we have white noise. Perfect. And when we look at the statistics, it’s clear this model, (0,1,1)(0,1,1), is far better than the (1,0,1)(0,1,2) that Forecast Pro selected. Interesting.
So what did Forecast Pro see that I’m not seeing?
Well, it just so happens there’s a fairly distinct autoregressive pattern back in the original ACFs. I wonder if we SHOULD try an AR(1) instead of 1st differences. Let’s see:
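The ambiguity is real: a stationary AR(1) with a coefficient near 1 has an ACF that decays geometrically (φ^k) but stays above the significance bound for many lags, so it can easily be mistaken for a series that needs differencing. A tiny check with a hypothetical φ = 0.85 and the n = 151 from this analysis:

```python
import math

phi, n = 0.85, 151  # hypothetical AR coefficient; n matches this analysis
bound = 1.96 / math.sqrt(n)  # rough significance bound for ACF spikes
# theoretical AR(1) autocorrelation at lag k is phi**k; count how long it stays significant
sig_lags = sum(1 for k in range(1, 30) if phi ** k > bound)
print(sig_lags)  # the ACF stays "significant" well past lag 10
```

A slowly dying ACF like that is consistent with both an AR(1) and a unit root, which is why both (1,0,1) and (0,1,1) specifications can end up looking fine.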
Very interesting. A (1,0,1)(0,1,1) yields white noise as well. The Bayesian Information Criterion is SLIGHTLY higher than for the (0,1,1)(0,1,1), and the RMSE is also slightly higher… but we’re talking TINY differences.
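For reference, BIC trades fit against parameter count: BIC = k·ln(n) − 2·ln(L̂), so with n = 133 fit points each extra parameter costs about ln(133) ≈ 4.9, which the extra likelihood has to earn back. The log-likelihoods below are made up purely to show the mechanics:

```python
import math

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L-hat). Lower is better."""
    return k * math.log(n) - 2 * log_likelihood

n = 133  # points used for fit in this analysis
# hypothetical log-likelihoods, chosen only to illustrate the penalty term
print(bic(-210.0, k=2, n=n))  # e.g. (0,1,1)(0,1,1): two MA parameters
print(bic(-209.5, k=3, n=n))  # e.g. (1,0,1)(0,1,1): one AR plus two MA parameters
```

With a penalty that small, a near-identical likelihood is enough to flip the ranking, which matches the “TINY differences” seen here.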
So, why in the heck does Forecast Pro want to put a 2 into the seasonal moving average? Well, frankly, it’s a quirk of Forecast Pro, and there seem to be differing opinions on why it does that.
So, in a toss-up, should I use the (1,0,1)(0,1,1) that performed slightly worse than the (0,1,1)(0,1,1), but which happens to be much closer to what the TWO expert systems I used picked as the best model? Or should I use the statistically better model, (0,1,1)(0,1,1)? Or perhaps I should average the two…
If you’d like to chime in (HRT, I know you just got your MBA, so holla if you have an opinion), please let me know. The vital stats: the series is non-trended, cyclical, and has 12-month seasonality. There are 151 monthly data points; 133 were used for fit, and 18 were withheld for out-of-sample forecasting statistics. BIC and RMSE are lower for the (0,1,1)(0,1,1), and R^2 is within half a percent on either model. Whatever should I do?
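For anyone wanting to replicate the out-of-sample setup, the 133/18 split and RMSE are straightforward. The seasonal-naive “model” below is only a placeholder so the sketch runs end to end on toy data:

```python
import math

def rmse(actual, forecast):
    """Root mean squared error over the holdout window."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

series = [float((t % 12) + 1) for t in range(151)]  # toy monthly data, 151 points
fit, holdout = series[:133], series[133:]           # 133 for fit, 18 withheld

# placeholder model: seasonal-naive forecast (repeat the value from 12 months back)
forecast = [series[133 + i - 12] for i in range(18)]
print(rmse(holdout, forecast))  # 0.0 here, since the toy series repeats exactly
```

Swapping the placeholder for the two candidate models’ forecasts over the same 18 withheld points gives exactly the out-of-sample RMSE comparison described above.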