Introduction to Forecasting with ARIMA in R

What Makes ARIMA & XTS Objects So Useful for Forecasting

XTS Objects

If you’re not using xts objects for your forecasting in R, then you are likely missing out! The major benefit we’ll explore throughout is that these objects are a lot easier to work with when it comes to modeling, forecasting, and visualization.

What Are They?

An xts object has two components: a date index and a traditional data matrix.

Whether you want to predict churn, sales, demand, or whatever else, let’s get to it!

The first thing you’ll need to do is create your date index. We do so using the seq function, which takes a start date, the number of records you have (length.out), and the time interval (by). Our dataset starts on January 1, 2014 and holds 668 daily records, so the index looks like this.

days <- seq(as.Date("2014-01-01"), length.out = 668, by = "day")

Now that we have our index, we can use it to create our XTS object. For this we will use the xts function.

Don’t forget to run install.packages('xts') once, and then load the library with library(xts)!

Once we’ve done this, we’ll make our xts call, passing our data matrix as the first argument and our date index to the order.by argument.

sales_xts <- xts(sales, order.by = days)
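The post doesn’t show where sales comes from, so here is a minimal sketch you can run end to end. It assumes a made-up sales matrix of 668 simulated values (not the real dataset) and then pokes at the resulting object to show what the date index buys you.

```r
library(xts)

# Hypothetical stand-in for the post's `sales` data: 668 simulated daily values.
set.seed(42)
sales <- matrix(round(100 + cumsum(rnorm(668)), 1), ncol = 1,
                dimnames = list(NULL, "sales"))

days <- seq(as.Date("2014-01-01"), length.out = 668, by = "day")
sales_xts <- xts(sales, order.by = days)

head(sales_xts, 3)         # first three rows, labelled by date
range(index(sales_xts))    # first and last date in the index
sales_xts["2014-03"]       # date-string subsetting: every row from March 2014
```

That last line is a big part of why xts is pleasant to work with: you can slice by date strings instead of juggling row numbers.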

Let’s Create a Forecast with ARIMA

ARIMA stands for autoregressive integrated moving average, a very popular technique for time series forecasting. We could spend hours talking about ARIMA alone, but for this post we’re going to give a high-level explanation and then jump directly into the application.

AR: Auto Regressive

This is where we predict outcomes using lags, that is, values from previous periods (days, in our dataset). It may be that the outcome for a given period has some dependency on the values that came before it.
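To make the idea of a lag concrete, here is a small sketch on simulated data (not the sales dataset). It assumes an arbitrary coefficient of 0.7, generates a series where each value is 0.7 times the previous one plus noise, and then measures how strongly each value tracks the one before it.

```r
set.seed(1)

# Simulated AR(1) series: each value is 0.7 times the previous value plus noise.
# The 0.7 is an arbitrary choice for illustration.
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))

# Correlation between each value and the value one step earlier (lag 1).
n <- length(x)
cor(x[-1], x[-n])   # should land reasonably close to 0.7
```

When that lag correlation is strong, yesterday’s value carries real information about today’s, which is exactly what the AR part of the model exploits.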

I: Integrated

When it comes to time series forecasting, an implicit assumption is that our model depends on time in some capacity. This seems pretty obvious as we probably wouldn’t make our model time based otherwise ;). With that assumption out of the way, we need to understand where on the spectrum of dependence time falls in relation to our model. Yes our model depends on time, but how much? Core to this is the idea of stationarity, which means that the statistical properties of the series, such as its mean and variance, stay the same as time goes on.

Going deeper, for a stationary series the historical average tends to be a good predictor of future outcomes, but there are certainly times when that’s not true. Can you think of any situations when the historical mean would not be the best predictor?

  • How about predicting sales for December? Seasonal Trends
  • How about sales for a hyper-growth SaaS company? Consistent upward trends

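This is where the process of differencing is introduced! Differencing replaces each value with its change from the previous value (or from the same point in the previous season), and it is used to remove the effects of trends and seasonality.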
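Differencing is easier to see than to describe. The sketch below uses two tiny made-up series (not the sales data) and base R’s diff function: one with a steady upward trend, and one with a pattern that repeats every four periods.

```r
# A made-up series with a steady upward trend: 10, 12, 14, ...
trend <- seq(10, by = 2, length.out = 8)

# First difference: each value minus the one before it.
diff(trend)
# 2 2 2 2 2 2 2   (the trend is gone; only the constant step remains)

# A made-up seasonal pattern that repeats every 4 periods.
seasonal <- rep(c(5, 8, 6, 12), times = 3)

# Seasonal difference: each value minus the value one season (4 periods) earlier.
diff(seasonal, lag = 4)
# 0 0 0 0 0 0 0 0   (the repeating pattern cancels out)
```

Note that each round of differencing shortens the series: the first difference drops one value, and the lag-4 difference drops four.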

MA: Moving Average

The moving average part exists to deal with the error of your model: it predicts the current value using the forecast errors from previous periods.
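In R’s built-in arima function, these three parts show up as the order argument, c(p, d, q): p autoregressive lags, d rounds of differencing, and q moving average terms. Here is a rough sketch on simulated data; the coefficients 0.7 and 0.4 are arbitrary values picked for illustration, not anything estimated from the sales data.

```r
set.seed(7)

# Simulate an ARIMA(1, 1, 1) series; the coefficients are made up.
x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.7, ma = 0.4), n = 500)

# Fit a model with the same structure: 1 AR lag, 1 difference, 1 MA term.
fit <- arima(x, order = c(1, 1, 1))

coef(fit)   # named estimates "ar1" and "ma1"
```

In practice you rarely know the right order up front, which is why the modeling section below lets auto.arima pick it.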

Let’s Get Modeling!

Train/Validation Split

First things first, let’s break out our data into a training dataset and then what we’ll call our validation dataset.

What makes this different from other validation approaches, like cross-validation, is that here we split by time: train holds everything up to a given point in time, and validation holds everything thereafter.

train <- sales_xts[index(sales_xts) <= "2015-07-01"]
validation <- sales_xts[index(sales_xts) > "2015-07-01"]
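It’s worth a quick sanity check that the split did what you expect. This sketch rebuilds a stand-in sales_xts from simulated values (an assumption, since the real data isn’t shown) with the same 668 dates, then counts the rows on each side of the cut.

```r
library(xts)

# Hypothetical stand-in data with the same dates as the post.
set.seed(42)
days <- seq(as.Date("2014-01-01"), length.out = 668, by = "day")
sales_xts <- xts(100 + cumsum(rnorm(668)), order.by = days)

split_date <- as.Date("2015-07-01")
train      <- sales_xts[index(sales_xts) <= split_date]
validation <- sales_xts[index(sales_xts) >  split_date]

nrow(train)        # 547 days of history
nrow(validation)   # 121 days held out
max(index(train)) < min(index(validation))   # TRUE: no overlap in time
```

The 121 held-out days are the reason the forecast horizon later in the post is h = 121.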

Time to Build a Model

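The auto.arima function from the forecast package searches over candidate ARIMA models and returns the one that scores best on an information criterion (AICc by default). Run install.packages('forecast') once if you haven’t already.

library(forecast)
model <- auto.arima(train)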
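Before forecasting, it helps to look at what auto.arima actually chose. The sketch below is an illustration on simulated data (a made-up random walk with drift, not the sales series) and uses summary and arimaorder from the forecast package to inspect the fitted model.

```r
library(forecast)

set.seed(123)

# Hypothetical training series: a random walk with a small upward drift.
y <- ts(cumsum(rnorm(300, mean = 0.2)))

model <- auto.arima(y)

summary(model)       # selected model, coefficients, and fit statistics
arimaorder(model)    # the chosen order as a named vector: p, d, q
```

Because the simulated series wanders instead of hovering around a fixed mean, you should expect the d entry to be at least 1, meaning auto.arima decided the data needed differencing.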

Now let’s generate a forecast. The same way we did before, we’ll create a date index and then create an xts object with the data matrix.

From here you will plot the validation data and then throw the forecast on top of the plot.

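# Forecast one value for each of the 121 days in the validation set
# (named fc so it doesn't shadow the forecast() function)
fc <- forecast(model, h = 121)

# The forecast starts the day after the last training date (2015-07-01)
forecast_dates <- seq(as.Date("2015-07-02"), length.out = 121, by = "day")

forecast_xts <- xts(fc$mean, order.by = forecast_dates)

plot(validation, main = 'Forecast Comparison')
lines(forecast_xts, col = "blue")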
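For the blue line to sit on top of the validation data, the forecast dates have to match the validation dates exactly. One way to guarantee that, sketched here on simulated stand-in data (an assumption, since the real sales data isn’t shown), is to take both the horizon and the date index from the validation set itself instead of typing them in by hand.

```r
library(xts)
library(forecast)

# Hypothetical stand-in data with the same dates as the post.
set.seed(42)
days <- seq(as.Date("2014-01-01"), length.out = 668, by = "day")
sales_xts <- xts(100 + cumsum(rnorm(668)), order.by = days)

train      <- sales_xts[index(sales_xts) <= as.Date("2015-07-01")]
validation <- sales_xts[index(sales_xts) >  as.Date("2015-07-01")]

model <- auto.arima(train)

# Horizon and dates both come straight from the validation set.
fc <- forecast(model, h = nrow(validation))
forecast_xts <- xts(as.numeric(fc$mean), order.by = index(validation))

# Every forecast date lines up with a validation date.
all(index(forecast_xts) == index(validation))
```

If you later move the split date, the forecast follows along without any other edits.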

Conclusion

I hope this was a helpful introduction to ARIMA forecasting. Be sure to let me know what’s helpful and any additional detail you’d like to learn about.

I’ll be adding a more detailed post on the topic of ARIMA forecasting where we will detail evaluation techniques, confidence levels, and more.

Happy Data Science-ing!
