Introduction to Time Series:
A time series is a sequence of time-stamped values: each observation is recorded together with the time at which it was taken.
If we want to forecast the value at the 7th time stamp, why not simply use regression? To answer this question, we need to understand the difference between regression and time series.
Difference between Regression and Time Series:
Examples of Time Series data:
1. Daily closing stock prices
2. Weekly interest rates
3. Sales figures
4. Annual India population data
5. National Income
Components of Time Series:
A time series consists of a predictable part and an unpredictable part. The classification is shown in the figure below.
Local Predictable part (Xt):
The local predictable part consists of autoregressive behavior. It shows how the time series is influenced by its immediate past.
If we know the few time stamps immediately preceding a given time, we can predict the value of the series at that time.
Example: Given yesterday's price, we can predict today's price, but not the price a year from now.
Global Predictable part:
In the global predictable part, the present value of the time series does not depend on the immediate past.
Example: The temperature in January does not depend on the temperature in December or November.
The global predictable part consists of two components: trend and seasonality.
Trend: Trend refers to an overall increase or decrease in the values over time.
Seasonality: Seasonality refers to a pattern of values that repeats at regular intervals.
Note: Trend and Seasonality can both appear in time series.
This data clearly has both types of patterns. It has a trend, as the overall pattern suggests an increase in sales. The sales are also seasonal, as a similar up-and-down pattern repeats every 12 months.
These components can be related to each other in two ways:
Additive model: Used when the magnitude of the seasonal pattern does not vary with the level of the series. In the additive model all the components are added, i.e. TS = Tt + St + Xt + Zt, where Tt is the trend, St the seasonality, Xt the local predictable part and Zt the noise.
Multiplicative model: Used when the magnitude of the seasonal pattern increases as the data values increase and decreases as they decrease. In the multiplicative model all the components are multiplied, i.e. TS = Tt * St * Xt * Zt.
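The contrast between the two models can be sketched in a few lines of Python. The trend values, seasonal patterns and function names below are invented for illustration, and the Xt and Zt terms are left out so that only the trend and seasonal components interact.

```python
def additive(trend, seasonal):
    """Combine the components as Tt + St."""
    return [t + s for t, s in zip(trend, seasonal)]


def multiplicative(trend, seasonal_factor):
    """Combine the components as Tt * St."""
    return [t * s for t, s in zip(trend, seasonal_factor)]


# Hypothetical data: a rising trend over 8 periods with a 4-period season.
trend = [100, 110, 120, 130, 140, 150, 160, 170]
seasonal_offsets = [10, -10, 5, -5] * 2        # fixed swings (additive)
seasonal_factors = [1.1, 0.9, 1.05, 0.95] * 2  # proportional swings (multiplicative)

add_series = additive(trend, seasonal_offsets)
mul_series = multiplicative(trend, seasonal_factors)

# Seasonal swing = distance of each value from its trend level.
add_swing = [abs(y - t) for y, t in zip(add_series, trend)]
mul_swing = [abs(y - t) for y, t in zip(mul_series, trend)]

print("additive swings:      ", add_swing)
print("multiplicative swings:", [round(s, 2) for s in mul_swing])
```

In the additive series the swing at the same point of the season stays the same from one cycle to the next, while in the multiplicative series it grows as the trend rises.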
Time Series Modeling:
Steps for modeling a time series:
1. Visualize the time series.
2. Recognize the trend and seasonality components.
3. Apply regression to model the trend and seasonality.
4. Remove the trend and seasonal components from the series. What remains is the stationary part: a combination of autoregressive behavior and white noise.
5. Model this stationary time series.
6. Combine the forecast of this model with the trend and seasonal component.
7. Find the residual series by subtracting the forecasted value from the actual observed value.
8. Check if the residual series is pure white noise.
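Steps 3 and 4 can be sketched in plain Python as follows. This is a simplified stand-in for a full regression: it assumes a straight-line trend and a known, fixed seasonal period, and the function names and the toy series are hypothetical.

```python
def fit_linear_trend(y):
    """Least-squares fit of y = intercept + slope * t for t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def seasonal_means(detrended, period):
    """Average the detrended values at each position within the season."""
    return [
        sum(detrended[i::period]) / len(detrended[i::period])
        for i in range(period)
    ]


def remove_trend_and_seasonality(y, period):
    """Return the fitted trend, the seasonal pattern and what is left over."""
    intercept, slope = fit_linear_trend(y)
    trend = [intercept + slope * t for t in range(len(y))]
    detrended = [v - tr for v, tr in zip(y, trend)]
    season = seasonal_means(detrended, period)
    remainder = [d - season[t % period] for t, d in enumerate(detrended)]
    return trend, season, remainder


# Toy series: a linear trend plus a repeating 4-step pattern and no noise.
period = 4
pattern = [1, -2, 1, 0]
series = [2 * t + pattern[t % period] for t in range(12)]

trend, season, remainder = remove_trend_and_seasonality(series, period)
print("seasonal pattern:", [round(s, 6) for s in season])
print("largest remainder:", max(abs(r) for r in remainder))
```

Because the toy series contains nothing except trend and seasonality, the remainder is essentially zero. On real data the remainder is the part that would be passed on to step 5.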
If a time series is stationary, its statistical properties are the same throughout the series, irrespective of the time at which you observe them. In other words, for a stationary time series, properties such as the mean and variance will be the same for any two windows that you pick.
In general, a stationary time series has no long-term predictable patterns such as trends or seasonality. A time plot will show the series to be roughly horizontal, with constant variance.
Given a time series, we first model the global trend and seasonality and then remove them from the series. The result is a weakly stationary time series.
So the locally predictable part is a weakly stationary series and can be modeled using an ARIMA process.
Noise is what remains when all the predictable parts of a time series have been modeled and extracted from it. It is a sequence of independent, uncorrelated values. If you plot white noise over time, it looks like this:
Notice that there are no identifiable trend or seasonal components. A white noise series is therefore an example of a stationary series (in fact, a strongly stationary one).
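As a rough check of this idea, the Python sketch below generates Gaussian white noise and compares the mean and variance of two windows. The seed, the sample size and the choice of a standard normal distribution are all assumptions made for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the sketch is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Split the series into two non-overlapping windows.
first, second = noise[:1000], noise[1000:]

for name, window in (("first window", first), ("second window", second)):
    mean = statistics.fmean(window)
    variance = statistics.pvariance(window)
    print(f"{name}: mean={mean:.3f} variance={variance:.3f}")
```

For a stationary series such as this one, both windows should report a mean close to 0 and a variance close to 1, whichever windows are chosen.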
We need to model the locally predictable part, that is, the weakly stationary series. There are two basic types of time series models:
1. Auto regressive (AR) model:
An autoregressive time series is one where the value at time 't' depends on the values at times (t-1), ..., (t-h), superimposed on a white noise term. An autoregressive time series of order 'h', denoted AR(h), is defined as
Xt = a + b1*X(t-1) + b2*X(t-2) + ... + bh*X(t-h) + Zt
for some constants a and b1, ..., bh, where Zt is white noise. The coefficient bi represents the influence (weight) of the value of the time series 'i' steps in the past on the current value.
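A minimal Python sketch of an AR(h) recursion, in which each value is a constant a plus the weighted previous h values plus the current noise term, is given below. The function name is hypothetical, and treating values before the start of the series as zero is an assumption of this sketch.

```python
def simulate_ar(a, b, noise):
    """Generate an AR(h) series, where h = len(b).

    b[i] is the weight of the value i + 1 steps in the past.
    Values before the start of the series are treated as zero.
    """
    h = len(b)
    x = []
    for t, z in enumerate(noise):
        past = sum(b[i] * x[t - 1 - i] for i in range(h) if t - 1 - i >= 0)
        x.append(a + past + z)
    return x


# With the noise switched off, an AR(1) series settles towards a / (1 - b1).
print(simulate_ar(1.0, [0.5], [0.0, 0.0, 0.0, 0.0]))  # [1.0, 1.5, 1.75, 1.875]
```

Passing a list of random draws as `noise` instead of zeros gives a series that wanders around the same level while staying tied to its recent past.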
2. Moving Average (MA) model:
A moving average time series is one where the influence of the noise at some time step 't' carries over to the value at t+1, or possibly up to t+h for some fixed 'h'. Formally, a moving average time series of order 'h', denoted MA(h), is
Xt = Zt + c1*Z(t-1) + c2*Z(t-2) + ... + ch*Z(t-h)
where Zt is the noise at the present time stamp and c1, ..., ch are constant weights. The value at time 't' in an MA(h) process is, therefore, the noise at the current time t superimposed on the cumulative weighted influence of the noise at the 'h' previous time stamps.
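The carry-over of noise can be seen in a small Python sketch, in which each value is the current noise term plus a weighted sum of the previous h noise terms. The function name and the weights are hypothetical, and noise before the start of the series is assumed to be zero.

```python
def simulate_ma(c, noise):
    """Generate an MA(h) series, where h = len(c).

    c[i] is the weight of the noise i + 1 steps in the past.
    Noise before the start of the series is treated as zero.
    """
    h = len(c)
    x = []
    for t, z in enumerate(noise):
        carried = sum(c[i] * noise[t - 1 - i] for i in range(h) if t - 1 - i >= 0)
        x.append(z + carried)
    return x


# A single shock at t = 0 influences the next two values and then vanishes.
print(simulate_ma([0.5, 0.25], [1.0, 0.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.0, 0.0]
```

Unlike in an AR process, the effect of a shock here disappears completely after h steps.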
Auto regressive Moving Average (ARMA) model:
A time series that exhibits the characteristics of both an AR(p) and an MA(q) process can be modeled using ARMA(p,q). Fitting such a model to what remains after the trend and seasonality have been removed is called the classical decomposition method.
Here p is the number of past time stamps used by the AR part and q is the number of past time stamps used by the MA part.
We look for the cutoff lag in the PACF plot to choose p in the AR(p) model, and in the ACF plot to choose q in the MA(q) model. For an ARMA(p,q) process, we need to find the cutoff lags from both the ACF and PACF plots.
From these plots we have p = 2 and q = 1, so it is an ARMA(2,1) model.
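To show where the numbers in such plots come from, here is a Python sketch that computes the sample ACF and derives the PACF from it with the Durbin-Levinson recursion. It is run on a simulated AR(1) series rather than on the data behind the plots above; the coefficient 0.7, the seed and the function names are assumptions of the example.

```python
import random


def acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    rho = acf(x, max_lag)
    out = [1.0]
    phi_prev = []  # coefficients of the AR(k - 1) fit
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        out.append(phi_kk)
    return out


# Simulated AR(1) series: x[t] = 0.7 * x[t-1] + noise.
random.seed(1)
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))

band = 1.96 / len(x) ** 0.5  # approximate 95% significance band
r = acf(x, 5)
p = pacf(x, 5)
print("band: +/-", round(band, 3))
print("ACF: ", [round(v, 3) for v in r])
print("PACF:", [round(v, 3) for v in p])
```

For an AR(1) series like this one, the ACF is expected to decay gradually while the PACF should fall to within the band after lag 1, which is the kind of cutoff used to read off p.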
Auto regressive Integrated Moving Average (ARIMA) model:
ARIMA makes the time series stationary through a procedure called differencing, after which the series can be modeled as an ARMA process.
Differencing: This is the method of replacing each point of the time series with the difference between it and the preceding point. The method is easy to automate, which spares the analyst the work of guessing an appropriate trend/seasonal pattern.
In short, ARIMA means: difference the series and, if the differenced series is stationary, model it as an ARMA process.
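Differencing is simple enough to show directly. The Python sketch below uses a hypothetical helper and two made-up series: one with a linear trend and one with a quadratic trend.

```python
def difference(x, lag=1):
    """Return x[t] - x[t - lag] for every t where both points exist."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


linear = [3, 5, 7, 9, 11]
quadratic = [0, 1, 4, 9, 16, 25]

once = difference(linear)                  # the linear trend becomes a constant
twice = difference(difference(quadratic))  # a quadratic trend needs two passes

print(once)   # [2, 2, 2, 2]
print(twice)  # [2, 2, 2, 2]
```

The number of differencing passes needed to reach a stationary series is the d in ARIMA(p,d,q). Setting `lag` to the seasonal period differences each point against the same point one season earlier.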
Tests for Stationarity:
Strong Stationarity: If a series is strongly stationary, the time series plot will look the same irrespective of the time t at which you start collecting/observing the data. Shifting the time series to the right or left makes little difference to the shape of the plot.
Ex: White Noise and Constant function.
Weak Stationarity: If a series is weakly stationary, the pairwise relationships between its values are preserved. In other words, the mean stays constant and the relationship between two values depends only on how far apart they are in time, not on when they are observed.
Ex: Locally predictable part of time series.
For testing a strongly stationary series we have methods like:
1. Histogram
2. Q-Q plot
3. ADF test
4. KPSS test
1. Histogram:
If a series is white noise, its values follow a normal distribution. So when you take a stationary series and plot a histogram of it, the histogram should resemble the normal distribution curve.
2. Q-Q plot:
In a Q-Q plot, the points should fall on a straight line if the series is strongly stationary.
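The points of a normal Q-Q plot can be computed without any plotting library. The Python sketch below pairs each sorted sample value with the matching standard normal quantile and uses the correlation of those pairs as a rough measure of how straight the line is. The function names, the plotting positions (i + 0.5)/n and the two simulated samples are assumptions of the example.

```python
import random
from statistics import NormalDist


def qq_points(sample):
    """Pair each sorted sample value with the matching standard normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a near-straight line."""
    n = len(points)
    mx = sum(px for px, _ in points) / n
    my = sum(py for _, py in points) / n
    sxy = sum((px - mx) * (py - my) for px, py in points)
    sxx = sum((px - mx) ** 2 for px, _ in points)
    syy = sum((py - my) ** 2 for _, py in points)
    return sxy / (sxx * syy) ** 0.5


random.seed(2)
gaussian = [random.gauss(5.0, 2.0) for _ in range(500)]
skewed = [random.expovariate(1.0) for _ in range(500)]

print("gaussian:", round(straightness(qq_points(gaussian)), 4))
print("skewed:  ", round(straightness(qq_points(skewed)), 4))
```

The Gaussian sample should score very close to 1, while the skewed sample bends away from the line and scores lower.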
3. ADF test:
ADF stands for the Augmented Dickey-Fuller test.
Here the null hypothesis (H0) is: the given series is not stationary.
4. KPSS test:
KPSS stands for the Kwiatkowski–Phillips–Schmidt–Shin test.
Here the null hypothesis (H0) is: the given series is stationary.
For testing a weakly stationary series we have two methods:
1. ACF test
2. PACF test
(i) Test for strong stationarity (white noise): The ACF/PACF should not be significantly different from zero at non-zero lags.
(ii) Test for weak stationarity (local predictability): Check whether the ACF/PACF plots show patterns similar to those of AR, MA and ARMA processes.
Once we have made the local and global predictions, we naturally need to evaluate the model. One widely used measure for this is MAPE (Mean Absolute Percentage Error).
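MAPE is the average of the absolute errors expressed as a percentage of the actual values. A minimal Python sketch follows; the function name and the sample numbers are made up for illustration.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Each forecast is off by 10% of the actual value, so the MAPE is about 10.
print(mape([100, 200], [110, 180]))
```

A lower MAPE means the forecasts are, on average, closer to the observed values. The measure cannot be used when any actual value is zero.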
Time series analysis using R:
The ARIMA modeling can be done by differencing the series and bringing it into stationary form step by step.
Let us take only the sales column from the data set.
The first few observations of the data are shown below.
We convert the series into a time series object, convert it into a matrix, and plot the series.
The time series consists of trend and seasonality components as shown below.
We first decompose the time series into trend and seasonal components and then remove the seasonal component from the original time series.
The plot of the remaining trend is shown below.
Now we have to make this series stationary if we want to apply the ARIMA function to it.
The plot is shown below; it is a stationary series.
Now we plot the ACF and PACF to find the values of p and q.
So the order of the ARIMA model is ARIMA(2,1,2)- 2+1+2=5
Now we can apply the arima() function to this stationary series.
The results are as follows:
About Bhagavathi:
Bhagavathi holds a B.Tech in Electronics and Communication Engineering. She is currently working as an Analyst Intern with Nikhil Guru Consulting Analytics Service LLP (Nikhil Analytics), Bangalore.