Time Series Analysis in R

Introduction to Time Series Analysis

A time series is a set of observations, each recorded at a specific time.

Discrete and Continuous time series

This is a theoretical distinction. In a discrete time series, observations are recorded at a discrete set of time points, often at fixed intervals such as every hour, day, or month. In a continuous time series, observations are recorded continuously over a time interval, as with an analog signal.

Stationarity

Stationarity means that the statistical properties (usually mean, variance, and covariance) of a time series, or rather of the process generating it, do not change over time. Many classical models assume stationarity, so a non-stationary series is usually transformed first (for example, by differencing) before it is modeled and used for prediction.
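To make the idea concrete, here is a minimal sketch that simulates one stationary and one non-stationary series. The seed, the series length, and the variable names are arbitrary choices for illustration, not part of any dataset used later.

```r
# Simulate two series of the same length (seed chosen arbitrarily for reproducibility)
set.seed(42)
n <- 200
white_noise <- ts(rnorm(n))          # stationary: constant mean and variance
random_walk <- ts(cumsum(rnorm(n)))  # non-stationary: its variance grows with time

# Plot both series, one above the other
op <- par(mfrow = c(2, 1))
plot(white_noise, main = "White noise (stationary)", ylab = "Value")
plot(random_walk, main = "Random walk (non-stationary)", ylab = "Value")
par(op)

# Differencing the random walk returns its noise increments, which are stationary
recovered <- diff(random_walk)
```

The white noise series fluctuates around a fixed level, while the random walk tends to drift away from its starting point.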

Time Series Components

graph TD
    A[Time Series Components]
    A --> B[Trend]
    A --> C[Seasonal]
    A --> D[Cyclical]
    A --> E[Irregular]

A time series can be composed of any combination of these four components: trend ($ T_t $), seasonal ($ S_t $), cyclical ($ C_t $), and irregular ($ I_t $).

Mathematically, a time series $ Y_t $ can be expressed as:

\[Y_t = f(T_t, S_t, C_t, I_t)\]

In the classical approach, the function $ f $ is either purely additive or purely multiplicative.
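For reference, the two classical forms are usually written as follows. This is the standard textbook formulation, using the same symbols as the general expression above.

Additive model:

\[Y_t = T_t + S_t + C_t + I_t\]

Multiplicative model:

\[Y_t = T_t \times S_t \times C_t \times I_t\]

When all components are positive, taking logarithms turns the multiplicative form into an additive one:

\[\log Y_t = \log T_t + \log S_t + \log C_t + \log I_t\]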

Getting Started with R

To perform time series analysis in R, you need to have R and RStudio installed on your machine. You can download them from the following links:

  1. Download and Install R
  2. RStudio Desktop

Alternatively, you can run RStudio online through Posit Cloud.

Once RStudio is open, install the necessary packages by running:

install.packages("forecast")
install.packages("tseries")
install.packages("ggplot2")

Importing and Exploring Time Series Data

Data in the Form of an R Vector

# Define the temperature data points as a vector
temperature <- c(25.3, 26.1, 27.5, 28.3, 29.6, 30.2, 31.4, 32.5, 31.2, 29.8, 28.4, 26.9)
# Convert it to a time series object
ts_data <- ts(temperature, start = c(2012, 1), frequency = 12)
# Plot the time series
plot(ts_data, main = "Temperature Time Series", ylab = "Temperature (°C)", xlab = "Time")

Here, frequency = 12 means there is one data point per month. Other common values are:

| Frequency value | Refers to |
|---|---|
| 1 | Annual data (one data point per year) |
| 4 | Quarterly data (one data point per quarter) |
| 12 | Monthly data (one data point per month) |
Figure: Temperature time series plot
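As a quick illustration of a different frequency value, the sketch below builds a quarterly series. The sales figures and the name quarterly_ts are made up for this example.

```r
# Hypothetical quarterly sales figures (made-up numbers for illustration)
sales <- c(120, 135, 150, 160, 125, 140, 158, 170)

# frequency = 4 tells R that four observations make up one year
quarterly_ts <- ts(sales, start = c(2020, 1), frequency = 4)

print(quarterly_ts)      # printed as a year-by-quarter table
frequency(quarterly_ts)  # number of observations per year
time(quarterly_ts)       # time index of each observation, in fractional years
cycle(quarterly_ts)      # position within the year (1 to 4) of each observation
```

With start = c(2020, 1) and eight values, the series covers the first quarter of 2020 through the fourth quarter of 2021.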

Example: AirPassengers Dataset

Let’s start by importing a time series dataset. We’ll use the AirPassengers dataset, which contains monthly totals of international airline passengers from 1949 to 1960.

# Load the dataset
data("AirPassengers")
# AirPassengers is already a monthly ts object; calling ts() here only
# makes the start date and frequency explicit
ts_data <- ts(AirPassengers, start = c(1949, 1), frequency = 12)
# Plot the time series
plot(ts_data, main = "AirPassengers Time Series", ylab = "Passengers", xlab = "Time")
Figure: RStudio running the code above
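Before modeling, it helps to inspect the series itself. The sketch below uses only base R functions; the names first_two_years and yearly_mean are introduced here for illustration.

```r
# Load the dataset (it ships with base R as a ts object)
data("AirPassengers")
ts_data <- AirPassengers

class(ts_data)      # "ts"
start(ts_data)      # 1949 1
end(ts_data)        # 1960 12
frequency(ts_data)  # 12
length(ts_data)     # 144
summary(ts_data)    # five-number summary and mean

# Extract a sub-period with window()
first_two_years <- window(ts_data, start = c(1949, 1), end = c(1950, 12))

# Average number of passengers per calendar year
yearly_mean <- aggregate(ts_data, nfrequency = 1, FUN = mean)
```

The yearly averages give a first impression of the upward trend without the seasonal swings.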

Time Series Decomposition

Decomposing a time series involves breaking it down into its fundamental components: trend, seasonality, and residuals.

# Decompose the time series
decomposed <- decompose(ts_data)
# Plot the decomposed components
plot(decomposed)

decompose() needs at least two full periods of data. For a monthly series (frequency = 12), that means at least 24 observations (2 years); otherwise R stops with Error in decompose(ts_data) : time series has no or less than 2 periods. The 12-point temperature series defined earlier is therefore too short to decompose.

Figure: Decomposition of additive time series for the AirPassengers dataset
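decompose() fits an additive model by default. Because the seasonal swings in AirPassengers grow with the level of the series, a multiplicative decomposition may describe it better. The following is a sketch of that alternative, not part of the original walkthrough; decomposed_mult and decomposed_log are names chosen here.

```r
data("AirPassengers")
ts_data <- AirPassengers

# Multiplicative decomposition: series = trend * seasonal * random
decomposed_mult <- decompose(ts_data, type = "multiplicative")
plot(decomposed_mult)

# The 12 seasonal indices (values above 1 mark months above the trend level)
round(decomposed_mult$figure, 2)

# A related approach: log-transform the series, then decompose additively
decomposed_log <- decompose(log(ts_data))
```

In the multiplicative case the seasonal indices are scaled so that they average to 1.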

Stationarity and Differencing

A stationary time series has constant mean and variance over time. Most time series models assume the series is stationary. To check for stationarity, we can use the Augmented Dickey-Fuller test.

adf.test() comes from the tseries package. To avoid Error in adf.test(ts_data) : could not find function “adf.test”, load the package with library(tseries) before running the test.

Output

> # Perform the Augmented Dickey-Fuller test
> adf.test(ts_data)
	Augmented Dickey-Fuller Test
data:  ts_data
Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
Warning message:
In adf.test(ts_data) : p-value smaller than printed p-value

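Hypothesis:

$ H_0 $: the series has a unit root (it is non-stationary).
$ H_1 $: the series is stationary.

Conclusion:

Since the p-value (at most 0.01) is smaller than $ \alpha = 0.05 $, we reject $ H_0 $: the test indicates that the series is stationary. Note that adf.test includes a constant and a linear trend in its regression, so this result refers to stationarity around a trend; the test does not examine the seasonal pattern visible in the plot.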
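Differencing is the usual way to remove trend and seasonality. The code that produced the differenced plot is not shown in this article, so the following is only a sketch of one common approach; the names diff_data and seasonal_diff are assumptions introduced here.

```r
data("AirPassengers")
ts_data <- AirPassengers

# First difference: y[t] - y[t-1], which removes a linear trend
diff_data <- diff(ts_data)

plot(diff_data, main = "Differenced AirPassengers Time Series",
     ylab = "Change in passengers", xlab = "Time")

# Additional seasonal difference at lag 12 removes the yearly pattern
seasonal_diff <- diff(diff_data, lag = 12)

# Re-run the ADF test on the differenced series if tseries is installed
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(seasonal_diff))
}
```

Each differencing step shortens the series: one observation is lost for the first difference and twelve more for the seasonal difference.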

Figure: Differenced AirPassengers time series plot

Autocorrelation and Partial Autocorrelation

The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are used to identify the orders of the autoregressive (AR) and moving average (MA) components in an ARIMA model. As a rule of thumb, the PACF suggests the AR order and the ACF suggests the MA order.

# Plot ACF and PACF
acf(ts_data, main = "ACF of AirPassengers")
pacf(ts_data, main = "PACF of AirPassengers")
Figure: ACF of the AirPassengers dataset
Figure: PACF of the AirPassengers dataset
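The same function can return the autocorrelations as numbers instead of a plot, which is handy for checking which lags stand out. This is a minimal sketch; the cutoff of 1.96 / sqrt(n) is the usual approximate 95% bound, and the names acf_values, bound, and significant are chosen here for illustration.

```r
data("AirPassengers")
ts_data <- AirPassengers

# Compute the ACF without plotting so the values can be inspected directly
acf_values <- acf(ts_data, lag.max = 24, plot = FALSE)

# Approximate 95% bound for a white noise series of the same length
bound <- 1.96 / sqrt(length(ts_data))

# The first element of acf_values$acf is lag 0 (always 1), so drop it
# before flagging lags whose autocorrelation exceeds the bound
significant <- abs(acf_values$acf[-1]) > bound
print(acf_values)
```

For a trending series such as this one, many lags typically exceed the bound, which is another hint that differencing is needed.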

Time Series Modeling: ARIMA

> library(forecast)
> # Fit an ARIMA model
> fit <- auto.arima(ts_data)
> 
> # Summary of the model
> summary(fit)
Series: ts_data 
ARIMA(2,1,1)(0,1,0)[12] 
Coefficients:
         ar1     ar2      ma1
      0.5960  0.2143  -0.9819
s.e.  0.0888  0.0880   0.0292
sigma^2 = 132.3:  log likelihood = -504.92
AIC=1017.85   AICc=1018.17   BIC=1029.35
Training set error measures:
                   ME     RMSE     MAE       MPE     MAPE     MASE         ACF1
Training set 1.342299 10.84619 7.86754 0.4206976 2.800458 0.245628 -0.001248475
> 
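The specification chosen by auto.arima can also be fitted by hand, which makes the meaning of ARIMA(2,1,1)(0,1,0)[12] explicit. The sketch below uses arima() from base R rather than the forecast package; the name manual_fit and the choice of method = "ML" are assumptions made for this example, so the estimates may differ slightly from the output above.

```r
data("AirPassengers")
ts_data <- AirPassengers

# ARIMA(2,1,1)(0,1,0)[12]:
#   order    = c(p, d, q) = 2 AR terms, 1 regular difference, 1 MA term
#   seasonal = c(P, D, Q) = no seasonal AR or MA terms, 1 seasonal difference
#   period   = 12 observations per seasonal cycle
manual_fit <- arima(ts_data,
                    order = c(2, 1, 1),
                    seasonal = list(order = c(0, 1, 0), period = 12),
                    method = "ML")

print(manual_fit)
coef(manual_fit)  # estimates for ar1, ar2, and ma1
AIC(manual_fit)   # information criterion used to compare candidate models
```

Lower AIC values indicate a better trade-off between fit and model complexity when comparing models fitted to the same data.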

Forecasting

Now we’ll use the fitted model to forecast the next 24 months.

# Forecast the future values
forecasted <- forecast(fit, h = 24)
# Plot the forecast
plot(forecasted, main = "AirPassengers Forecast")
Figure: AirPassengers forecast for 24 time points
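One way to judge how well such a model forecasts is to hold back the last part of the series, refit on the rest, and compare the predictions with the actual values. The following is a sketch under stated assumptions: the split point (end of 1958), the use of base R's arima() with method = "ML", and all variable names are choices made here for illustration.

```r
data("AirPassengers")
ts_data <- AirPassengers

# Hold out the last 24 months as a test set
train <- window(ts_data, end = c(1958, 12))
test  <- window(ts_data, start = c(1959, 1))

# Refit the same specification on the training data only
fit_train <- arima(train,
                   order = c(2, 1, 1),
                   seasonal = list(order = c(0, 1, 0), period = 12),
                   method = "ML")

# Forecast the held-out period and compare with the actual values
pred <- predict(fit_train, n.ahead = length(test))$pred
rmse <- sqrt(mean((test - pred)^2))          # root mean squared error
mape <- mean(abs((test - pred) / test)) * 100  # mean absolute percentage error

# Ljung-Box test on the residuals: a large p-value suggests
# no leftover autocorrelation (fitdf = 3 for the three estimated coefficients)
Box.test(residuals(fit_train), lag = 24, type = "Ljung-Box", fitdf = 3)
```

Error measures computed on held-out data are generally a more honest guide to forecast quality than the training set measures printed by summary(fit).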
