📈 ARIMA (AutoRegressive Integrated Moving Average)
ARIMA is a powerful time series forecasting method that combines autoregression, differencing, and moving averages to model and predict sequential data patterns.
Resources: Statsmodels ARIMA | Time Series Analysis Book
✏️ Summary
ARIMA (AutoRegressive Integrated Moving Average) is a statistical model used for analyzing and forecasting time series data. It's particularly effective for data that shows patterns over time but may not be stationary. ARIMA combines three components:
- AR (AutoRegressive): Uses the relationship between an observation and lagged observations
- I (Integrated): Uses differencing to make the time series stationary
- MA (Moving Average): Uses the dependency between an observation and residual errors from lagged observations
ARIMA is widely used in:
- Stock price forecasting
- Sales prediction
- Economic indicators analysis
- Weather forecasting
- Demand planning

The model is denoted as ARIMA(p,d,q) where:
- p: number of lag observations (AR terms)
- d: degree of differencing (I terms)
- q: size of the moving average window (MA terms)
🧠 Intuition
Mathematical Foundation
The ARIMA model can be understood through its three components:
1. AutoRegressive (AR) Component
The AR(p) model predicts future values based on past values:

\[X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \epsilon_t\]

Where:
- \(X_t\) is the value at time t
- \(c\) is a constant
- \(\phi_i\) are the autoregressive parameters
- \(\epsilon_t\) is white noise
2. Integrated (I) Component
The I(d) component makes the series stationary by differencing:

\[\nabla^d X_t = (1 - L)^d X_t\]

Where:
- \(L\) is the lag operator
- \(d\) is the degree of differencing
- First difference: \(\nabla X_t = X_t - X_{t-1}\)
- Second difference: \(\nabla^2 X_t = \nabla X_t - \nabla X_{t-1}\)
3. Moving Average (MA) Component
The MA(q) model uses past forecast errors:

\[X_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}\]

Where:
- \(\mu\) is the mean
- \(\theta_i\) are the moving average parameters
- \(\epsilon_t\) are error terms
Complete ARIMA Model
Combining all components, ARIMA(p,d,q) is:

\[\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d X_t = c + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right)\epsilon_t\]
Intuitive Understanding
Think of ARIMA as answering three questions:
1. AR: How much do past values influence future values?
2. I: How many times do we need to difference the data to remove trends?
3. MA: How much do past prediction errors affect current predictions?
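As a tiny, self-contained illustration of the second question (the series below is invented for demonstration): a linear trend makes the mean drift over time, and a single difference removes it.

```python
import numpy as np

# Hypothetical series: linear trend plus noise
rng = np.random.default_rng(0)
y = 0.5 * np.arange(50) + rng.normal(0, 0.3, 50)

# First difference: y_t - y_{t-1}
dy = np.diff(y)

# The raw series drifts upward; the differenced series hovers near the slope (0.5)
print(f"first half mean {y[:25].mean():.2f}, second half mean {y[25:].mean():.2f}")
print(f"differenced mean {dy.mean():.2f}")
```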
🔢 Implementation using Libraries
Using Statsmodels

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import warnings
warnings.filterwarnings('ignore')
# Generate sample time series data
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=200, freq='D')
trend = np.linspace(100, 120, 200)
seasonal = 10 * np.sin(2 * np.pi * np.arange(200) / 30)
noise = np.random.normal(0, 2, 200)
ts = trend + seasonal + noise
# Create time series
data = pd.Series(ts, index=dates)
# Step 1: Check stationarity
def check_stationarity(timeseries, title):
    # Perform Augmented Dickey-Fuller test
    result = adfuller(timeseries)
    print(f'Results of Dickey-Fuller Test for {title}:')
    print(f'ADF Statistic: {result[0]:.6f}')
    print(f'p-value: {result[1]:.6f}')
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'\t{key}: {value:.3f}')
    if result[1] <= 0.05:
        print("Data is stationary")
    else:
        print("Data is non-stationary")
    print("-" * 50)
check_stationarity(data, "Original Series")
# Step 2: Make series stationary if needed
data_diff = data.diff().dropna()
check_stationarity(data_diff, "First Differenced Series")
# Step 3: Determine ARIMA parameters using ACF and PACF plots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# Original series
axes[0,0].plot(data)
axes[0,0].set_title('Original Time Series')
axes[0,0].set_xlabel('Date')
axes[0,0].set_ylabel('Value')
# Differenced series
axes[0,1].plot(data_diff)
axes[0,1].set_title('First Differenced Series')
axes[0,1].set_xlabel('Date')
axes[0,1].set_ylabel('Differenced Value')
# ACF and PACF plots
plot_acf(data_diff, ax=axes[1,0], lags=20, title='ACF of Differenced Series')
plot_pacf(data_diff, ax=axes[1,1], lags=20, title='PACF of Differenced Series')
plt.tight_layout()
plt.show()
# Step 4: Fit ARIMA model
# Let's try ARIMA(2,1,2) based on the plots
model = ARIMA(data, order=(2, 1, 2))
fitted_model = model.fit()
# Print model summary
print(fitted_model.summary())
# Step 5: Make predictions
n_periods = 30
forecast = fitted_model.forecast(steps=n_periods)
forecast_index = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1),
periods=n_periods, freq='D')
# Get confidence intervals
forecast_ci = fitted_model.get_forecast(steps=n_periods).conf_int()
# Plot results
plt.figure(figsize=(12, 6))
plt.plot(data.index, data, label='Original Data', color='blue')
plt.plot(forecast_index, forecast, label='Forecast', color='red', linestyle='--')
plt.fill_between(forecast_index,
forecast_ci.iloc[:, 0],
forecast_ci.iloc[:, 1],
color='red', alpha=0.2, label='Confidence Interval')
plt.legend()
plt.title('ARIMA Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True)
plt.show()
# Step 6: Model diagnostics
fitted_model.plot_diagnostics(figsize=(15, 8))
plt.tight_layout()
plt.show()
```
Auto ARIMA for Parameter Selection

```python
from pmdarima import auto_arima
# Automatically find best ARIMA parameters
auto_model = auto_arima(data,
start_p=0, start_q=0,
max_p=5, max_q=5,
seasonal=False,
stepwise=True,
suppress_warnings=True,
error_action='ignore')
print(f"Best ARIMA model: {auto_model.order}")
print(auto_model.summary())
# Forecast with auto ARIMA
auto_forecast = auto_model.predict(n_periods=30)
print(f"Next 30 predictions: {auto_forecast}")
⚙️ From Scratch Implementation

```python
import numpy as np
from scipy.optimize import minimize
class ARIMAFromScratch:
    def __init__(self, p=1, d=1, q=1):
        """
        ARIMA model implementation from scratch

        Parameters:
            p (int): Order of autoregression
            d (int): Degree of differencing
            q (int): Order of moving average
        """
        self.p = p
        self.d = d
        self.q = q
        self.params = None
        self.fitted_values = None
        self.residuals = None

    def difference(self, series, d):
        """Apply differencing to make series stationary"""
        diff_series = np.asarray(series, dtype=float)
        for _ in range(d):
            diff_series = np.diff(diff_series)
        return diff_series

    def inverse_difference(self, diff_series, original_series, d):
        """Reverse the differencing operation"""
        result = np.asarray(diff_series, dtype=float)
        for level in range(d, 0, -1):
            # Anchor on the last value of the series differenced (level - 1)
            # times, then undo one level of differencing via cumulative sum
            last_value = self.difference(original_series, level - 1)[-1]
            result = last_value + np.cumsum(result)
        return result
    def ar_component(self, data, ar_params):
        """Calculate AR component"""
        ar_component = np.zeros(len(data))
        for i in range(self.p, len(data)):
            for j in range(self.p):
                ar_component[i] += ar_params[j] * data[i - j - 1]
        return ar_component

    def ma_component(self, residuals, ma_params):
        """Calculate MA component"""
        ma_component = np.zeros(len(residuals))
        for i in range(self.q, len(residuals)):
            for j in range(self.q):
                ma_component[i] += ma_params[j] * residuals[i - j - 1]
        return ma_component
    def likelihood(self, params, data):
        """Calculate negative log-likelihood for optimization"""
        try:
            # Split the flat parameter vector into AR, MA, and sigma parts
            ar_params = params[:self.p] if self.p > 0 else []
            ma_params = params[self.p:self.p + self.q] if self.q > 0 else []
            sigma = params[-1] if len(params) > self.p + self.q else 1.0

            n = len(data)
            errors = np.zeros(n)
            predictions = np.zeros(n)

            # Build one-step-ahead predictions and their errors recursively
            for i in range(max(self.p, self.q), n):
                # AR component
                ar_pred = 0.0
                if self.p > 0:
                    for j in range(self.p):
                        ar_pred += ar_params[j] * data[i - j - 1]
                # MA component
                ma_pred = 0.0
                if self.q > 0:
                    for j in range(self.q):
                        ma_pred += ma_params[j] * errors[i - j - 1]
                predictions[i] = ar_pred + ma_pred
                errors[i] = data[i] - predictions[i]

            # Gaussian log-likelihood of the one-step errors
            valid_errors = errors[max(self.p, self.q):]
            log_likelihood = -0.5 * len(valid_errors) * np.log(2 * np.pi * sigma**2)
            log_likelihood -= 0.5 * np.sum(valid_errors**2) / (sigma**2)
            return -log_likelihood  # Return negative for minimization
        except Exception:
            return np.inf
    def fit(self, data):
        """Fit ARIMA model to data"""
        data = np.asarray(data, dtype=float)
        self._original = data

        # Apply differencing
        if self.d > 0:
            diff_data = self.difference(data, self.d)
        else:
            diff_data = data
        self._diff_data = diff_data

        # Initial parameter guesses
        initial_params = []
        initial_params.extend([0.1] * self.p)  # AR parameters
        initial_params.extend([0.1] * self.q)  # MA parameters
        initial_params.append(1.0)             # Sigma (positive)

        # Bounds keep the parameters in a roughly stationary/invertible region
        bounds = []
        bounds.extend([(-0.99, 0.99)] * self.p)  # AR params
        bounds.extend([(-0.99, 0.99)] * self.q)  # MA params
        bounds.append((0.01, None))              # Sigma

        # Optimize parameters by maximum likelihood
        try:
            result = minimize(self.likelihood, initial_params, args=(diff_data,),
                              method='L-BFGS-B', bounds=bounds)
            if result.success:
                self.params = result.x
                # Calculate fitted values and residuals
                self._calculate_fitted_values(data)
                return self
            else:
                raise ValueError("Optimization failed")
        except Exception as e:
            print(f"Error during fitting: {e}")
            return None
    def _calculate_fitted_values(self, data):
        """Calculate fitted values and residuals on the differenced scale"""
        if self.params is None:
            return

        # Apply differencing
        if self.d > 0:
            diff_data = self.difference(data, self.d)
        else:
            diff_data = data

        # Get parameters
        ar_params = self.params[:self.p] if self.p > 0 else []
        ma_params = self.params[self.p:self.p + self.q] if self.q > 0 else []

        n = len(diff_data)
        fitted = np.zeros(n)
        errors = np.zeros(n)

        # Same recursion as in likelihood(), now with the optimized parameters
        for i in range(max(self.p, self.q), n):
            # AR component
            ar_pred = 0.0
            if self.p > 0:
                for j in range(self.p):
                    ar_pred += ar_params[j] * diff_data[i - j - 1]
            # MA component
            ma_pred = 0.0
            if self.q > 0:
                for j in range(self.q):
                    ma_pred += ma_params[j] * errors[i - j - 1]
            fitted[i] = ar_pred + ma_pred
            errors[i] = diff_data[i] - fitted[i]

        self.fitted_values = fitted
        self.residuals = errors
    def predict(self, steps):
        """Forecast future time steps on the original scale"""
        if self.params is None:
            raise ValueError("Model must be fitted before prediction")

        ar_params = self.params[:self.p] if self.p > 0 else []
        ma_params = self.params[self.p:self.p + self.q] if self.q > 0 else []
        predictions = []

        # Seed the recursion with the last observed differenced values
        # and the last in-sample residuals
        last_values = self._diff_data[-self.p:] if self.p > 0 else np.array([])
        last_errors = self.residuals[-self.q:] if self.q > 0 else np.array([])

        for step in range(steps):
            # AR component
            ar_pred = 0.0
            if self.p > 0:
                for j in range(self.p):
                    ar_pred += ar_params[j] * last_values[-(j + 1)]
            # MA component (future errors are assumed to be 0, so known
            # residuals only influence the first q forecast steps)
            ma_pred = 0.0
            if self.q > 0:
                for j in range(self.q):
                    ma_pred += ma_params[j] * last_errors[-(j + 1)]
            pred = ar_pred + ma_pred
            predictions.append(pred)
            # Shift the windows for the next step
            if self.p > 0:
                last_values = np.append(last_values[1:], pred)
            if self.q > 0:
                last_errors = np.append(last_errors[1:], 0.0)

        predictions = np.array(predictions)
        # Undo the differencing so forecasts are on the original scale
        if self.d > 0:
            predictions = self.inverse_difference(predictions, self._original, self.d)
        return predictions
    def summary(self):
        """Print model summary"""
        if self.params is None:
            print("Model not fitted")
            return
        print(f"ARIMA({self.p}, {self.d}, {self.q}) Model Summary")
        print("=" * 50)
        if self.p > 0:
            print("AR Parameters:")
            for i, param in enumerate(self.params[:self.p]):
                print(f"  AR({i+1}): {param:.6f}")
        if self.q > 0:
            print("MA Parameters:")
            for i, param in enumerate(self.params[self.p:self.p + self.q]):
                print(f"  MA({i+1}): {param:.6f}")
        print(f"Sigma: {self.params[-1]:.6f}")
# Example usage
if __name__ == "__main__":
    # Generate sample ARIMA(2,1,1) data: simulate the differenced series
    # as an ARMA(2,1) process, then integrate it with a cumulative sum
    np.random.seed(42)
    n = 100
    true_ar = [0.7, -0.2]
    true_ma = [0.3]

    errors = np.random.normal(0, 1, n)
    diff = np.zeros(n)
    for t in range(2, n):
        diff[t] = (true_ar[0] * diff[t - 1] + true_ar[1] * diff[t - 2]
                   + true_ma[0] * errors[t - 1] + errors[t])
    data = 100 + np.cumsum(diff)  # integration step (d=1)

    # Fit custom ARIMA model
    model = ARIMAFromScratch(p=2, d=1, q=1)
    fitted_model = model.fit(data)

    if fitted_model:
        fitted_model.summary()
        # Make predictions
        predictions = fitted_model.predict(10)
        print(f"\nNext 10 predictions: {predictions}")
```
⚠️ Assumptions and Limitations
Assumptions
- Stationarity: After differencing, the series should be stationary (constant mean and variance)
- Linear relationships: ARIMA assumes linear dependencies between observations
- Normal residuals: Error terms should be normally distributed with zero mean
- Homoscedasticity: Constant variance of residuals over time
- No autocorrelation in residuals: Residuals should be independent
Limitations
- Linear models only: Cannot capture non-linear patterns
- Requires sufficient data: Needs adequate historical data for reliable forecasting
- Parameter selection complexity: Choosing optimal (p,d,q) can be challenging
- Poor with structural breaks: Struggles when underlying data patterns change
- No exogenous variables: Basic ARIMA doesn't include external predictors
- Computational intensity: Parameter estimation can be slow for large datasets
Comparison with Other Models
| Model | Strengths | Weaknesses |
|---|---|---|
| ARIMA | Good for linear trends, well-established theory | Limited to linear relationships |
| LSTM | Captures complex patterns, handles non-linearity | Requires large datasets, black box |
| Prophet | Handles seasonality well, robust to outliers | Less flexible than ARIMA |
| Exponential Smoothing | Simple, fast computation | Limited complexity modeling |
💡 Interview Questions
1. What does each parameter in ARIMA(p,d,q) represent?
Answer:
- p (AR order): Number of lagged observations used in the autoregressive component. Determines how many past values influence the current prediction.
- d (Differencing degree): Number of times the series is differenced to achieve stationarity. Usually 0, 1, or 2.
- q (MA order): Size of the moving average window, i.e. how many past forecast errors are used in the prediction.
Example: ARIMA(2,1,1) uses 2 lagged values, applies 1 level of differencing, and includes 1 past error term.
2. How do you determine the optimal ARIMA parameters?
Answer: Several methods can be used:
Manual approach:
- Use ACF and PACF plots to identify parameters
- ACF helps determine q (cuts off after q lags for MA processes)
- PACF helps determine p (cuts off after p lags for AR processes)
- Use the ADF test to determine d (degree of differencing needed for stationarity)

Automatic approach:
- Information criteria (AIC, BIC): lower values indicate a better fit
- Grid search over parameter combinations (see the sketch below)
- Auto ARIMA algorithms (like `pmdarima.auto_arima()`)
- Cross-validation for out-of-sample performance
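A minimal grid-search-by-AIC sketch (assuming the pandas Series `data` built in the implementation section; in practice d is usually fixed first with a stationarity test, since AIC values are not strictly comparable across different degrees of differencing):

```python
import itertools
from statsmodels.tsa.arima.model import ARIMA

best_aic, best_order = float("inf"), None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        aic = ARIMA(data, order=(p, d, q)).fit().aic
        if aic < best_aic:
            best_aic, best_order = aic, (p, d, q)
    except Exception:
        continue  # skip combinations that fail to converge

print(f"Best order by AIC: {best_order} (AIC={best_aic:.2f})")
```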
3. What is the difference between MA and AR components?
Answer:
Autoregressive (AR):
- Uses past values of the series itself for prediction
- Formula: \(X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \epsilon_t\)
- Captures the "memory" of the time series

Moving Average (MA):
- Uses past forecast errors (residuals) for prediction
- Formula: \(X_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots\)
- Captures short-term irregularities and shocks
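To make the contrast concrete, a small simulation sketch (the coefficients are arbitrary; `ArmaProcess` uses lag-polynomial conventions, so AR coefficients enter with a flipped sign): the MA(1) autocorrelation cuts off after lag 1, while the AR(1) autocorrelation decays gradually.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

np.random.seed(0)
ar1 = ArmaProcess(ar=[1, -0.7], ma=[1]).generate_sample(nsample=2000)  # X_t = 0.7 X_{t-1} + e_t
ma1 = ArmaProcess(ar=[1], ma=[1, 0.7]).generate_sample(nsample=2000)   # X_t = e_t + 0.7 e_{t-1}

print("AR(1) ACF, lags 1-4 (decays):  ", np.round(acf(ar1, nlags=4)[1:], 2))
print("MA(1) ACF, lags 1-4 (cuts off):", np.round(acf(ma1, nlags=4)[1:], 2))
```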
4. How do you check if your ARIMA model is good?
Answer:
Residual Analysis:
- Residuals should be white noise (no patterns)
- Ljung-Box test for autocorrelation in residuals
- Jarque-Bera test for normality of residuals
- Plot ACF/PACF of residuals (should show no significant lags)

Performance Metrics:
- AIC/BIC for model comparison
- MAPE, RMSE, MAE for forecast accuracy
- Out-of-sample validation

Visual Inspection:
- Q-Q plots for normality
- Residual plots over time
- Fitted vs actual values plot
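A sketch of the first two checks (assuming `fitted_model` is the fitted statsmodels ARIMA result from the implementation section; recent statsmodels versions return a DataFrame from `acorr_ljungbox`):

```python
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import jarque_bera

residuals = fitted_model.resid

# Ljung-Box: H0 = no autocorrelation in residuals (large p-value is good)
lb = acorr_ljungbox(residuals, lags=[10])
print(f"Ljung-Box p-value at lag 10: {lb['lb_pvalue'].iloc[0]:.3f}")

# Jarque-Bera: H0 = residuals are normally distributed
jb_stat, jb_pvalue = jarque_bera(residuals)
print(f"Jarque-Bera p-value: {jb_pvalue:.3f}")
```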
5. What makes a time series stationary and why is it important for ARIMA?
Answer:
Stationary Series Properties:
- Constant mean over time
- Constant variance over time
- Covariance between periods depends only on the lag, not on time

Importance for ARIMA:
- ARIMA models assume stationarity for reliable parameter estimation
- Non-stationary data can lead to spurious relationships
- Differencing (the I component) is used to achieve stationarity

Tests for Stationarity:
- Augmented Dickey-Fuller (ADF) test
- KPSS test
- Visual inspection of plots
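The two tests have opposite null hypotheses, so they are often run together; a short sketch (assuming the `data` series from the implementation section):

```python
from statsmodels.tsa.stattools import adfuller, kpss

# ADF: H0 = unit root (non-stationary); p <= 0.05 suggests stationarity
adf_p = adfuller(data)[1]
# KPSS: H0 = stationary; p <= 0.05 suggests non-stationarity
kpss_p = kpss(data, regression='c', nlags='auto')[1]

print(f"ADF p-value:  {adf_p:.3f} (small => stationary)")
print(f"KPSS p-value: {kpss_p:.3f} (small => non-stationary)")
```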
6. Explain the concept of differencing in ARIMA
Answer:
Differencing transforms a non-stationary series into a stationary one by computing differences between consecutive observations.

Types:
- First differencing: \(\nabla X_t = X_t - X_{t-1}\)
- Second differencing: \(\nabla^2 X_t = \nabla X_t - \nabla X_{t-1}\)
- Seasonal differencing: \(\nabla_s X_t = X_t - X_{t-s}\)

Effects:
- Removes trends (first differencing)
- Removes curvature (second differencing)
- Usually d=1 is sufficient; d>2 is rarely needed
- Over-differencing can introduce unnecessary complexity
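All three variants are one-liners on a pandas Series (the seasonal period 12 below is just an example, e.g. monthly data with annual seasonality):

```python
first_diff = data.diff().dropna()          # X_t - X_{t-1}
second_diff = data.diff().diff().dropna()  # difference of the first difference
seasonal_diff = data.diff(12).dropna()     # X_t - X_{t-12}
```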
7. How would you handle seasonality in ARIMA?
Answer:
SARIMA (Seasonal ARIMA):
- Extends ARIMA to SARIMA(p,d,q)(P,D,Q)s
- Additional seasonal parameters P, D, Q for period s
- The formula includes both non-seasonal and seasonal components

Implementation:

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(data, order=(p, d, q), seasonal_order=(P, D, Q, s))
```

Alternative approaches:
- Seasonal decomposition before applying ARIMA (see the sketch below)
- Use of external regressors for seasonal patterns
- Prophet or other specialized seasonal models
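A sketch of the decomposition route (assuming a daily Series with annual seasonality like `sales_ts` from Example 2 below): strip the seasonal component, fit plain ARIMA on the remainder, and add the seasonal pattern back onto the forecasts afterwards.

```python
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA

decomposition = seasonal_decompose(sales_ts, model='additive', period=365)
deseasonalized = (sales_ts - decomposition.seasonal).dropna()

# Plain ARIMA on the series with seasonality removed
model = ARIMA(deseasonalized, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=30)
# To return to the original scale, add the seasonal values for the
# corresponding future calendar positions back onto `forecast`.
```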
8. What are the limitations of ARIMA and when would you choose alternatives?
Answer:
ARIMA Limitations:
- Only captures linear relationships
- Requires stationary data
- Sensitive to outliers
- Cannot handle multiple seasonal patterns
- No incorporation of external variables

When to choose alternatives:
- Non-linear patterns: use LSTMs, neural networks
- Multiple seasonalities: use Prophet, TBATS
- External predictors: use ARIMAX, VAR models (see the sketch below)
- Regime changes: use structural break models
- High-frequency data: use GARCH for volatility modeling
- Small datasets: use exponential smoothing
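For the external-predictor case, statsmodels' SARIMAX accepts an `exog` matrix; a minimal ARIMAX sketch with a made-up promotion flag as the exogenous variable:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical data: baseline sales lifted by a random binary promotion flag
np.random.seed(0)
idx = pd.date_range('2022-01-01', periods=200, freq='D')
promo = (np.random.rand(200) < 0.1).astype(float)
sales = pd.Series(100 + 20 * promo + np.random.normal(0, 5, 200), index=idx)

model = SARIMAX(sales, exog=promo, order=(1, 0, 1)).fit(disp=False)
# Future exog values must be supplied when forecasting
print(model.forecast(steps=14, exog=np.zeros(14)))
```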
🧪 Examples
Example 1: Stock Price Forecasting

```python
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
# Download stock data
stock = yf.download("AAPL", start="2020-01-01", end="2023-01-01")
prices = stock['Close']
# Fit ARIMA model
model = ARIMA(prices, order=(1,1,1))
fitted_model = model.fit()
# Forecast next 30 days
forecast = fitted_model.forecast(steps=30)
conf_int = fitted_model.get_forecast(steps=30).conf_int()
# Plot results
plt.figure(figsize=(12, 6))
plt.plot(prices.index[-60:], prices.values[-60:], label='Actual', color='blue')
forecast_dates = pd.date_range(start=prices.index[-1], periods=31, freq='D')[1:]
plt.plot(forecast_dates, forecast, label='Forecast', color='red', linestyle='--')
plt.fill_between(forecast_dates, conf_int.iloc[:, 0], conf_int.iloc[:, 1],
color='red', alpha=0.2)
plt.legend()
plt.title('AAPL Stock Price Forecast using ARIMA')
plt.show()
print("Forecast Summary:")
print(f"Last actual price: ${prices.iloc[-1]:.2f}")
print(f"30-day forecast mean: ${forecast.mean():.2f}")
print(f"Expected return: {((forecast.mean()/prices.iloc[-1])-1)*100:.2f}%")
Example 2: Sales Forecasting with Seasonality

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate seasonal sales data
np.random.seed(42)
dates = pd.date_range('2018-01-01', periods=365*3, freq='D')
trend = np.linspace(1000, 1500, len(dates))
seasonal = 200 * np.sin(2 * np.pi * np.arange(len(dates)) / 365.25)
weekly = 50 * np.sin(2 * np.pi * np.arange(len(dates)) / 7)
noise = np.random.normal(0, 50, len(dates))
sales = trend + seasonal + weekly + noise
sales_ts = pd.Series(sales, index=dates)
# Apply seasonal ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
# Fit SARIMA model (weekly seasonal period; long seasonal periods such as
# 365 make SARIMA estimation computationally impractical)
sarima_model = SARIMAX(sales_ts,
                       order=(1, 1, 1),
                       seasonal_order=(1, 1, 1, 7))
sarima_fitted = sarima_model.fit(disp=False)
# Forecast next quarter
forecast_days = 90
forecast = sarima_fitted.forecast(steps=forecast_days)
conf_int = sarima_fitted.get_forecast(steps=forecast_days).conf_int()
# Performance metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error
# In-sample predictions for evaluation
in_sample_pred = sarima_fitted.fittedvalues
mae = mean_absolute_error(sales_ts, in_sample_pred)
rmse = np.sqrt(mean_squared_error(sales_ts, in_sample_pred))
mape = np.mean(np.abs((sales_ts - in_sample_pred) / sales_ts)) * 100

print("Model Performance:")
print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"Mean Sales: {sales_ts.mean():.2f}")
print(f"MAPE: {mape:.2f}%")
# Plot seasonal forecast
plt.figure(figsize=(15, 8))
plt.plot(sales_ts.index[-180:], sales_ts.values[-180:],
label='Historical Sales', color='blue')
forecast_dates = pd.date_range(start=sales_ts.index[-1],
periods=forecast_days+1, freq='D')[1:]
plt.plot(forecast_dates, forecast,
label='SARIMA Forecast', color='red', linestyle='--')
plt.fill_between(forecast_dates, conf_int.iloc[:, 0], conf_int.iloc[:, 1],
color='red', alpha=0.2, label='Confidence Interval')
plt.legend()
plt.title('Seasonal Sales Forecasting with SARIMA')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
```
📚 References
- Books:
  - Forecasting: Principles and Practice by Rob Hyndman & George Athanasopoulos
  - Time Series Analysis by Box, Jenkins, Reinsel & Ljung
  - Introduction to Time Series and Forecasting by Brockwell & Davis
- Documentation:
  - Statsmodels ARIMA
  - pmdarima (Auto ARIMA)
- Tutorials:
  - ARIMA Model Complete Guide
- Research Papers:
  - Box, G. E. P., & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control
  - Akaike, H. (1974). A new look at the statistical model identification
- Online Courses:
  - Time Series Analysis on Coursera
  - Forecasting Using R on DataCamp