Neas-Seminars

TS Module 15 practice problems: sum of squared errors


http://33771.hs2.instantasp.net/Topic9066.aspx

By NEAS - 2/18/2010 10:12:33 AM

TS Module 15 Forecasting basics

 


 

Time series practice problems: sum of squared errors

 

The optimal ARIMA model has the lowest mean squared error for its forecasts. The variance of the error terms is not known exactly, since the residuals depend on fitted values from the ARIMA process.

 

Illustration: For an AR(1) process, if we select μ and φ, we know the expected values for Period 2 and all later periods. For an MA(1) process, even if we select μ and θ, we don’t know the residual in Period 1, so we don’t know the expected value in Period 2, the residual in Period 2, and so forth. In practice, this is not a material problem: the uncertainty in the Period 1 residual has little effect on the expected values several periods later.

 

We estimate the variance from the observed values and the assumed ARIMA parameters.

The exercise below calculates the sum of squared errors. Focus on the following items:

 


•      If the time series is stationary, no differences are needed.

•      If the time series is a non-stationary ARIMA process, convert it to an ARMA process by taking first differences; in some cases, second differences are needed. (A short numerical sketch follows this list.)
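
The following minimal Python sketch (our own illustration, assuming the numpy package; not part of the module) shows the differencing step, using the six values from the exercise below:

```python
import numpy as np

# Time series from the exercise below (Periods 0 through 5).
y = np.array([7.5, 8.0, 12.0, 10.3, 2.0, 7.5])

first_diff = np.diff(y)        # yt - yt-1: converts ARIMA(p,1,q) to ARMA(p,q)
second_diff = np.diff(y, n=2)  # only needed if the first differences still trend

print(first_diff)              # [ 0.5  4.  -1.7 -8.3  5.5]
```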

 


 

Starting the Computations

 

The exam problem will give values for all periods and μ, φ, and θ parameters.

 


 

•      The estimate for the first period depends on the previous values (for an autoregressive process) or the previous residuals (for a moving average process).

•      We can’t estimate the residual for the first period.

○      The problem may give this residual.

○      If the problem does not give this residual, assume it is zero.

•      The exam problem will say to compute the sum of squared errors for Periods 2–N; a sketch of this computation follows the list.
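
The Python sketch below (our own illustration; the function name ma1_sse is ours, not from the textbook) implements this procedure for an MA(1) process: the residual before the first observation is set to zero, the residuals are computed recursively, and the sum of squared errors runs over Periods 2–N:

```python
def ma1_sse(w, mu, theta):
    """Sum of squared errors for an MA(1) series w(t) = mu + e(t) - theta * e(t-1).

    The residual before the first observation is assumed to be zero, so the
    first residual is w(1) - mu; it does not depend on theta and is excluded
    from the sum, which runs over Periods 2-N.
    """
    e = 0.0                     # assumed pre-sample residual
    residuals = []
    for w_t in w:
        forecast = mu - theta * e
        e = w_t - forecast      # residual = actual - forecast
        residuals.append(e)
    return sum(r ** 2 for r in residuals[1:])   # exclude the first residual

# First differences from the exercise below, with mu = 0 and theta1 = 0.8:
print(round(ma1_sse([0.5, 4.0, -1.7, -8.3, 5.5], mu=0.0, theta=0.8), 3))  # 69.513
```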


 

 

We can compute forecasts and residuals for Periods 2–N for an autoregressive process.

 

The Period 2 forecast for a moving average process depends on the residual in Period 1. We don’t know the Period 1 residual, so we cannot compute the forecast or residual for Period 2. Similarly, we cannot compute the exact residuals for any later period.

 

After several periods, the forecast depends only slightly on the residual in Period 1. We assume that all values before Period 1 were the mean and all residuals before Period 1 were zero. This is the simplest way to start, though not the most accurate.

 

Using these assumptions, the computed residual for Period 1 is too large, since it absorbs the entire deviation of the first value from the mean.

 


 

•      We don’t use the residual computed for Period 1 in the sum of squared errors, since it would over-state the result.

•      The over-statement decreases for later residuals (Periods 2+), unless θ1 is large; the sketch after this list illustrates the decay.

○      If θ1 = 1, the error repeats in each period.

○      Practical moving average processes have low θ1.

•      Know that the result is slightly over-stated. Advanced techniques give better estimates of the early residuals, but they are not covered in the on-line course.
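
A small numerical check (our own illustration, with arbitrary made-up data) shows the decay: run the same MA(1) residual recursion twice, once with the starting residual set to zero and once mis-stated by 1.0. The gap between the two residual paths shrinks like θ1 to the power t; with θ1 = 1 it would never shrink, matching the first sub-bullet above.

```python
def residuals(w, mu, theta, e0):
    """Residuals of an MA(1) series, computed recursively from a starting residual e0."""
    e, out = e0, []
    for w_t in w:
        e = w_t - (mu - theta * e)   # residual = actual - forecast
        out.append(e)
    return out

theta, mu = 0.4, 0.0
w = [0.9, -1.2, 0.3, 0.7, -0.5, 1.1]         # arbitrary illustration data

a = residuals(w, mu, theta, e0=0.0)          # assumed starting residual of zero
b = residuals(w, mu, theta, e0=1.0)          # starting residual mis-stated by 1.0
for t, (x, z) in enumerate(zip(a, b), 1):
    print(t, round(x - z, 6))                # gap shrinks like theta ** t
```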


 

 

 


 

Know the logic of the estimation for an MA(1) process.

 

•      We don’t have an estimate for the first term, so we don’t have its residual.

•      Without the residual for the first term, we cannot estimate the second term.

•      To simplify the estimate of σ², we assume the first residual is zero.

•      This assumption slightly over-states the computed residual for the second term.

•      If the first residual is zero, the second residual is the observed value minus the mean, so it does not depend on θ1.


 

 

In practice, we don’t know θ1 a priori. We select θ1 to minimize the sum of squared errors.

 


 

•      The exam problems compute the sum of squared errors for a given θ1.

•      Your student project may use statistical software to minimize the sum of squared errors; a grid-search sketch follows this list.
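
As an illustration of what such software does, here is a short Python sketch (our own, not from the textbook) using one simple numerical technique: evaluate the sum of squared errors on a grid of θ1 values and keep the minimizer. The data are the first differences from the exercise below; real packages use smarter optimizers, but the idea is the same.

```python
import numpy as np

def sse(w, mu, theta):
    # Residual recursion with the pre-sample residual set to zero;
    # the first residual is excluded from the sum.
    e, total = 0.0, 0.0
    for t, w_t in enumerate(w):
        e = w_t - (mu - theta * e)
        if t > 0:
            total += e ** 2
    return total

w = [0.5, 4.0, -1.7, -8.3, 5.5]              # first differences from the exercise
grid = np.linspace(-0.99, 0.99, 199)         # invertible MA(1) parameters
best = min(grid, key=lambda th: sse(w, 0.0, th))
print(best, round(sse(w, 0.0, best), 3))
```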


 

 

Expect an exam problem giving you an ARMA or ARIMA process and asking for the sum of squared errors. The process may be autoregressive, moving average, or both.

 


 

Exercise 15.1: Sum of Squared Errors, ARIMA(0,1,1) Model

 

We use an ARIMA(0,1,1) model yt – yt-1 = μ + εt – θ1 εt-1, with θ1 = 0.8.

 

t      yt
0       7.50
1       8.00
2      12.00
3      10.30
4       2.00
5       7.50


 

•      The ARMA process is yt = μ + εt – θ1 εt-1 (an MA(1) process).

•      The ARIMA process is yt – yt-1 = μ + εt – θ1 εt-1.

•      The first differences of the ARIMA model form an MA(1) process.

•      The ARIMA time series has six observations: Periods 0 through 5.

•      The ARMA model of first differences has five values: Periods 1 through 5.


 

 

t        yt        yt – yt-1
0         7.50
1         8.00         0.50
2        12.00         4.00
3        10.30        -1.70
4         2.00        -8.30
5         7.50         5.50
Total                  0.00

 


 

•      The MA(1) process underlying the ARIMA(0,1,1) time series has a mean μ = δ.

•      The mean of the first differences is the expected change in the values of the time series.


 

 


 

A.     What is the estimated mean of the MA(1) model of the first differences?

B.     What is the forecasted first difference for Period 1?

C.     What is the forecasted value for Period 1?

D.     What is the residual in Period 1?

E.     What is the forecasted first difference for Period 2?

F.     What is the forecasted value for Period 2?

G.     What is the residual in Period 2?

H.     What are the forecasts and residuals for Periods 3 through 5?

I.      What is the sum of squared errors for Periods 2 through 5?

 


 

{The following paragraphs explain why we use the sum of squared errors starting with Period 2 instead of Period 1. The exam does not test optimal fitting of a moving average process.}

 

Terms: The sum of squared errors, error sum of squares, and residual sum of squares are synonyms. We use the acronym ESS (error sum of squares), since it is unambiguous. The acronym RSS is ambiguous: different authors use it for the residual sum of squares or for the regression sum of squares.

 

We did not predict the Period 0 value with the ARIMA model, so we do not include it in the sum of squared errors. To predict Period 0, we would need the value for Period –1. We may assume a residual of zero for Period 0 to predict future periods, but we do not include this assumed residual when measuring the goodness-of-fit.

 

The first term for the underlying ARMA process is Period 1. To predict this value, we need the ARMA residual for Period 0, which we do not have. We assume the Period 0 residual is zero, so the expected value for Period 1 is the mean of the first differences, and the Period 1 residual is the observed first difference minus that mean. This assumption slightly distorts the sum of squared errors for the subsequent terms, but the error is not material.

 

We minimize the sum of squared errors to select the optimal θ1. But the residual in Period 1 does not depend on θ1, since the previous residual is assumed to be zero. Some statisticians do not include Period 1 in the residual sum of squares, since it does not help determine θ1. Excluding Period 1 reduces the degrees of freedom.

 

The sum of squared errors starting at the second period, S(θ1) = Σt=2,…,N εt², depends on θ1. To fit the ARIMA model, we choose θ1 to minimize the sum of squared errors. We write the residual sum of squares as a function of θ1 and find the minimum of the function.

 

Solving for θ1 requires non-linear regression. In linear regression, the explanatory variables are known, so the coefficients that minimize the squared errors have a closed-form solution. For a moving average process, the explanatory variables (the past residuals) themselves depend on θ1:

 


 

•      The residuals depend on the estimates (fitted values) for each period.

•      The fitted values depend on the residuals and θ1.

•      We estimate θ1 by minimizing the sum of squared errors.


 

 

Contrast this with fitting an AR(1) autoregressive process:

 


 

•      The fitted values depend on the past values and φ1.

•      The fitted values do not depend on past residuals.


 

 

This difference between autoregressive and moving average processes matters less with modern statistical software, which uses numerical techniques to optimize θ1.

 


 

•      We evaluate the sum of squared errors at various values of θ1.

•      We choose the value of θ1 that minimizes the sum of squared errors.


 

 

Exam problems do not solve for θ1 by numerical methods.

 


 

•      They specify a value for θ1 and solve for the sum of squared errors.

•      Know how to compute the sum of squared errors and the degrees of freedom.


 

 


 

Part A:  The time series is ARIMA(0,1,1), or IMA(1,1), so the first differences form an MA(1) model, for which μ = δ. The last value of the time series equals the first value, so the sum of the first differences is zero and the mean of the first differences is δ = μ = 0.

 

Note: Cryer and Chan use the symbol θ0 in place of δ.

 

Take heed: This practice problem has a drift of zero for the ARIMA process. In practice, ARIMA processes are generally used when the drift is not zero.

 

Part B:  By assumption, the error term in Period 0 is zero. The expected error term in any future period is also zero. The forecasted first difference for Period 1 is μ – θ1 × ε0 = 0 – 0.8 × 0 = 0. In general, the forecasted first difference for Period 1 is the mean of the MA(1) model.

 

Part C: The forecasted value for Period 1 is the actual value in Period 0 plus the mean of the MA(1) model of first differences. The forecasted value for Period 1 is 7.5 + 0 = 7.5.

 

Part D: The residual is the actual value minus the forecasted value. The residual in Period 1 is 8.0 – 7.5 = 0.5.

 

Part E: This forecast depends on θ1; we use the given θ1 = 0.8. The first difference for Period 1 has a residual of 0.5, so the forecasted first difference for Period 2 is 0 – 0.8 × 0.5 = –0.4.

 

Part F: The value for Period 1 is 8. The forecasted first difference is –0.4, so the forecasted value for Period 2 is 8 – 0.4 = 7.6.

 

Part G: The observed value in Period 2 is 12, so the residual is 12 – 7.6 = 4.4. The residual for the first differences in Period 2 is the same 4.4: if the value is 1 higher than forecasted, the first difference is also 1 higher.

 

Part H: We compute the forecasts and residuals for Periods 3 through 5 in the same fashion, as shown in the table below.

 

Part I: The right-most column of the table shows the squared residuals. The sum of squared errors, 69.513, appears in the last row.

 


 

The table below shows all the values.

 

t        yt        εt         ŷt        εt²
0         7.5
1         8.0       0.5
2        12.0       4.4        7.600     19.360
3        10.3       1.82       8.480      3.312
4         2.0      -6.844      8.844     46.840
5         7.5       0.0248     7.475      0.001
Total (Periods 2–5) -0.5992              69.513
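
The short Python script below (our own check, not part of the exam solution) reproduces the table: it carries the residual forward with θ1 = 0.8, starting from an assumed Period 0 residual of zero, and sums the squared residuals for Periods 2 through 5.

```python
y = [7.5, 8.0, 12.0, 10.3, 2.0, 7.5]   # observed values, Periods 0-5
theta = 0.8                            # given MA(1) parameter; mu = 0

e, sse = 0.0, 0.0
for t in range(1, len(y)):
    diff_forecast = 0.0 - theta * e    # forecasted first difference
    y_hat = y[t - 1] + diff_forecast   # forecasted value for Period t
    e = y[t] - y_hat                   # residual for Period t
    if t >= 2:                         # Periods 2-5 enter the sum
        sse += e ** 2
    print(t, round(y_hat, 3), round(e, 4))
print("SSE:", round(sse, 3))           # 69.513
```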

 

For regression analysis, the error sum of squares depends on the regression coefficients. We choose coefficients to minimize the error sum of squares. The explanatory variables for each data point are known, so the error sum of squares is clearly defined. 

 

For time series analysis, the residual sum of squares depends on unknown prior values.

 


 

•      For an AR(p) model, we don’t know the p values before the time series begins, so we don’t know the residuals for the first p values.

•      For an MA(q) model, we don’t know the q residuals before the time series begins, so we don’t know the expected values for the first q values.


 

 

One might say that there are no actual values or expected values before the time series begins. This is not correct: the time series is a sample from a longer process, so earlier values exist but are not observed.

 

Illustration: We have a time series of average daily temperature for 1/1/20X7 – 12/31/20X7. If we use an ARMA(1,1) process, the daily temperature on day t is a function of the actual daily temperature on day t-1 and the residual on day t-1 (the actual minus the expected temperature).

 


 

•      Without the actual daily temperature on 12/31/20X6 and the expected daily temperature on 12/31/20X6, we cannot derive the expected daily temperature on 1/1/20X7.

•      Without the expected daily temperature on 1/1/20X7, we don’t know the residual on 1/1/20X7, so we cannot derive the expected daily temperature on 1/2/20X7.

 

By apgarrity - 11/26/2017 8:39:59 PM

I may be missing something obvious after long weekend of studying, but if you could provide me with the calculation of the .8 in Part B that'd be greatly appreciated.