Fox Module 10 Advanced multiple regression
Regression analysis: sums of squares and $R^2$ practice problems
Know the three types of sums of squares: total (TSS), residual (RSS), and regression (RegSS).
Ordinary least squares estimators minimize the sum of squared residuals.
The estimator of σ is the square root of a sum of squares (RSS) divided by its degrees of freedom.
$R^2$ is the ratio of two sums of squares (RegSS / TSS).
The adjusted (corrected) $R^2$ adjusts this ratio for degrees of freedom.
The F-statistic is a similar ratio, also adjusted for degrees of freedom.
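The quantities in the list above all follow from TSS, RSS, the number of observations N, and the number of explanatory variables k. Here is a minimal sketch, assuming a least-squares fit with an intercept. The function name `anova_summary` is hypothetical, and the adjusted $R^2$ uses the form $1 - [\text{RSS}/(N-k-1)]/[\text{TSS}/(N-1)]$.

```python
import math

def anova_summary(tss: float, rss: float, n: int, k: int) -> dict:
    """Summary statistics of a least-squares fit computed from its sums of squares.

    tss: total sum of squares, sum of (y_i - y_bar)^2
    rss: residual sum of squares, sum of (y_i - y_hat_i)^2
    n:   number of observations
    k:   number of explanatory variables, not counting the intercept
    """
    regss = tss - rss                                  # regression sum of squares
    df_resid = n - k - 1                               # residual degrees of freedom
    r2 = regss / tss                                   # share of TSS explained by the fit
    adj_r2 = 1 - (rss / df_resid) / (tss / (n - 1))    # R^2 corrected for degrees of freedom
    sigma_hat = math.sqrt(rss / df_resid)              # estimate of the error s.d. sigma
    f_stat = (regss / k) / (rss / df_resid)            # omnibus F-statistic
    return {"RegSS": regss, "R2": r2, "adj_R2": adj_r2,
            "sigma_hat": sigma_hat, "F": f_stat}

# Figures from Exercise 10.1: TSS = 3,200, RSS = 2,048, N = 10, k = 1
summary = anova_summary(3200, 2048, 10, 1)
print(summary)
```

With the Exercise 10.1 figures this gives, up to floating-point rounding, RegSS = 1,152, $R^2$ = 0.36, adjusted $R^2$ = 0.28, σ̂ = 16, and F = 4.5.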
Most regression concepts are based on sums of squares. Standardized coefficients and generalized linear models adjust the sum of squares for the conditional distributions of the explanatory and response variables. GLMs use maximum likelihood estimation, which is similar, but not identical, to minimizing a normalized sum of squares.
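For the special case of the normal linear model (the model used in Exercise 10.1), the link between likelihood and sums of squares can be written out. The log-likelihood of N independent observations is shown below. For any fixed σ², only the last term depends on α and β, so maximizing the likelihood over α and β is the same as minimizing the residual sum of squares. For other GLM families the log-likelihood is not a function of the unweighted residual sum of squares, which is why the match is only approximate in general.

```latex
\ell(\alpha, \beta, \sigma^2)
  = -\frac{N}{2}\log\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - \alpha - \beta x_i\right)^2
```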
Final exam problems are of two types.
Quantitative problems compute the various sums of squares, $R^2$, adjusted $R^2$, analysis of variance, F-statistic, and similar items.
Qualitative problems ask how these items change with units of measurement, number of observations, and displacement.
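A typical qualitative question asks whether $R^2$ changes when units change or a constant is added. Multiplying y by a nonzero constant d multiplies TSS, RegSS, and RSS by d². Rescaling or shifting x leaves all three unchanged, so their ratio stays the same. The sketch below checks this numerically. The data are hypothetical, chosen only for illustration, and `r_squared` is a hypothetical helper for simple regression.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """R^2 of the least-squares line of y on x, computed as RegSS / TSS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)                      # TSS
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return (sxy ** 2 / sxx) / syy                                  # RegSS / TSS

# Hypothetical illustrative data (not from the exercise)
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [2.0, 3.0, 3.0, 6.0, 9.0]

base = r_squared(x, y)
# Change units (multiply) and displace (add a constant) for both variables
rescaled = r_squared([2.54 * xi + 10 for xi in x], [0.001 * yi - 3 for yi in y])
print(base, rescaled)
```

The two printed values agree up to floating-point rounding.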
** Exercise 10.1: $R^2$
Ten pairs of observations $(X_i, Y_i)$ are fit to the model $Y_i = \alpha + \beta X_i + \varepsilon_i$, where the $\varepsilon_i$ are independent, normally distributed random variables with mean 0 and variance $\sigma^2$.
$\sum x_i = 50$
$\sum x_i^2 = 1{,}050$
$\sum y_i = 60$
$\sum y_i^2 = 3{,}560$
$\sum x_i y_i = 1{,}260$
A. What is TSS, the total sum of squares?
B. What is RegSS, the regression sum of squares?
C. What is $R^2$, the coefficient of determination?
D. What is the correlation of X and Y?
E. What is RSS, the residual sum of squares?
F. What is the (omnibus) F-value for this regression?
Part A: $\text{TSS} = \sum (y_i - \bar{y})^2 = 3{,}560 - 60^2 / 10 = 3{,}200$
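The shortcut in Part A replaces deviations from the mean with the raw sums given in the problem. It follows from expanding the square and using $\sum y_i = N\bar{y}$. The same identity, with x in place of y or with cross-products, gives the other centered sums used in Parts B and D.

```latex
\sum_{i=1}^{N}(y_i - \bar{y})^2
  = \sum y_i^2 - 2\bar{y}\sum y_i + N\bar{y}^2
  = \sum y_i^2 - N\bar{y}^2
  = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{N}
  = 3{,}560 - \frac{60^2}{10} = 3{,}200
```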
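Part B: $\text{RegSS} = \left[ \sum (x_i - \bar{x})(y_i - \bar{y}) \right]^2 / \sum (x_i - \bar{x})^2 = 960^2 / 800 = 1{,}152$
$\sum (x_i - \bar{x})^2 = 1{,}050 - 50^2 / 10 = 800$
$\sum (y_i - \bar{y})^2 = 3{,}560 - 60^2 / 10 = 3{,}200$
$\sum (x_i - \bar{x})(y_i - \bar{y}) = 1{,}260 - 50 \times 60 / 10 = 960$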
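The arithmetic in Part B can be reproduced from the five raw sums. This is a minimal sketch; `centered_sums` is a hypothetical helper name, not part of the original notes.

```python
def centered_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Sums of squares and cross-products about the means, from raw sums."""
    sxx = sum_x2 - sum_x ** 2 / n        # sum of (x_i - x_bar)^2
    syy = sum_y2 - sum_y ** 2 / n        # sum of (y_i - y_bar)^2, i.e. TSS
    sxy = sum_xy - sum_x * sum_y / n     # sum of (x_i - x_bar)(y_i - y_bar)
    return sxx, syy, sxy

sxx, syy, sxy = centered_sums(10, 50, 1050, 60, 3560, 1260)
regss = sxy ** 2 / sxx                   # RegSS = S_xy^2 / S_xx
print(sxx, syy, sxy, regss)              # prints 800.0 3200.0 960.0 1152.0
```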
Jacob: What is the rationale for this formula?
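Rachel: The regression sum of squares is
$\text{RegSS} = \sum (\hat{y}_i - \bar{y})^2 = \sum \left[ (\hat{\alpha} + \hat{\beta} x_i) - (\hat{\alpha} + \hat{\beta} \bar{x}) \right]^2 = \hat{\beta}^2 \sum (x_i - \bar{x})^2$,
because the least-squares line passes through the point of means, so $\bar{y} = \hat{\alpha} + \hat{\beta} \bar{x}$.
Since $\hat{\beta} = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$, it follows that $\text{RegSS} = \left[ \sum (x_i - \bar{x})(y_i - \bar{y}) \right]^2 / \sum (x_i - \bar{x})^2$.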
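Rachel's derivation can be checked with the exercise's numbers: compute the slope and intercept, then get RegSS both as $\hat{\beta}^2 \sum (x_i - \bar{x})^2$ and as $[\sum (x_i - \bar{x})(y_i - \bar{y})]^2 / \sum (x_i - \bar{x})^2$. This sketch hard-codes the centered sums from Part B. The intercept comes out as 0 only because of these particular data (ȳ = 6 = 1.2 × 5).

```python
n, sum_x, sum_y = 10, 50, 60
sxx, sxy = 800, 960                      # centered sums from Part B

x_bar, y_bar = sum_x / n, sum_y / n
beta_hat = sxy / sxx                     # slope: S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar     # the OLS line passes through (x_bar, y_bar)

regss_via_slope = beta_hat ** 2 * sxx    # beta_hat^2 * sum of (x_i - x_bar)^2
regss_via_sxy = sxy ** 2 / sxx           # S_xy^2 / S_xx
print(beta_hat, alpha_hat, regss_via_slope, regss_via_sxy)
```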
Part C: $R^2 = \text{RegSS} / \text{TSS} = 1{,}152 / 3{,}200 = 0.360 = 36\%$
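Part D: The correlation of X and Y is
$r = \dfrac{\sum x_i y_i - \sum x_i \sum y_i / N}{\left[ \left( \sum x_i^2 - (\sum x_i)^2 / N \right) \left( \sum y_i^2 - (\sum y_i)^2 / N \right) \right]^{1/2}}$
$= (1{,}260 - 50 \times 60 / 10) / \left[ (1{,}050 - 50^2 / 10) \times (3{,}560 - 60^2 / 10) \right]^{1/2} = 0.600$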
Using deviations from the means, the correlation is
$r = \sum (x_i - \bar{x})(y_i - \bar{y}) / \left[ \sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2 \right]^{1/2} = 960 / (800 \times 3{,}200)^{1/2} = 960 / 1{,}600 = 0.600$
Note: in a regression with a single explanatory variable, $R^2$ is the square of the correlation: $0.600^2 = 0.360$.
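The two correlation formulas in Part D, and the note that $R^2 = r^2$ in simple regression, can be checked side by side. This sketch uses only the given sums and the results of Parts B and C.

```python
import math

n = 10
sum_x, sum_x2, sum_y, sum_y2, sum_xy = 50, 1050, 60, 3560, 1260

# Raw-sum form: each term corrected by (sum)^2 / n or (sum_x * sum_y) / n
r_raw = (sum_xy - sum_x * sum_y / n) / math.sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
)

# Deviation form, using S_xy = 960, S_xx = 800, S_yy = 3,200 from Part B
r_dev = 960 / math.sqrt(800 * 3200)

r_squared = 1152 / 3200          # RegSS / TSS from Part C
print(r_raw, r_dev, r_raw ** 2, r_squared)
```

Both forms give 0.6, and its square matches $R^2$ = 0.36 up to floating-point rounding.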
Part E: The residual sum of squares is $\text{RSS} = \text{TSS} - \text{RegSS} = 3{,}200 - 1{,}152 = 2{,}048$
Part F: With $k = 1$ explanatory variable, the omnibus F-value is $F = \dfrac{\text{RegSS} / k}{\text{RSS} / (N - k - 1)} = \dfrac{1{,}152 / 1}{2{,}048 / (10 - 1 - 1)} = 1{,}152 / 256 = 4.500$
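Because $\text{RegSS} = R^2 \times \text{TSS}$ and $\text{RSS} = (1 - R^2) \times \text{TSS}$, TSS cancels from the F ratio. This gives $F = (R^2 / k) / [(1 - R^2)/(N - k - 1)]$, which is handy when an exam problem gives only $R^2$, N, and k. The sketch below checks that both routes give 4.5. `f_from_r2` is a hypothetical helper name.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Omnibus F-statistic written in terms of R^2.

    Uses RegSS = R^2 * TSS and RSS = (1 - R^2) * TSS, so TSS cancels.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_sums = (1152 / 1) / (2048 / (10 - 1 - 1))   # from the sums of squares
f_r2 = f_from_r2(0.36, n=10, k=1)             # from R^2 alone
print(f_sums, f_r2)
```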