Module 8: variances and means (practice problems)

NEAS	NEAS posted 12 Years Ago #12991 #
Regression analysis Module 8: variances and means (practice problems)

(The attached PDF file has better formatting.)

Exercise 8.1: Variances and means

A regression equation of Y on X, $Y_i = \alpha + \beta X_i + \varepsilon_i$, with N = 5 observations, has

RSS, the residual sum of squares, = 5.10
$\sigma^2_B$, the variance of B, the ordinary least squares estimator of $\beta$, = 0.17
$\sigma^2_A$, the variance of A, the ordinary least squares estimator of $\alpha$, = 1.87

A. What is $\sigma^2_\varepsilon$, the variance of the error term?
B. What is $\sum x_i^2$, the sum of the squared x values?
C. What is $\sum (x_i - \bar{x})^2$, the sum of the squared x residuals (deviations from the mean)?
D. What is $\bar{x}$, the mean of the X values?

Part A: $\sigma^2_\varepsilon$ = RSS / degrees of freedom of the regression equation, which is N - k - 1, where k is the number of explanatory variables: 5.1 / (5 - 1 - 1) = 1.700.

Part B: The variance of B is $\sigma^2_B = \sigma^2_\varepsilon / \sum (x_i - \bar{x})^2$, and the variance of A is $\sigma^2_A = [\sigma^2_\varepsilon / \sum (x_i - \bar{x})^2] \times [\sum x_i^2 / N] = \sigma^2_B \times \sum x_i^2 / N$, so

$\sum x_i^2 = N \times \sigma^2_A / \sigma^2_B = 5 \times 1.87 / 0.17 = 55$.

Part C: $\sum (x_i - \bar{x})^2 = \sigma^2_\varepsilon / \sigma^2_B = 1.70 / 0.17 = 10$.

Part D: $\sum (x_i - \bar{x})^2 = \sum x_i^2 - 2 \bar{x} \sum x_i + N \bar{x}^2 = \sum x_i^2 - N \bar{x}^2$, since $\sum x_i = N \bar{x}$, so

$\bar{x} = \{ [ \sum x_i^2 - \sum (x_i - \bar{x})^2 ] / N \}^{1/2} = \{ [ 55 - 10 ] / 5 \}^{1/2} = 3$

(taking the positive root; the given figures determine only $\bar{x}^2 = 9$).

Exercise 8.2: Sampling variance

A regression has N observations, with an error variance of $\sigma^2_\varepsilon$ (the square of the standard error of the regression) and a variance of the explanatory variable of $S^2_x$. Explain how the following affect the sampling variances of the slope estimate B and the intercept estimate A.

A. $\sigma^2_\varepsilon$
B. The sample size N
C. The variance of the explanatory variable $S^2_x$
D. The closeness of the X values to zero

The formulas for the sampling variances are

$\sigma^2_B = \sigma^2_\varepsilon / \sum (x_i - \bar{x})^2 = \sigma^2_\varepsilon / ( (N - 1) \times S^2_x )$
$\sigma^2_A = \sigma^2_B \times ( \sum x_i^2 / N )$

Part A: As $\sigma^2_\varepsilon$ increases, $\sigma^2_B$ and $\sigma^2_A$ increase. Intuition: As the standard error of the regression increases, the estimates of the regression coefficients are less certain.

Part B: As N increases, $\sigma^2_B$ and $\sigma^2_A$ decrease. Intuition: With only a few observed values, the estimated regression line is uncertain. Both the slope and the intercept may be distorted by one or two outlying values. With more observed values, the estimated regression line is more certain.

Part C: As $S^2_x$ increases, $\sigma^2_B$ and $\sigma^2_A$ decrease. Intuition: The regression line passes through $(\bar{x}, \bar{y})$, the means of the X and Y values. Think of the regression line as a bar hinged at the point $(\bar{x}, \bar{y})$ but with unknown slope. Random fluctuations in the observed values of the response variable Y may distort the slope. If the X values are widely dispersed, some of them are far from the mean $\bar{x}$. An incorrect slope coefficient causes a large squared error at those points, so a badly wrong slope estimate is less likely. An error in the slope coefficient also causes an error in the intercept, so if the slope coefficient is more accurate, so is the intercept.

Part D: The closeness of the X values to zero has no effect on $\sigma^2_B$. B, the estimate of the slope coefficient $\beta$, depends on $(x_i - \bar{x})$, not on $x_i$, so adding a constant to all the x values doesn't change B. But if $\bar{x}$ is far from zero, an error in the slope coefficient greatly affects the intercept. As $\sum x_i^2 / N$ increases, $\sigma^2_A$ increases. Intuition: The intercept estimate is $A = \bar{y} - B \bar{x}$. If $\bar{x} = 0$, then A is $\bar{y}$ no matter what value B has, so an error in B adds no uncertainty to A (its variance is just $\sigma^2_\varepsilon / N$). If $\bar{x}$ is 100, an error of k in the estimate of B causes an error of 100 × k in the estimate of A.

Exercise 8.3: Standard errors of ordinary least squares estimators for $\alpha$ (A) and $\beta$ (B)

A statistician uses a regression on the X values {-1, -0.9, -0.8, -0.7, ..., -0.1, 0, 0.1, ..., 0.7, 0.8, 0.9, 1} to test null hypotheses that $\alpha = 0$ and that $\beta = 0$. The ordinary least squares estimates of $\alpha$ and $\beta$ are both 1.000.

A. Which estimator has the higher standard error?
B. Which estimator has the higher t-value?
C. Which estimator has the higher p-value for the test of the null hypothesis?

Part A: We don't know the standard errors of A or B, since we don't know the standard error of the regression $\sigma_\varepsilon$. But we know the ratio of these standard errors. N (the number of data points) = 21, and $\sum x_i^2 = 7.7$, so $\sum x_i^2 / N = 7.7 / 21 = 0.367$. Since $\sigma^2_A = \sigma^2_B \times 0.367$, the standard error of A is $\sqrt{0.367} \approx 0.61$ times the standard error of B. B has the higher standard error.

Part B: The t-value is the regression coefficient divided by its standard error. The two estimates are equal (both 1.000) and B has the higher standard error, so A has the higher t-value.

Part C: A higher t-value means a more significant coefficient, so a lower p-value. B has the higher p-value.

Attachments: Regression analysis variances and means pps df.pdf (64.00 KB)
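The arithmetic in Exercise 8.1 can be checked with a few lines of Python. This is only a sketch of the four steps in Parts A to D; the function name `exercise_8_1` and its parameter names are made up for illustration and are not part of the course material.

```python
import math

def exercise_8_1(n, rss, var_b, var_a, k=1):
    """Recover the error variance and the x summaries from RSS, var(B), var(A).

    n is the number of observations and k the number of explanatory
    variables (k = 1 for simple regression). Names are illustrative only.
    """
    # Part A: error variance = RSS / (N - k - 1)
    var_eps = rss / (n - k - 1)
    # Part B: var(A) = var(B) * sum(x^2) / N, so sum(x^2) = N * var(A) / var(B)
    sum_x_sq = n * var_a / var_b
    # Part C: var(B) = var_eps / sum((x - xbar)^2)
    sum_dev_sq = var_eps / var_b
    # Part D: sum((x - xbar)^2) = sum(x^2) - N * xbar^2; positive root taken
    x_bar = math.sqrt((sum_x_sq - sum_dev_sq) / n)
    return var_eps, sum_x_sq, sum_dev_sq, x_bar

var_eps, sum_x_sq, sum_dev_sq, x_bar = exercise_8_1(n=5, rss=5.10, var_b=0.17, var_a=1.87)
print(f"error variance      = {var_eps:.3f}")
print(f"sum of x^2          = {sum_x_sq:.3f}")
print(f"sum of (x - xbar)^2 = {sum_dev_sq:.3f}")
print(f"mean of x           = {x_bar:.3f}")
```

The four printed values should agree, up to rounding, with the answers 1.700, 55, 10, and 3 in Parts A to D.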
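The four effects in Exercise 8.2 can also be seen numerically by plugging different X values into the two sampling variance formulas. The sketch below is an illustration under assumptions: the X values 1, 2, 3, 4, 5 are hypothetical (the exercises never list the data; this is just one set with $\sum x_i^2 = 55$ and $\sum (x_i - \bar{x})^2 = 10$), and the helper name `sampling_variances` is invented here.

```python
def sampling_variances(xs, var_eps):
    """Return (var_A, var_B) for simple regression on the x values xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    sum_dev_sq = sum((x - x_bar) ** 2 for x in xs)  # sum of (x - xbar)^2
    sum_x_sq = sum(x * x for x in xs)               # sum of x^2
    var_b = var_eps / sum_dev_sq                    # variance of the slope estimate
    var_a = var_b * sum_x_sq / n                    # variance of the intercept estimate
    return var_a, var_b

base_x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data, not from the exercises

results = {
    "baseline": sampling_variances(base_x, 1.7),
    # Part A: larger error variance, same x values
    "more noise": sampling_variances(base_x, 3.4),
    # Part B: more observations (each x value observed twice)
    "more data": sampling_variances(base_x + base_x, 1.7),
    # Part C: same mean (3) but more widely dispersed x values
    "wider x": sampling_variances([-1.0, 1.0, 3.0, 5.0, 7.0], 1.7),
    # Part D: same dispersion, but every x value moved 100 away from zero
    "shifted x": sampling_variances([x + 100.0 for x in base_x], 1.7),
}

for label, (var_a, var_b) in results.items():
    print(f"{label:10s}  var(A) = {var_a:10.4f}   var(B) = {var_b:.4f}")
```

With these hypothetical values the baseline matches the 1.87 and 0.17 of Exercise 8.1, and shifting the X values leaves the slope variance alone while inflating the intercept variance, as Part D argues.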
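For Exercise 8.3, the sum of squares and the ratio of the two standard errors can be verified directly from the 21 X values. Again this is a sketch, with variable names chosen for illustration; it uses only the relation $\sigma^2_A = \sigma^2_B \times \sum x_i^2 / N$ from Exercise 8.2 and the fact that both estimates equal 1.000.

```python
import math

# The 21 X values -1.0, -0.9, ..., 0.9, 1.0
xs = [k / 10 for k in range(-10, 11)]
n = len(xs)
sum_x_sq = sum(x * x for x in xs)

# var(A) / var(B) = sum(x^2) / N, so the unknown error variance cancels
var_ratio = sum_x_sq / n
se_ratio = math.sqrt(var_ratio)  # SE(A) / SE(B)

# Both estimates are 1.000, so t_A / t_B = SE(B) / SE(A)
t_ratio = 1.0 / se_ratio

print(f"N = {n}, sum of x^2 = {sum_x_sq:.3f}")
print(f"var(A) / var(B) = {var_ratio:.3f}")
print(f"SE(A) / SE(B)   = {se_ratio:.3f}")
print(f"t_A / t_B       = {t_ratio:.3f}")
```

A standard error ratio below 1 means B has the higher standard error, so A has the higher t-value and B the higher p-value.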