MS Module 6: Inferences Based on Two Samples – practice problems
(The attached PDF file has better formatting.)
Exercise 6.1: Difference of means for large sample sizes
We test if the mean values in two groups differ. The number of observations in each group (sample size), the sample means, and the standard deviation of the sample values (sample SD) are shown below.
Group       Sample Size   Sample Mean   Sample SD
Group #1    95            22            13
Group #2    85            25            11
The null hypothesis is H0: μ1 = μ2; the alternative hypothesis is Ha: μ1 ≠ μ2.
A. What is the distribution of each sample mean?
B. What is the distribution of the difference (group #1 – group #2) of the two sample means?
C. What is the z value for the difference in the two sample means?
D. What is the p value for a two-tailed test of the null hypothesis?
E. What is the two-sided 95% confidence interval for the difference in the population means?
F. If we use an α of 5% to test the null hypothesis, and the true difference in the means (μ2 – μ1) = 5, what is β, the probability of a Type II error?
Part A: The sample sizes are large enough that the central limit theorem applies. Even if the observations do not have a normal distribution, the average of a large number of identically distributed observations has an approximately normal distribution.
Question: How many observations are needed for the central limit theorem to apply?
Answer: The textbook suggests 40 observations are needed. No “bright line” cutoff exists; the distribution of the sum (or the mean) becomes more nearly normal as the number of observations increases and as the distribution of the observations themselves is less skewed. The final exam problems use a large number of observations for normal distributions and a small number of observations for t distributions.
The sample mean is an unbiased estimate of the true mean and the sample variance is an unbiased estimate of the true variance. The sample standard deviation given in the practice problem is the square root of the sample variance. The sample standard deviation is not an unbiased estimate of the standard deviation of the population (since the square root is not a linear function), but the bias is small.
The sample standard deviation is of the observations, not of the mean. The variance of the mean is the variance of the observations divided by the number of observations. The variance of each group mean is
● Group 1: 13² / 95 = 1.77895
● Group 2: 11² / 85 = 1.42353
The distributions of the group means are N(22, 1.77895) for Group 1 and N(25, 1.42353) for Group 2, where the second parameter is the variance.
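The Part A arithmetic can be checked with a short script. This is only a sketch: the dictionary `groups` and the helper `variance_of_mean` are illustrative names, not part of the textbook or the problem.

```python
# Summary statistics from the problem table: (sample size, sample mean, sample SD).
groups = {"Group 1": (95, 22.0, 13.0), "Group 2": (85, 25.0, 11.0)}


def variance_of_mean(n, sd):
    """Variance of a sample mean: variance of the observations divided by n."""
    return sd ** 2 / n


# Variance of each group mean, keyed by group name.
var_means = {name: variance_of_mean(n, sd) for name, (n, _, sd) in groups.items()}

for name, v in var_means.items():
    print(f"{name}: variance of the mean = {v:.5f}")
```

The printed values should agree with the two bullet points above to five decimal places.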
Part B: The variance of the difference of two independent distributions is the sum of their variances.
The sum of the variances is 1.77895 + 1.42353 = 3.20248. The difference of the means is 22 – 25 = –3, so the distribution of the difference of the group means is N(–3, 3.20248).
Part C: For this practice problem, the null hypothesis is H0: μ1 = μ2, so the z value is the observed difference in the sample means divided by the standard deviation of this difference. If the null hypothesis is that the difference in the means is a value k, the z value = (x̄1 – x̄2 – k) / the standard deviation of (x̄1 – x̄2), that is,
(the observed difference in the means – k) / the standard deviation of this difference.
Take care that the observed difference is the same direction as the null hypothesis. If the observed difference is Group 1 – Group 2, then k should be for Group 1 – Group 2 (not Group 2 – Group 1).
● The standard deviation of the difference in the sample means is 3.20248^0.5 = 1.78955.
● The z value is –3 / 1.78955 = –1.67640.
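Parts B and C can be reproduced in a few lines. The variable names below (`var_diff`, `sd_diff`, `k`) are chosen for illustration only; `k` is the hypothesized value of μ1 – μ2, which is 0 in this problem.

```python
import math

# Summary statistics from the problem table.
n1, mean1, sd1 = 95, 22.0, 13.0
n2, mean2, sd2 = 85, 25.0, 11.0
k = 0.0  # hypothesized value of mu1 - mu2 under H0 (same direction as mean1 - mean2)

# Variance of the difference of two independent sample means = sum of their variances.
var_diff = sd1 ** 2 / n1 + sd2 ** 2 / n2
sd_diff = math.sqrt(var_diff)

# z value: (observed difference - hypothesized difference) / SD of the difference.
z = (mean1 - mean2 - k) / sd_diff

print(f"variance of difference = {var_diff:.5f}")
print(f"SD of difference       = {sd_diff:.5f}")
print(f"z value                = {z:.5f}")
```

Note that `k` must be stated in the same direction (Group 1 – Group 2) as the observed difference.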
Part D: The p value depends on the test. The textbook discusses three types of tests. Let st* be the statistic which we are testing, such as μ1 - μ2. The null hypothesis is the complement of the alternative hypothesis.
                    Null Hypothesis   Alternative Hypothesis   Rejection Region
Lower-tailed Test   st* ≥ k           st* < k                  z ≤ –zα
Upper-tailed Test   st* ≤ k           st* > k                  z ≥ zα
Two-tailed Test     st* = k           st* ≠ k                  z ≥ zα/2 or z ≤ –zα/2
Let CDF be the cumulative distribution function of the standard normal distribution.
● If z is negative, the two-sided p value is 2 × CDF(z).
● If z is positive, the two-sided p value is 2 × CDF(-z).
The z value here is –1.67640:
● CDF(–1.67640) = 0.04683.
● The p value for the two-tailed test is 2 × 0.04683 = 0.09366.
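The p value for each type of test can be computed with Python's standard library, where `statistics.NormalDist().cdf` plays the role of CDF. The function `p_value` and its `tail` argument are hypothetical names used only for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1


def p_value(z, tail="two"):
    """p value of a z test; tail is 'lower', 'upper', or 'two'."""
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "upper":
        return 1.0 - std_normal.cdf(z)
    if tail == "two":
        # Twice the area in the tail beyond |z|, whatever the sign of z.
        return 2.0 * std_normal.cdf(-abs(z))
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


p_two = p_value(-1.67640, tail="two")
print(f"two-tailed p value = {p_two:.5f}")
```

Using `-abs(z)` covers both bullet points above in one expression, since 2 × CDF(z) for negative z and 2 × CDF(–z) for positive z are the same quantity.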
Part E: The mid-point of the two-sided 95% confidence interval for the difference in the population means is the observed difference in the sample means, or x̄1 – x̄2. The half-width (from the lower end to the midpoint or from the midpoint to the higher end) is the z value (or the t value) × the standard deviation of the statistic.
● Mid-point = –3
● Half-width = 1.95996 × 1.78955 = 3.50745
● Lower bound of confidence interval = –3 – 3.50745 = -6.50745
● Upper bound of confidence interval = –3 + 3.50745 = 0.50745
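A minimal sketch of the confidence interval calculation follows, assuming large samples so that the normal critical value applies. The function name `two_sample_z_interval` is illustrative, not from the textbook.

```python
import math
from statistics import NormalDist


def two_sample_z_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Two-sided large-sample confidence interval for mu1 - mu2."""
    sd_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    # Critical value z_{alpha/2}; about 1.95996 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    midpoint = mean1 - mean2
    half_width = z_crit * sd_diff
    return midpoint - half_width, midpoint + half_width


lower, upper = two_sample_z_interval(22, 13, 95, 25, 11, 85)
print(f"95% CI for mu1 - mu2: ({lower:.5f}, {upper:.5f})")
```

The interval contains 0, which is consistent with the two-tailed p value exceeding 5%.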
Part F: The null hypothesis is that the difference in the means Δ0 = 0. We calculate the probability of a Type II error if the true difference in the means Δʹ = 5, assuming that α (the probability of a Type I error) is 5%. Because the two-tailed test is symmetric, β is the same whether the true difference is +5 or –5, so the direction of the subtraction does not affect the answer. β is the probability that the standardized observed difference falls within the acceptance region (–zα/2, zα/2) of the standard normal distribution after that region is shifted by (Δʹ – Δ0) / σ, where σ is the standard deviation of the difference in the sample means:
The probability of a Type II error for the difference in means =
Φ(zα/2 – (Δʹ – Δ0) / σ) – Φ( –zα/2 – (Δʹ – Δ0) / σ) =
Φ(1.95996 – 5 / 1.78955) – Φ( –1.95996 – 5 / 1.78955) =
Φ(–0.83404) – Φ(–4.75397) = 0.202129 – 9.97314E–7 = 0.202128
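The β formula above can be evaluated directly. In this sketch the function `type_ii_error` and its parameter names are assumptions made for illustration; `NormalDist().cdf` stands in for Φ.

```python
import math
from statistics import NormalDist


def type_ii_error(true_diff, null_diff, sd_diff, alpha=0.05):
    """Beta for a two-tailed z test of H0: mu1 - mu2 = null_diff."""
    phi = NormalDist().cdf
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    shift = (true_diff - null_diff) / sd_diff
    # Probability that the z statistic stays inside (-z_crit, z_crit).
    return phi(z_crit - shift) - phi(-z_crit - shift)


sd_diff = math.sqrt(13 ** 2 / 95 + 11 ** 2 / 85)
beta = type_ii_error(true_diff=5, null_diff=0, sd_diff=sd_diff)
print(f"beta = {beta:.6f}")
```

Calling the function with `true_diff=-5` should give the same β, because the two-tailed rejection region is symmetric about zero.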
See table 10.1 on page 480 and table 10.2 on page 481.
Group          Sample Size   Sample Mean   Sample SD
control        79            23.87         11.60
experimental   85            27.34         8.85
Give: size of each group, sum, and sum of squares
give: null hypothesis and alternative hypothesis
ask: estimated variance of the difference of the two means; z value ; p value ; β if μ’s = ‘s.
Let μ1 and μ2 denote the true mean scores for the control group and the experimental group, respectively. The two hypotheses are H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 < 0.
H0 will be rejected if z ≤ –z.05 = –1.645. We calculate
z = (23.87 – 27.34) / √(11.60²/79 + 8.85²/85) = –3.47 / 1.620 = –2.14184
ask also for p value (which depends on type of test)
for lower-tailed test: p value = Φ(–2.14184) = 0.01610318
Excel: Norm.dist(-2.14184, 0, 1, 1)
see p480: “the P-value for a lower-tailed z test is … If the test had been two-tailed, then the P-value would be 2 × .016 = .032, so …”
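The lower-tailed test for the control and experimental groups can be sketched as follows. Variable names such as `n_c` and `p_lower` are illustrative; the Python call `NormalDist().cdf(z)` corresponds to the Excel formula shown above.

```python
import math
from statistics import NormalDist

# Summary statistics: control group and experimental group.
n_c, mean_c, sd_c = 79, 23.87, 11.60
n_e, mean_e, sd_e = 85, 27.34, 8.85

# Standard deviation of the difference of the two sample means.
se = math.sqrt(sd_c ** 2 / n_c + sd_e ** 2 / n_e)
z = (mean_c - mean_e) / se

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(0.05)      # -z_.05, about -1.645
p_lower = std_normal.cdf(z)            # lower-tailed p value
p_two = 2.0 * std_normal.cdf(-abs(z))  # two-tailed p value, for comparison
reject_h0 = z <= z_crit

print(f"z = {z:.5f}, lower-tailed p = {p_lower:.5f}, two-tailed p = {p_two:.5f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```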
Exercise 6.2: Confidence interval for difference of means
A statistician forms a 95% two-sided confidence interval for the difference of the means of two populations. From previous analysis, the statistician believes the two populations have normal distributions with standard deviations of 3 and 4.
A. For samples of size N, what are the variances of the sample means?
B. For samples of size N, what is the variance of the difference of the sample means?
C. For samples of size N, what is the standard deviation of the difference of the sample means?
D. For samples of size N, what is the width of the 95% confidence interval for the difference of the sample means?
E. How many observations in each sample are needed for the width of the confidence interval to be 2?
Part A: For samples of size N, the variances of the sample means are 3² / N and 4² / N.
Part B: For samples of size N, the variance of the difference of the sample means is 3² / N + 4² / N = 25 / N.
Part C: For samples of size N, the standard deviation of the difference of the sample means is 5 / √N.
Part D: For samples of size N, the width of the 95% confidence interval for the difference of the sample means is 2 × zα/2 × 5 / √N. For α = 5%, zα/2 = 1.96, so the width is 2 × 1.96 × 5 / √N = 19.6 / √N.
Part E: If the width is 2, we have
2 × 1.96 × 5 / √N = 2 ➾ √N = 1.96 × 5 ➾ N = (1.96 × 5)² = 96.04
With 96 observations, the width is slightly more than 2. If the problem asks for a “width no more than 2,” one needs 97 observations.
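The general formula is N = [4 × z²α/2 × (σ1² + σ2²)] / w², where w is the desired width of the confidence interval, σ1 and σ2 are the population standard deviations, and both samples have the same size N.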
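The general formula translates into a small helper. This is a sketch under the exercise's assumptions (known standard deviations, equal sample sizes); the name `sample_size_for_width` is hypothetical. It uses the unrounded critical value, so the result differs slightly from the 96.04 obtained with zα/2 = 1.96.

```python
import math
from statistics import NormalDist


def sample_size_for_width(sd1, sd2, width, confidence=0.95):
    """Per-sample size N (not rounded) giving a confidence interval of the given width."""
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    return 4.0 * z_crit ** 2 * (sd1 ** 2 + sd2 ** 2) / width ** 2


n_exact = sample_size_for_width(sd1=3, sd2=4, width=2)
n_needed = math.ceil(n_exact)  # round up for a width of no more than 2

print(f"exact N = {n_exact:.2f}, observations needed per sample = {n_needed}")
```

Rounding up with `math.ceil` matches the "width no more than 2" reading of the problem.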