NEAS
Group: Administrators
Module 3: Univariate displays (The attached PDF file has better formatting.)

Homework assignment: stem and leaf display

A stem and leaf display for assault rates in the fifty U.S. states appears below. The assault rates range from 4.5% to 33.7%.

 4 | 568
 5 | 367
 6 |
 7 | 2
 8 | 136
 9 |
10 | 2699
11 | 035
12 | 000
13 |
14 | 59
15 | 1699
16 | 1
17 | 48
18 | 8
19 | 0
20 | 14
21 | 1
22 |
23 | 68
24 | 99
25 | 2459
26 | 3
27 | 69
28 | 5
29 | 4
30 | 0
31 |
32 |
33 | 57
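As a sketch of how to read the display, the snippet below unpacks it into a sorted list. It assumes, as the stated range of 4.5% to 33.7% implies, that each leaf is the tenths digit of the rate, so `15 | 1699` stands for 15.1, 15.6, 15.9 and 15.9. The names `ROWS` and `unpack` are illustrative only.

```python
# Stem -> leaves, copied from the display above.
# Assumption: each leaf is the tenths digit ("15 | 1699" = 15.1, 15.6, 15.9, 15.9).
ROWS = {
    4: "568", 5: "367", 6: "", 7: "2", 8: "136", 9: "",
    10: "2699", 11: "035", 12: "000", 13: "", 14: "59", 15: "1699",
    16: "1", 17: "48", 18: "8", 19: "0", 20: "14", 21: "1", 22: "",
    23: "68", 24: "99", 25: "2459", 26: "3", 27: "69", 28: "5",
    29: "4", 30: "0", 31: "", 32: "", 33: "57",
}


def unpack(rows):
    """Return the data values in ascending order (stem + leaf / 10)."""
    values = []
    for stem in sorted(rows):
        for leaf in rows[stem]:  # leaves within a row are already ascending
            # round() strips binary floating-point noise from stem + leaf / 10
            values.append(round(stem + int(leaf) / 10, 1))
    return values


rates = unpack(ROWS)
print(len(rates), rates[0], rates[-1])  # 50 4.5 33.7
```

Counting values from the top of the display in this way is how the order statistics are located: X(13) is the 13th value in the list, i.e. `rates[12]`.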
A. What is the median assault rate? There are 50 states, so average the two middle values.
B. What is the lower hinge (the 25th percentile)?
C. What is the upper hinge (the 75th percentile)?
D. What is the value of (HU - Median) / (Median - HL), where HU is the upper hinge and HL is the lower hinge?
E. This ratio indicates the skewness of the distribution. Is this distribution positively or negatively skewed?
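Questions A to C can be checked with a short sketch. It assumes the usual Tukey hinge-depth rule (median depth (n + 1)/2, hinge depth (floor(median depth) + 1)/2), which is an assumption here and worth checking against the text; for n = 50 it puts the median at depth 25.5 and the hinges at depth 13 from each end. `RATES` is transcribed from the display with leaves read as tenths, and the function names are hypothetical.

```python
import math

# The 50 assault rates (%), read from the stem and leaf display in ascending order.
RATES = [
    4.5, 4.6, 4.8, 5.3, 5.6, 5.7, 7.2, 8.1, 8.3, 8.6,
    10.2, 10.6, 10.9, 10.9, 11.0, 11.3, 11.5, 12.0, 12.0, 12.0,
    14.5, 14.9, 15.1, 15.6, 15.9, 15.9, 16.1, 17.4, 17.8, 18.8,
    19.0, 20.1, 20.4, 21.1, 23.6, 23.8, 24.9, 24.9, 25.2, 25.4,
    25.5, 25.9, 26.3, 27.6, 27.9, 28.5, 29.4, 30.0, 33.5, 33.7,
]


def value_at_depth(sorted_xs, depth):
    """X(depth) for a 1-based depth; a half-integer depth such as 25.5
    averages the two neighbouring order statistics X(25) and X(26)."""
    return (sorted_xs[math.floor(depth) - 1] + sorted_xs[math.ceil(depth) - 1]) / 2


def median_and_hinges(sorted_xs):
    """Median and hinges via depth(median) = (n + 1) / 2 and
    depth(hinge) = (floor(depth(median)) + 1) / 2."""
    n = len(sorted_xs)
    d_median = (n + 1) / 2
    d_hinge = (math.floor(d_median) + 1) / 2
    median = value_at_depth(sorted_xs, d_median)
    lower = value_at_depth(sorted_xs, d_hinge)          # depth counted from the low end
    upper = value_at_depth(sorted_xs, n + 1 - d_hinge)  # same depth from the high end
    return median, lower, upper


print(median_and_hinges(RATES))  # (15.9, 10.9, 24.9)
```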
Ron
Group: Forum Members
For the given data on assault rates:
X(13) = X(14) = 10.9%
X(25) = X(26) = 15.9%
X(37) = X(38) = 24.9%
The answers I'm getting are:
Median: 15.9%
Lower hinge: 10.9%
Upper hinge: 24.9%
To understand where the formulas for the hinges come from, I thought about how they relate to the quantile function.
Suppose the cumulative probability for the i-th sorted data point X(i) is given by the formula P(i) = (i - 0.5)/n [Fox, page 34]. Solving z = P(i) for i gives the corresponding quantile function, expressed as a position in the sorted data: P_inv(z) = n*z + 0.5.
Using this quantile function with n = 50 gives:
P_inv(0.5) = 25.5
P_inv(0.25) = 13
Note that 25.5 = (25 + 26)/2, the midpoint of positions 25 and 26, so the median can be computed as (X(25) + X(26))/2. Also, P_inv(0.25) = 13 falls exactly on a data point, so the lower hinge is X(13).
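As an alternative, consider this formula for the CDF: P(i) = (i - 1)/(n - 1). The corresponding quantile function is P_inv(z) = (n - 1)*z + 1.
With n = 50:
P_inv(0.5) = 25.5
P_inv(0.25) = 13.25
Under this convention the median is still (X(25) + X(26))/2, but the lower hinge would be the weighted average 0.75*X(13) + 0.25*X(14), which equals 10.9% here only because X(13) = X(14).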
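To compare the two conventions numerically, here is a small sketch that solves P(i) = z for the fractional position i and then interpolates linearly between neighbouring order statistics. The labels "hazen" and "linear" are shorthand chosen here (they appear to match the names of the corresponding `method` options of `numpy.quantile`), and the helper names are hypothetical.

```python
import math

# The 50 assault rates (%) in ascending order, read from the stem and leaf display.
RATES = [
    4.5, 4.6, 4.8, 5.3, 5.6, 5.7, 7.2, 8.1, 8.3, 8.6,
    10.2, 10.6, 10.9, 10.9, 11.0, 11.3, 11.5, 12.0, 12.0, 12.0,
    14.5, 14.9, 15.1, 15.6, 15.9, 15.9, 16.1, 17.4, 17.8, 18.8,
    19.0, 20.1, 20.4, 21.1, 23.6, 23.8, 24.9, 24.9, 25.2, 25.4,
    25.5, 25.9, 26.3, 27.6, 27.9, 28.5, 29.4, 30.0, 33.5, 33.7,
]


def position(z, n, convention):
    """Fractional 1-based index i that solves P(i) = z.
    "hazen":  P(i) = (i - 0.5) / n      ->  i = n*z + 0.5
    "linear": P(i) = (i - 1) / (n - 1)  ->  i = (n - 1)*z + 1
    """
    if convention == "hazen":
        return n * z + 0.5
    if convention == "linear":
        return (n - 1) * z + 1
    raise ValueError(f"unknown convention: {convention}")


def interpolate(sorted_xs, i):
    """Linear interpolation between X(floor(i)) and X(floor(i) + 1).
    Assumes 1 <= i <= n, which holds for the z values used below."""
    k = math.floor(i)
    frac = i - k
    if frac == 0:
        return sorted_xs[k - 1]
    return (1 - frac) * sorted_xs[k - 1] + frac * sorted_xs[k]


for conv in ("hazen", "linear"):
    for z in (0.25, 0.5, 0.75):
        i = position(z, len(RATES), conv)
        print(conv, z, i, round(interpolate(RATES, i), 3))
```

For these data both conventions give 10.9%, 15.9% and 24.9% at z = 0.25, 0.5 and 0.75, because the order statistics being weighted are equal (X(13) = X(14), X(25) = X(26), X(37) = X(38)). With other data the two conventions can give different values.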
CalLadyQED
Group: Forum Members
Ron,
Thanks for your help. I had noticed that we'd be taking a weighted average of two X(i)'s that were the same. However, I figure the final exam may be slightly different, so I need to know why it works the way it does.
I'm probably forgetting something I learned in stats years ago, but where did you get that CDF formula?
jgorab17
Group: Forum Members
Can someone please explain what they mean by averaging 2 points for the first question? I've just never used a stem and leaf plot before...
CalLadyQED
Group: Forum Members
For the median, the averaging comes from the definition and has little to do with it being a stem and leaf plot. When an ordered data set has an odd number of observations n, median = x((n+1)/2). When n is even, median = (x(n/2) + x(n/2 + 1)) / 2. Does that help? [NEAS: Correct]
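The odd/even rule can be written as a small function. The toy inputs in the usage lines are made up purely to exercise both branches.

```python
def median(xs):
    """Median of a list of numbers, following the odd/even definition above."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # x((n+1)/2); subtract 1 to turn the 1-based subscript into a list index
        return s[(n + 1) // 2 - 1]
    # (x(n/2) + x(n/2 + 1)) / 2, again shifted to 0-based indices
    return (s[n // 2 - 1] + s[n // 2]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```

With n = 50 the even branch averages x(25) and x(26), which is the "average two points" in question A.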
jgorab17
Group: Forum Members
So the number of observations here is 50, right? Then we'd be using x(25) and x(26), but what are those numbers? I see that they both equal 15.9%, but I'm not sure how you get them, and I think it's just because I don't know how to read a stem and leaf plot. Also, how are we supposed to tell from the ratio whether this is positively or negatively skewed? I understand how we'd know that from looking at a graph, but I can't find anything about a ratio.
[NEAS: The ratio uses the first and third quartiles.]
[NEAS: Positive and negative skew can sometimes be unclear. For this homework assignment, if the upper hinge minus the median is more than the median minus the lower hinge, the distribution is positively skewed; if the upper hinge minus the median is less than the median minus the lower hinge, the distribution is negatively skewed. The relevant ratio is (upper hinge - median) / (median - lower hinge). For exact analysis of skewness, we should evaluate the significance of this ratio, but that is not covered in the text.]
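The rule in the NEAS note can be written out directly. This sketch classifies skew by comparing the two spreads rather than the ratio itself, so it still works when the ratio is undefined; it takes the median and hinges reported earlier in the thread as inputs and makes no attempt at the significance test the note mentions. Function names are hypothetical.

```python
def hinge_ratio(lower_hinge, median, upper_hinge):
    """(HU - median) / (median - HL), the ratio asked for in question D."""
    below = median - lower_hinge
    if below == 0:
        raise ValueError("median equals the lower hinge; the ratio is undefined")
    return (upper_hinge - median) / below


def skew_direction(lower_hinge, median, upper_hinge):
    """Classify skew by comparing the upper and lower spreads, per the NEAS note."""
    above = upper_hinge - median
    below = median - lower_hinge
    if above > below:
        return "positive"
    if above < below:
        return "negative"
    return "symmetric (by this rule)"


# Lower hinge, median and upper hinge (%) as reported earlier in the thread.
HL, MED, HU = 10.9, 15.9, 24.9
print(round(hinge_ratio(HL, MED, HU), 2), skew_direction(HL, MED, HU))  # 1.8 positive
```

Here (24.9 - 15.9) / (15.9 - 10.9) = 9.0 / 5.0 = 1.8; a ratio above 1 means the upper spread is larger, which is the positive-skew case in the rule above.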
slocal
Group: Awaiting Activation
Two things. First, for each row (e.g. 15 | 1699), do we read the leaves as the first decimal place, so this row is 15.1, 15.6, 15.9, and 15.9? Second, should we be using (.25)*x(12) + (.75)*x(13) for the first hinge, since (50 + 1)/4 = 12.75? If not, what am I missing?
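The (n + 1)/4 = 12.75 idea is a quartile convention rather than the hinge rule: the p-th quantile sits at position (n + 1)p and is interpolated linearly, so position 12.75 gives 0.25*x(12) + 0.75*x(13). The sketch below (function name hypothetical) shows what that convention gives for these data, for comparison with the hinge at depth 13. As far as I know, Python's `statistics.quantiles` uses this same (n + 1)p convention by default (`method="exclusive"`), so it is printed as a cross-check.

```python
import math
import statistics

# The 50 assault rates (%) in ascending order, read from the stem and leaf display.
RATES = [
    4.5, 4.6, 4.8, 5.3, 5.6, 5.7, 7.2, 8.1, 8.3, 8.6,
    10.2, 10.6, 10.9, 10.9, 11.0, 11.3, 11.5, 12.0, 12.0, 12.0,
    14.5, 14.9, 15.1, 15.6, 15.9, 15.9, 16.1, 17.4, 17.8, 18.8,
    19.0, 20.1, 20.4, 21.1, 23.6, 23.8, 24.9, 24.9, 25.2, 25.4,
    25.5, 25.9, 26.3, 27.6, 27.9, 28.5, 29.4, 30.0, 33.5, 33.7,
]


def quantile_n_plus_1(sorted_xs, p):
    """p-th quantile at 1-based position (n + 1) * p, interpolated linearly.
    Assumes 1 <= (n + 1) * p <= n, which holds for p = 0.25, 0.5, 0.75 here."""
    pos = (len(sorted_xs) + 1) * p
    k = math.floor(pos)
    frac = pos - k
    if frac == 0:
        return sorted_xs[k - 1]
    # (1 - frac) * X(k) + frac * X(k + 1); pos = 12.75 gives 0.25*X(12) + 0.75*X(13)
    return (1 - frac) * sorted_xs[k - 1] + frac * sorted_xs[k]


for p in (0.25, 0.5, 0.75):
    print(p, round(quantile_n_plus_1(RATES, p), 3))

print(statistics.quantiles(RATES, n=4))  # stdlib cross-check
```

Under this convention the lower and upper quartiles come out near 10.825% and 24.975%, versus hinges of 10.9% and 24.9% at depth 13 from each end; which one a question wants depends on whether it asks for hinges, as this homework does, or for quartiles under a particular definition.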
bubba gump
Group: Forum Members
Isn't the lower hinge simply point 13 (10.9%) and the upper hinge point 38 (24.9%)?
Briggs
Group: Forum Members
I'm confused by something on page 39. The subscript for the lower hinge comes from 197-49+1=149. Where does the 197 come from? There are 193 data points. I'm clearly missing something...
wb_munchausen
Group: Forum Members
NEAS, is the 197 in the equation 197 - 49 + 1 = 149 (p. 39, mid-page) a typo? I thought it should be 193, since n = 193. [NEAS: Yes, it looks like a typo.]
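The arithmetic behind this question can be checked quickly. Assuming the text uses the standard hinge-depth rule (the same one that gives X(13) and X(38) for the n = 50 homework data), a hinge depth of 49 is exactly what n = 193 produces, and counting that depth back from the other end of the ordered data gives 193 - 49 + 1 = 145. So if 197 is a typo for 193, the 149 would presumably also need to read 145; this sketch has not been checked against the book itself, and `hinge_subscripts` is a hypothetical helper.

```python
import math


def hinge_subscripts(n):
    """Median depth, hinge depth, and the hinge subscript counted back from
    the other end (n - depth + 1), using the usual hinge-depth rule."""
    d_median = (n + 1) / 2
    d_hinge = (math.floor(d_median) + 1) / 2
    return d_median, d_hinge, n - d_hinge + 1


print(hinge_subscripts(193))  # (97.0, 49.0, 145.0)
print(hinge_subscripts(50))   # (25.5, 13.0, 38.0)
```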