RA sproj on Bad vs Good NBA Teams

NEAS

posted 18 Years Ago

Supreme Being

Group: Administrators
Posts: 4.5K, Visits: 1.6K

The project template on sports won-loss records suggests numerous F tests. Comparing good vs bad teams (ranked by their won-loss records) is useful for several reasons:

~ You don’t need any knowledge of sports history or sports rules.

~ All the data are on the NEAS web site. You need not collect additional data.

~ You can use any sport and any time period.

~ You can define good vs bad teams several ways.

Candidates examining good vs bad teams may get different results, depending on their definitions of good vs bad teams, the sport, and the time period. Some might infer that good and bad teams have different regression equations; some might infer that they don’t.

We look at the reasoning in the student project, not at the particular choices. The candidate notes that one might use 1, 2, or 3 past years for the optimal regression equation. The candidate correctly explains the implications of each statistical variable. He chose 3 past years. Other statisticians might choose 1 or 2 past years. If you are unsure which regression equation is optimal, state the arguments for each.

The candidate defines good vs bad teams by their average won-loss records over the full time period. This is the simplest definition, and it is fine for the student project. If you are not proficient at Excel, choose simplicity.

This definition doesn’t differentiate much among teams. Over ten or twenty years, the average won-loss records don’t differ that much among the teams.
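A minimal Python sketch of the full-period definition, for readers who want to check the logic outside Excel. It assumes the records sit in a dictionary mapping each team to its yearly winning percentages; the team names and numbers are made up, and splitting at the median of the averages is one simple cutoff, not necessarily the candidate's. The `spread` value shows how narrow the range of averages can be.

```python
from statistics import mean, median


def split_by_average(records):
    """Split teams into good and bad by average won-loss record over the full period.

    records maps a (hypothetical) team name to its yearly winning percentages.
    Teams above the median average are 'good'; the rest are 'bad'.
    """
    averages = {team: mean(pcts) for team, pcts in records.items()}
    cutoff = median(averages.values())
    good = sorted(team for team, avg in averages.items() if avg > cutoff)
    bad = sorted(team for team, avg in averages.items() if avg <= cutoff)
    return good, bad, averages


# Made-up records for four teams over three seasons.
records = {
    "A": [0.60, 0.55, 0.62],
    "B": [0.40, 0.45, 0.38],
    "C": [0.52, 0.48, 0.50],
    "D": [0.47, 0.53, 0.49],
}
good, bad, averages = split_by_average(records)
# Gap between the best and worst full-period averages.
spread = max(averages.values()) - min(averages.values())
```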

Some candidates would like to get a significant F test. This is not required for the student project, but it makes the project more interesting. If you want to get different equations for good vs bad teams, do the following:

Use basketball statistics, as this candidate does. A single high draft pick may turn a losing team into a winning team. In the other sports, a single draft pick has less effect on the won-loss record.

Instead of selecting a fixed set of good teams vs bad teams, use the best and worst teams each year.

~ We explain the logic of best vs worst teams.

~ We point out the problems that arise when the data points are too similar.

~ We show how to avoid these problems.

Logic: A team may vary from good to bad (or vice versa) over the years. To compare good vs bad teams, we must re-select the teams each year. We do the following:

For year X, we select the best team in each division and the worst team in each division. With 30 years and two divisions (so four teams for each year), we have 120 data points.

We divide the 120 data points into two groups of 60 data points each, representing the good teams and the bad teams. These are the best teams and the worst teams in each year, so we are more likely to get significant results. If good and bad teams have different regression equations, the difference should be clearest in this sample.
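The year-by-year selection might look like the following Python sketch. It assumes a hypothetical layout in which `standings` maps (year, division) to each team's winning percentage. Each selected team also needs its year X+1 record, so 30 years of data points require 31 seasons of records. Ties go to whichever team is listed first; the sample numbers are invented.

```python
def best_and_worst_pairs(standings):
    """Return (good, bad) lists of (X, Y) pairs.

    standings: {(year, division): {team: win_pct}} (hypothetical layout).
    For each division and year X, the best and worst teams contribute one
    data point each: X = year X win pct, Y = that team's year X+1 win pct.
    """
    # Flatten so a team's year X+1 record can be found even if it changed division.
    pct = {
        (team, year): p
        for (year, _division), teams in standings.items()
        for team, p in teams.items()
    }
    good, bad = [], []
    for (year, _division), teams in sorted(standings.items()):
        best = max(teams, key=teams.get)   # ties: first team listed is chosen
        worst = min(teams, key=teams.get)
        for team, group in ((best, good), (worst, bad)):
            next_pct = pct.get((team, year + 1))
            if next_pct is not None:       # the final year has no year X+1 record
                group.append((teams[team], next_pct))
    return good, bad


standings = {
    (2001, "East"): {"A": 0.70, "B": 0.30, "C": 0.50},
    (2001, "West"): {"D": 0.65, "E": 0.35, "F": 0.50},
    (2002, "East"): {"A": 0.60, "B": 0.45, "C": 0.45},
    (2002, "West"): {"D": 0.55, "E": 0.50, "F": 0.45},
}
good, bad = best_and_worst_pairs(standings)
```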

For each group of 60 data points, we regress the year X+1 won-loss record on the year X won-loss record. We have two unconstrained regression equations. Using one past year emphasizes the effect of the high draft choice for the worst teams. One past year works well for basketball, though it would not work well for some other sports.
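Each group's unconstrained equation Y = alpha + beta × X can be fit by ordinary least squares. This Python sketch does the coefficient calculation that Excel's Regression tool does for a single X variable (coefficients only, no standard errors). The sample points are invented and lie exactly on Y = 25% + 50% × X, so the fit should return those values.

```python
def ols(points):
    """Fit Y = alpha + beta * X by ordinary least squares; points are (X, Y) pairs."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta


# Invented (year X, year X+1) win pcts for three good team-years.
good = [(0.60, 0.55), (0.70, 0.60), (0.80, 0.65)]
alpha, beta = ols(good)
```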

~ We expect the good teams to stay good, so the slope coefficient should be high. The good teams will regress towards the mean. If the mean reversion has a 50% strength, we expect a regression equation with alpha = 25% and beta = 50%.

~ We expect the bad teams to improve from the high draft pick, so the slope coefficient should be lower. If the worst teams improve so much that they become average teams, alpha = 50% and beta = 0%.
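The expected coefficients follow from one line of algebra. If a fraction s of a team's deviation from the league mean m disappears by the next year, then Y = m + (1 - s)(X - m) = s × m + (1 - s) × X, so alpha = s × m and beta = 1 - s. The Python sketch below assumes m = 50% (every game has a winner and a loser, ignoring ties). Setting s = 50% gives the good-team equation and s = 100% gives the bad-team equation. With s = 50%, a team that wins 70% this year is expected to win 60% next year.

```python
def expected_coefficients(strength, league_mean=0.5):
    """Alpha and beta for Y = alpha + beta * X when a fraction `strength`
    of each team's deviation from the league mean disappears in one year."""
    beta = 1.0 - strength
    alpha = strength * league_mean
    return alpha, beta


good_alpha, good_beta = expected_coefficients(0.5)  # 50% mean reversion
bad_alpha, bad_beta = expected_coefficients(1.0)    # worst teams become average
predicted_next = good_alpha + good_beta * 0.70      # a 70% team next year
```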

The first time you try this, the regression analysis may not work well. The best teams all have about the same won-loss record in Year X. The variance of the beta parameter is high, so it may not be significant. You might get a low beta parameter and a high alpha parameter.

Illustration: Suppose the best team each year wins 70% of its games. The next year, these teams win an average of 60% of their games. In truth, beta = 50% and alpha = 25%. But the dispersion of the X values is so small that the variance of the beta coefficient is high. The regression analysis may give a beta of 0% and an alpha of 60%.

The same is true for the worst teams. If the worst team each year wins 30% of its games, the variance of the beta coefficient is high.

For stable regression estimates, we need a wide dispersion of X values.

~ If we use all teams in one regression equation, we get more significant ordinary least squares estimators, since the X values are more widely dispersed.

~ If we use only the best teams or the worst teams, the estimators may not be significant.
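The effect of dispersion shows up in the standard error of the OLS slope, sigma / sqrt(Sxx), where sigma is the residual standard deviation and Sxx is the sum of squared deviations of the X values from their mean. The Python sketch below holds sigma fixed at an assumed 8% (a hypothetical value) and compares X values bunched near 70% with X values spread from 30% to 70%. In practice sigma is estimated from the residuals with n - 2 degrees of freedom, but the comparison is the same: the bunched sample gives a standard error ten times larger, large enough that a true beta of 50% cannot be distinguished from zero.

```python
from math import sqrt


def beta_standard_error(xs, sigma):
    """Standard error of the OLS slope: sigma / sqrt(sum of (x - mean)^2)."""
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sigma / sqrt(sxx)


narrow = [0.68, 0.69, 0.70, 0.71, 0.72]  # best teams only: all near 70%
wide = [0.30, 0.40, 0.50, 0.60, 0.70]    # teams spread across the league
sigma = 0.08                             # assumed residual std. deviation (hypothetical)

se_narrow = beta_standard_error(narrow, sigma)  # about 2.53
se_wide = beta_standard_error(wide, sigma)      # about 0.25
```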

To correct this, we use the four best teams (or eight best teams) vs the four worst teams (or eight worst teams) in each year. This gives twice (or four times) as many data points for each year, with a wider spread of won-loss records. You can try several versions to see which gives the clearest difference in the regression equations.
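Picking the k best and k worst teams each year (league-wide here, rather than by division) is easy to automate. In this Python sketch, `by_year` is a hypothetical dictionary mapping each year to each team's winning percentage; k = 4 or k = 8 gives the versions described above. The tiny sample uses six made-up teams and k = 2.

```python
def k_best_and_worst(by_year, k):
    """Return (good, bad) lists of (X, Y) = (year X pct, year X+1 pct) pairs.

    by_year: {year: {team: win_pct}} (hypothetical layout). Each year that has
    a following season contributes its k best and k worst teams.
    """
    good, bad = [], []
    for year in sorted(by_year):
        this_year, next_year = by_year[year], by_year.get(year + 1)
        if next_year is None:
            continue  # no year X+1 record to use as Y
        ranked = sorted(this_year, key=this_year.get, reverse=True)
        if not 1 <= k <= len(ranked) // 2:
            raise ValueError("k must be between 1 and half the number of teams")
        for team in ranked[:k]:       # k best teams
            if team in next_year:
                good.append((this_year[team], next_year[team]))
        for team in ranked[-k:]:      # k worst teams
            if team in next_year:
                bad.append((this_year[team], next_year[team]))
    return good, bad


by_year = {
    2001: {"A": 0.70, "B": 0.62, "C": 0.55, "D": 0.45, "E": 0.38, "F": 0.30},
    2002: {"A": 0.60, "B": 0.58, "C": 0.50, "D": 0.48, "E": 0.44, "F": 0.40},
}
good, bad = k_best_and_worst(by_year, k=2)
```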

If you try this, keep two things in mind:

~ If you are proficient with Excel, this project template is fun. Use Excel to pick the best and worst teams in each year. If you pick the best and worst teams manually, this project is tedious.

~ Don’t spend too much time defining the data sets. Choose your definition of best and worst teams, determine the optimal regression equation for each group of teams, and compare the two groups with an F test.
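One standard way to test whether the good and bad teams share a regression equation is the Chow test: fit one pooled equation to all points, fit separate equations to each group, and compare the residual sums of squares (the "Residual SS" in Excel's regression output). With k parameters per equation (k = 2 for alpha and beta), F = [(RSS_pooled - RSS_good - RSS_bad) / k] / [(RSS_good + RSS_bad) / (n_good + n_bad - 2k)], with (k, n_good + n_bad - 2k) degrees of freedom. The Python sketch below computes this statistic for invented data; compare it with the critical value from an F table or Excel's F.INV.RT function.

```python
def rss(points):
    """Residual sum of squares after fitting Y = alpha + beta * X by OLS."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return sum((y - alpha - beta * x) ** 2 for x, y in points)


def chow_f(good, bad, k=2):
    """Chow F statistic for H0: both groups share one regression equation."""
    rss_pooled = rss(good + bad)          # constrained: one equation for all teams
    rss_separate = rss(good) + rss(bad)   # unconstrained: one equation per group
    df_num, df_den = k, len(good) + len(bad) - 2 * k
    f = ((rss_pooled - rss_separate) / df_num) / (rss_separate / df_den)
    return f, (df_num, df_den)


# Invented (year X, year X+1) win pcts for four good and four bad team-years.
good = [(0.60, 0.56), (0.65, 0.57), (0.70, 0.62), (0.75, 0.61)]
bad = [(0.25, 0.47), (0.30, 0.50), (0.35, 0.46), (0.40, 0.52)]
f, df = chow_f(good, bad)
```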

We review your student project to see if you correctly apply the statistical technique. Focus on the statistical work. Explain what you expect, what you find, and whether the results are significant.

Attachments

RA sproj 1406031619060817032619.doc (534 views, 41.00 KB)

MConrad

posted 18 Years Ago

Junior Member

Group: Forum Members
Posts: 5, Visits: 1

How do you perform this....

"If we use all teams in one regression equation, we get ...."

.......in Excel using the Regression Add-In? How do you enter all the parameters (X and Y ranges) for multiple teams? I know how to do this for one team, but not for multiple teams that each have their own set of won-loss percentages.

Do I just put data in this sort of format:

                 Dependent   Independent variables
Team     Year    Lag 0       Lag -1     Lag -2     Lag -3
Team 1   2005    28.85%      48.65%     34.29%     45.93%
         2004    48.65%      34.29%     45.93%     18.75%
         2003    34.29%      45.93%     18.75%     48.05%
         2002    45.93%      18.75%     48.05%     64.08%
         2001    18.75%      48.05%     64.08%     62.50%
         2000    48.05%      64.08%     62.50%     37.50%
Team 2   2005    63.46%      43.24%     45.71%     11.48%
         2004    43.24%      45.71%     11.48%     37.50%
         2003    45.71%      11.48%     37.50%     24.03%
         2002    11.48%      37.50%     24.03%     23.30%
         2001    37.50%      24.03%     23.30%     18.75%
         2000    24.03%      23.30%     18.75%     43.75%
Team 3   2005    75.00%      54.05%     57.14%     51.67%
         2004    54.05%      57.14%     51.67%     50.00%
         2003    57.14%      51.67%     50.00%     66.07%
         2002    51.67%      50.00%     66.07%     34.95%
         2001    50.00%      66.07%     34.95%     87.50%
         2000    66.07%      34.95%     87.50%     75.00%

And then, in the REGRESSION Add-In tool, do I use the first column of data (lag 0) for the Y range and columns 2, 3, and 4 for the X range (if regressing on 3 past years)? ANY thoughts would be SOOO appreciated! I really want to get this project done and have been on hold with this question for 3 weeks.

[NEAS: Yes; that is how it is done.]
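The stacked layout can also be built and fit outside Excel. This Python sketch turns each team's chronological winning percentages (the three teams in the table above, 1997 through 2005) into rows of Y = current year and X = the three prior years, stacks all teams into one data set, and solves the least-squares normal equations directly. The rows match the table: Y is the lag 0 column and X is the three lag columns. Row order within the stack does not affect the fit. The helper names are illustrative, not part of any library.

```python
def lagged_rows(series, lags):
    """Turn one team's chronological win pcts into (Y, [X lag 1, ..., X lag n]) rows."""
    rows = []
    for t in range(lags, len(series)):
        prior = [series[t - j] for j in range(1, lags + 1)]  # most recent year first
        rows.append((series[t], prior))
    return rows


def stack(teams, lags):
    """Stack every team's lagged rows into one data set."""
    rows = []
    for series in teams.values():
        rows.extend(lagged_rows(series, lags))
    return rows


def ols_multi(rows):
    """Least-squares coefficients [alpha, beta_1, ..., beta_n] via the normal equations."""
    X = [[1.0] + list(xs) for _, xs in rows]  # leading 1.0 gives the intercept
    y = [target for target, _ in rows]
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]  # X'X
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]                # X'y
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(A[i][c] * coef[c] for c in range(i + 1, p))) / A[i][i]
    return coef


# Winning percentages for 1997-2005, read from the table above.
teams = {
    "Team 1": [0.3750, 0.6250, 0.6408, 0.4805, 0.1875, 0.4593, 0.3429, 0.4865, 0.2885],
    "Team 2": [0.4375, 0.1875, 0.2330, 0.2403, 0.3750, 0.1148, 0.4571, 0.4324, 0.6346],
    "Team 3": [0.7500, 0.8750, 0.3495, 0.6607, 0.5000, 0.5167, 0.5714, 0.5405, 0.7500],
}
rows = stack(teams, lags=3)  # 6 rows per team (2000-2005), 18 in all
alpha, *betas = ols_multi(rows)
```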
