RA sproj on baseball success



NEAS
Supreme Being (5.9K reputation)

Group: Administrators
Posts: 4.5K, Visits: 1.6K

This candidate regresses a team’s winning percentage on five explanatory variables. He starts with all five variables and eliminates them one at a time, each time dropping the variable with the highest p-value and checking the effect on the adjusted R². He concludes that only two of the five explanatory variables are good predictors of a team’s winning percentage.
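The elimination procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the candidate's actual work: the data are simulated, the five variable names are hypothetical, and the stopping rule (|t| of at least 2, roughly a 5% two-sided test) is an assumed cutoff. Within a single fit all coefficients share the same degrees of freedom, so the variable with the highest p-value is the one with the smallest |t|, which lets the sketch avoid a statistics library.

```python
import numpy as np

def fit_ols(design, y):
    """Least-squares fit; `design` must already include an intercept column.
    Returns (coefficients, t statistics, adjusted R^2)."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    cov = (sse / (n - k)) * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov))
    adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))
    return beta, t_stats, adj_r2

def backward_eliminate(X, y, names, t_cutoff=2.0):
    """Drop the predictor with the smallest |t| (highest p-value) one at a
    time until every remaining predictor has |t| >= t_cutoff."""
    keep = list(range(X.shape[1]))
    history = []  # (variables in the model, adjusted R^2) at each step
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, t_stats, adj_r2 = fit_ols(design, y)
        history.append(([names[j] for j in keep], adj_r2))
        abs_t = np.abs(t_stats[1:])  # skip the intercept
        worst = int(np.argmin(abs_t))
        if abs_t[worst] >= t_cutoff:
            break
        keep.pop(worst)
    return [names[j] for j in keep], history

# Simulated data (an assumption): only the first two variables matter.
rng = np.random.default_rng(0)
names = ["runs_scored", "runs_allowed", "stolen_bases", "errors", "walks"]
X = rng.normal(size=(150, 5))
y = 0.5 + 0.08 * X[:, 0] - 0.06 * X[:, 1] + rng.normal(scale=0.03, size=150)

kept, history = backward_eliminate(X, y, names)
for variables, adj_r2 in history:
    print(f"{len(variables)} variables: adjusted R^2 = {adj_r2:.4f}")
print("kept:", kept)
```

Printing the adjusted R² at each step mirrors the check the candidate makes: if dropping a variable barely changes it, the variable was adding little.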

This student project is analogous to actuarial pricing, and the baseball statistics provide a good data set to practice the statistical techniques. Pricing actuaries may start with a dozen explanatory variables for a driver’s loss frequency or a policyholder’s mortality rate. We use categorical (qualitative) explanatory variables, and we now use generalized linear models (GLMs) instead of classical regression analysis, but the concepts are the same. We eliminate one explanatory variable at a time, to identify those variables that best predict future loss costs or mortality.

For your own student project, consider several ways to improve on the analysis here:

After reducing the full regression equation with five explanatory variables to a reduced regression equation with two or three explanatory variables, use an F test to see whether the excluded variables, taken together, significantly improve the fit.
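As a rough sketch of that F test, the statistic compares the error sum of squares (SSE) of the reduced and full models: F = [(SSE_reduced - SSE_full) / q] / [SSE_full / (n - k)], where q is the number of excluded variables and k is the number of parameters in the full model, including the intercept. The data below are simulated and the choice of which three columns to exclude is an assumption for illustration.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from regressing y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def partial_f(X_full, X_reduced, y):
    """F statistic for H0: the excluded variables all have zero coefficients.
    Returns (F, numerator df, denominator df)."""
    n = len(y)
    k_full = X_full.shape[1] + 1               # parameters incl. intercept
    q = X_full.shape[1] - X_reduced.shape[1]   # number of excluded variables
    sse_full = sse(X_full, y)
    sse_reduced = sse(X_reduced, y)
    f_stat = ((sse_reduced - sse_full) / q) / (sse_full / (n - k_full))
    return f_stat, q, n - k_full

# Simulated data (an assumption): columns 2-4 have no real effect.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(150, 5))
y = 0.5 + 0.08 * X_full[:, 0] - 0.06 * X_full[:, 1] + rng.normal(scale=0.03, size=150)
X_reduced = X_full[:, :2]

f_stat, df_num, df_den = partial_f(X_full, X_reduced, y)
print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

Compare the statistic with the critical value from an F table for the printed degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_stat, df_num, df_den)` gives the p-value directly.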

Consider the multicollinearity among the explanatory variables. Batting averages and strikeouts are correlated. You may find it better to use explanatory variables that are not strongly correlated with each other, such as batting averages and a measure of pitching performance (runs given up or strikeouts by the pitcher).
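One common way to check for multicollinearity is to look at the correlation matrix and at each variable's variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing variable j on the other explanatory variables. The sketch below uses simulated data in which a hypothetical strikeout variable is built to track batting average closely; the variable names and the strength of the relationship are assumptions, not baseball facts.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - float(resid @ resid) / float(((target - target.mean()) ** 2).sum())
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Simulated, standardized data (an assumption for illustration only).
rng = np.random.default_rng(1)
batting_avg = rng.normal(size=200)
strikeouts = -batting_avg + 0.3 * rng.normal(size=200)  # built to be correlated
runs_allowed = rng.normal(size=200)                     # independent of the others
X = np.column_stack([batting_avg, strikeouts, runs_allowed])

corr = np.corrcoef(X, rowvar=False)
vifs = vif(X)
print("correlation(batting_avg, strikeouts) =", round(float(corr[0, 1]), 3))
print("VIFs:", np.round(vifs, 2))
```

A VIF near 1 means the variable is nearly uncorrelated with the others; a frequently quoted rule of thumb treats values above about 5 or 10 as a warning sign.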

Compare the final regression equations for two time periods, such as 1910 to 1960 vs 1961 to 2000. You don’t have to use such large periods; you can use a sample of years in each period. After fitting the regression equation for each time period, use an F test to see whether the differences between the two sets of coefficients are more than random fluctuation.
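The F test for comparing two periods is often called a Chow test. It compares the SSE of one pooled regression with the combined SSE of separate regressions for each period: F = [(SSE_pooled - (SSE_1 + SSE_2)) / k] / [(SSE_1 + SSE_2) / (n_1 + n_2 - 2k)], where k counts the parameters in each regression, including the intercept. The sketch below uses simulated data in which the coefficients differ between the two periods; all values are assumptions for illustration.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from regressing y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def chow_f(X1, y1, X2, y2):
    """F statistic for H0: both periods share the same regression equation.
    Returns (F, numerator df, denominator df)."""
    k = X1.shape[1] + 1  # parameters per regression, incl. intercept
    sse_pooled = sse(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    sse_separate = sse(X1, y1) + sse(X2, y2)
    df_den = len(y1) + len(y2) - 2 * k
    f_stat = ((sse_pooled - sse_separate) / k) / (sse_separate / df_den)
    return f_stat, k, df_den

# Simulated data (an assumption): the slopes change between the periods.
rng = np.random.default_rng(2)
X1 = rng.normal(size=(60, 2))
X2 = rng.normal(size=(60, 2))
y1 = 0.5 + 0.08 * X1[:, 0] - 0.06 * X1[:, 1] + rng.normal(scale=0.03, size=60)
y2 = 0.5 + 0.03 * X2[:, 0] - 0.10 * X2[:, 1] + rng.normal(scale=0.03, size=60)

f_stat, df_num, df_den = chow_f(X1, y1, X2, y2)
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

A statistic well above the tabulated critical value suggests the relationship really changed between the periods; a small one is consistent with random fluctuation.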

 

If you want to find other sports data, post a question on the discussion forum, such as "Where can I find batting averages for players on a particular team or for teams in a League?" Many sports web sites have this information, and other candidates may quickly direct you to a suitable data source.

If you have done a GLM analysis for your company, you can adapt that analysis for your student project. Explain the hypotheses, the techniques, the statistical tests, and the results. If you have not learned GLM analysis, use classical regression analysis and sports data.


littlepig
Forum Newbie (2 reputation)

Group: Forum Members
Posts: 2, Visits: 1

I am interested in this student's approach to the RA project, but the project says only "Data was used from the 2002-2006 season." It does not indicate where the data came from, which makes it hard for others to follow.

Any help on this?

[NEAS: Data are available from web sites for each team. Spend half an hour with Google or another search engine and you will find more data than you can use.]

 


littlepig
Forum Newbie (2 reputation)

Group: Forum Members
Posts: 2, Visits: 1

Similar to this sample project, I would like to use some basketball offensive statistics to build an RA model for "Game Win".

Y = Game Win = 0, X1 = Points, X2 = Rebounds, X3 = Assists, X4 = Steals, X5 = Blocks

I found some data on the NBA websites, but the challenge is that all the statistics are on an individual basis, whereas the dependent variable Y is "Game Win", so this project needs team-based data.

Here is what I did to get the team-based data: select the statistics of the top 250 individuals for each independent variable from X1 to X5, sort the data by team name, then take the average points for each team.

I will use the average points as the team-based data for my project.

Am I on the right track?

Thanks,

[NEAS: That is fine.]
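The aggregation described in this post, turning individual statistics into one value per team, might look like the following in Python. The record layout, the field names `team` and `points`, and the sample rows are all hypothetical placeholders, not real NBA data.

```python
from collections import defaultdict

def team_averages(player_rows, stat):
    """Average one statistic over the listed players of each team."""
    totals = defaultdict(lambda: [0.0, 0])  # team -> [running sum, player count]
    for row in player_rows:
        acc = totals[row["team"]]
        acc[0] += row[stat]
        acc[1] += 1
    return {team: total / count for team, (total, count) in totals.items()}

# Placeholder rows standing in for the top-250 player list.
players = [
    {"team": "Team A", "points": 20.0},
    {"team": "Team A", "points": 10.0},
    {"team": "Team B", "points": 30.0},
]

averages = team_averages(players, "points")
print(averages)  # {'Team A': 15.0, 'Team B': 30.0}
```

The same function can be called once per statistic (X1 to X5) to build the team-level row for each explanatory variable.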

