For my regression project, I am analyzing data on % body fat vs. age, height, and body measurements (neck circumference, forearm circumference, abdomen circumference, etc.). For the unrestricted model:
- 13 independent variables in the data.
- 5 significant t tests.
- P value of F test almost 0.
- R^2 75%
I'm concerned that the results of the regression are unreliable due to multicollinearity. Simple correlations between independent variables are generally high (> 50%).
- How high must the simple correlation between two independent variables be (e.g., more than 50%) to warrant removing one of them from the model?
- Could I sum all of the body measurements into a single new independent variable?
- Can I rely on the significant t tests from the unrestricted model? Would it add value to run a separate regression for each independent variable?
[NEAS: Do the project and explain the effect of multicollinearity on the results.]
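
One way to look at collinearity beyond the pairwise correlations is the variance inflation factor (VIF), a standard diagnostic that the post above does not mention. The sketch below is only an illustration: the data are synthetic, and the names `abdomen`, `chest`, `age`, and the helper `vif` are hypothetical stand-ins, not the project's actual data or method. For each variable, it regresses that variable on the others and reports 1 / (1 - R^2).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = observations).

    Hypothetical helper: regress column j on the remaining columns
    (with an intercept) and return 1 / (1 - R^2) for each j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic stand-in data (an assumption, NOT the body fat data set):
# two measurements driven by a shared "body size" factor, plus an
# unrelated variable.
rng = np.random.default_rng(0)
n = 250
size = rng.normal(size=n)
abdomen = size + 0.3 * rng.normal(size=n)
chest = size + 0.3 * rng.normal(size=n)
age = rng.normal(size=n)

X = np.column_stack([abdomen, chest, age])
corr = np.corrcoef(X, rowvar=False)  # pairwise simple correlations
vifs = vif(X)

print(np.round(corr, 2))
print(np.round(vifs, 2))
```

In this synthetic setup, the two correlated measurements get noticeably larger VIFs than the unrelated variable, which is the pattern to look for when explaining why some t tests in the unrestricted model may be weak even though the F test is strongly significant.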