EMPIRICAL RESULTS
How do employers decide how much to pay their employees each year? In a sales or consulting firm, pay may be based on the number of sales made or clients served. How do National Basketball Association (NBA) team owners decide players' salaries? The general consensus among basketball fans is that wages are based on points scored. Others argue that salaries are determined by other measurable statistics such as rebounds, assists, and games played. It is also possible that salaries are affected by a player's popularity and by issues off the court. How, then, do professional soccer club owners decide players' salaries? After running several regressions on the data I collected, some variables offer insight into how salaries are decided.
First, two regressions were run, each with a different set of explanatory variables. In the first regression (Model 1) the dependent variable is the log of salary, and the explanatory variables are: the number of games started during the season, whether the player played as a defender during the season, whether the player played as a striker during the season, the club's attendance during the season, and age. In the second regression (Model 2) the dependent variable is again the log of salary, and the explanatory variables are: the number of games started during the season, whether the player played as a defender, whether the player played as a striker, the number of fouls committed during the season, the club's attendance during the season, the percentage of games won during the season, and age. Table 2 below compares both regressions, reporting coefficients and t-values, with asterisks denoting levels of significance.
TABLE 2 – REGRESSION RESULTS WITH ALL POSITIONS INCLUDED
*=t-Statistic significant at .10 level
**=t-Statistic significant at .05 level
***=t-Statistic significant at .01 level
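Written out as equations, the two specifications look roughly as follows. This is a sketch that borrows the Stata variable names from Appendix B (Lnsal is the log of salary; numgst, posdef, posstr, numful, clbatt, pregmw and age are the explanatory variables described above). The β coefficients and the error term ε are notation added here, and midfielders are presumably the omitted position category, since only the defender and striker dummies enter.

```latex
\begin{aligned}
\text{Model 1:}\quad \mathit{Lnsal}_i &= \beta_0 + \beta_1\,\mathit{numgst}_i + \beta_2\,\mathit{posdef}_i + \beta_3\,\mathit{posstr}_i + \beta_4\,\mathit{clbatt}_i + \beta_5\,\mathit{age}_i + \varepsilon_i \\
\text{Model 2:}\quad \mathit{Lnsal}_i &= \beta_0 + \beta_1\,\mathit{numgst}_i + \beta_2\,\mathit{posdef}_i + \beta_3\,\mathit{posstr}_i + \beta_4\,\mathit{numful}_i \\
&\quad + \beta_5\,\mathit{clbatt}_i + \beta_6\,\mathit{pregmw}_i + \beta_7\,\mathit{age}_i + \varepsilon_i
\end{aligned}
```

Because the left-hand side is a logarithm, each β is the change in log salary for a one-unit change in its variable; 100 × β approximates the percentage change in salary when β is small.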
In Model 1, the parameter estimate for the number of games started is negative. This sign does not match my hypothesis of a positive effect, and the p-value is not significant at the ten, five, or one percent level, so the estimate cannot be distinguished from zero and should not be relied on. The defender coefficient is positive, consistent with my hypothesis; however, it is not significant at the ten, five, or one percent level, so it is not reliable either. The striker coefficient is positive, consistent with my hypothesis, and significant at the ten percent level. This suggests that, holding all other variables constant, strikers earn more than players in the omitted position category (midfielders). The coefficient on club attendance is positive, consistent with my hypothesis. Because the dependent variable is the log of salary, each additional spectator raises log salary by 0.00000446, or roughly 0.000446 percent. This coefficient is significant at the one percent level. The p-value reported by Stata (P>|t|) is the probability of obtaining a t-statistic at least as large in absolute value as the one observed if the true coefficient were zero. A p-value below .05 therefore means that a coefficient this large would be unlikely to arise from random fluctuation alone, so the estimate is statistically significant at the five percent level. Finally, the age coefficient suggests that each additional year of age raises log salary by 0.0797267, or roughly 8 percent, consistent with my hypothesis. This variable is significant at the five percent level, so the estimate is reliable.
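Stata's P>|t| column can be reproduced from each coefficient and its standard error. The sketch below (plain Python, standard library only) does this for the Model 1 estimates printed in Appendix B, assuming 50 residual degrees of freedom (56 observations minus five slopes and the constant). The Student t tail area is integrated numerically with Simpson's rule rather than taken from a statistics library, and the function names are illustrative, not part of the original analysis.

```python
import math

# Model 1 estimates (coefficient, standard error) copied from Appendix B.
MODEL1 = {
    "numgst": (0.0033614, 0.0079196),
    "posdef": (-0.1147892, 0.1707969),
    "posstr": (0.6747037, 0.2309306),
    "clbatt": (4.24e-06, 1.21e-06),
    "age": (0.0906692, 0.0231195),
}
DF_RESID = 56 - 5 - 1  # observations minus slopes minus the constant = 50


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) under H0: coefficient = 0, via Simpson's rule on [0, |t|]."""
    a = abs(t_stat)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * area


results = {}
for name, (coef, se) in MODEL1.items():
    t_stat = coef / se
    results[name] = (t_stat, two_sided_p(t_stat, DF_RESID))
    print(f"{name:7s} t = {t_stat:6.2f}  p = {results[name][1]:.3f}")
```

The computed p-values should agree with the P>|t| column of Appendix B up to rounding, since the inputs are the rounded coefficients and standard errors that Stata prints.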
The adjusted R squared is 0.5607, which indicates a reasonably good fit, since values closer to one indicate a better overall fit of the equation. R squared measures the percentage of the variation of Y around its mean that is explained by the regression equation; the adjusted R squared applies the same idea but penalizes each additional explanatory variable, so it does not rise automatically when variables are added.
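Both fit measures follow directly from the sums of squares in Stata's ANOVA table. The sketch below assumes the standard formulas R² = 1 − SS_resid/SS_total and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with k slope coefficients, and applies them to the Model 1 and Model 2 tables printed in Appendix B; the helper names are illustrative.

```python
def r_squared(ss_resid, ss_total):
    """Share of the variation of Y around its mean explained by the regression."""
    return 1.0 - ss_resid / ss_total


def adjusted_r_squared(r2, n_obs, n_slopes):
    """Penalize R^2 for the number of slope coefficients (the constant is not counted)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_slopes - 1)


# Residual and total sums of squares from the ANOVA tables in Appendix B.
SS_TOTAL = 35.3132383
r2_m1 = r_squared(16.6311908, SS_TOTAL)
r2_m2 = r_squared(16.5251464, SS_TOTAL)
adj_m1 = adjusted_r_squared(r2_m1, n_obs=56, n_slopes=5)
adj_m2 = adjusted_r_squared(r2_m2, n_obs=56, n_slopes=7)

print(f"Model 1: R2 = {r2_m1:.4f}, adjusted R2 = {adj_m1:.4f}")
print(f"Model 2: R2 = {r2_m2:.4f}, adjusted R2 = {adj_m2:.4f}")
```

In the Appendix B output, adding numful and pregmw raises R squared slightly but lowers adjusted R squared, which is exactly the penalty the adjustment is designed to apply when extra variables add little explanatory power.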
I ran three tests in Stata for Model 1. First, I tested for multicollinearity by examining the variance inflation factors (VIF). A common rule of thumb is that a VIF greater than five indicates severe multicollinearity among the explanatory variables. The VIFs range from 1.07 to 1.22, so this model does not appear to have significant multicollinearity. Second, I examined the pairwise correlations among the variables. The largest correlation between two explanatory variables is -0.4074, between the defender and striker dummies, so there is no evidence of serious collinearity. Third, I ran the Ramsey RESET test, a general misspecification test that Stata reports as a test for omitted variables. Its null hypothesis is that the model has no omitted variables, and a low p-value (below .05) would lead us to reject it. The p-value for the Ramsey test was 0.4667, well above .05, so I fail to reject the null hypothesis: the test finds no evidence of omitted variables. Combining the three tests with the regression results, the striker position, club attendance, and age are significant. Holding all other variables constant, a one-unit increase in each significant variable raises log salary by its estimated coefficient, which corresponds to a change in salary of approximately 100 × coefficient percent. This regression suggests that an older striker at a club with high attendance for the season will earn more than other players on the team.
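The VIF rule of thumb and the RESET decision rule can be written out explicitly. This sketch takes the tolerance column (1/VIF) that `estat vif` printed for Model 1 in Appendix B, recovers each VIF and the implied auxiliary R² (from VIF = 1/(1 − R²) for a regression of one explanatory variable on the others), and applies the five-point cutoff and a 5% significance level for the RESET test. The cutoff, the significance level, and the helper names are assumptions chosen to match the text, not Stata defaults.

```python
# Tolerance column (1/VIF) reported by `estat vif` for Model 1 in Appendix B.
TOLERANCE = {
    "posdef": 0.818628,
    "posstr": 0.825730,
    "age": 0.876317,
    "clbatt": 0.925927,
    "numgst": 0.933015,
}
VIF_CUTOFF = 5.0  # common rule of thumb for "severe" multicollinearity

vif = {name: 1.0 / tol for name, tol in TOLERANCE.items()}
# VIF_j = 1 / (1 - R_j^2), so the auxiliary R^2 of regressor j on the others is 1 - tol.
aux_r2 = {name: 1.0 - tol for name, tol in TOLERANCE.items()}
mean_vif = sum(vif.values()) / len(vif)
severe = [name for name, v in vif.items() if v > VIF_CUTOFF]


def reset_decision(p_value, alpha=0.05):
    """Ramsey RESET: H0 is 'no omitted variables'; only a small p-value rejects it."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


for name in vif:
    print(f"{name:7s} VIF = {vif[name]:.2f}  auxiliary R2 = {aux_r2[name]:.3f}")
print(f"mean VIF = {mean_vif:.2f}; above cutoff: {severe or 'none'}")
print("RESET (p = 0.4667):", reset_decision(0.4667))
```

Reading the auxiliary R² makes the VIF concrete: for posdef, the other regressors explain only about 18% of its variation, far from the level that would inflate standard errors badly.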
In Model 2, the parameter estimate for the number of games started is positive, consistent with my hypothesis, but it is not significant at the ten, five, or one percent level, so it cannot be distinguished from zero and is not reliable. The defender coefficient is positive, consistent with my hypothesis; however, it is not significant at the ten, five, or one percent level, so it is not reliable either. The striker coefficient is also positive and is significant at the five percent level, which suggests that, holding all other variables constant, strikers earn more than players in the omitted position category (midfielders). The number of fouls a player commits in a season has a negative coefficient of -0.0109062, which does not match my hypothesis; it is not significant at the ten, five, or one percent level, so this estimate is not reliable. Club attendance has the hypothesized positive sign, but its coefficient is not significant at the ten, five, or one percent level, so it is not reliable either. The percentage of games won has a positive coefficient, consistent with my hypothesis: a one-unit increase in the winning-percentage variable raises log salary by 5.99898, and this coefficient is significant at the five percent level, so the estimate is reliable. Age also has a positive coefficient, consistent with my hypothesis: one additional year of age raises log salary by 0.0471925, or roughly 5 percent. Unfortunately, this coefficient is not significant, so the estimate of 0.0471925 is not reliable.
The adjusted R squared for Model 2 is 0.6615, again indicating a reasonably good overall fit, since values closer to one indicate a better fit of the equation. As before, R squared measures the percentage of the variation of Y around its mean that is explained by the regression equation.
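Because salary enters in logs, a coefficient is a change in log salary, and the familiar "100 × coefficient percent" reading is only an approximation that breaks down for large coefficients such as a position dummy. The sketch below compares that approximation with the exact percentage change, 100 × (exp(β) − 1), for three Model 2 coefficients printed in Appendix B; scaling attendance to 10,000 spectators is an illustrative choice, and the helper names are hypothetical.

```python
import math

# Selected Model 2 coefficients on log salary, from Appendix B.
COEFS = {
    "posstr": 0.656926,              # striker vs. the omitted position category
    "age": 0.0907166,                # one additional year of age
    "clbatt_10k": 3.78e-06 * 10_000,  # 10,000 additional spectators
}


def approx_pct(beta):
    """Rule-of-thumb reading: 100 * beta percent; accurate only when beta is small."""
    return 100.0 * beta


def exact_pct(beta):
    """Exact percentage change in salary implied by a change of beta in log salary."""
    return 100.0 * (math.exp(beta) - 1.0)


effects = {name: (approx_pct(b), exact_pct(b)) for name, b in COEFS.items()}
for name, (approx, exact) in effects.items():
    print(f"{name:10s} approx {approx:6.2f}%   exact {exact:6.2f}%")
```

For age and attendance the two readings nearly coincide, but for the striker dummy the approximation (about 66 percent) understates the exact premium (about 93 percent) by a wide margin.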
I ran the same three tests in Stata for Model 2. First, I tested for multicollinearity by examining the variance inflation factors (VIF). The VIFs range from 1.21 to 2.01, so this model does not appear to have significant multicollinearity. Second, I examined the pairwise correlations among the variables. The largest correlations between explanatory variables are 0.5860 (games started and fouls committed) and 0.5785 (club attendance and percentage of games won), which are moderate, so there is no evidence of serious collinearity. Third, I ran the Ramsey RESET test for omitted variables. Its p-value was 0.3924, above .05, so I fail to reject the null hypothesis that the model has no omitted variables. Combining the three tests with the regression results, the striker position and club attendance are significant. Holding all other variables constant, a one-unit increase in each significant variable raises log salary by its estimated coefficient. This regression suggests that a striker at a club with high attendance for the season will earn more than other players on the team.
Although the two regressions above gave some insight into which variables affect salary, I did not find the results entirely satisfying. I therefore ran three more regressions focusing on field positions, splitting the sample and running a separate regression for defenders, midfielders, and strikers. Table 3 below shows each regression, reporting coefficients and t-values, with asterisks denoting levels of significance.
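The RESET p-value is simply the upper tail of an F distribution evaluated at the reported F statistic. As a check, the sketch below integrates the F(d1, d2) density numerically with Simpson's rule (standard library only, no statistics package) and evaluates it at the statistics `estat ovtest` printed for Models 1 and 2 in Appendix B. It assumes d1 > 2 so the density is zero at the origin, and because Stata rounds F to two decimals the results match only approximately; the function names are illustrative.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F(d1, d2) distribution; assumes d1 > 2 so f(0) = 0."""
    if x <= 0.0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_f = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_f)


def f_upper_tail(f_stat, d1, d2, steps=4000):
    """P(F >= f_stat): one minus the CDF, integrated with Simpson's rule (steps even)."""
    h = f_stat / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_stat, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3


# RESET statistics as printed by `estat ovtest` in Appendix B (F rounded to 2 decimals).
p_model1 = f_upper_tail(0.86, 3, 47)
p_model2 = f_upper_tail(1.02, 3, 45)
print(f"Model 1 RESET p = {p_model1:.3f}; Model 2 RESET p = {p_model2:.3f}")
```

Both tail probabilities sit far above .05, which is why neither model rejects the null hypothesis of no omitted variables.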
TABLE 3 – REGRESSION RESULTS WITH POSITIONS SEPARATED
*=t-Statistic significant at .10 level
**=t-Statistic significant at .05 level
***=t-Statistic significant at .01 level
Model A (defenders) has only two significant variables. Club attendance is significant at the ten percent level and age at the one percent level, so the estimated coefficients for these two variables are reliable. The remaining results are harder to interpret. My hypothesis was that games started, number of fouls committed, club attendance, percentage of games won, and age would all be positively correlated with salary; however, the number of fouls committed has a negative (and insignificant) coefficient. The most likely reason that the other variables are not significant is the small sample: only 26 defenders are included.
Three tests were run for this regression. First, the variance inflation factors (VIF) range from 1.76 to 2.76, so this model does not appear to have significant multicollinearity. Second, the largest pairwise correlation between explanatory variables is 0.6042 (games started and fouls committed), so there is no evidence of serious collinearity. Third, the p-value for the Ramsey RESET test was 0.2880, so I fail to reject the null hypothesis that the model has no omitted variables. Combining Model A's tests with the regression results, two variables are significant for defenders: attendance and age. Holding all other variables constant, a one-unit increase in each significant variable raises a defender's log salary by its estimated coefficient. This regression suggests that older defenders at clubs with high attendance for the season earn more than other defenders on the team.
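Significance at the ten but not the five percent level has a direct reading in terms of confidence intervals. The sketch below rebuilds intervals for the two significant Model A coefficients in Appendix B as coefficient ± t_crit × standard error, assuming 20 residual degrees of freedom (26 defenders minus five slopes and the constant) and critical values of 2.086 (95%) and 1.725 (90%) taken from a standard t table rather than computed.

```python
# Model A (defenders) estimates from Appendix B: 26 observations, 5 slopes -> 20 df.
T_95_DF20 = 2.086  # two-sided 95% critical value from a t table, 20 df
T_90_DF20 = 1.725  # two-sided 90% critical value from a t table, 20 df

ESTIMATES = {
    "clbatt": (3.27e-06, 1.85e-06),
    "age": (0.0863807, 0.0274996),
}


def conf_int(coef, se, t_crit):
    """Coefficient plus or minus t_crit standard errors."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


intervals = {}
for name, (coef, se) in ESTIMATES.items():
    intervals[name] = {
        "95%": conf_int(coef, se, T_95_DF20),
        "90%": conf_int(coef, se, T_90_DF20),
    }
    for level, (lo, hi) in intervals[name].items():
        covers_zero = lo <= 0.0 <= hi
        print(f"{name:6s} {level} CI: [{lo:.4g}, {hi:.4g}]  contains zero: {covers_zero}")
```

The attendance interval contains zero at 95% but not at 90%, matching its ten percent significance, while the age interval reproduces Stata's [.0290174, .1437439] up to the rounding of the critical value.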
Model B (midfielders) has three significant variables: games started during the season (significant at the ten percent level), number of fouls committed during the season (also significant at the ten percent level), and age (significant at the five percent level). The estimated coefficients of these variables are therefore reliable. My hypothesis was that games started, number of fouls committed, club attendance, percentage of games won, and age would all be positively correlated with salary; however, the number of fouls committed has a negative effect on salary.
Three tests were run for this regression. First, the variance inflation factors (VIF) range from 1.08 to 3.26, so this model does not appear to have significant multicollinearity. Second, the correlation between games started and fouls committed is high (0.8226), but the VIFs remain well below five, so there is no evidence of serious collinearity. Third, the p-value for the Ramsey RESET test was 0.9466, so I fail to reject the null hypothesis that the model has no omitted variables. Combining Model B's tests with the regression results, three variables are significant for midfielders: number of games started, number of fouls committed, and age. Holding all other variables constant, one more game started or one more year of age raises a midfielder's log salary by its estimated coefficient, while each additional foul lowers it. This regression suggests that older midfielders who start more games earn more than other midfielders on the team, while midfielders who commit more fouls earn less.
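The link between that 0.8226 correlation and the reported VIFs can be made explicit. If games started and fouls committed were the only two regressors, each would have VIF = 1/(1 − r²). Adding further regressors can only raise the auxiliary R², so this two-variable figure is a lower bound on the VIFs Stata reports. The sketch below, using the Model B values from Appendix B and an illustrative helper name, shows that this single pair accounts for nearly all of the collinearity.

```python
R_GAMES_FOULS = 0.8226  # correlation of numgst and numful among midfielders (Appendix B)
REPORTED_VIF = {"numful": 3.26, "numgst": 3.11}  # from `estat vif` for Model B


def pairwise_vif(r):
    """VIF that a single pairwise correlation r would produce on its own: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r * r)


implied = pairwise_vif(R_GAMES_FOULS)
print(f"VIF implied by r = {R_GAMES_FOULS}: {implied:.2f}")
print("reported:", REPORTED_VIF)
```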
Model C (strikers) has no significant variables. The most likely reason is the small sample: only 10 strikers are included, which leaves just five residual degrees of freedom. Three tests were run for this regression. First, the variance inflation factors (VIF) range from 1.08 to 2.22, so this model does not appear to have significant multicollinearity. Second, the largest pairwise correlation between explanatory variables is 0.6219 (club attendance and percentage of games won), so there is no evidence of serious collinearity. Third, the p-value for the Ramsey RESET test was 0.2737, so I fail to reject the null hypothesis that the model has no omitted variables. Combining Model C's tests with the regression results, the striker regression would need much more data; given the limited amount of publicly available data, no firm conclusion can be drawn for strikers.
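How much the small sample matters can be seen by holding one t-statistic fixed and changing only the degrees of freedom. The sketch below takes the Model C attendance estimate from Appendix B (t ≈ 1.89 with 5 residual degrees of freedom) and compares it with the two-sided 10% critical values for 5 and 48 degrees of freedom, taken from a standard t table. The 48-df case is a hypothetical comparison (the residual degrees of freedom of Model 2); with more strikers the estimate itself would also change.

```python
# Two-sided 10% critical values (95th percentile of t) taken from a t table.
T_CRIT_10PCT = {5: 2.015, 48: 1.677}

coef, se = 9.55e-06, 5.06e-06  # clbatt in Model C (strikers), Appendix B
n_obs, n_slopes = 10, 4
df = n_obs - n_slopes - 1  # residual degrees of freedom = 5
t_stat = coef / se

for d in (df, 48):
    verdict = "significant" if abs(t_stat) > T_CRIT_10PCT[d] else "not significant"
    print(f"t = {t_stat:.2f} with {d} df: {verdict} at the 10% level")
```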
CONCLUSION
After research and regression analysis, this paper presents five different regressions, reporting estimated coefficients and t-values with asterisks denoting levels of significance. The study yields four substantive conclusions. First, an older striker at a club with a high attendance record for the season earns more than other players on the team. Second, a striker at a club with a high attendance record for the season earns more than other players on the team. Together, these two regressions suggest that strikers earned more in the 2007/2008 season.
When the regressions were run separately by field position, different conclusions emerged. First, older defenders at clubs with a high attendance record for the season earn more than other defenders on the team. Second, older midfielders who start more games earn more than other midfielders on the team, while midfielders who commit more fouls earn less. Unfortunately, the striker regression needs more data; because little data is publicly available, no firm conclusion can be drawn for strikers.
In conclusion, the lack of public data limited the conclusions I was able to draw from each regression model. However, I learned a great deal about regression analysis in general and about the main ideas behind analyzing the relationship between independent variables and a dependent variable. Finding significant variables was difficult, but running tests for multicollinearity and omitted variables helped me identify variables that needed to be removed from the regressions. Overall, I have a better understanding of econometrics now that I have completed this project than I did when I began.
REFERENCES
"Roman colosseum and the gladiators." Essortment Articles: Free Online Articles on Health, Science, Education & More. Web. 05 Dec. 2009. <http://www.essortment.com/all/roman gladiators_rfye.htm>.
"Sports Salaries: Is One Person Worth It?" Associated Content. Web. 06 Dec. 2009. <http://www.associatedcontent.com/article/98769/sports_salaries_is_one_person_worth_pg3_pg3.html?cat=14>.
"Lazio President Claudio Lotito: We Need A Salary Cap in Serie A." Goal.com. Web. 06 Dec. 2009. <http://www.goal.com/en/news/10/italy/2009/05/19/1274289/lazio-president-claudio-lotito-we-need-a-salary-cap-in-serie>.
"494 Player Salaries of Serie A Kaká is the king." MCalcio.com. Web. 06 Dec. 2009. <http://www.mcalcio.com/494-player-salaries-of-serie-a-kaka-is-the-king/>.
"Italian Serie A Clubs - ESPN Soccernet." Football news, live scores, and results - ESPN Soccernet. Web. 06 Dec. 2009. <http://soccernet.espn.go.com/clubs?league=ita.1&cc=5901>.
APPENDIX A – DATA
APPENDIX B – REGRESSION RESULTS AND TESTS RUN
Model 1:
. regress Lnsal numgst posdef posstr clbatt age
Source | SS df MS Number of obs = 56
-------------+------------------------------ F( 5, 50) = 11.23
Model | 18.6820475 5 3.73640951 Prob > F = 0.0000
Residual | 16.6311908 50 .332623815 R-squared = 0.5290
-------------+------------------------------ Adj R-squared = 0.4819
Total | 35.3132383 55 .642058878 Root MSE = .57674
------------------------------------------------------------------------------
Lnsal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
numgst | .0033614 .0079196 0.42 0.673 -.0125455 .0192683
posdef | -.1147892 .1707969 -0.67 0.505 -.4578448 .2282665
posstr | .6747037 .2309306 2.92 0.005 .2108659 1.138541
clbatt | 4.24e-06 1.21e-06 3.50 0.001 1.81e-06 6.67e-06
age | .0906692 .0231195 3.92 0.000 .0442323 .1371061
_cons | 9.418855 .7399261 12.73 0.000 7.93267 10.90504
------------------------------------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
posdef | 1.22 0.818628
posstr | 1.21 0.825730
age | 1.14 0.876317
clbatt | 1.08 0.925927
numgst | 1.07 0.933015
-------------+----------------------
Mean VIF | 1.15
. correlate Lnsal numgst posdef posstr clbatt age
(obs=56)
| Lnsal numgst posdef posstr clbatt age
-------------+------------------------------------------------------
Lnsal | 1.0000
numgst | 0.1067 1.0000
posdef | -0.2808 0.0702 1.0000
posstr | 0.3745 -0.1164 -0.4074 1.0000
clbatt | 0.4869 0.0448 -0.1034 0.0587 1.0000
age | 0.5301 0.2204 -0.1181 0.0427 0.2614 1.0000
. estat ovtest
Ramsey RESET test using powers of the fitted values of Lnsal
Ho: model has no omitted variables
F(3, 47) = 0.86
Prob > F = 0.4667
Model 2:
. regress Lnsal numgst posdef posstr numful clbatt pregmw age
Source | SS df MS Number of obs = 56
-------------+------------------------------ F( 7, 48) = 7.80
Model | 18.7880919 7 2.68401313 Prob > F = 0.0000
Residual | 16.5251464 48 .344273883 R-squared = 0.5320
-------------+------------------------------ Adj R-squared = 0.4638
Total | 35.3132383 55 .642058878 Root MSE = .58675
------------------------------------------------------------------------------
Lnsal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
numgst | .0036011 .0109333 0.33 0.743 -.0183818 .0255839
posdef | -.117013 .1817239 -0.64 0.523 -.4823935 .2483675
posstr | .656926 .2466234 2.66 0.010 .1610564 1.152796
numful | -.0003099 .0052978 -0.06 0.954 -.0109618 .010342
clbatt | 3.78e-06 1.51e-06 2.50 0.016 7.41e-07 6.83e-06
pregmw | .4514768 .8495055 0.53 0.598 -1.256569 2.159522
age | .0907166 .024234 3.74 0.000 .0419908 .1394424
_cons | 9.331261 .8182782 11.40 0.000 7.686002 10.97652
------------------------------------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
numful | 2.01 0.496894
numgst | 1.97 0.506687
clbatt | 1.63 0.613946
pregmw | 1.61 0.620802
posdef | 1.34 0.748468
posstr | 1.33 0.749347
age | 1.21 0.825500
-------------+----------------------
Mean VIF | 1.59
. correlate Lnsal numgst posdef posstr numful clbatt pregmw age
(obs=56)
| Lnsal numgst posdef posstr numful clbatt pregmw age
-------------+------------------------------------------------------------------------
Lnsal | 1.0000
numgst | 0.1067 1.0000
posdef | -0.2808 0.0702 1.0000
posstr | 0.3745 -0.1164 -0.4074 1.0000
numful | 0.1188 0.5860 -0.2494 0.2012 1.0000
clbatt | 0.4869 0.0448 -0.1034 0.0587 0.0566 1.0000
pregmw | 0.3707 0.0384 -0.1217 0.1783 -0.0220 0.5785 1.0000
age | 0.5301 0.2204 -0.1181 0.0427 0.0110 0.2614 0.1397 1.0000
. estat ovtest
Ramsey RESET test using powers of the fitted values of Lnsal
Ho: model has no omitted variables
F(3, 45) = 1.02
Prob > F = 0.3924
Model A:
. regress Lnsal numgst numful clbatt pregmw age
Source | SS df MS Number of obs = 26
-------------+------------------------------ F( 5, 20) = 8.39
Model | 6.93934924 5 1.38786985 Prob > F = 0.0002
Residual | 3.30855202 20 .165427601 R-squared = 0.6771
-------------+------------------------------ Adj R-squared = 0.5964
Total | 10.2479013 25 .40991605 Root MSE = .40673
------------------------------------------------------------------------------
Lnsal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
numgst | .001153 .0125238 0.09 0.928 -.0249713 .0272772
numful | -.000653 .0080325 -0.08 0.936 -.0174086 .0161025
clbatt | 3.27e-06 1.85e-06 1.77 0.092 -5.86e-07 7.13e-06
pregmw | .6421211 1.00554 0.64 0.530 -1.455399 2.739642
age | .0863807 .0274996 3.14 0.005 .0290174 .1437439
_cons | 9.395947 .9123833 10.30 0.000 7.492749 11.29915
------------------------------------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
numgst | 2.76 0.362612
numful | 2.37 0.421108
clbatt | 2.17 0.460571
age | 2.10 0.475192
pregmw | 1.76 0.568713
-------------+----------------------
Mean VIF | 2.23
. correlate Lnsal numgst numful clbatt pregmw age
(obs=26)
| Lnsal numgst numful clbatt pregmw age
-------------+------------------------------------------------------
Lnsal | 1.0000
numgst | 0.2484 1.0000
numful | -0.0514 0.6042 1.0000
clbatt | 0.6296 0.0678 0.0630 1.0000
pregmw | 0.3461 0.1661 -0.0173 0.5385 1.0000
age | 0.7353 0.3425 -0.1144 0.4147 0.0978 1.0000
. estat ovtest
Ramsey RESET test using powers of the fitted values of Lnsal
Ho: model has no omitted variables
F(3, 17) = 1.36
Prob > F = 0.2880
Model B:
. regress Lnsal numgst numful clbatt pregmw age
Source | SS df MS Number of obs = 21
-------------+------------------------------ F( 5, 15) = 2.83
Model | 7.08873694 5 1.41774739 Prob > F = 0.0542
Residual | 7.52332351 15 .501554901 R-squared = 0.4851
-------------+------------------------------ Adj R-squared = 0.3135
Total | 14.6120605 20 .730603023 Root MSE = .70821
------------------------------------------------------------------------------
Lnsal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
numgst | .0514482 .0269311 1.91 0.075 -.005954 .1088503
numful | -.0252633 .0134059 -1.88 0.079 -.0538372 .0033106
clbatt | 1.80e-06 2.93e-06 0.61 0.548 -4.45e-06 8.05e-06
pregmw | .9030625 1.548627 0.58 0.568 -2.397758 4.203883
age | .1375364 .0512311 2.68 0.017 .0283398 .246733
_cons | 7.837642 1.656715 4.73 0.000 4.306436 11.36885
------------------------------------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
numful | 3.26 0.306785
numgst | 3.11 0.321244
pregmw | 1.56 0.641090
clbatt | 1.55 0.645619
age | 1.08 0.926363
-------------+----------------------
Mean VIF | 2.11
. correlate Lnsal numgst numful clbatt pregmw age
(obs=21)
| Lnsal numgst numful clbatt pregmw age
-------------+------------------------------------------------------
Lnsal | 1.0000
numgst | 0.2113 1.0000
numful | 0.0258 0.8226 1.0000
clbatt | 0.2778 0.1626 0.2491 1.0000
pregmw | 0.2941 0.1736 0.2484 0.5791 1.0000
age | 0.5620 0.1140 0.1426 0.2196 0.2425 1.0000
. estat ovtest
Ramsey RESET test using powers of the fitted values of Lnsal
Ho: model has no omitted variables
F(3, 12) = 0.12
Prob > F = 0.9466
Model C:
. regress Lnsal numgst clbatt pregmw age
Source | SS df MS Number of obs = 10
-------------+------------------------------ F( 4, 5) = 2.24
Model | 4.87241397 4 1.21810349 Prob > F = 0.2003
Residual | 2.72233457 5 .544466914 R-squared = 0.6416
-------------+------------------------------ Adj R-squared = 0.3548
Total | 7.59474854 9 .843860949 Root MSE = .73788
------------------------------------------------------------------------------
Lnsal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
numgst | .0338424 .0262684 1.29 0.254 -.0336825 .1013674
clbatt | 9.55e-06 5.06e-06 1.89 0.118 -3.47e-06 .0000226
pregmw | .8372471 2.570203 0.33 0.758 -5.769669 7.444163
age | .1296185 .1432986 0.90 0.407 -.2387424 .4979793
_cons | 7.553542 4.770114 1.58 0.174 -4.708427 19.81551
------------------------------------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
clbatt | 2.22 0.451018
pregmw | 1.97 0.508396
age | 1.37 0.728427
numgst | 1.08 0.926009
-------------+----------------------
Mean VIF | 1.66
. correlate Lnsal numgst clbatt pregmw age
(obs=10)
| Lnsal numgst clbatt pregmw age
-------------+---------------------------------------------
Lnsal | 1.0000
numgst | 0.2232 1.0000
clbatt | 0.6553 -0.1727 1.0000
pregmw | 0.5048 -0.2528 0.6219 1.0000
age | 0.0252 0.0897 -0.3899 0.0195 1.0000
. estat ovtest
Ramsey RESET test using powers of the fitted values of Lnsal
Ho: model has no omitted variables
F(3, 2) = 2.81
Prob > F = 0.2737