Discriminant Analysis on Determining Whether an MLB Team Will Make the Playoffs

University Degree Mathematical and Computer Sciences

Introduction

We run a discriminant analysis to predict whether a Major League Baseball team will make the playoffs, using the 2005-2007 MLB seasons. We also want to see how well the discriminant function predicts playoff teams from offensive statistics and from defensive statistics. The offensive independent variables are Runs Scored, Batting Average, On Base Percentage, Average Batters Age, and Home Runs. The defensive independent variables are Hits Allowed, Runs Allowed, Total Team Fielding Percentage, Saves, and Average Pitchers Age. The dependent variable, Playoffs, indicates whether or not a team made the playoffs in a given year. The discriminant analysis predicts which teams should have made the playoffs based on these statistics, and we compare those predictions to the actual results to see how accurate the model is. The analysis did not always select exactly 8 playoff teams (4 from the American League and 4 from the National League); with the data provided, the model cannot be made to predict 8 teams every time. In those cases the accuracy may be slightly off, but the results are still interesting and important to our analysis.
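The comparison of predicted and actual playoff teams described above can be sketched in a few lines of Python. This is only an illustration: the function name classification_summary and the 0/1 flags below are made up for the example and are not the SPSS output or the real team data. It shows how a model can reach a high hit ratio while still predicting fewer (or more) than 8 playoff teams.

```python
def classification_summary(actual, predicted):
    """Compare actual and predicted group membership (1 = playoffs, 0 = no playoffs)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # A case is a "hit" when the predicted group equals the actual group.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return {
        "hit_ratio": correct / len(actual),
        "predicted_playoff_teams": sum(predicted),
        "actual_playoff_teams": sum(actual),
    }


# Hypothetical flags for a 30-team league (NOT the real results):
# 8 actual playoff teams; the model finds 6 of them and adds 1 false positive.
actual = [1] * 8 + [0] * 22
predicted = [1] * 6 + [0] * 2 + [1] * 1 + [0] * 21

summary = classification_summary(actual, predicted)
print(summary)
```

With these made-up flags the model predicts 7 playoff teams rather than 8, which is the situation noted in the introduction.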

2007

Descriptive Statistics

For both the offense and defense independent variables, there are no problems with skewness or kurtosis. The 2007 data therefore show no normality problems and are suitable for use.

Offense

Level of measurement and sample size issues

In a discriminant analysis the dependent variable should be non-metric and the independent variables should be metric. This is true in this analysis and in the analyses that follow, so the level of measurement requirement is satisfied. The minimum ratio of valid cases to independent variables for discriminant analysis is 5 to 1, with a preferred ratio of 20 to 1. In this analysis there are 30 valid cases and 5 independent variables, a ratio of 6 to 1, which exceeds the minimum but falls short of the preferred ratio. The sample size requirement for discriminant analysis is therefore satisfied.
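The ratio check above is simple arithmetic, sketched here in Python. The function name check_case_ratio and the constant names are chosen for this illustration only; the thresholds (5 to 1 minimum, 20 to 1 preferred) and the counts (30 cases, 5 independent variables) are the ones stated in the text.

```python
MIN_RATIO = 5         # minimum cases per independent variable
PREFERRED_RATIO = 20  # preferred cases per independent variable


def check_case_ratio(n_cases, n_predictors):
    """Return the cases-to-predictors ratio and whether it meets each threshold."""
    ratio = n_cases / n_predictors
    return ratio, ratio >= MIN_RATIO, ratio >= PREFERRED_RATIO


# 30 teams, 5 independent variables
ratio, meets_minimum, meets_preferred = check_case_ratio(30, 5)
print(ratio, meets_minimum, meets_preferred)
```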

Analysis Case Processing Summary

In addition to the requirement for the ratio of cases to independent variables, discriminant analysis requires a minimum number of cases in the smallest group defined by the dependent variable. The number of cases in the smallest group must be larger than the number of independent variables, and preferably 20 or more. In this analysis the smallest group has fewer than 20 cases, but it has more than 5, which is the number of independent variables, so this requirement is also met. The same holds for the other years and for both the offense and defense data, so this check is not repeated in the remaining analyses, where it would be redundant.
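The smallest-group requirement can be checked the same way. The sketch below assumes the smallest group is the 8 playoff teams out of 30 (taken from the 8-team playoff format described in the introduction); the label strings and the function name smallest_group_check are made up for this illustration.

```python
from collections import Counter


def smallest_group_check(labels, n_predictors, preferred_size=20):
    """Return the smallest group size and whether it meets each requirement."""
    counts = Counter(labels)
    smallest = min(counts.values())
    # Required: more cases than independent variables. Preferred: 20 or more cases.
    return smallest, smallest > n_predictors, smallest >= preferred_size


# Assumed split: 8 playoff teams and 22 non-playoff teams
labels = ["playoffs"] * 8 + ["no playoffs"] * 22

smallest, meets_required, meets_preferred_size = smallest_group_check(labels, n_predictors=5)
print(smallest, meets_required, meets_preferred_size)
```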

Prior Probabilities for Groups

Assumption of homogeneity of variance

If we fail to reject the null hypothesis that the variances are equal, we use the SPSS default of a pooled covariance matrix in classification. In this case the significance of .149 is greater than .05, so we fail to reject the null hypothesis, and the homogeneity assumption is satisfied.

Test Results

Overall Relationship

The Wilks' lambda statistic for the test of the function (Wilks' lambda=.496) had a probability ...