US win < 1 < EU win
Various statistical measures can then be used to determine whether there is a relationship between the relative size of victory in the majors and a win or loss in the Ryder Cup. To do this, a potential model for the correlation will be calculated, together with relevant measures such as the strength of that correlation. If a correlation becomes apparent, the question then arises whether it is indeed true. The Chi-squared test can be used to test whether the two events are independent and whether an apparent dependency is due to chance. To support any findings of the Chi-squared test, logic can be used to test the truth of the syllogism to which the correlation provides a potential answer.
In short, data will be collected from the majors and the Ryder Cup, and each side's relative win or loss will be calculated. A potential correlation will then be hypothesized. The conclusion will depend on whether this hypothesis is rejected or confirmed by the Chi-squared test and logical reasoning.
INFORMATION AND DATA COLLECTION
As mentioned in the introduction, the final scores of the majors and the Ryder Cup played in the new millennium can be collected directly from the tournaments’ websites. The sources are as follows:
For the majors, the final scores are given as a total over the course of one tournament (four rounds). To find the average score per round for one team in a major, the total scores of that team's players (the US's, for example) are added together, divided by the number of participating players, and then divided by four, because a tournament consists of four rounds of golf. For example:
US Masters, 2002, US players. Total scores added together, and then a mean is found.
Eleven out of twelve players participated, hence:
A tournament consists of four rounds:
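The averaging step described above can be sketched in Python. This is only an illustration under stated assumptions: the function name and the eleven total scores are hypothetical placeholders, not the 2002 Masters data.

```python
def mean_score_per_round(total_scores, rounds=4):
    """Average strokes per round for one team in one major.

    total_scores: each participating player's total over the tournament.
    rounds: number of rounds in the tournament (four in a major).
    """
    if not total_scores:
        raise ValueError("at least one participating player is required")
    team_total = sum(total_scores)
    mean_total = team_total / len(total_scores)  # average tournament total
    return mean_total / rounds                   # average per round


# Hypothetical four-round totals for eleven participants (not real data).
example_totals = [288, 284, 290, 279, 292, 286, 288, 281, 295, 287, 285]
print(round(mean_score_per_round(example_totals), 3))
```

Dividing by the number of players who actually took part, rather than by the full team of twelve, keeps an absent player from distorting the team mean.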
The tables on the next page summarize all the information obtained using this method. No computations were required for the results of the Ryder Cup. A sample of the data (one tournament) can be found in the Appendices.
Ryder Cup 2002 vs. Majors 2002*
Ryder Cup 2004 vs. Majors 2003 - 2004
Ryder Cup 2006 vs. Majors 2005 - 2006
*2001 has not been included because the Ryder Cup was postponed due to 9/11.
MATHEMATICAL PROCESSES
The data has now been arranged into two categories: the win/loss trend in stroke play and the win/loss trend in match play. These two sets of numbers cannot be compared directly, and therefore have to be converted into a measure with a common unit, as explained earlier. One way to do this is to calculate how many times larger a given victory was, or how many times smaller a given loss was. That can be done for both the Ryder Cup and the majors.
For the Ryder Cup this is done in the following way:
In 2002, Europe beat the US 15½ to 12½. Its victory was therefore this many times bigger:
15.5 / 12.5 = 1.24
Hence, Europe performed 1.24 times better (24%) than the US.
In 2004 and 2006, Europe also won, both times with a score of 18½ to 9½:
18.5 / 9.5 ≈ 1.947
Hence Europe performed about 1.95 times better (94.7%) than the US in 2004 and 2006.
When the same is done with the processed data of the majors, the following results become apparent:
In the period leading up to the 2002 Ryder Cup, Europe performed this amount better than the US in the majors:
This number shows how many times worse the US performed than Europe. This number reflects a European win because it is greater than 1.
In the seasons leading up to the 2004 Ryder Cup, Europe performed this amount better than the US in the majors:
In fact, the ratio is .9923089048, and since .9923089048 < 1, Europe actually performed worse than the US during this period.
In the period before the most recent Ryder Cup, in 2006, the following comparative result between the US and Europe in the majors was found:
Again, Europe performed worse than the US in the majors, with a ratio of .9932860227. As before, a value less than one reflects a US win.
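The two relative measures can be sketched side by side in Python. The Ryder Cup scores are those quoted in the text. The orientation of the majors ratio (US mean divided by Europe mean) is an assumption inferred from the statement that a value above 1 reflects a European win, and the mean stroke values are hypothetical.

```python
def ryder_cup_ratio(europe_points, us_points):
    """How many times larger Europe's points total was (more points is better)."""
    return europe_points / us_points


def majors_ratio(europe_mean_strokes, us_mean_strokes):
    """Relative performance in stroke play (fewer strokes is better).

    Assumed orientation: US mean divided by Europe mean, so that a value
    above 1 reflects a European win, matching the Ryder Cup ratio.
    """
    return us_mean_strokes / europe_mean_strokes


print(round(ryder_cup_ratio(15.5, 12.5), 2))  # 2002 Ryder Cup
print(round(ryder_cup_ratio(18.5, 9.5), 3))   # 2004 and 2006 Ryder Cups

# Hypothetical mean strokes per round (not the investigation's data).
us_win = majors_ratio(europe_mean_strokes=72.0, us_mean_strokes=71.5) < 1
print("US win in the majors:", us_win)
```

Because both ratios are unitless and share the same tie point of 1, they can be placed on the two axes of one plot.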
Now the two variables have been defined. The independent variable is how many times better Europe performed than the US in the majors, and the dependent variable is how many times greater the European victory in the Ryder Cup was. These can be represented in a scatter plot, which helps to identify any correlation between the two variables and its strength.
GRAPH SHOWING THE RELATIONSHIP BETWEEN THE OUTCOME IN MAJORS AND THE OUTCOME IN THE RYDER CUP
As the scatter plot with the least-squares regression line shows, the greater the European win in the majors, the smaller Europe's victory in the Ryder Cup. Hence there is a negative correlation between the two variables. In other words, the better the US performs in the majors relative to Europe (moving toward zero on the x-axis, since any value below one represents a US win), the worse it performs in the Ryder Cup relative to Europe. Using a TI-84 calculator, the correlation coefficient r, which measures how strong the relationship between the variables is, is determined to be -.994, and r² to be .988 (both rounded to three significant figures). Thus the negative correlation appears to be very strong.
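For readers without a TI-84, the correlation coefficient and the least-squares line can be sketched in Python as follows. The 2004 and 2006 majors ratios and the Ryder Cup ratios come from the text; the 2002 majors ratio is not recoverable from it, so the value used here is a hypothetical placeholder, and the output should not be read as reproducing the calculator's figures.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x  # line passes through the means


# Majors ratios: the first value (2002) is a HYPOTHETICAL placeholder.
majors = [1.0006, 0.9923089048, 0.9932860227]
# Ryder Cup ratios for 2002, 2004 and 2006, from the scores in the text.
ryder = [15.5 / 12.5, 18.5 / 9.5, 18.5 / 9.5]

r = pearson_r(majors, ryder)
slope, intercept = least_squares_line(majors, ryder)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"y = {slope:.1f}x + {intercept:.1f}")
```

With only three data points, two of which share the same Ryder Cup ratio, a coefficient near -1 or 1 is easy to obtain, which is worth keeping in mind when judging the strength of the correlation.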
On the X-axis, the value 1 represents the point where the US and Europe tie in the majors. Below that point the US wins, and above it Europe wins. The values reflect how big the relative win or loss was. The same holds true for the Y-axis: below 1 the US wins, above 1 Europe wins. Although extrapolating from the data is not very reliable, it is possible to calculate the point at which the US would win the Ryder Cup if the line of best fit holds true. Rephrased: how many times better would Europe have to perform in the majors for the US to win the Ryder Cup? To answer this, the X-value at which the line reaches a Y-value of one has to be found. That can be done by setting the equation of the line equal to 1 (the Y-value); because the line slopes downward, any X-value above the solution results in a US Ryder Cup victory. First, the equation of the line has to be found, which requires two points on the line. The Y-intercept can be found using the corresponding function on the TI-84. Another point that always lies on the line of best fit is the point where the means x̄ and ȳ meet.
The TI-84 identifies the Y-intercept as:
Y-int. = (0, 90.5)
The means of X and Y are:
x̄ ≈ .995
ȳ ≈ 1.713
Hence the two points can be used to find the equation of the line. First the slope is found (rise/run):
m = (1.713 - 90.5) / (.995 - 0) ≈ -89.2
Since b, the Y-intercept, is already known, the equation is:
y = -89.2x + 90.5
By setting the equation equal to one, it can be found how many times worse the US has to perform in the majors if it is to win the Ryder Cup. Because the slope is negative, any number above x results in a US Ryder Cup victory:
1 = -89.2x + 90.5
x = 89.5 / 89.2 ≈ 1.003363229
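As a numerical check on this extrapolation, the steps can be sketched in Python using the two points quoted in the text, (0, 90.5) and (.995, 1.713). The function names are illustrative. The sketch also shows how sensitive the answer is to rounding the slope to one decimal place, which is an observation added here rather than part of the original calculation.

```python
def line_through(p1, p2):
    """Slope and intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # rise over run
    intercept = y1 - slope * x1
    return slope, intercept


def x_where(y_target, slope, intercept):
    """Solve y_target = slope * x + intercept for x."""
    return (y_target - intercept) / slope


slope, intercept = line_through((0, 90.5), (0.995, 1.713))

# Slope rounded to one decimal place: gives about 1.003363229.
x_rounded_slope = x_where(1, round(slope, 1), intercept)
# Unrounded slope: gives a slightly smaller threshold, roughly 1.0030.
x_unrounded_slope = x_where(1, slope, intercept)

print(x_rounded_slope, x_unrounded_slope)
```

Both thresholds sit only a fraction of a percent above the tie point of 1, so a very small change in the fitted line moves the predicted outcome from one side to the other.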
So Europe would have to perform more than 1.003363229 times better than the US in the majors for the US to win the Ryder Cup. Extrapolating from data, however, does not necessarily give a true prediction, so the model for the relationship between the two variables may not be true.
Spearman's rank order correlation coefficient measures the strength of the relationship between two sets of ranked data. Applied in this case, it can be used to see whether the results in the majors agree with the results in the Ryder Cups. This measure adds another perspective on the validity of the proposed strong negative correlation. A Spearman's rank order correlation coefficient falls between -1 and 1. As with the r-value, a value of -1 is a strong negative relationship, 1 is a strong positive relationship, and the strength of the relationship decreases as the value moves toward zero. A value of zero means no correlation.
The following formula is used to calculate the Spearman’s rank order correlation coefficient:
Where
- t = Spearman's rank order correlation coefficient
- d = Difference in rankings
- n = Number of rankings (this value is six, since there are six data points).
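A minimal Python sketch of a rank correlation built from these quantities is given below. It assumes the standard textbook form t = 1 - 6Σd² / (n(n² - 1)), which matches the symbols defined above but is not confirmed by the surviving text, and it gives tied values their average rank, in which case the formula is only approximate. The function names and sample numbers are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_t(xs, ys):
    """Spearman's coefficient from rank differences d (needs n >= 2)."""
    n = len(xs)
    rank_pairs = zip(average_ranks(xs), average_ranks(ys))
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in rank_pairs)
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical samples: the rankings agree in the first pair and are
# fully reversed in the second.
print(spearman_t([1.2, 3.4, 5.6], [10, 20, 30]))
print(spearman_t([1.2, 3.4, 5.6], [30, 20, 10]))
```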
Hence,
All that is left is to substitute the values:
The Spearman's rank order correlation coefficient of t = .946007729 clearly indicates a very strong positive correlation between the results of the majors and the results of the Ryder Cups, since the value is very close to one. In other words, as a team performs better in the majors, it performs better in the Ryder Cup. For this result to be consistent with the correlation coefficient found earlier, the value should have been negative and very close to -1, which would indicate a strong negative relationship. It is not. In fact the t-value is almost the exact opposite of the r-value. The two results therefore conflict, which suggests that the r-value, the t-value, or both are misleading. To cast some light on this, the data can be tested to see whether the two events are actually independent. The Chi-squared test can be used to test for this independence.
- The null hypothesis, H0: The US performance in majors does not affect their performance in the Ryder Cup.
- The alternative hypothesis, Ha: The US performance in majors affects their performance in the Ryder Cup.
- Degrees of freedom = (r - 1)(c - 1), where r is the number of rows and c the number of columns
  Degrees of freedom = (2 - 1)(3 - 1)
  Degrees of freedom = 2
- A 5% significance level is often used by convention: 0.05
- We reject H0 if χ²calc > 5.99
- The χ²calc is:
  χ²calc = Σ (fo - fe)² / fe
  where: fo = Observed frequency
         fe = Expected frequency
χ²calc ≈ .09, as can be seen from the graph. Hence the null hypothesis, H0, cannot be rejected, because .09 < 5.99. In other words, the Chi-squared test indicates that Europe's and the US's performance in the majors is independent of their performance in the Ryder Cup. Therefore, neither of the correlations found earlier between the two variables (the t-value and the r-value) is true, and thus the Ryder Cup is not a reflection of the performance in the majors. Neither a positive nor a negative association exists.
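The mechanics of the test can be sketched in Python. The 2 × 3 table of observed frequencies below is hypothetical, because the investigation's contingency table is not reproduced in the text. The critical value uses a closed form that holds only for two degrees of freedom, where the chi-squared cumulative distribution is 1 - exp(-x/2).

```python
from math import log


def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (fo - fe)^2 / fe over all cells (every fe must be non-zero)."""
    expected = expected_frequencies(observed)
    return sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))


def degrees_of_freedom(observed):
    return (len(observed) - 1) * (len(observed[0]) - 1)


def critical_value_df2(alpha):
    """Critical value for exactly 2 degrees of freedom (closed form)."""
    return -2 * log(alpha)


# HYPOTHETICAL observed frequencies (2 rows, 3 columns), not the real table.
observed = [[3, 4, 3],
            [4, 3, 4]]

statistic = chi_squared(observed)
critical = critical_value_df2(0.05)
print(degrees_of_freedom(observed), round(critical, 2))
print("reject H0" if statistic > critical else "do not reject H0")
```

With expected frequencies this small, the chi-squared approximation is usually considered unreliable, so a low statistic is weak evidence of independence rather than proof of it.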
Indeed, logic, the basis of mathematics, can be used to support the claim that it is invalid to say that Europe's and the US's performance in the majors affects how they perform in the Ryder Cup. Consider the following:
P: The US wins the majors
Q: The US loses the Ryder Cup
“If the US wins the majors, then it loses the Ryder Cup.” (P ⇒ Q)
Using the relationship suggested by the Spearman's rank order correlation coefficient, the following was derived:
P: The US wins the majors
Q: The US wins the Ryder Cup
“If the US wins the majors, then it wins the Ryder Cup.” (P ⇒ Q)
For the conclusion of the syllogism to be valid, all entries in the two dark frames in the tables above have to be true: it has to be a tautology. It is not, and therefore the statements that a US victory in the majors leads to a Ryder Cup loss and that a US victory in the majors leads to a Ryder Cup win are both invalid.
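The tautology check can be illustrated with a short Python sketch. It only enumerates the truth table of a bare implication "if P then Q"; the investigation's own tables are not reproduced in the text, so this is an assumption about their general form rather than a copy of them.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def truth_table(statement):
    """All (P, Q, result) rows for a statement of two propositions."""
    return [(p, q, statement(p, q))
            for p, q in product([True, False], repeat=2)]


def is_tautology(statement):
    """A statement is a tautology if it is true in every row."""
    return all(result for _, _, result in truth_table(statement))


for p, q, result in truth_table(implies):
    print(p, q, result)
print("tautology:", is_tautology(implies))
```

The row with P true and Q false makes the implication false, which is why neither "if the US wins the majors, then it loses the Ryder Cup" nor its counterpart holds in every case.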
This implies that performance in one tournament does not affect performance in another: it is other factors that cause a certain outcome in tournaments.
ANALYSIS, INTERPRETATION AND CONCLUSION
The data collected directly from the events' websites was processed in various ways, mostly statistical, and eventually tested with the Chi-squared test. In addition, logic was used as a final step to underline the findings of the Chi-squared test. First, the correlation coefficient of -.994, a measure of the strength of a relationship between two sets of data, suggested that a very strong negative correlation existed between the results in the majors and the results in the Ryder Cups. In other words, as Europe scored better than the US in the majors, it performed worse in the Ryder Cup compared with the US. Extrapolating from the data showed that, if the correlation were true, Europe would have to perform more than 1.0034 times better than the US in the majors for the US to win the Ryder Cup. Spearman's rank order correlation coefficient is another measure, similar to the correlation coefficient (whose coefficient of determination was .988), and it was used to double-check the result of the correlation coefficient. The value found (.946) suggested a very strong positive relationship between the two variables, which contradicts the initial finding. Therefore the data was tested using the Chi-squared test, to see whether the two inconsistent correlation coefficients were indeed misleading and no relationship between the two variables exists. The χ² value was very low (.09), which gives no support to the alternative hypothesis that an association exists between the results in the majors and the results in the Ryder Cups. Hence the two variables are independent. No association exists. As a final step, to resolve some of the contradictions in the data, logic, the basis of all mathematics, was used. The propositions suggested by the first and the second correlation coefficients were both invalid. For the final conclusion, the results of the Chi-squared test and the logic are therefore taken as the more reliable.
The amount of data used to calculate the correlation coefficients was limited. On the principle that more data means more reliable results, the data was perhaps processed into too few data points, at the cost of reliability. The result of the Chi-squared test, underlined by the use of logic and by the contradiction between the correlation coefficients, indicates that no relationship exists between performance in the majors and the outcome of the Ryder Cup. Intuitively, although intuition is not always the best measure, this seems sensible, since a causal relationship between the two variables is unreasonable. It is more sensible to think that other factors affect the results of the two types of tournament. One such factor could be team spirit in the Ryder Cup, which lets Europe go the extra mile and beat the US substantially, despite having lost 4 out of 5 majors. However, it is beyond the scope of this investigation to determine those causes. Returning to the central question of the investigation, namely whether the Ryder Cup is a true reflection of which of Europe and the US is the best golf region, the answer is no, if the majors are used as the basis for comparison. Since no relationship exists, it is not even a reflection. Even if a relationship like the one shown in the graph existed, the Ryder Cup would still have to be rejected as a true reflection of the best region, because the scores in that tournament were the opposite of those in the majors. It is possible for the outcome in the majors to agree with the outcome of the Ryder Cup in a given year, but the hypothesized reflection is not guaranteed, because the relationship is not causal: the agreement could be due to other factors.
Another question that could be asked of the calculations is whether the match play format is a true reflection of stroke play. The outcome should be the same as for the Ryder Cup (match play) and the majors (stroke play), and this relationship should be supported by the Chi-squared test. However, the method was not optimal for testing this relationship, because the collection of data was not aimed at determining it. Had that been the aim, the Ryder Cup should perhaps not have been used, but rather other tournaments, such as the Accenture World Match Play Competition. It would be speculation to draw any conclusions in that direction from this data sample.
VALIDITY
The extent to which the conclusions are valid depends on the limitations and errors faced in the investigation. The collection of the data sample was the first source of error. Only the relationship between the majors and the Ryder Cup since the millennium was tested, which might have been too little data to establish a correlation applying to two kinds of tournaments that have run for more than fifty years. Furthermore, it was a limitation in itself that only the majors and the Ryder Cup were used. A second problem was that not all the Ryder Cup players participated in all the major tournaments, which created slightly different samples within the majors. In addition, the two data sets differed markedly in sample size. The main concern here is that the number of Ryder Cups was relatively low, and the fewer there are, the less reliable the findings become. Indeed, when the data was processed statistically into the two correlation coefficients, the results contradicted each other, which underlines what is probably the most significant error of the investigation. Another problem was how to compare performances in tournaments of different formats. A point system is used in the Ryder Cup, in which a match win gives 1 point, a tie ½ point, and a loss none. The team that gets more than 14 points wins. In the majors, only the strokes count: the fewer the better. In order to compare these, each region's relative win over the other in the given format was calculated, which gave the same unit of measurement (number of times larger the victory was). Nevertheless, the point system of the Ryder Cup is a limitation in itself, because it leaves out marginal differences in performance (a player can shoot a better score measured in strokes but still lose in points to a direct competitor, and vice versa).
There was also the question of which majors to compare with which Ryder Cups. Qualification runs for two years prior to the Ryder Cup, and therefore the majors in the same interval were chosen for comparison with the Ryder Cup at the end of that period. That was, again, subjectively determined, which brings in a human factor. The major from 2001 was omitted because it took place in the year of the terror attacks on the World Trade Center, which caused many players to stay away from this important tournament; it therefore no longer represents the data and is an outlier. Again a subjective decision was made. Choosing the majors as the basis for comparison with the Ryder Cup limited some errors, because within each major all players played under the same conditions. Otherwise, the players would have been playing on different courses under different weather conditions, etc., and the advantage of a home course, or the disadvantage of playing on a new course, might have been even more pronounced. Although there might be errors arising from rounding in the calculations, they were not of great importance, since only the overall trend was sought. When dealing with raw data, some subjective decisions always have to be made, so some errors always exist. In conclusion, there were many limitations that made the investigation subject to human judgment, mostly in the gathering of data and in finding a useful way to compare the two data sets. The Chi-squared test and the application of logic, however, agreed with each other and were thus more reliable, and therefore their conclusions were chosen. The investigation remains valid because the conclusions were developed in a logical way, despite the limitations of the type and amount of data.
________________________________________________________________________
Word Count: 4559
APPENDICES
________________________________________________________________________
Ryder Cup players in 2002, 2004 and 2006
2002
- Scott Hoch
- David Toms
- David Duval
- Hal Sutton
- Mark Calcavecchia
- Stewart Cink
- Scott Verplank
- Paul Azinger
- Jim Furyk
- Davis Love III
- Phil Mickelson
- Tiger Woods
- Colin Montgomerie
- Sergio Garcia
- Darren Clarke
- Bernhard Langer
- Padraig Harrington
- Thomas Bjørn
- Lee Westwood
- Niclas Fasth
- Paul McGinley
- Pierre Fulke
- Phillip Price
- Jesper Parnevik
2004
- Tiger Woods
- Phil Mickelson
- Davis Love III
- Jim Furyk
- Kenny Perry
- David Toms
- Chad Campbell
- Chris DiMarco
- Fred Funk
- Chris Riley
- Jay Haas
- Stewart Cink
- Paul Casey
- Darren Clarke
- Luke Donald
- Sergio Garcia
- Padraig Harrington
- David Howell
- Miguel Angel Jimenez
- Thomas Levet
- Paul McGinley
- Colin Montgomerie
- Ian Poulter
- Lee Westwood
2006
- Tiger Woods
- Phil Mickelson
- Jim Furyk
- Chad Campbell
- David Toms
- Chris DiMarco
- Vaughn Taylor
- J. J. Henry
- Zach Johnson
- Brett Wetterich
- Stewart Cink
- Scott Verplank
- Darren Clarke
- Paul Casey
- Luke Donald
- Sergio Garcia
- Padraig Harrington
- David Howell
- Robert Karlsson
- Paul McGinley
- Colin Montgomerie
- José Maria Olazabal
- Henrik Stenson
- Lee Westwood
“Play in which the score is reckoned by counting the holes won by each side.” 5th February, 2008: http://dictionary.reference.com/browse/match%20play
“Golf competition in which the total number of strokes taken is the basis of the score.” 5th February, 2008: http://dictionary.reference.com/browse/stroke%20play