Math Studies - IA

IB MATHEMATICAL STUDIES

Internal Assessment

“An investigation into the value of Ryder Cup as a reflection of the US and Europe’s comparative strength in the sport of golf.”

Peter Frederiksen Svane

St. Mary’s International School

IB Candidate Number: 000134 - 039

March 10th, 2008

INTRODUCTION

The Ryder Cup takes place every other year in September and is supposed to determine whether Europe or the US is better at the sport of golf. Each side is represented by twelve players, who play against each other over the course of three days. Unlike regular tournaments, the Ryder Cup is played in a match play format rather than stroke play. The question therefore arises whether the Ryder Cup is a true reflection of which region (US or Europe) has the better group of golfers. Are Europeans really better golfers than Americans, given that they have won every meeting since the new millennium? To address this debate, the investigation will compare the Ryder Cup players’ performances in regular tournaments, in which they have all competed, with their performance in the Ryder Cup. Various mathematical processes will be carried out, within the scope of relevance, in order to reach a conclusion.

The performance of the Ryder Cup team players in regular stroke play tournaments on their seasonal tours, where the players compete directly against each other under the same conditions, will be determined and then compared with the outcome of the Ryder Cup. Using those tournaments as the basis, it can be determined whether the Ryder Cup really finds the true winner. If, for example, the European Ryder Cup players score better in the regular tournaments in a given period than the US Ryder Cup players, and the Europeans have won all the Ryder Cups in the same period, then the two results correspond, and it can be concluded that the Ryder Cup is a true reflection.

PLAN OF INVESTIGATION

As briefly mentioned in the introduction, the last three Ryder Cups and the regular tournaments in the same period, in which the Ryder Cup players have played against each other under the same conditions, will constitute the basis for all mathematical computations and analysis. It is beyond the scope of this investigation to collect data from every tournament in which the players have appeared. Therefore a select range of tournaments will be chosen in which the players play under the same conditions: on the same course at the same time. The four so-called majors fall into this category, since all the Ryder Cup players are invited to these tournaments and usually participate in them because of the enormous prizes and the great honor of a potential victory. Since the millennium, 96 rounds of golf have been played in the majors. Although only three Ryder Cups have been played during this period, the investigation is limited to the trend after the new millennium because a new trend seems to have developed during this time. Europe has won all three, which was not the case before the millennium: since the Ryder Cup started in 1927, the US had won 25 out of 33, a winning percentage of (25 / 33 × 100 ≈) 75.8 %. It would therefore be interesting to see whether the Ryder Cup really reflects a true trend of Europe being better at golf. As mentioned, the majors have been chosen as the basis for determining whether the new Ryder Cup trend is coherent with reality. The different sample sizes are already a potential limitation at this point, since they can affect the results. Nevertheless, two sets of data rarely have the same sample size.

Final scores from the 96 rounds of golf and the 3 Ryder Cups can be found directly on the tournaments’ websites. To collect data from the majors, the players who participated in the Ryder Cup will again be divided into two teams: US and Europe. Each hypothetical team’s performance in the majors will be found by taking the mean of the team players’ total strokes per round. For example, both Harrington and Casey participated in the ’06 Ryder Cup. In the Masters (one of the four majors) their mean score was 72. In the same way, the entire team’s mean score in the majors can be determined. It is unlikely that all Ryder Cup players participate consistently, and that will be a limitation to the investigation. Concerning the Ryder Cup, Europe has won every time on this side of the millennium: in 2002, 2004 and 2006 Europe won 15½-12½, 18½-9½ and 18½-9½ respectively.

These two sets of data can then be processed to reach a conclusion. To compare them, a common unit of measurement will have to be found. The margin of a team’s win is suitable. This will be measured as how many times greater a victory is. So to compare a European Ryder Cup win of 18½ - 9½ (number of points) with a US major win the same year of 72.5 – 73 (mean number of strokes; the lower, the better), the winning margin of these two results can be found:

In the Ryder Cup, Europe, as can be seen, performed (18.5 / 9.5 ≈) 1.95 times better than the US, and in the majors .9934 times better than the US (which is actually worse, because the number is below 1). These two numbers can then be compared. Whenever the calculated value is below 1, the US wins, and when it is above 1, Europe wins. A value of 1 represents a tie because, for example, a Ryder Cup final score of 14 – 14 gives (14 / 14 =) 1. This can be summarized as follows:

US win < 1 < EU win
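The margin measure described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the direction of the majors ratio (US mean strokes divided by European mean strokes) is an assumption inferred from the rule that a value above 1 must mean a European win even though fewer strokes are better.

```python
def europe_margin_ryder(eu_points: float, us_points: float) -> float:
    """Europe's Ryder Cup points relative to the US (more points is better)."""
    return eu_points / us_points


def europe_margin_majors(eu_mean_strokes: float, us_mean_strokes: float) -> float:
    """Europe's margin in the majors (assumed form: US strokes / EU strokes,
    because fewer strokes is better)."""
    return us_mean_strokes / eu_mean_strokes


def verdict(margin: float) -> str:
    """Apply the rule: US win < 1 < EU win, and exactly 1 is a tie."""
    if margin > 1:
        return "EU win"
    if margin < 1:
        return "US win"
    return "tie"


ryder_margin = europe_margin_ryder(18.5, 9.5)    # the 18½ - 9½ result from the text
majors_margin = europe_margin_majors(73.0, 72.5)  # the 72.5 – 73 example from the text
tie_margin = europe_margin_ryder(14, 14)          # the 14 – 14 example from the text
```

Here `verdict(ryder_margin)` gives a European win and `verdict(majors_margin)` a US win, which matches the reading of the two example results in the text.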

Various statistical measures can then be used to find whether there is a relationship between the relative size of victory in the majors and a loss or win in the Ryder Cup. To do this, a potential model for correlation will be calculated, including relevant measures such as the strength of that correlation. The question then arises whether a correlation, if one becomes apparent, is indeed true. The Chi-squared test can be used to test whether the two events are independent and whether a potential dependency is due to chance. To ...


US win < 1 < EU win

In short, data will be collected from the majors and the Ryder Cup, and each side’s relative win or loss will be calculated. A potential correlation will then be hypothesized. The conclusion will draw upon whether this hypothesis is then rejected or confirmed by the Chi-squared test and logical thinking.

INFORMATION AND DATA COLLECTION

As mentioned in the introduction, the final scores of the majors and the Ryder Cup played in the new millennium can be collected directly from the tournaments’ websites. The sources are as follows:

For the majors, the final scores are given as a total over the course of one tournament (four rounds). So to find the average for one team in a major, the total scores of its players (the US’s, for example) are added together, then divided by the number of participants and by four, because a tournament consists of four rounds of golf. For example:

US Masters, 2002, US players. Total scores added together, and then a mean is found.

Eleven out of twelve players participated, hence:

A tournament consists of four rounds:

The tables on the next page summarize all the information obtained using this method. No computations were required for the results of the Ryder Cup. A sample of the data (one tournament) can be found in the Appendices.
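The averaging method can be sketched as follows. The four-round totals below are made-up numbers used only to show the arithmetic; they are not the scores collected for this investigation, and the function name is hypothetical.

```python
def mean_strokes_per_round(four_round_totals, rounds=4):
    """Team mean per round: sum of the players' tournament totals,
    divided by the number of participants and by the number of rounds."""
    if not four_round_totals:
        raise ValueError("at least one participating player is required")
    participants = len(four_round_totals)
    return sum(four_round_totals) / participants / rounds


# Hypothetical totals for four players over one four-round tournament.
hypothetical_totals = [284, 288, 290, 292]
team_mean = mean_strokes_per_round(hypothetical_totals)  # 1154 / 4 / 4 = 72.125
```

Players who did not take part in a tournament are simply left out of the list, so the divisor is the number of actual participants, as in the example with eleven out of twelve players.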

Ryder Cup 2002 vs. Majors 2002*

Ryder Cup 2004 vs. Majors 2003 - 2004

Ryder Cup 2006 vs. Majors 2005 - 2006

*2001 has not been included because the Ryder Cup was postponed due to 9/11.

MATHEMATICAL PROCESSES

The data has now been arranged into two basic categories: the win/loss trend in stroke play and the win/loss trend in match play. These two sets of numbers cannot be compared with each other directly, and therefore have to be processed into another measure in which the units are the same, as explained earlier. As mentioned, a way to do this is to calculate a quantitative measure of how many times larger a given victory was, or how many times smaller a given loss was. That can be done for both the Ryder Cup and the majors.

For the Ryder Cup this is done in the following way:

In 2002, Europe won 15½ to 12½ over the US. Their victory was therefore this amount bigger:

15.5 / 12.5 = 1.24

Hence, Europe performed 1.24 times better (24%) than the US.

In 2004 and 2006, Europe also won, both times with a winning score of 18½ to 9½:

18.5 / 9.5 ≈ 1.947

Hence Europe performed 1.95 times better (94.7%) than the US in 2004 and 2006.

When the same is done with the processed data of the majors, the following results become apparent:

In the period up to the Ryder Cup 2002, Europe performed this amount better than the US in the majors:

This number shows how many times worse the US performed than Europe. It reflects a European win because it is greater than 1.

In the seasons leading up to the Ryder Cup 2004, Europe performed this amount better than the US in the majors:

Indeed, Europe actually performed worse than the US during this period, by a factor of .9923089048, since .9923089048 < 1.

In the period prior to the most recent Ryder Cup in 2006, the following comparative result between the US and Europe in the majors was found:

Again, Europe performed worse than the US in the majors, by a factor of .9932860227. Again, a value less than one reflects a US win.

Now the two variables have been defined. The independent variable is how many times better Europe performed than the US in the majors, and the dependent variable is how many times greater the European victories in the Ryder Cup were. This can be represented as a scatter plot, which can assist in identifying any correlation between the two variables and its strength.

GRAPH SHOWING THE RELATIONSHIP BETWEEN THE OUTCOME IN MAJORS AND THE OUTCOME IN THE RYDER CUP

As the scatter plot with the least squares regression line shows, the greater the European win in the majors, the smaller their victory in the Ryder Cup. Hence there is a negative correlation between the two variables. In other words, the better the US performs in the majors relative to Europe (going toward zero on the x-axis, since any value below one represents a US win), the worse it performs in the Ryder Cup relative to Europe. Using a TI-84 calculator, the correlation coefficient r, which measures how strong the relationship between the variables is, is determined to be -.994, and r² to be .988 (both rounded to three significant figures). Thus the negative correlation appears to be very strong.
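For readers without a TI-84, the correlation coefficient and the least squares line can be computed by hand. The sketch below is a plain-Python illustration; the data points are hypothetical placeholders, not the values plotted in the graph, and the function names are assumptions.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (mean_x, mean_y, Sxy, Sxx, Syy) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and Y-intercept of the least squares regression line."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the mean point
    return slope, intercept


# Hypothetical (x, y) pairs: majors margin vs. Ryder Cup margin.
xs = [0.99, 1.00, 1.01, 1.02]
ys = [1.9, 1.6, 1.5, 1.1]
r = pearson_r(xs, ys)
slope, intercept = least_squares_line(xs, ys)
```

A negative `r` together with a negative `slope` corresponds to the downward-sloping line described above, and `r ** 2` gives the coefficient of determination.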

On the X-axis, the value 1 represents the point where the US and Europe tie in the majors. Below that point the US wins, and above that point Europe wins. The values reflect how big the relative win or loss was. The same holds true for the Y-axis: below 1 the US wins, above 1 Europe wins. Although it is not very reliable to extrapolate the data, it is possible to calculate the point at which the US would win the Ryder Cup if the line of best fit holds true. Rephrased: how many times better will Europe have to perform in the majors for the US to win the Ryder Cup? To answer this, the X-value at which the Y-value equals one has to be found. That can be done by setting the equation of the drawn line equal to 1 (the Y-value); because the line slopes downward, any X-value above that point will result in a US Ryder Cup victory. In order to do so, the equation of the line has to be found, and two points on the line are required for that. The Y-intercept can be found using the corresponding function on the TI-84. Another point that will always be on the line of best fit is where x̄ and ȳ (the means of X and Y) meet.

The TI-84 identifies the Y-intercept as:

Y-int. = (0, 90.5)

The means of X and Y equal:

x̄ ≈ .995

ȳ ≈ 1.713

Hence the two points on the graph can be used to find the equation of the line. First the slope is found (Rise/Run):

slope = (1.713 − 90.5) / (.995 − 0) ≈ −89.2

b, which is the Y-intercept, we already have; therefore the equation is:

y = −89.2x + 90.5

By setting the equation equal to one, it can be found by how much the US will have to lose in the majors if it is to win the Ryder Cup. Any number above x will result in a US Ryder Cup victory (calculations on the next page):

1 = −89.2x + 90.5
x = 89.5 / 89.2 ≈ 1.003363229
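The extrapolation can be checked numerically. The sketch below uses the Y-intercept (0, 90.5) and the mean point (.995, 1.713) quoted in the text. Rounding the slope to one decimal place is an assumption, made because it reproduces the value 1.003363229 stated in the text; the variable names are hypothetical.

```python
# Two points on the line of best fit, as quoted in the text.
Y_INTERCEPT = 90.5              # the point (0, 90.5)
X_MEAN, Y_MEAN = 0.995, 1.713   # the point where the means of X and Y meet

# Rise over run between the two points.
# Assumption: the slope is rounded to one decimal place before further use.
slope = round((Y_MEAN - Y_INTERCEPT) / (X_MEAN - 0), 1)

# Set y = 1 in y = slope * x + Y_INTERCEPT and solve for x.
x_tie = (1 - Y_INTERCEPT) / slope


def predicted_ryder_margin(x: float) -> float:
    """Ryder Cup margin predicted by the fitted line for a majors margin x."""
    return slope * x + Y_INTERCEPT
```

Because the slope is negative, `predicted_ryder_margin(x)` falls below 1 (a US win) only for values of `x` greater than `x_tie`.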

So Europe will have to perform 1.003363229 times better than the US in the majors for the US to win the Ryder Cup. Extrapolating from data, however, does not necessarily give a true prediction, so perhaps the model for the relationship between the two variables is not true. Spearman’s rank order correlation coefficient is used to determine the strength of the relationship between two sets of ranked data. Applied in this case, it can be used to see whether the results in the majors agree with the results in the Ryder Cups. This measure adds another perspective for judging the validity of the proposed strong negative correlation. A Spearman’s rank order correlation coefficient falls between -1 and 1. In the same way as with the r-value, a value of -1 is a perfect negative relationship, 1 is a perfect positive relationship, and as the value moves toward zero the strength of the relationship decreases. A value of zero means no correlation.

The following formula is used to calculate the Spearman’s rank order correlation coefficient:

t = 1 − (6 Σd²) / (n(n² − 1))

Where

t = Spearman’s rank order correlation coefficient
d = Difference in rankings
n = Number of rankings (this value is six, since there are six data points).
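As an illustration of how t, d and n fit together, the sketch below implements the standard textbook form of Spearman's coefficient, t = 1 − 6Σd² / (n(n² − 1)). That this is exactly the form used here is an assumption; it is the simple version, which presumes no tied ranks. The two rankings are hypothetical and are not the rankings of this investigation.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """Spearman's rank order correlation coefficient from two rankings.

    Standard no-ties formula: t = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference between the paired ranks.
    """
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rankings must have the same length")
    n = len(ranks_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of six data points (n = 6).
hypothetical_ranks_x = [1, 2, 3, 4, 5, 6]
hypothetical_ranks_y = [2, 1, 3, 4, 6, 5]
t = spearman_from_ranks(hypothetical_ranks_x, hypothetical_ranks_y)
```

Identical rankings give 1 and completely reversed rankings give -1, which matches the interpretation of the coefficient given above.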

Hence,

All that is left is a little plug and chug:

The Spearman’s rank order correlation coefficient of t = .946007729 clearly indicates that there is a very strong positive correlation between the results of the majors and the results of the Ryder Cups, since the value is very close to one. In other words, as a team performs better in the majors, it performs better in the Ryder Cup. For this result to be coherent with the correlation coefficient found earlier, the value should have been negative and very close to -1, which would indicate a strong negative relationship. It is not. In fact the t-value is almost the exact opposite of the r-value. The two results therefore conflict, which suggests that the r-value, the t-value, or both are misleading. To cast some light upon this, the data can be tested to see whether the two events are actually independent. The Chi-squared test can be used to test for this independence.

The null hypothesis, H0: The US performance in majors does not affect their performance in the Ryder Cup.

The alternative hypothesis, Ha: The US performance in majors affects their performance in the Ryder Cup.

Degrees of freedom = (r -1)(c-1)

Degrees of freedom = (2-1)(3-1)

Degrees of freedom = 2

A 5% significance level is often used by convention: 0.05
We reject H0 if χ²calc > 5.99
The χ² formula is:

χ²calc = Σ (fo − fe)² / fe, where: fo = Observed frequency

fe = Expected frequency

χ²calc ≈ .09, as can be seen from the graph. Hence, the null hypothesis, H0, is not rejected because .09 < 5.99. In other words, the Chi-squared test indicates that Europe’s and the US’s performance in the majors is independent of their performance in the Ryder Cup. Therefore, neither of the correlations found earlier between the two variables (the t-value and the r-value) is true, and thus the Ryder Cup is not a reflection of the performance in the majors. Neither a positive nor a negative association exists.
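The test procedure can be sketched in plain Python. The observed frequencies below are hypothetical placeholders for a table with 2 rows and 3 columns, not the contingency table of this investigation. The critical value 5.991 is the tabulated 5% value for 2 degrees of freedom, and the function name is an assumption.

```python
def chi_squared_statistic(observed):
    """Return (chi-squared statistic, degrees of freedom) for a contingency table.

    Expected frequency of a cell: fe = row total * column total / grand total.
    Statistic: sum over all cells of (fo - fe)^2 / fe.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total
            statistic += (fo - fe) ** 2 / fe
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


CRITICAL_VALUE_5PCT_DF2 = 5.991  # tabulated value for 2 degrees of freedom

# Hypothetical observed frequencies (2 rows, 3 columns).
hypothetical_observed = [[10, 12, 8],
                         [11, 13, 9]]
chi2_calc, dof = chi_squared_statistic(hypothetical_observed)
reject_h0 = chi2_calc > CRITICAL_VALUE_5PCT_DF2
```

With these placeholder counts the rows are almost proportional, so the statistic is small and H0 is not rejected, which is the same kind of outcome as the .09 reported above.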

Indeed, logic, the basis of mathematics, can be used to support the claim that it is invalid to say that Europe’s and the US’s performance in the majors affects how they perform in the Ryder Cup. Consider the following:

P: The US wins the majors

Q: The US loses the Ryder Cup

“If the US wins the majors, then it loses the Ryder Cup.” (P ⇒ Q)

Using the relationship suggested by the Spearman’s rank order correlation coefficient, the following was derived:

P: The US wins the majors

Q: The US wins the Ryder Cup

“If the US wins the majors, then it wins the Ryder Cup.” (P ⇒ Q)

For the conclusion of the syllogism to be valid, all answers in the two dark frames in the tables above have to be true: the statement has to be a tautology. It is not, and therefore the statements that a US victory in the majors leads to a Ryder Cup loss and that a US victory in the majors leads to a Ryder Cup win are both invalid.
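The tautology check can be reproduced mechanically. The sketch below builds the truth table for an implication "if P then Q" and tests whether every row is true; the function names are hypothetical and the code is only an illustration of the reasoning.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if P then Q' is false only when P is true and Q is false."""
    return (not p) or q


def truth_table(formula):
    """All rows (P, Q, result) for a formula of two propositions."""
    return [(p, q, formula(p, q)) for p, q in product([True, False], repeat=2)]


def is_tautology(formula) -> bool:
    """A formula is a tautology when it is true in every row of its truth table."""
    return all(result for _, _, result in truth_table(formula))


table = truth_table(implies)
implication_is_tautology = is_tautology(implies)
```

The row with P true and Q false evaluates to false, so the implication on its own is not a tautology, in line with the argument above. For comparison, a formula such as "if P and Q, then P" is true in every row.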

This implies that the performance in one tournament does not affect the performance in another: it is other factors that cause a certain outcome in a tournament.

ANALYSIS, INTERPRETATION AND CONCLUSION

The data that had been collected directly from the events’ websites was processed in various ways, mostly statistical, and eventually tested with the Chi-squared test. In addition, logic was used as a final step to underline the findings of the Chi-squared test. First, the correlation coefficient of -.994, a measure of the strength of a relationship between two sets of data, suggested that a very strong negative correlation existed between the results in the majors and the results in the Ryder Cups. In other words, as Europe scored better than the US in the majors, they performed worse in the Ryder Cup as compared with the US. Extrapolating from the data showed that, if the correlation were true, Europe would have to perform more than 1.0034 times better than the US in the majors for the US to win the Ryder Cup. Spearman’s rank order correlation coefficient is another measure similar to the correlation coefficient (whose coefficient of determination was .988), and it was used to double-check the result of the correlation coefficient. The value that was found (.946) suggested a very strong positive relationship between the two variables, which contradicts the initial findings. Therefore, the data was tested using the Chi-squared test to see whether the two inconsistent correlation coefficients were indeed misleading and no relationship between the two variables exists. The χ² value was very low (.09), which firmly rejects the alternative hypothesis that an association existed between the results in the majors and the results in the Ryder Cups. Hence the two variables are independent. No association exists. As a final step, to resolve some of the contradictions in the data, logic, the basis of all mathematics, was used. The propositions suggested by the first and the second correlation coefficient were both invalid. For drawing a final conclusion, the results of the Chi-squared test and the logic are therefore more reliable.

The amount of data used to calculate the correlation coefficients was limited. Following the line of thinking that more data means more reliable results, the data was perhaps processed into too few data points, thus trading off reliability. The result of the Chi-squared test, which was underlined by the use of logic and by the contradiction in the correlation coefficients, indicates that no relationship exists between performance in the majors and the outcome of the Ryder Cup. Intuitively, although intuition is not always the best measure, this seems sensible, since a causal relationship between the two variables is unreasonable. It is more sensible to think that other factors affect the results of the two types of tournaments. That could, for example, be team spirit in the Ryder Cup, which helps Europe go the extra mile and beat the US substantially, despite the fact that they lost most (4 out of 5) of the majors. However, it is beyond the scope of this investigation to determine those causes.

Returning to the central question of the investigation, whether the Ryder Cup is a true reflection of which of Europe and the US is the better golf region, the answer is no, if the majors are used as the basis for comparison. Since no relationship exists, it is not even a reflection. Even if a relationship existed such as the one shown in the graph, the Ryder Cup would still have to be rejected as a true reflection of the best region, because the scores in that tournament were the opposite of those in the majors. It is possible that the outcome in the majors is coherent with the outcome of the Ryder Cup in a given year, but the hypothesized reflection is not guaranteed, because it is not a causal relationship: the same results could be due to other factors.

Another conclusion that might be drawn from the calculations is whether the match play format is a true reflection of stroke play. The outcome should be the same as for the Ryder Cup (match play) and the majors (stroke play), and this relationship should be supported by the Chi-squared test. However, the method for testing this relationship was not optimal, because the collection of data was not aimed at determining this. Had this been the aim, the Ryder Cup should perhaps not have been used, but rather other tournaments, such as the Accenture World Match Play Competition. It would be speculation to draw any conclusions in that direction from this data sample.

VALIDITY

The extent to which the conclusions are valid depends on the limitations and errors faced in the investigation. The collection of a data sample was the first source of error. Only the relationship between the majors and the Ryder Cup after the millennium was tested, which might have been an insufficient quantity of data to establish a correlation that applies to two kinds of tournaments that have run for more than fifty years. Furthermore, it was a limitation in itself that only the majors and the Ryder Cup were used. A second problem was that not all the Ryder Cup players participated in all the major tournaments, which created a slightly different sample from major to major. In addition, the two data sets were markedly different in sample size. The main concern is that the number of Ryder Cups was relatively low, and the smaller that number, the more unreliable the findings become. Indeed, when the data was processed statistically into the two correlation coefficients, the results contradicted each other, thus underlining what is probably the most significant error of the investigation.

Another problem that had to be dealt with was how to compare performances in tournaments of different formats. A point system is used in the Ryder Cup, in which a match win gives 1 point, a tie ½ point, and a loss none. The first team to get more than 14 points wins. In the majors, it is merely the strokes that count: the fewer the better. In order to compare these, each region’s relative win over the other in the given format was calculated, which gave the same unit of measurement (number of times larger the victory was). Nevertheless, the point system of the Ryder Cup is a limitation in itself because it leaves out marginal differences in performance (a player can shoot a better score measured in strokes, but still lose in points to his direct competitor, and vice versa).

There was also the question of which majors to compare with which Ryder Cups. The qualification runs for two years prior to the Ryder Cup, and therefore the majors in the same interval were chosen to be compared with the Ryder Cup at the end of that period. That was, again, subjectively determined, which brings in a human factor. The major from 2001 was omitted because it took place in the year of the terror attacks on the World Trade Center, which caused many players to stay away from the important tournament, which thus no longer represents the data: it is an outlier. Again a subjective decision was made. Some errors were limited by choosing the majors as the basis for comparison with the Ryder Cup, because there the players played under the same conditions. Otherwise, the players would have been playing on different courses under different weather conditions, and so on, and the advantage of a home course, or the disadvantage of playing on a new course, might have been even more pronounced. Although there might be errors arising from rounding in the calculations, they were not of great importance, since only the overall trend was sought. When dealing with raw data, some subjective decisions will always have to be made, so some errors always exist.

In conclusion, there were many limitations to the investigation that made it subject to human judgment, mostly in the gathering of data and in finding a useful way to compare the two data sets. The Chi-squared test and the application of logic, however, were coherent and thus more reliable, and therefore their conclusions were chosen. The validity of the investigation remains because the conclusions were developed in a logical way, despite the limitations of the type and amount of data.

________________________________________________________________________

Word Count: 4559

APPENDICES

________________________________________________________________________

Ryder Cup players in 2002, 2004 and 2006

2002

US

Scott Hoch
David Toms
David Duval
Hal Sutton
Mark Calcavecchia
Stewart Cink
Scott Verplank
Paul Azinger
Jim Furyk
Davis Love III
Phil Mickelson
Tiger Woods

Europe

Colin Montgomerie
Sergio Garcia
Darren Clarke
Bernhard Langer
Padraig Harrington
Thomas Bjørn
Lee Westwood
Niclas Fasth
Paul McGinley
Pierre Fulke
Phillip Price
Jesper Parnevik

2004

US

Tiger Woods
Phil Mickelson
Davis Love III
Jim Furyk
Kenny Perry
David Toms
Chad Campbell
Chris DiMarco
Fred Funk
Chris Riley
Jay Haas
Stewart Cink

Europe

Paul Casey
Darren Clarke
Luke Donald
Sergio Garcia
Padraig Harrington
David Howell
Miguel Angel Jimenez
Thomas Levet
Paul McGinley
Colin Montgomerie
Ian Poulter
Lee Westwood

2006

US

Tiger Woods
Phil Mickelson
Jim Furyk
Chad Campbell
David Toms
Chris DiMarco
Vaughn Taylor
J. J. Henry
Zach Johnson
Brett Wetterich
Stewart Cink
Scott Verplank

Europe

Darren Clarke
Paul Casey
Luke Donald
Sergio Garcia
Padraig Harrington
David Howell
Robert Karlsson
Paul McGinley
Colin Montgomerie
José Maria Olazabal
Henrik Stenson
Lee Westwood

“Play in which the score is reckoned by counting the holes won by each side.” 5th February, 2008: http://dictionary.reference.com/browse/match%20play

“Golf competition in which the total number of strokes taken is the basis of the score.” 5th February, 2008: http://dictionary.reference.com/browse/stroke%20play
