Let’s consider the linear function.
Figure 6 The graph of a linear equation, y = x
A linear equation has a straight-line graph. The general form of a linear equation is y = ax + b. In the equation, ‘a’ is a constant called the gradient. The graph slopes upward when ‘a’ is greater than 0 and downward when ‘a’ is less than 0.
However, no matter what values x and y take, the gradient ‘a’ stays the same in a linear function.
Figure 7 The graph of the gold-medal winning heights from 1932 to 1980
Figure 7 has the same points as Figure 2, but it looks a little different. This is because the y-axis of Figure 2 starts from 195, whereas Figure 7 starts from the origin (0, 0). Therefore, we see the same graph with a different appearance. In Figure 7, a black line passes through some of the points, but not all of them. If we assume the line is the graph of a linear function (note that a linear function has a straight-line graph), we can see that the points do not lie on a line that matches the function graph. Let’s be more specific:
To find the gradient, we use
$$\text{gradient} = \frac{\Delta y}{\Delta x}$$
Note that $\Delta y$ and $\Delta x$ are calculated by subtracting two consecutive data values. According to the table shown, the gradient of the graph of winning heights (Figure 7) changes irregularly. Therefore, the graph cannot be interpreted as a linear function.
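The gradient check described above can be sketched in a few lines of Python. This is only an illustration: the eleven points are the ones quoted in this investigation (x is the year minus 1896, y is the winning height in cm), and the function name `consecutive_gradients` is a hypothetical helper, not part of the original work.

```python
# Data points quoted in the text: x = year - 1896, y = winning height in cm.
data = [(36, 197), (40, 203), (52, 198), (56, 204), (60, 212), (64, 216),
        (68, 218), (72, 224), (76, 223), (80, 225), (84, 236)]

def consecutive_gradients(points):
    """Return delta_y / delta_x for each pair of neighbouring points."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

gradients = consecutive_gradients(data)
for (x1, _), (x2, _), g in zip(data, data[1:], gradients):
    print(f"x = {x1} to {x2}: gradient = {g:.3f}")

# A linear function would give the same gradient for every pair.
is_linear = len(set(gradients)) == 1
print("constant gradient:", is_linear)
```

If the data followed a linear function exactly, every printed gradient would be identical; here they change sign and size from pair to pair.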
Through these steps, I have excluded 4 functions: the sine, cosine, linear and quadratic functions. Since the given data has an irregular pattern, I think I can find the best-fit model among the polynomial functions. Thus, I am going to start the investigation from the cubic function. I will try three different types of functions in total: cubic, quartic, and quintic.
Let’s start the investigation.
- Consider the cubic function:
The general form of the cubic function is:
$$y = ax^3 + bx^2 + cx + d$$
Using the given data points, I am going to calculate the values of the coefficients a, b, c and the constant d. I will solve for them using matrices.
Before the solving step, I first need to choose the 4 points required to solve for the 4 unknowns. The choice of points can seriously affect the result, so after some thought I decided to choose the first and the last points, and two points in the middle. How should the two middle points be picked? Look at the table below:
There are 11 data points. I chose the first and the last ones, (36, 197) and (84, 236), so two more points must be chosen from the 9 points not yet selected. Among these, let (64, 216) be the middle, (40, 203) the starting point, and (80, 225) the ending point. I chose the point (56, 204), which lies between (40, 203) and (64, 216), and the point (72, 224), which lies between (64, 216) and (80, 225). Therefore, the four points I chose are (36, 197), (56, 204), (72, 224) and (84, 236).
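Suppose the function is:
$$y = ax^3 + bx^2 + cx + d$$
Given that the 4 points are (36, 197), (56, 204), (72, 224) and (84, 236).
Substituting the points into the equation gives:
$$\begin{aligned}
36^3 a + 36^2 b + 36c + d &= 197 \\
56^3 a + 56^2 b + 56c + d &= 204 \\
72^3 a + 72^2 b + 72c + d &= 224 \\
84^3 a + 84^2 b + 84c + d &= 236
\end{aligned}$$
To use matrices, we write the system as $AX = B$, so that $X = A^{-1}B$, where
$$A = \begin{pmatrix} 36^3 & 36^2 & 36 & 1 \\ 56^3 & 56^2 & 56 & 1 \\ 72^3 & 72^2 & 72 & 1 \\ 84^3 & 84^2 & 84 & 1 \end{pmatrix},\quad
X = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix},\quad
B = \begin{pmatrix} 197 \\ 204 \\ 224 \\ 236 \end{pmatrix}$$
Using a graphing calculator (GC), the solution is (to 6 significant figures):
So, cubic function model 1 is: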
y=
Figure 8 The red points show the original data, and the blue curve is the cubic function curve.
Note that the points highlighted in red have a difference value greater than 6.
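The matrix step $X = A^{-1}B$ can also be sketched without a graphing calculator. The code below is a minimal illustration, not the calculation used in this investigation: it solves the same system by Gauss-Jordan elimination with exact fractions, and the helper names (`solve_linear_system`, `fit_through_points`, `evaluate`) are hypothetical. It also assumes that "difference value greater than 6" means an absolute difference between the model and the data.

```python
from fractions import Fraction

# Data points quoted in the text: x = year - 1896, y = winning height in cm.
data = [(36, 197), (40, 203), (52, 198), (56, 204), (60, 212), (64, 216),
        (68, 218), (72, 224), (76, 223), (80, 225), (84, 236)]

def solve_linear_system(A, B):
    """Solve A X = B by Gauss-Jordan elimination using exact fractions."""
    n = len(A)
    # Build the augmented matrix [A | B].
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(A, B)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def fit_through_points(points):
    """Coefficients (highest power first) of the polynomial through the points."""
    n = len(points)
    A = [[x ** (n - 1 - k) for k in range(n)] for x, _ in points]
    B = [y for _, y in points]
    return solve_linear_system(A, B)

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = Fraction(0)
    for c in coeffs:
        result = result * x + c
    return result

chosen = [(36, 197), (56, 204), (72, 224), (84, 236)]
coeffs = fit_through_points(chosen)  # a, b, c, d of the cubic
differences = [evaluate(coeffs, x) - y for x, y in data]
# Assumption: a point is flagged when |model - data| exceeds 6.
flagged = [(x, y) for (x, y), d in zip(data, differences) if abs(d) > 6]

print("a, b, c, d =", [float(c) for c in coeffs])
print("points with |difference| > 6:", flagged)
```

Because `fit_through_points` builds one row per point, passing five points instead of four gives the coefficients of a quartic in the same way.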
To adjust the model, I will try twice more, with 2 different sets of points.
The data highlighted in light blue has already been used, so this time I am going to pick 4 other points that were not selected before.
I will pick (40, 203) and (80, 225). This time I will still treat the point (64, 216) as the middle point, and I will pick (60, 212) and (68, 218).
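Suppose the function is:
$$y = ax^3 + bx^2 + cx + d$$
Given that the 4 points are (40, 203), (60, 212), (68, 218) and (80, 225).
Substituting the points into the equation gives:
$$\begin{aligned}
40^3 a + 40^2 b + 40c + d &= 203 \\
60^3 a + 60^2 b + 60c + d &= 212 \\
68^3 a + 68^2 b + 68c + d &= 218 \\
80^3 a + 80^2 b + 80c + d &= 225
\end{aligned}$$
To use matrices, we write the system as $AX = B$, so that $X = A^{-1}B$, where
$$A = \begin{pmatrix} 40^3 & 40^2 & 40 & 1 \\ 60^3 & 60^2 & 60 & 1 \\ 68^3 & 68^2 & 68 & 1 \\ 80^3 & 80^2 & 80 & 1 \end{pmatrix},\quad
X = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix},\quad
B = \begin{pmatrix} 203 \\ 212 \\ 218 \\ 225 \end{pmatrix}$$
Using a graphing calculator (GC), the solution is (to 6 significant figures):
So, cubic function model 2 is: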
y=
Figure 9 The red points show the original data, and the blue curve is the cubic function curve.
Note that the points colored in red have a difference value greater than 6.
For the last test of the cubic model, I decided to use middle points of the given data. I will first take the middle point of the first two data points, because in Figure 1 these two points have a relatively smaller gap in x-value than the gap between the second point (40, 203) and the third point (52, 198). Then I will separate the other data into 3 groups of 3 and find the middle point of each group.
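To find the middle point between two points, we use:
$$\left( \frac{x_1 + x_2}{2},\ \frac{y_1 + y_2}{2} \right)$$
To find the middle point among three points:
$$\left( \frac{x_1 + x_2 + x_3}{3},\ \frac{y_1 + y_2 + y_3}{3} \right)$$
The points are:
Point 1: $\left( \frac{36 + 40}{2},\ \frac{197 + 203}{2} \right)$ = (38, 200)
Point 2: $\left( \frac{52 + 56 + 60}{3},\ \frac{198 + 204 + 212}{3} \right)$ = (56, 204.67)
Point 3: $\left( \frac{64 + 68 + 72}{3},\ \frac{216 + 218 + 224}{3} \right)$ = (68, 219.33)
Point 4: $\left( \frac{76 + 80 + 84}{3},\ \frac{223 + 225 + 236}{3} \right)$ = (80, 228)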
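The middle points above are simply the averages of the x-values and y-values in each group. A short sketch of that calculation follows; the grouping (first two points, then three groups of three) is taken from the text, while the function name `middle_point` is a hypothetical helper.

```python
from fractions import Fraction

# Data points quoted in the text: x = year - 1896, y = winning height in cm.
data = [(36, 197), (40, 203), (52, 198), (56, 204), (60, 212), (64, 216),
        (68, 218), (72, 224), (76, 223), (80, 225), (84, 236)]

def middle_point(points):
    """Average the x-values and the y-values of a group of points."""
    n = len(points)
    mean_x = Fraction(sum(x for x, _ in points), n)
    mean_y = Fraction(sum(y for _, y in points), n)
    return (mean_x, mean_y)

# First two points, then three groups of three, as described in the text.
groups = [data[0:2], data[2:5], data[5:8], data[8:11]]
middles = [middle_point(g) for g in groups]

for i, (mx, my) in enumerate(middles, start=1):
    print(f"Point {i}: ({float(mx):g}, {float(my):.2f})")
```

Keeping the averages as exact fractions avoids carrying the rounding of 204.67 and 219.33 into later steps; they are rounded only when printed.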
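Suppose the function is:
$$y = ax^3 + bx^2 + cx + d$$
Given that the 4 points are (38, 200), (56, 204.67), (68, 219.33) and (80, 228).
Substituting the points into the equation gives:
$$\begin{aligned}
38^3 a + 38^2 b + 38c + d &= 200 \\
56^3 a + 56^2 b + 56c + d &= 204.67 \\
68^3 a + 68^2 b + 68c + d &= 219.33 \\
80^3 a + 80^2 b + 80c + d &= 228
\end{aligned}$$
To use matrices, we write the system as $AX = B$, so that $X = A^{-1}B$, where
$$A = \begin{pmatrix} 38^3 & 38^2 & 38 & 1 \\ 56^3 & 56^2 & 56 & 1 \\ 68^3 & 68^2 & 68 & 1 \\ 80^3 & 80^2 & 80 & 1 \end{pmatrix},\quad
X = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix},\quad
B = \begin{pmatrix} 200 \\ 204.67 \\ 219.33 \\ 228 \end{pmatrix}$$
Using a graphing calculator (GC), the solution is (to 6 significant figures):
So, cubic function model 3 is: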
y=
Figure 10 The red points show the original data, and the blue curve is the cubic function curve.
Note that the points colored in red have a difference value greater than 6.
The table above shows the difference between y and y2 for the three cubic models, which use three different sets of points. Although the third model has only one difference greater than 6, model 2 has the smallest difference at 6 of the 11 data points, more than either of the other models. Therefore, I will choose the second model as the representative cubic model.
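The comparison in the table can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of the table: it uses Lagrange interpolation (a different but equivalent way to get the cubic through four points, swapped in here for the matrix method), it uses exact thirds for the averaged points of model 3 instead of the rounded 204.67 and 219.33, and it treats the difference value as an absolute difference. The printed counts may therefore not match the table exactly.

```python
from fractions import Fraction

# Data points quoted in the text: x = year - 1896, y = winning height in cm.
data = [(36, 197), (40, 203), (52, 198), (56, 204), (60, 212), (64, 216),
        (68, 218), (72, 224), (76, 223), (80, 225), (84, 236)]

def lagrange_value(points, x):
    """Value at x of the unique polynomial through `points` (Lagrange form)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# The three point sets used for the cubic models in the text.
models = {
    "model 1": [(36, 197), (56, 204), (72, 224), (84, 236)],
    "model 2": [(40, 203), (60, 212), (68, 218), (80, 225)],
    "model 3": [(38, 200), (56, Fraction(614, 3)),
                (68, Fraction(658, 3)), (80, 228)],
}

# Absolute difference between each model and each data point.
abs_diff = {name: [abs(lagrange_value(pts, x) - y) for x, y in data]
            for name, pts in models.items()}

# Count, for each model, the data points where it has the smallest difference
# (ties are credited to every model that shares the minimum).
wins = {name: 0 for name in models}
for i in range(len(data)):
    best = min(abs_diff[name][i] for name in models)
    for name in models:
        if abs_diff[name][i] == best:
            wins[name] += 1

# Count the differences greater than 6 for each model.
over_6 = {name: sum(d > 6 for d in diffs) for name, diffs in abs_diff.items()}

for name in models:
    print(name, "closest at", wins[name], "points;",
          over_6[name], "differences greater than 6")
```

Counting the points where each model is closest mirrors the selection rule used in the text; a sum of absolute differences would be another reasonable criterion and could rank the models differently.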
- Consider the quartic function:
The general form of the quartic function is:
$$y = ax^4 + bx^3 + cx^2 + dx + e$$
As with the cubic function, I am going to find the coefficients a, b, c, d and the constant e using matrices.
Since there are 5 unknowns, I have to decide which 5 points to choose. I will again pick the first and the last points, and the point in the middle of the list.
I will test the quartic function twice. I will fix these three points, (36, 197), (64, 216) and (84, 236), and use two different sets of two additional points. The first time, I will choose the points (52, 198) and (76, 223). The second time, I will take two middle points, each calculated from 4 points. These 4 points are the ones colored in black in the table.
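Given that the 5 points are (36, 197), (52, 198), (64, 216), (76, 223) and (84, 236).
Substituting the points into the equation gives:
$$\begin{aligned}
36^4 a + 36^3 b + 36^2 c + 36d + e &= 197 \\
52^4 a + 52^3 b + 52^2 c + 52d + e &= 198 \\
64^4 a + 64^3 b + 64^2 c + 64d + e &= 216 \\
76^4 a + 76^3 b + 76^2 c + 76d + e &= 223 \\
84^4 a + 84^3 b + 84^2 c + 84d + e &= 236
\end{aligned}$$
To use matrices, we write the system as $AX = B$, so that $X = A^{-1}B$, where
$$A = \begin{pmatrix} 36^4 & 36^3 & 36^2 & 36 & 1 \\ 52^4 & 52^3 & 52^2 & 52 & 1 \\ 64^4 & 64^3 & 64^2 & 64 & 1 \\ 76^4 & 76^3 & 76^2 & 76 & 1 \\ 84^4 & 84^3 & 84^2 & 84 & 1 \end{pmatrix},\quad
X = \begin{pmatrix} a \\ b \\ c \\ d \\ e \end{pmatrix},\quad
B = \begin{pmatrix} 197 \\ 198 \\ 216 \\ 223 \\ 236 \end{pmatrix}$$
Using a graphing calculator (GC), the solution is (to 6 significant figures):
So, quartic function model 1 is: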
Figure 11 The red points show the original data, and the blue curve is the quartic function curve.
Note that the points colored in red have a difference value greater than 6.
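- In the second test, we need to find the middle points of two groups of four points:
$$\left( \frac{x_1 + x_2 + x_3 + x_4}{4},\ \frac{y_1 + y_2 + y_3 + y_4}{4} \right)$$
Point 1: $\left( \frac{40 + 52 + 56 + 60}{4},\ \frac{203 + 198 + 204 + 212}{4} \right)$ = (52, 204.25)
Point 2: $\left( \frac{68 + 72 + 76 + 80}{4},\ \frac{218 + 224 + 223 + 225}{4} \right)$ = (74, 222.5)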
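Given that the 5 points are (36, 197), (52, 204.25), (64, 216), (74, 222.5) and (84, 236), substituting the points into the equation gives:
$$\begin{aligned}
36^4 a + 36^3 b + 36^2 c + 36d + e &= 197 \\
52^4 a + 52^3 b + 52^2 c + 52d + e &= 204.25 \\
64^4 a + 64^3 b + 64^2 c + 64d + e &= 216 \\
74^4 a + 74^3 b + 74^2 c + 74d + e &= 222.5 \\
84^4 a + 84^3 b + 84^2 c + 84d + e &= 236
\end{aligned}$$
To use matrices, we write the system as $AX = B$, so that $X = A^{-1}B$, where
$$A = \begin{pmatrix} 36^4 & 36^3 & 36^2 & 36 & 1 \\ 52^4 & 52^3 & 52^2 & 52 & 1 \\ 64^4 & 64^3 & 64^2 & 64 & 1 \\ 74^4 & 74^3 & 74^2 & 74 & 1 \\ 84^4 & 84^3 & 84^2 & 84 & 1 \end{pmatrix},\quad
X = \begin{pmatrix} a \\ b \\ c \\ d \\ e \end{pmatrix},\quad
B = \begin{pmatrix} 197 \\ 204.25 \\ 216 \\ 222.5 \\ 236 \end{pmatrix}$$
Using a graphing calculator (GC), the solution is (to 6 significant figures):
So, quartic function model 2 is: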
Figure 11 The red points show the original data, and the blue curve is the quartic function curve.
Note that the points colored in red have a difference value greater than 6.
For the two quartic models, I found the difference value, which is the y-value of the quartic function minus the y-value of the original data. The table shows these difference values in different colors. Model 1 has the smaller difference value at 7 of the 11 given points. Therefore, the representative model of the quartic function is the first model.
Now, let’s compare the difference values of the representative cubic and quartic models.
According to the comparison, the quartic model has smaller difference values than the cubic model. A smaller difference value means the curve is closer to the original data. Therefore, I would say that the quartic function is the best-fit model. The equation of the function is:
Figure 12 The red points show the original data, and the blue curve shows my model, the quartic function.
We can notice that my model curve does not match the original curve at some points. The graphs in Figure 12 are drawn with a small y-axis scale; if they were drawn at a large scale, the two curves would almost coincide, except at the second and third points of the original graph. Since the quartic curve slopes upward steadily, the model does not pass through points where there is a sudden increase or decrease. However, it fits the original curve when viewed on the large-scale graph.
Figure 13 The model curve and the original curve on the large-scale graph
Based on the investigation, let’s estimate what the winning heights would have been if the Olympics had been held in 1940 and 1944. The x-values of 1940 and 1944 are 44 and 48 respectively. In the model, the year 1940 (x = 44) has a y-value of 195.7643, which means a winning height of about 196 cm. For 1944 (x = 48), the y-value is 199.1665, so a height of about 199 cm would have won the gold medal. However, the given data are 203 cm for 1936 and 198 cm for 1948, when the first Olympics after the war were held. Because the model curve is based only on the data already given, it cannot give reliable predictions for 1940 and 1944. But we know that the original curve slopes upward irregularly, and that some factors cause sudden increases or decreases in the data. So we can estimate the winning heights of 1940 and 1944 by setting a range. As a prediction, I think the winning height in 1940 and 1944 would be
How about other years? I will estimate the winning heights for 1984, for which no data is shown, and for 2016, a future outcome. Their x-values are 88 and 120 respectively, calculated as the year minus the first year, 1896. According to the graph of the model curve, the y-value for 1984 (x = 88) is 245.7069 and the y-value for 2016 (x = 120) is 805.0598. However, the given data are 236 cm for 1980 and 225 cm for 1976, an 11 cm gap between two consecutive Olympics. Since the model curve slopes upward, although with small fluctuations, I estimate that the winning height in 1984 will be in the range of
, which takes fluctuations into account. The year 2016 has a much higher predicted winning height, about 805 cm. In my opinion, this is impossible in the high jump, where athletes jump without any instrument that could enhance their jumping ability. Therefore, I predict only that the winning height will be higher than the given data; the winning height of the 2016 high jump will be in the range of
There is one more data set given, which shows the winning heights for all the other Olympic Games held since 1896.
If I organize the table using my x-values, then:
I drew a graph to show how well my model fits the additional data:
Figure 14 The red curve is the original curve and the blue curve is the graph of the model.
Figure 14 shows that my model curve fits only the data that I investigated above. The model curve starts very high and decreases rapidly. It then comes close to the original curve from x = 36 to x = 84, at around 200 cm, and rises again afterwards. The original curve, however, keeps sloping upward, but increases less steeply. If we look at the original graph, it still has fluctuations but keeps its upward slope. Look at the graph below, which shows the original curve.
Figure 15 The original curve with much more data.
The figure shows the original graph and its fluctuations. It has two significant fluctuation points. The first is the year 1904, when the winning height was 10 cm lower than in 1896. A possible reason is that the Olympics had been held only a few times, so there may have been a lack of trained athletes. The second is the year 1948. Because the Olympics were not held in 1940 and 1944, the winning height decreased from 203 cm in 1936 to 198 cm in 1948. There are two possible reasons to support my opinion. First, athletes may have died because of the wars. Second, after the war ended, it might have been hard to find good conditions for training.
Since the original curve looks like the graph of a linear function, I tried to find the trend line using MS Excel:
In Figure 16, except for a few fluctuation points, the linear trend line fits all the data quite well. It has a Pearson correlation value of 0.9289, which is quite close to 1. Therefore, the linear function is the final best-fit model for the original graph.
Figure 16 The original graph and its trend line.
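The trend line and Pearson correlation that Excel reports can be sketched by hand with the least-squares formulas. The code below is only an illustration and makes one loud assumption: it uses just the eleven points from 1932 to 1980 quoted earlier in this investigation, because the full table since 1896 is not reproduced here. The correlation it prints will therefore differ from the 0.9289 obtained for the full data set. The function name `linear_fit` is a hypothetical helper.

```python
from math import sqrt

# Assumption: only the 1932-1980 subset quoted in the text is used here
# (x = year - 1896, y = winning height in cm), not the full data since 1896.
data = [(36, 197), (40, 203), (52, 198), (56, 204), (60, 212), (64, 216),
        (68, 218), (72, 224), (76, 223), (80, 225), (84, 236)]

def linear_fit(points):
    """Least-squares line y = slope * x + intercept and Pearson's r."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = linear_fit(data)
print(f"trend line: y = {slope:.4f}x + {intercept:.4f}")
print(f"Pearson correlation r = {r:.4f}")
```

A value of r close to 1 means the points lie near an upward-sloping straight line, which is the sense in which the linear function is called the best fit here.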
My investigation started from analyzing the given data, and I developed it step by step. At first, the task was analyzed on the basis of the limited amount of data given. I tried the sine and cosine curves first, the quadratic curve next, and then the linear curve, using the characteristics of each function. Because these functions did not fit the given data, I considered the cubic and quartic functions to find the best-fit form. I tested three different cubic models and two different quartic models. I built the models from different points so that I could find the one closest to the original data. For the limited data given, I found that the quartic function, specifically quartic model 1, was the best-fit curve. However, when I checked whether the quartic model fitted the other data, I discovered that my model curve fitted only the limited data given first. Using all the data, including the second data set given later, I drew the graph to see the trend. Eventually, I saw that the trend was closer to a linear function. Although the gradient of the curve is not constant, the data has an increasing trend. Therefore, in short, the best-fit function I arrived at through this process is the linear function.