Figure 3
The simple cubic function y = x³ must be modified in order to visualize its curvature better. A cubic function looks somewhat similar to a quadratic; however, a cubic curve rises to a local maximum, decreases to a local minimum, and then continues to increase. This function will not fit the data (Figure 1), because the data rise and then the positive slope gradually decreases. A cubic increases like the data but does not slow down, so this curve will not accurately match the correlation (Figure 1).
Figure 3 shows the shape of a cubic function which has been modified to $h = \frac{1}{2}t^3 + 2t^2 + \frac{1}{2}t - 2$
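The shape described above can be checked by locating the turning points from the derivative. This is a minimal sketch that assumes the Figure 3 equation reads h = ½t³ + 2t² + ½t − 2; the helper names `h` and `turning_points` are illustrative only.

```python
import math

# Assumed Figure 3 cubic: h(t) = 0.5t^3 + 2t^2 + 0.5t - 2
def h(t: float) -> float:
    return 0.5 * t**3 + 2 * t**2 + 0.5 * t - 2

# Coefficients of the derivative h'(t) = 1.5t^2 + 4t + 0.5
A, B, C = 1.5, 4.0, 0.5

def turning_points() -> list[tuple[float, str]]:
    """Solve h'(t) = 0 and classify each root with h''(t) = 3t + 4."""
    disc = B * B - 4 * A * C          # 16 - 3 = 13 > 0, so two turning points
    roots = sorted([(-B - math.sqrt(disc)) / (2 * A),
                    (-B + math.sqrt(disc)) / (2 * A)])
    return [(t, "local max" if 3 * t + 4 < 0 else "local min") for t in roots]

for t, kind in turning_points():
    print(f"{kind} at t = {t:.3f}, h = {h(t):.3f}")

# Far to the right the curve keeps climbing ever more steeply instead of levelling off
print(h(10) - h(9), h(20) - h(19))
```

The last line shows the rise per unit growing (174 between t = 9 and 10, 649 between t = 19 and 20), which is why a cubic cannot follow data whose slope slowly decreases.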
Figure 4 Square root function- general equation f (x) =
Figure 4 shows the shape of a Square Root function, h =
is more zoomed in to its origin
The curvature of a square root function is fairly simple. In Figure 4, the root curve begins with a slope greater than one, eventually reaches a slope of one, and then its slope keeps decreasing toward zero, so the curve continues to rise ever more slowly without ever levelling off completely. This function could represent the majority of the data points, since the correlation increases and then slowly levels out. However, the data point (1936, 203) indicates that a line of best fit would start with a negative correlation and then switch to a positive correlation in the following years. A square root function has no section where a negative slope curves into a positive slope. Therefore, even though a square root function may represent most of the data points, it cannot reproduce the curvature that the data indicate.
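The slope behaviour of a square root curve can be checked numerically. This sketch assumes the basic form h = √x, whose slope is 1/(2√x); the helper name `slope` is illustrative. The slope equals one at x = 1/4, shrinks toward zero for larger x, and is never negative, while √x itself keeps growing.

```python
import math

# Assumed basic square root curve h = sqrt(x); its slope is 1 / (2 sqrt(x)) for x > 0
def slope(x: float) -> float:
    return 1 / (2 * math.sqrt(x))

for x in [0.01, 0.25, 1, 100, 10_000]:
    print(f"x = {x:>8}: sqrt(x) = {math.sqrt(x):8.3f}, slope = {slope(x):.4f}")
```

Because the slope stays positive everywhere, no choice of stretch or shift can make a square root curve dip before it rises.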
Figure 5 Sinusoidal function - general equation f(x) = a sin[b(x − c)] + d
Figure 5 shows the shape of a Sine Function of one period, y = sin (x)
A sinusoidal function is composed of continuous periods that follow the same pattern. One period, as shown in Figure 5, begins with a negative slope and decreases to a minimum point where the slope is zero. Afterwards the curve rises with a positive slope and reaches an inflection point, where it changes from concave up to concave down. Lastly, it reaches a maximum point with slope zero, after which the curve falls with a negative slope. This would best represent the curvature of the data, because the data follow the same pattern as the sine function. Observing the points (1932, 197), (1936, 203) and (1948, 198) in Figure 1, it is evident that a line of best fit would begin with a negative slope and then rise. As the curve continues from concave up, the points (1956, 212) and (1960, 216) show a change of concavity to concave down. Furthermore, the trend slowly begins to decrease in slope, and the point (1972, 223) is lower than the previous Games (1968, 224), indicating a negative slope and a sign that a maximum point needs to be established. The specific point (1980, 236) must also be acknowledged, as it shows a significant increase in height that may expose a limitation of the sinusoidal function; however, the sinusoidal function still models the correlation better than the other functions. Therefore, the sinusoidal function will best represent the data in Figure 1 and will be used in this investigation.
Creating an equation for the Sinusoidal Function
The general equation for the function is f(x) = a sin[b(x − c)] + d, where each parameter affects the appearance of the function in a different way. The parameters of the sinusoidal function are a, b, c and d. The value a is the amplitude, also known as the vertical stretch/compression, and is calculated as half the difference between the maximum and minimum heights. The value b is the horizontal stretch/compression and is calculated as 2π divided by k, where k is the period: the distance between two successive maximum points or two successive minimum points. The value c is the horizontal shift, which is calculated last, by substituting all the other known values and a data point into the equation and solving for c. Lastly, d is the vertical shift, calculated as the average of the maximum and minimum heights.
Algebraically approaching the function
Amplitude (a)
$a = \frac{236 - 197}{2} = 19.5$
Period (k)
The lowest height occurs in 1932 and the highest in 1980. Going from a minimum to the next maximum covers half a cycle, so doubling the difference gives the length of one full cycle (the distance between two successive maxima or minima):
$k = 2(1980 - 1932) = 96$
Horizontal stretch/ compression (b)
Substitute the value of k:
$b = \frac{2\pi}{k} = \frac{2\pi}{96} = \frac{\pi}{48}$
This value is left exact, in radians, rather than being converted to degrees.
Vertical shift (d)
$d = \frac{236 + 197}{2} = 216.5$
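The parameter calculations in this section can be reproduced in a few lines. This is a minimal sketch that takes the lowest point (1932, 197) and the highest point (1980, 236) from the text; the variable names are illustrative only.

```python
import math

# Lowest and highest data points used in the derivation
t_min, h_min = 1932, 197
t_max, h_max = 1980, 236

a = (h_max - h_min) / 2      # amplitude: half the distance between max and min
k = 2 * (t_max - t_min)      # period: min-to-max is half a cycle
b = 2 * math.pi / k          # horizontal stretch, in radians per year
d = (h_max + h_min) / 2      # vertical shift: midline between max and min

print(a, k, b, d)
print(math.isclose(b, math.pi / 48))
```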
Horizontal shift (c)
$h = a\sin[b(t - c)] + d$
Sub in all known variables:
$h = 19.5\sin\left[\frac{\pi}{48}(t - c)\right] + 216.5$
Substitute the coordinates of any data point into the equation and solve for c. The point (1960, 216) was chosen because it lies near the centre of the given values.
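$216 = 19.5\sin\left[\frac{\pi}{48}(1960 - c)\right] + 216.5$
$-0.5 = 19.5\sin\left[\frac{\pi}{48}(1960 - c)\right]$
$-0.0256410256410 \approx \sin\left[\frac{\pi}{48}(1960 - c)\right]$
$-0.0256438361401 \approx \frac{\pi}{48}(1960 - c)$ (inverse sine applied, calculator in radian mode)
$-0.3918089550276 \approx 1960 - c$ (both sides multiplied by $\frac{48}{\pi}$)
$-1960.391808955 \approx -c$
$c \approx 1960.391808955$
$c \approx 1960.4$, therefore c is approximately 1960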
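The solution for c can be checked numerically. This sketch uses Python's `math.asin`, which, like a calculator in radian mode, returns only the principal value; because sine is periodic, other values of c also satisfy the equation, and the sketch does not explore them.

```python
import math

a, b, d = 19.5, math.pi / 48, 216.5   # parameters found above
t0, h0 = 1960, 216                     # data point chosen to solve for c

# 216 = a sin[b(1960 - c)] + d  ->  sin[b(1960 - c)] = (h0 - d) / a
ratio = (h0 - d) / a
# math.asin returns the principal value in radians, like a calculator in radian mode
inner = math.asin(ratio) / b           # this is 1960 - c
c = t0 - inner

print(f"sin value: {ratio:.13f}")
print(f"1960 - c : {inner:.10f}")
print(f"c        : {c:.4f}")
```

The printed values agree with the hand calculation, giving c ≈ 1960.39.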
Model of the algebraically derived function
The sinusoidal model therefore becomes $h = 19.5\sin\left[\frac{\pi}{48}(t - 1960)\right] + 216.5$, based solely on the gold medal heights achieved between 1932 and 1980.
Figure 6
Figure 6 shows the derived model function employed on the original data from figure 1
The algebraically derived function is actually a good representation of the original data. Even with the graph zoomed in to a large extent, the model function passes through approximately 4 points. However, it is not very accurate, since many points lie well below the function. To make this function more accurate, a few adjustments need to be made.
The restrictions on the function established at the beginning remain the same. The domain of the function in Figure 6 is {t ∈ ℝ | 1932 ≤ t ≤ 1980} and the range is {h ∈ ℝ | 197 ≤ h ≤ 236}. The viewing window for the data also remains the same: 1915 to 1995 on the horizontal axis and 185 to 245 on the vertical axis.
First, the function needs to be shifted slightly to the right, which requires a larger value of c.
Figure 7
After this minor horizontal shift to the right, the function is much more accurate. However, it would represent the data even better if it were shifted down slightly.
Figure 7 shows the function shifted horizontally by two units to the right (c = 1962)
Figure 8
Figure 8 shows the finalized function after the vertical shift of 1 unit down
After this small vertical shift of 1 unit down, the function in Figure 8 best represents the correlation of the data points. The equation for the refined function becomes:
$h = 19.5\sin\left[\frac{\pi}{48}(t - 1962)\right] + 215.5$
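One way to check that the two adjustments improve the fit is to compare the sum of squared residuals (the squared vertical gaps between each data point and the curve) for the initial and refined models. This measure is not used in the investigation itself; it is a standard yardstick chosen here for illustration. The sketch uses only the data points quoted in this section (1932, 1936, 1948, 1952, 1956, 1960, 1964, 1968, 1976 and 1980) rather than the full Table 1, and the helper names `model` and `sse` are hypothetical.

```python
import math

# Data points (year, height in cm) quoted in this section; not the full Table 1
DATA = [(1932, 197), (1936, 203), (1948, 198), (1952, 204), (1956, 212),
        (1960, 216), (1964, 218), (1968, 224), (1976, 225), (1980, 236)]

def model(t: float, c: float, d: float, a: float = 19.5, b: float = math.pi / 48) -> float:
    """h = a sin[b(t - c)] + d"""
    return a * math.sin(b * (t - c)) + d

def sse(c: float, d: float) -> float:
    """Sum of squared vertical gaps between the data points and the curve."""
    return sum((h - model(t, c, d)) ** 2 for t, h in DATA)

initial = sse(c=1960, d=216.5)   # model from the algebraic derivation
refined = sse(c=1962, d=215.5)   # after shifting 2 right and 1 down
print(f"initial: {initial:.1f}   refined: {refined:.1f}")
for t, h in DATA:
    print(t, h, round(h - model(t, 1962, 215.5), 1))   # residuals of the refined model
```

With these ten points the sum falls from roughly 155 to roughly 119. The largest remaining residuals of the refined model are at 1936 and 1976, the two points treated as anomalies in the evaluation of Figure 8.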
Discussion and evaluation of final model function
There are many positive aspects of viewing Figure 8 as an accurate model of the data. First, the function comes extremely close to many of the points, such as (1932, 197), (1952, 204), (1964, 218) and (1968, 224). There are also points that are very close to the function, such as (1948, 198), (1956, 212), (1960, 216) and (1980, 236). The function passes very close to 8 of the 11 data points, which shows that the model function is very accurate. On the downside, there are some significant fluctuations in the data, such as the point (1936, 203), which does not match the trend. The point (1976, 225) is also much lower than the trend line, showing a limitation of the model function. These two points are considered anomalies because they lie further from the trend than the other points. This may not even be a limitation, because the model function balances the anomalies: one lies above the model function and one below it, so the model passes roughly through the average of those points. All in all, a function will never pass through every point perfectly; however, the final function in Figure 8 accurately models the original data.
Comparing the refined model function with technologically derived function
To compare the algebraically derived function with the regression model, both must be plotted together on one graph. The regression model was obtained with the program Graphmatica.
Figure 9
Figure 9 shows the two functions: the self-derived model in blue and the Graphmatica regression model in red
There are both similarities and differences between the derived model function and the regression function from Graphmatica; the two models represent the data from different perspectives. The algebraically derived function has an exact b value, so to allow an easier comparison between the two models it is converted to an approximate decimal value.
Algebraically derived function: $h = 19.5\sin\left[\frac{\pi}{48}(t - 1962)\right] + 215.5$
Converted to an approximate decimal: $h \approx 19.5\sin[0.0654(t - 1962)] + 215.5$
Regression by Graphmatica: h = 12.871 sin (0.0818t + 3.0456) + 213.5766
Based solely on the visual representation, there are many similarities between the two model functions. Both models begin with a negative slope and reach a minimum point in roughly the same region. Afterwards, both increase and eventually reach an inflection point where they become concave down. With this similar curvature established, a deeper analysis follows. Visually, the two models lie very close to each other from 1945 all the way to 1967; before and after these years, the two functions follow different trends.
The amplitudes of the two models also differ: the amplitude of the regression is much smaller than that of the created model function. This is because the regression fits the curve to the general trend of the majority of the points rather than to the extreme values. The refined function, by contrast, was built from the highest and lowest data points, so it represents the overall spread of the data and not just the majority of the points. The amplitude difference between the two functions is approximately 6.63 (19.5 − 12.871). With specific reference to the last data point, (1980, 236), the regression model does not accurately represent the data, whereas the refined model comes much closer to this point because of its greater amplitude.
Another difference between the two model functions is the horizontal stretch. The refined model function has a greater stretch than the regression function. The stretch depends on the period of the sine wave, and the regression function has a shorter period (2π/0.0818 ≈ 76.8 years) than the algebraic function (96 years). In the algebraic function, the stretch was chosen so that the distance from the minimum to the maximum matches the distance from the lowest to the highest data point. The regression model is limited because its stretch reflects only the bulk of the data and not the overall shape of the correlation. For example, the regression model cannot pass through both the first point (1932, 197) and the last point (1980, 236), since it is not stretched horizontally enough. The horizontal shifts in the two equations also differ, simply because a different approach was used to obtain the c value; in addition, the regression's 3.0456 is a phase constant added inside sin(0.0818t + 3.0456), not a shift measured in years.
Lastly, the vertical shifts of the two models differ. The vertical shift of the algebraic function (215.5) is slightly higher than that of the regression model (213.5766). The algebraic function better represents the data because its vertical shift accommodates both the highest and the lowest points of the data: it lies approximately at the midpoint between the maximum and minimum heights, allowing the function to cover all the points. The regression's midline is lower, and the regression does not reach the last coordinate, (1980, 236).
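The numerical differences discussed above can be checked directly from the two equations. This minimal sketch takes the Graphmatica coefficients exactly as reported and assumes their argument is in radians (a degree reading would imply a period of thousands of years); the function names are illustrative.

```python
import math

def algebraic(t: float) -> float:
    """Refined model h = 19.5 sin[pi/48 (t - 1962)] + 215.5."""
    return 19.5 * math.sin(math.pi / 48 * (t - 1962)) + 215.5

def regression(t: float) -> float:
    """Graphmatica regression as reported, assuming the argument is in radians."""
    return 12.871 * math.sin(0.0818 * t + 3.0456) + 213.5766

period_alg = 2 * math.pi / (math.pi / 48)   # 96 years
period_reg = 2 * math.pi / 0.0818           # about 76.8 years
print(f"b as a decimal: {math.pi / 48:.4f}")
print(f"periods: {period_alg:.1f} vs {period_reg:.1f} years")
print(f"amplitude difference: {19.5 - 12.871:.3f}")
print(f"midline difference: {215.5 - 213.5766:.4f}")
for t, h in [(1932, 197), (1980, 236)]:
    print(t, h, round(algebraic(t), 1), round(regression(t), 1))
```

At the two endpoints the regression gives about 203.8 (1932) and 226.4 (1980), against about 197.5 and 233.5 for the algebraic model, which is consistent with the regression being unable to reach both the first and the last data points.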
Overall, the regression function emphasizes the average trend of the data, while the refined algebraic function models the overall correlation. In comparison with the regression model, the derived model represents the trend of the data more accurately.
Extrapolation and interpolation of data
Since the Olympics were not held in 1940 and 1944, the gold medal heights that might have been achieved in those years are estimated (interpolated) with the refined algebraic function; the same function is also used to extrapolate heights for 1984 and 2016.
When t = 1940
$h(1940) = 19.5\sin\left[\frac{\pi}{48}(1940 - 1962)\right] + 215.5$
≈196.167
≈196
Therefore, the approximate gold medal height achieved in the year 1940 would be 196 cm.
When t= 1944
$h(1944) = 19.5\sin\left[\frac{\pi}{48}(1944 - 1962)\right] + 215.5$
≈ 197.484
≈ 197
Therefore, the approximate gold medal height achieved in the year 1944 would be 197 cm.
When t = 1984
$h(1984) = 19.5\sin\left[\frac{\pi}{48}(1984 - 1962)\right] + 215.5$
≈ 234.8332
≈ 235
Therefore, the approximate gold medal height achieved in the year 1984 would be 235 cm.
When t = 2016
$h(2016) = 19.5\sin\left[\frac{\pi}{48}(2016 - 1962)\right] + 215.5$
≈ 208.0377
≈ 208
Therefore, the approximate gold medal height achieved in the year 2016 would be 208 cm.
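The four estimates can be reproduced with a short function. This sketch encodes the refined model h = 19.5 sin[π/48 (t − 1962)] + 215.5 exactly as used above; the function name `h` is illustrative.

```python
import math

def h(t: float) -> float:
    """Refined model: h = 19.5 sin[pi/48 (t - 1962)] + 215.5 (height in cm)."""
    return 19.5 * math.sin(math.pi / 48 * (t - 1962)) + 215.5

estimates = {year: h(year) for year in (1940, 1944, 1984, 2016)}
for year, height in estimates.items():
    print(f"{year}: {height:.3f} cm -> {round(height)} cm")
```

Rounding to the nearest centimetre gives 196, 197, 235 and 208 cm, matching the values obtained above.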
The interpolated values fit the trend of the overall data points, so these calculated estimates make sense. The 1940 estimate (196 cm) is the lowest value in the data set, and 1944 is slightly higher at 197 cm, correctly following the trend. The extrapolated points also follow the trend of the data, as 1984 has a height of 235 cm.
Figure 10 shows the interpolated and extrapolated data with the model function
This makes sense because the data show a gradual decrease in slope, so 1984 lies near the maximum point. The year 2016 is a very distant extrapolation from the given data points. However, a height of 208 cm in 2016 is reasonable within the model, since the data indicate a change from upward to downward concavity. This means that some time after 1980 the model's heights fall below 236 cm, which is what happens: the 2016 estimate of 208 cm is less than the 1980 height of 236 cm. In conclusion, the interpolated and extrapolated estimates make sense according to both the calculations and the trend of the data.
Additional data from the Olympic Games of 1896 to 2008 are plotted below.
Table 2
Table 2 shows the gold medal heights achieved before and after the years covered by the original data in Table 1.
Figure 11
Figure 11 shows the algebraically derived function with the additional set of data from Table 2.
Evidently, the model does not accurately fit the additional data, as Figure 11 shows. The model was created solely from the first set of data in Table 1, so it does not match the additional data in Table 2, which played no part in its construction. The parameters of the sinusoidal function were calculated to represent the first set of data only, so the model represents the primary data and will not effectively fit any additional data.
The overall trend of the additional data from 1896 to 2008 also seems as if it could be represented by a sine function. From 1896 to 1948 there is a positive correlation with a very small slope. However, the coordinate (1904, 180) is a significant fluctuation, as it does not follow this positive trend. The coordinate (1936, 203) is also a significant fluctuation, as its height is much greater than the trend. From 1952 to 1968 the correlation is not only positive, but the slope is much greater than in the previous years. The two succeeding Games appear to be fluctuations, as they lie further from the trend. Lastly, the years 1980 to 2008 show a decreasing slope that comes closer to zero, which indicates that the trend is approaching a maximum point, with the change from concave up to concave down occurring around the coordinate (1956, 212). The overall trend in Figure 11 shows a general increase in height over the years. However, based on the analysis of each section of the data, a sinusoidal curve would still model the trend best.
Modifications can be made to the algebraic model based on the first set of data so that it fits the additional data as well. The amplitude needs to be increased to account for the wider range of heights achieved by gold medalists. The period also needs to change significantly, because the maximum and minimum points are much further apart once the additional data are included, which requires a greater horizontal stretch. The horizontal shift can remain roughly the same, since a similar number of Olympic Games have been added before and after the original set of data. The vertical shift should be reduced slightly, because the heights before 1932 fall further below the original data than the heights after 1980 rise above it, so the average height drops. With these modifications, the new function would represent the additional data much more accurately.
This investigation has contributed to a deeper understanding of the detailed curvature of various functions and of how to use them to model data. Each function has several parameters that affect it in different ways, either horizontally or vertically. I learned how to create a function algebraically to model a specific set of data. After deriving an algebraic function, I compared it with a regression model from computer software, which helped me understand the similarities and differences between methods of finding a best-fit model. I learned how to interpolate and extrapolate using a derived function and how to evaluate the estimates by connecting them with the trend of the data. Lastly, I learned that a model function may not fit additional data: because the derived function was based solely on the first set of data, new parameters would need to be set for the additional data. Overall, this investigation revived many previous concepts and taught me new ideas.