Further investigation
Although the scatter graph (fig. 1) shows that there is a correlation, its strength cannot be measured from the graph alone. Therefore I will use the Pearson Product Moment Correlation coefficient.
Pearson Product Moment Correlation
First of all, I put the data that I used to make the graph into two columns on a new spreadsheet. I labelled them “a” (original price) and “e” (insurance group). I then calculated “a²”, “e²” and “ae” for each car.
Here is my data:
Calculations:
n = 32
Σa = 387300
Σe = 237
Σa² = 5809458546
Σe² = 2211
Σae = 3505658
ā = Σa ÷ n
= 387300 ÷ 32
= 12103.125
ē = Σe ÷ n
= 237 ÷ 32
= 7.406 [3dp]
SD of a = √(Σa²/n − ā²)
= √(5809458546/32 − 146485634.8)
= √(181545579.563 [3dp] − 146485634.8)
= √35059944.763 [3dp]
= 5921.1439 [4dp]
SD of e = √(Σe²/n − ē²)
= √(2211/32 − 54.849 [3dp])
= √(69.094 [3dp] − 54.849 [3dp])
= √14.245 [3dp]
= 3.774 [3dp]
Covariance = Σae/n − āē
= 3505658/32 − 12103.125 × 7.406
= 109551.813 [3dp] - 89635.744 [3dp]
= 19916.069 [3dp]
r = Covariance ÷ (SDa × SDe)
= 19916.069 ÷ (5921.1439 × 3.774)
= 19916.069 ÷ 22346.397 [3dp]
= 0.89124 [5dp]
= 0.891 [3dp]
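As a check on the working above, here is a minimal Python sketch that repeats the same calculation from the six totals. The variable names are my own and are only for illustration. Because it keeps full precision instead of rounding ē to 7.406, some intermediate values differ slightly from the hand calculation (the covariance comes out near 19913.04 rather than 19916.069), but r still rounds to 0.891.

```python
from math import sqrt

# Totals copied from the calculations above.
n = 32
sum_a = 387300        # Σa  (original prices)
sum_e = 237           # Σe  (insurance groups)
sum_a2 = 5809458546   # Σa²
sum_e2 = 2211         # Σe²
sum_ae = 3505658      # Σae

# Means of each variable.
mean_a = sum_a / n
mean_e = sum_e / n

# Standard deviations: square root of (mean of the squares minus square of the mean).
sd_a = sqrt(sum_a2 / n - mean_a ** 2)
sd_e = sqrt(sum_e2 / n - mean_e ** 2)

# Covariance: mean of the products minus product of the means.
covariance = sum_ae / n - mean_a * mean_e

# Pearson's r: covariance divided by the product of the two standard deviations.
r = covariance / (sd_a * sd_e)
print(round(r, 3))
```

Note that the division is by the whole product SDa × SDe, which is why the brackets in the formula matter.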
Analysis
An r value of 0.891 is close to 1, which shows that there is a very high positive correlation.
Answer/conclusion
I have found that there is correlation between the original price of a car and its insurance group. The scatter graph showed a high positive correlation. I then investigated further, and by using Pearson Product Moment Correlation I found a correlation coefficient of 0.891, which is very high.
Other Variables
Colour: Blue vs. Red
Question
Is there any correlation between the colour of a car, and what insurance group it is in?
Hypothesis
I predict that there might be some difference between the average insurance groups of blue cars and red cars, because of the types of cars made red, and the types of cars made blue.
As I have already found that more expensive cars are in higher insurance groups, I predict that red cars will be in higher insurance groups, because more expensive cars tend to be red rather than blue. For example, Ferraris are more usually red, and red Ferraris are more expensive.
However, cheaper blue cars may be unreliable and/or dangerous, and therefore in higher insurance groups, although this would not fit with the conclusion of the previous investigation.
Sampling
I went back to my original data and extracted all the blue and red cars from it. I chose blue and red because there are more blue and red cars than cars of any other colour, so the larger samples should give a more reliable result.
Processing Data
I then calculated the mean of the insurance groups of blue cars, and of red cars.
Mean blue = Σblue ÷ n
= 191 ÷ 21
= 9.095 [3dp]
Mean red = Σred ÷ n
= 167 ÷ 23
= 7.261 [3dp]
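The two means can be checked, and the gap between them measured, with a short Python sketch like the one below. It uses only the totals and counts from the working above; the variable names are my own.

```python
# Totals of insurance groups and numbers of cars, taken from the working above.
blue_total, blue_count = 191, 21
red_total, red_count = 167, 23

mean_blue = blue_total / blue_count
mean_red = red_total / red_count

# How many insurance groups higher the blue mean is than the red mean.
difference = mean_blue - mean_red

print(f"blue {mean_blue:.3f}, red {mean_red:.3f}, difference {difference:.3f}")
# prints: blue 9.095, red 7.261, difference 1.834
```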
Graph
Analysis
This shows that blue cars are in higher insurance groups than red, in this sample.
Answer/conclusion
I have found that my hypothesis is partly true and partly false. It is true that colour is linked to insurance group in this sample, but it is not true that red cars are the ones in the higher groups. However, I did not analyse the whole population of data, only the small selection from “Gary’s Car Sales”, which may have particularly expensive blue cars.
Previous owners
Question
Is there any correlation between the number of previous owners of a car, and what insurance group it is in?
Hypothesis
I predict that there will be a fairly strong relationship between the number of previous owners and the insurance group a car is in, and that the higher the number of owners, the higher the insurance group. Insurance group was affected by original price, so it will probably be affected by current price too, and the number of owners affects the price. The number of previous owners may have something to do with the quality or reliability of a car. If a car that is not very old has had a large number of owners, it raises the question of why people would keep selling it. It may be because the car is unreliable, or not of good quality. It may just be that the owners didn’t like the car for other reasons, such as style, and made a mistake in purchasing it, but this investigation will find out.
Sampling
I randomly selected 30 items of data from the data that I have. To do this I used a macro to generate a random number from 1 to 7, selected the item of data that number corresponds to, and then selected every 3rd item below it. I moved this data to a separate sheet, where I will order it from the fewest owners to the most.
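The sampling method described above (a random starting row, then every 3rd row below it) could be sketched in Python as follows. This is not the macro I used, only an illustration of the same idea: the function name, its parameters and the stand-in list of 120 row numbers are all assumptions made for the example.

```python
import random

def systematic_sample(rows, max_start=7, step=3, size=30, rng=random):
    """Pick a random start row (1..max_start), then every `step`-th row below it."""
    start = rng.randint(1, max_start)   # 1-based row number, as in a spreadsheet
    picked = rows[start - 1::step]      # the start row and every step-th row after it
    return picked[:size]                # stop once the sample is large enough

# Hypothetical stand-in for the spreadsheet: just the row numbers 1 to 120.
rows = list(range(1, 121))
sample = systematic_sample(rows)
print(len(sample), sample[:5])
```

Only the starting point is random here; after that the rows are chosen at a fixed interval, so every selected row is exactly 3 rows below the previous one.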
Processing data
After ordering the data from the fewest owners to the most, I removed incomplete data [for example, one entry had the insurance group, but not the actual number of owners]. I then calculated the mean of the insurance groups of cars with 1, 2, and 3 previous owner(s).
Mean 1 = Σ1 ÷ n
= 129 ÷ 17
= 7.588 [3dp]
Mean 2 = Σ2 ÷ n
= 72 ÷ 11
= 6.545 [3dp]
Mean 3 = Σ3 ÷ n
= 57 ÷ 6
= 9.500 [3dp]
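The three means above can be checked together with a short Python sketch. The totals and counts are taken from the working above; the dictionary layout and names are my own choice for the example.

```python
# For each number of previous owners:
# (total of the insurance groups, number of cars), taken from the working above.
totals = {1: (129, 17), 2: (72, 11), 3: (57, 6)}

# Mean insurance group for each number of previous owners.
means = {owners: total / count for owners, (total, count) in totals.items()}

for owners, mean in means.items():
    print(f"{owners} previous owner(s): mean insurance group {mean:.3f}")
```

The group with 3 previous owners contains only 6 cars, so its mean is the one most easily pulled up or down by a single unusual car.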
Graphs
Analysis
Looking at the first graph, the data shows at face value that the number of previous owners does not affect the insurance group in a linear fashion. However, we only have 3 groups to work from. Also, there is a fairly big jump from 2 owners to 3 owners, so there may be a steady, linear increase overall, with a dip at 2 owners. This, however, is only speculation, and we cannot be sure. Looking at the second graph, though, it appears that the differences in the mean averages are caused by anomalies. If we exclude these, then we get this graph:
This data suggests that the number of previous owners does not affect insurance group, or if it does, only in a very small way. The trendline shows an increase, but the gradient of this line is very small, approximately 0.4. Here is a chart of the new mean averages.
Unfortunately, this does not differ much in shape from the first mean average chart, but the line graph has given us a reasonable answer.
Answer/conclusion
Again, the result is not conclusive, but the second line graph, with anomalies removed, suggests that the number of previous owners slightly affects the insurance group of a car. Again, we have only taken a sample, and not all of the data we had to begin with. However, the mean average data does not fit with this conclusion. With the data I have, I am not able to draw a clear conclusion, but it is feasible to speculate that the number of owners might have a slight effect on insurance group, because of the line graph data I have, and because the mean average chart shows a sharper increase at the 3rd owner mark.
Total conclusion
The original question that I asked was “is there any link between original price and insurance group, and if so, is it affected by other variables?”
For an answer I could say yes. It is clear that there is a link between original price and insurance group, as there is a very strong Pearson correlation (r = 0.891). Also, from the data that I had, insurance group is related to colour and possibly to the number of previous owners. The average insurance group was higher for blue cars than for red, so it appears that colour does in fact make a difference. This may just be chance, and if we took the entire population of blue and red cars, there might be a different result. However, this is just speculation, and in the data we have, insurance group does differ by colour. When I investigated the number of owners against insurance group, the answer was not as clear. However, because of the line graph data, and part of the mean average data, it is possible that there is correlation, but the data is not conclusive. So, to finally conclude, there is a strong connection and high correlation between original price and insurance group, there is some connection with colour, and possibly a small connection with the number of previous owners.