The aim of this project is to find out which factors affect the selling price of a house.

Extracts from this document...


Maths Coursework


The aim of this project is to find out which factors affect the selling price of a house. I have been given four districts, and in each there are four streets. In each of the 16 streets there are a number of houses, whose prices vary according to these factors. In the data presented I found several rogue results (results that do not fit in with the rest of the results). If these were kept, the results would be biased, so these rogue items must be removed. The rogue items were £475,000 in house price, 13,000 in square feet and 20 in number of bedrooms; they are too large to fit in with the other data. The ranges (highest minus lowest values) were £161,800 in house price, 2,900 in square feet and 6 in number of bedrooms. 27% of the homes had a large garden, 62.5% had a small one and 10% had none. 68% had a garage and 32% did not.

To get an idea of the nature of the data I have been given, I will divide Price into suitable groups and draw a histogram. Then I will make a cumulative frequency table and draw a cumulative frequency curve. I will then state the median and the interquartile range of the price, as these measures are not affected by outliers.
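The cumulative frequency table, median and interquartile range described above can be sketched in Python. This is only an illustration: the prices and class boundaries below are hypothetical, not the coursework data, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from values read off a hand-drawn cumulative frequency curve.

```python
import statistics

# Hypothetical house prices in pounds (NOT the coursework data).
prices = [42000, 45500, 48000, 51000, 56000,
          63000, 72000, 88000, 120000, 165000]

def median_and_iqr(values):
    """Return (median, interquartile range) of a list of numbers."""
    lower_q, median, upper_q = statistics.quantiles(values, n=4)
    return median, upper_q - lower_q

def cumulative_frequencies(values, upper_boundaries):
    """Count how many values lie at or below each upper class boundary."""
    return [(b, sum(1 for v in values if v <= b)) for b in upper_boundaries]

# Hypothetical upper class boundaries for the cumulative frequency table.
boundaries = [50000, 100000, 150000, 200000]
table = cumulative_frequencies(prices, boundaries)
median, iqr = median_and_iqr(prices)

for boundary, cum_freq in table:
    print(f"up to £{boundary:,}: {cum_freq}")
print(f"median = £{median:,.0f}, IQR = £{iqr:,.0f}")
```

The last cumulative frequency should always equal the number of houses in the sample, which is a quick check that no value has been missed.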

Some of the fields given affect the price of the houses.


The modal class is the class in which the highest frequency occurs.

From our histograms it is clear that there is a definite modal class in both: the £40,000 to £50,000 class has the highest frequency in each. The lowest frequencies in both are in the £160,000 to £200,000 bar. The interquartile range of the population is lower than that of the sample, which shows that the spread is greater in the sample. The greater spread arises because the results are more spaced out, i.e. the upper or lower quartile of the sample lies further out than that of the population. The layout of the sample and population histograms is very similar, which suggests that the sample is a good representation of the population.

I then drew box plots for the price. I found that in Arlington the median was £129,800, so on average house prices in that district are dearer. In Castlemains there is an even distribution of mid-priced houses. In Tobermory there are more expensive houses than cheaper ones. In Westlake most of the houses are cheap compared with the others. If the median is closer to the LQ, the distribution is positively skewed; in my box plots Tobermory and Castlemains are positively skewed. This means the values above the median are more spread out than those below the median. If the median is closer to the UQ, the distribution is negatively skewed. Westlake and Arlington are negatively skewed in this case, which means the values below the median are more spread out.
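The rule for reading skew from a box plot can be written as a short Python function. This is a minimal sketch: the function name and the example quartile values are made up for illustration and are not taken from the coursework box plots.

```python
def box_plot_skew(lower_quartile, median, upper_quartile):
    """Classify skew by comparing the median's distance to each quartile."""
    gap_below = median - lower_quartile   # spread of the lower middle quarter
    gap_above = upper_quartile - median   # spread of the upper middle quarter
    if gap_below < gap_above:
        return "positive"   # median closer to the LQ
    if gap_below > gap_above:
        return "negative"   # median closer to the UQ
    return "symmetric"

# Hypothetical quartiles in thousands of pounds.
print(box_plot_skew(40, 45, 60))
print(box_plot_skew(40, 55, 60))
```

With these made-up values the first call reports positive skew, because the median sits nearer the lower quartile, and the second reports negative skew.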

The steeper the gradient, the stronger the relationship is. In Arlington's case, the price of a house is given by:

Price = 18650 × number of bedrooms + 51900

The answer to the equation would therefore be more reliable with a greater gradient.

I calculated the IQR of each district and found that Arlington had an IQR of £21,750, which shows a greater spread of prices than Westlake, with an IQR of £10,675. The smaller the IQR, the narrower the spread of prices in the district. Castlemains had an IQR of £15,300 and Tobermory an IQR of £18,475; again from this we can establish a pattern.

Looking back at my scatter graphs, I can use the R squared value to determine the strength of the correlation. The graph showing the strongest positive correlation is for Arlington, with an R squared value of 0.9272. The graph showing the weakest positive correlation is for the Castlemains district.
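For reference, the R squared value of a scatter graph can be computed alongside the line of best fit. The sketch below assumes an ordinary least-squares line, which is what spreadsheet trendlines normally use; the coursework does not say how its lines were fitted. The bedroom and price figures are hypothetical.

```python
def fit_line(xs, ys):
    """Return (gradient, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # square of the correlation coefficient
    return gradient, intercept, r_squared

# Hypothetical data: number of bedrooms and price in pounds.
bedrooms = [2, 3, 3, 4, 5]
price = [88000, 107000, 110000, 125000, 146000]

gradient, intercept, r_squared = fit_line(bedrooms, price)
print(f"Price = {gradient:.0f} x bedrooms + {intercept:.0f}, R squared = {r_squared:.4f}")
```

An R squared value close to 1 means the points lie close to the line, and a value close to 0 means they are widely scattered about it.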

Using the equation of the line I was able to estimate the missing data. The following two examples demonstrate how I did this.

Arlington (house number 154)

Y = 18650x + 51900

122400 = 18650x + 51900

x = (122400 − 51900) ÷ 18650

x = 3.78   x ≈ 4

Castlemains (house number 76)

Y=17379x + 11107

Y= 17379 (3) + 11107

Y= £63244
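The two worked examples can be checked with a few lines of Python. The gradients and intercepts are the ones quoted for Arlington and Castlemains; the function and variable names are illustrative only.

```python
def price_from_bedrooms(gradient, intercept, bedrooms):
    """Estimate a missing price from the line Y = gradient * x + intercept."""
    return gradient * bedrooms + intercept

def bedrooms_from_price(gradient, intercept, price):
    """Rearrange the line to estimate a missing number of bedrooms."""
    return (price - intercept) / gradient

# (gradient, intercept) pairs quoted in the text.
arlington = (18650, 51900)
castlemains = (17379, 11107)

# Arlington, house number 154: price known, bedrooms missing.
x = bedrooms_from_price(*arlington, 122400)
print(round(x, 2), "which rounds to", round(x), "bedrooms")

# Castlemains, house number 76: 3 bedrooms, price missing.
print(price_from_bedrooms(*castlemains, 3))   # 63244
```

The number of bedrooms has to be rounded to a whole number because it is discrete data, which is why 3.78 becomes 4.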

The missing data is presented in blue. They are:

House Number | No. of Bedrooms | House Price

My sampling technique was appropriate as it was easy to work with, but the sample was small; if I were to repeat my sampling I would take 60 values instead of 40. The population was represented well by the sample.

My overall strategy was effective; my only criticisms are that I could have drawn both histograms on the same graph, and both cumulative frequency graphs on the same sheet. It would then have been easier to compare them with one another using the % frequency density and cumulative frequency values. I did address the problem I had hoped to. One limitation was that I could not draw box plots for the number of bedrooms as it is discrete data; instead of a box plot I could have drawn a pie chart or bar chart to represent the number of bedrooms. Also, we do not know about the condition of each house or its interior. Modifications could have been made to the houses, such as extensions, central heating, double glazing or a loft conversion.

If I had more time I would investigate the square footage of the house using the same sample and technique I employed earlier. After that I would investigate whether a garage also affects the price of a house.

Any house that breaks the trend may do so because of its condition or modifications made to its interior or exterior. More data would have been helpful to gain a clearer overall picture.


This student written piece of work is one of many that can be found in our AS and A Level Probability & Statistics section.
