Mubeen Uppal. Statistics coursework. Mr Fisher.

Statistics coursework

Hypothesis: ‘House Prices for 3 bedroom detached houses in the North of England are cheaper than those in the south of England. Therefore the south is a more expensive region’.

Hypothesis and strategy

For our GCSE statistics coursework, we were given the question ‘Where are houses most expensive?’ To answer this question I have posed the hypothesis ‘House Prices for 3 bedroom detached houses in the North of England are cheaper than those in the south of England. Therefore the south is a more expensive region’. I chose this hypothesis because the south has stereotypically been known for having more expensive houses, and its occupants are therefore thought to enjoy a higher standard of living than those in the north. I need to gather evidence to find out whether or not it supports my hypothesis. I have chosen 3 bedroom detached houses because this seems to be a typical type of home for much of the English population.

I will gather evidence to help my investigation by doing the following:

Firstly, I will collect my data: 30 house prices from the north and 30 from the south, taken from different counties in each region.
From the data I have collected, I will produce histograms to determine the shape of each distribution. This is important because the shape shows which measures of average and spread should be used: if a histogram shows a normal distribution, I will use the mean and the standard deviation; if it shows a skewed distribution, I will use the median and the interquartile range (IQR).
Then I will draw box and whisker plots, because they show more clearly whether there is a positive or negative skew, as well as the median, IQR, range and any outliers in the data set. Box plots are also a very good way of comparing two sets of data.
For my calculations I will find any outliers, the standard deviation and, finally, Pearson's measure of skewness.
Then I will go on to conclude the investigation.

Data Collection and sampling.

I will be collecting secondary data (data that has already been collected by someone else), which I will get from the internet. I chose secondary data because there is not enough time to collect primary data, and also because secondary data from is reliable as well as up to date. I used several websites, one being , which gave the postcodes for specific counties, so I could type the postcode into which also has a specific search feature. I set this search to ‘three bedroom detached house’, which narrows the results down to three bedroom detached houses within that area. I also used Wikipedia to find which counties are in the North and South of England. The advantage of Rightmove was that its specific search meant only the relevant prices came up.

Systematic sampling

I used a systematic sample to determine which counties I would use to find the postcodes. A systematic sample works like this:

For example, if the population size was 200 and you needed a sample of 50, you would divide 200 by 50 to get 4. You would then choose a random starting point between 1 and 4 and take every fourth item from there. In my case I numbered the counties in the north and south of England. There were approximately 23 counties in the south, so I did the following calculation: 30 (the sample size I needed) / 23 (the approximate number of counties) ≈ 1.3. I worked with a figure of about 1.2, so I chose 1 postcode from each of my numbered counties and took an extra postcode from every 5th county. For the north I did the same calculation using approximately 20 counties: 30 (sample size needed) / 20 (approximate number of counties) = 1.5. This meant I took 1 postcode from every county and an extra postcode from every second county.

To choose the house prices themselves, I used a random sample, generating the random numbers with the random number generator on a calculator.

I used systematic sampling to choose the counties because it is an easy, quick and fair way of sampling, and I chose it instead of stratified sampling because it is simpler and less time consuming. I used random sampling to choose the house prices because it is the fairest method: each house price within the search had an equal chance of being included in my data set.
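The systematic sampling described above can be sketched in Python. This is only an illustration, not the procedure I actually carried out: the function names `systematic_sample` and `postcodes_per_county` are hypothetical, and the list of 200 numbered items simply stands in for the example population of 200.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Take a systematic sample: pick a random start, then every k-th item."""
    k = len(population) // sample_size   # sampling interval, e.g. 200 // 50 = 4
    start = rng.randrange(k)             # random start position: 0, 1, ..., k-1
    return population[start::k][:sample_size]

def postcodes_per_county(n_counties, extra_every):
    """Hypothetical helper: one postcode from every county, plus one extra
    from every `extra_every`-th county (counties numbered 1..n_counties)."""
    return [1 + (1 if county % extra_every == 0 else 0)
            for county in range(1, n_counties + 1)]

# Worked example from the text: a population of 200 and a sample of 50.
items = list(range(1, 201))
sample = systematic_sample(items, 50)
print(len(sample), sample[:5])   # 50 items, each 4 apart

# North: about 20 counties, with an extra postcode from every 2nd county.
north = postcodes_per_county(20, 2)
print(sum(north))                # 30
```

With 20 counties and an extra postcode from every second county, the total is 20 + 10 = 30, which matches the sample size needed for the north.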

Histograms

I used histograms because they are a good way to see the spread and the skewness of the data, and they give a clear visual picture of how the data is distributed.

House prices for the North

House prices for the South

As you can see from my histograms above, my North histogram shows quite a strong positive skew, while my South histogram shows a weak positive skew. The range of my South data is larger than the range of my North data. South: £799,950 - £285,000 = £514,950. North: £550,000 - £87,000 = £463,000. So the South's prices are more spread out, and both its lowest and its highest prices are above the North's, which supports my hypothesis. Also, the South's modal class is £300,000 to £450,000, compared with £200,000 to £300,000 for the North. This suggests that house prices in the South are typically higher, which also supports my hypothesis that the south is a more expensive place to live.
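The range calculations above are simple subtractions, and a short Python sketch can be used to check them. The variable names are my own; the prices are the lowest and highest values quoted above.

```python
# Lowest and highest prices (in £) quoted in the text above.
south_min, south_max = 285_000, 799_950
north_min, north_max = 87_000, 550_000

# Range = highest value - lowest value.
south_range = south_max - south_min   # 514,950
north_range = north_max - north_min   # 463,000

print(f"South range: £{south_range:,}")
print(f"North range: £{north_range:,}")

# Both extremes of the South data are above the corresponding North extremes.
print(south_min > north_min and south_max > north_max)   # True
```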

Pearson's measure of skewness

Calculating Pearson's measure of skewness

Skewness = 3(mean - median) / standard deviation

If the result is negative, there is a negative skew.

If the result is positive, there is a positive skew.

South

3(482,958 - 487,500) / 135,231.5664 = -0.10. This shows a negative skew for South of England house prices, although a value this close to 0 means the skew is very slight and the data is close to symmetrical. This may not have shown up on my histogram because the class widths were unequal, which could explain why my histogram did not appear to show a negative skew at first.

North

3(236,959.83 - 219,498) / 104,055.7943 = 0.50. This shows a positive skew for house prices in the North of England, as my histogram suggested.
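Both Pearson's skewness values above can be reproduced with a short Python sketch. The function name `pearson_skewness` is hypothetical; the means, medians and standard deviations are the figures quoted above, and the formula is the same 3(mean - median) / standard deviation used in this section.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's measure of skewness: 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

south_skew = pearson_skewness(482_958, 487_500, 135_231.5664)
north_skew = pearson_skewness(236_959.83, 219_498, 104_055.7943)

print(round(south_skew, 2), round(north_skew, 2))   # -0.1 0.5
```

A negative result (South) indicates a slight negative skew and a positive result (North) a positive skew, matching the interpretation above.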

Box plots

North

LQ = £176,237.50

UQ = £288,750

Median = £219,498

IQR = £112,512.50

South

LQ = £380,000

UQ = £570,000

Median = £487,500

IQR = £190,000
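As a check, each IQR can be recomputed as UQ - LQ from the quartiles listed above. This minimal Python sketch uses only those quartile values; the dictionary layout is my own choice.

```python
# Quartiles (in £) read from the box plots above.
quartiles = {
    "North": {"LQ": 176_237.50, "UQ": 288_750},
    "South": {"LQ": 380_000, "UQ": 570_000},
}

for region, q in quartiles.items():
    iqr = q["UQ"] - q["LQ"]   # interquartile range = upper quartile - lower quartile
    print(f"{region}: IQR = £{iqr:,.2f}")
```

This prints an IQR of £112,512.50 for the North and £190,000.00 for the South.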

I used box plots because my Pearson's measure of skewness calculations showed that the distributions are skewed, and a box plot displays skew clearly. It also shows the median, UQ, LQ and range very clearly. From the box plots above you can see that house prices in the South are higher: the South's median is clearly larger, and the South's LQ (£380,000) is even higher than the North's UQ (£288,750). This supports my hypothesis that the south is a more expensive place to live. However, the box plots also show that house prices in the North are closer together than those in the South, as the North's IQR is smaller than the South's.

Outliers in my data set

South

Now I am going to find out whether there are any outliers in my data set. To do this I will calculate:

Lower limit = LQ - (1.5 × IQR) and upper limit = UQ + (1.5 × IQR). Any piece of data below the lower limit or above the upper limit is an outlier. I am using these limits because they are the outlier calculations used for a skewed distribution, and my box and whisker plots indicated that my data is skewed.

Outliers in the south dataset:

Lower limit: 380,000 - (1.5 × 190,000) = 95,000. This means anything below £95,000 would be classed as an outlier, but none of my data is below this limit.

Upper limit: 570,000 + (1.5 × 190,000) = 855,000. This means anything above £855,000 would be classed as an outlier. Again, none of my data is above this limit.

To conclude this section, there are no outliers in my South house prices data set.

North

Again I am going to find out whether there are any outliers, but this time in my North house prices data set.

Lower limit: 176,237.50 - (1.5 × 112,512.50) = 7,468.75. This means anything below £7,468.75 would be classed as an outlier, but none of my data is below this value.

Upper limit: 288,750 + (1.5 × 112,512.50) = 457,518.75. This means anything above £457,518.75 would be classed as an outlier. One piece of data is above this limit: a 3 bedroom detached house in Cheshire costing £550,000.

To conclude this section, there is one outlier in my North house prices data set.
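The outlier limits above can be sketched in Python. The function names `outlier_limits` and `find_outliers` are hypothetical. Because the full data sets are not reproduced in this write-up, the sketch only checks the lowest and highest prices quoted earlier; if neither of those lies outside the limits, no other value can.

```python
def outlier_limits(lq, uq):
    """Return (lower, upper) limits: LQ - 1.5 * IQR and UQ + 1.5 * IQR."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr

def find_outliers(values, lq, uq):
    """Return the values that lie below the lower limit or above the upper limit."""
    lower, upper = outlier_limits(lq, uq)
    return [v for v in values if v < lower or v > upper]

south_limits = outlier_limits(380_000, 570_000)      # (95000.0, 855000.0)
north_limits = outlier_limits(176_237.50, 288_750)   # (7468.75, 457518.75)

# Lowest and highest prices quoted in the text for each region.
print(find_outliers([285_000, 799_950], 380_000, 570_000))     # []
print(find_outliers([87_000, 550_000], 176_237.50, 288_750))   # [550000]
```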

Median and IQR

I am using the median and the IQR from the box plots because these are the measures you should use for a skewed distribution. The IQRs show that the variation in house prices in the North is lower than in the South. The median is the preferred measure of average for a skewed distribution, and the South's median is higher than the North's, which shows that house prices in the north are lower than in the south. This supports my hypothesis.
