- Statistical Analysis: Findings
The findings of the report are divided into three parts:
- The overall distribution of house prices in the survey. This takes into account all house prices within the 5 townships, without distinguishing by any other factor such as size or the number of bedrooms and bathrooms.
- An examination of the proportion of houses with a pool. This proportion was then investigated in relation to the presence of a garage and across the 5 townships.
- An investigation of possible factors affecting the price: the presence of a pool, the size of the house, its desirability and the distance to the nearest large town.
4.1 – Overall Distribution of House Prices
Graph 1 suggests that the overall distribution is symmetrical. This is confirmed by comparing the mean with the median: as the two figures are approximately equal, the distribution is roughly symmetrical. The mean is the sum of all values divided by the number of observations in the data set (100).
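The mean-versus-median check described above can be sketched in Python as follows. This is a minimal illustration, not the Minitab procedure: the function name, the 1% tolerance and the price lists are hypothetical and are not taken from the survey data.

```python
from statistics import mean, median

def skew_direction(prices, tolerance=0.01):
    """Classify a distribution by comparing its mean with its median.

    `tolerance` is a hypothetical threshold: if the mean and median differ
    by no more than this fraction of the median, call the shape symmetrical.
    """
    m = mean(prices)      # sum of all values divided by the number of values
    md = median(prices)   # middle value once the prices are sorted
    gap = (m - md) / md
    if abs(gap) <= tolerance:
        return "roughly symmetrical"
    # A mean above the median points to a long right tail, below to a left tail.
    return "skewed to the right" if gap > 0 else "skewed to the left"

# Hypothetical prices in $000s, for illustration only
print(skew_direction([180.0, 190.0, 200.0, 210.0, 220.0]))  # roughly symmetrical
print(skew_direction([130.0, 200.0, 205.0, 210.0]))         # skewed to the left
```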
House prices vary from a minimum of $127,70 to a maximum of $284,00, giving a range of $156,30. 25% of the houses have a price between the minimum ($127,70) and the first quartile ($179,93), and 25% have a price between the third quartile ($221,15) and the maximum ($284,00).
The graph clearly shows a higher concentration of houses with a price between Q1 ($179,93) and Q3 ($221,15). These represent 50% of the overall distribution.
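The range and the spread of the middle 50% follow directly from the summary figures above. The short Python sketch below recomputes them from the reported values (in thousands of dollars, written with decimal points instead of commas). The inter-quartile range is not quoted in the text and is computed here only as an illustration.

```python
# Summary figures for all 100 houses, in thousands of dollars
minimum, q1, q3, maximum = 127.70, 179.93, 221.15, 284.00

price_range = maximum - minimum   # spread of the whole distribution
iqr = q3 - q1                     # spread of the middle 50% of houses

print(f"Range: {price_range:.2f}")  # Range: 156.30
print(f"IQR: {iqr:.2f}")            # IQR: 41.22
```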
The standard deviation indicates how spread out the data are around the mean.
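As a sketch of what the standard deviation measures, the function below computes it by hand and checks it against Python's `statistics.stdev`. It assumes the sample formula (dividing by n - 1), which is what `statistics.stdev` uses; the function name and the price list are hypothetical, not the survey data.

```python
import statistics
from math import sqrt

def sample_std(values):
    """Spread of values around their mean, using the sample (n - 1) formula."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))

# Hypothetical prices in $000s, for illustration only
prices = [150.0, 180.0, 200.0, 220.0, 250.0]
print(round(sample_std(prices), 2))  # 38.08
print(abs(sample_std(prices) - statistics.stdev(prices)) < 1e-9)  # True
```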
4.2 – Examination of houses with a pool
As shown in Graph 2, 55% of the houses (55 out of the 100 houses in the data set) have a pool. In the Minitab output the percentage equals the count because the data set contains exactly 100 houses. As a result, 45% of the houses analysed do not have a pool.
Graph 3 shows the proportion of houses with a pool and a garage. The table shows that the majority of houses with a pool also have a garage: 58,18% (32 out of 55 houses with a pool), while 41,82% (23 out of 55) do not have a garage.
For houses without a pool the pattern is reversed: 82,22% (37 out of 45 houses) have neither a pool nor a garage.
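The percentages above are conditional proportions: each count is divided by the total of its own pool group (55 or 45), not by all 100 houses. A minimal Python sketch, using the counts quoted in the text; the "no pool, garage" cell (8) is not stated directly and is derived here as 45 - 37, and the function name is hypothetical.

```python
# Counts from Graph 3; the ("no pool", "garage") cell is derived as 45 - 37
counts = {
    ("pool", "garage"): 32,
    ("pool", "no garage"): 23,
    ("no pool", "garage"): 8,
    ("no pool", "no garage"): 37,
}

def pct_within(pool_status, garage_status):
    """Percentage with the given garage status, within one pool group."""
    group_total = sum(n for (p, _), n in counts.items() if p == pool_status)
    return 100 * counts[(pool_status, garage_status)] / group_total

print(f"{pct_within('pool', 'garage'):.2f}%")        # 58.18%
print(f"{pct_within('pool', 'no garage'):.2f}%")     # 41.82%
print(f"{pct_within('no pool', 'no garage'):.2f}%")  # 82.22%
```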
It is evident from Graph 4 that the proportion of the houses with a pool is not the same in all the 5 townships.
In township 5 all the houses (100%) have a pool, followed by township 4 with 94,4%. At the other extreme, only 13,33% of the houses in township 1 (2 out of 15) have a pool, followed by township 2 with 22,22% (6 out of 27). As Table 4 shows, the proportion of houses with a pool increases with the township number: township 1 has the lowest percentage and township 5 the highest. This could be a coincidence.
However, when looking at all houses with a pool, township 4 accounts for the largest share: 32,73% (18 out of the 55 houses with a pool).
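The two statements above use different denominators, which is why township 5 can have the highest proportion of pools while township 4 holds the largest share of all pool houses. A minimal Python sketch using only the counts quoted in the text (counts for townships 3 and 5 are not given, so they are omitted); the helper name is hypothetical.

```python
def pct(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole

# Counts quoted in section 4.2: (houses with a pool, houses in the township)
townships = {1: (2, 15), 2: (6, 27)}

# Within-township proportion: denominator is the township's own houses
for number, (with_pool, total) in townships.items():
    print(f"Township {number}: {pct(with_pool, total):.2f}% have a pool")

# Share of all pool houses: denominator is the 55 houses with a pool
print(f"Township 4 share of pool houses: {pct(18, 55):.2f}%")
```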
4.3 – Investigation of Factors affecting the house price
The box plot in Graph 5 clearly shows that prices for houses with a pool are generally higher than for houses without a pool. Table 5 confirms this: all the summary values (mean, median, minimum, 1st quartile, 3rd quartile and maximum) are higher for houses with a pool, which suggests that houses with a pool are generally more expensive than houses without. Moreover, comparing the mean and the median for each group shows that the distribution for houses without a pool is skewed to the left (negatively skewed): the mean is below the median, which indicates a few extreme low values pulling the mean down. However, the asterisk (*) in the box plot marks an extreme high value (outlier) of $250,20. Applying the same comparison to houses with a pool, the distribution is roughly symmetrical because the mean and the median are very close.
Another important consideration comes from the quartiles, which are represented by the lower and upper edges of the boxes. The 1st quartile of houses with a pool ($195,90) is higher than the 3rd quartile of houses without a pool ($192,05). In other words, 75% of houses without a pool cost less than $192,05, which is below the price of 75% of houses with a pool.
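Both ideas in the last two paragraphs can be sketched in Python. The outlier rule below assumes the common 1.5 × IQR convention for box plots, which is the usual default for marking points with an asterisk; the function names and the hypothetical quartiles and prices are illustrative, and only the two quartiles in the final check come from the report.

```python
def outlier_fences(q1, q3, k=1.5):
    """Limits outside which a box plot marks a value as an outlier (*)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, q1, q3, k=1.5):
    """Return the values lying outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]

# Hypothetical quartiles and prices in $000s, for illustration only
print(flag_outliers([140.0, 185.0, 240.0], q1=180.0, q3=200.0))  # [140.0, 240.0]

# Quartiles quoted in section 4.3 ($000s)
q3_no_pool, q1_pool = 192.05, 195.90
# True: 75% of houses without a pool cost at most 192.05, which is below
# the price of 75% of houses with a pool (those at or above 195.90).
print(q3_no_pool < q1_pool)
```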
The standard deviation measures how spread out the data are. Houses with a pool have a higher standard deviation, which implies that their prices are more variable, with values lying further from each other and from the mean, while prices of houses without a pool are slightly more concentrated. Comparing the range and the inter-quartile range together with the standard deviation confirms that prices of houses with a pool have a higher dispersion than those of houses without a pool.
The scatter plot in Graph 6 indicates a relationship between house price and house size. The upward trend suggests a positive linear relationship, as both variables move in the same direction: when the size rises, the price rises as well. It is therefore worth investigating the relationship further.
However, the points are scattered quite broadly, so it is necessary to look at the correlation coefficient r to determine how strong the relationship is. The correlation coefficient (0,65) indicates a positive relationship (given by the + sign) that is not very strong, since its value is lower than 0,8.
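The correlation coefficient r can be computed from paired data as sketched below. This is the standard Pearson formula written out by hand; the function name and the size and price pairs are hypothetical (chosen so that r = 0,7, which would also count as "not very strong" under the report's 0,8 threshold), not the survey data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (size in sqrFt, price in $000s) pairs, for illustration only
sizes = [1800, 1900, 2000, 2100, 2200]
prices = [170.0, 200.0, 180.0, 190.0, 210.0]
print(round(pearson_r(sizes, prices), 2))  # 0.7
```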
The regression equation is Price = -11,1 + 0,0979 * sqrFt
However, the intercept is not statistically significant: its T value is only -0,44, and logically a house price cannot be negative (the intercept corresponds to a house of 0 sqrFt, far outside the sizes in the data). In spite of this, the model is still useful because the slope (gradient) is statistically significant, with T = 8,46. Since prices are expressed in thousands of dollars, the slope of 0,0979 corresponds to an increase of 0,0979 thousand dollars, i.e. about 98 dollars in actual terms, for each extra sqrFt.
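The fitted line can be turned into a prediction as sketched below, using the reported coefficients (prices in thousands of dollars). The function name is hypothetical, and predictions are only meaningful within the range of house sizes in the data set, not near 0 sqrFt.

```python
INTERCEPT = -11.1   # $000s; not statistically significant (T = -0.44)
SLOPE = 0.0979      # $000s per extra square foot (T = 8.46)

def predicted_price(sqr_ft):
    """Fitted price in thousands of dollars from the reported regression line."""
    return INTERCEPT + SLOPE * sqr_ft

print(round(predicted_price(2000), 1))  # 184.7, i.e. about $184,700
print(round(SLOPE * 1000, 1))           # 97.9 actual dollars per extra sqrFt
# 100 extra square feet add about 9.79 thousand dollars
print(round(predicted_price(2100) - predicted_price(2000), 2))  # 9.79
```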
The R-Sq value suggests that only 42,2% of the variation in house prices is explained by size. This implies that other factors also play an important part in explaining changes in price.
By eye, it is also possible to see that houses between 1900 and 2300 sqrFt are the most frequent.
However, this graph includes houses from all 5 townships, with or without a pool and with different numbers of bedrooms and bathrooms.
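The intercept, slope and R-Sq reported by Minitab come from an ordinary least-squares fit. A minimal sketch of that calculation is shown below on hypothetical data (not the survey data); the function name is also hypothetical. With a single predictor, R-Sq equals r squared, which is why r = 0,65 goes together with R-Sq of about 42%.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept, slope and R-squared for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared: share of the variation in y explained by the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical (size in sqrFt, price in $000s) pairs, for illustration only
sizes = [1800, 1900, 2000, 2100, 2200]
prices = [170.0, 200.0, 180.0, 190.0, 210.0]
intercept, slope, r_squared = fit_line(sizes, prices)
print(round(intercept, 1), round(slope, 2), round(r_squared, 2))  # 50.0 0.07 0.49
# 0.49 is 0.7 squared, the correlation coefficient of these same pairs
```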
The scatter plot in Graph 8 shows the relationship between house price and the distance to the nearest large town.
It clearly illustrates that there is no relationship between the two variables.
This is confirmed by the correlation coefficient of 0,042. Moreover, the R-Sq value shows that only 0,2% of the variation in house price is explained by distance.
It is not necessary to continue this investigation any further.
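As a quick consistency check, squaring each reported correlation coefficient gives the share of price variation it explains. Because r is rounded in the output, r squared reproduces the reported R-Sq values only approximately (42,25% against 42,2%, and 0,18% against 0,2%). The function name is hypothetical.

```python
# Correlation coefficients reported in section 4.3
r_size, r_distance = 0.65, 0.042

def variance_explained_pct(r):
    """Share of price variation explained by a single predictor, in percent."""
    return 100 * r ** 2

print(f"{variance_explained_pct(r_size):.2f}%")      # 42.25%
print(f"{variance_explained_pct(r_distance):.2f}%")  # 0.18%
```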
- Conclusion
The following is a summary based on the findings:
- The overall price distribution is roughly symmetrical, with a higher concentration (50%) of houses priced between $179,93 (Q1) and $221,15 (Q3). (Graph 1 – Table 1)
- The proportion of houses with a pool is slightly higher than that of houses without a pool: 55% against 45%. (Graph 2 – Table 2)
- The majority of houses with a pool also have a garage, but the largest single group (37 out of 100 houses) has neither a pool nor a garage. (Graph 3 – Table 2)
- The percentage of houses with a pool increases with the township number, from a minority of houses in township 1 to 100% in township 5. In 3 out of 5 townships the majority of houses have a pool. (Graph 4 – Table 4)
- Houses with a pool are more expensive than houses without. 75% of houses without a pool cost less than the 1st quartile of houses with a pool, i.e. less than 75% of houses with a pool. (Graph 5 – Table 5)
- There is a positive relationship between the price and the size of the house, although this relationship is not very strong. Each extra square foot adds 0,0979 thousand dollars (about 98 dollars) to the price. Houses between 1900 and 2300 sqrFt are the most frequent. (Graph 6 – Table 6)
- There is a link between the price and the desirability of a house. However, this relationship is not very strong. (Graph 7 – Table 7)
- The distance between the house and a large city does not affect the price. (Graph 8 – Table 8)
- Recommendations
Based on the above conclusions of the analysis, the following are suggestions for an investor interested in buying a house in one of the 5 townships:
- The most popular, and thus most demanded, price range for a house is between $179,93 and $221,15. For a luxury house, the highest demand would be between $245 and $275. Above this amount demand is very low, which implies that such houses are very exclusive. The choice depends on the main aim of the investor.
- There is a slightly higher demand for houses with a pool.
- If the investor decides to buy a house with a pool, it is advisable to have a garage as well. Otherwise, it is more convenient to buy a house with neither.
- If the house is in townships 3 to 5, a pool is highly recommended, especially in township 5.
- A pool makes a large difference to the value of the house: 75% of houses without a pool cost less than 75% of houses with a pool.
- The bigger the house, the higher its value. However, there is little demand for houses smaller than 1900 sqrFt, and only medium demand for larger houses.
- Desirability level 6 has the highest average and median price and shows good demand.
- The distance between the house and a large town is not relevant.
NOTES:
All figures referring to price are expressed in thousands of dollars ($000s).
To estimate demand, it has been assumed that a higher frequency indicates a higher demand. For example, in township 5 all the houses have a pool, which implies that buyers in that area want, and are willing to pay for, a house with a pool. Thus demand is very high.