GCSE Statistics Project

DATABASE

I am going to use the Mayfield High School database, which is secondary data, for my investigation.

HYPOTHESES

- Blonde girls are more intelligent than non-blonde girls.

- Blonde girls who have a higher IQ watch comparatively less television. This will not be the case for non-blondes, however, as there will be little or no correlation.

- The IQs for non-blonde girls and blonde girls are normally distributed.

PLAN FOR DATA COLLECTION

I shall collect the data relevant to my hypotheses, first separating by Gender and then gathering the information on IQ, Hair Colour and Number of Hours of Television viewed.

I shall use a sample of 50 blonde girls and 50 non-blonde girls, as I feel that a total sample size of 100 is a reasonable proportion to analyse given that there are roughly 1,182 entries in the database. Also, 100 is a round number, which will make calculations with the data easier.

I will use stratified sampling so that there is not a disproportionate number of any particular group and so that the different year groups are properly represented.
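As a rough sketch of how the stratified sample could be shared out, the Python below splits a sample of 50 across year groups in proportion to their size. The year-group counts are made up for illustration and are not taken from the Mayfield database, and the function name stratified_allocation is my own. Rounding is handled with the largest-remainder method, which is one reasonable choice and an assumption on my part.

```python
def stratified_allocation(stratum_sizes, sample_size):
    """Share sample_size across strata in proportion to their sizes.

    The largest-remainder method is used so that the allocations
    always add up to exactly sample_size.
    """
    total = sum(stratum_sizes.values())
    # Exact (possibly fractional) share for each stratum.
    exact = {name: sample_size * n / total for name, n in stratum_sizes.items()}
    # Start by rounding every share down.
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = sample_size - sum(allocation.values())
    # Give the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical numbers of non-blonde girls in each year group (NOT real data).
non_blonde_girls = {
    "Year 7": 120,
    "Year 8": 110,
    "Year 9": 100,
    "Year 10": 90,
    "Year 11": 80,
}

allocation = stratified_allocation(non_blonde_girls, 50)
for year, count in allocation.items():
    print(year, count)
```

Within each year group the pupils would then be chosen at random, for example by numbering them and using random numbers.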

I shall then begin on my first hypothesis:

Blonde girls are more intelligent than non-blonde girls.

To investigate this hypothesis I need to present and analyse the two sets of data using various graphs and calculations. As I am dealing with two independent sets of values, I will need techniques that compare the groups as a whole to show which hair colour group is the more intelligent. A problem could arise if results from different parts of the investigation appear to contradict each other, in which case I will have to decide which is the most reliable to follow. For example, I could measure the spread using only the middle 50% of the data (the inter-quartile range), or I could use the full range, which includes any outliers and could therefore lead to an inaccurate conclusion. Also, different types of graph focus on different aspects of the data, so forming a general, all-encompassing conclusion that proves or disproves my hypotheses could be difficult.

For this I will use:

- Box and Whisker Diagram: With this I will be able to find the median, the quartiles and the inter-quartile range (IQR), and identify any possible outliers.

- Standard Deviation: To calculate the spread of the data about the mean.

- Grouped Frequency Tables: So that I can draw Cumulative Frequency Diagrams and estimate the mean.

- Cumulative Frequency: To estimate the median and quartiles and so measure the spread of the data; I can compare this against the Box and Whisker Diagrams.
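To illustrate the calculations behind the techniques listed above, here is a minimal Python sketch that finds the median, quartiles, IQR, outliers and standard deviation for one group. The IQ values are invented for the example and are not from the database. Flagging outliers as values more than 1.5 × IQR beyond the quartiles is a common convention that I am assuming here, and the quartile method used by Python's statistics module may give slightly different values from a hand method.

```python
import statistics


def summarise(values):
    """Return the box-plot statistics and standard deviation for a data set."""
    data = sorted(values)
    # The three cut points that split the data into four equal parts.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: more than 1.5 x IQR below Q1 or above Q3.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "outliers": outliers,
        "mean": statistics.mean(data),
        # Population standard deviation (divides by n).
        "std_dev": statistics.pstdev(data),
    }


# Hypothetical IQ scores for a small group of girls (NOT real data).
sample_iqs = [88, 94, 97, 100, 101, 103, 105, 108, 112, 140]

summary = summarise(sample_iqs)
for name, value in summary.items():
    print(name, value)
```

Running the same function on the blonde and non-blonde samples would let the medians be compared alongside the means, and the IQRs alongside the standard deviations, which helps show whether outliers are affecting the conclusion.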

I will then continue my investigation with my second hypothesis:

Blonde girls who have a higher IQ watch comparatively less television. This will not be the case for non-blondes, however, as there will be little or no correlation.

I am now dealing with two variables (hours of television and IQ) recorded for each girl, which may be related (bivariate data), and therefore different methods will have to be employed. I will again have to use relevant data and techniques to present my results and prove or disprove my hypothesis. Again, I will have to make all of the previously mentioned considerations as to which methods of analysis are the most effective.
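The plan does not yet name a method for the bivariate data, so the following is only a sketch of one possibility: a product-moment correlation coefficient calculated in Python. The IQ and television figures are invented for illustration and the function name pearson_r is my own. A value near -1 would suggest that higher IQ goes with less television, while a value near 0 would suggest little or no correlation.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values for five girls (NOT real data).
iq = [90, 95, 100, 105, 110]
tv_hours = [30, 24, 20, 14, 10]

r = pearson_r(iq, tv_hours)
print(round(r, 3))
```

The coefficient would be calculated separately for the blonde and non-blonde samples and then compared, ideally alongside a scatter diagram for each group.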