However, you must also take into consideration that relationships will be different when genders are treated separately.
Collecting data from every person in the whole school would take too much time and energy. I have therefore decided to take a sample rather than use the whole population, as this is much quicker and time is a limiting factor. It is important to choose the sample without bias so that the results will represent the whole population. There are many types of sampling, and I now need to find out which type suits my investigation best.
Random Sampling
In a random sample, every member of the population has an equal chance of being selected.
- Advantages: Every member of the population has an equal chance of being selected.
- Disadvantages: Due to its unpredictability, anomalous results can sometimes be obtained that are not representative of the population. In addition, these irregular results may be difficult to spot. For our purposes, a random sample will not guarantee the same number from each year or equal numbers of both genders.
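As a rough sketch, a simple random sample could be drawn in Python as shown here. The pupil ID numbers 1 to 600 and the fixed seed are assumptions made purely for illustration; the sample of 54 out of 600 pupils matches the figures used in this investigation.

```python
import random

# Hypothetical population: pupil ID numbers 1 to 600 (the IDs are an assumption).
population = list(range(1, 601))

# A fixed seed is used only so that the example can be repeated exactly.
rng = random.Random(1)

# sample() picks without replacement, so every pupil has an equal
# chance of selection and nobody can be chosen twice.
simple_random_sample = rng.sample(population, 54)

print(len(simple_random_sample))
```

Nothing in this method controls how many pupils come from each year group or gender, which is the disadvantage noted above.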
Systematic Sampling
In a systematic sample, every member of the sample is chosen at regular intervals from the list.
- Advantages: Can eliminate some sources of bias
- Disadvantages: Can introduce bias where the pattern used for the sample coincides with a pattern in the population. For our purposes, it guarantees a representative sample of year groups but not of gender
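A systematic sample might be sketched as follows. This assumes a hypothetical list of 600 pupil ID numbers sorted in year order, and a random starting point within the first interval; neither detail comes from the school's actual register.

```python
import random

# Hypothetical list of pupil IDs, assumed to be in year order.
population = list(range(1, 601))
sample_size = 54

# Gap between chosen pupils: 600 // 54 = 11.
interval = len(population) // sample_size

# Random starting position within the first interval (seed fixed for repeatability).
rng = random.Random(2)
start = rng.randrange(interval)

# Take every 11th pupil from the starting position, then keep the first 54.
systematic_sample = population[start::interval][:sample_size]

print(interval, len(systematic_sample))
```

Because the list is in year order, stepping through it at a fixed interval spreads the sample across the year groups, but it does nothing to balance boys and girls.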
Stratified Sampling
A population may contain separate groups or strata. Each group needs to be fairly represented in the sample. The number from each group is proportional to the group size. The selection is then made at random from each group.
- This form of sampling will work well for our purposes
Quota Sampling
As with stratified samples, the population is broken down into different categories. However, the size of the sample of each category does not reflect the population as a whole. This can be used where an unrepresentative sample is desirable (e.g. you might want to interview more children than adults for a survey on computer games), or where it would be too difficult to undertake a stratified sample.
- Advantages: Simpler to undertake than a stratified sample. Sometimes a deliberately biased sample is desirable
- Disadvantages: Not a genuine random sample, and is likely to yield a biased result. For our purposes it is not very reliable because it depends on the interviewer to choose the sample
Cluster Sampling
Used when populations can be broken down into many different categories, or clusters (e.g. church parishes). Rather than taking a sample from each cluster, a random selection of clusters is chosen to represent the whole. Within each cluster, a random sample is taken.
- Advantages: Less expensive and time consuming than a fully random sample. Can show "regional" variations.
- Disadvantages: Not a genuine random sample. Likely to yield a biased result (especially if only a few clusters are sampled).
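The two stages of cluster sampling could be sketched like this. The clusters here are 20 hypothetical tutor groups of 30 pupils each, and the choice of 6 clusters with 9 pupils from each is an assumption chosen only so that the total comes to 54.

```python
import random

rng = random.Random(3)  # fixed seed so the example is repeatable

# Hypothetical clusters: 20 tutor groups of 30 pupils each.
clusters = {
    f"group_{g}": [f"group_{g}_pupil_{p}" for p in range(1, 31)]
    for g in range(1, 21)
}

# Stage 1: choose 6 of the clusters at random.
chosen_clusters = rng.sample(sorted(clusters), 6)

# Stage 2: choose 9 pupils at random within each chosen cluster.
cluster_sample = []
for name in chosen_clusters:
    cluster_sample.extend(rng.sample(clusters[name], 9))

print(chosen_clusters)
print(len(cluster_sample))
```

Pupils in the 14 tutor groups that were not chosen have no chance of being selected, which is why so few clusters can give a biased result.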
After looking at the advantages and disadvantages of each type of sampling, I have chosen to use stratified sampling, as this form of sampling will work well for our purposes. The reasons are stated above.
As I have now decided on my line of enquiry and type of sampling, I now need to decide how big my sample size will be. As different sizes of sample will affect the reliability of my results and conclusions, it is imperative that I make the correct choice when deciding the size of my sample.
The bigger a sample, the more useful the data will be. If you select a lot of people, your results will be closer to the actual results for the whole school. However, if you choose too many people the data becomes too difficult to analyze and takes too long to collate and sort. 5 – 10% is usually a fair representation of the population, so I have decided to use a 9% sample, which is 54 people. I think this will be a good representation of the population and is also a reasonable figure to manage.
When collecting my data, I need to check for outliers and anomalies. I will need to check my sampled data for untypical values which appear to lie outside the general range (e.g. weight: 1kg/600kg and height: 0.01m/10m). Once I present my results in a graph it will be easy to see where any outlier lies.
If these outliers were included in my calculations or graphs they would distort the data, disrupt the correlation of graphs, and therefore affect my conclusion about whether or not my hypothesis is correct. This is why it is crucial that I disregard any information that is blatantly incorrect.
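One simple way to screen for blatantly incorrect values is to compare each record with a plausible range. In the sketch below, the records and the limits for height and weight are assumptions chosen for illustration; they are not taken from the school data.

```python
# Hypothetical sampled records: (height in metres, weight in kg).
records = [
    (1.52, 45), (1.60, 51), (0.01, 48), (1.75, 600), (1.68, 57),
]

# Assumed plausible limits for secondary-school pupils (not from the data).
HEIGHT_RANGE = (1.0, 2.2)   # metres
WEIGHT_RANGE = (20, 150)    # kilograms


def is_plausible(height, weight):
    """Return True when both values fall inside the assumed limits."""
    return (HEIGHT_RANGE[0] <= height <= HEIGHT_RANGE[1]
            and WEIGHT_RANGE[0] <= weight <= WEIGHT_RANGE[1])


kept = [r for r in records if is_plausible(*r)]
rejected = [r for r in records if not is_plausible(*r)]

print("kept:", kept)
print("rejected:", rejected)
```

Any rejected record would then be replaced by selecting another pupil at random from the same group.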
Sampling Method (In Detail)
In order to produce my results, I need to know how my sampling method works.
- Count boys and girls per year group
- Work out sample size
- Find the fraction of pupils in each year
- Find how many people there are in each year out of 54 (9% sample)
- Use the same method to calculate the number of girls and boys in each year for the sample
- Use random sampling to choose correct number of boys and girls per year group and enter results in tables
- Identify any anomalous data/outliers and reselect those data items
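The steps above can be sketched in Python as follows. The pupil counts per year and gender are hypothetical (they only add up to 600), the pupil labels stand in for a real register, and giving the girls the remainder of each year's quota is a design choice of this sketch rather than part of the method itself.

```python
import random

TARGET_SAMPLE = 54  # 9% of 600 pupils

# Hypothetical pupil counts per year group and gender (illustration only).
counts = {
    7: {"boys": 68, "girls": 62},
    8: {"boys": 65, "girls": 60},
    9: {"boys": 60, "girls": 58},
    10: {"boys": 57, "girls": 60},
    11: {"boys": 56, "girls": 54},
}

school_total = sum(sum(year.values()) for year in counts.values())

rng = random.Random(4)  # fixed seed so the example is repeatable
allocation = {}
sample = []

for year, genders in counts.items():
    year_total = genders["boys"] + genders["girls"]
    # The year group's share of the sample, in proportion to its size.
    year_quota = round(TARGET_SAMPLE * year_total / school_total)
    # Same method within the year; girls take the remainder so that the
    # two gender quotas always add up to the year quota.
    boys_quota = round(year_quota * genders["boys"] / year_total)
    allocation[year] = {"boys": boys_quota, "girls": year_quota - boys_quota}

    for gender, quota in allocation[year].items():
        # Hypothetical pupil labels standing in for the real register.
        pupils = [f"Y{year}-{gender}-{i}" for i in range(1, genders[gender] + 1)]
        # Random selection without replacement inside this stratum.
        sample.extend(rng.sample(pupils, quota))

print(allocation)
print(len(sample))
```

The final step, checking the selected pupils for anomalous values and reselecting where needed, would be applied to `sample` afterwards.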
Mathematical Techniques
In order to thoroughly analyze and evaluate my data, there are many mathematical techniques, diagrams and graphs I will need to use. Here is a list of them:
Diagrams:
- Histograms – A histogram is constructed from a frequency table. The intervals are shown on the x-axis and the number of scores in each interval is represented by the height of a rectangle located above the interval.
- Box Plots – A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower quartile to the upper quartile and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box.
- Scatter Diagram – A type of diagram used to show the relationship between items that have two numeric properties. One property is represented along the x-axis and the other along the y-axis. Each item is then represented by a single point.
- Cumulative Frequency Graphs – A cumulative frequency graph can be used to estimate some useful statistical measures, such as the median and the quartiles.
- Line of Best Fit – A single line drawn through a series of data points as a best representation of the underlying trend. Can be a straight line or a curve.
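To illustrate how a cumulative frequency graph gives estimates of the median and quartiles, here is a minimal sketch. The grouped heights are hypothetical, and reading values off the graph is imitated by linear interpolation, which assumes the points are joined with straight lines.

```python
from itertools import accumulate

# Hypothetical grouped heights in cm: (lower bound, upper bound, frequency).
classes = [
    (140, 150, 6), (150, 160, 14), (160, 170, 20), (170, 180, 11), (180, 190, 4),
]

frequencies = [freq for _, _, freq in classes]
cumulative = list(accumulate(frequencies))  # running totals
total = cumulative[-1]


def estimate_at(position):
    """Estimate the height at a cumulative position by linear interpolation."""
    below = 0  # cumulative frequency before the current class
    for (lower, upper, freq), cum in zip(classes, cumulative):
        if position <= cum:
            return lower + (position - below) / freq * (upper - lower)
        below = cum
    raise ValueError("position exceeds the total frequency")


median = estimate_at(total / 2)
lower_quartile = estimate_at(total / 4)
upper_quartile = estimate_at(3 * total / 4)

print(cumulative)
print(lower_quartile, median, upper_quartile)
```

The three estimates are also the values needed to draw the box in a box plot.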
Calculations:
- Mean
- Mode
- Median
- Mean & Modal Class for Grouped Continuous Data – The mean is estimated from the midpoints of the class intervals, and the modal class is the interval with the highest frequency.
- Interquartile Range – The distance between the upper and lower quartiles. As a measure of variability, it is less sensitive than the standard deviation or range to the possible presence of outliers. It is also used to define the box in a box-and-whisker plot.
- Standard Deviation – It is the most commonly used measure of spread.
- Normal Distribution – Normal distributions are a family of distributions that have the same general shape. They are symmetric with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped.
- Spearman’s Rank Correlation Coefficient – Spearman’s Rank Correlation Coefficient is used to discover the strength of a link between two sets of data.
- Equation of Line of Best Fit – The equation of the line that shows the underlying trend.
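Several of these calculations can be sketched with Python's standard library. The eight height and weight pairs are hypothetical and contain no tied values, so the simple Spearman formula applies. The line of best fit here is a least-squares line, which is an assumption of this sketch; a line drawn by eye would give a slightly different equation.

```python
import statistics

# Hypothetical paired data for 8 pupils (no tied values).
heights_cm = [150, 155, 160, 162, 165, 170, 172, 180]
weights_kg = [45, 50, 48, 52, 54, 60, 57, 68]

mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
sd_height = statistics.pstdev(heights_cm)  # population standard deviation

# Quartile conventions vary, so hand-worked values may differ slightly.
q1, _, q3 = statistics.quantiles(heights_cm, n=4)
iqr_height = q3 - q1


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def best_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rho = spearman(heights_cm, weights_kg)
slope, intercept = best_fit(heights_cm, weights_kg)

print(mean_height, median_height, round(sd_height, 2), iqr_height)
print(round(rho, 3), round(slope, 3), round(intercept, 2))
```

A value of `rho` close to 1 would indicate a strong positive link between height and weight in this made-up data.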
Collecting the Data
In order to find my results, I will need to sort the data and put it into tables. As I am using stratified sampling, I have had to count up the number of boys and girls in each year and work out my sample size. Once I have done this, I will record my results in two separate tables (one for males, one for females), in year order. From there, I will create separate tables for each year and then one large mixed table. After I have finished sorting out the tables, I will draw various scatter diagrams: first one for males, one for females and one mixed, and then one for each year (for both mixed and separate genders).
Finding the Results
As I have previously stated, I have decided to use a sample size of 9%, which in total is 54 people. I now need to apply that information to the investigation and work out my sample for each year, gender etc.
Data:
Sample size : 9% of 600 = 54
Now I have to calculate how many pupils to examine within each year, because the year groups vary in size. I will calculate the proportion of pupils from each of the year groups.
Stratified Sample:
Due to rounding, my sample size has been adjusted from 54 to 55. Given as a percentage, this would be:
55/600 x 100 = 9.166666667
= 9.2%
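The effect of rounding can be shown with a short sketch. The year-group sizes below are hypothetical (they only add up to 600), so the individual quotas are illustrative rather than the actual figures for the school.

```python
TARGET_SAMPLE = 54
SCHOOL_TOTAL = 600

# Hypothetical year-group sizes adding up to 600 (illustration only).
year_sizes = {7: 130, 8: 125, 9: 118, 10: 117, 11: 110}

# Each year's share of the sample, rounded to a whole number of pupils.
year_quotas = {
    year: round(TARGET_SAMPLE * size / SCHOOL_TOTAL)
    for year, size in year_sizes.items()
}

adjusted_sample = sum(year_quotas.values())
percentage = adjusted_sample / SCHOOL_TOTAL * 100

print(year_quotas)
print(adjusted_sample, round(percentage, 1))
```

With these made-up sizes the exact shares are 11.7, 11.25, 10.62, 10.53 and 9.9, which round to 12, 11, 11, 11 and 10, giving 55 pupils in total rather than 54.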
I now need to randomly select, within the specified year and gender, the designated amount for each category. I will do this by using the random function on my calculator. I need to make sure the results are random, so that they will not be biased. Once I have done this, I need to check for any anomalies in my selected pupils’ weight/height.
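The calculator method can be imitated in Python as a rough sketch. Here `rng.random()` stands in for the calculator's random key, the function name `pick_pupils` is hypothetical, and the group of 68 pupils with a quota of 6 is an assumed example rather than a real figure.

```python
import random


def pick_pupils(stratum_size, quota, rng):
    """Pick `quota` different pupil numbers between 1 and stratum_size."""
    chosen = []
    while len(chosen) < quota:
        # rng.random() gives a decimal from 0 up to (but not including) 1,
        # which is scaled to a whole pupil number from 1 to stratum_size.
        number = int(rng.random() * stratum_size) + 1
        if number not in chosen:  # ignore repeats and draw again
            chosen.append(number)
    return chosen


rng = random.Random(5)  # fixed seed so the example is repeatable

# Hypothetical stratum: 68 pupils numbered 1 to 68, of whom 6 are needed.
selected = pick_pupils(68, 6, rng)
print(selected)
```

Each number corresponds to a pupil's position in the list for that year and gender, so the same numbering must be used when looking up their height and weight.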
Results
Organising My Results
Although I have already presented my results in two separate tables, one for each gender, the results are not concise enough. In order to fully analyse my results, I will need to put them into scatter diagrams, histograms etc. Therefore, my results need to be grouped into around 5-8 groups, which are the same for both genders. This is because when I put my results into the diagrams, I will need to compare both genders, which requires me to use the same groups for both sexes. Once I have chosen my groups, I will enter the information into the frequency tables and use those for my histograms and scatter diagrams.
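Grouping both genders with the same class intervals could look like the sketch below. The heights are hypothetical, and the choice of six 10 cm classes from 140 cm to 200 cm is an assumption that simply falls within the 5-8 groups mentioned above.

```python
# Hypothetical heights in cm for a few sampled boys and girls.
boys = [152, 158, 161, 165, 167, 170, 174, 181]
girls = [148, 151, 153, 155, 157, 160, 162, 171]

# One set of class boundaries shared by both genders (6 groups of 10 cm).
boundaries = list(range(140, 201, 10))  # 140, 150, ..., 200


def frequency_table(values):
    """Count how many values fall in each class lower <= h < upper."""
    table = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        count = sum(lower <= v < upper for v in values)
        table.append(((lower, upper), count))
    return table


def grouped_mean(table):
    """Estimate the mean using class midpoints weighted by frequency."""
    total = sum(count for _, count in table)
    return sum((lo + hi) / 2 * count for (lo, hi), count in table) / total


def modal_class(table):
    """Return the class interval with the highest frequency."""
    return max(table, key=lambda row: row[1])[0]


boys_table = frequency_table(boys)
girls_table = frequency_table(girls)

print(boys_table)
print(girls_table)
print(modal_class(boys_table), grouped_mean(boys_table))
print(modal_class(girls_table), grouped_mean(girls_table))
```

Because both tables use identical class intervals, the frequencies can be compared directly and plotted on histograms with the same scale.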