# AS and A Level: Probability & Statistics

## Statistical diagrams

1. When working with grouped data, a class recorded as 9 to 12 includes values from 8.5 up to 12.5, so the class width is 4, not 3.
2. In a histogram, the area of each bar represents the frequency; the y-axis shows the frequency density.
3. When working out lengths on scaled histograms, it always helps to draw the rectangle and label the relevant sides with the lengths given.
4. When drawing a stem-and-leaf diagram, make sure to include a key; the key is worth a mark. For example, 2|1 represents 2.1 or, on a different stem-and-leaf diagram, 3|2 represents 32.
5. Always draw a scale when drawing a box plot; the scale is worth a mark.
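
The class-width and frequency-density tips above can be checked with a short Python sketch. The function names and the frequency of 20 are illustrative assumptions, and the class is assumed to hold values recorded to the nearest integer.

```python
def class_width(lower, upper, precision=1):
    """Width of a class whose limits were rounded to `precision`.

    A class recorded as 9-12 (nearest integer) has true boundaries
    8.5 and 12.5, so its width is 4, not 3.
    """
    half = precision / 2
    return (upper + half) - (lower - half)


def frequency_density(frequency, lower, upper, precision=1):
    """Height of the histogram bar: frequency divided by class width."""
    return frequency / class_width(lower, upper, precision)


width = class_width(9, 12)
density = frequency_density(20, 9, 12)  # 20 is a made-up frequency
print(width, density)  # 4.0 5.0
```

Multiplying the bar's height by its width recovers the frequency, which is why the area, not the height, represents the frequency.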

## How to tackle questions on regression and correlation

1. When asked about the relationship in a regression model, always get the context the correct way round. For example, weight does not affect height; height affects weight.
2. When asked whether an answer from the regression model is reliable, comment on whether the x value you used lies within the range of the original data set. If it does, the estimate is suitable; if it lies outside, you are extrapolating and the estimate is unreliable. Avoid extrapolating when using a regression model.
3. If you have found a regression model for the relationship between h and p, and are then told that h = x + 100 and p = y - 20 and asked to find a regression model for x and y, substitute x + 100 for h and y - 20 for p in your original equation and rearrange.
4. When data are coded (a linear coding with positive scale factors), the correlation coefficient is not changed.
5. If a regression model is created using, for example, the heights and weights of children, it cannot be used to predict the weight of an adult. Models are specific to the data from which they were created.
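
The coding tips above can be checked numerically. The sketch below uses made-up data for h and p and hypothetical helper names; it compares the substituted equation with a line fitted directly to the coded values, and compares the correlation coefficient before and after coding.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def regression(xs, ys):
    """Least-squares line ys = a + b * xs; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def correlation(xs, ys):
    """Product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data, for illustration only
h = [110, 120, 130, 140, 150]
p = [3, 7, 8, 12, 15]

# Coding from the tip: h = x + 100 and p = y - 20
x = [value - 100 for value in h]
y = [value + 20 for value in p]

a, b = regression(h, p)  # p = a + b*h

# Substitute: y - 20 = a + b(x + 100)  =>  y = (a + 20 + 100b) + b*x
a_substituted, b_substituted = a + 20 + 100 * b, b

# Fit the coded data directly for comparison
a_direct, b_direct = regression(x, y)

print(a_substituted, a_direct)
print(correlation(h, p), correlation(x, y))
```

Both routes give the same line for x and y, and the two correlation coefficients agree up to floating-point rounding.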

## Normal distribution

1. When answering normal distribution questions, always draw a picture and shade the part of the graph that you know and/or want.
2. In a normal distribution, the area under the curve represents the probability.
3. A normal distribution model may be appropriate if the mean and median are the same, or very close, since this suggests the data are roughly symmetrical.
4. The big normal distribution table gives the area to the left of the line. The small table gives areas to the right of the line.
5. If you are unsure what the question is asking, do the first step, which is to rewrite the question in terms of the normal distribution.
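
As a rough illustration of "area to the left" versus "area to the right", here is a minimal Python sketch. The question (X ~ N(50, 4²), find P(X > 56)) is a made-up example, and `normal_cdf` is a hypothetical helper built on the standard library's `math.erf`.

```python
from math import erf, sqrt


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the LEFT of x."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))


# Hypothetical question: X ~ N(50, 4^2), find P(X > 56)
mu, sigma = 50, 4
z = (56 - mu) / sigma             # standardise: z = 1.5
left = normal_cdf(56, mu, sigma)  # P(X < 56), the area to the left
right = 1 - left                  # P(X > 56), the area to the right

print(round(z, 2), round(left, 4), round(right, 4))  # 1.5 0.9332 0.0668
```

Sketching the curve first makes it clear which of the two areas the question wants.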

1. ## The aim of this investigation was to look at the reliability and validity of Hans Eysenck's EPI Test. The EPI questionnaire comprises items of a 'yes/no' variety.

The critical value table tells us that a value of 0.464 or above is significant at the 0.05 level, and a value of 0.622 or above is significant at the 0.01 level. The results for the 'E' and 'N' scores on the E.P.I. test were highly significant, as both exceeded 0.622; the alternative hypothesis was therefore accepted and the null hypothesis rejected. Introduction: Personality is defined as the totality of a person's attitudes, interests, behavioural patterns, emotional responses and social roles.

• Word count: 3116
2. ## I have been given the task of finding what affects the price of a used car, using a spreadsheet given to me displaying data on a hundred cars.

It then dawned on me that I could use the depreciation, that is, the new price minus the used price. This could perhaps give a more accurate look at the data, as some cars depreciate quicker than others. Looking further into that work I decided against it, as it would take longer and time was of the essence, but this was perhaps an extension that could be added on at the end. Reasons Why * Age: Has a large range and would be interesting to see what sort of relationship there is * Insurance Group: Again a wide range.

• Word count: 2263
3. ## The Bomb Calorimeter

How does it work?

* The sample is placed in the sample cup
* The ignition wires are placed in contact with the sample
* An oxygen cylinder is connected to the bomb
* The precision thermometer and stirrer are fixed in place
* Exactly 2000 cm³

• Word count: 268
4. ## Study of the height/diameter ratio of limpets inhabiting the middle shore region of exposed and sheltered shores

in order to resist wave attack and predators. When the tide rises and covers these molluscs, they move around and feed on algae before returning to their rock scar. Limpets have an opening underneath the shell where a muscular foot attaches the mollusc to the rock by means of suction and glue like adhesion. The clamping down also prevents them from desiccation. Water is drawn in through a hole above the head; gills will then be used for gaseous exchange. They will most commonly be found in the middle shore, this being the reason why we performed the study in the latter part of the shoreline.

• Word count: 2624
5. ## Investigation into the length of time which men and women can hold their breath for, after taking a deep breath

This involves comparing two sets of data: * A sample of the length of time that the men held their breath after taking a first deep breath ('Deep1' - see data details on page 1) * The same data type, this time for the women. To make any calculations accurate enough to draw a valid conclusion, at least 50 sets of data will need to be taken (ideally 25 males and 25 females); this will be my sample. A random sample would be the fairest way to sample the original data, as it eliminates the possibility of bias.

• Word count: 1820
6. ## I shall collect data from a population in order to estimate population parameters (e.g. $\mu$ and $\sigma^2$) by using estimating techniques.

(According to the Central Limit Theorem). THE STATISTICAL THEORY Central Limit Theorem: i) If the sample size is large enough, the distribution of the sample mean is approximately Normal. ii) The variance of the distribution of the sample mean is equal to the variance of the population divided by the sample size, $\sigma^2/n$. Symbolically, if $X \sim \text{(unknown)}(\mu, \sigma^2)$ then $\bar{X}_n \sim N(\mu, \sigma^2/n)$ approximately. These approximations get closer as the sample size, $n$, gets bigger.

• Word count: 2386
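
The Central Limit Theorem statement in the excerpt above can be illustrated with a small simulation. This is only a sketch: the uniform parent population, the sample size of 30, the number of trials and the seed are all assumptions chosen for illustration, not taken from the essay.

```python
import random
from statistics import mean, pvariance

random.seed(1)  # fixed seed so the run is repeatable

n = 30         # sample size (assumed)
trials = 5000  # number of samples drawn (assumed)

# Parent population: uniform on [0, 1], which is not normal.
# Its mean is 1/2 and its variance is 1/12.
sample_means = [
    mean([random.random() for _ in range(n)])
    for _ in range(trials)
]

observed_variance = pvariance(sample_means)
predicted_variance = (1 / 12) / n  # sigma^2 / n

print(mean(sample_means), observed_variance, predicted_variance)
```

The observed variance of the sample means should land close to the predicted value, and a histogram of `sample_means` would look roughly bell-shaped even though the parent population is flat.
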
7. ## Bivariate Data - The aim of this coursework is to discover whether there is a correlation between the heights of people and their shoe size.

(Table shown below, with columns Height (cm) and Shoe Size (nearest integer).) All 50 people co-operated and filled in the table as requested. There are weaknesses with this form of data collection. However, asking one person does not affect the second person's answer. Also, I am relying on the customers' results to be accurate. There was roughly an even number of men and women filling in my chart. Another point to mention is that the majority of the data was collected from adults; there were no children in my population and no elderly people, all by coincidence.

• Word count: 1455
8. ## Year 10 students generally over estimate obtuse angles but under estimate acute angles

Using the same formula I used in my second hypothesis I identified, removed and replaced my outliers. But I did find that although the lower boundary for angle 2 was 141.5 many people in this sample estimated it at 120 which suggested to me that it must have been quite typical so I left these values in. Over the whole sample I replaced 7 lots of information. I then placed the information into a spreadsheet such as the one shown on page ?. This made it easy for me to compare, enter formulas and produce graphs based on the data.

• Word count: 1064
9. ## Guestimate - Is there a link between a person's ability to estimate the length of a line and their ability to estimate the size of an angle?

This difference suggests that there is something worth investigating further. The data we had already collected however was not sufficient to draw a firm conclusion from, as it was only our age group and in the top set. We widened our search for data and asked the whole of year ten, the whole of year seven and some adults working at our school. We asked each person taking part to specify their age group, (year 7, year 10 or adult)

• Word count: 954
10. ## The normal distribution

Mode: the value that occurs most often. Standard deviation (s): a measure of the standard (average) deviation of the scores from the mean. The larger the standard deviation, the larger the range of values/variation in the data. 1. Subtract each score from the mean. 2. Multiply each difference by itself (negatives turn positive). 3. Add up all the squared differences. 4. Divide the total by the number of scores minus 1. 5. Take the square root. Standard deviation: the normal distribution at right shows the percentage of scores/observations that lie within one, two or three standard deviations either side of the mean.

• Word count: 1115
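
The five steps for the standard deviation in the excerpt above can be sketched in Python as follows. The scores are made up, the function name is hypothetical, and the result is compared against the standard library's `statistics.stdev`, which also divides by the number of scores minus 1.

```python
from math import sqrt
from statistics import stdev


def sample_standard_deviation(scores):
    """Follow the five steps, dividing by (number of scores - 1)."""
    m = sum(scores) / len(scores)
    differences = [m - score for score in scores]  # step 1
    squared = [d * d for d in differences]         # step 2
    total = sum(squared)                           # step 3
    variance = total / (len(scores) - 1)           # step 4
    return sqrt(variance)                          # step 5


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores, mean 5
s = sample_standard_deviation(scores)
print(s, stdev(scores))
```

The two printed values should agree up to floating-point rounding.
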
11. ## Statistics: Survey of Beijing and China during the SARS storm

As the capital of China, Beijing has developed into an international city and is the country's biggest cultural and political centre. The weather is cold and the population is large. But it was the biggest disaster area in China. Aim: * To investigate the relationship (distribution) of the number of deaths in March and April during the SARS storm in mainland China. * To investigate the direction in which SARS spread. Hypothesis: * I predict that SARS spread from Guangzhou, which is a southern city, to Beijing, which is a northern city. In the early phase of the SARS storm, Guangdong was the main factor affecting the number of SARS infections in China.

• Word count: 2212
12. ## Mayfield School Statistics - IQ Correlation

Once I had plotted the graph (see appendix 1) I wanted to find the regression line and correlation coefficient. I could see there was a relationship and wanted to get a measure of it. The correlation coefficient measures how strong a correlation is. Certain calculations have to be done to obtain this one significant number. Example: For an example of use see appendix 2. What does r actually tell you? * The nearer the value of r is to 1, the stronger the positive linear correlation between the independent and dependent variables. * For values between -1 and 0, the closer the value of r gets to -1, the stronger the negative linear correlation between the independent and dependent variables.

• Word count: 931
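
The excerpt above does not show the calculation behind r. For reference, the standard product-moment formula is given below; that the essay used this exact form is an assumption.

$$
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}), \quad
S_{xx} = \sum (x_i - \bar{x})^2, \quad
S_{yy} = \sum (y_i - \bar{y})^2
$$

Because the denominator is always positive, the sign of r is the sign of $S_{xy}$, and $-1 \le r \le 1$.
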
13. ## Statistical Analysis of Facial Proportions

The measurements that I will be taking will be: 1. The width of the mouth and the width of the bottom section of the nose. 2. The length of the proximal phalanges and the length of the central phalanges of the hand. These measurements will then be expressed as a ratio of the smaller to the larger in the form 1:n. Methods of measurements. * Measurements of the Proximal Phalanges and Central Phalanges were taken using a 30-cm ruler. These were measured to the nearest mm. * Measurements of the mouth (from crease of bottom and top lips, on either side of the mouth)

• Word count: 1428
14. ## Estimates of a straight line and a curved line

Evidence: Before I write my hypothesis I am going to write a small number of my ideas on what affects an individual's ability to estimate a straight line and a curved line. 1. I think the older the person is the better they estimate. 2. I also think that if the person estimates within a 10% error of the actual length for one of the lines they will be within 10% of the other line. 3. I think that boys will have a greater spread of data than girls.

• Word count: 1791
15. ## Statistical Analysis of a Survey concerning A "Monorail"

The writer of the article also says that "90% of people do not want the new 'Metro 2000' system". Again, 48 people is not a big enough number to suggest that the whole of Southampton does not want the monorail. The results the newspaper has published are not at all representative of the population of Southampton and may only be the views of people in a certain area of Southampton who have a reason for not wanting the system to be built. The sampling method used was not very successful and the survey could have been improved by using better sampling methods.

• Word count: 817
16. ## An Investigation Into the Density of “Mock” Blood.

However, there will be a difference between samples A and B compared to sample C. APPARATUS * 100 cm³ sample A * 100 cm³ sample B * 100 cm³ sample C * 24.96 g copper (II) sulphate * 1 dm³ distilled water * three 1 cm³ plastic syringes fitted with long needles * three 100 cm³ measuring cylinders * stopwatch or clock METHOD 1. Fill three measuring cylinders with a 0.1 mol/dm³ copper (II) sulphate solution, to a depth of approximately 5 cm above the 100 cm³ level.

• Word count: 548
17. ## How Can Samples Describe Populations?

To try and fulfil this rudimentary and salient criterion in investigation, sampling techniques have been developed and employed. Number of Samples When using samples and attempting to represent the view of a designated population, it is apparent that the data acquired from one member of the population is very unlikely to lead to any conclusions. The law of averages suggests that the greater the number of samples, the more accuracy the data will have. Therefore, it is better to include as many samples as possible, but how many samples are sufficient to justify findings?

• Word count: 2969
18. ## Telepathy Maths Investigation

The people I am targeting to run these telepathy tests on are young adults aged 16-19, from New College. These people are similar in age and have been assumed to be of the same intelligence, so the results for each and every person should remain constant, relative to the level of my telepathic ability. The sample size will be 20 people, as this should be enough to see if I possess any telepathic powers. If any of the results were larger than the others, then this would suggest telepathic ability of that particular receiver rather than me, the sender.

• Word count: 1000
19. ## The Relationship Between Price, Date of Release/ Re-Release of a Sample of 52 Randomly Selected Films

I planned this coursework for future reference so I could add the information to my media coursework as a tested fact. To determine a population for my coursework, I am going to use the HMV superstore Internet site to search for my sample of films. This film site is split up into categories corresponding to letters of the alphabet; therefore there are a total of 26 categories, as there are 26 letters of the alphabet. Each category contains roughly 500 films, giving a total population of 13,000 films.

• Word count: 923
20. ## Investigating Growth in Stride Length During the Human Growth Stage

Also the numbers of pupils in each year group are of very similar sizes so stratified sampling would not have been of any use. The sampling frame that I have chosen to be a representation of the population, is a class list arranged alphabetically for both years 7 and 12. YEAR 7 Total number of pupils= 150 Sample size= 30 150/30= 5 Random number from 1 to 5 on calculator= 2 I therefore chose every 5th member of the population starting from the student numbered 2 in the class list, e.g.

• Word count: 2163
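
The systematic sampling described in the excerpt above (150 pupils, a sample of 30, a random start of 2) can be sketched in Python. The variable names are illustrative, and pupils are assumed to be numbered 1 to 150 in the class list.

```python
population_size = 150
sample_size = 30
step = population_size // sample_size  # 150 / 30 = 5
start = 2  # the random start reported in the excerpt

# Pupils are assumed to be numbered 1..150 in the alphabetical class list.
selected = list(range(start, population_size + 1, step))

print(len(selected), selected[:3], selected[-1])  # 30 [2, 7, 12] 147
```

In practice the start would be drawn at random, for example with `random.randint(1, step)`, rather than fixed.
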
21. ## Investigation into Relationship between Volume and Diameter in Sand Piles

Quantitative Prediction: $V = \tfrac{1}{3}\pi r^2 h = \tfrac{1}{3}\pi (d/2)^2 h = \tfrac{1}{12}\pi d^2 h$, i.e. $V \propto d^2 h$. But for constant-shaped sand piles we can add more: $h \propto d \Rightarrow V \propto d^3$. So, here is the proof that the volume does vary with diameter, but it does not only vary in general, it effectively varies directly with the cube of the diameter, so that a cubic curve is produced if the graph is plotted (shown -->2pages). METHOD 1] Initially, we decided to take 5 diff.

• Word count: 1263
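
The cube relationship in the excerpt above can be checked with a short Python sketch. The pile is assumed to be a cone of constant shape; the height-to-diameter ratio of 0.3 and the function name are made up for illustration.

```python
from math import pi


def cone_volume(diameter, height_to_diameter=0.3):
    """Volume of a cone whose height is a fixed multiple of its diameter.

    V = (1/12) * pi * d^2 * h, with h proportional to d (constant shape).
    The ratio 0.3 is an assumed value, not a measured one.
    """
    height = height_to_diameter * diameter
    return pi * diameter ** 2 * height / 12


# Doubling the diameter should multiply the volume by 2^3 = 8.
ratio = cone_volume(20) / cone_volume(10)
print(ratio)
```

Plotting V against d³ rather than d should therefore give a straight line through the origin.
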
22. ## Comparative Newspaper Project

It is also quite sensible, as I am collecting data in a group of 7, so everyone can count 25 sentences from each newspaper. To make this sample more reliable, each sample is going to be selected at random, but first I'm going to choose two newspapers at random using a random number generator on my calculator. (Listed alphabetically to ensure fairness.) I also used this method (Ran# * 7) to generate the day on which to buy the relevant newspapers (including numbers less than 1 this time).

• Word count: 1693
23. ## Haney, Banks and Zimbardo (prison simulation)

This was to avoid any pre-existing friendship tendencies, which might have arisen during the study. This is clearly not a representative sample - the subjects were all male, almost entirely Caucasian, and largely middle-class. There were also not enough subjects for the sample to be representative - with only 22 subjects taking part in the study, it was unlikely that they would represent a large percentage of the world. ii) The sampling method was, at least in my opinion, a very successful one - it was incredibly thorough (with an extensive questionnaire, and an interview by one of the experimenters), and therefore provided exactly the kind of sample that Haney, Banks and Zimbardo were looking for.

• Word count: 1335
24. ## Investigating how much the 5 pence minimum charge on local calls increases the cost of making local calls.

Such a random method would however give no indication of whether the duration of calls remained constant over time. Random sampling may allow a certain cluster of calls to dominate the general trend. Stratified sampling would not be appropriate for the investigation because we want to find out the number of 5-minute calls, not just select a certain number of them. This sampling method would be useful for a sub-investigation, for example to see how much effect the over 5-minute calls make to the charging by only selecting the over 5-minute calls and assessing their magnitude.

• Word count: 3265
25. ## Are Modern Musicians Lazy: A Comparison between the Lengths of Modern and Classical Music

By Classical Music I mean any music that is more than 100 years old. This will include composers like Beethoven, Tchaikovsky, Rachmaninov and Handel. In other words, when I say Classical I mean it in the common sense, not music from the "Classical" period (as opposed to, for example, the "Romantic" period). In order to collect my sample data, I will be using my (electronically stored) music collection for the modern music and my parents' collection of classical music. The music I have stored on my PC has been filtered to remove any music which does not fit the classifications stated above.

• Word count: 1417