Chapter 2
A frequency distribution lists data values (either individually or by groups of intervals), along with their corresponding frequencies (or counts). Frequency distributions are constructed for these reasons: large data sets can be summarized, we can gain insight into the nature of data, and we have a basis for constructing important graphs.
- Decide on the number of classes (usually between 5 and 20).
- Calculate: class width = [(highest value) - (lowest value)] / (number of classes), then round up to a convenient number.
- Choose either the lowest data value or a convenient value that is a little smaller as the lower limit of the first class.
- Using the lower limit of the first class and the class width, list the lower class limits: add the class width to the starting point to get the second lower class limit, and so on (e.g., 0 + 100 = 100, 100 + 100 = 200).
- List the lower class limits in a vertical column, then enter the upper class limits, which can now be easily identified.
- Go through the data set, putting a tally in the appropriate class for each data value. Use the tally marks to find the total frequency for each class.
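The construction steps above can be sketched in Python. This is a minimal sketch, assuming whole-number data (so each upper limit is one less than the next lower limit); the data values are hypothetical. As an assumed convention for "round up," the width is taken as the next whole number above the quotient even when the quotient is already whole, which guarantees the highest value lands in the last class.

```python
import math

# Hypothetical sample data (not from the notes), used only for illustration.
data = [12, 45, 7, 33, 28, 19, 41, 5, 22, 37, 15, 30, 26, 9, 48]

num_classes = 5                                   # step 1: choose 5 to 20 classes
raw_width = (max(data) - min(data)) / num_classes
width = math.floor(raw_width) + 1                 # step 2: round up to the next whole number
first_lower = min(data)                           # step 3: start at the lowest data value

# Steps 4-5: lower class limits, then the matching upper class limits.
lower_limits = [first_lower + i * width for i in range(num_classes)]
upper_limits = [lo + width - 1 for lo in lower_limits]

# Step 6: tally each data value into the class whose limits contain it.
frequencies = [sum(lo <= x <= hi for x in data)
               for lo, hi in zip(lower_limits, upper_limits)]

for lo, hi, f in zip(lower_limits, upper_limits, frequencies):
    print(f"{lo}-{hi}: {f}")
```

Here the width works out to 9, giving classes 5-13, 14-22, 23-31, 32-40, and 41-49.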
Relative frequency = class frequency / sum of all frequencies
Cumulative frequency- the sum of the frequencies for a class and all previous classes.
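A short sketch of relative and cumulative frequencies, assuming a hypothetical list of class frequencies:

```python
from itertools import accumulate

# Hypothetical class frequencies for five classes (illustration only).
frequencies = [4, 3, 3, 2, 3]
total = sum(frequencies)

# Relative frequency = class frequency / sum of all frequencies
relative = [f / total for f in frequencies]

# Cumulative frequency = this class's frequency plus all previous classes' frequencies
cumulative = list(accumulate(frequencies))

print(relative)
print(cumulative)
```

The relative frequencies always add to 1, and the last cumulative frequency always equals the total number of values.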
Lower class limits- are the smallest numbers that can belong to the different classes.
Upper class limits- are the largest numbers that can belong to the different classes.
Class boundaries- are the numbers used to separate classes, but without the gaps created by class limits. They are obtained as follows: find the size of the gap between the upper class limit of one class and the lower class limit of the next class. Add half of that amount to each upper class limit to find the upper class boundaries; subtract half of that amount from each lower class limit to find the lower class boundaries.
Class midpoints- are the midpoints of the classes. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2.
Class width- is the difference between two consecutive lower class limits or two consecutive lower class boundaries.
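The definitions of boundaries, midpoints, and width can be checked with a small sketch; the class limits here are hypothetical, chosen for whole-number data so the gap between classes is 1.

```python
# Hypothetical class limits for whole-number data (illustration only).
lower_limits = [0, 10, 20, 30]
upper_limits = [9, 19, 29, 39]

# Gap between one class's upper limit and the next class's lower limit.
gap = lower_limits[1] - upper_limits[0]

# Boundaries: subtract half the gap from each lower limit, add half to each upper limit.
lower_boundaries = [lo - gap / 2 for lo in lower_limits]
upper_boundaries = [hi + gap / 2 for hi in upper_limits]

# Midpoint = (lower class limit + upper class limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Width = difference between two consecutive lower class limits
width = lower_limits[1] - lower_limits[0]
```

Note that each upper boundary equals the next lower boundary (9.5, 19.5, 29.5), which is exactly how boundaries remove the gaps.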
Histogram- is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps).
1) Click DDXL.
2) Click Charts and Plots.
3) Under Function Type, select Histogram.
4) Enter the data range.
5) Click OK.
Frequency polygons
1) Enter the frequencies in column A (here: 0, 11, 12, 14, 1, 2, 0).
2) Click Insert, then click Chart.
3) Click Line, then Next.
4) Enter the range (A1:A7), then select Columns.
5) Click the Series tab. In the box labeled "Category (X) axis labels," enter a space followed by a comma, then the first class midpoint followed by a comma, the second class midpoint followed by a comma, and so on:
(space), 1, 4, 7, 10, 13, (space). Click Next when done.
6) Click the Legend and Gridlines tabs and remove any default check marks so that the legend and gridlines are deleted. Click Next when done.
7) Click Finish.
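The zeros at the start and end of column A make the polygon begin and end on the horizontal axis. A sketch of the points being plotted, using the midpoints and frequencies from the spreadsheet example; placing the two zero-frequency points one class width beyond the first and last midpoints is an assumed (common) convention, since the spreadsheet leaves those labels blank.

```python
# Class midpoints and frequencies from the spreadsheet example.
midpoints = [1, 4, 7, 10, 13]
frequencies = [11, 12, 14, 1, 2]
width = midpoints[1] - midpoints[0]   # consecutive midpoints differ by the class width

# Add a zero-frequency point one class width below the first midpoint
# and one class width above the last, so the polygon touches the axis at both ends.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

points = list(zip(xs, ys))
print(points)
```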
Measure of center- is a value at the center or middle of a data set.
Arithmetic mean- of a set of values is the measure of center found by adding the values and dividing the total by the number of values. This measure of center will be used often throughout the remainder of the text, and it will be referred to simply as the mean.
Median- of a data set is the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude. The median is often denoted by x̃ ("x-tilde").
Mode- of a data set, often denoted by M, is the value that occurs most frequently.
When two values occur with the same greatest frequency, each one is a mode and the data set is bimodal.
When more than two values occur with the same greatest frequency, each is a mode and the data set is said to be multimodal.
When no value is repeated, we say that there is no mode.
Midrange- is the measure of center that is the value midway between the highest and lowest values in the original data set. It is found by adding the highest data value to the lowest data value and then dividing the sum by 2, as in the following formula:
Midrange = (highest value + lowest value) / 2
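The four measures of center can be computed with Python's standard `statistics` module plus a small helper. This is a sketch on hypothetical data; the hypothetical `modes_of` helper follows the rules above (one mode, bimodal, multimodal, or no mode when nothing repeats). It is needed because `statistics.multimode` returns every value when no value repeats, rather than reporting "no mode."

```python
import statistics
from collections import Counter

# Hypothetical sample (illustration only).
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)             # sum of values / number of values
median = statistics.median(data)         # middle value; average of the two middle values when n is even
midrange = (max(data) + min(data)) / 2   # (highest value + lowest value) / 2

def modes_of(values):
    """Return every value tied for the greatest frequency, or [] when no value is repeated."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                        # no value repeats: no mode
    return [v for v, c in counts.items() if c == top]

modes = modes_of(data)
print(mean, median, modes, midrange)
```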
Skewed- a data set is skewed if it is not symmetric and extends more to one side than the other. (A distribution of data is symmetric if the left half of its histogram is roughly a mirror image of the right half.)
Standard deviation- of a set of sample values is a measure of variation of values about the mean. It is a type of average deviation of values from the mean, calculated as s = sqrt[Σ(x - x̄)² / (n - 1)].
Variance- of a set of values is a measure of variation equal to the square of the standard deviation.
Sample variance: s², the square of the sample standard deviation s.
Example: if s = 7.0 min, then the sample variance is s² = 7.0² = 49.0 min².
Population variance: σ², the square of the population standard deviation σ.
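A sketch of sample versus population standard deviation and variance on hypothetical data. It uses the standard definitional formulas (divide by n - 1 for a sample, by N for a population) and cross-checks them against Python's `statistics.stdev` and `statistics.pstdev`.

```python
import math
import statistics

data = [3, 5, 7, 9, 11]   # hypothetical values (illustration only)

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample: divide the sum of squared deviations by n - 1.
s2 = sum(squared_deviations) / (n - 1)   # sample variance s^2
s = math.sqrt(s2)                        # sample standard deviation s

# Population: divide by the population size N.
sigma2 = sum(squared_deviations) / n     # population variance sigma^2
sigma = math.sqrt(sigma2)                # population standard deviation sigma

print(s, s2, sigma, sigma2)
```

The variance always carries squared units (for example, min² in the example with s = 7.0 min).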
Range rule of thumb for estimating the standard deviation s: to roughly estimate the standard deviation, use
s ≈ range/4, where range = (highest value) - (lowest value)
Example: the lowest value is 0 and the highest is 491, so s ≈ 491/4 = 122.75 ≈ 123.
Range rule of thumb for interpreting a known value of the standard deviation s: if the standard deviation s is known, use it to find rough estimates of the minimum and maximum "usual" sample values:
Minimum "usual" value = (mean) - 2 × (standard deviation)
Maximum "usual" value = (mean) + 2 × (standard deviation)
Example: the mean is 40.05 cm and the standard deviation is 1.64 cm.
Minimum = 40.05 - 2(1.64) = 36.77 cm
Maximum = 40.05 + 2(1.64) = 43.33 cm
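Both uses of the range rule of thumb fit in two tiny functions. This is a sketch; the function names `estimate_s` and `usual_range` are hypothetical, and the numbers are the two examples from the notes.

```python
def estimate_s(lowest, highest):
    """Range rule of thumb: s is roughly (highest - lowest) / 4."""
    return (highest - lowest) / 4

def usual_range(mean, s):
    """Rough limits for 'usual' values: mean - 2s to mean + 2s."""
    return mean - 2 * s, mean + 2 * s

print(estimate_s(0, 491))   # 122.75
low, high = usual_range(40.05, 1.64)
print(round(low, 2), round(high, 2))
```

Values outside the usual range are not impossible, just unusual by this rough standard.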
A standard score, or z score, is the number of standard deviations that a given value x is above or below the mean. It is found using the following expressions:
Sample: z = (x - x̄)/s
Population: z = (x - μ)/σ
(Round z to two decimal places.)
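The same function covers both z-score expressions: pass x̄ and s for a sample, or μ and σ for a population. A minimal sketch, with the hypothetical name `z_score`, reusing the mean of 40.05 cm and standard deviation of 1.64 cm from the range-rule example:

```python
def z_score(x, center, spread):
    """Number of standard deviations x lies above (positive) or below (negative) the center.

    Sample: center = x-bar, spread = s.  Population: center = mu, spread = sigma.
    """
    return round((x - center) / spread, 2)   # round z to two decimal places

print(z_score(43.33, 40.05, 1.64))
print(z_score(36.77, 40.05, 1.64))
```

The two usual-value limits from that example come out at z = 2.00 and z = -2.00, which is exactly what "mean ± 2 standard deviations" means.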
Percentile of a value x = (number of values less than x / total number of values) × 100
To find the kth percentile Pk, sort the data from lowest to highest and compute the locator L = (k/100) × n, where n = number of values and k = the percentile in question.
Example: L = (68/100) × 40 = 27.2. Is L a whole number?
No: round L up to the next larger whole number (here, L = 28). Pk is the Lth value, counting from the lowest.
Yes: Pk is midway between the Lth value and the next value in the sorted set of data. Find Pk by adding the Lth value and the next value and dividing the total by 2.
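A sketch of both percentile procedures, with the hypothetical function names `percentile_of` and `kth_percentile`. It assumes k is a whole number with 0 < k < 100, and checks "is L whole?" with integer arithmetic (k·n divisible by 100) to avoid floating-point surprises. The data set 1, 2, ..., 40 is hypothetical, chosen so n = 40 matches the example.

```python
def percentile_of(x, data):
    """Percentile of value x = (number of values less than x / total number of values) * 100."""
    below = sum(1 for v in data if v < x)
    return below * 100 / len(data)

def kth_percentile(k, data):
    """P_k from the locator L = (k/100) * n on sorted data; assumes whole-number k, 0 < k < 100."""
    values = sorted(data)
    n = len(values)
    if (k * n) % 100 == 0:                    # L is a whole number
        L = k * n // 100
        # Midway between the Lth value (index L - 1) and the next value (index L).
        return (values[L - 1] + values[L]) / 2
    L = -(-k * n // 100)                      # not whole: round L up to the next whole number
    return values[L - 1]                      # P_k is the Lth value, counting from the lowest

data = list(range(1, 41))                     # hypothetical data: 1, 2, ..., 40
print(kth_percentile(68, data))               # L = 27.2 rounds up to 28, so P68 is the 28th value
print(kth_percentile(50, data))               # L = 20 is whole, so P50 is midway between the 20th and 21st values
print(percentile_of(28, data))
```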
Boxplot- is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile Q1, the median, and the third quartile Q3.
Generating a boxplot:
1) Enter the data in column A.
2) Click DDXL and select Charts and Plots.
3) Under Function Type, select Boxplot.
4) In the dialog box, enter the range of data (A1:A15), then click OK.
Chapter 3
An event is any collection of results or outcomes of a procedure.
A simple event is an outcome or an event that cannot be further broken down into simpler components.
The sample space for a procedure consists of all possible simple events. That is, the sample space consists of all outcomes that cannot be broken down any further.
Notation for Probabilities
P denotes a probability
A, B, and C denote specific events.
P(A) denotes the probability of event A occurring.
Rule 1: Relative Frequency Approximation of Probability
Conduct (or observe) a procedure a large number of times, and count the number of times that event A actually occurs. Based on these actual results, P (A) is estimated as follows:
P (A) = number of times A occurred/number of times trial was repeated.
Example of Rule 1: A Gallup poll of 5000 people found that 3842 lived in rentals, 1120 owned their homes, and 38 were homeless.
Find P(homeowner): P = 1120/5000 = 0.224
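Rule 1 is just a count divided by the number of trials. A sketch using the Gallup poll counts from the example (the dictionary keys are illustrative labels):

```python
# Counts from the Gallup poll example: 5000 people in total.
counts = {"rent": 3842, "own": 1120, "homeless": 38}
trials = sum(counts.values())

# Rule 1: P(A) = number of times A occurred / number of times the trial was repeated
probabilities = {outcome: c / trials for outcome, c in counts.items()}
print(probabilities)
```

Because every person falls into exactly one category, the three relative-frequency probabilities add to 1.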
Probabilities range from 0 to 1.0
P= 0 there is no chance the event will occur
P= 1 the event will occur no matter what.
P ≤ 0.05: the event is unlikely to occur.
Rule 2: Classical Approach to Probability (requires equally likely outcomes)
Assume that a given procedure has n different simple events and that each of those simple events has an equal chance of occurring. If event A can occur in s of these n ways, then
P(A) = number of ways A can occur/ number of different simple events= s/n
Example of Rule 2: You roll a fair die. P(4) = 1/6 ≈ 0.167 (about 16.7%).
Rule 3: Subjective Probabilities
P(A), the probability of event A, is found by simply guessing or estimating its value based on knowledge of the relevant circumstances.
Example of Rule 3: What is the probability that your house will be attacked by monkeys in the next 2 weeks? P = 0.000000000000000001
Example: P(exactly 2 boys in 3 births)
1st 2nd 3rd
B B B
B B G <
B G B <
B G G
G B B <
G B G
G G B
G G G
The three marked outcomes have exactly 2 boys, so P = 3/8 = 0.375.
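The table is a listing of the sample space, and the classical approach counts the outcomes in the event. A sketch that builds the same eight outcomes with `itertools.product`, assuming (as the table does) that all eight are equally likely:

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered sequence of 3 births, each B or G.
sample_space = list(product("BG", repeat=3))

# Event: exactly 2 boys.
event = [outcome for outcome in sample_space if outcome.count("B") == 2]

p = Fraction(len(event), len(sample_space))   # classical approach: s / n
print(p, float(p))   # 3/8 0.375
```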
Law of large numbers
As a procedure is repeated again and again, the relative frequency probability (from Rule 1) of an event tends to approach the actual probability.
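The law of large numbers can be watched directly with a simulation. This sketch flips a simulated fair coin (a hypothetical procedure with actual probability 0.5) using Python's `random` module with a fixed seed so runs are repeatable; the relative frequency of heads tends to settle closer to 0.5 as the number of flips grows.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable

def relative_frequency_of_heads(trials):
    """Flip a simulated fair coin `trials` times; return heads / trials (Rule 1)."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

Small runs can land far from 0.5; only the long-run relative frequency is expected to approach the actual probability.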
Rare event rule (0.05)
If, under a given assumption, the probability of a particular observed event is extremely small, we conclude that the assumption is probably not correct.
Complement of event A, denoted by Ā, consists of all outcomes in which event A does not occur. Example (fair die): P(not 4) = 1 - 1/6 = 5/6, and likewise 1 - 5/6 = 1/6.
Rounding off probabilities- When expressing the value of a probability, either give the exact fraction or decimal or round off final decimal results to three significant digits. (Suggestion: When a probability is not a simple fraction such as 2/3 or 5/9, express it as a decimal so that the number can be better understood.)
Actual odds against event A occurring are the ratio P(Ā)/P(A), usually expressed in the form a:b (or "a to b"), where a and b are integers having no common factors.
Actual odds in favor of event A are the reciprocal of the actual odds against that event. If the odds against A are a:b, then the odds in favor of A are b:a.
Payoff odds against event A represent the ratio of net profit (if you win) to the amount bet.
Payoff odds against event A = (net profit) : (amount bet)
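Odds are ratios reduced to lowest terms, which `fractions.Fraction` does automatically. A sketch with hypothetical function names; the $5 bet returning a $175 net profit is a hypothetical payoff used only for illustration.

```python
from fractions import Fraction

def odds_against(p_a):
    """Actual odds against A = P(not A) / P(A), returned as (a, b) meaning a:b in lowest terms."""
    ratio = (1 - p_a) / p_a
    return ratio.numerator, ratio.denominator

def odds_in_favor(p_a):
    """Reciprocal of the odds against: a:b becomes b:a."""
    a, b = odds_against(p_a)
    return b, a

def payoff_odds_against(net_profit, amount_bet):
    """Payoff odds against A = (net profit) : (amount bet), in lowest terms."""
    ratio = Fraction(net_profit, amount_bet)
    return ratio.numerator, ratio.denominator

print(odds_against(Fraction(1, 6)))    # rolling a 4 on a fair die: (5, 1), i.e. 5:1
print(odds_in_favor(Fraction(1, 6)))   # (1, 5), i.e. 1:5
print(payoff_odds_against(175, 5))     # hypothetical $175 net profit on a $5 bet: (35, 1)
```

Pass probabilities as `Fraction` values rather than floats so the reduced integers come out exactly.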
Notation for addition rule
P(A or B) = P(event A occurs or event B occurs or they both occur)
Addition Rule: If the events are mutually exclusive (disjoint) then use this formula.
P (A or B) = P (A) + P (B)
If the events are not mutually exclusive (meaning they are not disjoint, so they can occur together), use
P (A or B) = P(A) + P(B) - P(A and B)
Example: 24 shapes: 6 circles, 10 squares, and 8 triangles.
P(C) = 6/24 = 0.25
P(S) = 10/24 ≈ 0.417
P(T) = 8/24 = 1/3 ≈ 0.333
P(triangle or circle) = 0.333 + 0.25 = 0.583
P(triangle or circle or square) = 0.333 + 0.25 + 0.417 = 1.0
P(red or triangle) = 8/24 + 0.333
P(purple or square) = 11/24 + 10/24 - 3/24 = 18/24 = 0.75 (subtracting the 3 purple squares so they are counted only once)
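The addition rule can be checked by counting. The notes give only the totals for the purple example (11 purple shapes, 10 squares, 3 purple squares, 24 shapes); how the other 8 purple shapes split between circles and triangles below is a hypothetical assumption, and any split gives the same answer. The sketch compares the intuitive rule (count each outcome once) with the formal formula.

```python
from fractions import Fraction

# Hypothetical (color, shape) list consistent with the counts in the notes:
# 10 squares (3 purple), 6 circles, 8 triangles, 11 purple shapes in total.
shapes = ([("purple", "square")] * 3 + [("other", "square")] * 7
          + [("purple", "circle")] * 4 + [("other", "circle")] * 2
          + [("purple", "triangle")] * 4 + [("other", "triangle")] * 4)

def P(event):
    """Classical probability: outcomes where event is true / total outcomes."""
    return Fraction(sum(1 for s in shapes if event(s)), len(shapes))

def is_purple(s): return s[0] == "purple"
def is_square(s): return s[1] == "square"
def is_circle(s): return s[1] == "circle"
def is_triangle(s): return s[1] == "triangle"

# Intuitive rule: count every outcome in "A or B" exactly once.
by_counting = P(lambda s: is_purple(s) or is_square(s))
# Formal rule for events that are not disjoint.
by_formula = P(is_purple) + P(is_square) - P(lambda s: is_purple(s) and is_square(s))

# Disjoint events: no overlap to subtract.
triangle_or_circle = P(is_triangle) + P(is_circle)
print(by_counting, by_formula, triangle_or_circle)
```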
Intuitive Addition Rule
To find P(A or B), find the sum of the number of ways event A can occur and the number of ways event B can occur, adding in such a way that every outcome is counted only once. P(A or B) is equal to that sum, divided by the total number of outcomes in sample space.
Addition rule for P(A or B):
Are A and B disjoint?
Yes: P(A or B) = P(A) + P(B)
No: P(A or B) = P(A) + P(B) - P(A and B)
Rule of complementary events
P(A) + P(Ā) = 1
P(Ā) = 1 - P(A)
P(A) = 1 - P(Ā)
Notation- P(A and B) = P(event A occurs in a first trial and event B occurs in a second trial)
Notation for Conditional Probability
P(B|A) represents the probability of event B occurring after it is assumed that event A has already occurred. (We can read B|A as "B given A.")
Two events are independent if the occurrence of one does not affect the probability of the occurrence of the other. (Several events are similarly independent if the occurrence of any does not affect the probabilities of the occurrence of the others.) If A and B are not independent, they are said to be dependent.
Formal Multiplication Rule
P(A and B) = P(A) x P(B|A)
Intuitive Multiplication Rule
When finding the probability that event A occurs in one trial and event B occurs in the next trial, multiply the probability of event A by the probability of event B, but be sure that the probability of event B takes into account the previous occurrence of event A.
Multiplication rule for P(A and B):
Are A and B independent?
Yes: P(A and B) = P(A) × P(B)
No: P(A and B) = P(A) × P(B|A)
If a sample size is no more than 5% of the size of the population, treat the selections as being independent (even if the selections are made without replacement, so they are technically dependent).
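A sketch of the multiplication rule for dependent and independent selections, using two cards drawn from a standard 52-card deck (an illustrative example, not from the notes). Without replacement the second probability must account for the first draw; with replacement the draws are independent.

```python
from fractions import Fraction

# Dependent: two aces drawn from a standard 52-card deck WITHOUT replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card are already gone
p_both_dependent = p_first_ace * p_second_ace_given_first   # P(A) x P(B|A)

# Independent: WITH replacement, the first draw does not affect the second.
p_both_independent = Fraction(4, 52) * Fraction(4, 52)      # P(A) x P(B)

print(p_both_dependent, p_both_independent)
```

Here the sample (2 cards) is about 3.8% of the population (52 cards), so the 5% guideline would allow treating the draws as independent; the two answers (1/221 ≈ 0.00452 versus 1/169 ≈ 0.00592) show how rough that approximation can be for a small population.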
Complements: "at least one" is equivalent to "one or more."
*The complement of getting at least one item of a particular type is that you get no items of that type, so P(at least one) = 1 - P(none).
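A sketch of the complement shortcut for "at least one," assuming boys and girls are equally likely and births are independent. It computes P(at least one girl in 3 births) as 1 - P(no girls) and confirms the answer by listing all eight outcomes.

```python
from fractions import Fraction
from itertools import product

# Complement shortcut: "at least one girl" is the complement of "no girls" (all boys).
p_no_girls = Fraction(1, 2) ** 3
p_at_least_one_girl = 1 - p_no_girls

# Check by brute force over the 8 equally likely birth sequences.
outcomes = list(product("BG", repeat=3))
brute_force = Fraction(sum("G" in o for o in outcomes), len(outcomes))

print(p_at_least_one_girl, brute_force)
```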
Conditional probability of an event is a probability obtained with the additional information that some other event has already occurred. P(B|A) denotes the conditional probability of event B occurring, given that event A has already occurred, and it can be found by dividing the probability of events A and B both occurring by the probability of event A:
P(B|A) = P(A and B)/P(A)
Intuitive approach to conditional Probability
The conditional probability of B given A can be found by assuming that event A has occurred and, working under that assumption, calculating the probability that event B will occur.
Simulation of a procedure is a process that behaves the same way as the procedure, so that similar results are produced.
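Both approaches to conditional probability give the same answer. A sketch on the three-births sample space (an illustrative choice): A = "first birth is a boy," B = "exactly 2 boys." The formal approach divides probabilities; the intuitive approach keeps only the outcomes where A occurred and counts B within them.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product("BG", repeat=3))   # 8 equally likely outcomes

def P(event):
    return Fraction(sum(1 for o in sample_space if event(o)), len(sample_space))

def first_is_boy(o): return o[0] == "B"              # event A
def exactly_two_boys(o): return o.count("B") == 2    # event B

# Formal: P(B|A) = P(A and B) / P(A)
formal = P(lambda o: first_is_boy(o) and exactly_two_boys(o)) / P(first_is_boy)

# Intuitive: assume A occurred (keep only those outcomes), then find the chance of B.
restricted = [o for o in sample_space if first_is_boy(o)]
intuitive = Fraction(sum(1 for o in restricted if exactly_two_boys(o)), len(restricted))

print(formal, intuitive)
```

Given a boy first, only BBG and BGB of the four remaining outcomes have exactly 2 boys, so P(B|A) = 2/4 = 1/2, compared with the unconditional P(B) = 3/8.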
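A simulation sketch: instead of listing outcomes, generate many sets of 3 births at random (assuming boys and girls are equally likely, with a fixed seed for repeatability) and use Rule 1 to estimate P(exactly 2 boys). The estimate should land near the exact value 3/8 = 0.375.

```python
import random

rng = random.Random(2024)   # fixed seed so the run is repeatable

def simulate_three_births():
    """One simulated trial: three births, each 'B' or 'G' with equal chance."""
    return [rng.choice("BG") for _ in range(3)]

trials = 100_000
hits = sum(simulate_three_births().count("B") == 2 for _ in range(trials))
estimate = hits / trials   # relative frequency approximation (Rule 1)
print(estimate)
```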
Fundamental Counting Rule
For a sequence of two events in which the first event can occur m ways and the second event can occur n ways, the events together can occur a total of m X n ways.
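The fundamental counting rule can be confirmed by listing every combined outcome. A sketch with a hypothetical code made of one letter (A, B, or C: m = 3 ways) followed by one digit (0 to 9: n = 10 ways):

```python
from itertools import product

letters = "ABC"          # first event: m = 3 ways (hypothetical)
digits = "0123456789"    # second event: n = 10 ways (hypothetical)

# Every combined outcome: a letter followed by a digit.
codes = [letter + digit for letter, digit in product(letters, digits)]
print(len(codes), len(letters) * len(digits))   # 30 30
```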
Notation
The factorial symbol ! denotes the product of decreasing positive whole numbers. For example, 4! = 4 × 3 × 2 × 1 = 24. By special definition, 0! = 1. (Many calculators have a factorial key.)
Factorial Rule
A collection of n different items can be arranged in order n! different ways. (This factorial rule reflects the fact that the first item may be selected n different ways, the second item may be selected n - 1 ways, and so on.)
Permutations Rule (when items are all different)
The number of permutations (or sequences) of r items selected from n available items (without replacement) is
nPr = n!/(n - r)!
Permutations Rule (when some items are identical to others)
If there are n items with n1 alike, n2 alike, ..., nk alike, the number of permutations of all n items is
n!/(n1! n2! ... nk!)
Combinations Rule
The number of combinations of r items selected from n different items is
nCr = n!/[(n - r)! r!]
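The counting formulas translate directly to Python's `math.factorial`. This is a sketch; the function names are hypothetical, and Python 3.8+ also provides `math.perm(n, r)` and `math.comb(n, r)`, which give the same results as the first and third helpers. For identical items, the helper takes the size of each group of alike items (a distinct item counts as a group of size 1), so n is the sum of the group sizes.

```python
import math

print(math.factorial(4))   # 4! = 24
print(math.factorial(0))   # 0! = 1 by definition

def permutations_all_different(n, r):
    """nPr = n! / (n - r)!: ordered selections of r items from n, without replacement."""
    return math.factorial(n) // math.factorial(n - r)

def permutations_with_identical(*group_sizes):
    """n! / (n1! n2! ... nk!), where n is the total of the group sizes."""
    n = sum(group_sizes)
    denominator = 1
    for size in group_sizes:
        denominator *= math.factorial(size)
    return math.factorial(n) // denominator

def combinations(n, r):
    """nCr = n! / [(n - r)! r!]: unordered selections of r items from n."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

# Arrangements of the letters in "BOOK": B once, O twice, K once -> 4!/(1! 2! 1!) = 12
print(permutations_with_identical(1, 2, 1))
print(permutations_all_different(5, 2), combinations(5, 2))   # 20 10
```

Permutations count order and combinations do not, so nPr is always r! times nCr.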