
# Identifying Relationships - Introduction to Statistical Inference


Lecture 6 - MG2007 Data Analysis

Identifying Relationships - Introduction to Statistical Inference

## BOX 1: Further Analysis (chi-squared)

Categorical response variable and categorical factor

| RV | C/M | Factor | C/M | Analysis | Further Analysis |
|---|---|---|---|---|---|
| Accepts package tours (PACKAGES) | C | Type of Hotel (TYPE) | C | Crosstabs | Chi-Square |

## Cross-tabulations

A cross-tabulation gives the frequency count for each combination of categories on the two variables of interest.

Convention for constructing the table:

The column variable is the independent variable (the factor).

The row variable is the dependent variable (the RV).

Accepts package tours * Type of Hotel Crosstabulation

| Accepts package tours | Class 1 Luxury | Class 2 Medium | Class 3 Basic | Class 4 B and B | Total |
|---|---|---|---|---|---|
| Yes | 5 | 2 | 14 | 25 | 46 |
| No | 2 | 10 | 15 | 12 | 39 |
| Total | 7 | 12 | 29 | 37 | 85 |

Example information given in the table:

How many of the sample hotels take package customers?

How many of the sample hotels do not take package customers?

How many of the sample hotels are medium sized?

How many of the sample hotels are bed and breakfast only?

How many of the luxury hotels also take packages?

The raw counts on their own are not very helpful:

Of the 7 luxury hotels, 5 take packages.

Of the 39 hotels that do not take packages, 12 are bed and breakfast establishments.

But are these figures in any way significant?

The first step in analysing the table is to ask SPSS to calculate the percentage of cases in each cell. The percentages are calculated in the direction of the factor. Since the factor is the column variable, these are column percentages.

Accepts package tours * Type of Hotel Crosstabulation

| Accepts package tours | | Class 1 Luxury | Class 2 Medium | Class 3 Basic | Class 4 B and B | Total |
|---|---|---|---|---|---|---|
| Yes | Count | 5 | 2 | 14 | 25 | 46 |
| | % within Type of Hotel | 71.4% | 16.7% | 48.3% | 67.6% | 54.1% |
| No | Count | 2 | 10 | 15 | 12 | 39 |
| | % within Type of Hotel | 28.6% | 83.3% | 51.7% | 32.4% | 45.9% |
| Total | Count | 7 | 12 | 29 | 37 | 85 |
| | % within Type of Hotel | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

Out of the 85 respondents taking part in the survey:

7 hotels fall into the group CLASS 1 LUXURY. Of those:

• 5 or 71.4% do take package customers, compared to 54.1% for all hotels
• 2 or 28.6% do not take package customers, compared to 45.9% for all hotels

12 hotels fall into the group CLASS 2 MEDIUM. Of those:

• 2 or 16.7% do take package customers, compared to 54.1% for all hotels
• 10 or 83.3% do not take package customers, compared to 45.9% for all hotels
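The column percentages quoted above can be checked by hand. The following is a minimal sketch in Python rather than SPSS; the variable names are illustrative and not part of the lecture notes. It divides each cell count by its column total, which is what "percentages in the direction of the factor" amounts to here.

```python
# Observed counts from the crosstabulation (rows: Yes/No, columns: hotel type).
hotel_types = ["Class 1 Luxury", "Class 2 Medium", "Class 3 Basic", "Class 4 B and B"]
observed = {
    "Yes": [5, 2, 14, 25],
    "No": [2, 10, 15, 12],
}

# Column totals: the number of sampled hotels of each type.
column_totals = [sum(column) for column in zip(*observed.values())]

# Percentage of each column falling in each row, rounded to one decimal place.
column_percentages = {
    answer: [round(100 * count / total, 1) for count, total in zip(counts, column_totals)]
    for answer, counts in observed.items()
}

for answer, percentages in column_percentages.items():
    print(answer, dict(zip(hotel_types, percentages)))
```

The printed figures should agree with the "% within Type of Hotel" rows of the SPSS table.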

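| Accepts package tours | | Class 1 Luxury | Class 2 Medium | Class 3 Basic | Class 4 B and B | Total |
|---|---|---|---|---|---|---|
| Yes | Count | 5 | 2 | 14 | 25 | 46 |
| | Expected Count | 3.8 | 6.5 | 15.7 | 20.0 | 46.0 |
| No | Count | 2 | 10 | 15 | 12 | 39 |
| | Expected Count | 3.2 | 5.5 | 13.3 | 17.0 | 39.0 |
| Total | Count | 7 | 12 | 29 | 37 | 85 |
| | Expected Count | 7.0 | 12.0 | 29.0 | 37.0 | 85.0 |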
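The expected counts in this table are the counts we would see if acceptance of package tours were unrelated to hotel type. The notes leave the arithmetic to SPSS; as a sketch, the standard formula (row total × column total ÷ overall total) reproduces them. The variable names below are illustrative.

```python
# Observed counts (rows: Yes/No, columns: the four hotel types).
observed = [
    [5, 2, 14, 25],   # accepts package tours: Yes
    [2, 10, 15, 12],  # accepts package tours: No
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under "no relationship":
# (row total * column total) / overall total.
expected = [
    [row_total * column_total / n for column_total in column_totals]
    for row_total in row_totals
]

for row in expected:
    print([round(value, 1) for value in row])
```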

χ² calculation:

Small differences between observed and expected counts produce a small contribution to the χ² statistic.

Large differences between observed and expected counts produce a large contribution to the χ² statistic.

SPSS calculates the value of the χ² statistic for us:

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 10.717 | 3 | .013 |
| Likelihood Ratio | 11.274 | 3 | .010 |
| Linear-by-Linear Association | 2.615 | 1 | .106 |
| N of Valid Cases | 85 | | |
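To see where the Pearson figure comes from, the sketch below sums (observed - expected)² / expected over the eight cells. This is the standard Pearson formula, which the notes do not spell out. The significance value uses a closed-form upper-tail probability that holds only for 3 degrees of freedom; it is included here as an assumption-laden convenience, not as the way SPSS computes it.

```python
import math

observed = [
    [5, 2, 14, 25],   # accepts package tours: Yes
    [2, 10, 15, 12],  # accepts package tours: No
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in column_totals] for r in row_totals]

# Pearson chi-square: each cell contributes (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)


def chi_square_upper_tail_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi_square_upper_tail_df3(chi_square)
print(round(chi_square, 3), round(p_value, 3))
```

The two printed values can be compared with the Pearson Chi-Square row of the SPSS output.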

Making a decision:

In effect, the null hypothesis is presumed innocent until proven guilty.

We require a decision rule to help us test the hypothesis we have stated.

The decision rule

We set up a decision rule. The theory behind the rule is not discussed on this module; the rule is used only as a method for deciding whether a relationship exists when you are unsure.

There are many chi-square distributions. The one used is determined by degrees of freedom.

Degrees of freedom are calculated using the following formula:

df = (no. of rows in the table - 1)(no. of columns in the table - 1) = (2 - 1)(4 - 1) = (1)(3) = 3

giving 3 df (the SPSS output calculates the df for you).
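As a quick check of the formula, this small sketch derives the degrees of freedom from the shape of the table of counts, excluding the Total row and column. The variable names are illustrative.

```python
# Table of observed counts: 2 rows (Yes/No) by 4 columns (hotel types).
observed = [
    [5, 2, 14, 25],
    [2, 10, 15, 12],
]

n_rows = len(observed)
n_columns = len(observed[0])

# df = (rows - 1) * (columns - 1)
df = (n_rows - 1) * (n_columns - 1)
print(df)
```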

DIAGRAM OF CHI-SQUARED DISTRIBUTION and DECISION RULE

The 95% decision point or the critical value is taken from pre-printed chi-square tables using the degrees of freedom (df) given.

Using the tables provided, what is the critical value for this example?

Applying the decision rule

If the null hypothesis is true, it is unlikely that we will get a value of the test statistic in the 5% region.

Given that a value lying in the 5% region is very unlikely under the null hypothesis, we shall reject the null hypothesis if the value of the chi-squared statistic calculated from the sample data falls in this region.

What is the value of the χ² statistic calculated by SPSS from the sample data? (see SPSS output)

If the value of our test statistic falls inside the 5% region on the diagram

reject H0 in favour of the alternative hypothesis

i.e. on the basis of the sample data, there is a relationship between the two variables

If the value of our test statistic falls inside the 95% region on the diagram

we do not reject H0

i.e. on the basis of the sample data, there is no evidence of a relationship between the two variables
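The decision rule can be sketched as a comparison between the calculated statistic and the critical value. The small lookup below copies the 5% critical values for 1 to 4 degrees of freedom from standard chi-square tables; the function and dictionary names are hypothetical and serve only as illustration.

```python
# 5% (upper-tail) critical values from standard chi-square tables.
CRITICAL_VALUES_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def decide(chi_square, df):
    """Apply the decision rule at the 95% level for the given degrees of freedom."""
    critical_value = CRITICAL_VALUES_5_PERCENT[df]
    if chi_square > critical_value:
        return "reject H0"
    return "do not reject H0"


# Values from the SPSS output for packages by type of hotel.
decision = decide(10.717, 3)
print(decision)
```

With the SPSS value of 10.717 on 3 df, the statistic lies beyond the critical value, so the rule points towards rejecting the null hypothesis.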

Analysis Conclusion

Strength of the relationship

Type I and Type II Errors

Hypothesis testing is not foolproof!

It is possible to make an error, although we would be unlucky to do so, because values in the 5% region occur only rarely when the null hypothesis is true.

Statistical hypothesis testing is a reasonable decision procedure in the face of two unavoidable kinds of ignorance:

a) we will never know the truth

b) we will never know whether our decision is correct or incorrect

CHI-SQUARED OUTPUT OF PACKAGES and TYPE

SPSS for Windows

Get the HOTELS data file

Select Analyze

Select Descriptive Statistics

Select Crosstabs

Move the variable Accepts package tours (packages) into the Rows box

Move the variable Type of Hotel (type) into the Columns box

Select Statistics button at the foot of the window

Select Chi-square

Select Continue

Select Cells button at the foot of the window

Select Expected in the counts box to give the expected frequencies.

Select Column in the Percentages box to give the column percentages

Select Continue

Then select OK to get the required output.

Lecture 7 - MG2007 Data Analysis

### Plotting combinations of variables - categorical & measured

#### Objective 1 IDA Plan

| Response variable | C/M | Factor | C/M | Initial method of analysis | Further analysis | Result |
|---|---|---|---|---|---|---|
| Number of cars (NUMCARS) | C | INCOME | C | Crosstabs | Chi-squared | |
| | | No. of family members (FAMILY) | C | Crosstabs | Chi-squared | |
| | | No. of years in education (EDUCATE) | M | Comparison of means | T-test | |
| | | Region of residence (REGION) | C | Crosstabs | Chi-squared | |


Type 0 (the code for group 1 of the attribute variable) into the Group 1: box

Type 1 (the code for group 2 of the attribute variable) into the Group 2: box

Select Continue

Select OK

Unequal or equal variances?

Levene test for equality of Variances

Sig. > 0.05: Equal variances assumed

Sig. < 0.05: Equal variances not assumed
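The rule for choosing which row of the t-test output to read can be written as a one-line check. This is only a sketch: the function name is hypothetical, the two Sig. values in the usage lines are made up for illustration (the notes do not quote the Levene result), and a Sig. of exactly 0.05 is treated as "not assumed", which the notes leave unspecified.

```python
def row_to_read(levene_sig, alpha=0.05):
    """Pick the row of the SPSS t-test output based on Levene's test Sig. value."""
    if levene_sig > alpha:
        return "Equal variances assumed"
    return "Equal variances not assumed"


# Hypothetical Sig. values, for illustration only.
print(row_to_read(0.20))
print(row_to_read(0.01))
```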

The value of the test statistic t-calc (short for the value of t calculated from the sample data) is calculated by SPSS. For this example it is -4.758, and the degrees of freedom are 380.

Looking up the threshold value of t in statistical tables:

We are working at a 95% level of confidence and completing a two-tailed test.

Is the sample data compatible with the null hypothesis?

If the null hypothesis is true, we are unlikely to get a value of t in the critical region. The critical region consists of all those values of the test statistic that provide strong evidence in favour of the alternative hypothesis. If the null hypothesis is true, there is only a 5% probability that we will observe a value in this region. Hence a value in this region leads to rejection of the null hypothesis.
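A rough check of this two-tailed decision is sketched below. Instead of looking up t tables, it uses the standard normal distribution as an approximation, which is an assumption that is reasonable only because the degrees of freedom (380) are large; the exact t threshold is slightly bigger than the normal one.

```python
from statistics import NormalDist

t_calc = -4.758   # value of t calculated by SPSS
df = 380          # degrees of freedom reported by SPSS (large, so normal approx. is used)

# Two-tailed test at the 95% level: 2.5% in each tail.
# Normal approximation to the t threshold (assumption; not taken from t tables).
critical_value = NormalDist().inv_cdf(0.975)

# Reject H0 if t-calc lies in either tail beyond the threshold.
reject_h0 = abs(t_calc) > critical_value
print(round(critical_value, 2), reject_h0)
```

Because |t-calc| is far beyond the threshold, using the exact t table value instead of the approximation would not change the decision.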

The conclusions?

Describing the relationship:

Confidence Intervals - calculated using the SPSS output

We are 95% confident that the difference between the mean amount borrowed by owner-occupiers and the mean amount borrowed by those renting lies between -51.56 and -21.41.

In other words, we are 95% confident that owner-occupiers borrow, on average, between £21.41 and £51.56 less than those renting. Values anywhere in this range are plausible.
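The step from the interval on the difference to the statement in pounds can be sketched as follows. The variable names are illustrative, and the difference is assumed to be owner-occupiers minus renters, as the wording above implies.

```python
# 95% confidence interval for (owner-occupier mean - renter mean), from the SPSS output.
lower, upper = -51.56, -21.41

# If the interval excludes zero, the data are not compatible with "no difference".
contains_zero = lower <= 0 <= upper

# Both limits are negative, so owner-occupiers borrow less; report the sizes in pounds.
smallest_gap, largest_gap = sorted((abs(lower), abs(upper)))

print(contains_zero)
print(f"Owner-occupiers borrow between £{smallest_gap} and £{largest_gap} less on average")
```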

Remember:

• we cannot be 100% confident unless we carry out a census
• hypothesis testing is not foolproof!
• we can make one of two errors, known as Type I and Type II errors

Type I error: reject the null hypothesis when in fact it is true

Type II error: fail to reject the null hypothesis when in fact it is false and should be rejected

The only way to arrange things so that the probabilities of both Type I and Type II errors are kept small is to use large samples (> 30)

For information only, the possible outcomes of a decision:

| | Do not reject H0 | Reject H0 |
|---|---|---|
| H0 true | Correct decision | Type I error |
| H0 false | Type II error | Correct decision |
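The four outcomes can also be expressed as a small lookup. This is purely illustrative: in practice we never know whether H0 is true, so the function (whose name is hypothetical) describes the possibilities rather than something we could run on real data.

```python
def classify_outcome(h0_is_true, h0_rejected):
    """Return the outcome of a test decision given the (unknowable) truth about H0."""
    if h0_is_true and h0_rejected:
        return "Type I error"
    if not h0_is_true and not h0_rejected:
        return "Type II error"
    return "Correct decision"


for h0_is_true in (True, False):
    for h0_rejected in (False, True):
        print(h0_is_true, h0_rejected, classify_outcome(h0_is_true, h0_rejected))
```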

MG2007, A. Haines
