Our evidence is from the sample
What is the hypothesis in this example?
We are investigating a possible relationship between PACKAGES and TYPE
We are asked to test the statement that the acceptance of package customers is associated with hotel type, and we adopt this as our null hypothesis.
'Unrelated' is always the null hypothesis
What is the formal null hypothesis for this example?
H0:
What is the alternative hypothesis?
If the statement is not supported by the sample data, the pattern of acceptance of package customers will differ from hotel type to hotel type; there will be evidence of a relationship, and using the table we will be able to describe the relationship in more detail.
'Related' is always the alternative hypothesis
What is the formal alternative hypothesis for this example?
H1:
The Chi-Squared Test of Significance
To test the hypotheses, we calculate the value of the chi-squared ( χ2 ) statistic which is a single measure of the differences between the observed counts and the expected counts across all the cells of the table.
The questions to be answered are:
Do the expected cell counts and the observed sample counts differ by more than chance or random sample error?
Does the data ( observed sample cell counts ) exhibit random variation or a pattern of some type due to the presence of a relationship between the two variables?
You are not expected to reproduce the formula for the chi-squared statistic but it is useful to know how SPSS computes its value. The calculation is completed as follows:
χ2 = Σ (O − E)² / E, where O is the observed count and E is the expected count for each cell
- calculating what the expected count would be for each cell, if there were no relationship between the two variables ( using probabilities )
- calculating the difference between the expected and observed counts for each cell
- squaring these differences
- dividing each squared difference by the expected count for that cell
- adding the results together across all the cells
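The calculation steps above can be sketched outside SPSS. The 2 x 4 table of counts below is hypothetical ( it is NOT the HOTELS data ), but the arithmetic is exactly as described:

```python
# Hypothetical 2 x 4 table of observed counts (NOT the HOTELS data):
# rows = accepts package tours (yes / no), columns = four hotel types
observed = [
    [30, 20, 10, 40],   # yes
    [10, 30, 20, 40],   # no
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count for this cell if the two variables were unrelated
        expected = row_totals[i] * col_totals[j] / grand_total
        # Square the observed-expected difference, divide by expected, add
        chi_sq += (obs - expected) ** 2 / expected

print(round(chi_sq, 2))   # 15.33
```

Each cell contributes ( O − E )² / E, so cells where observed and expected counts differ most contribute most to the statistic.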
Accepts package tours * Type of Hotel Crosstabulation
χ2 Calculation:
Small differences between observed and expected counts produce a small contribution to the χ2 statistic
Large differences between observed and expected counts produce a large contribution to the χ2 statistic
SPSS calculates the value of the χ2 statistic for us:
Chi-Square Tests
Making a decision:
In effect the null hypothesis is presumed innocent until proven guilty
We require a decision rule to help us to test the hypothesis we have stated
The decision rule
We set up a decision rule. A detailed explanation of the theory behind the use of this rule will not be discussed on this module; it will be used as a methodology for determining the existence of a relationship when you are unsure, and no more.
There are many chi-square distributions. The one used is determined by degrees of freedom.
Degrees of freedom are actually calculated using the following formula
df = ( no. of rows in the table − 1 ) x ( no. of columns in the table − 1 ) = ( 2 − 1 )( 4 − 1 ) = (1)(3), giving 3 df ( but the SPSS output calculates the df for you )
DIAGRAM OF CHI-SQUARED DISTRIBUTION and DECISION RULE
The 95% decision point or the critical value is taken from pre-printed chi-square tables using the degrees of freedom (df) given.
Using the tables provided, what is the critical value for this example?
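Instead of printed tables, the same critical value can be obtained in Python with SciPy, using the 3 degrees of freedom calculated above and the 95% level of confidence:

```python
from scipy.stats import chi2

# 95% point of the chi-squared distribution with 3 degrees of freedom,
# i.e. the critical value normally read from printed tables
critical_value = chi2.ppf(0.95, df=3)
print(round(critical_value, 2))   # 7.81
```

A calculated χ2 value larger than this critical value falls in the 5% region of the distribution.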
Applying the decision rule
It is unlikely that we will get a value of the test statistic in the 5% region.
Given that a value lying in the 5% region is very unlikely, we shall reject the null hypothesis if the value of the chi-squared statistic calculated from the sample data falls in this region.
What is the value of the χ2 statistic calculated by SPSS from the sample data? (see SPSS output)
If the value of our test statistic falls inside the 5% region on the diagram
reject H0: in favour of the alternative hypothesis
i.e. on the basis of the sample data there is a relationship between the two variables
If the value of our test statistic falls inside the 95% region on the diagram
we do not reject H0:
i.e. on the basis of the sample data, there is no relationship between the two variables
Analysis Conclusion
Strength of the relationship
Type I and Type II Errors
Hypothesis testing is not foolproof!
It is possible to make an error, but it would be bad luck if we did, given how rarely a value falls in this region
Statistical hypothesis testing is a reasonable decision procedure in the face of two types of unavoidable ignorance:
a) we will never know the truth
b) we will never know whether our decision is correct or incorrect
CHI-SQUARED OUTPUT OF PACKAGES and TYPE
SPSS for Windows
Get the HOTELS data file
Select Analyze
Select Descriptive Statistics
Select Crosstabs
Move the variable Accepts package tours (packages) into the Rows box
Move the variable Type of Hotel (type) into the Columns box
Select Statistics button at the foot of the window
Select Chi-square
Select Continue
Select Cells button at the foot of the window
Select Expected in the counts box to give the expected frequencies.
Select Column in the Percentages box to give the column percentages
Select Continue
Then OK to get the required output.
Lecture 7 - MG2007 Data Analysis
Plotting combinations of variables - categorical & measured
The Car Ownership Study
Objective 1 IDA Plan
Meeting the survey objective
Analysis - Number of Cars and Education in Years
( Box 2 of the IDA Matrix – Categorical RV/Measured factor )
Investigating the relationship between the number of cars a respondent owns and the education of household head in years
Univariate analysis ( SPSS commands - Frequencies or Explore )
Select the appropriate analysis for the data types of the RV and the factor
Bivariate analysis
We are interested in the distribution of the measured factor for each group of the categorical RV. Measured factor is the one with the associated distribution.
Is the mean number of years in education higher for those with two cars?
Is there a significant difference in the means between the two groups?
Distributions and relationships
A clear relationship - the means are different
Clearly no relationship - the means are the same
Example questions to be asked
What is the mean number of years in education for the group owning 1 or less cars? How spread are the values for number of years? The S/D? The minimum and maximum values?
What is the mean number of years in education for the group owning 2 or more cars? How spread are the values for number of years? The S/D? The minimum and maximum values?
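These per-group summaries are what the SPSS Explore command reports. A minimal sketch of the same idea in Python, using made-up years-in-education values rather than the car ownership data:

```python
import statistics

# Made-up years-in-education values (NOT the car ownership survey data),
# grouped by the categorical RV: owns 1 or fewer cars vs 2 or more cars
groups = {
    "1 or fewer cars": [10, 11, 12, 12, 13, 14],
    "2 or more cars":  [12, 14, 15, 16, 16, 17],
}

# Mean, spread (S/D), minimum and maximum for each group, as asked above
summary = {}
for label, years in groups.items():
    summary[label] = {
        "mean": statistics.mean(years),
        "sd": round(statistics.stdev(years), 2),
        "min": min(years),
        "max": max(years),
    }

for label, group_stats in summary.items():
    print(label, group_stats)
```

Comparing the two group means ( and their spreads ) is the first, informal step in judging whether a relationship exists.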
Possible outcomes of the analysis
There are three possible outcomes to the analysis at this stage:
there is an obvious relationship
there is obviously no relationship
there may be a relationship
it is often difficult to be sure about a relationship, and therefore we need to embark on further analysis
in the case of a categorical RV/measured factor, following the comparison of means, further analysis involves either t-testing or one-way analysis of variance, depending on the number of categories of the categorical variable
The Credit Survey
A Scenario
A retail company operates a number of different retail outlets across a number of regions in England and offers payment by credit in each of its stores. The company wishes to undertake an investigation into the nature of its credit transactions, in particular to investigate the factors associated with customers who pay using credit facilities, with the aim of building a profile of the customers using these facilities. This customer profile can then be used to develop a marketing strategy for the credit facilities aimed at a defined target group.
Available company records for customers who had obtained credit in 1993 provided the following data ( saved as Credit.Sav ) for the initial investigation.
Initial amount borrowed - actual amount (£s)
Declared salary - actual amount (£s)
Age - in years
Owner occupier - coded 0 = Owner Occupier, 1 = Not an Owner Occupier
Region - coded 1 = South West, 2 = South East, 3 = London, 4 = Midlands, 5 = North
Objective 1
To investigate the factors associated with the amount customers borrow on credit
- Investigating relationships
Meeting the survey objective
Analysis – Amount Borrowed and Region
( Box 3 of the IDA Matrix – Categorical RV/Measured factor )
The SPSS Compare Means facility provides information to make an initial assessment:
To obtain statistical output
Get the Spreadsheet of the Data File Up on screen
Select Analyze
Select Compare Means
Select Means...
NOTE: When completing comparison of means
Always move the measured variable into the dependent box
Always move the categorical variable into the independent box
If the REGION factor is significant, then the mean AMOUNT borrowed for the different regions will not be equal across all the regions.
If the REGION factor is not significant, then the mean AMOUNT borrowed for the different regions will be equal across all the regions.
Using the Standard Deviation and the Comparison of Means output to investigate variation of amount spent within the groups:
If the means are the same size, or thereabouts, it is possible to directly compare the standard deviations and compare the variability of the groups. If, on the other hand, the means are different then you cannot do a direct comparison of standard deviations.
To compare variability of the two groups, you compute a statistic called the Coefficient of Variation that expresses the standard deviation as a percentage of the mean allowing comparisons to be made between the various groups.
Coefficient of Variation = ( Sample Standard Deviation ÷ Sample Mean ) x 100
To compare the variability of the South West (mean amount £153.4, standard deviation £62.71) and London ( mean amount £183.19, standard deviation £69.07 ):
For the South West
Coefficient of Variation = ( 62.71 ÷ 153.4 ) x 100
= 40.88
For London
Coefficient of Variation = ( 69.07 ÷ 183.19 ) x 100
= 37.70
The standard deviation for the South West group is 40.88% of the mean and the standard deviation for the London group is 37.70% of the mean. Comparing the two percentages, there is more variability in the values for amount borrowed on credit in the South West than there is in the London group.
A Coefficient of Variation greater than 100% indicates a relatively large spread of values.
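As a quick check of the worked example above, the same two coefficients can be computed in Python ( figures taken from the text ):

```python
# Coefficient of Variation: the standard deviation expressed as a
# percentage of the mean (means and SDs quoted in the text above)
def coefficient_of_variation(sd, mean):
    return sd / mean * 100

cv_south_west = coefficient_of_variation(62.71, 153.4)    # South West
cv_london = coefficient_of_variation(69.07, 183.19)       # London

print(round(cv_south_west, 2))   # 40.88
print(round(cv_london, 2))       # 37.7
```

Because the statistic is scaled by the mean, the two percentages can be compared directly even though the group means differ.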
Analysing the distribution of Amount in each region
The output from the Explore command, including boxplots helps us to analyse and understand the distribution of amount borrowed on credit in each region.
To obtain boxplots - this will be covered in detail in the practicals
Get the Spreadsheet of the Data File Up on screen
Select Graphs
Select Boxplot
Select Simple
Select Define...
Move the measured variable into the Variable box
Move the categorical variable into the Category Axis box
The following is the boxplot output, which should be analysed together with the numerical output from the Explore command.
Features of this output:
Analysis Conclusion:
When the Comparison of Means procedure leads to uncertainty, we must complete further analysis and carry out the appropriate statistical test to enable us to make a decision, with a specified level of confidence in our results.
Lecture 8 - MG2007 Data Analysis
Identifying Relationships – two group means
Box 2 & 3 further analysis where the categorical variable has two categories only
COMPARISON OF MEANS (Categorical RV HOUSE, Measured Factor AMOUNT)
SPSS for Windows: To obtain statistical output
Get the Spreadsheet of CREDIT.SAV up on screen
Select Analyze
Select Compare Means
Select Means....
Move the variable INITIAL AMOUNT BORROWED (amount)
(always the measured variable ) into the Dependent List:
Move the variable OWNER OCCUPIER OR NOT (house)
(the categorical variable ) into the Independent List:
OK
Comparison of means
Examining the means table output, there are three possible outcomes:
- there is a clear relationship - the means are very different
- there is clearly no relationship - the means are almost identical
- we are unsure
Where we are ‘unsure’ a formal statistical test is required to help us decide whether the means are significantly different or whether the difference may simply be down to sampling error. We require further analysis in the form of a t-test and the following is background to this statistical test.
Statistical Inference
The process of drawing conclusions about characteristics of the population based
on what is known about the sample data
Characteristics of a population are referred to as parameters
Characteristics of a sample are called statistics
Parameter - a measurement of the population as a whole, e.g. the mean, median, mode or standard deviation
Statistic - a measurement of the sample, e.g. the sample mean, the sample median or the sample mode
In statistics, as in any language, we use symbols to stand for something. The symbols used to represent the above characteristics are: μ for the population mean and σ for the population standard deviation; x̄ for the sample mean and s for the sample standard deviation.
It's important to practise the use of these symbols, firstly to gain familiarity ( if not confidence! ) in their use in statistics and secondly because many theoretical texts use them in formulae. Should you become involved in data analysis in future employment, a basic knowledge of statistical symbols will be invaluable and you may actually come to prefer them as a shorthand rather than the alternative, wordy descriptions.
Every statistic we obtain from the sample is an estimate of a particular population characteristic or parameter.
the statistic and the parameter are unlikely to be the same but they will be quite similar
generally, the larger the sample we take, the smaller the difference between statistic and parameter
So, generally
A parameter is fixed but unknown
A statistic is known but may vary from sample to sample
Estimating how representative the sample is - the sampling error
The difference between a parameter and a sample statistic is called the sampling error.
It arises because we are dealing with a sample.
We will discuss two elements that make up the sampling error ( also see lecture 2 ):
- Random sampling error
- Bias
Sampling theory enables the researcher:
- to generalise from the sample data with some confidence that the sample is representative of a much broader population
- to construct a sample using procedures designed to minimise bias – the quality of the sampling procedures is vital and has significant implications for the analysis of the data and the quality of the conclusions drawn
Using Statistical Inference to estimate the population mean from the sample mean - finding the mean of some variable in a population by looking at the values in a sample
Estimating the population mean from the sample mean
- estimation is fairly accurate if the sample is large ( > 30 ) or the measured variable of interest is approximately normally distributed
- the more closely the sample represents the population from which it was drawn, the more reliable the conclusions about the population ( based on the sample )
- some inaccuracy is inevitable because the method relies on substituting the sample mean for the population mean and the sample standard deviation for the population standard deviation ( N.B. sampling error )
Note for those who have studied statistics before:
You may recognise the reliance on a general result called the Central Limit Theorem
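To see the idea concretely, here is a minimal Python sketch of estimating a population mean from a sample, using made-up data and the large-sample ( n > 30 ) approximation mentioned above:

```python
import math
import statistics

# Made-up sample of a measured variable (36 observations, so n > 30 and
# the large-sample approximation applies); the population mean is unknown
sample = [46 + (i % 9) for i in range(36)]

n = len(sample)
x_bar = statistics.mean(sample)      # sample mean: our point estimate
s = statistics.stdev(sample)         # sample SD stands in for the population SD
standard_error = s / math.sqrt(n)    # typical size of the random sampling error

# Approximate 95% interval for the unknown population mean (z = 1.96)
lower = x_bar - 1.96 * standard_error
upper = x_bar + 1.96 * standard_error
print(x_bar, round(lower, 2), round(upper, 2))
```

The parameter itself stays fixed and unknown; the statistic x̄, and hence the interval, would vary from sample to sample.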
Hypothesis Testing for Box 2 and Box 3: The Student’s T-Test
Its purpose is to draw an inference about the target population based on what we see in the sample
Our hypothesis is about the population
Our evidence is from the sample
Investigating the relationship between AMOUNT and HOUSE
- What is the formal null hypothesis for this example?
H0:
- What is the formal alternative hypothesis for this example?
H1:
The decision rule for deciding between H0: & H1:
Diagram of the T-distribution and the decision rule
The critical region consists of those values of the test statistic, calculated from the sample data, that provide strong evidence of the alternative hypothesis
Hence a value calculated from the sample data that falls in this region leads to a rejection of the null hypothesis in favour of the alternative.
We must now refer to the evidence contained in the sample to see if it supports the null hypothesis. This requires the calculation from the sample data of two pieces of information: the t-calc value and its degrees of freedom (df). SPSS provides this facility as follows:
T-TEST OUTPUT FOR AMOUNT and the two-level factor HOUSE
SPSS for Windows Get the data file CREDIT.SAV
Select Analyze
Select Compare Means
Select Independent-Samples T-Test
Move the measured variable INITIAL AMOUNT BORROWED (amount)
into the Test Variable(s): box
Move the attribute variable OWNER OCCUPIER OR NOT (house)
into the Grouping Variable box
Select the Define Groups button at the foot of the window
Type 0 ( the code for group 1 of the attribute variable ) into the Group 1: box
Type 1 ( the code for group 2 of your attribute variable ) into the Group 2 box
Select Continue
OK
Unequal or equal variances?
Levene's test for equality of variances
The value of the test statistic t-calc ( short for the value of t calculated from the sample data ) is calculated by SPSS; for this example it is −4.758, and the degrees of freedom are 380.
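The same pair of tests can be reproduced outside SPSS with SciPy. The two groups below are hypothetical ( they are NOT the CREDIT.SAV data ); the point is the workflow of checking Levene's test before choosing the form of the t-test:

```python
from scipy import stats

# Hypothetical amounts borrowed (NOT the CREDIT.SAV data) for the two
# levels of the HOUSE factor
owner_occupiers = [100, 110, 120, 130, 140]
renters = [150, 160, 170, 180, 190]

# Levene's test: can we treat the two group variances as equal?
lev_stat, lev_p = stats.levene(owner_occupiers, renters)
equal_var = lev_p > 0.05   # large p-value: no evidence of unequal variances

# Independent-samples t-test, pooling the variances only if Levene allows it
t_calc, p_value = stats.ttest_ind(owner_occupiers, renters, equal_var=equal_var)
print(round(float(t_calc), 3), float(p_value) < 0.05)   # -5.0 True
```

A negative t-calc here simply reflects that the first group's mean is below the second group's, just as the SPSS output's −4.758 does for owner-occupiers versus renters.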
Looking up the threshold value of t in statistical tables:
We are working at a 95% level of confidence and completing a two-tailed test.
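With SciPy the threshold value from the tables can be computed directly: a two-tailed test at the 95% level puts 2.5% in each tail, so we look up the 97.5th percentile of the t-distribution with the 380 degrees of freedom reported by SPSS:

```python
from scipy.stats import t

# Two-tailed test at the 95% level of confidence: 2.5% in each tail, so we
# need the 97.5th percentile; df = 380 as reported in the SPSS output
critical_value = t.ppf(0.975, df=380)
print(round(critical_value, 2))   # 1.97
```

Any t-calc whose absolute value exceeds this threshold falls in the critical region.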
Is the sample data compatible with the null hypothesis?
It is unlikely that we will get a value of t in the critical region. The critical region consists of all those values of the test statistic that provide strong evidence of the alternative hypothesis; there is only a 5% probability that we will observe a value in this region. Hence a value in this region will lead to a rejection of the null hypothesis.
The conclusions?
Describing the relationship:
Confidence Intervals - calculated using the SPSS output
95% confident that the difference between the mean amount borrowed by owner-occupiers and the mean amount borrowed by those renting lies in the interval -51.56 and -21.41.
95% confident that owner-occupiers borrow, on average, between £21.41 and £51.56 less than those renting. Values anywhere in this range are possible.
Remember:
- we cannot be 100% confident unless we carry out a census
- hypothesis testing is not foolproof!
- we can make one of two errors, known as Type I and Type II errors
Type I error reject the null hypothesis when in fact it is true
Type II error accept the null hypothesis when in fact it is false and should be rejected
The only way to arrange things so that the probability of both Type I and Type II errors is minimised is to use large samples ( > 30 )
For information only,
Decision