Quantitative Analysis for Business

Summarising Data

Part of the reason we analyse data is to see patterns. It is difficult to see patterns in data without summarising the data in some way. The most common ways to summarise data are to convert the data into a summary table, to draw a graph or picture, or to use summary measures like the average. The benefit of this type of summary is that it gives us an instant picture of what is going on in our data set. The problem is that we often lose the detail of the original data.

Frequency Distributions

Suppose you have collected some data on the number of children in people’s families. The variable we are measuring is ‘number of children in the family’. The values that this variable can take are numbers like 0, 1, 2 or 3. These are discrete data, in that they can only be measured as whole values. You can’t measure children in ever more accurate values like you can time or distance. When you can measure data to ever greater accuracy (providing you have an appropriate measuring instrument), we call the data continuous data.

Start by turning your data set on the variable ‘number of children’ into a frequency distribution. A frequency distribution is a table that shows the values that a variable can take in the left-hand column and the frequency with which we observe each value in the right-hand column. For example, it might look like this.

Frequency Distribution Table

Values of Variable        Frequency
Number of Children        Number of families

0                         11
1                         12
2                         13
3                         6
4                         3
5                         1
6                         1

Notice that the right-hand column is the frequency with which we find each value of our variable. We found 11 families with no children, 12 families with 1 child, etc. In all we have data from 47 families. You can see the pattern already. Most families have between 0 and 2 children. Having more than 3 children is quite rare. Note that we can still reconstruct the original data from this table. There would be 11 zeros, 12 ones, 13 twos, etc.
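As a rough sketch of this idea in Python (the variable names are my own, chosen only for illustration), we can rebuild the raw data from the frequencies quoted above and then count the values again to recover the same frequency distribution:

```python
from collections import Counter

# Frequencies quoted in the text: 11 families with 0 children, 12 with 1, etc.
table = {0: 11, 1: 12, 2: 13, 3: 6, 4: 3, 5: 1, 6: 1}

# Rebuild the raw data set: 11 zeros, 12 ones, 13 twos, and so on.
raw_data = [value for value, freq in table.items() for _ in range(freq)]

# Counting each value gives the frequency distribution back again.
frequency = Counter(raw_data)

print("Number of children  Number of families")
for value in sorted(frequency):
    print(f"{value:>18}  {frequency[value]:>18}")
print("Total families:", sum(frequency.values()))
```

Because nothing was grouped, the round trip loses no information, which is the point made in the paragraph above.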

Suppose you have a much larger data set with hundreds of values. If you created a table like the one above, it would go on for pages, and you wouldn’t see any patterns. So in these cases, we group the values together to give a shorter table.  Look at this table of the value of orders received in a company over a sample of 40 orders.  It is called a grouped frequency distribution table.

Grouped Frequency Distribution Table of Value of Orders Received (£)

Values of Variable        Frequency
Value of Order (£)        No. of orders/week

5 < 10                    1
10 < 15                   7
15 < 20                   11
20 < 25                   10
25 < 30                   7
30 < 35                   2
35 < 40                   0
40 < 45                   1
45 < 50                   1

This table shows us that out of 40 observations (add up the frequency column), most of our orders have a value between £10 and £30. It is rare to get orders with values greater than £30. You can see that there were 7 orders with values between £10 and just less than £15, but we cannot tell from this table what the original values of those 7 orders were. We have lost the original data.

Note that the groups of values in the left-hand column do not overlap. The first group is £5 or more but less than (<) £10. So £10 goes in the next group. There is no ambiguity here. We know into which group every value in the data set should go. You must not have overlapping groups. If we had £5-£10 and then £10-£15, we wouldn’t know whether to put an order worth £10 into the first group or the second group.
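The non-overlapping rule can be sketched in Python as follows. The function name group_for and the sample order values are hypothetical, made up for illustration only (the original order values are not available); the groups start at £5 and are £5 wide, as in the table above.

```python
def group_for(value, start=5, width=5):
    """Return (lower, upper) for the group holding value, where lower <= value < upper."""
    if value < start:
        raise ValueError("value is below the first group")
    lower = start + ((value - start) // width) * width
    return lower, lower + width

# Hypothetical order values in whole pounds, invented for this example.
orders = [5, 9, 10, 14, 15, 47]
for value in orders:
    lower, upper = group_for(value)
    print(f"£{value} goes in the group {lower} < {upper}")
```

An order worth exactly £10 lands in the group 10 < 15 and nowhere else, so every value has exactly one home.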

Histograms

A histogram is a picture of a grouped frequency distribution.  Usually we only draw histograms for continuous data.  The picture below shows a histogram of our data.

It looks like a bar chart, but it is not a bar chart. Bar charts only have a numerical scale on one axis; the other axis has some sort of category. A histogram has a scale on both axes. The area of each bar in a histogram represents the frequency. If the bar widths are all the same, then the height of each bar is plotted at the frequency for that group and you can read the histogram like a bar chart.

You can see that I have drawn this like a bar chart, with the values beneath each bar. The height of each bar is set at the value of the frequency. So the first bar tells us that there was only one observation of an order with a value between £5 and £9 inclusive. Really, the scale should go from 0 to 50, but this is difficult to achieve in Excel. Strictly, the line between two bars should fall on a value that is not in the data, so the line between the first and second bars would be at £9.50, which is not in the data because all the data have been rounded to whole pounds. However, it is now usual to put each bar line at the lowest value of the bar to its right. So the left-hand line of the first bar would be at £5, the left-hand line of the second bar at £10, etc.

Problems with Histograms

If the widths of the bars (the groups in your frequency distribution table) are not all the same, you cannot read the histogram like a bar chart. You cannot plot the height of every bar at its frequency, because it is the area, not the height, that represents the frequency. Look at the example below on the value of orders.

Value of Variable         Frequency
Value of Orders (£)       Number of orders/week     Width     Plot height

0 < 5                     1                         £5        2  (x 2)
5 < 10                    3                         £5        6  (x 2)
10 < 20                   11                        £10       11
20 < 30                   25                        £10       25
30 < 40                   41                        £10       41
40 < 50                   36                        £10       36

50 < ...
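The plot heights in the complete rows of this table can be reproduced with a short Python sketch. It assumes that £10 is taken as the standard width, which is what the (x 2) factors for the £5 groups suggest; the names STANDARD_WIDTH and plot_height are my own, and the truncated final row is left out.

```python
STANDARD_WIDTH = 10  # assumed standard group width in pounds

# (lower bound, upper bound, frequency) for the complete rows of the table.
groups = [(0, 5, 1), (5, 10, 3), (10, 20, 11), (20, 30, 25), (30, 40, 41), (40, 50, 36)]

def plot_height(lower, upper, freq, standard_width=STANDARD_WIDTH):
    """Scale the frequency so that bar area, not bar height, represents it."""
    width = upper - lower
    return freq * standard_width / width

heights = [plot_height(*group) for group in groups]

for (lower, upper, freq), height in zip(groups, heights):
    # Area measured in units of the standard width equals the frequency.
    area = height * (upper - lower) / STANDARD_WIDTH
    print(f"{lower} < {upper}: frequency {freq}, plot height {height}, area {area}")
```

A group half as wide as the standard is drawn twice as tall, so its area still matches its frequency.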