After I have collected my primary data I will collect secondary data to further test my hypothesis.
RAW DATA
The letters in the third column describe the type of land around the reading point. This is the key to those letters.
BIASES
I believe there may well be a bias caused by the height of the nearby buildings, as I explained above, so I want to look at my data to see whether it is biased in this way. The bar chart below shows the number of recordings with buildings of 1 story, 2 stories … etc.
As shown above, there are 1 or 2 more 1-story and 3-story buildings surrounding the survey points than buildings of other heights.
The table below shows the mean humidity and temperature for each land use / building height, and the mean humidity divided by the mean temperature.
Grass and water = 0/1
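The calculation behind a table like this can be sketched in Python. This is only an illustration: the readings in it are made-up placeholders, not my survey results, and the function name summarise_by_land_use is a label invented for this sketch.

```python
from collections import defaultdict
from statistics import mean


def summarise_by_land_use(readings):
    """Group (land_use, temperature, humidity) readings and return, for each
    land use, the mean humidity, the mean temperature and their ratio."""
    groups = defaultdict(list)
    for land_use, temperature, humidity in readings:
        groups[land_use].append((temperature, humidity))

    summary = {}
    for land_use, pairs in groups.items():
        mean_temp = mean(t for t, _ in pairs)
        mean_hum = mean(h for _, h in pairs)
        summary[land_use] = (mean_hum, mean_temp, mean_hum / mean_temp)
    return summary


# Placeholder readings for illustration only, NOT the survey data.
example_readings = [
    ("grass", 24.0, 40.0),
    ("grass", 26.0, 38.0),
    ("tarmac", 25.0, 36.0),
]

example_summary = summarise_by_land_use(example_readings)
for land_use, (hum, temp, ratio) in example_summary.items():
    print(f"{land_use}: humidity {hum:.2f}%, temperature {temp:.2f}°C, ratio {ratio:.3f}")
```

Grouping the readings first and averaging afterwards keeps each land use separate, so a land use with many readings does not swamp one with only a few.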
The first thing I will do with this data is plot it on a scatter graph to find out whether there is a correlation in my results.
SCATTER GRAPH
In all scatter graphs in this paper the y axis measures humidity, and the x axis measures temperature.
RAW DATA
As shown above, there are clearly 2 anomalies here which have disturbed the line of best fit. The red line I have added to the picture above is clearly closer to the true line of best fit than the one that autograph has put in for me. This means that the anomalies have affected the line of best fit, so I will remove them for my next graph to get a more accurate result.
As you can see, the correlation is poor (0.50501), so I will remove the anomalies from the graph.
(see graph on following page)
NO ANOMALIES
This is a scatter graph for all of the data I collected excluding the anomalous results (23.7, 32), (23.5, 53) and (22.8, 23).
The diagonal line is the line of best fit given to me by “autograph”.
I have removed the anomalies from the previous graph and obtained a much better line of best fit. This graph is much more accurate, as the correlation coefficient is -0.8013, which is much closer to -1, the value for a perfect negative correlation.
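The effect of an anomaly on the correlation coefficient can be checked by hand. The sketch below uses the standard product-moment formula, which I am assuming is the one autograph uses. The points are made up for illustration and are not my readings; they lie exactly on a downward line, so the coefficient without the anomaly should come out at -1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired data (between -1 and 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up (temperature, humidity) points for illustration only.
points = [(21.0, 50.0), (22.0, 47.0), (23.0, 44.0), (24.0, 41.0), (25.0, 38.0)]
outlier = (23.5, 60.0)  # one invented anomaly, well away from the trend

r_clean = pearson_r([t for t, _ in points], [h for _, h in points])

with_outlier = points + [outlier]
r_with_outlier = pearson_r([t for t, _ in with_outlier], [h for _, h in with_outlier])

print(f"without anomaly: r = {r_clean:.4f}")
print(f"with anomaly:    r = {r_with_outlier:.4f}")
```

A single point far from the trend pulls the coefficient towards 0, which is the same kind of change seen between the two graphs above.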
OTHER SCATTER GRAPHS
Since I have found a graph with a strong correlation, I will now see whether my data can be divided up to give an even stronger correlation than the one I already have.
I first tried just grass and water land type as we did not take the.
The Correlation = 0.09644
I then tried all the buildings/tarmac … etc.
The Correlation = 0.1085
As the correlation in both cases was so low I am not showing these graphs.
The following graph takes account of all land types (grass, water, tarmac … etc) up to a height of 3 stories.
The blue line on the graph is the line of best fit given to me by “autograph”.
There appears to be only one anomaly.
However, the correlation coefficient was only 0.7698, which, although good, is not as strong as that of the graph on page 7.
(This is because I removed the anomalies from the graph on page 7 but not from this one.)
STEM AND LEAF
HUMIDITY
0:
10:
20: 3
30: 2 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9
40: 0 0 0 0 1 1 1 1 2 2 2 2 2 3 3 3 5 5 9
50: 1 1 1 2 3
60:
70:
80:
90:
100:
Here the data is spread over a fairly small range, with the majority in the high 30s and low 40s. There are a few anomalous results such as 23 and 53 (the range is therefore 30).
Mean humidity: 39.82%
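As a check, the mean and range can be recalculated straight from the humidity stem and leaf diagram. This is a minimal sketch: the variable names are chosen for this illustration, and the leaves are copied from the diagram above.

```python
# Humidity stem and leaf diagram copied from above (stem = tens, leaves = units).
humidity_stems = {
    20: "3",
    30: "2 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9",
    40: "0 0 0 0 1 1 1 1 2 2 2 2 2 3 3 3 5 5 9",
    50: "1 1 1 2 3",
}

# Rebuild every reading by adding each leaf to its stem.
humidity = [
    stem + int(leaf)
    for stem, leaves in humidity_stems.items()
    for leaf in leaves.split()
]

humidity_mean = sum(humidity) / len(humidity)
humidity_range = max(humidity) - min(humidity)

print(len(humidity), round(humidity_mean, 2), humidity_range)  # prints: 60 39.82 30
```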
TEMPERATURE
0:
10:
20: 0.9 1 1.2 1.3 1.5 2.6 2.7 2.8 3 3.3 3.4 3.5 3.5 3.5 3.5 3.7 3.7 3.9 3.9 4 4 4.1 4.2 4.2 4.2 4.2 4.3 4.3 4.3 4.4 4.4 4.5 4.6 4.6 4.6 4.6 4.6 4.7 4.7 4.7 4.8 4.9 4.9 5 5 5.1 5.2 5.2 5.3 5.3 5.3 5.5 5.5 5.5 5.7 5.8 5.9 5.9 6.1 6.3
30:
40:
50:
60:
70:
80:
90:
100:
The problem with this stem and leaf diagram is that all the data falls under 1 group (20), so I have redrawn it with a stem for every 1°C instead of every 10°C. I have done this because all the values are so close together.
20: .9
21: .0 .2 .3 .5
22: .6 .7 .8
23: .0 .3 .4 .5 .5 .5 .5 .7 .7 .9 .9
24: .0 .0 .1 .2 .2 .2 .2 .3 .3 .3 .4 .4 .5 .6 .6 .6 .6 .6 .7 .7 .7 .8 .9 .9
25: .0 .0 .1 .2 .2 .3 .3 .3 .5 .5 .5 .7 .8 .9 .9
26: .1 .3
27:
28:
29:
As you can see, this makes the data much more widely spread, so I can point out groups such as the 20.9 to 21.5 section, which is a long way off the next value at 22.6. Although there is quite a large gap between the first small group and the main group, the rest are pretty close together.
I have found out the mean for each section of my data.
Mean temperature: 24.22°C
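The regrouping into 1°C stems and the mean can both be reproduced from the leaves of the first temperature diagram. This is a sketch under my own choice of names. It works in whole tenths of a degree so that the stems and leaves come out as exact integers.

```python
from collections import defaultdict

# Leaves of the first temperature diagram above; each is added to the stem 20.
first_diagram_leaves = (
    "0.9 1 1.2 1.3 1.5 2.6 2.7 2.8 3 3.3 3.4 3.5 3.5 3.5 3.5 3.7 3.7 3.9 3.9 "
    "4 4 4.1 4.2 4.2 4.2 4.2 4.3 4.3 4.3 4.4 4.4 4.5 4.6 4.6 4.6 4.6 4.6 "
    "4.7 4.7 4.7 4.8 4.9 4.9 5 5 5.1 5.2 5.2 5.3 5.3 5.3 5.5 5.5 5.5 5.7 "
    "5.8 5.9 5.9 6.1 6.3"
)
temperatures = [20 + float(leaf) for leaf in first_diagram_leaves.split()]

# Split each value into a whole-degree stem and a tenths leaf.
stems = defaultdict(list)
for value in temperatures:
    tenths = round(value * 10)  # e.g. 24.6 becomes 246
    stems[tenths // 10].append(tenths % 10)

for stem in sorted(stems):
    print(f"{stem}: " + " ".join(f".{leaf}" for leaf in sorted(stems[stem])))

temperature_mean = sum(temperatures) / len(temperatures)
print(f"Mean temperature: {temperature_mean:.2f}°C")
```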
HISTOGRAMS
I have placed histograms directly after my stem and leaf section as they are so closely related to each other.
HUMIDITY
The bar chart above displays the data from the stem and leaf diagram on page 8.
The items circled on the left and right of the graph represent the anomalous results shown in the stem and leaf diagram on page 8.
TEMPERATURE
The bar chart above consists of 2 groups, which I pointed out earlier in my stem and leaf diagram. The first group, which is obviously a group of anomalies, is the one encircled.
The rest of the data seems to go up steadily and down steadily which indicates that the data in my stem and leaf diagram is accurate.
WHISKER AND BOX PLOT
In this graph I have decided to compare my humidity and my temperature to look at the difference of spread and average between the 2.
The spread of the data and the averages are completely different, so from this graph alone I would say there is no correlation between the 2 box and whisker plots. However, I have already shown that there is a correlation between them in my scatter graphs on pages 5, 6 and 7.
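A box and whisker plot is drawn from five numbers for each variable: the minimum, lower quartile, median, upper quartile and maximum. It describes each variable on its own, which is why it cannot show how the pairs of readings are linked. The sketch below works out those five numbers from my stem and leaf data. The quartile rule used (Python's "inclusive" method) is an assumption, as different textbooks and programs use slightly different rules, and autograph may use another.

```python
from statistics import quantiles

# Humidity stem and leaf data (stem = tens, leaves = units).
humidity_stems = {
    20: "3",
    30: "2 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9",
    40: "0 0 0 0 1 1 1 1 2 2 2 2 2 3 3 3 5 5 9",
    50: "1 1 1 2 3",
}
humidity = [s + int(leaf) for s, leaves in humidity_stems.items() for leaf in leaves.split()]

# Temperature leaves from the first temperature diagram (each added to 20).
temperature_leaves = (
    "0.9 1 1.2 1.3 1.5 2.6 2.7 2.8 3 3.3 3.4 3.5 3.5 3.5 3.5 3.7 3.7 3.9 3.9 "
    "4 4 4.1 4.2 4.2 4.2 4.2 4.3 4.3 4.3 4.4 4.4 4.5 4.6 4.6 4.6 4.6 4.6 "
    "4.7 4.7 4.7 4.8 4.9 4.9 5 5 5.1 5.2 5.2 5.3 5.3 5.3 5.5 5.5 5.5 5.7 "
    "5.8 5.9 5.9 6.1 6.3"
)
temperature = [20 + float(leaf) for leaf in temperature_leaves.split()]


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    lower_q, mid, upper_q = quantiles(values, n=4, method="inclusive")
    return min(values), lower_q, mid, upper_q, max(values)


humidity_summary = five_number_summary(humidity)
temperature_summary = five_number_summary(temperature)

print("humidity:   ", humidity_summary)
print("temperature:", temperature_summary)
```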
SECONDARY DATA
I collected this data from the internet; it is based on readings taken at points around England in December.
I did not collect as much data as in my primary experiment; however, there is enough data to make accurate graphs.
SCATTER GRAPH (secondary data)
The correlation coefficient of this graph is -0.3945.
This graph has a negative slope, which supports my hypothesis.
STEM AND LEAF (secondary data)
HUMIDITY
0:
10:
20:
30:
40:
50:
60: 2 2 2 4 4 5 5 6 6 6 7 8 8 8 9
70: 0 0 0 1 1 1 2 4 6 8 9
80: 2
90:
100:
All the above data is well grouped.
TEMPERATURE
0: 8 9 9
10: 0 0 0 0 0 0 0 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4
20:
30:
40:
50:
60:
70:
80:
90:
100:
This data is all within 6 degrees, so it does need to be separated into smaller groups; however, this is hard to do as I only have 2 figures to work with.
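One way round this, which I have not used elsewhere in this paper, would be a tally with one row per whole degree instead of one row per ten degrees. The sketch below builds such a tally from the diagram above; the variable names are my own.

```python
from collections import Counter

# Secondary temperature stem and leaf diagram (stem = tens, leaves = units).
secondary_temp_stems = {
    0: "8 9 9",
    10: "0 0 0 0 0 0 0 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4",
}
secondary_temps = [
    stem + int(leaf)
    for stem, leaves in secondary_temp_stems.items()
    for leaf in leaves.split()
]

# Count how many readings fall on each whole degree.
tally = Counter(secondary_temps)
for degrees in range(min(secondary_temps), max(secondary_temps) + 1):
    print(f"{degrees:>2}: " + "x" * tally[degrees])

secondary_range = max(secondary_temps) - min(secondary_temps)
print("range:", secondary_range)
```

Because the secondary temperatures are whole numbers, one row per degree is the finest grouping the data allows.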
COMPARING SPREAD OF DATA
I believe that the spreads of my primary and secondary data do not match because my primary data comes from a much more localised source; this means that it has not been averaged, so it is much more precise.
WHISKER AND BOX PLOT
I have decided not to draw a whisker and box plot for my secondary data, or to compare box plots with my earlier ones, as the two sets of data come from different seasons. It would have no use except to find the spread of humidity and temperature, which I have already worked out from the stem and leaf diagram (as shown on page 14).
SCATTER GRAPH (combined data)
I have combined my primary and secondary data into one scatter graph. The hottest temperatures out of all the data have the lowest humidity and the lowest temperatures have the highest humidity.
The primary data is encircled.
The secondary data is surrounded by a square.
The combined data gives a correlation coefficient of -0.9553.
The samples in this study are small (based in one country on only 2 separate days during the year); however, there appears to be a close correlation between temperature and humidity.
If I were to study this further, I would want to take samples all around the globe at different times during the year.
However, from the data presented in this paper, it would appear that the original hypothesis holds true.