Take Web data analysis to the next level with PHP
By Paul Meagher, 2004-06-15
Model the null hypothesis: The Chi Square statistic
In the case of this beer poll, the expected frequency under the null hypothesis is the following:
Expected Frequency = Number of Observations / Number of Response Options
Expected Frequency = 1000 / 4
Expected Frequency = 250
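As a minimal sketch of the calculation above, you could wrap it in a small PHP function. The name expectedFrequency() is hypothetical and used here only for illustration. It assumes, as the null hypothesis does, that every response option is equally likely.

```php
<?php
// Hypothetical helper (not from the article): expected frequency per
// response option when every option is assumed to be equally likely.
function expectedFrequency(int $numObservations, int $numOptions): float
{
    if ($numOptions <= 0) {
        throw new InvalidArgumentException('Need at least one response option.');
    }
    return $numObservations / $numOptions;
}

// Beer poll: 1000 observations spread over 4 response options.
echo expectedFrequency(1000, 4), "\n"; // prints 250
```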
To get an overall measure of how much the observed frequencies deviate from the expected frequency, you might start by summing the differences for each cell: (285 - 250) + (250 - 250) + (215 - 250) + (250 - 250).
If you do this, you find that the sum is 0, because deviations from a mean always sum to 0. To get around this problem, square each difference score (hence the square in Chi Square). Finally, to make the score comparable across samples with different numbers of observations (in other words, to standardize it), divide each squared difference by the expected frequency. So, the formula for the Chi Square statistic looks like this ("O" means "observed frequency" and "E" means "expected frequency"):
Figure 1. The formula for the Chi Square statistic
Chi Square = Sum over all response options of (O - E)^2 / E
If you calculate the Chi Square statistic for the beer poll data, you obtain a value of 9.80. To test your null hypothesis, you want to know the probability of obtaining a value at least this extreme if the deviations were due to random sampling variability alone. To find this probability, you need to understand what the sampling distribution for Chi Square looks like.
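To check the 9.80 figure yourself, here is a short PHP sketch of the calculation described above. The function name chiSquare() is hypothetical and introduced only for this example. It assumes a single expected frequency shared by every cell, which holds for this poll but not for every Chi Square test. The observed counts are the ones used in the deviation sum earlier.

```php
<?php
// Hypothetical helper (not from the article): Chi Square statistic for
// observed counts compared against one expected frequency per cell.
function chiSquare(array $observed, float $expected): float
{
    $sum = 0.0;
    foreach ($observed as $o) {
        // Square each deviation, then standardize by the expected frequency.
        $sum += (($o - $expected) ** 2) / $expected;
    }
    return $sum;
}

// Observed frequencies for the four response options in the beer poll.
$observed = [285, 250, 215, 250];
$expected = array_sum($observed) / count($observed); // 1000 / 4 = 250

// The raw deviations cancel out, which is why they are squared.
$rawSum = 0;
foreach ($observed as $o) {
    $rawSum += $o - $expected;
}

echo $rawSum, "\n";                                // prints 0
printf("%.2f\n", chiSquare($observed, $expected)); // prints 9.80
```

Only the first and third response options contribute to the statistic here, because the other two match the expected frequency exactly.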
First published by IBM developerWorks

