Take Web data analysis to the next level with PHP
By Paul Meagher, 2004-06-15
Repoll
Another interesting application of the one-way Chi Square test is to repoll to see if responses have changed.
Imagine that you were to do another Web poll of Nova Scotia beer drinkers after a period had elapsed. You again ask about their favorite brand of beer and now observe the following:
The obvious difference between the poll outcomes is that the first poll had 1,000 observations and the second one had 1,400 observations. The main effect of these additional observations is a 100-point increase in the frequency count for each response alternative.
When ready to do the analysis of the new poll, you can choose to analyze the data using the default method of computing the expected frequencies or you can initialize the analysis with the expected probability of each outcome based on the proportions observed in the previous poll. In the second case, you load the previously obtained proportions into an expected probability array ($ExpProb) and use them to compute the expected frequency values for each response option.
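The two ways of obtaining expected frequencies can be sketched as follows. This is a minimal illustration, not the code inside the ChiSquare1D class: the function name expectedFrequencies is hypothetical, and the sketch assumes the default method spreads the total evenly across the response options. The previous poll counts (285, 250, 215, 250) are inferred from the proportions used in this section and the first poll's 1,000 responses.

```php
<?php
// Hypothetical helper for illustration; not part of the PHP_MATH package.
function expectedFrequencies(array $obsFreq, ?array $expProb = null): array
{
    $total = array_sum($obsFreq);
    $cells = count($obsFreq);

    if ($expProb === null) {
        // Assumed default: every response option is equally likely.
        return array_fill(0, $cells, $total / $cells);
    }

    // Repoll case: scale each prior proportion by the new sample size.
    $expFreq = [];
    foreach ($expProb as $i => $p) {
        $expFreq[$i] = $p * $total;
    }
    return $expFreq;
}

// First poll: 1,000 responses (counts inferred from the stated proportions).
$prevFreq  = [285, 250, 215, 250];
$prevTotal = array_sum($prevFreq);
$expProb   = array_map(fn($f) => $f / $prevTotal, $prevFreq);

// Second poll: 1,400 responses.
$obsFreq = [385, 350, 315, 350];

print_r(expectedFrequencies($obsFreq));           // 350 for each option
print_r(expectedFrequencies($obsFreq, $expProb)); // roughly 399, 350, 301, 350
```

With the prior proportions loaded, the expected counts track the earlier poll's shape rather than an even split, which is what lets the test ask whether preferences have shifted.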
Listing 6 shows the beer-poll analysis code for detecting changing preferences:
Listing 6. Detecting changing preferences
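<?php
// beer_repoll_analysis.php
require_once "../init.php";
require PHP_MATH . "chi/ChiSquare1D_HTML.php";
// Response options and the counts observed in the second poll (1,400 responses)
$Headings = array("Keiths", "Olands", "Schooner", "Other");
$ObsFreq = array(385, 350, 315, 350);
$Alpha = 0.05;
// Expected probabilities: the proportions observed in the first poll
$ExpProb = array(.285, .250, .215, .250);
$Chi = new ChiSquare1D_HTML($ObsFreq, $Alpha, $ExpProb);
// Output the frequency table, then the test statistics
$Chi->showTableSummary($Headings);
echo "<br><br>";
$Chi->showChiSquareStats();
?>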
Tables 5 and 6 show the HTML output that the beer_repoll_analysis.php script generates:
Table 6 shows that, under the null hypothesis, you have a 77 percent probability of obtaining a Chi Square value of 1.14 or larger. You therefore cannot reject the null hypothesis that the preferences of Nova Scotia beer drinkers have not changed since your last poll. Any discrepancies between the observed and expected frequencies can be accounted for as expected sampling variability from the same population of Nova Scotia beer drinkers. This null finding should not be a surprise, given that the only transformation of the original poll results was to add a constant of 100 to each previous poll outcome.
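To see where the value of 1.14 comes from, the statistic can be recomputed by hand. The sketch below uses the standard Pearson formula (the sum of squared differences between observed and expected counts, each divided by the expected count). The function name chiSquareStatistic is hypothetical, and the internals of the ChiSquare1D class are not shown on this page. The critical value 7.815 is the standard tabled value for 3 degrees of freedom at an alpha of 0.05.

```php
<?php
// Hypothetical helper for illustration; a sketch of the Pearson Chi Square sum.
function chiSquareStatistic(array $obsFreq, array $expFreq): float
{
    $chiSq = 0.0;
    foreach ($obsFreq as $i => $obs) {
        // Squared deviation from the expected count, scaled by that count.
        $chiSq += ($obs - $expFreq[$i]) ** 2 / $expFreq[$i];
    }
    return $chiSq;
}

$obsFreq = [385, 350, 315, 350];   // second poll
$expFreq = [399, 350, 301, 350];   // 1,400 x the first poll's proportions

$chiSq    = chiSquareStatistic($obsFreq, $expFreq);
$critical = 7.815;                 // tabled value for df = 3, alpha = 0.05

printf("Chi Square = %.2f\n", $chiSq);
echo $chiSq > $critical ? "Reject" : "Fail to reject", " the null hypothesis\n";
```

Only the Keiths and Schooner cells contribute to the sum, since the other two observed counts match their expected counts exactly. The result falls well below the critical value.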
You can imagine, however, that the results might have been different and might have suggested that a different brand of beer was becoming more popular (something you could detect by noting the size of the variance reported below each column in Table 5). You can further imagine that such a finding would have significant financial implications for the breweries in question, since bar owners tend to stock the most popular beer in their locality.
These results would be subjected to intense scrutiny by brewery owners who would question the appropriateness of the analytic procedures and experimental methodology; in particular, they would question the representativeness of the samples. If you plan to conduct a Web experiment that may have significant practical implications, you need to pay equal attention to the experimental methodologies you use to collect the data and the analysis techniques you employ to make inferences from your data.
So this article not only gives you a good grounding for understanding Web data; it also offers some advice on how to defend your selection of statistical test, and lends additional legitimacy to the conclusions you draw from the data.
First published by IBM developerWorks
