Take Web data analysis to the next level with PHP
By Paul Meagher, 2004-06-15
Apply the knowledge
In this article, you have learned how to apply inferential statistics to the ubiquitous frequency data used to summarize Web data streams, focusing on the analysis of Web poll data. However, the simple one-way Chi Square analysis procedure discussed can be fruitfully applied to other types of data streams (access logs, survey results, customer profiles, customer orders) to turn raw data into actionable knowledge.
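As a minimal sketch of how the one-way Chi Square statistic might be computed for another kind of data stream, the following PHP assumes you have already tallied access-log requests by site section. The function name chiSquareStatistic() and the counts are hypothetical and are not taken from the class developed in this article; expected frequencies default to an equal split across categories.

```php
<?php
// Hypothetical helper (not part of the article's class): one-way Chi Square
// statistic for observed category counts. If no expected frequencies are
// supplied, every category is assumed equally likely under the null hypothesis.
function chiSquareStatistic(array $observed, ?array $expected = null): float
{
    $observed = array_values($observed);
    $k = count($observed);
    $n = array_sum($observed);
    $expected = ($expected === null)
        ? array_fill(0, $k, $n / $k)
        : array_values($expected);

    $chiSq = 0.0;
    foreach ($observed as $i => $o) {
        $e = $expected[$i];
        // Sum of (observed - expected)^2 / expected over all categories.
        $chiSq += (($o - $e) ** 2) / $e;
    }
    return $chiSq;
}

// Made-up request counts per site section, for illustration only.
$hits = ['home' => 30, 'products' => 50, 'support' => 40];
$chiSq = chiSquareStatistic($hits);
$df = count($hits) - 1;
printf("Chi Square = %.2f, df = %d\n", $chiSq, $df); // Chi Square = 5.00, df = 2
```

With these invented counts the statistic is 5.00 on 2 degrees of freedom, which falls short of the 5.991 critical value at the .05 level, so the null hypothesis of equal traffic across sections would not be rejected.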
I also argued that, when applying inferential statistics to Web data, it helps to regard data streams as the outcomes of Web experiments, because doing so makes you more likely to bring experimental design considerations into your inferences. Often you cannot make inferences because your data-collection process lacks adequate controls. This can change, however, if you become more proactive in applying experimental design tenets to your Web data collection procedures (for example, by randomizing the selection of voters in your Web polls).
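One simple way to randomize the selection of voters might look like the sketch below, which invites only a randomly chosen share of visitors to vote. The function name inviteToPoll() and the 20 percent rate are assumptions made for illustration. Random invitation controls who is asked, but it does not remove self-selection among the visitors who then choose whether to respond.

```php
<?php
// Hypothetical helper: decide at random whether a visitor is invited to vote.
// $percent is the share of visitors (0-100) who should be shown the poll.
function inviteToPoll(int $percent): bool
{
    // mt_rand(1, 100) returns an integer from 1 to 100 inclusive.
    return mt_rand(1, 100) <= $percent;
}

// Invite roughly one visitor in five, regardless of who they are.
if (inviteToPoll(20)) {
    echo "Show poll form\n";
} else {
    echo "Skip poll for this visitor\n";
}
```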
Finally, I demonstrated how to simulate the Chi Square sampling distribution for different degrees of freedom, going beyond simply commenting on its derivation. In doing so, I also demonstrated a workaround for the prohibition against using the Chi Square test when the expected frequency of any measurement category is less than 5 (in other words, a small N experiment): simulate the sampling distribution for experiments using a small $NTrials value. So, for small numbers of trials, instead of using only the df from the study to compute the probability of a sample outcome, you might also need to use the $NTrials value as a parameter to evaluate the probability of the observed Chi Square result.
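The small N workaround can be sketched roughly as follows. This is not the simulation code from this article; the function names simulateChiSquare() and simulatedPValue(), the fixed seed, and the number of simulated experiments are all assumptions, and the null hypothesis is taken to be equally likely categories. The idea is to generate many experiments of exactly $NTrials observations each, compute the Chi Square statistic for every one, and then count how often the simulated statistic is at least as large as the observed one.

```php
<?php
// Hypothetical sketch: simulate the Chi Square sampling distribution under a
// null hypothesis of equally likely categories, for a fixed number of trials.
function simulateChiSquare(int $nCategories, int $nTrials, int $nExperiments): array
{
    $expected = $nTrials / $nCategories;
    $results = [];
    for ($e = 0; $e < $nExperiments; $e++) {
        // Assign each trial to a category chosen uniformly at random.
        $counts = array_fill(0, $nCategories, 0);
        for ($t = 0; $t < $nTrials; $t++) {
            $counts[mt_rand(0, $nCategories - 1)]++;
        }
        $chiSq = 0.0;
        foreach ($counts as $o) {
            $chiSq += (($o - $expected) ** 2) / $expected;
        }
        $results[] = $chiSq;
    }
    return $results;
}

// Proportion of simulated statistics at least as large as the observed one.
function simulatedPValue(array $simulated, float $observedChiSq): float
{
    $extreme = 0;
    foreach ($simulated as $chiSq) {
        // Small tolerance so floating-point ties count as "at least as large".
        if ($chiSq >= $observedChiSq - 1e-9) {
            $extreme++;
        }
    }
    return $extreme / count($simulated);
}

mt_srand(42);   // fixed seed (an assumption) so repeated runs agree
$NTrials = 12;  // small N: expected frequency per category is 12 / 3 = 4
$simulated = simulateChiSquare(3, $NTrials, 10000);
// 4.5 is the statistic for observed counts of 7, 4, and 1 across 3 categories.
$p = simulatedPValue($simulated, 4.5);
printf("Estimated P(Chi Square >= 4.5 | NTrials = %d) = %.3f\n", $NTrials, $p);
```

Because the estimate depends on $NTrials as well as on the number of categories, it can differ from the probability read off the theoretical Chi Square distribution for the same df, which is the point of using simulation when expected frequencies are small.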
It is worth pondering how you might analyze small N experiments because often you might want to analyze your data before data collection is complete -- when each observation is costly, when observations take a long time to obtain, or simply because you are curious. These two questions are good to keep in mind when attempting this level of Web-data analysis:
* Are you justified in making inferences under conditions of small N or not?
* Can simulation help you determine what inferences to draw under these circumstances?
First published by IBM developerWorks
