Take Web data analysis to the next level with PHP

By Paul Meagher
2004-06-15


Start with the sampling

Imagine you run a weekly poll on your site, www.NovaScotiaBeerDrinkers.com, asking members for their opinions on various topics. You've created a poll that asks members for their favorite brand of beer (in Nova Scotia, there are three well-known brands: Keith's, Olands, and Schooner). So that the survey is as inclusive as possible, you include "Other" among the responses.

You receive 1,000 responses and observe the results in Table 1. (The results shown in this article are for demonstration purposes only and not based on actual surveys.)



The data appears to support the conclusion that Keith's is the most popular brand among Nova Scotia residents. Based on these numbers, can you draw this conclusion? In other words, can you make an inference about the population of Nova Scotia beer drinkers on the basis of results obtained from the sample?

Many factors related to how the sample was collected could make your inferences about relative popularity incorrect. Perhaps the sample contains an inordinate number of employees of Keith's Brewery; perhaps you didn't guard against one person voting multiple times and biasing the outcome; perhaps those who chose to vote differ from those who chose not to; perhaps people who vote online differ from beer drinkers who never visit the site.

Most Web polls are subject to such interpretive difficulties. They arise whenever you try to draw conclusions about a population parameter (here, the true share of Nova Scotia beer drinkers who prefer each brand) from a sample statistic (the share observed among your poll respondents). From an experimental design point of view, one of the first questions to ask before you collect data is whether you can take steps to help ensure that your sample is representative of the population of interest.

If drawing conclusions about the population of interest is your motivation for a Web poll (rather than entertainment for site visitors), then you should implement techniques that ensure one vote per person (for example, by requiring voters to log in with a unique ID) and randomize the selection of voters (for instance, by selecting a random subset of members and e-mailing them an invitation to vote).
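The two safeguards just described can be sketched in a few lines. This is a minimal illustration, not the article's PHP code: it assumes every member has a unique login ID, and the names `select_invitees` and `Poll` are hypothetical. Invitees are drawn as a simple random sample without replacement, and the poll accepts at most one vote per invited ID.

```python
import random


def select_invitees(member_ids, sample_size, seed=None):
    """Draw a simple random sample (without replacement) of member IDs to invite.

    Sorting first makes the draw reproducible for a given seed, whatever
    order the IDs arrive in.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(member_ids), sample_size)


class Poll:
    """Hypothetical poll that records at most one vote per invited member ID."""

    def __init__(self, choices, invited_ids):
        self.choices = tuple(choices)     # keeps the display order of answers
        self.invited = set(invited_ids)   # only the random sample may vote
        self.votes = {}                   # member_id -> chosen answer

    def cast(self, member_id, choice):
        """Record a vote; return False if the voter is not invited or already voted."""
        if choice not in self.choices:
            raise ValueError(f"unknown choice: {choice!r}")
        if member_id not in self.invited or member_id in self.votes:
            return False
        self.votes[member_id] = choice
        return True

    def tally(self):
        """Count votes per answer, including answers nobody chose."""
        counts = {choice: 0 for choice in self.choices}
        for choice in self.votes.values():
            counts[choice] += 1
        return counts


members = [f"member{i}" for i in range(1, 101)]  # hypothetical member IDs
invitees = select_invitees(members, 10, seed=42)
poll = Poll(["Keith's", "Olands", "Schooner", "Other"], invitees)
poll.cast(invitees[0], "Olands")     # accepted
poll.cast(invitees[0], "Keith's")    # rejected: this member already voted
print(poll.tally())
```

Restricting the ballot to the invited sample keeps self-selected visitors out of the count, but it does not remove nonresponse bias: invitees who ignore the e-mail may still differ from those who answer.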

Ultimately, the aim is to eliminate, or at least reduce, various biases that might impair your ability to draw inferences about your population of interest.



First published by IBM developerWorks

