Conduct Web experiments using PHP, Part 2
By Paul Meagher, 2005-03-18
Independence model
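Another model that you might want to test is one that assumes that each cell probability is the simple product of the corresponding marginal probabilities:
$p_{ij} = p_{i+} \, p_{+j}$
with the row marginals computed using this formula
$p_{i+} = n_{i+} / N$
and the column marginals using this formula
$p_{+j} = n_{+j} / N$
where $n_{i+}$ is the total count in row i, $n_{+j}$ is the total count in column j, and N is the total number of observations.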
Table 4 illustrates how to use these formulas to convert a table of frequency counts (see Table 2) to a table of response probability estimates.
Table 4. Converting observed frequencies to probability estimates
| IMAGE \ TEXT | short | long | sum |
| --- | --- | --- | --- |
| person | $p_{11}$ = (10/18) * (8/18) = 0.2469 | $p_{12}$ = (10/18) * (10/18) = 0.3086 | $p_{1+}$ = 10/18 |
| product | $p_{21}$ = (8/18) * (8/18) = 0.1975 | $p_{22}$ = (8/18) * (10/18) = 0.2469 | $p_{2+}$ = 8/18 |
| sum | $p_{+1}$ = 8/18 | $p_{+2}$ = 10/18 | N = 18 |
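The arithmetic in Table 4 can be sketched in a few lines of PHP. This is only an illustration: the function name `independenceProbabilities()` is hypothetical and is not part of Chi2D.php. It takes the row and column totals shown in Table 4 and returns the cell probabilities implied by the product rule.

```php
<?php
/**
 * Hypothetical helper (not part of Chi2D.php): cell probabilities under
 * the independence model, p_ij = p_i+ * p_+j.
 */
function independenceProbabilities(array $rowTotals, array $colTotals): array
{
    $n = array_sum($rowTotals); // total number of observations, N
    $probs = [];
    foreach ($rowTotals as $i => $rowTotal) {
        foreach ($colTotals as $j => $colTotal) {
            $probs[$i][$j] = ($rowTotal / $n) * ($colTotal / $n);
        }
    }
    return $probs;
}

// Marginal totals from Table 4: rows are IMAGE (person, product),
// columns are TEXT (short, long).
$probs = independenceProbabilities([10, 8], [8, 10]);

// Print each row of probabilities rounded to four decimal places.
foreach ($probs as $row) {
    echo implode("  ", array_map(fn($p) => number_format($p, 4), $row)), PHP_EOL;
}
```

Because the model uses only the marginal totals, the individual observed cell counts are not needed to produce these estimates.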
You can use these probability estimates to derive the expected cell counts, where $E_{ij} = N \, p_{i+} \, p_{+j}$.
Table 5. Converting probability estimates to expected counts
| IMAGE \ TEXT | short | long | sum |
| --- | --- | --- | --- |
| person | $E_{11}$ = 18 * 0.2469 = 4.4442 | $E_{12}$ = 18 * 0.3086 = 5.5548 | 10 |
| product | $E_{21}$ = 18 * 0.1975 = 3.555 | $E_{22}$ = 18 * 0.2469 = 4.4442 | 8 |
| sum | 8 | 10 | 18 |
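As a rough sketch, the expected counts can also be computed directly from the marginal totals, since N * (row total / N) * (column total / N) simplifies to (row total * column total) / N. The function name `expectedCounts()` is hypothetical and is not part of Chi2D.php.

```php
<?php
/**
 * Hypothetical helper (not part of Chi2D.php): expected counts under
 * independence, E_ij = N * p_i+ * p_+j = (n_i+ * n_+j) / N.
 */
function expectedCounts(array $rowTotals, array $colTotals): array
{
    $n = array_sum($rowTotals); // total number of observations, N
    $expected = [];
    foreach ($rowTotals as $i => $rowTotal) {
        foreach ($colTotals as $j => $colTotal) {
            $expected[$i][$j] = $rowTotal * $colTotal / $n;
        }
    }
    return $expected;
}

// Marginal totals from Table 5: rows 10 and 8, columns 8 and 10.
$expected = expectedCounts([10, 8], [8, 10]);

// Print each row of expected counts rounded to four decimal places.
foreach ($expected as $row) {
    echo implode("  ", array_map(fn($e) => number_format($e, 4), $row)), PHP_EOL;
}
```

Working from unrounded marginals gives values such as 40/9 (about 4.4444) for the first cell. The small differences from Table 5 arise because the table multiplies probabilities that were already rounded to four decimal places.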
The product rule $p_{ij} = p_{i+} \, p_{+j}$ expresses the idea of factor independence: Factor A exerts a constant factor-level effect regardless of the level of Factor B (and vice versa).
Test this "independence model" (and the expected cell counts derived from it) using the chi-square goodness-of-fit procedure. A large summed-differences score returned by the two-dimensional chi-square test procedure tells you that your factors are not independent. Your theoretical goal might then be viewed as trying to find the simplest model to explain your results.
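The test described above can be sketched as follows. This is a minimal illustration, not the Chi2D.php implementation: the function name `chiSquareIndependence()` is hypothetical, and the observed counts are made up for the example (they are not the Table 2 data), chosen only so that their marginals match the totals of 10, 8 and 8, 10 used in Tables 4 and 5.

```php
<?php
/**
 * Hypothetical helper (not the Chi2D.php API): Pearson chi-square statistic
 * for a two-way table, with expected counts from the independence model.
 */
function chiSquareIndependence(array $observed): float
{
    $rowTotals = array_map('array_sum', $observed);
    $colTotals = [];
    foreach ($observed as $row) {
        foreach ($row as $j => $count) {
            $colTotals[$j] = ($colTotals[$j] ?? 0) + $count;
        }
    }
    $n = array_sum($rowTotals);

    // Sum (observed - expected)^2 / expected over every cell.
    $chiSquare = 0.0;
    foreach ($observed as $i => $row) {
        foreach ($row as $j => $count) {
            $expected = $rowTotals[$i] * $colTotals[$j] / $n;
            $chiSquare += ($count - $expected) ** 2 / $expected;
        }
    }
    return $chiSquare;
}

// Made-up counts for illustration only (NOT the article's Table 2 data).
$observed = [[6, 4], [2, 6]];
$chiSquare = chiSquareIndependence($observed);
$df = (count($observed) - 1) * (count($observed[0]) - 1);
$critical = 3.841; // chi-square critical value for df = 1 at alpha = 0.05

printf("chi-square = %.3f, df = %d\n", $chiSquare, $df);
echo $chiSquare > $critical ? "Reject independence\n" : "Independence not rejected\n";
```

For these illustrative counts the statistic falls below the critical value, so the independence model would not be rejected at the 0.05 level.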
The most complex model, called the saturated model, requires at least one parameter to represent each cell in the table. When modeling your data, your aim might be to reduce that number (use the same parameter estimate for more than one cell) while accurately accounting for the data patterns.
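To make the parameter counting concrete, here is one common way to tally free parameters, assuming an $I \times J$ table with the total N treated as fixed (so the cell probabilities must sum to 1). The symbols $I$, $J$, and $df$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\text{saturated model:} \quad & IJ - 1 \text{ free cell probabilities} \\
\text{independence model:} \quad & (I - 1) + (J - 1) \text{ free marginal probabilities} \\
df \;=\; & (IJ - 1) - (I - 1) - (J - 1) \;=\; (I - 1)(J - 1)
\end{align*}
```

For the 2x2 table used in this tutorial, that works out to 3 free parameters for the saturated model, 2 for the independence model, and 1 degree of freedom for the chi-square test.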
If your observed chi-square score is not significant (that is, there is no evidence of an interaction), then examine each factor separately to determine whether there are any main effects and, if so, how large they are. You can use the one-dimensional chi-square procedure to assess main effects (such as factor-level differences for one factor) once you recompute your cell totals by collapsing over (or ignoring) the levels of the other factor. You can think of one-dimensional chi-square analysis as a main effects analysis on the row or column marginals. The Chi1D.php and Chi2D.php classes also have a showResidualErrors() method that reports the residual error between your expected and observed counts. Examination of residuals is a critical part of the chi-square model-fitting procedure.
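A main effects check on the marginals might look like the sketch below. It assumes the null model is equal expected counts across the levels of a factor, and it reports residuals as observed minus expected. The function name `chiSquareOneWay()` is hypothetical; it does not reproduce the Chi1D.php API or the output format of showResidualErrors().

```php
<?php
/**
 * Hypothetical helper (not the Chi1D.php API): one-dimensional chi-square
 * against equal expected counts, plus the residuals (observed - expected).
 */
function chiSquareOneWay(array $observed): array
{
    // Assumption: under the null model every level has the same expected count.
    $expected = array_sum($observed) / count($observed);
    $chiSquare = 0.0;
    $residuals = [];
    foreach ($observed as $level => $count) {
        $residuals[$level] = $count - $expected;
        $chiSquare += $residuals[$level] ** 2 / $expected;
    }
    return ['chiSquare' => $chiSquare, 'residuals' => $residuals];
}

// Row marginals: collapsing over TEXT leaves the IMAGE totals.
$image = chiSquareOneWay(['person' => 10, 'product' => 8]);
// Column marginals: collapsing over IMAGE leaves the TEXT totals.
$text = chiSquareOneWay(['short' => 8, 'long' => 10]);

printf("IMAGE: chi-square = %.4f\n", $image['chiSquare']);
printf("TEXT:  chi-square = %.4f\n", $text['chiSquare']);
print_r($image['residuals']);
```

Each call has one degree of freedom (two levels minus one), so the resulting scores can be compared against the same df = 1 critical value used for a 2x2 table.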
I use the independence model as the default model in Chi2D.php to compute the expected frequencies for use in the two-dimensional chi-square analysis. This is because the two-dimensional chi-square procedure is most commonly used in experimental contexts to test for possible interactions between your categorical variables where the null model is the factor independence model.
First published by IBM developerWorks
