Confirmatory theory testing: Moving beyond NHST
By Keith Widaman, PhD
Meehl’s (1967) paradox emphasizes the differences between the physical sciences, where advances in measurement make it harder to corroborate theories, and psychological science, where measurement advances, ironically, make it easier to corroborate theories. Hypotheses in the physical sciences usually specify precise numerical results, whereas hypotheses in the psychological sciences typically specify only directional results. In psychology, much theory development has relied on null-hypothesis significance testing (NHST).
NHST epitomizes an exploratory approach to theory corroboration. NHST is somewhat counter-intuitive on its face, because rather than testing the prediction of the theory, it sets up a “straw man” representing the opposite of the theory’s prediction. For instance, if our theory predicts that a 10-week trial of cognitive-behavioral therapy reduces depression symptoms, NHST tests the hypothesis that the therapy has no effect.
Confirmatory strategies offer a resolution to Meehl’s paradox, because they offer a generalized approach to testing substantive (non-null) hypotheses. These approaches can be relatively simple to implement and interpret, using regression methods that provide straightforward interpretations of beta weights. In addition, confirmatory approaches permit the use of fixed, free, and constrained parameters – as commonly used in structural equation modeling – so they can support the testing of models specified in many different ways. Most importantly, and in contrast to NHST, confirmatory strategies allow the hypothesized model to be embodied in the hypothesis being tested.
The benefits of the confirmatory approach become more apparent when we contrast competing hypotheses in a particular research application. Consider the study of genetic and environmental influences on risk for negative outcomes such as depression. Research on gene × environment (GxE) interactions often evaluates one or both of two competing models of depression risk. The first, the diathesis-stress model (Belsky & Pluess, 2009; Caspi et al., 2003), predicts that a problematic genetic allele creates a diathesis, or vulnerability, to depression. Expression of the depressive phenotype, however, depends on stress. In the absence of stress, there are no differences in depressive symptoms between the high-risk group (genetic diathesis present) and the low-risk group (genetic diathesis absent). Group differences emerge only when activated by stress.
The confirmatory approach provides a basis for evaluating two versions of this model. The strong diathesis-stress model, shown in Figure 1A, suggests that the low-risk group will be completely unaffected by stress loading. The weak diathesis-stress model, shown in Figure 1B, proposes that the low-risk group will be affected (will show symptoms of depression) as a function of stress, but the effect will be smaller than that with the high-risk group.
A competing theory of depression vulnerability is the differential-susceptibility model (Belsky & Pluess, 2009). This model allows for the high-risk group to have better outcomes than the low-risk group under certain environmental conditions. In other words, the genetically vulnerable group may show more depression than the low-risk group when stress is high, and less depression than the low-risk group when stress is low. Thus, the differential-susceptibility model predicts a “crossover” interaction effect.
As with the diathesis-stress model, there are strong and weak versions of the differential-susceptibility model. The strong version, shown in Figure 1C, holds that the low-risk group is unaffected by changes in the environment, whereas the weak version, shown in Figure 1D, asserts that the low-risk group is affected by the environment, but to a lesser degree than the high-risk group.
The competing predictions in Figures 1A through 1D present the two versions of each model, illustrating two key differences between the models. First, the two models differ substantively, a difference represented by the position of the crossover point on the graph. Second, the two models differ quantitatively, in the number of parameter estimates required to test hypotheses generated by the models.
A primary advantage of the confirmatory approach now becomes apparent. It turns out that the weak differential-susceptibility hypothesis can be specified as a generalized, four-parameter model, of which the other three competing hypotheses are special cases. To illustrate, let us first examine the usual regression approach to testing models of genetic vulnerability.
Typically, such models test for a main effect of environment, X, a main effect of genes, G, and the interaction of genes and environment (G·X), using an equation of this type:
$$Y = B_0 + B_1 X + B_2 G + B_3 (G \cdot X) + E$$
where Y is the outcome (here, depressive symptoms), G is coded 0 for the non-risk (low-risk) group and 1 for the high-risk group, B0 is the intercept for the non-risk group when X = 0, B1 is the effect of environment X for the non-risk group, B2 is the intercept difference for the high-risk group when X = 0, B3 is the slope difference for the high-risk group, and E is the residual. In this approach, the interaction term (B3) must be significant for the analysis to proceed.
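As a minimal sketch of this usual approach, the following Python fits the model Y = B0 + B1·X + B2·G + B3·(G·X) to a small data set. It assumes G is coded 0 for the non-risk group and 1 for the high-risk group, and it relies on the fact that, with a binary group code and a full interaction term, the least-squares point estimates match those from fitting a separate line in each group. The function names and the noise-free data are hypothetical, chosen only for illustration.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_gxe(x, g, y):
    """Estimate B0..B3 in Y = B0 + B1*X + B2*G + B3*(G*X) for 0/1-coded G."""
    low = [(xi, yi) for xi, gi, yi in zip(x, g, y) if gi == 0]
    high = [(xi, yi) for xi, gi, yi in zip(x, g, y) if gi == 1]
    int_low, slope_low = simple_ols(*zip(*low))      # non-risk group line
    int_high, slope_high = simple_ols(*zip(*high))   # high-risk group line
    return {
        "B0": int_low,                 # intercept, non-risk group
        "B1": slope_low,               # slope, non-risk group
        "B2": int_high - int_low,      # intercept difference at X = 0
        "B3": slope_high - slope_low,  # slope difference
    }


# Hypothetical, noise-free data: stress scores 0..4 in each group.
x = [0, 1, 2, 3, 4] * 2
g = [0] * 5 + [1] * 5
y = [1.0 + 0.5 * xi - 2.0 * gi + 1.0 * gi * xi for xi, gi in zip(x, g)]
estimates = fit_gxe(x, g, y)
```

The sketch returns point estimates only; the significance test on B3 described above would additionally require standard errors, which a statistics package would normally supply.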
To move beyond the usual regression approach, it is necessary to incorporate the substantive differences between the competing models. As mentioned earlier, these substantive differences are represented by the crossover point, which can be estimated as C = -B2/B3 (see Aiken & West, 1991). This method of estimating C, however, yields neither a standard error nor a confidence interval for C, and so the interaction term (the product of the genes and environment terms) must still be significant in order to proceed.
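The expression for C follows from setting the predicted values for the two groups equal. The short derivation below assumes G is coded 0 for the non-risk group and 1 for the high-risk group, as the parameter definitions imply; the hat notation for predicted values is introduced here only for illustration.

```latex
\begin{aligned}
\hat{Y}_{G=0} &= B_0 + B_1 X \\
\hat{Y}_{G=1} &= (B_0 + B_2) + (B_1 + B_3) X \\
\hat{Y}_{G=1} - \hat{Y}_{G=0} &= B_2 + B_3 X = 0
\;\Longrightarrow\; X = C = -\frac{B_2}{B_3}, \qquad B_3 \neq 0 .
\end{aligned}
```

When B3 = 0 the two lines are parallel and no crossover point exists, which is one way to see why this estimate depends on the interaction term.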
Estimating the crossover point does permit a new formulation of the regression equation, one where X is centered at the crossover point, C. As before, the model includes a main effect of environment, X, a main effect of genes, G, and the interaction of genes and environment, which is now represented as G·(X − C):
$$Y = B_0 + B_1 (X - C) + B_3 \left( G \cdot (X - C) \right) + E$$
where B0 is the predicted value of Y at the crossover point, B1 is the effect of environment X for the non-risk group, C is the estimate of the crossover point, B3 is the slope difference for the high-risk group, and E is the residual.
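To make the relation between the two formulations concrete, the sketch below converts coefficients from the usual equation, Y = B0 + B1·X + B2·G + B3·(G·X), into the crossover-centered form, Y = B0 + B1·(X − C) + B3·G·(X − C), and checks that both give the same predictions. Expanding the centered form shows that C = -B2/B3 and that the new B0 equals the old B0 plus B1·C. The function names and coefficient values are hypothetical.

```python
def to_crossover_form(b0, b1, b2, b3):
    """Map standard coefficients to (B0 at the crossover, B1, C, B3)."""
    if b3 == 0:
        raise ValueError("B3 = 0: the group lines are parallel and never cross")
    c = -b2 / b3
    return b0 + b1 * c, b1, c, b3


def predict_standard(x, g, b0, b1, b2, b3):
    """Prediction from Y = B0 + B1*X + B2*G + B3*(G*X)."""
    return b0 + b1 * x + b2 * g + b3 * g * x


def predict_crossover(x, g, b0_c, b1, c, b3):
    """Prediction from Y = B0 + B1*(X - C) + B3*G*(X - C)."""
    return b0_c + b1 * (x - c) + b3 * g * (x - c)


standard = (1.0, 0.5, -2.0, 1.0)           # hypothetical B0, B1, B2, B3
crossover = to_crossover_form(*standard)   # (B0 at C, B1, C, B3)

# Largest disagreement between the two forms over a grid of X and G values.
max_gap = max(
    abs(predict_standard(x, g, *standard) - predict_crossover(x, g, *crossover))
    for x in range(5)
    for g in (0, 1)
)
```

In practice C would be estimated directly as a parameter, for example by nonlinear least squares, so that it receives its own standard error; the conversion here only illustrates that the two equations describe the same pair of lines.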
When X is centered at the crossover point, in the manner shown above, the resulting equation represents the weak differential-susceptibility model. To see why this model subsumes the other three competing models, consider again the graphs in Figure 1. Visual inspection suggests that the weak differential-susceptibility model is the least constrained of the four models. It is less constrained than the strong versions of both models, because it permits the low-risk group to be affected by the environment, as reflected in the B1 parameter of the re-parameterized model. It is also less constrained than the weak diathesis-stress model, because it allows that, under certain optimal environmental conditions, the high-risk group may have a better outcome than the low-risk group.
Figure 2 provides another perspective on the hierarchical nature of the re-parameterized models. The weak differential-susceptibility model requires the estimation of four parameters: B0, B1, B3, and C. Each of the other three models requires estimation of only a subset of these four. Solving the regression equation for the weak differential-susceptibility hypothesis therefore simultaneously provides a solution that is relevant when evaluating the other three regression equations under study. This method epitomizes a confirmatory approach, because the four competing models are embodied in the hypothesis test, and the hypothesis that “nothing is going on” is nowhere to be found. As an answer to Meehl’s paradox, the confirmatory approach stands in contrast to NHST, where the only hurdle a theory needs to surmount is to predict an effect that is greater than nothing. The confirmatory approach is also quite adaptable: it applies across most or all domains of psychology, provided we can embody our predictions in clearly specified regression models.
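One way to picture this nesting is to write each model as a set of constraints on the four parameters of the re-parameterized equation, Y = B0 + B1·(X − C) + B3·G·(X − C). The sketch below is an illustration under stated assumptions rather than the specification behind Figure 2: it assumes the strong versions fix B1 at zero (the low-risk group is unaffected by the environment) and that the diathesis-stress versions fix C at the low-stress end of the observed range of X (no group difference in the absence of stress). The name x_min and the helper functions are hypothetical.

```python
def nested_models(x_min):
    """Constraints per model; None marks a freely estimated parameter."""
    return {
        "weak differential susceptibility":
            {"B0": None, "B1": None, "C": None, "B3": None},
        "strong differential susceptibility":
            {"B0": None, "B1": 0.0, "C": None, "B3": None},
        "weak diathesis-stress":
            {"B0": None, "B1": None, "C": x_min, "B3": None},
        "strong diathesis-stress":
            {"B0": None, "B1": 0.0, "C": x_min, "B3": None},
    }


def n_free(constraints):
    """Number of parameters the model must estimate."""
    return sum(value is None for value in constraints.values())


def is_special_case(restricted, general):
    """True if `restricted` keeps every fixed value that `general` imposes."""
    return all(
        restricted[name] == value
        for name, value in general.items()
        if value is not None
    )


models = nested_models(x_min=0.0)  # assumes stress is scored from 0 upward
free_counts = {name: n_free(c) for name, c in models.items()}
general = models["weak differential susceptibility"]
all_nested = all(is_special_case(c, general) for c in models.values())
```

Under these assumptions the weak differential-susceptibility model estimates four parameters, the two intermediate models three each, and the strong diathesis-stress model two, so each competing hypothesis can be evaluated as a restriction of the most general model.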
Figure 1: Predicted outcomes under the diathesis-stress and differential susceptibility models
Figure 2: Competing models of GxE interactions
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks, CA: Sage Publications.
Belsky, J., & Pluess, M. (2009). Beyond diathesis-stress: Differential susceptibility to environmental influences. Psychological Bulletin, 135, 885-908. doi:10.1037/a0017376
Caspi, A., Sugden, K., Moffitt, T. E., Taylor, A., Craig, I. W., Harrington, H., et al. (2003). Influence of life stress on depression: Moderation by a polymorphism in the 5-HTT gene. Science, 301, 386-389. doi:10.1126/science.1083968
Meehl, P. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103-115.