Contextual validity: Knowing what works is necessary, but not sufficient

As an applied researcher, my behaviors have been shaped by many people, the most important being the children I have worked with

By Christopher H. Skinner, PhD

In the past, when I have been asked, invited, enticed and/or cajoled into writing a paper or book chapter, it has usually been on a specific topic. Having the opportunity to write “anything I want” is a rather scary proposition, especially for readers. I would certainly understand if you stop reading now, as I am not sure I have much new to say on this topic. I have been trained by applied intervention researchers including Ed Shapiro, Ed Lentz, Bud Mace, Kirby Brown, Bob Suppa, Tim Turco and Don Campbell. When I say applied research, I know I mean something very different than others do. Regardless, I would like to thank those mentioned above for teaching me their version of applied intervention research, as I have found it rewarding to compare and evaluate learning and behavior change procedures as I simultaneously a) train my graduate students in collaborative problem-solving, b) learn from public school teachers, c) improve students’ functioning, and d) learn from my graduate students. As an applied researcher, my behaviors have been shaped by many people, the most important being the children I have worked with, either directly or indirectly. To paraphrase B.F. Skinner, when I find students not behaving “as they should,” I have learned a great deal.

While at Lehigh, I was introduced to the Juniper Gardens Children’s Project and its outstanding Classwide Peer-Tutoring (CWPT) programs. I learned more than I’ll ever realize from studying this program. When developing CWPT, Greenwood and associates attempted to create programs that would (a) not create extra work for the classroom teacher, (b) benefit all students in the class, (c) use existing materials and resources, (d) enhance rather than replace current instruction, and (e) be carried out within existing instructional time (Delquadri, Greenwood, Whorton, Carta & Hall, 1986). These are the types of characteristics I want to describe in this paper as I attempt to provide some useful approaches to working with educators while conducting applied research.

We need three validities

I took a course from Don Campbell and learned about internal and external validity (Campbell & Stanley, 1966). With this paper, I focus on a third type of validity that I refer to as contextual validity. From a practitioner’s perspective, evidence of internal validity gives the consumer (e.g., reader, listener, teacher) confidence that a particular strategy, procedure, or intervention caused behavior change in the study being described. Evidence of external validity suggests that the procedure may be effective across a variety of factors including settings, students, teachers, target behaviors, and contexts. While both are necessary, from a teacher’s perspective neither is sufficient. If we expect teachers to apply the strategies, procedures, or interventions that we validate, we should also provide evidence that enhances their confidence that they can apply these procedures in their contexts or adapt them for application in their contexts (Blondin, Skinner, Parkhurst, Wood, & Snyder, 2012; Foster & Skinner, 2011; Skinner & Skinner, 2007).

When I started to think seriously about this construct, I came across many different terms designed to describe easily applied interventions, including sustainable, efficient, effective, acceptable, usable, feasible, transportable, ecologically valid, and socially valid (Detrich, Keyworth, & States, 2007; Drake, Latimer, Leff, McHugo, & Burns, 2004; Shriver & Watson, 2005). I settled on context validity for several reasons. First, when I was trying to figure out what the word “context” meant, I re-read Ringeisen, Henderson, and Hoagwood’s (2003) article on how context influences applied research. Also, when working with students with disabilities, professionals are encouraged and required to consider idiosyncratic factors when developing interventions and learning procedures. When discussing whether a teacher could apply a particular intervention, I did not want to focus on within-teacher variables (e.g., attitude, training). Far too many uninformed people already blame too many teachers for too many problems. However, there are numerous context-specific factors that may influence a teacher’s ability to apply a learning or behavior change procedure or strategy. Thus, I wanted to focus my attention on characteristics of the procedures, not the teacher.

The reason I use the term validity is that all learning and behavior-change researchers should consider these characteristics of their interventions, strategies, or procedures. As almost all applied intervention researchers who I admire already address issues related to internal and external validity, I thought the term “context validity” would allow those inclined to address this third critical issue with the same breadth. Furthermore, I hope this term will encourage all of us in our efforts to control for threats to contextual validity. Our current focus on identifying and publishing what works is of little use to educators if they cannot implement what works.

Threats to contextual validity

When I began thinking about contextual validity, I found that it was easier to describe threats to contextual validity than it was to quantify and control for them. Thus, I will describe some broad (but not exhaustive) categories of threats to context validity. I have no doubt that I will leave some, or many, out and that others would parse them differently. The fact that I cannot identify all threats to contextual validity puts me in good company. Dr. Campbell told our class that he and others (Dr. Cook, I believe) had once parsed threats to internal validity so finely that they came up with over 100.

Threats to contextual validity are relative, unstable, difficult to quantify and are not consistent across or within contexts. As I discuss the first threat, perceptions, I will try to bring these characteristics to light. Furthermore, as I describe these threats, I provide some experiential examples, which I hope will allow me to write this paper in more of an accessible conversational tone. Finally, threats to contextual validity are relative and are influenced by problem severity, degree and speed of change caused by the intervention, and the relative effectiveness and relative contextual validity of alternative procedures (Witt, Elliot, & Martens, 1985).

1. Perceptual threats to contextual validity

I am aware of how perceptions affect a teacher’s ability to apply interventions. I once delivered a workshop on group-oriented reinforcement programs to about 100 practicing educators, mostly support personnel. When I was finished, a practicing school psychologist raised his hand and indicated that while he agreed with me, he was having trouble getting other educators to consider applying these procedures because they had been taught that rewards ruin children. Now the final half hour of this workshop includes some advice on how to address these issues. Often applying different procedures (change) requires support from others including children, parents, administrators, educators, and peers. Whether perceptions are based on empirical support, popular press, philosophy or faith, these perceptions matter.

Perceptions regarding learning and behavior change strategies are relative, unstable, difficult to quantify and are not consistent across or within contexts. Perception is relative. For example, while many educators are not in favor of applying punishment (e.g., remove access to recess), their opinions may change depending upon the behaviors being punished (fighting versus making a spelling error) or characteristics of the child (Ray, Watson, & Skinner, 1995; Witt et al., 1985). Because so many variables influence perception, perception is unstable. For example, evidence suggests that if the same intervention is described using different terms, perceptions of the intervention will differ (Witt, Moe, Gutkin, & Andrews, 1984).

Interventions are clearly more acceptable in some contexts than others. For example, if a student has a history of misbehavior, developing a program where the teacher rewards the student for improved behavior can damage the social fabric of the classroom as peers who have been behaving well observe this student being reinforced. Alternatively, if you alter the context and use a home note program where the parents provide the reward at home, this problem may be resolved (Skinner, Skinner, & Burton, 2009). Finally, it is very difficult to measure and quantify perceptions. Although various researchers have developed measures designed to assess treatment acceptability, it is extremely difficult to apply one measure across interventions.

2. Skills, training and resources needed to install and maintain

This area has received much attention from researchers, policy makers, and those who train professionals. Put simply, interventions that require much specialized training, skills or resources may be less contextually valid than interventions that are equally effective but require fewer specialized skills and fewer resources. In some instances, when interventions are not applied with integrity, applied intervention researchers may be better served by focusing their attention on altering their interventions, as opposed to focusing on the teachers. Everyone has strengths and weaknesses related to skills or abilities. Contextual validity concerns related to installing something are different from those related to maintaining or sustaining it. For example, training to do something is related to skill development, but one’s ability to apply and maintain those learned behaviors is influenced by one’s perceptions, motivation, and the time, effort, and reinforcement for engaging in the new behavior relative to the time, effort, and reinforcement for engaging in competing behaviors.

3. Complexity threats to contextual validity

Resource-efficient procedures that require few specialized skills may not be contextually valid if the procedures have multiple components and involve multiple decisions that may require evaluation and interpretation. I remember feeling overwhelmed as I tried to decide every 10 minutes if each student in my class had followed each of a set of five rules and to what degree they had followed these rules. This complex task could be made easier by reducing the number of times these judgments are made, the number of categories of behavior, and numerous other variables. Similarly, running a token economy or a similar system may appear easy, but many will find that such procedures are very complex.

One mistake made by people who develop complex interventions is they fail to consider that, because teachers already have so much to do, additional complex tasks (e.g., running a token economy is like setting up a small business) may not be feasible. Repeatedly, when my students and I have worked with teachers to discuss and select interventions, the teachers enthusiastically suggest and support applying numerous interventions simultaneously, as opposed to selecting one. A typical comment might be “let’s do all three!” Most teachers I have worked with are so serious about helping struggling students that they will overcommit and attempt to apply complex, multi-component interventions. Most teachers’ eyes are bigger than their stomachs. Consequently, when they attempt to apply these multicomponent, multistep interventions, they may find themselves overwhelmed given all their other responsibilities. This issue can be seen as an adopt versus sustain problem. Most teachers will agree to and attempt to apply very complex interventions, but may find they have difficulty sustaining them. Consequently, during problem solving consultation, I train my students that one of their tasks may be to reel teachers in so that they do not overcommit.

4. Required precision

Interventions that require high degrees of precision to be effective may not be contextually valid. Sometimes precision and complexity are correlated, but not always. Consider the class clown whose inappropriate behavior is maintained by attention. Extinction is a very simple intervention that is difficult to apply 100 percent of the time. Even great teachers who try their hardest are likely to occasionally chuckle at the class clown’s antics (Skinner et al., 2002). Occasionally failing to ignore the class clown may result in thinning the schedule of reinforcement, which can maintain behavior. Consequently, extinction, a simple procedure that may require precise application, is often combined with reinforcement and applied using differential reinforcement of incompatible behaviors, other behaviors, or lower rates of the target behavior. There are several reasons why I like this example. First, by acknowledging the difficulty with applying extinction in context, as opposed to blaming teachers for being poor ignorers, we encourage researchers to focus on adapting procedures to fit the context. Second, this process of adapting or altering procedures to fit context is evolutionary and can result in entirely new strategies and procedures (Skinner, McCleary, Poncy, Cates, & Skolits, in press). Third, while there has been much focus on getting teachers to apply interventions with integrity, this example shifts the focus and suggests that perhaps we should consider developing and adopting interventions that are effective even when not applied with perfect integrity. Classrooms are vibrant, complex, unpredictable, and unstable environments (learning = change, not stability) that often do not lend themselves to precise work.

5. Consistent and compatible with law, ethics, standards, policies and trends

These concerns seem obvious when we discuss laws, professional ethics and school policies. Also, we must remain vigilant and guard against rules of thumb, current trends or mass assumptions. For example, we have just finished working with two teachers to develop and apply sight-word reading strategies. Although these teachers work over 40 miles from one another and do not know one another, each expressed frustration that the broad-based acceptance of phonemic-based instruction made it difficult for them to try alternative approaches. My students and I had similar experiences as we tried to develop automatic responding to basic math facts. Even teachers who thought this was a good idea were concerned that others would find fault with their “drill and kill” approach.

6. Negative side effects

Most often when I think of negative side effects, I have focused on the child who received the intervention or treatment. For example, punishment may teach the child to avoid school altogether. Additionally, when teaching strategies and procedures, we have to concern ourselves with multiple-treatment interference. For example, teaching a counting procedure for solving addition problems may enhance accuracy, while also making it difficult to develop automaticity (Ysseldyke, Thill, Pohl, & Bolt, 2005).

It is beyond the scope of this paper to describe all of the possible negative side effects, but I urge researchers to consider contextual side effects. Providing reinforcement to a child for not misbehaving may be effective, but it can have a detrimental effect on peers who are not rewarded for “behaving as they’re supposed to.” Establishing one set of contingencies and applying them exactly the same to each student (independent group-oriented contingencies) may encourage those students with well-developed skills to complete tasks but prove ineffective in encouraging those with weaker skills who must expend much more time and energy to meet criterion for receiving reinforcement (Friman & Poling, 1995; Skinner & McCleary, 2010). Yet, in many instances, when teachers make exceptions to treating everyone the same, people (parents, classmates, administrators) consider it unfair (Skinner, Williams, & Neddenriep, 2004).

The human body is complex; consequently, applying procedures designed to treat something may have difficult-to-anticipate negative and positive side effects. Classrooms are complex social settings and applying new procedures (change) may have unanticipated positive and negative side effects. While medical trials emphasize the assessment of such side effects, educational researchers have placed less emphasis on measuring and understanding these effects. It is critical that we work with teachers to attempt to identify and mitigate negative side effects and strengthen positive side effects as we develop our remedial and intervention procedures.

7. Temporal threats to context validity

Teacher time is at a premium. Interventions that require more teacher time generally have less contextual validity (Witt et al., 1985). However, it is not merely the amount of time that matters; schedules also matter. For example, on my internship I got a referral and wanted to pull a group of high school teachers together to determine if any of them had any insights regarding a particular student’s problems. All were willing to devote the time to this group problem-solving effort, but finding a time when they could all meet, even for 10 minutes, was impossible. We could not meet after school because the majority of them either had second jobs or after-school assignments (e.g., coaching and clubs). As an aside, we eventually got most of them together, and two other teachers were also experiencing similar problems, which they had successfully remedied using two different procedures. Thus, the teachers left the room with two proven and efficient strategies, and the total time spent on the process was less than five minutes. Of course, additional time was spent as the teachers chatted about other professional and personal stuff, which made me realize how little time teachers get to spend together.

For several reasons, additional student time required for a strategy, procedure or intervention may be an even bigger concern. Like teachers, students have very busy schedules and finding the extra time for remedial activities is challenging. I have written about my concerns with re-allocating time from recess (where social skills are learned), physical education (obesity), art, and music (Skinner, 2008; Skinner, 2010). Additionally, more effective classroom management procedures, in particular transition procedures, can free up more time for learning, particularly in elementary classrooms (Fudge et al., 2008). Regardless, a recent conversation with an early adopter of response to intervention (RtI) caused me serious concern regarding our process of re-allocating time to apply remedial procedures. He indicated that his district started with reading and, after a few years of getting their model in place at all their schools, added math. They found a group of students who would move back and forth across RtI remedial services, typically 30 minutes per day, four days per week. When reading improved, the students needed additional service and time for math; after math improved, they found the students once again needed remedial help with reading.

8. Adaptability

Because interventions must be applied in context, the ability to adapt them is critical to installation and maintenance. One of my former students, Dr. Gary L. Cates, has discussed with me the importance of tweaking. Interventions that are easily adjusted or altered to fit different contexts (e.g., those with fewer resources), but still retain their effectiveness, have much more contextual validity. Additionally, when conducting problem-solving consultation or remediation, the process of tweaking often allows educators to have significant input into intervention development, which almost always improves our interventions. Furthermore, as discussed earlier, tweaking is evolutionary, causing very specific procedures to evolve into new strategies and procedures as they are re-applied in slightly different forms to fit different contexts (Skinner et al., in press).

9. Interactions

Again borrowing from Campbell and Stanley (1966), I will conclude with interaction effects. Earlier I gave an example of a simple intervention whose context validity was questionable because it had to be implemented with high precision. The opposite is also true; an intervention may be very complex but still have strong context validity when high levels of precision are not necessary. For example, we ran numerous applied studies evaluating a classroom management procedure, The Color Wheel System, which was developed by Drs. Gina Scala, Deb Dendas, and Edward Lentz (Skinner, Scala, Dendas, & Lentz, 2007). Although we conducted numerous studies that provided evidence for the procedure’s contextual validity, we became very frustrated when a reviewer informed us that he or she did not believe the procedure worked when it was not implemented with high levels of integrity. Yet, our research showed it did. While many have worked on procedures designed to enhance integrity, I would encourage more focus on developing and validating interventions that work well even when they are not applied with integrity.

Concluding comments regarding applied intervention research

I have made many mistakes as I have conducted applied intervention research. These mistakes have reinforced the idea that how you do something is as important as what you do. Teachers are smart, busy people and they do not need you to make them any smarter or any busier. Most teachers really enjoy theories, but when you are there to help address a presenting problem, it may not be the best time to provide tangential information on nuanced intricacies of theories.

I have now worked at three major land grant universities, and we university folks have to stop soiling our sand box. Many educators are reluctant to work with folks from the university because in the past their approach was: “Hi, my name is Dr. _________ and I am here to tell you what is wrong or what you are doing wrong and how to fix it.” Much of our applied research (my students’ and mine) involves partnering with educators from the very beginning, letting them identify the problems or target behaviors. While this reactive, unplanned, applied research has limitations, it does have a place and has forced me to focus on contextually valid interventions (Skinner et al., in press).

We have reviewed the literature in school psychology; we found few studies evaluating interventions (Bliss, Skinner, Hautau, & Carroll, 2008) and few papers authored by practitioners (Carroll, Skinner, McCleary, Hautau von Mizner, & Bliss, 2009). Most of the professional educators we have partnered with have indicated that they do not care if they are co-authors of studies. Yet, in most instances, the publication of their articles really excites them. I strongly recommend that you make practitioners partners in all aspects of your research and share the credit. I have always found that I have less trouble coming up with ideas when working directly with people charged with changing behavior (i.e., teachers).

If we want practitioners to apply our empirically validated strategies, procedures, and interventions, we must develop, implement, and evaluate contextually valid interventions. However, the process of establishing an intervention’s contextual validity is a lot like establishing its external validity. It requires replication studies. Thus, I want to commend some for their efforts to disseminate (e.g., publish) applied intervention replication studies and encourage others interested in this type of research to consider actually making efforts to publish such work.

Not what works, but what works best

We are very concerned with establishing WHAT WORKS. Assuming an educator can apply two different, empirically validated interventions with equivalent levels of contextual validity, then to select which intervention to apply, educators will need to know what works best. Some have used effect size and similar calculations to make cross-study comparisons of treatments to determine what works best. These studies disturb me for numerous reasons—the biggest being that other variables are not held constant across studies. Thus, my final plea will be for more researchers to focus on conducting comparative effectiveness studies that allow educators to determine what works best. These studies will require that educators measure both the amount of learning and the amount of time spent learning (Skinner, 2008; Skinner, 2010; Skinner, Belfiore, & Watson, 1995/2002).


Bliss, S.L., Skinner, C.H., Hautau, B., & Carroll, E.E. (2008). Articles published in four School Psychology journals from 2000-2005: An analysis of experimental/intervention research. Psychology in the Schools, 45, 483-498.

Blondin, C.A., Skinner, C.H., Parkhurst, J., Wood, A., & Snyder, J. (2012). Enhancing on-task behavior in fourth-grade students using a modified Color Wheel System. Journal of Applied School Psychology, 28, 37–58.

Campbell, D.T., & Stanley, J.C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Carroll, E., Skinner, C.H., McCleary, D.F., Hautau von Mizner, B., & Bliss, S.L. (2009). Analysis of author affiliation across four school psychology journals from 2000-2008: Where is the practitioner-researcher? Psychology in the Schools, 46, 627-635.

Delquadri, J.C., Greenwood, C.R., Whorton, D., Carta, J., & Hall, R.V. (1986). Classwide peer tutoring. Exceptional Children, 52, 535-542.

Detrich, R., Keyworth, R., & States, J. (2007). A roadmap to evidence-based education: Building an evidence-based culture. Journal of Evidence-Based Practices for Schools, 8, 26-44.

Drake, R.E., Latimer, E.A., Leff, H.S., McHugo, G.J., & Burns, B.J. (2004). What is evidence? Child and Adolescent Psychiatric Clinics of North America, 13, 717-728.

Friman, P.C., & Poling, A. (1995). Making life easier with effort: Basic findings and applied research on response effort. Journal of Applied Behavior Analysis, 28, 583-590.

Foster, L.N., & Skinner, C.H. (2011). Evidence supporting the internal, external, and contextual validity of a writing program targeting middle school students with disabilities. Evidence-Based Communication Assessment and Intervention, 5(1), 37-43.

Fudge, D.L., Skinner, C.H., Williams, J.L., Cowden, D., Clark, J., & Bliss, S.L. (2008). The color wheel classroom management system: Increasing on-task behavior in every student in a second-grade classroom. Journal of School Psychology, 46, 575-592.

Ray, K., Watson, T.S., & Skinner, C.H. (1995, May). Let’s spank the big kid. Paper presented at the Annual Convention of the Association for Applied Behavior Analysis: International, Washington, DC.

Ringeisen, H., Henderson, K., & Hoagwood, K. (2003). Context matters: Schools and the “research to practice gap” in children’s mental health. School Psychology Review, 32, 153-168.

Shriver, M.D., & Watson, T.S. (2005). Bridging the great divide: Linking research to practice in scholarly publications. Journal of Evidence Based Practices for Schools, 6, 5-18.

Skinner, C.H. (2008). Theoretical and applied implications of precisely measuring learning rates. School Psychology Review, 37, 309-315.

Skinner, C.H. (2010). Applied comparative effectiveness researchers must measure learning rates: A commentary on efficiency articles. Psychology in the Schools, 47, 166-172.

Skinner, C.H., Belfiore, P.B., & Watson, T.S. (1995/2002). Assessing the relative effects of interventions in students with mild disabilities: Assessing instructional time. Journal of Psychoeducational Assessment, 20, 345-356. (Reprinted from Assessment in Rehabilitation and Exceptionality, 2, 207-220, 1995).

Skinner, C.H., & McCleary, D.F. (2010). Academic engagement, time on task, and AAA responding. In A. Canter, L. Z. Paige, & S. Shaw (Eds.), Helping children at home and school-III: Handouts for families and educators, (pp. S3H1 – S3H3) Bethesda, MD: National Association of School Psychologists.

Skinner, C.H., McCleary, D.F., Poncy, B.C., Cates, G.L., & Skolits, G.J. (in press). Emerging opportunities for enhancing our remediation procedure evidence base as we apply response to intervention. Psychology in the Schools.

Skinner, C.H., Scala, G., Dendas, D., & Lentz, F.E. (2007). The color wheel: Implementation guidelines. Journal of Evidence-Based Practices for Schools, 8, 134-140.

Skinner, C.H., & Skinner, A.L. (2007). Establishing an evidence base for a classroom management procedure with a series of studies: Evaluating the Color Wheel. Journal of Evidence-Based Practices for Schools, 8, 88-101.

Skinner, C.H., Skinner, A.L., & Burton, B. (2009). Applying group-oriented contingencies in classrooms. In K.A. Akin-Little, S.G. Little, M. Bray, & T. Kehle (Eds.), Behavioral interventions in schools: Evidence-based positive strategies (pp. 157-170). Washington, DC: APA Press.

Skinner, C.H., Waterson, H.J., Bryant, D.R., Bryant, R.J., Collins, P.M., Hill, C.J., Tipton, M.F., Ragsdale, P., & Fox, J. (2002). Team problem solving based on research, functional behavioral assessment data, teacher acceptability, and Jim Carey’s Interview. Proven Practices: Prevention & Remediation Solutions for Schools, 4, 56-64.

Skinner, C.H., Williams, R.L., & Neddenriep, C.E. (2004). Using interdependent group-oriented reinforcement to enhance academic performance in general education classrooms. School Psychology Review, 33, 384-397.

Witt, J.C., Elliot, S.N., & Martens, B.K. (1985). The influence of teacher time, severity of behavior problem, and type of intervention on teacher judgments of intervention acceptability. Behavior Disorders, 17, 31-39.

Witt, J.C., Moe, G., Gutkin, T.B., & Andrews, L. (1984). The effect of saying the same thing in different ways: The problem of language and jargon in school-based consultation. Journal of School Psychology, 22, 361-367.

Ysseldyke, J., Thill, T., Pohl, J., & Bolt, D. (2005). Using Math Facts in a Flash to enhance computational fluency. Journal of Evidence-Based Practices for Schools, 6, 59-89.

Certificate text

Stacey Overstreet (left) and President Shane Jimerson (right) present Chris Skinner with the Senior Scientist award.

Dr. Christopher Skinner has sustained a highly impressive record of programmatic research that has advanced the science and practice of school-based intervention for academic difficulties. Dr. Skinner and his students have conducted over 100 behavior change studies that examine effective interventions that ameliorate reading, mathematics, writing, and behavior problems. His efforts to merge strong methodological design with practicality and feasibility for school personnel are especially noteworthy and serve as a model for the field. Dr. Skinner has made equally impressive contributions to theories, such as those addressing students’ academic choice behaviors. Given his prolific record of scholarship, it is no surprise that Dr. Skinner was recently cited as one of the most productive school psychology researchers over the past 15 years.

Author's notes

This paper was completed with support of all the teachers, students, colleagues and professional educators who have co-labored on applied intervention research with me.


Christopher H. Skinner 
University of Tennessee 
EPC, BEC 535
Knoxville, TN 37996-3452