Featured Article

Using the MMPI-2-RF validity scales in forensic assessments

This article reviews literature available to guide the use of validity scales from the most recent version of the MMPI-2-RF in the criminal justice system.

By Yossef S. Ben-Porath, PhD

Psychological tests are used widely in the criminal justice system, and the MMPI in particular remains among the measures most commonly applied by forensic psychologists (Archer, Buffington-Vollum, Stredny, & Handel, 2006). One likely reason for this widespread use is the availability of a broad range of validity scales, which can be used to gauge a test-taker's approach to what are often high-stakes evaluations. Lack of motivation, low reading and language comprehension skills, limited intellectual resources, and cognitive impairment may compromise an individual's capacity to respond meaningfully to psychological test items. Whether as part of an effort to avoid the assignment of criminal responsibility or to obtain valuable psychotropic medications in the prison system (to cite just two of many possible motivations), individuals undergoing psychological evaluations in the criminal justice system often have incentives to over-report psychological problems. In other scenarios, such as a pre-release risk assessment, test-takers may be motivated to under-report certain psychological problems. Although, for these reasons, validity scales are arguably the most important MMPI measures when the test is used in criminal justice settings, they are also among the most challenging scales to interpret.

This article reviews some of the literature available to guide use of validity scales of the most recent version of the inventory, the Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011), in the criminal justice system. The MMPI-2-RF has been in use for five years, during which a substantial literature has accumulated on the nine validity scales of the inventory. This literature is reviewed following a brief discussion of a conceptual framework that guides MMPI-2-RF validity scale interpretation. This discussion is excerpted from a recently published chapter on forensic uses of the MMPI-2-RF (Ben-Porath, 2013).

Ben-Porath (2012) describes a conceptual framework that has guided MMPI-2 (Butcher et al., 2001) and MMPI-2-RF (Ben-Porath & Tellegen, 2008/2011) validity scale interpretation for some time. This approach identifies two general classes of threats to the validity of an MMPI protocol: non-content-based invalid responding, consisting of non-responding, random responding, and fixed responding (in the “True” or “False” direction); and content-based invalid responding, consisting of over-reporting or under-reporting. Studies of MMPI-2-RF measures designed to assess the two types of threats are reviewed next.

Measures of Non-Content-Based Threats to Protocol Validity

Two simulation studies have examined the MMPI-2-RF measures of non-content-based invalid responding. The first focuses on non-responding and the second investigation involves identification of random and fixed responding.

Non-responding. Non-responding to the MMPI items has traditionally been measured with the “Cannot Say” index, a count of the number of items to which the test taker failed to respond, or responded both “true” and “false.” Non-responding artifactually deflates scale scores. In an extreme case, if a test-taker does not respond to any of the items on a scale, the resulting raw score is zero, and cannot be distinguished from one where the individual responded to all of the items on that scale in the non-keyed direction. Ben-Porath and Tellegen (2008/2011) recommend that non-responding be addressed by considering the percentage of scorable responses to the items of each MMPI-2-RF scale. As long as at least 90 percent of the items on a scale are scorable, the result can be interpreted in the standard manner. As the percentage of scorable responses falls below 90, the absence of elevation on a scale becomes increasingly uninterpretable.
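The percent-scorable rule described above is a simple proportion. The sketch below illustrates it in Python; it is not scoring software, and the function names, data layout, and the 10-item example scale are all hypothetical:

```python
# Illustrative sketch of the 90-percent-scorable guideline described above.
# Item numbers, responses, and function names are hypothetical.

def percent_scorable(responses, scale_items):
    """Percent of a scale's items with a usable true/false answer.

    responses: dict mapping item number -> "T", "F", or None
    (None stands in for both omitted and double-marked items).
    """
    scorable = sum(1 for item in scale_items if responses.get(item) in ("T", "F"))
    return 100.0 * scorable / len(scale_items)

def absence_of_elevation_interpretable(responses, scale_items, threshold=90.0):
    """Per the guideline above, the absence of elevation on a scale is
    only interpretable when at least 90 percent of its items are scorable."""
    return percent_scorable(responses, scale_items) >= threshold

# Hypothetical 10-item scale with one unscorable response:
items = list(range(1, 11))
answers = {i: "T" for i in items}
answers[10] = None  # one omitted item -> 9 of 10 scorable

print(percent_scorable(answers, items))                   # 90.0
print(absence_of_elevation_interpretable(answers, items))  # True
```

Note that an elevated score remains interpretable regardless; the threshold only governs whether the absence of elevation can be trusted.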

In support of this recommendation, Ben-Porath and Tellegen (2008/2011) cite an investigation that was unpublished at the time the MMPI-2-RF manual was written. The study by Dragon, Ben-Porath, and Handel (2012) has since been published. The authors examined the impact of unscorable item responses on the validity and interpretability of scores on the MMPI-2/MMPI-2-RF Restructured Clinical (RC) Scales. Dragon and colleagues first reported that in the five archival samples they examined, unscorable responses at the level of 10 percent or more of a scale's items were relatively uncommon, occurring most often in forensic samples. The authors simulated non-responding by inserting varying proportions of non-responses (10 to 90 percent in 10 percent increments) in place of the actual responses of subjects in two of their archival samples. They then examined two distinct attributes of the affected scales: their validity, as reflected in correlations between RC Scale scores and external criteria, and their interpretability, indicated by the percentage of cases with clinically elevated scores as a function of different levels of simulated non-responding.

Dragon and colleagues found that scale score validity was relatively robust up to the level of 50 percent non-responding. Interpretability, on the other hand, was affected much more rapidly by non-responding. Consistent with Ben-Porath and Tellegen's (2008/2011) recommendations, Dragon and colleagues found that scale score interpretability is significantly compromised if the percentage of scorable responses to the items of a scale falls below 90. If a scale remains elevated in spite of non-responding, that is an interpretable finding. However, the absence of elevation on a scale that has less than 90 percent scorable responses is uninterpretable.

Random and Fixed Responding. Random and fixed responding to the MMPI-2-RF items are assessed with the Variable Response Inconsistency (VRIN-r) and True Response Inconsistency (TRIN-r) scales respectively. In the Technical Manual for the inventory, Tellegen and Ben-Porath (2008/2011) cite an unpublished study in support of the validity of these scales. The study has since been published by Handel, Ben-Porath, Tellegen, and Archer (2010), who investigated the identification of random and fixed responding with VRIN-r and TRIN-r, as well as the impact of these test-taking approaches on the RC Scales. The authors evaluated the effect of increasing levels of simulated random and fixed responding. Their results supported the interpretation of scores on VRIN-r and TRIN-r as measures of random and fixed responding, respectively, and are consistent with the interpretive guidelines in the MMPI-2-RF manual. This indicates that caution should be exercised if VRIN-r or TRIN-r T-scores reach 70 and that protocols should be deemed invalid if T-scores on either of these scales reach or exceed 80.
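The interpretive guidelines summarized above amount to a two-threshold decision rule. The following sketch restates that rule in code; it is an illustration of the guideline as described here, not an implementation of any published scoring algorithm, and the function name and return labels are hypothetical:

```python
# Hypothetical helper restating the interpretive guideline above:
# caution when VRIN-r or TRIN-r T-scores reach 70; invalid at 80 or above.

def consistency_flag(vrin_r_t, trin_r_t):
    """Return an interpretive flag based on the higher of the two T-scores."""
    highest = max(vrin_r_t, trin_r_t)
    if highest >= 80:
        return "protocol invalid"
    if highest >= 70:
        return "interpret with caution"
    return "no indication of inconsistent responding"

print(consistency_flag(65, 72))  # interpret with caution
print(consistency_flag(81, 50))  # protocol invalid
```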

Measures of Content-Based Threats to Protocol Validity

Detecting Over-reported Psychopathology. Differentiating between genuine psychopathology and symptom over-reporting has been one of the key functions of the MMPI validity scales since the early days of the instrument (Ben-Porath, 2012). A significant confound between severe psychopathology (e.g., psychotic disorders) and scores on the MMPI-2 F scale led Arbisi and Ben-Porath (1995) to develop a scale, Fp, designed to optimally perform this task. Revised versions of both of these scales are included on the MMPI-2-RF.

Utilizing a known-groups research design, Sellbom, Toomey, Wygant, Kucharski, and Duncan (2010) examined the ability of the MMPI-2-RF over-reporting indicators to identify criminal defendants classified as malingering based on the Structured Interview of Reported Symptoms (SIRS; Rogers, Bagby, & Dickens, 1992). The two MMPI-2-RF validity scales designed to detect over-reporting of psychopathology symptoms, F-r and Fp-r, best differentiated between those classified as malingering based on the SIRS and those not so classified. Each scale added incrementally to the other in this differentiation task. Both scales had specificities reaching or exceeding .90 when cutoffs recommended in the test manual were applied.

Overall, these and other findings reviewed by Ben-Porath (2013) indicate that Fp-r is the most effective MMPI-2-RF validity indicator when the differential diagnosis involves genuine psychopathology versus over-reporting. Specificity estimates are generally well in excess of .90, indicating a false positive rate generally well below 10% when a cutoff of 100T is applied.
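The link drawn above between specificity and false positive rate is simple arithmetic: the false positive rate equals one minus specificity. The snippet below illustrates this relation with hypothetical counts; it is not data from any of the studies reviewed here:

```python
# Illustrative arithmetic only: false positive rate = 1 - specificity.
# The counts below are hypothetical, not study data.

def specificity(true_negatives, false_positives):
    """Proportion of genuine (non-over-reporting) cases correctly
    classified as below the cutoff."""
    return true_negatives / (true_negatives + false_positives)

# e.g., 95 genuine patients scoring below a cutoff, 5 at or above it:
spec = specificity(95, 5)
print(spec)                 # 0.95
print(round(1 - spec, 2))   # false positive rate: 0.05, i.e., below 10 percent
```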

Detecting Over-reported Somatic and Cognitive Complaints. As noted earlier, the original MMPI validity scales focused on detection of over-reported psychopathology. However, in some instances test takers may over-report somatic and cognitive symptoms, a task for which the original MMPI F scale proved generally ineffective (Lees-Haley, 1989). Lees-Haley, English, and Glenn (1991) developed the scale originally called “Fake Bad,” now labeled Symptom Validity (but still abbreviated FBS to identify its origin), to address this problem. A revised version of FBS, FBS-r (also labeled Symptom Validity), is complemented by Fs and the Response Bias Scale (RBS; Gervais, Ben-Porath, Wygant, & Green, 2007) as measures of over-reported somatic and cognitive complaints on the MMPI-2-RF. Overall, the literature (reviewed in detail by Ben-Porath [2013]) shows that four of the MMPI-2-RF over-reporting measures, F-r, Fs, FBS-r, and RBS, provide empirically verified indications of over-reported somatic and cognitive complaints. FBS-r and Fs have been found to be particularly effective in mild Traumatic Brain Injury-related assessments, and RBS provides a strong incremental contribution in the assessment of exaggerated memory complaints. F-r is a more general over-reporting indicator, sensitive to all three varieties of symptom over-reporting (psychological, somatic, and cognitive).

Detecting Under-reporting. As was the case with the MMPI-2, research on detection of under-reporting with the MMPI-2-RF has been relatively limited, yet supportive of use of the under-reporting indicators of the inventory, Uncommon Virtues (L-r) and Adjustment Validity (K-r). Sellbom and Bagby (2008) examined the utility of these scales using three samples. In their first study the authors found that both L-r and K-r significantly differentiated two groups of individuals (patients with schizophrenia and university students) who had been instructed to under-report from participants who took the test under standard instructions. L-r and K-r also added incremental predictive variance to one another in differentiating these groups. In a second study, a similar set of outcomes emerged using a differential prevalence design. L-r and K-r significantly differentiated a group of child custody litigants from university students taking the test under standard instructions.

In summary, the MMPI-2-RF includes a set of well-validated measures of protocol validity, designed to inform the interpreter about a host of possible challenges to the interpretability of a test protocol. Ben-Porath (2012) provides a detailed discussion of important confounds that need to be considered when interpreting validity scale scores in a broad range of settings, including the criminal justice system. Recommendations provided in the test manual (Ben-Porath & Tellegen, 2008/2011) are intended to guide test users to consider these potential confounds when interpreting the MMPI-2-RF.

The author is a paid consultant to the MMPI publisher, the University of Minnesota, and distributor, Pearson. As co-author of the MMPI-2-RF he receives royalties on sales of the instrument. Address correspondence to Yossef Ben-Porath.

References

Arbisi, P. A. & Ben-Porath, Y. S. (1995). An MMPI-2 infrequent response scale for use with psychopathological populations: The Infrequency-Psychopathology Scale, F(p). Psychological Assessment, 7, 424-431.

Archer, R.P., Buffington-Vollum, J.K., Stredny, R.V., & Handel, R.W. (2006). A survey of psychological test use patterns among forensic psychologists. Journal of Personality Assessment, 87, 84-94.

Ben-Porath, Y.S. (2012). Interpreting the MMPI-2-RF. Minneapolis, MN: University of Minnesota Press.

Ben-Porath, Y.S. (2013). Forensic applications of the Minnesota Multiphasic Personality Inventory-2 Restructured Form. In R.P. Archer and E.M.A. Wheeler (Eds.), Forensic use of clinical assessment instruments (pp. 63-107). New York, NY: Routledge.

Ben-Porath, Y.S. & Tellegen, A. (2008/2011). MMPI-2-RF: Manual for administration, scoring, and interpretation. Minneapolis, MN: University of Minnesota Press.

Dragon, W.R., Ben-Porath, Y.S., & Handel, R.W. (2012). Examining the impact of unscorable item responses on the validity and interpretability of MMPI-2/MMPI-2-RF restructured clinical (RC) scale scores. Assessment, 19(1), 101-113.

Gervais, R. O., Ben-Porath, Y. S., Wygant, D. B., & Green, P. (2007). Development and validation of a Response Bias Scale (RBS) for the MMPI-2. Assessment, 14(2), 196-208.

Handel, R.W., Ben-Porath, Y.S., Tellegen, A., & Archer, R.P. (2010). Psychometric functioning of the MMPI-2-RF VRIN-r and TRIN-r Scales with varying degrees of randomness, acquiescence, and counter-acquiescence. Psychological Assessment, 22, 87-95.

Lees-Haley, P. R. (1989). Malingering post-traumatic stress disorder on the MMPI. Forensic Reports, 2, 89-91.

Lees-Haley, P.R., English, L.T., & Glenn, W.J. (1991). A fake bad scale on the MMPI-2 for personal injury claimants. Psychological Reports, 68, 203-210.

Rogers, R., Bagby, R.M., & Dickens, S.E. (1992). Structured Interview of Reported Symptoms: Professional manual. Odessa, FL: Psychological Assessment Resources.

Sellbom, M. & Bagby, R. M. (2008). Validity of the MMPI-2-RF (Restructured Form) L-r and K-r scales in detecting underreporting in clinical and nonclinical samples. Psychological Assessment, 20, 370-376.

Sellbom, M., Toomey, J.A., Wygant, D.B., Kucharski, L.T., & Duncan, S. (2010). Utility of the MMPI-2-RF (Restructured Form) Validity Scales in detecting malingering in a criminal forensic setting: A known-groups design. Psychological Assessment, 22, 22-31.