In this issue

Cross-cultural measurement

A perspective on developing culturally fair tests from the student editor of The Score.

By Sarah Mills

The Census Bureau projects that racial and ethnic minorities will become the majority of the United States population by 2050, and that one-third of United States residents will be Hispanic (Ennis, Rios-Vargas, & Albert, 2011). Hispanics account for a large and growing proportion of the United States population, yet this group is underrepresented in behavioral science research, particularly in comparison with Whites. Furthermore, even fewer studies include Spanish-speaking Hispanics. This may be partially a result of the limited number of psychological measures developed and validated for Spanish speakers. I believe that more measures need to be developed and psychometrically validated for the growing Hispanic population. In an effort to promote the development of valid and reliable measures for Hispanics, this article provides (a) a discussion of test bias and (b) recommendations on how best to evaluate the psychometric properties of a measure cross-culturally.

According to the APA ethics code, assessment instruments must be culturally sensitive, and their reliability and validity should be established in the group for which the measure is intended (American Psychological Association, 2002). Until a measure has been psychometrically tested in the population of interest, it is not possible to know whether the test is biased with respect to that group. Test bias occurs when test scores are influenced by membership in a particular demographic group, independently of the ability or trait that the test is supposed to measure (Geisinger, 1992).

In their quest to develop fair instruments, test developers should be aware of five common assumptions about respondent traits and attitudes towards assessment. The first assumption is that linguistic barriers will have no impact on performance.  According to the APA, to prevent bias test takers should complete assessments in their preferred language (APA, 2002).  This assumption is relevant for Hispanics because, for many, English is not their preferred language. The second assumption is that the items of the measure are appropriate and of equal difficulty for individuals of the same age, regardless of their ethnic background.  Especially in groups with large immigrant populations, such as among Hispanics in the United States, education and test-taking abilities vary.  The third assumption is that respondents are familiar with standardized tests or measures.  Familiarity with standardized tests may vary depending on one’s culture, level of education and socioeconomic status.  Scores may be biased when individuals with little test-taking experience are compared to those with greater exposure to tests.  The fourth assumption is that respondents are motivated to perform well, or respond accurately, on a measure.  When tests are not viewed favorably, an individual may disregard the test and give little effort.  The last assumption is that respondents do not have adverse reactions when completing assessments.  If respondents have had previous negative experiences when completing assessments or are uncomfortable because of cultural or linguistic differences, this may cause a bias in the responses (Padilla & Medina, 1996).  Because assessment outcomes may have long-term impacts on a person’s life, it is critical for clinicians to use unbiased instruments.

In an effort to reduce test bias, the psychometric properties of measures should be evaluated cross-culturally before they are used in new populations. When evaluating instruments cross-culturally, the primary concern is measure equivalence. Four forms of measure equivalence are of primary interest: translation, metric, conceptual, and functional (Allen & Walsh, 2000). Translation equivalence is an index of the accuracy of the translated items in different languages. Even when an entire population speaks the dominant language, there may be different dialects, and familiarity in speaking or writing the dominant language may vary. Particularly when evaluating a measure used with ethnic minorities, it is important to assess whether items have the same meaning across different ethnic subgroups and individuals of varying socioeconomic status. Metric equivalence determines whether the same metric can be used to evaluate a single attribute in two or more distinct groups. For example, if standardized scores from English and Spanish versions of a measure are derived from demographically non-equivalent samples, the measures do not have metric equivalence. Conceptual equivalence reflects whether the psychological construct has the same meaning across all groups within the population intended for the measure. Functional equivalence evaluates whether the construct of interest serves a different function across cultures (Allen & Walsh, 2000; Bravo, 2003). For example, certain behaviors (e.g., speaking softly) can have unique meanings (e.g., timid vs. respectful) in different cultures.
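The metric-equivalence problem described above can be made concrete with a small numeric sketch. The norming means and standard deviations below are entirely hypothetical; the point is only that when the English and Spanish versions of a measure are normed on demographically non-equivalent samples, the same raw score maps to different standardized scores, so the two metrics are not directly comparable.

```python
def z_score(raw, mean, sd):
    """Standardize a raw score against a norming sample's mean and SD."""
    return (raw - mean) / sd

# Hypothetical norms: the two language versions were normed on
# non-equivalent samples, so the reference distributions differ.
raw = 30
z_english = z_score(raw, mean=25, sd=5)  # standardized against English-version norms
z_spanish = z_score(raw, mean=30, sd=5)  # standardized against Spanish-version norms
print(z_english, z_spanish)
```

Identical raw performance yields a z-score of 1.0 on one version and 0.0 on the other, so comparing standardized scores across the two versions would be misleading until metric equivalence is established.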

Multigroup confirmatory factor analysis is a statistical technique that can be used to examine cross-cultural measure equivalence. Its goal is to establish structural invariance of a measurement model across groups, and it is typically conducted in three phases. First, the least restrictive model, the configural invariance model, examines whether the measurement model has the same factor structure across groups. If the same factor structure is found, a more restrictive model can be examined. This second model, the metric invariance model, examines whether the factor loadings are equivalent across groups. If this model fits well and the factor loadings are equivalent, the most restrictive model, the factor variance/covariance invariance model, can be examined; it constrains factor loadings, factor variances, and factor covariances to equivalence across groups. The metric invariance model is the critical prerequisite for cross-group comparison: if it fits the data well, it can be assumed that the relationships between the items on the measure and the factor structure are the same across groups (Bollen, 1989; Cheung & Rensvold, 2002).
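The logic of the metric-invariance step (equivalent item–factor loadings across groups) can be illustrated with a simulation. The sketch below is not a full multigroup CFA: the data, the four item loadings, and the principal-component shortcut for estimating loadings are all illustrative assumptions, and a real analysis would fit the nested configural/metric/variance-covariance models in SEM software (e.g., the lavaan package in R, or Mplus) and compare their fit. Here, two groups are generated from the same one-factor model, so their separately estimated loadings should closely agree.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_group(n, loadings):
    """Simulate item responses from a one-factor model: x_j = loading_j * f + noise."""
    f = rng.standard_normal(n)                       # latent factor scores
    noise = rng.standard_normal((n, len(loadings)))  # unique (item-specific) variance
    return np.outer(f, loadings) + noise

def first_factor_loadings(data):
    """Rough stand-in for a one-factor CFA: estimate standardized loadings
    from the first principal component of the item correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    v = eigvecs[:, -1] * np.sqrt(eigvals[-1])        # scale eigenvector to loading metric
    return -v if v.sum() < 0 else v                  # resolve the arbitrary sign

# Both groups answer the same 4-item scale generated by identical loadings,
# so similar estimated loadings are what metric invariance would predict.
true_loadings = np.array([0.8, 0.7, 0.6, 0.75])
group_a = first_factor_loadings(simulate_group(2000, true_loadings))
group_b = first_factor_loadings(simulate_group(2000, true_loadings))
max_diff = np.max(np.abs(group_a - group_b))
print("largest cross-group loading difference:", round(max_diff, 3))
```

If the two groups had been generated with different loadings, the cross-group differences would be large, mirroring the situation in which the metric invariance model fits poorly and item-level bias must be investigated before any cross-group comparison.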

In summary, valid and reliable measures are needed for the growing Hispanic population in the United States. To accomplish this, the psychometric properties of existing measures need to be evaluated among Hispanics. Multigroup confirmatory factor analysis is a statistical method that can be used to effectively evaluate the measure equivalence of a construct across groups, and should be increasingly used to promote the use of psychometrically sound measurement among Hispanics.


Allen, J., & Walsh, J. A. (2000). A construct-based approach to equivalence: Methodologies for cross-cultural/multicultural personality assessment research. In R. H. Dana (Ed.), Handbook of cross-cultural and multicultural personality assessment. Personality in clinical psychology series (pp. 63-85). Mahwah, NJ: Lawrence Erlbaum Associates.

American Psychological Association. (2002). Ethical principles of psychologists and code of conduct. American Psychologist, 57, 1060-1073.

Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Bravo, M. (2003). Instrument development: Cultural adaptations for ethnic minority research. In G. Bernal, J. E. Trimble, A. K. Burlew, & F. T. L. Leong (Eds.), Handbook of racial & ethnic minority psychology (pp. 220-236). Thousand Oaks, CA: Sage.

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.

Ennis, S. R., Rios-Vargas, M., & Albert, N. (2011). The Hispanic population: 2010 (2010 Census Briefs). Washington, DC: U.S. Census Bureau.

Geisinger, K. F. (1992). Fairness and selected psychometric issues in the psychological testing of Hispanics. In K. F. Geisinger (Ed.), Psychological testing of Hispanics (pp. 17-42). Washington, DC: American Psychological Association.

Padilla, A. M., & Medina, A. (1996). Cross-cultural sensitivity in assessment: Using tests in culturally appropriate ways. In L. A. Suzuki, P. J. Meller, & J. G. Ponterotto (Eds.), Handbook of multicultural assessment: Clinical, psychological, and educational applications (pp. 3-28). San Francisco: Jossey-Bass Publishers.