APA’s Committee on Psychological Tests and Assessment (CPTA) recently created a series of short (5- to 12-minute) videos on the use of technology in testing that may be of interest to Division 5 members. These six videos contain brief introductions to technology use in different areas of testing and assessment, including game-based assessments, clinical testing and assessment, workplace testing and assessment, assessment of individual attributes using online behavioral data, educational testing and assessment, and the use of automated video interviews. For each video, two to three experts were asked to share their views on the most exciting developments or applications in the area, new developments on the horizon, and diversity, equity/fairness, and inclusion issues specific to the area. I provide brief descriptions of each of the six areas below. Interested members can access any of the six videos from a convenient landing page.
In the video on game-based assessment, Kristen DiCerbo (Khan Academy), Yoon Jeon Kim (University of Wisconsin–Madison), and Sidney D’Mello (University of Colorado Boulder) discuss the developments they find most exciting in this area and offer their perspectives on critical issues that will need to be overcome in the future. For example, D’Mello describes how game-based assessments can be used to assess complex constructs such as creativity and collaboration, while Kim discusses the use of game-based assessments to monitor student engagement and learning over time. D’Mello is also excited about the possibilities for applying the lessons learned from research on game-based assessments to more traditional, but less expensive, classroom assessments; these possibilities include incorporating what has been learned about what makes a game fun into teachers’ own assessment practices. The presenters end the video by discussing possible risks or downsides of game-based assessment, including costs high enough to be prohibitive for most schools, as well as a lack of transparency, stemming from the “black box” nature of many game-based assessments, about what students are actually being assessed on. Finally, the panelists stress the need for strong psychometric evidence to support the use of scores from these assessments.
Tulane University's Patrick Bordnick, PhD, discusses the use of technology in clinical testing and assessment and highlights exciting developments such as mobile phone apps that allow clinical patients to check in between visits. Bordnick notes that this could be particularly useful for monitoring symptoms in areas such as anxiety or substance abuse. He goes on to discuss the use of virtual reality for bringing more context into clinical situations; for example, virtual reality applications could be used in desensitization treatments for fear of heights and other phobias. Bordnick ends the video with a discussion of DEI issues, including disparities in access to technology that would prevent some clients from benefiting.
In the video on technology in workplace testing and assessment, Tara Behrend (NSF and Purdue University), Richard Landers (University of Minnesota–Twin Cities), and Fred Oswald (Rice University) discuss what they see as the most exciting new applications of technology and where they see the field heading next. For example, Landers argues that the use of natural language processing and machine learning methods has the potential to improve construct measurement because these methods enable the assessment of a wider range of job applicant skills and aspects of employee performance, such as leadership capabilities. Expanding the range of characteristics being measured could enhance equity and fairness by creating more opportunities for applicants and employees to demonstrate their skills. Behrend and Oswald both argue that the use of new technologies makes concerns about the psychometric qualities of measurement even more important than with traditional assessments, as many technology-based assessments do not yet have a body of evidence to support them.
In the video on assessing individual attributes using online behavioral data, Sandra Matz (Columbia University) and Andrew Schwartz (Stony Brook University) talk about the types of online behavioral data that appear to be most useful in measuring psychological attributes and then discuss the potential risks of using such data. Schwartz feels that one of the richest sources of evidence is language-based data such as that obtained from tweets, blogs, and social media posts; some researchers also use information from sequences of keystrokes. Matz differentiates identity claims, such as those in Facebook posts, from behavioral residue, such as information obtained from GPS devices. She notes that the former are high in signal in that they are rich in information but possibly biased, whereas the latter are less easily faked and may therefore be more authentic, but have a lower signal-to-noise ratio. The two researchers go on to discuss privacy implications and the very real possibility of unwanted targeted advertising, noting the need to give people more control over their data. In this vein, Matz points out that, like psychological testing in general, psychological profiling is not good or bad in itself; what matters is how the information is used.
Alina Von Davier (Duolingo) and Cynthia Parshall (Touchstone Consulting) discuss the use of technology in educational testing and assessment. In this video, Von Davier notes the need to consider which constructs can best be measured in the digital environment and suggests that collaboration and other so-called soft skills might be good candidates. She goes on to argue that machine learning and AI techniques allow psychologists to capitalize on the rich data and large number of data points captured by such methods, which could result in scores with greater reliability and validity. Parshall discusses the recent pushback against testing and states that overcoming such resistance will require better communication about the utility of testing; such communication could be facilitated if the incorporation of technology in testing results in more accurate results and more accessible and personalized feedback systems. With regard to testing fairness, Parshall notes that some technologies could create challenges for test takers with certain disabilities and argues for the incorporation of Universal Design principles when developing technology-based assessments. Von Davier cautions that the use of AI methods in testing necessitates defining and training the AI models on diverse data to avoid bias in the resulting algorithms. On the plus side, she argues that AI methods can be used to create more personalized tests that should match test takers’ profiles better than traditional tests.
In the video on automated video interviews, Louis Tay (Purdue University) and Nathan Mondragon (HireVue) discuss the types of information available from video interviews that might be useful for psychological measurement. Tay argues that automated video interviews can assess language features such as tone, as well as body language and communication style, that cannot be captured by paper-and-pencil measures, and therefore provide richer measurement of some attributes. Mondragon notes that although automated interviews can provide enhanced information, the research findings from in-person and phone interviews, such as the benefits of a structured set of questions, still apply. Echoing Von Davier’s comments, Mondragon goes on to note that rating algorithms should be based on diverse sets of raters and training videos to mitigate bias. Tay argues that automated interviews may result in less biased scores than in-person and phone interviews because the latter rely on human interviewers, who can introduce bias. Although bias may also be present in the algorithms used to score video interviews, the interviews are recorded and the algorithms are trained on large amounts of data, allowing them to be examined for sources of bias; if bias is detected, the scoring algorithms can be adjusted. Thus, as with more traditional assessments, video interview methods must be continuously evaluated.
Overall, this series of short videos provides a useful introduction to a wide range of technology-based assessments. The presenters offer their thoughts on the most exciting new developments in their areas and speculate on what might come next. Striking parallels with more traditional assessments include the need for a strong body of psychometric evidence to support the use of these new methods, as well as for research on fairness and equity issues.