Authors from left to right: M. Zafar Iqbal, PhD, Rodica Ivan, PhD, Kaitlin Bynkoski, PharmD, Kristen Archbell, PhD, Cynthia Richard, PhD, and Kristina H. Petersen, PhD
Personal and professional skills (e.g., empathy, collaboration, communication, and ethics) are integral to success in both academic and professional settings. These competencies not only enhance healthcare professionals’ academic performance but also contribute to long-term career development (McGill, Ali, & Barton, 2020; Saunders & Bajjaly, 2022). Despite their significance, health professions training programs often struggle to assess and nurture these skills effectively because of limited resources and a lack of standardized evaluation tools and personalized feedback (Alt, Naamati-Schneider, & Weishut, 2023; Saunders & Bajjaly, 2022). Formative situational judgment tests (SJTs) offer a promising way to address this gap. SJTs provide structured, context-rich scenarios that can evaluate and develop essential soft skills in a low-stakes, feedback-driven manner (Patterson et al., 2019; Sahota et al., 2023; Cullen et al., 2022), and they can help identify students in need of early intervention and professional growth support (Cullen et al., 2017, 2020).
In this paper, we outline the design and utility of a formative SJT assessment, ACE (Acuity Competency Enhancer), referencing its application in two health sciences programs. Within the context of this tool, social and personal intelligence is understood as the ability to critically reflect on and respond to interpersonal and professional challenges.
Background
SJTs in health professions education can be implemented across the learning continuum—from admissions and undergraduate training to postgraduate selection and ongoing professional development. SJTs have been delivered in various formats (e.g., free-text, multiple-choice, video recordings) and used for both formative and summative purposes (Ballejos et al., 2024; Lievens & Motowidlo, 2016; Patterson & Driver, 2018; Saxena et al., 2024).
A recent systematic literature review identified eight studies that utilized bespoke, in-house developed SJTs predominantly as formative assessments (Foucault et al., 2015; Kiessling et al., 2016; Frohlich et al., 2017; Goss et al., 2017; Antes et al., 2020; Graupe et al., 2020; Ludwig et al., 2021; Reiser et al., 2021). Findings from these studies collectively suggest potential links between SJT performance and professional behaviors. For instance, traits like agreeableness, conscientiousness, extraversion, and openness were associated with stronger performance, while neuroticism predicted lower scores. Some studies (Antes et al., 2020; Foucault et al., 2015; Goss et al., 2017) used SJTs as learning tools, though without measuring knowledge or attitude gains. To support students’ development of the measured skills, Goss et al. (2017) added feedback for low performers, an approach echoed in other studies. These studies collectively align with broader thinking about professional identity formation and suggest that formative SJTs may enhance learning. Notably, all of the formative SJT studies used closed-ended, text-based scenarios with ranking or multiple-choice formats, though video-based delivery (Frohlich et al., 2017; Graupe et al., 2020; Reiser et al., 2021) also showed promise for fostering desirable traits.
Assessment Design
Acuity Competency Enhancer (ACE) was developed as a formative, open-response SJT that combines qualitative and quantitative measures and provides feedback to all test takers to help them improve their personal and professional skills. The formative SJT version used in this study included 13 scenarios. Open-ended responses were rated on a scale of 1 to 9 by experienced raters trained on ACE-specific scoring criteria.
ACE comprises four domains; each reflects a distinct yet interrelated skill area, as depicted in Figure 1. Each domain is an aggregated composite of sub-constructs reflected in the question content of the SJT. Specifically, the “READ” domain comprises empathy, equity, and ethics; the “YOU” domain comprises self-awareness, resilience, and motivation; the “INTER” domain comprises communication and collaboration; and the “ENACT” domain comprises problem solving and professionalism. Together, these domains enable a holistic assessment of personal and professional skills, potentially informing individual student development as well as broader educational strategies and curricular reforms.
Figure 1. ACE Four Domain Model.

- Reading the Situation (READ): Accurately interpreting context and nuance. Concepts: ethics, active listening, body language, compassion, cultural sensitivity, bias, client safety.
- How it Affects You (YOU): Recognizing personal reactions, experiences, and biases. Concepts: self-awareness, motivation, accountability, responsibility, self-evaluation, life-long learning, honesty.
- Effective Interacting (INTER): Engaging others constructively and respectfully; navigating social dynamics. Concepts: communication, collaboration, teamwork, interpersonal skills, empathy.
- Resolution Enactment (ENACT): Implementing thoughtful, adaptable solutions to dilemmas and demonstrating the commitment and energy to see them through. Concepts: problem solving, critical thinking, persistence, recognition, contingency planning, assumption analysis.
Initial Psychometric Indicators: Reliability
Initial technical reports (Sitarenios, 2024a, 2024b) provided early evidence supporting ACE’s reliability and construct validity. For the open-ended items, domain-level reliability estimates ranged from .79 to .92. In contrast, reliability for the aggregated multiple-choice score was lower, at .63. A four-factor model corresponding to the ACE domains showed good incremental fit (CFI = .96, TLI = .93), although the RMSEA of .12 exceeded conventional cutoffs.
This report draws on aggregate data (N = 338) from two programs that participated in the pilot phase of the development of this formative SJT. Of the students included, 219 came from the Class of 2027 of New York Medical College’s School of Medicine MD program, and 119 were second-year students in the University of Waterloo School of Pharmacy program. A demographic breakdown of the entire sample is shown in Table 1.
Table 1. Demographic breakdown of students included in this study.
| Gender | Percentage |
|---|---|
| Male | 36.1 |
| Female | 63.2 |
| Other | 5.6 |

| Ethnicity | Percentage |
|---|---|
| Black, African, Caribbean, or African American | 5.6 |
| East Asian (including Chinese, Japanese, Korean, Mongolian, Tibetan, and Taiwanese) | 18.5 |
| Hispanic, Latinx, or Spanish origin | 5.6 |
| Middle Eastern or Northern African | 9.3 |
| South Asian (including Bangladeshi, Bhutanese, Indian, Nepali, Pakistani, and Sri Lankan) | 16.9 |
| White or European | 37.7 |
| Other | 6.0 |
Using these data, both internal consistency reliability (Table 2) and inter-rater reliability (Table 3) were examined. Internal consistency is reported using both Cronbach’s alpha and McDonald’s omega. While alpha remains the most frequently cited metric, it assumes item homogeneity and may underestimate reliability under certain conditions (Deng & Chan, 2017). Omega is considered more robust, especially for multidimensional constructs, as it imposes fewer assumptions and often provides a more accurate estimate (McDonald, 1999). In this analysis, all four domain scores showed strong internal consistency, with alpha values ranging from .85 to .89 and omega values from .88 to .92 (see Table 2).
Table 2. ACE reliability by domain score.
| Domain | # items | Alpha | Omega | Median Item r |
|---|---|---|---|---|
| READ - OE | 7 | .86 | .90 | .49 |
| YOU - OE | 7 | .85 | .88 | .45 |
| INTER - OE | 10 | .89 | .92 | .45 |
| ENACT - OE | 8 | .88 | .91 | .44 |
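As a concrete illustration of the internal consistency statistics reported above, Cronbach’s alpha can be computed directly from an item-score matrix. The sketch below is a minimal NumPy example using synthetic data (the respondent counts, seed, and score values are hypothetical, not ACE data); McDonald’s omega additionally requires factor loadings from a fitted latent-factor model, so only alpha is shown here.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (n_respondents, k_items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Synthetic example: 50 respondents, 7 items driven by one latent trait
rng = np.random.default_rng(42)
trait = rng.normal(5.0, 1.5, size=(50, 1))            # latent "true" skill
scores = trait + rng.normal(0.0, 1.0, size=(50, 7))   # noisy item scores
alpha = cronbach_alpha(scores)                        # high, since items share a trait
```

Because every item is a noisy copy of the same latent trait, alpha comes out high; with uncorrelated items it would fall toward zero.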
Inter-rater reliability (IRR) data were available for a subset of the sample: for a random group of 50 students, all scenarios were rated by a second, distinct set of raters. These paired ratings were then used to compute IRR statistics. When aggregated by scale, all four domains demonstrated strong inter-rater reliability, supporting consistent scoring practices for this formative SJT (see Table 3).
Table 3. Inter-rater reliability statistics for R1 rater group for overall domain scores.
| Domain | R1¹ |
|---|---|
| READ - OE | .94 |
| YOU - OE | .95 |
| INTER - OE | .92 |
| ENACT - OE | .92 |
¹ All correlations significant, p < .001.
Item-level rater consistency was examined by having two independent raters score 35 questions; inter-rater correlations were then calculated for each item. Strong inter-rater consistency was observed at the item level, further supporting the reliability of individual question scores (see Table 4).
Table 4. Question level Inter-Rater Reliability.
| | Average | Minimum | Maximum | SD |
|---|---|---|---|---|
| Pearson r | .79 | .63 | .91 | .07 |
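The item-level summary in Table 4 can be reproduced mechanically: compute a Pearson correlation per item between the two raters’ scores, then summarize across items. The following is a minimal NumPy sketch using simulated paired ratings (the sample size, item count, and noise levels are hypothetical, chosen only to mimic two raters scoring the same responses).

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated paired ratings: 50 students x 5 items, two raters on a 1-9 scale
true_quality = rng.normal(5.0, 1.5, size=(50, 5))
rater1 = true_quality + rng.normal(0.0, 0.5, size=true_quality.shape)
rater2 = true_quality + rng.normal(0.0, 0.5, size=true_quality.shape)

# Pearson r per item, then Table-4-style summary statistics
item_rs = np.array([np.corrcoef(rater1[:, j], rater2[:, j])[0, 1]
                    for j in range(rater1.shape[1])])
summary = {"average": item_rs.mean(), "minimum": item_rs.min(),
           "maximum": item_rs.max(), "sd": item_rs.std(ddof=1)}
```

Because both raters’ scores share the same underlying response quality and differ only by independent rating noise, the per-item correlations land well above chance, mirroring the pattern reported in Table 4.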
In-Program Utilization
Beyond basic psychometric evaluation, a key goal of this investigation was to explore the tool’s utility in supporting programs in measuring and developing students’ personal and professional skills. The following sections describe its implementation in two pilot programs.
Program A (New York Medical College)
Students in this program completed the ACE assessment at the end of their first semester, integrated into their coursework as part of their Physicianship and Professionalism module. They received formative feedback on their open-ended responses for two scenarios. Figure 2 illustrates an example of the feedback provided.
Figure 2. Sample Scenario Feedback.
Scenario Content:
This scenario presents a dilemma in which the applicant must navigate the ethical problem of whether to accept a well-paying position at a big firm where covering up unethical practices may be necessary. It is unclear whether the concerns about unethical practices are legitimate or based on rumors. The company also requires a non-disclosure agreement as part of the contract to be signed to accept the job.
Sample Feedback:
Your responses show a good start in considering multiple perspectives, particularly in the first question where you acknowledge Mel's lack of prior mistakes and suggest that education might be a more appropriate response than termination. For a more comprehensive perspective, it would be beneficial to also consider the perspectives of the patron affected by the allergy, the restaurant management, and the potential legal implications for the restaurant. In terms of justification and rationale, your arguments are coherent and clear, but they could be strengthened by providing more depth. For instance, in your second response, you could elaborate on specific collaborative strategies that Mel could use with her coworkers, kitchen staff, and management to address the situation and prevent future incidents. This would show a deeper understanding of the collaborative process and its benefits. In your third response, you've identified potential emotions Mel might be feeling, which is great, but consider also how these emotions could affect her interactions with others and her decision-making process. Overall, you've made a good effort, and with a bit more detail and consideration of all parties involved, your responses could be even more insightful.
Students also received feedback on their performance on the multiple-choice section of the assessment. One item focused on distinguishing between equity and equality. Figure 3 presents this portion of the feedback.
Figure 3. Equity Feedback.
Equality and equity are two distinct concepts, although they’re often used interchangeably. Where equality gives everyone access to the same opportunities, equity in the workplace means that access to opportunities is proportional to each person’s needs. This ensures that the opportunity is equally advantageous to everybody, because it’s offered in a way that takes diversity into account.
In other words, equity will eliminate barriers and level the playing field by giving each person the right kind of support. Angus Maguire’s popular cartoon (see below) illustrates the difference with a visual metaphor. It shows three people of varying heights standing on boxes to watch sports from behind a fence. In the ‘equality’ part of the image, each person receives the same number of boxes to stand on, meaning only the tallest can see the action. In the ‘equity’ part of the image, each person has the number of boxes they need to see over the fence, creating an equal vantage for everyone.
Cartoon Attribution: Interaction Institute for Social Change | Artist: Angus Maguire
Program B (University of Waterloo, School of Pharmacy)
As part of its annual Midpoint Assessment, the School of Pharmacy traditionally administers an OSCE and a multiple-choice exam to identify students who may need remediation. In a recent enhancement, the ACE was added as a third component to better support holistic student development, addressing not only academic performance but also professionalism-related competencies. A remediation framework was implemented based on ACE domain scores, classifying students into four categories:
- No Remediation – Students showing moderate to high competency across all domains (82%)
- Resource Review Recommended – Students with one domain in the “developing” range (10%)
- Advisory Support Recommended – Students with two or more developing domains (7%)
- Remediation with Academic Advisor Strongly Recommended (2%)
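The count-based triage above can be expressed as a small rule: count how many domains fall in the “developing” range and map that count to a tier. The sketch below is a hypothetical illustration only; the cutoff value is invented here, and the criterion separating “Advisory Support” from the fourth, strongly recommended tier is not specified in the source, so only the three count-based rules are encoded.

```python
def triage(domain_scores, developing_cutoff=4.0):
    """Map four ACE domain scores (1-9 scale) to a remediation tier.

    `developing_cutoff` is a hypothetical threshold; the actual ACE
    banding criteria are not published.
    """
    developing = sum(score < developing_cutoff for score in domain_scores)
    if developing == 0:
        return "No Remediation"
    if developing == 1:
        return "Resource Review Recommended"
    return "Advisory Support Recommended"
```

For example, a student scoring in the developing range on exactly one domain would be routed to a resource review rather than advisory support.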
Importantly, there was minimal overlap between students flagged by the SJT and those identified through the OSCE/MCQ assessments—only one student required remediation across all three components. This suggests that the ACE SJT effectively identifies a unique subset of students needing support in professionalism competencies.
Overall, integrating the ACE SJT with existing assessments has created a more comprehensive and balanced framework—one that supports both academic knowledge and critical soft skills, preparing students more effectively for the demands of modern healthcare practice.
Discussion
In line with the principles of formative assessments, the ACE assessment was developed to foster student growth in personal and professional skills critical to professional success. As this formative SJT is still in its pilot phase, early efforts centered on evaluating its core psychometric properties. Results demonstrated acceptably high reliability across the four domain scores based on open-ended responses, along with preliminary support for the underlying factor structure.
An additional objective was to examine ACE’s utility in identifying students at risk for professionalism-related challenges and guiding targeted remediation. The implementation of ACE by two distinct programs highlights its utility in both developmental feedback and early intervention, contributing to the training of well-rounded professionals prepared for real-world demands.
Future development efforts will focus on refining the assessment to improve its psychometric robustness, incorporating feedback from pilot deployments, and further aligning the tool with student support and remediation strategies.