{{sect1}}
After completing the data analysis for our initial usability study in the fall of 2023, my team and I began outlining and drafting two papers reporting on our R&D to date.
Since then:
Once these papers are published, my team and I plan to explore next-step funding avenues to design and build out additional features and modules for our training tool, implement it within an existing training program, and test its effectiveness in addition to its evolving usability, acceptability, and feasibility.
{{sect2}}
At Neurohue and in collaboration with a university mental health innovation center, I led an R&D team through the participatory design, prototyping, and initial usability testing of Strongr, an AR glasses-based training simulation tool. It allows mental health care trainees to practice therapeutic skills, such as screening for mental health concerns, via realistic and engaging interactive exercises with virtual patients.
{{sect3}}
This project was a collaboration between my research nonprofit Neurohue and the City University of New York's (CUNY's) Center for Innovation in Mental Health (CIMH). CIMH was recently awarded a large NIH grant to "develop a multisectoral coalition to implement a community-wide collaborative care model to address mental health and socio-economic risks across a network of housing developments, primary care sites, and community-based organizations [CBOs]." In conversations with CIMH's former Program Manager, Catherine Dinh-Le, about potential collaborations in the digital health XR space, we identified the need for more accessible and effective training approaches, which led to the vision of XR-based role-playing practice simulations.
We identified two major aspects of the existing training's challenges:
Our long-term goal is to extend the training resources offered to social services workers and community leaders and to build capacity in response to the global mental health care gap. We aim to develop a tool that can help both scale training to reach more trainees in communities in need and better equip these trainees with the new, practical skills needed to help their community members in:
For the initial R&D of the tool, we decided to focus on exploring the design potential of XR technology to expand and improve the training's experiential learning aspects.
This research study aimed to answer the following questions:
{{sect4}}
Our motivation for choosing XR as the training medium stemmed from team members' experience working with the medium and familiarity with its key affordances, including an increased sense of realism and user engagement.
We were encouraged by a study demonstrating that, compared to e-learning and classroom learning, XR-based soft skills learning produced significantly greater (1) focus on content; (2) emotional connection to content; and (3) confidence in applying skills.
Through a literature review, our research scientist helped to clarify and validate the psychological mechanisms underlying XR experiences and mapped out the key factors of: (1) presence, or the subjective perception of being a part of an experience; (2) embodiment, or the involvement of one’s body in an experience, including the integration of gestures or even full-body movement; (3) transportation, or the cognitive process of becoming focused on an event occurring in a narrative; and (4) identification, or the emotional and cognitive process of taking on the role of a character in a narrative.
Given that soft skills extend beyond cognitive skills to include emotional, social, non-verbal, and gestural skills, we hypothesized that these complex psychological factors would synergize well within XR simulations of role-playing experiences, which are a cornerstone of traditional mental health training.
Through our initial scan of XR soft skill trainings in the market and in research, we found that many consist of: (1) proposals for future explorations of this form of training; (2) restrictive, pre-structured experiences, which rely on multiple choice prompts or preset dialogue options; (3) computer desktop experiences that lack immersive embodiment; or (4) setups relying on actors for dialogue flow.
With developments in conversational AI, applications are emerging that integrate more fluid interactivity; however, we are not aware of XR training tools that combine open-ended expressiveness, letting trainees practice conversing in their own words, with a subtle curation of virtual patients' embodied, emotional, and realistic responses.
Inspired by (1) these limitations in the field; (2) theories of constructivist and authentic learning, which posit that learning is an active, exploratory process that benefits greatly from hands-on activities; and (3) studies calling for more sophisticated designs of virtual patients, we were excited to explore creating a highly dynamic and realistic role-playing XR experience for trainees.
Also, given that we were designing for trainees who come from and serve underrepresented communities, we recognized the role cultural sensitivity plays in shaping mental health outcomes and committed to a participatory design approach to embed that sensitivity into our design process.
{{sect5}}
Our approach to participatory design involved a mix of:
We individually interviewed our three subject matter expert (SME) stakeholders and held a group workshop with them. All three are researcher-clinician-trainers at the innovation center: a training program lead, an implementation development lead, and a cultural inclusivity specialist.
Their collective input yielded an array of insights that helped ensure our design would be responsive to the center's needs.
Specifying the Curriculum
Our deliberations with the SMEs led to a consensus on the selection of screening skills, specifically the skill of administering the brief PHQ-4 questionnaire, which assesses a patient's depression and anxiety symptoms.
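For context, here is a minimal sketch of how PHQ-4 responses are conventionally scored, assuming the standard published instrument (four items rated 0-3, with the first two forming the GAD-2 anxiety subscale and the last two the PHQ-2 depression subscale). This is illustrative background, not code from our prototype:

```python
# Illustrative PHQ-4 scoring, per the standard published instrument.
# Each of the four items is rated 0-3 ("not at all" to "nearly every day").

def score_phq4(items: list[int]) -> dict:
    """Score a completed PHQ-4: items[0:2] = GAD-2 (anxiety), items[2:4] = PHQ-2 (depression)."""
    assert len(items) == 4 and all(0 <= i <= 3 for i in items)
    anxiety = sum(items[0:2])      # GAD-2 subscale, range 0-6
    depression = sum(items[2:4])   # PHQ-2 subscale, range 0-6
    return {
        "anxiety": anxiety,
        "depression": depression,
        "total": anxiety + depression,    # range 0-12
        "anxiety_flag": anxiety >= 3,     # >=3 is a common positive-screen cutoff
        "depression_flag": depression >= 3,
    }
```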
Simulation Design Mapping: Responding to Training Gaps
With the content of our simulation clarified, we then needed to map out the main experiential gaps within the innovation center's existing training to identify the AR simulation's UX requirements. We have organized the SMEs' guidance below, categorizing the areas for improvement they highlighted as either skill or quality gaps.
Assessing these requirements holistically, alongside the SMEs’ emphasis on their limited time and human resources for embedding extensive role-playing in their training, we saw that a well-designed and effective AR simulation could not only help with scaling up task-sharing training programs but, equally importantly, act as a critical lever in:
To further validate our design direction, the design lead participated in one of the existing training’s webinars. There, they learned that trainees were especially eager to learn more about screening skills, and they experienced first-hand the limited role-playing practice opportunities. Of the participants who were able to join the webinar synchronously, some did not get a chance to take part in role-playing due to time or Internet quality constraints, and those who did experienced only one round of role-playing. As noted in the third-listed quality gap above, there were also no quality measures in place to ensure that the role-played patient accurately portrayed symptoms or a patient profile specific to the target community.
Over the course of three months, our XR team worked on translating these participatory insights, bolstered by context from the literature review, into a workable prototype for the Magic Leap 2 via the Unity game engine. The SMEs’ perspectives helped us to distill the following four overarching heuristics to guide our approach:
Along the way, our prototyping team solicited feedback regularly from a larger team of interdisciplinary stakeholders and, as the prototype became more functional, conducted informal user testing sessions with some of the SMEs as well as potential end-users. These additional participatory inputs were essential in guiding the evolution of the simulation toward becoming more realistic, usable, and effective. It was important to hone these qualities before subjecting the prototype to the more objective evaluations of our preliminary usability study.
Working within the constraints of our limited time, budget, and development team size, we did our best to balance expediency with fidelity while assembling a high-quality, interactive experience. Below, we describe some of the design features that emerged, as well as the story of their refinement.
Realistic, Engaging Experience
Our workshop with the SMEs led us to specify our prototype’s virtual patient (VP) as a Latino young adult named Alvaro, aligned with the trainees’ target community. We made the VP male, as men are statistically less likely to proactively seek out mental health services, and gave him depression as his mental health condition because its more muted bodily symptomatology seemed easier to translate into animations than the fidgeting of anxiety.
When it came to designing the VP, we explored a variety of 3D avatar creation tools. Some of these tools’ avatar outputs could only represent a narrow range of VP demographics, and some did not come with key technical features including pre-rigging, the mechanism behind avatar body animations, and blend shapes, the mechanism behind avatar facial animations, such as lip movements. Of those that could generate diverse demographic characteristics and embed technical features, their avatar fidelity ranged from low to high. Interestingly, we discovered from discussions with the team that higher fidelity did not necessarily imply higher realism, as some very detailed avatars skirted too close to the edge of the uncanny valley. Our final pick was the Ready Player Me avatar platform, which offered a friendlier aesthetic, quick and expansive avatar customization, efficient performance across hardware platforms, and professional technical features.
While we considered subtle body animations to be a key feature, we did not have the resources to create totally customized animations and were glad to find that the Mixamo animation library includes a range of sitting animations that we could apply to our VP avatar. With some light adjustments to these animations in Unity, we were able to configure our avatar to smoothly sequence through different sitting postures and body gestures and activate these dynamically based on the shifting emotional tenor of the VP’s dialogue and mood throughout the simulation.
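To make this concrete, here is a minimal, engine-agnostic sketch of the kind of selection logic involved; the mood labels and animation names below are hypothetical stand-ins, not our actual Mixamo assets:

```python
import random

# Hypothetical mapping from a dialogue line's mood label to candidate
# sitting animations (names are illustrative placeholders).
MOOD_TO_ANIMATIONS = {
    "withdrawn":  ["sit_slumped", "sit_head_down"],
    "neutral":    ["sit_idle", "sit_shift_weight"],
    "opening_up": ["sit_lean_forward", "sit_hand_gesture"],
}

def next_animation(mood: str, current: str | None) -> str:
    """Pick the next sitting animation for the VP, avoiding an obvious
    repeat of the current clip so looping is less noticeable."""
    candidates = [a for a in MOOD_TO_ANIMATIONS[mood] if a != current]
    return random.choice(candidates or MOOD_TO_ANIMATIONS[mood])
```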
VP facial features are another key to unlocking the sense of a realistic encounter, and we worked with the Salsa Lipsync Suite plugin for Unity for subtle control over animating these. We started by setting up lip-syncing and blinking, and, in combination with the body animations, these features produced the first pass of a working conversational experience.
Through some rounds of user testing, we received feedback that the animations were at times too dramatic and obviously looping, suggesting a manic rather than depressive VP profile, and that the VP’s lack of eye contact and straight-ahead stare made them look somewhat psychotic. For our next iteration, we adjusted and resequenced the body animations for a subtler presentation. We also added head movement and eye contact features, which really helped to dynamize the VP’s behavior and increase user engagement. To better convey depressive symptoms and internalized stigma, we added an attention span feature that caused the VP to now and then avoid eye contact and look down and away from the trainee, suggesting hesitation in opening up. We found that these revised aspects worked well to convey the VP’s mood and strengthen users’ sense of human connection with them.
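As a rough sketch of the attention span feature’s logic (state names and timing values are illustrative placeholders; the actual prototype’s values were tuned by feel):

```python
import random

# Illustrative gaze controller: the VP holds eye contact for a randomized
# interval, then briefly looks down and away, suggesting hesitation in
# opening up, before re-engaging the trainee.

class GazeController:
    def __init__(self) -> None:
        self.state = "eye_contact"
        self.timer = random.uniform(4.0, 9.0)   # seconds of eye contact

    def update(self, dt: float) -> str:
        self.timer -= dt
        if self.timer <= 0:
            if self.state == "eye_contact":
                self.state = "look_away"             # glance down and away
                self.timer = random.uniform(1.0, 2.5)
            else:
                self.state = "eye_contact"           # re-engage the trainee
                self.timer = random.uniform(4.0, 9.0)
        return self.state  # consumed by the head/eye animation layer
```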
Finally, our choice of hardware helped to convey realism and engender engagement. The hands-free and head-mounted nature of AR glasses allows users to seamlessly view virtual content embedded in their surroundings. In the case of a screening simulation, this allows the trainee to hold a physical clipboard with a PHQ-4 form and fill it out naturally as they interact with the VP. The Magic Leap 2 specifically has a feature called segmented dimming that makes virtual content appear more physically solid and thus real. These features add up to a perceptual finessing of the experience that supports the sense of presence of the VP and thus one’s engagement with them.
Fluid Conversation
This section explores the verbal flow of the conversation. Rather than structuring the dialogue with multiple choice selections, we wanted to design a more expressive conversational experience by allowing the trainee to speak in their own voice, in tune with their body language, as this would let them more realistically and impactfully practice applying soft skills and refine their own style of caring communication. At the same time, we needed to ensure that the trainee would learn best practices and stay on track in the conversation to effectively and efficiently administer the PHQ-4 questionnaire. This led us to the idea of a text prompt user interface (UI), which offers loose coaching on what the trainee should say while empowering them to translate what the prompt means into their own words.
In order for the simulation to be aware of what the trainee says and cue the VP to respond accordingly, we would need to implement a speech-to-text feature and process the trainee’s recognized speech with conversational AI or natural language processing. However, we were not able to get speech-to-text up and running, and in any case, we did not have the budget to implement conversational AI. In thinking through alternative solutions, we realized that the VP did not have to respond open-endedly to the trainee; it could instead be programmed to respond in a preset way, based on the trainee’s anticipated dialogue as guided by the prompt UI text.
Furthermore, since at the prototype’s current stage, we would always be testing the simulation with a member of our development team on site, we devised a way to “fake AI” by giving control over the simulation’s responses to a behind-the-scenes team member, or “flow guide.” At first, we set up an experience that guides the trainee at each step of the interaction via the prompt UI and allows the flow guide to decide whether or not the trainee has adequately responded to the prompt. If the trainee does, the flow guide can trigger the VP to respond in a pre-programmed way. If they don’t, the flow guide can trigger the prompt UI to reveal a “Try this” text sample, written by an SME clinician, that the trainee can then read aloud to ensure that they learn best practices. After user testing, we found that it generally worked well in allowing the trainee to explore responding to prompts in their own voice while keeping them on track and teaching them best practices.
However, we received feedback that users wished to have more freedom to go “off-prompt” at times and found that the “Try this” prompt disrupted the flow of their conversation. In response to both of these issues, we added more controls that the flow guide can use to expand the range of ways the trainee can respond at each step of the interaction. These include simple “yes” and “no” answers from the VP, repeating what they said previously, or a response such as “Can you repeat that?” In the latter case, the “Try this” prompt still appears, but the added VP interactions serve to mitigate disruption of the conversational flow.
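To make the mechanics concrete, here is a simplified sketch of the flow guide’s control handling; the control names, clip names, and step structure are illustrative, not our actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Step:
    prompt_text: str        # loose coaching shown to the trainee
    try_this_text: str      # SME-written model response
    vp_response_clip: str   # pre-recorded voiceover to play on success

# Illustrative flow-guide controls; the real prototype wires these to a
# behind-the-scenes operator interface rather than string commands.
def handle_control(step: Step, control: str) -> str:
    if control == "advance":      # trainee responded adequately to the prompt
        return f"play:{step.vp_response_clip}"
    if control == "vp_yes":       # off-prompt: simple affirmation from the VP
        return "play:vp_yes_clip"
    if control == "vp_no":        # off-prompt: simple negation from the VP
        return "play:vp_no_clip"
    if control == "vp_repeat":    # VP repeats what they said previously
        return "replay:last_clip"
    if control == "try_this":     # VP asks "Can you repeat that?" and the
        # "Try this" model text is revealed for the trainee to read aloud
        return f"play:vp_clarify_clip; show:{step.try_this_text}"
    raise ValueError(f"unknown control: {control}")
```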
For the VP’s voice, we initially set up a text-to-speech feature that would allow them to respond dynamically, but with the decision to pre-program their responses, we decided to work with a team member matching the VP’s demographic to record human voiceovers, which added to the realism.
Nuanced Skill-building
More elusive (but critical) skill-building goals, like practicing cultural humility or developing a sense of therapeutic alliance with the VP, did not depend on any specific design features so much as on the overall design and lifelike interaction structure of the simulation. Contrasted with cultural competence, traditionally conceived of as a skill that can be learned by mastering a finite body of knowledge, cultural humility “incorporates a lifelong commitment to self-evaluation and self-critique, to redressing the power imbalances in the patient-physician dynamic." In this sense, one could envisage the simulation serving the valuable role of priming the trainee to assess their own biases and begin practicing cultural humility by giving them a chance to dialogue with simulated others and exposing them to a diverse set of VP demographics (a goal for our future iterations).
The notion of the therapeutic alliance refers to "a collaborative relationship between therapist and patient that is influenced by the extent to which there is agreement on treatment goals … and the formation of a positive emotional bond." Our design aims to offer the trainee the opportunity to practice developing this alliance through the overall mix of embodied, emotional, and realistic VP responses, as well as through specific prompt guidance throughout the interaction, including guidance on when to give the VP space to process, how to reassure them, and how to empathize with them.
Flexible Configurability
While the goal of our initial prototype was to bring only one particular VP case and skill practice to life, we wanted to set up our development processes to be flexible and scalable in order to prepare for future, expanded simulation development, as well as continuing research and testing. The ability to generate new scenarios and diverse VPs would be key to addressing the requirements of preventing the trainee’s over-training on particular scenarios, giving them opportunities to practice cultural humility, keeping them engaged and incentivizing them to practice often, and, in general, better preparing them for the open-ended dynamics of real-world settings.
Toward this end, we created a configuration file system. It allows developers to load a local spreadsheet file populated with a scripted sequence of prompt texts, “Try this” dialogue texts, voiceover clips, and corresponding VP animation labels, and it produces a simulation session that structures the trainee’s interaction with a VP based on the associated script. This also supports the modification of existing scenarios.
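A minimal sketch of what loading such a scenario file might look like; the column names and CSV format here are illustrative assumptions, not our exact schema:

```python
import csv

# Illustrative scenario loader: each spreadsheet row defines one step of
# the simulation. Column names are hypothetical stand-ins.
def load_scenario(path: str) -> list[dict]:
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {
                "prompt_text": row["prompt_text"],
                "try_this_text": row["try_this_text"],
                "voiceover_clip": row["voiceover_clip"],
                "animation_label": row["animation_label"],
            }
            for row in csv.DictReader(f)
        ]

# Each loaded step then drives one exchange: show the prompt, wait for the
# flow guide's control input, and trigger the VP's scripted response.
```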
{{sect6}}
I led a small design and development team to:
In concert, I:
Each user testing session involved a mix of qualitative (i.e., think-aloud method and pre- and post-interviews) and quantitative (i.e., an adapted PSSUQ form) data collection. Notably, one participant had to leave before user testing the prototype, further limiting the generalizability of our results; however, in line with Nielsen's recommendation of five early-stage testers, our goal at this stage was to surface initial impressions and usability problems rather than to generalize. Overall, the PSSUQ results indicated an above-average usability score, and the qualitative responses from trainees were quite positive, expressing excitement about the potential and future development of the tool. Participants noted their ability to feel empathy for the VP, the engaging effect of the VP's shifting body postures pulling them in to want to reassure the VP, and that they could see the tool being very useful for helping trainees translate lecture content into practice, especially for more difficult scenarios that are challenging to role-play. The main usability problems involved fixable technical issues, such as too-low voiceover volume, our forgetting to bring the AR glasses' nose bridge accessories, and our flow guide's occasional human error in triggering the intended VP response.
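For reference, a minimal sketch of conventional PSSUQ scoring, assuming the standard 16-item Version 3 with 7-point scales where lower scores indicate better usability (our adapted form may differ in item count and wording):

```python
from statistics import mean

# Conventional PSSUQ (Version 3) scoring: 16 items on a 1-7 scale where
# lower means better perceived usability. Subscale boundaries follow the
# standard instrument; an adapted form may differ.
def score_pssuq(items: list[float]) -> dict:
    assert len(items) == 16 and all(1 <= i <= 7 for i in items)
    return {
        "system_usefulness": mean(items[0:6]),     # items 1-6
        "information_quality": mean(items[6:12]),  # items 7-12
        "interface_quality": mean(items[12:15]),   # items 13-15
        "overall": mean(items),                    # all 16 items
    }
```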
{{sect7}}
Reflecting on our design journey, it has been inspiring to see how the synergy of collaborative efforts among disciplinary and community-based stakeholders can help bring about freshly conceived design directions in the arenas of XR learning and mental health care provider training. To summarize some key insights learned along the way, we discovered that:
{{sect8}}
While our design's materialization so far has not extended beyond a single-module prototype, we are excited by our initial testers' validation and by the expansive opportunities our design processes have begun to map out for continued research into the nuanced design of immersive, experiential soft skill learning simulations. Some key trajectories to explore include: