Nonverbal Behavior Reinforcement in VR

  • About:
    This research work investigates the user experience of an alternative method for teaching nonverbal behavior to Embodied Conversational Agents (ECAs) in immersive environments. We overcome the limitations of existing approaches by proposing a VR game in which the player takes an active role in improving the agents' learning models. The study explores how game interaction in an immersive environment can improve the user experience of performing this interactive task while sharing the same space with the learning agents.

  • What: Master thesis

  • Where: LISN Lab @ Université Paris-Saclay

  • Duration: April 2021 - August 2021 (5 months)


Humans don't communicate only through words: the effectiveness of communication also depends on its nonverbal content. Embodied Conversational Agents (ECAs) deserve the same treatment: they can't just look like real humans, they also have to behave like them, communicating both verbally and nonverbally, to provide a good user experience. However, designing proper nonverbal behavior is challenging because of the risk of falling into the uncanny valley.
So far, nonverbal behavior has been generated through keyframe interpolation or through motion capture with a body-tracking suit. However, these approaches lack adaptability, because they require professional actors, specialized technology, large amounts of data, or long processing times.


We want to explore a new approach for teaching nonverbal behavior to ECAs by creating a VR game that gamifies the Human-in-the-Loop framework with human preferences.


VR systems

The idea of sharing the same space with the learning agent leads to more effective training of the agent.

An immersive system provides the trainers with a means of visualization and improves the trainer experience.

Human-in-the-Loop with human preferences

The framework learns a reward function from the human feedback, instead of using the feedback directly as a reward function.

Users only have to express their preferences between different behaviors performed by different agents (no demonstrations are required).

The amount of feedback from the human and the hours of experience required are reduced.

Comparisons are easier and faster to provide than an absolute numerical score.
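To make the preference-learning idea concrete, here is a minimal sketch of fitting a reward function from pairwise comparisons, assuming a linear reward over trajectory features and a Bradley-Terry preference model. All names, dimensions, and the toy data are illustrative, not part of the actual system.

```python
import numpy as np

rng = np.random.default_rng(0)

def segment_return(w, feats):
    """Predicted return of a trajectory segment under a linear reward w . phi(s)."""
    return float(np.sum(feats @ w))

def preference_prob(w, feats_a, feats_b):
    """Bradley-Terry probability that segment A is preferred over segment B."""
    return 1.0 / (1.0 + np.exp(segment_return(w, feats_b) - segment_return(w, feats_a)))

def update(w, feats_a, feats_b, pref_a, lr=0.05):
    """One gradient step on the cross-entropy loss against the stated preference."""
    p = preference_prob(w, feats_a, feats_b)
    # d(loss)/dw = (p - pref_a) * (sum of features of A - sum of features of B)
    grad = (p - pref_a) * (feats_a.sum(axis=0) - feats_b.sum(axis=0))
    return w - lr * grad

# Toy check: a hidden "true" reward generates the preferences; the learned
# reward should align with it after a few hundred comparisons.
w_true = rng.normal(size=4)
w = np.zeros(4)
for _ in range(500):
    fa, fb = rng.normal(size=(10, 4)), rng.normal(size=(10, 4))
    pref = 1.0 if segment_return(w_true, fa) > segment_return(w_true, fb) else 0.0
    w = update(w, fa, fb, pref)
```

The point of this setup is exactly the one above: the human only ranks pairs of segments, and the reward model generalizes those rankings, so far less feedback is needed than scoring every behavior.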

Gamification and Adaptive gameplay

We gamify the Human-in-the-Loop framework, placing the training of ECAs in a game scenario and creating a story around the learning task.

The AI is trained inside the game through the player's feedback. The agents improve over time, giving us the opportunity to adapt the game scenario to the abilities of the user.


Research questions


How can a game interaction help keep users engaged while performing the learning task?

Solving the machine learning problem of teaching nonverbal behavior is outside the scope of our work, as we focus on the design of the human-system interaction.


How should users interact with the framework in real-time applications?

  • How the player should give feedback to the agents.

  • How the agent should observe the received feedback and the state of the environment to make a decision.


Virtual Reality Gamers
They may have no knowledge of ML

Virtual Character Designers


  • To avoid the difficulties related to multi-agent collaborative systems, which are out of scope, we design the system to be one-player only.

  • The player may not be familiar with the task performed, which could bias their judgment. For example, if the task is to cook something but the player has no cooking experience, their judgment could be useless. For this reason, the task to perform is the expression of an emotion.

  • We are not equipped with technology to detect facial expressions while wearing a headset, so we do not consider them. Moreover, we do not consider the walk of the virtual entities, because it could bias the player's judgment. In the end, having removed facial expressions and leg movements, we consider only upper-body language for expressing the emotions.

  • We expected the player's judgment to be less accurate if they had to watch several performances at the same time. So, users see only one performance at a time, keeping their full focus on it.

Game Idea: Casting for a Movie

The player takes the role of a movie director who is looking for an actor to play a masked character in a silent movie. Due to these particular constraints, the director has to hire the actor who best performs an emotion with the upper part of the body, without considering facial expressions or speech.

Some aspiring actors come to the auditions for this part, but not all of them are professionals. The director's role is to weed out, over different rounds of the game, the bad actors who have crept into the audition.


Among the actors, some are driven by real humans and others are virtual agents.

Virtual Actor

Their performance is generated by the ML algorithm, in real-time.

As they learn, the game becomes progressively more difficult, because they get better at performing the emotions.

Human Actor

Their performance is pre-recorded by human actors.


Player

Takes the role of the movie director.

Doesn't know who between the actors is a “human” or a “virtual” agent.

Task: choose the best actor following their own personal taste.


Movie director's part

Where the player sits looking at the actors' performances.

Composed of scenographical objects related to cinema, to further immerse the user in the context of film auditions, and decorative objects, to make the user feel more relaxed.

The user interacts with the system through a smart table.

Actors' part

Where the actors perform the emotion of joy.

Actors appear on a lifting platform rising from a trapdoor, and disappear back into it when they have finished their performance.

The trapdoors are useful to hide the actors' walks, since walking animations could bias the judgments, even though they would be identical for every actor.

Sequence of events

  1. The player performs an example of the emotion with only the upper part of the body. This motion is recorded to provide new data to the neural networks for the generation of the emotion.

  2. One by one, all the actors appear in the audition room through the trapdoor to perform their emotion. After every performance, the director can ask for a replay.

  3. The director, if needed, can call back an actor and ask for a replay of the previous performance. This phase can be skipped.

  4. All the actors are called back in the room, appearing through the trapdoors. The director votes for the one who won the round.

At the end of every round, the player can decide to play another one.
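The four steps above can be sketched as a single round loop. The callbacks standing in for the smart-table prompts (`ask_yes_no`, `choose_winner`) are hypothetical names; the real system plays animations where this sketch only logs events.

```python
# Sketch of one game round; the callbacks model the smart-table interaction.
def run_round(actors, ask_yes_no, choose_winner, log):
    log.append("record_player_emotion")            # step 1: record upper-body motion
    for a in actors:                               # step 2: performances + optional replays
        log.append(f"perform:{a}")
        while ask_yes_no(f"replay {a}?"):
            log.append(f"replay:{a}")
    for a in actors:                               # step 3: optional call-backs (skippable)
        if ask_yes_no(f"call back {a}?"):
            log.append(f"recall:{a}")
    winner = choose_winner(actors)                 # step 4: everyone returns, director votes
    log.append(f"vote:{winner}")
    return winner

# Scripted example: no replays, no call-backs, the last actor wins.
log = []
winner = run_round(["actor 1", "actor 2"], lambda q: False, lambda c: c[-1], log)
```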

On the left, the virtual characters are training their ML models in the “underground”.
On the right, the user view with an actor on a trapdoor.

Game interaction

The game is a seated VR experience.

The interaction takes place at a smart table, which features the following elements:

  • A screen that guides the user in the round, prompting statements and questions. For example “Now it's the turn of the actor number 3. Are you ready?”, “Do you want to see a replay of the performance?”, “Now it's time to vote!”.

  • “YES” and “NO” buttons, to answer the questions shown on the screen and proceed with the game round.

  • Numbered buttons corresponding to the actors, to vote for the best actor or ask for a replay of a specific one.


Architecture of the system

The Human Actors are virtual characters driven by pre-recorded animations, which reproduce an emotion selected from a database. The database was filled with animations directly playable in Unity, generated through a motion capture system using OptiTrack Motive.

The Virtual Actors are the learning agents, each with 3 "brains". The ML-Agents plugin receives as input the recorded emotion, the vote from the player, and the emotion-expressing gestures from the Virtual Actors. The Python side of the ML-Agents plugin executes the reinforcement learning algorithm that trains the Virtual Actors.

The Player interacts with the system through an HMD, HTC Vive controllers, and a VIVE Tracker with Belt Strap. Their gestures are mapped onto an invisible avatar representing them through Final Inverse Kinematics (Final IK).

The environment

IDE: Unity & Visual Studio Code

Tools: Unity ML-Agents Toolkit. Unfortunately, it does not provide a mechanism to learn a reward function from human feedback, which would be needed to implement the Human-in-the-Loop framework with human preferences. We therefore implemented a temporary solution in which the agents simply try to imitate the player movement recorded at the beginning. With this approach, a reward function can be implemented manually, but the player's feedback is ignored.

Technology: HMD, HTC Vive controllers, and a VIVE Tracker with Belt Strap to track the user's chest movements.
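The temporary imitation objective can be sketched as a manual, per-frame reward. Shapes and names here are illustrative assumptions; this is not the actual ML-Agents reward code.

```python
import numpy as np

def imitation_reward(agent_pose, recorded_pose):
    """Per-frame reward: negative mean distance between the agent's IK-target
    positions and the recorded player pose. Player votes play no role here,
    which is exactly the limitation of the stop-gap solution."""
    diff = np.asarray(agent_pose, dtype=float) - np.asarray(recorded_pose, dtype=float)
    return -float(np.mean(np.linalg.norm(diff, axis=-1)))

# A perfect match yields the maximum reward of 0; any deviation is penalized.
perfect = imitation_reward([[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]])
```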

Player motion recording

An invisible character represents the real-time position of the player through Final IK and RootMotion. During the recording phase, all the rotations of the upper-body bones performed by this avatar are saved in CSV files.
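The recording step might look like the following; the bone list and the quaternion column layout are assumptions for illustration, not the actual file format.

```python
import csv
import io

# Hypothetical upper-body bones whose rotations are saved each frame.
BONES = ["chest", "head", "left_arm", "right_arm"]

def write_frames(frames, out):
    """frames: one dict per recorded frame, mapping bone -> (x, y, z, w) quaternion."""
    writer = csv.writer(out)
    writer.writerow([f"{b}_{c}" for b in BONES for c in "xyzw"])  # header row
    for frame in frames:
        writer.writerow([v for b in BONES for v in frame[b]])

# Example: one frame with identity rotations on every bone.
buf = io.StringIO()
write_frames([{b: (0.0, 0.0, 0.0, 1.0) for b in BONES}], buf)
```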

Motion generation for virtual actors

Goal of the agent: imitate the user motion recorded at the beginning.

The arms, head, and chest of the characters are moved using the Inverse Kinematic (IK) approach. Every character has 4 cubes used as targets for the IK: one for the left hand, one for the right hand, one for the head, and one for the chest.

To imitate the player movement, every character has 3 neural networks learning in parallel: one for the left arm, one for the right arm, and one for the head and chest. Every neural network learns how to move the corresponding body part.

Goal of the algorithm: predict the positions and rotations of the cubes used as targets for the IK.
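The three-networks-plus-IK-targets setup can be sketched like this; layer sizes and the observation layout are illustrative assumptions, and each target pose is reduced to a position (xyz) plus a rotation quaternion (xyzw).

```python
import numpy as np

rng = np.random.default_rng(1)

class TargetNet:
    """Tiny MLP mapping an observation vector to IK-target pose(s):
    7 values per target cube (position xyz + rotation quaternion xyzw)."""
    def __init__(self, obs_dim, n_targets, hidden=32):
        self.w1 = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, n_targets * 7))
        self.n_targets = n_targets

    def __call__(self, obs):
        h = np.tanh(obs @ self.w1)
        return (h @ self.w2).reshape(self.n_targets, 7)

# One network per body part, as in the system described above.
nets = {
    "left_arm": TargetNet(obs_dim=10, n_targets=1),    # left-hand cube
    "right_arm": TargetNet(obs_dim=10, n_targets=1),   # right-hand cube
    "head_chest": TargetNet(obs_dim=10, n_targets=2),  # head + chest cubes
}
```

In the real system these outputs drive the four target cubes, and the IK solver moves the arms, head, and chest to reach them.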

Motion recording for human actors

This is me performing some joyful motions while wearing the motion-tracking bodysuit.

Tool used: OptiTrack Motion Capture System.

We captured several human motions of different expressions of joy, directly playable in Unity.

We created a database of 20 different performances of joy (10 made by a female interpreter and 10 by a male interpreter) to have different motions to apply on female and male avatars.

Experiment Design

Goal: investigate how game mechanisms can help keep people engaged in an interactive learning task.

Hypothesis: people in the gamified environment should remain engaged in the task for a longer time and have a better user experience.

How: we created another system that accomplishes the same objective as the main system, but without a game story built around it.

The participants were divided equally into 2 groups: the first started testing the Game system and then switched to the other system, while the second did the opposite.


For the Game system: “Imagine you are a movie director who organized a casting for an important role. This character wears a mask on his face for the entire movie and is always sitting. At the casting, 5 actors will show up. You have to vote for the one who best interprets the emotion of joy with the upper part of the body.”

For the No Game system: “Select the avatar who best performs the emotion of joy with the upper part of the body.”

Evaluation methods

  • Thinking-aloud method and observation (during the interaction with the systems)

  • Open-questions interview:

    1. What do you think about this system?

    2. Why did you decide to stop at that time?

    3. Do you think the system accomplishes the goal of teaching avatars how to perform joy with upper-body language?

    4. (FOR GAME SYSTEM) Which was your favorite part of the game?

    5. (FOR GAME SYSTEM) Which was the part that you liked the least in the game?

    6. (FOR GAME SYSTEM) Would you play this game again?

    7. What should be improved in the system?

    8. Which system do you prefer?

  • SUS questionnaire

  • GodSpeed questionnaire

  • Presence questionnaire
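For reference, the SUS scores reported below follow the standard scoring procedure: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is scaled by 2.5 to yield a 0-100 score.

```python
def sus_score(responses):
    """Standard SUS scoring: responses are 10 answers on a 1-5 scale,
    in questionnaire order (item 1 first)."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # i=0 is item 1 (odd-numbered)
                for i, r in enumerate(responses))
    return total * 2.5

# Example: a neutral "3" on every item lands exactly in the middle.
neutral = sus_score([3] * 10)  # 50.0
```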


Our research question "How can a game interaction help keep users engaged in performing the learning task?" seems to be answered positively, since the data collected from all the evaluation methods show more positive outcomes for the Game system.

The order in which participants tested the two systems did not noticeably affect the user experience: most of the comments were shared among all the users.

Because of restrictions imposed by the COVID-19 crisis, the experiments could only be performed with laboratory colleagues. The collected data is therefore biased by the participants' experience with Virtual Reality programming and Machine Learning techniques. We managed to find only 6 available participants.

What affected the user experience the most was the effectiveness of the system. The temporary solution for generating the agents' nonverbal behavior impacted the involvement in the game, sometimes decreasing the satisfaction factor.


Actor performances

Users laughed or were startled, especially at movements impossible to perform in reality or at weird avatar poses.

Some users noticed that the virtual characters were improving round by round.

Smart table interaction

Almost all the users liked the seated VR experience and pushing the buttons on the smart table.

One user found it too minimal: they would have preferred to also manipulate some virtual objects.

Some sentences prompted on the screen are not clear enough.

Recording of emotion

The part that created the most trouble!

Some participants did not remember what that part was for, or did not expect the actors to imitate the movement.

The activity was found to be not intuitive enough.


Game system

Cool context idea (casting auditions),
nice design of scene (trapdoors)

Enjoyed the presence of 5 actors
(more unpredictability, more fun)

2/6 users think it is more effective:
people are more likely to use it for a longer time because it's more fun

SUS score: 70.42 (Good)

GodSpeed questionnaire:
negative perception of agents

Presence questionnaire:
good degree of immersion,
bad interface quality

No Game system

Boring system,
but with fewer button presses and a shorter story

1 user preferred this because they found it easier to use (only 2 actors to consider)

4/6 users think it is more effective:
faster and more efficient
(they can focus only on 1 character's improvements)

SUS score: 65.83 (Poor)

GodSpeed questionnaire:
negative perception of agents

Presence questionnaire:
good degree of immersion,
bad interface quality

The number of actors influenced the user experience

  • 2/6 users preferred to have more actors, because it implies more unpredictability due to the various performances, leading to a more fun experience.

  • 4/6 users preferred to have only 2 actors, because it is easier to keep the focus and it is easier to remember the performances.

Future work

Improve ML algorithm

Validate the player's decision.

Prevent the learning agents from producing physically impossible body movements.

Reduce bias in testing population

Include the other user group we targeted: virtual reality gamers without machine learning knowledge.

Redesign recording of emotion

Try to increase the feeling of immersion without impacting the spontaneity of the movement.

Improve test design

Both systems should have the same number of performing actors since we discovered that it influences the user experience.

Explore new interactions

Add navigation in the environment, so the player can move closer to the virtual agents.

Investigate the perception of the agents from a closer perspective and how this impacts the quality of the nonverbal behavior generated.