Training Emotionally Aware AI with Intrinsic Motivation
Intrinsic motivation frameworks may offer insights for developing more adaptable AI systems that could eventually exhibit emotional intelligence.
In organic intelligence, internal goals like curiosity, mastery, or uncertainty reduction drive what we might call “motivational learning.” By modeling similar drives in machines, for example by rewarding them for exploration or positive affect rather than for fixed outcomes, AI systems may engage with human emotions in a more nuanced way. This may scaffold behaviors relevant to emotional understanding, though the approach remains largely unexplored.
A Refresher On Reinforcement Learning
Reinforcement learning (RL) is a way to train machines by giving them feedback (in the form of rewards or penalties) based on how well they perform a task.
While conceptually straightforward, RL involves significant complexity in practice: learning policies over time, balancing exploration and exploitation, and optimizing expected future rewards. Through this process, the system learns to make better decisions by maximizing those rewards.
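To make the refresher concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The environment interface (reset, step, n_actions) and the hyperparameters are illustrative assumptions, not any particular library’s API.

import random
from collections import defaultdict

def train_q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Toy tabular Q-learning: learn action values from external rewards.

    `env` is assumed to expose reset() -> state, step(action) ->
    (next_state, reward, done), and an n_actions attribute; this
    interface is a placeholder, not a specific library's API.
    """
    q = defaultdict(float)  # maps (state, action) -> estimated value

    def best_action(state):
        return max(range(env.n_actions), key=lambda a: q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Exploration vs. exploitation: occasionally try a random action.
            if random.random() < epsilon:
                action = random.randrange(env.n_actions)
            else:
                action = best_action(state)

            next_state, reward, done = env.step(action)

            # Update toward the reward plus discounted future value
            # (no bootstrapping past a terminal state).
            future = 0.0 if done else gamma * q[(next_state, best_action(next_state))]
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = next_state
    return q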
Many of the large language models (LLMs) we’re familiar with (ChatGPT, Claude, Gemini, etc.) are fine-tuned using reinforcement learning from human feedback (RLHF). Here, humans rate or rank different outputs, and the model gradually learns to generate more preferred responses (Christiano et al., 2017). Think of being asked to choose between two model responses, or of annotators rating which outputs are more helpful or appropriate.
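Under the hood, RLHF typically trains a reward model on those comparisons. Below is a minimal sketch of a pairwise preference loss in the spirit of Christiano et al. (2017); the raw scores here are stand-ins for the outputs of a learned reward model.

import math

def preference_loss(score_preferred, score_rejected):
    """Pairwise preference loss: push the reward model to score the
    human-preferred response above the rejected one.

    Scores would come from a learned reward model; here they are plain
    floats so the math is easy to see.
    """
    # Probability (under a Bradley-Terry style model) that the preferred
    # response really is better, given the two scores.
    p_preferred = 1.0 / (1.0 + math.exp(score_rejected - score_preferred))
    return -math.log(p_preferred)

# The loss is small when the model already ranks the preferred response
# higher, and grows when the ranking is wrong.
print(preference_loss(2.0, 0.5))   # ~0.20
print(preference_loss(0.0, 1.0))   # ~1.31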
For an LLM whose primary use case is to provide the user with basic or synthesized information, it’s easy to see why RL is a popular training method. It works well when there's a clear goal and a measurable outcome.
While reinforcement learning offers a robust framework for training agents through external rewards, some researchers have noted its limitations in emotional domains. As Moerland et al. (2018) assert, “a core challenge of this field is to translate higher-level psychological concepts into implementable mathematical expressions” (p. 455).
In particular, this framework may prove insufficient when designing systems that aim to be more empathetic, conversational, and emotionally intelligent.
Emotions are messy, deeply biological, and not fully understood, so trying to teach them using a framework designed for measurable, task-based outputs may be inherently limiting. Emotions like empathy or joy, and cognitive states like uncertainty, are not easily captured through conventional task-based learning, which is why alternative approaches, like intrinsic motivation, are gaining traction.
Intrinsic Motivational Learning
It’s crucial that we distinguish intrinsic from extrinsic goals here, as they’re often muddled. Traditional reinforcement learning relies on externally defined tasks and rewards. Intrinsic motivation, by contrast, centers on internally generated goals that may foster more flexible and emotionally nuanced behavior.
Extrinsic reward: Giving a dog a treat after he does a trick.
Intrinsic reward: A child exploring the world out of curiosity or a drive to resolve uncertainty.
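In practice, the two kinds of reward are often combined rather than used in isolation: the agent’s update sees the external task reward plus a weighted intrinsic bonus. Here is a minimal sketch of that combination, with the intrinsic term left abstract because each framework below defines it differently; beta is a hypothetical weighting hyperparameter.

def total_reward(extrinsic, intrinsic, beta=0.1):
    """Combine an external task reward (the dog's treat) with an
    internally generated bonus (the child's curiosity).

    beta is a hypothetical hyperparameter controlling how strongly the
    agent is driven by its own curiosity relative to the task.
    """
    return extrinsic + beta * intrinsic

# A purely intrinsically motivated agent sets the extrinsic term to zero
# and learns from the internal bonus alone.
print(total_reward(extrinsic=1.0, intrinsic=0.5))   # task reward plus curiosity
print(total_reward(extrinsic=0.0, intrinsic=0.5))   # curiosity only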
Now that we’ve made this distinction, we must also recognize that implementing systems based on intrinsic motivation presents new challenges. How do we define progress when the goal is internal, and what does “internal” even mean for a machine? How do we evaluate performance when exploration itself is the reward?
These challenges extend beyond evaluation; intrinsic motivation approaches face issues like reward hacking, where agents exploit unintended loopholes in their internal reward systems, and the fundamental difficulty of defining meaningful internal objectives for artificial systems.
Frameworks of Intrinsic Motivation
Researchers have developed several core frameworks to formalize intrinsic motivation in machines, many of them grounded in human motivational psychology and neuroscience. These approaches reward agents not for completing external tasks, but for learning, adapting, or reducing uncertainty, much as humans do in emotionally complex environments.
Curiosity and Novelty
Agents are rewarded based on information gain. They’re driven to seek out surprising or novel experiences.
Think of our example of a child exploring the world out of curiosity. By simulating this behavior computationally, researchers attempt to create agents that learn autonomously in unfamiliar or open-ended environments.
“Curiosity-driven learning mechanisms are based on the idea that agents have intrinsic rewards linked to novelty, surprise, or prediction error.”
Oudeyer & Kaplan (2007), What is Intrinsic Motivation?
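One simple way to turn this idea into a reward signal is a count-based novelty bonus, where rarely visited states pay more than familiar ones. A minimal sketch, assuming discrete, hashable states; the 1/sqrt(count) decay is a common choice in count-based exploration, not a detail from Oudeyer & Kaplan.

import math
from collections import Counter

class NoveltyBonus:
    """Count-based curiosity: rarely visited states yield larger rewards."""

    def __init__(self):
        self.visit_counts = Counter()

    def reward(self, state):
        # Increment the visit count, then pay 1/sqrt(count): the first
        # visit is worth 1.0, and the bonus decays as the state becomes
        # familiar, nudging the agent toward novelty.
        self.visit_counts[state] += 1
        return 1.0 / math.sqrt(self.visit_counts[state])

bonus = NoveltyBonus()
print(bonus.reward("room_A"))  # 1.0, brand new
print(bonus.reward("room_A"))  # ~0.71, already seen once
print(bonus.reward("room_B"))  # 1.0, new again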
Competence and Flow
Here, agents are rewarded for improving their performance or mastering self-generated challenges.
The idea is rooted in developmental psychology, specifically effectance motivation (White, 1959), the drive to interact effectively with one’s environment, and in the concept of flow, the state of deep focus and satisfaction that comes from doing something slightly beyond your current skill level.
“Competence-based models reward agents for improving their ability to perform tasks or reducing prediction error.”
Singh et al. (2010), Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective, IEEE Transactions on Autonomous Mental Development
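Here is a sketch of what a competence-based signal could look like, under the assumption that the agent tracks its recent performance on self-generated goals and is rewarded for learning progress, the improvement over its own recent average, so that both mastered and hopeless goals stop paying off.

from collections import defaultdict, deque

class CompetenceProgress:
    """Reward improvement on self-chosen goals rather than raw success.

    Goals the agent has mastered (flat, high performance) and goals it
    cannot yet learn (flat, low performance) both yield little reward;
    goals just beyond current skill, where scores are climbing, pay most.
    """

    def __init__(self, window=10):
        # Recent performance scores per goal, e.g. success rates in [0, 1].
        self.history = defaultdict(lambda: deque(maxlen=window))

    def reward(self, goal, score):
        past = self.history[goal]
        # Learning progress = current score minus the recent average.
        progress = score - (sum(past) / len(past)) if past else 0.0
        past.append(score)
        return max(progress, 0.0)

tracker = CompetenceProgress()
print(tracker.reward("stack_blocks", 0.2))  # 0.0, no history yet
print(tracker.reward("stack_blocks", 0.5))  # 0.3, clear improvement
print(tracker.reward("stack_blocks", 0.5))  # ~0.15, gains are levelling off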
Prediction Error
Agents are rewarded based on the discrepancy between expected and actual outcomes.
This is analogous to the emotion that arises in humans after an unexpected turn of events: surprise.
Ideally, this would help agents respond to changing, unpredictable environments.
Note: Unlike in standard RL, where prediction error informs updates to value functions, in intrinsically motivated systems, prediction error can be used as a reward signal, fueling curiosity rather than correcting behavior.
"Prediction errors can be computed as the discrepancy between the expected sensory inputs and the actual sensory observations. The system thus monitors the dynamics of such prediction errors over time. It selects those behaviours expected to produce big variations in the prediction errors - or, in other words, those activities that may generate a high information gain." (p. 1)
Schillaci et al. (2020), Intrinsic Motivation and Episodic Memories for Robot Exploration of High-Dimensional Sensory Spaces
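Here is a minimal sketch of prediction error used as a reward signal in the spirit of the passage above: a simple forward model predicts the next observation, and the size of its error becomes the intrinsic reward. The linear model and learning rate are illustrative simplifications, not the architecture from Schillaci et al. (2020).

import numpy as np

class PredictionErrorReward:
    """Intrinsic reward = how badly a learned forward model predicted the
    next observation. Big surprises pay off; well-modelled transitions do not."""

    def __init__(self, obs_dim, action_dim, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # A deliberately simple linear forward model: (obs, action) -> next obs.
        self.W = rng.normal(scale=0.1, size=(obs_dim + action_dim, obs_dim))
        self.lr = lr

    def reward(self, obs, action, next_obs):
        x = np.concatenate([obs, action])
        predicted = x @ self.W
        error = next_obs - predicted
        # Update the forward model so repeated transitions become boring.
        self.W += self.lr * np.outer(x, error)
        # The mean squared prediction error is the curiosity reward.
        return float(np.mean(error ** 2))

model = PredictionErrorReward(obs_dim=4, action_dim=2)
obs, action = np.zeros(4), np.ones(2)
surprising = np.array([1.0, -1.0, 0.5, 0.0])
print(model.reward(obs, action, surprising))   # large: transition was unexpected
print(model.reward(obs, action, surprising))   # smaller: the model is adapting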
Real-World Research and Results
Traditional AI metrics like accuracy tend to miss the point when evaluating emotional intelligence. How do you measure empathy or social adaptability? Perhaps evaluations should be based on how effectively these systems react to novel experiences or human interactions. Some research shows that intrinsically motivated agents generalize better to novel environments and exhibit more diverse behavior, especially in open-ended tasks where no clear objective is defined.
While these studies don't directly create emotional AI, they suggest intrinsic motivation produces more flexible, adaptable behaviors, key precursors to emotional intelligence.
Zadok, McDuff, and Kapoor (2021) trained an agent using intrinsic rewards based on spontaneous human smiles during simulated driving. Without explicit objectives, the affect-driven agent explored 46% more of its environment, collided 29% less often, and generated superior data for downstream vision tasks.
Pathak et al. (2017) demonstrated this with curiosity-driven agents. When agents trained purely on curiosity in Mario Level-1 were later tested on new levels, they explored dramatically faster than agents trained from scratch. Figure 8 (below) shows the gap: curiosity-pretrained agents reached nearly perfect performance on entirely new maps, versus close to zero for agents starting from nothing.
This reflects broader progress in intrinsically motivated learning. Forestier et al. (2022) showed robots autonomously developing complex tool-use skills without external rewards, discovering stepping stones from simple to complex behaviors. The agents self-organized their learning, focusing on high-progress activities while ignoring distractors.
The pattern is consistent: agents optimized for internal growth show promising adaptability that may transfer to complex social interactions, though this remains an open research question.