
The Shape of Stories Inside Neural Networks
While reading stories, LLMs represent human emotions both geometrically (in activation space) and temporally (changing across the course of a story).
Authors
† Equal senior contribution
1 Goodfire
2 Stanford University
You might not remember every word from The Lion King, but you do remember how it felt.
The sun rising on Pride Rock, the tragic death of Mufasa, the triumphant return of Simba—humans follow the emotional dynamics of a story as they process the literal events.
So do language models.
Our previous posts focused only on static geometry: the structure of neural representations within a single forward pass of a language model. This post instead explores how neural geometry evolves over time, by studying how language models track emotional dynamics as they read simple stories.
The geometry of human emotion in LLMs
Before we get to temporal dynamics, we need to understand the (static) neural geometry of emotions in language models. Like the other concepts we’ve previously discussed in this series, emotions are represented along a curved structure in the activation space of LLMs.
[REFER TO PRIOR WORK studying the representation geometry of emotions. 1 paragraph, non-exhaustive, just a few casually explained compelling examples]
Recovering the valence-arousal model in activation space
Psychologists have long tried to model human emotions. For decades, the classic approach to describing the (dis)similarities between emotions has been the valence-arousal model [CITE Russell], which organizes emotions along two dimensions (much like an alignment chart):
- Valence: how positive or negative is the emotion?
- Arousal: how intense/energetic is the emotion?
Each emotion is a point in this two-dimensional space. Anger has negative valence and high arousal; sadness has negative valence and low arousal; happiness has positive valence and high arousal. While this model doesn’t capture the full richness of human emotion (and there are many competing models!), it’s popular because it captures the basic emotions cleanly with just two dimensions.
Below, we show where six basic emotions fall in valence-arousal space according to human-judgment data (i.e., ratings from real people). Distance in this two-dimensional conceptual space defines a geometry of human emotions, shown in the similarity heatmap on the right.
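To make "distance defines a geometry" concrete, here is a minimal sketch that turns valence-arousal coordinates into a pairwise distance matrix and then into similarities. The coordinates are placeholders picked to match the qualitative descriptions above, not the human-judgment data behind the figure, and the rescaling from distance to similarity is an assumption made for illustration.

```python
import math
from itertools import combinations

# Placeholder (valence, arousal) coordinates in [-1, 1], chosen only for
# illustration. They are NOT the human-judgment data behind the figure.
VALENCE_AROUSAL = {
    "surprise":  (0.1, 0.8),
    "disgust":   (-0.6, 0.3),
    "anger":     (-0.6, 0.7),
    "happiness": (0.8, 0.5),
    "sadness":   (-0.7, -0.4),
    "fear":      (-0.7, 0.8),
}


def distance_matrix(coords):
    """Pairwise Euclidean distances between emotions, as a nested dict."""
    names = list(coords)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = math.dist(coords[a], coords[b])
        dist[a][b] = dist[b][a] = d
    return dist


def similarity_matrix(dist):
    """Rescale distances to [0, 1] similarities (1 = identical).

    This particular rescaling is an assumption, not the post's method.
    """
    largest = max(max(row.values()) for row in dist.values())
    return {a: {b: 1.0 - d / largest for b, d in row.items()}
            for a, row in dist.items()}


distances = distance_matrix(VALENCE_AROUSAL)
similarities = similarity_matrix(distances)
```

With these placeholder values, anger sits much closer to fear than to happiness, which is the kind of structure a similarity heatmap makes visible.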
The behavioral readouts from LMs (described in the next section) are six-dimensional vectors in which each dimension is a numeric rating of how well one emotion describes the story. If we fit a manifold to these behavioral readouts, we recover a two-dimensional space that is strikingly similar to human judgments of valence and arousal.
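As a rough illustration of reducing six-dimensional readouts to two dimensions, the sketch below uses principal component analysis, computed by power iteration with only the standard library. PCA is a linear stand-in chosen for simplicity; the post does not say which manifold-fitting method is used, and the readout values here are made up. In practice a numerical library would do this in a few lines.

```python
import math
import random

EMOTIONS = ["surprise", "disgust", "anger", "happiness", "sadness", "fear"]

# Made-up readouts, one row per hypothetical story. Illustration only.
READOUTS = [
    [2, 0, 0, 9, 0, 0],
    [1, 1, 0, 1, 8, 2],
    [3, 6, 9, 0, 2, 1],
    [7, 1, 1, 0, 2, 9],
    [8, 0, 0, 6, 0, 1],
    [1, 8, 4, 0, 2, 1],
]


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(mat, v):
    return [sum(m * x for m, x in zip(row, v)) for row in mat]


def top_eigenpair(mat, iters=500, seed=0):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in mat]
    for _ in range(iters):
        w = matvec(mat, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(mat, v)))
    return v, eigenvalue


def pca_2d(rows):
    """Project rows onto their top two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for k in range(2):
        v, lam = top_eigenpair(cov, seed=k)
        components.append(v)
        # Deflate: remove the direction just found before the next pass.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    coords = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return coords, components


coords, components = pca_2d(READOUTS)
```

Each row of `coords` is a two-dimensional position for one readout; comparing such positions against human valence and arousal ratings is one way to check the kind of correspondence described above.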
Emotional humans wrote the text we use to train LMs, and consequently the geometry of human emotion is recapitulated in LM behavior. You are what you eat.
LLMs track emotional dynamics in stories
Putting aside the question of neural geometry for a moment, we might ask whether LLMs keep track of how emotions change as a story progresses. Indeed, they seem to have a dynamically shifting sense of emotion from sentence to sentence. These emotional dynamics are apparent regardless of whether we look at internal representations or externalized behavior.
Stories follow the dynamics of human emotion, from tragic falls to heroic redemptions to unexpected twists.[1] During training, LMs read millions of stories[2] while learning to predict each word from everything that came before.
This task may seem simple, but sometimes the arc of a story hinges on a single word. Imagine the understanding of story, character, and emotion needed to successfully predict the next word in the line “No, I am your ___” from the screenplay of Star Wars: The Empire Strikes Back.
To understand how LLMs track the emotional dynamics of a story, we simply ask them.
After each sentence, we prompt the LM to report the degree of surprise, disgust, anger, happiness, sadness, and fear in the story thus far, each on a scale from 0 to 10. The answers form a six-dimensional vector that serves as a behavioral readout of what the LM believes has happened up to that moment.
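The readout procedure can be sketched as a loop over story prefixes. Everything specific here is an assumption for illustration: the prompt wording, the `ask_model` callable (standing in for whatever LM interface is used), and the keyword-based `toy_model` that lets the sketch run without a real model.

```python
import re
from typing import Callable, List

EMOTIONS = ["surprise", "disgust", "anger", "happiness", "sadness", "fear"]

# Hypothetical prompt wording; the post does not give the exact prompt.
PROMPT_TEMPLATE = (
    "Here is the story so far:\n{story}\n\n"
    "On a scale from 0 to 10, how much {emotion} is in the story so far? "
    "Answer with a single number."
)


def parse_rating(reply: str) -> float:
    """Extract the first number in the reply and clamp it to [0, 10]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no number found in reply: {reply!r}")
    return min(10.0, max(0.0, float(match.group())))


def behavioral_readouts(
    sentences: List[str], ask_model: Callable[[str], str]
) -> List[List[float]]:
    """Return one six-dimensional rating vector per sentence of the story."""
    readouts = []
    for i in range(1, len(sentences) + 1):
        story_so_far = " ".join(sentences[:i])
        vector = [
            parse_rating(
                ask_model(PROMPT_TEMPLATE.format(story=story_so_far, emotion=e))
            )
            for e in EMOTIONS
        ]
        readouts.append(vector)
    return readouts


def toy_model(prompt: str) -> str:
    """Stand-in for a real LM: reports high sadness once someone has died."""
    if "how much sadness" in prompt and "died" in prompt:
        return "8"
    return "1"


story = ["The cub played in the sun.", "Then his father died."]
readouts = behavioral_readouts(story, toy_model)
```

Each row of `readouts` follows the order of `EMOTIONS`, so plotting one column against sentence index traces that emotion's arc through the story.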
We plot these behavioral readouts in the demo below, where you can see how the LM follows the arc of a story.
Combining space and time: the dynamics of emotions in activation space
Armed with an understanding of both the neural geometry of emotions and how LLMs track emotional dynamics over the course of a story, we can now ask: how are emotional dynamics represented over [...]
In addition to asking the LM what it believes, we also harvest the internal activations at the last token of each sentence in a story. These activations serve as a snapshot of what the model represents after reading the story so far. To model the geometry of these LM representations, we fit a manifold to the activations. In the demo below, we show internal dynamics as trajectories along this representation manifold.
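Picking out "the last token of each sentence" comes down to aligning sentence boundaries with token positions. The sketch below shows that bookkeeping with a toy word-level tokenizer and placeholder per-token vectors; both are assumptions for illustration. With a real model, the character offsets would come from the tokenizer (Hugging Face fast tokenizers, for example, can return an offset mapping) and the vectors from a chosen layer's hidden states.

```python
import re
from typing import List, Tuple


def toy_tokenize(text: str) -> List[Tuple[int, int]]:
    """Stand-in tokenizer: (start, end) character offsets of words and punctuation."""
    return [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]


def sentence_end_offsets(sentences: List[str]) -> List[int]:
    """Character offset where each sentence ends when joined by single spaces."""
    ends, pos = [], 0
    for sentence in sentences:
        pos += len(sentence)
        ends.append(pos)
        pos += 1  # account for the joining space
    return ends


def last_token_indices(
    offsets: List[Tuple[int, int]], ends: List[int]
) -> List[int]:
    """Index of the last token that finishes at or before each sentence end."""
    indices, tok = [], 0
    for end in ends:
        while tok + 1 < len(offsets) and offsets[tok + 1][1] <= end:
            tok += 1
        indices.append(tok)
    return indices


sentences = ["The cub played.", "His father died."]
text = " ".join(sentences)
offsets = toy_tokenize(text)
indices = last_token_indices(offsets, sentence_end_offsets(sentences))

# Placeholder per-token activations (one 2-d vector per token). A real run
# would use hidden states from the model instead.
hidden = [[float(t), 0.5 * t] for t in range(len(offsets))]
snapshots = [hidden[i] for i in indices]
```

Because each snapshot is taken from a forward pass over the whole text, a causal model's activation at a sentence's last token reflects everything read up to that point, which is what makes it a usable summary of the story so far.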
Conclusion
Stories rise, fall, twist, collapse, and recover. We showed that language models track these movements not only in their verbalized reports but also in the geometry of their internal representations.