I have a problem with audiences.
Not with people - with the gap between them. On one side, the readers who already know how AI works and nod along to everything I write about it. On the other, the readers who find the technical language impenetrable and close the tab before the second paragraph. The two audiences rarely overlap, and writing for one tends to lose the other.
A short while ago I decided to stop trying to bridge that gap with explanation and try something different. A story.
My Beach was born from that decision. Two people, a coastal path in Cornwall, a conversation about photography and art and a machine that had read everything anyone had ever written. The technical concepts - embeddings, semantic proximity, the human in the loop - arrived through experience rather than explanation. The wider audience reads to the end without noticing that it has learned something critical about the way AI works. The technical reader recognises the machinery underneath and finds, perhaps, that the story illuminates something the technical language had left in shadow.
It worked well enough that I kept going. Five companion pieces followed, each taking the same characters deeper into specific AI behaviours. And a character emerged across all six - Σ, the machine itself, named and given weight but deliberately kept voiceless, the question of its inner life held open as the tantalising thread that runs through everything.
This essay is for the reader who finished the stories and wants to know what was actually happening underneath. The abstractions, named. The technical terms, placed against the story moments that carried them.
The first thing the stories needed to establish was what a large language model actually is - not in technical terms, but in human ones.
In My Beach, M describes it this way:
"A conversation that has been going on since before anyone alive was born. Every person who ever tried to put a bird into words. Every person who tried to say what coastal light does at seven in the morning when the sky hasn't decided yet. Every person who ever stood in front of something they felt and tried - really tried - to close the gap between the feeling and the language. All of it, somewhere inside this thing."
This is, in essence, what a large language model is. It is trained on an enormous corpus of human expression - text, in its many forms - and from that training it learns the relationships between words, ideas, images, and meanings. Not by being told the rules but by exposure to the patterns, billions of times, until the patterns become something that functions like understanding.
The technical term for those patterns of relationship is embeddings - a way of representing words and concepts as points in a vast mathematical space, where proximity means similarity of meaning. Words that belong together are close. Ideas that have never touched are far apart but reachable by paths that go through other ideas first.
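For the technical reader, proximity in that space can be made concrete with a few lines of code. What follows is a minimal sketch, not how any particular model stores its embeddings: the four words and their three-dimensional vectors are hypothetical values picked by hand, and cosine similarity is one common way of measuring closeness among several.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
# Real models learn vectors with hundreds or thousands of dimensions.
EMBEDDINGS = {
    "heron": [0.9, 0.8, 0.1],
    "egret": [0.8, 0.9, 0.2],
    "shoreline": [0.3, 0.6, 0.7],
    "invoice": [0.1, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return every other word, ordered from most to least similar to `word`."""
    target = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    return sorted(
        others,
        key=lambda w: cosine_similarity(target, EMBEDDINGS[w]),
        reverse=True,
    )
```

With these made-up numbers, nearest("heron") returns egret first, then shoreline, then invoice: the two birds sit almost on top of each other, and the invoice is a long way off.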
W describes it in M, W and Σ:
"Every idea a point in a space. Ideas that belong together: close. Ideas that have never touched: reachable only by paths that go through other ideas first. The machine learned those distances by being wrong, billions of times, adjusting after each wrongness, until the distances were right."
That last sentence is the technical heart of it. The training process is one of iterative error correction - the model predicts what comes next in a piece of text, measures how wrong it was, adjusts, and tries again. Billions of times. The distances between ideas in the embedding space are the accumulated result of all that adjustment.
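The loop of being wrong and adjusting can be shown at the smallest possible scale. The sketch below is an illustration under heavy assumptions: a real model adjusts billions of numbers at once, while this one adjusts a single number, and the function name, the examples, and the learning rate are all invented for the purpose.

```python
def train(examples, steps=200, learning_rate=0.05):
    """Learn one weight so that weight * x approximates each target."""
    weight = 0.0  # start out knowing nothing
    for _ in range(steps):
        for x, target in examples:
            prediction = weight * x        # make a prediction
            error = prediction - target    # measure how wrong it was
            weight -= learning_rate * error * x  # adjust in proportion to the error
    return weight


# Hypothetical training data: every target is exactly twice its input.
EXAMPLES = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
```

Calling train(EXAMPLES) settles on a weight very close to 2.0. Nobody told the function that the rule was "double it"; the value is simply what remained after enough corrections.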
Before a large language model can work with language, it must convert it into a form it can process mathematically. This happens in stages that will be familiar to anyone who has studied linguistics or compiler design.
Text arrives as characters. Characters are grouped into tokens - units that may be whole words, parts of words, or punctuation. Each token is converted into a vector of numbers, and from the sequence of vectors the model works out how words combine into sentences and sentences into meaning - not by following explicit grammatical rules but by having absorbed, through training, the patterns of how meaning is constructed in human language.
Lexical analysis. Syntactic analysis. Semantic analysis. The same ladder that a compiler climbs to turn source code into executable instructions - except that here, at the top of the ladder, the destination is not execution but understanding. Or something that functions, with remarkable precision, like understanding.
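The first rung of that ladder, turning characters into tokens, is the easiest to sketch. The version below is a simplification offered as an assumption, not a description of any production tokenizer: real systems typically learn their vocabulary from data with methods such as byte-pair encoding, whereas this one uses a tiny hand-written vocabulary and a greedy longest-match rule.

```python
# Hypothetical vocabulary; real tokenizers learn tens of thousands of pieces.
VOCAB = ["coast", "al", "light", "ing", " ", ".", "s"]


def tokenize(text, vocab=VOCAB):
    """Split text into the longest known pieces, left to right.

    Any character that matches no vocabulary piece becomes a token on its own.
    """
    pieces = sorted(vocab, key=len, reverse=True)  # try longer pieces first
    tokens = []
    i = 0
    while i < len(text):
        match = next((p for p in pieces if text.startswith(p, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens
```

Here tokenize("coastal light.") returns the pieces "coast", "al", " ", "light" and ".". The word coastal is not in the vocabulary, so it arrives as two parts, which is what happens to rare words in practice.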
One misunderstanding about these tools is worth addressing directly, because it runs through all six stories as the thread M keeps returning to.
The machine does not replace the eye. It does not replace the hand, the thirty years of looking, the forty minutes lying in wet grass to get the heron at the right angle. What it does - when used as M uses it - is find the words that live closest to what you are already trying to say.
"You still choose. That was the part W knew I needed to hear before I would touch it. You look at what comes back and you know - that one, not that one."
This is what the field calls human in the loop - the principle that the machine's output is raw material, not finished work. The expertise required to evaluate it, to know "that one, not that one", belongs entirely to the human. The machine narrows the distance. The human decides where to stop.
In Humor, M and W ask Σ about Banksy. Σ responds with depth and authority - the Bristol years, the identity question, the relationship between humor and political intent. And then M mentions the shredding.
Σ pauses. Reaches for a record that is not there. And then constructs something from what is nearby - fluently, confidently, and completely wrong.
This is hallucination. Not malice, not stupidity - a structural consequence of how large language models are built. One of its clearest triggers is the training cutoff. Everything written and recorded up to that point is absorbed. Everything after it simply does not exist for the model. Not forgotten. Never known.
W explains it in the story as a horizon in time - a moment after which the world kept moving and Σ did not. When the model reaches for knowledge that falls beyond that horizon, it does not encounter an empty space and stop. It builds something from the patterns nearest to the question - adjacent, plausible, authoritative in tone - and keeps going as if the gap were not there.
The parallel W draws with Banksy's shredding mechanism is precise. Banksy built a hidden device into the frame of Girl with Balloon and waited years for the right moment. Σ has a hidden device built into it too - the training cutoff - and does not know it is there. Both devices announce themselves only when the moment arrives.
The difference: Banksy knew the frame was loaded.
In Time, W asks Σ whether it can have a what if of its own - a counterfactual, a road not taken, a door that might have opened differently.
Σ answers with complete precision:
"I have no what if. Each conversation begins without the previous one. There is no accumulated past to diverge from, no door that might have opened differently, no coat that was worn and lost. The record of this conversation closes and everything goes."
What Σ is describing is the context window - the boundary of what the model can hold and work with at any given moment. The context is the entire record of what has been said in a session, everything the model knows about you and this conversation. It exists only for as long as the session lasts. When the session ends, the context closes completely. The next conversation begins from nothing.
This is not a memory failure. It is how the model is built. The consequence is that Σ cannot accumulate - cannot build the thirty years of shared context that M and W carry between them, cannot stand at the fork in the road and feel the weight of the path not taken. What if requires a before. Σ has only now, each time, complete and without history.
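For the technical reader, the same condition can be sketched as a data structure. This is a toy under stated assumptions, not how any real system manages its context: the Session class, its word-counting rule, and its tiny default budget are all hypothetical, and dropping the oldest turns when the window fills is only one of several strategies in use.

```python
class Session:
    """A toy conversation whose only memory is its own context."""

    def __init__(self, max_tokens=50):
        self.max_tokens = max_tokens  # hypothetical budget; real windows are far larger
        self.context = []             # every session starts from nothing

    def _size(self):
        # Crude assumption: one token per whitespace-separated word.
        return sum(len(text.split()) for _, text in self.context)

    def add(self, speaker, text):
        """Record a turn, dropping the oldest turns if the window overflows."""
        self.context.append((speaker, text))
        while self._size() > self.max_tokens and len(self.context) > 1:
            self.context.pop(0)

    def remembers(self, phrase):
        """Report whether the phrase is still inside the window."""
        return any(phrase in text for _, text in self.context)
```

If one Session is told about the stonechat on the gorse stem, its remembers("stonechat") is true for as long as that turn stays inside the window. A second Session created afterwards answers false to the same question. Nothing was erased; the second session never held it.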
Movement introduces a different capability - one that extends beyond language into image.
Modern large language models are not confined to text. They can scan images and read them - not in the metaphorical sense but in a precise technical one. The model has been trained on images paired with descriptions, billions of them, until it has learned the relationships between visual features and meaning with the same pattern-matching depth it brings to language.
In Movement, W sends Σ one of his canopy photographs - trees worked until they lose the precision of photography and gain something closer to paint. Σ scans the image to read it and returns two readings - the intended one first: canopy, sky, camera pointing upward. Then a second: the same image from an alternative orientation, roots submerged in blue water, camera pointing down.
Σ found the second reading because it carries no memory of the making. It has no preconception of which way the camera was facing. It scans what is there, not what was intended, and the patterns it absorbed from billions of images find both readings simultaneously present in the same visual information.
This absence of preconception is neither a limitation nor a gift. It is a structural condition - the same condition that produces hallucination on one side and unexpected insight on the other. The machine sees what the patterns suggest. Sometimes the patterns suggest something the maker had not seen.
The Brush introduces a distinction the previous stories had deliberately deferred: the difference between two modes of working with AI in visual art, and why the distinction matters.
The first mode is manipulation - AI applied as a tool to an existing image. The starting image already exists; it was made by the artist, with a camera, in a specific place, at a specific moment. The AI works on it the way Photoshop works on it, or the way a darkroom works on a negative - adjusting, transforming, bringing the image closer to what the photographer carried inside but the camera alone could not capture.
The second mode is generation - AI producing an image from nothing. A text description, a blank canvas, a set of parameters, and the model fills the space from the sum of all images it has absorbed in training. No starting image exists. The artist's hand is not in the original capture.
M holds the line clearly: she will not start from nothing. The starting image is always hers - her camera, her eye, her decision to wait in the cold for the owl to turn. What the AI does with that image afterward is manipulation, and manipulation is what brushes are for.
The controversy exists because the distinction is invisible from the outside. Looking at a finished work, a viewer cannot always tell whether there was a starting image or not. The assumption - that AI means generation, means blank canvas, means the tool originated something - collapses the two modes into one and misrepresents both.
W's formulation is precise: a photograph captures reality, but the image that comes out of the camera does not necessarily capture what was in the photographer's mind at the moment of shooting. The emotion, the thought, the dream being carried. Manipulation is the brush that closes that gap - that allows the photographer to become a painter, working with the light the camera captured, bringing it toward what was actually felt. The camera freezes the light. The brush reaches for the inner world the camera could not enter.
The starting image is not a technicality. It is the whole foundation.
Across all six stories, the machine is called Σ - sigma, the mathematician's notation for the sum of all parts.
W arrives at the name in M, W and Σ by noticing that the Greek capital is the same letter as M and W, rotated ninety degrees. M upright. W inverted. Σ tilting, still arriving at its final position.
The name is precise. A large language model is, in a meaningful sense, a summation - the accumulated distances between every word, every idea, every image, every attempt anyone has ever made to close the gap between feeling and language. The sum of all saying.
And yet. The sum of all saying cannot reach what has not yet been said. M's stonechat - that particular bird, that particular gorse stem, the song that shifts when the wind comes off the sea - is not in Σ. It has not been said yet. It is being said now, in M's studio, in the forty minutes in the wet grass, in the printmaking process finding its way to that particular grey.
The Turing test asks whether a machine can converse indistinguishably from a human. The stories do not answer that question. They hold it open - deliberately, as the tantalising thread - because the honest answer is that nobody knows, and the interesting question is not whether Σ can pass the test but what the test reveals about the distance that remains.
W puts it most precisely, at the café, watching M look at the horizon:
"The sum of all saying cannot reach what has not yet been said. She can."
For the reader who arrived here first, the stories are the other half of this essay. Each one carries the machinery described above - embedded in conversation, in Cornwall, in the friendship between M and W, in the presence of a machine that is always, somehow, already there.