Burning Through Unknown, Part II: Thought

How does a machine reason? In November 2025, three days of building a semantic search engine burned away the mystery of how machines find similar things - revealing geometry underneath. This essay takes the next step: from finding to thinking, from geometry to reasoning - and asks how far that journey has actually gone.

From Search to Thought

How machines moved from finding to reasoning - and how far that journey has actually gone

The Open Mouth

There is a technique I have used more times than I can count, across five decades of problem solving. When a problem refuses to yield - when you have stared at it long enough that the words stop making sense and the thinking feels circular - you find someone and start explaining it to them. It does not matter who. A colleague, anyone who will listen. You begin at the beginning, you describe what the process is supposed to produce, you walk through what it is actually producing instead, you start to articulate the gap between the two -

And somewhere in the middle of that explanation, you stop.

Your listener's mouth is still open. They have not said a word. They may not have understood a single sentence. And you say: "Thank you. I know what was wrong."

The act of describing found what the act of thinking could not. Articulation created an accountability that silent reasoning had avoided. The gap that could hide inside your head became impossible to sustain the moment you had to say it out loud.

I did not expect that observation - so familiar, so human - to tell me something precise about how machines are learning to think. But here we are.

Where We Left Off

In November 2025 I wrote about three days spent building a semantic search engine for my artwork. The essay was called Burning Through Unknown, and it documented the moment a set of intimidating abstractions - with difficult names like embeddings, vector spaces, CLIP, and FAISS - burned away to reveal something unexpectedly simple underneath. Images become numbers. Text becomes numbers. Similar things land close together in a high-dimensional space. Finding what is similar becomes a matter of measuring distance - geometry, not magic.

That essay ended at retrieval. The machine found things. It ranked them by proximity. It did not think about them.

This essay starts where that one ended. The question it asks is the one I left open: what happens after retrieval? How does a machine move from finding to reasoning?

Understanding how embeddings work - how meaning becomes geometry - is not optional background for what follows. It is the foundation everything else is built on. If that concept is unfamiliar, Burning Through Unknown is the place to start. Without it, the steps that follow will lack their footing. What follows here assumes that floor and builds upward from it.

The Librarian and the Scholar

Imagine a scholar of extraordinary breadth. Before the door to his study closed, he read. He did not read selectively or sparingly - he read everything. Books, articles, conversations, instructions, arguments, stories, scientific papers, legal judgments, song lyrics, code. Billions of pages. And he did not simply absorb them passively - he was tested, constantly, on what came next. Given the beginning of a sentence, what follows? Given a question, what is the answer? Given an argument, what is the counterargument? Every time he was wrong, he adjusted - slightly, invisibly, cumulatively. Every time he was right, the adjustment reinforced itself. After billions of such corrections, across billions of such pages, the testing stopped. The door closed. What remained was not a database of facts he could point to - it was something more distributed and more remarkable: a vast, implicit understanding of how language works, how ideas connect, how questions relate to answers. Not stored anywhere in particular. Present everywhere at once.

He seems to remember all of it. You can ask him anything and he will answer with confidence and depth.

There is one catch. Nothing published after the door closed has reached him. Ask about last week's discovery, yesterday's news, the letter written this morning - and he is blind. He will not tell you he is blind. He will tell you something plausible, assembled from what he does know, which may be dangerously wrong.

The Scholar - the large language model on his own. Brilliant within his horizon. Sealed beyond it.

Now imagine a librarian working just outside that closed door. She has access to everything - including everything written after the door closed. When you ask the scholar a question, she moves first. She searches the shelves, finds the most relevant pages, and passes both question and pages under the door.

The scholar reads those pages fresh - as you would read a document handed to you this morning - and reasons over them with full intelligence. He did not study those pages during his lifetime of learning. He does not need to have done so. He is simply reading, and thinking, and answering on the basis of real current evidence rather than frozen memory.

The Scholar reads fresh evidence. The Librarian finds it. Together they are RAG - Retrieval Augmented Generation. Retrieve first, then generate.

The librarian uses the geometry we explored in Part I - the same embeddings, the same vector search, the same distance measurement - to find the relevant pages. Your question becomes a position in space. The most relevant passages are the ones closest to that position. Numbers used for navigation, then discarded.

What passes under the door is not numbers. It is text. Plain, readable, specific text - placed in front of the scholar alongside your question. He reads both and answers.
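The librarian's two jobs can be sketched in a few lines. This is a minimal illustration, assuming a toy bag-of-words "embedding" (word counts) in place of a real learned model such as CLIP: each passage is scored against the question by cosine similarity, the closest ones are kept, and what gets handed on is plain text. The shelf contents, the `embed`, `retrieve` and `build_prompt` helpers, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system would use a learned model
    # that maps text to a dense vector; this is only a stand-in.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity: closeness measured by the angle between two vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, shelves: list[str], k: int = 2) -> list[str]:
    # The librarian: rank passages by proximity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(shelves, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    # What passes under the door is text, not numbers.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use these passages to answer.\n{context}\n\nQuestion: {question}"

shelves = [
    "The gallery opens a new exhibition of paintings this week.",
    "Tides are driven mainly by the gravity of the moon.",
    "The exhibition features paintings from the artist's blue period.",
]
question = "Which paintings are in the new exhibition?"
prompt = build_prompt(question, retrieve(question, shelves))
print(prompt)
```

The vectors are used only to choose which passages to pass on; once chosen, the passages travel as ordinary sentences.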

Two distinct actors. She navigates. He thinks. Neither does the other's job.

And crucially - her shelves are alive. New documents can be added today, this morning, this minute. His knowledge remains frozen at the date the door closed. But the evidence he is given to reason over can be as current as she chooses to make it. The time limitation is not eliminated - it is routed around.

Words, Structure, Meaning

Consider what happens when you read a sentence. Any sentence. You do it so automatically that the process is invisible - but there are three distinct things happening, one after another, so fast they feel simultaneous.

First you recognise the words. "The", "cat", "sat", "on", "the", "mat." Individual units, each one familiar. This requires no understanding - a child learning to read does this before they understand anything.

Then you parse the structure. Subject, verb, location. Something did something somewhere. The words arrange themselves into a relationship. Still no meaning - a grammatically identical sentence in a language you do not speak would pass this test and tell you nothing.

Then meaning arrives - an understanding, assembled from everything before it, that did not exist in any single word or relationship alone. A specific animal. A specific posture. A specific place. The sentence lands.

Three levels. Recognition, structure, meaning. Each one built on the one below. You cannot have meaning without structure. You cannot have structure without recognising the units first.

A machine reading language follows the same three levels.

Tokenisation is recognition. The text is broken into units - not always whole words, sometimes fragments, sometimes punctuation - each one mapped to a number the machine can work with. At this stage the machine knows nothing. It has a sequence of tokens, nothing more.
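A toy sketch of this recognition step follows. It assumes a tiny hand-made vocabulary and greedy longest-match splitting; real tokenisers (byte-pair encoding, for example) learn their vocabularies from data. The `VOCAB` table and the `tokenise` function are hypothetical. Note how the unfamiliar word "matt" is broken into the fragments " mat" and "t".

```python
# Hypothetical vocabulary: whole words, a word fragment, and punctuation.
VOCAB = {"the": 0, " the": 1, " cat": 2, " sat": 3, " on": 4,
         " mat": 5, "t": 6, ".": 7}

def tokenise(text: str) -> list[int]:
    # Greedy longest match: at each position take the longest vocabulary
    # entry that fits, emit its number, and move past it.
    ids, i = [], 0
    longest = max(len(piece) for piece in VOCAB)
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += size
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids

print(tokenise("the cat sat on the matt."))
```

The output is only a list of integers. Nothing in it yet says that a cat is an animal.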

Embedding is structure - but richer than grammatical structure alone. Each token is placed in the geometric space we explored in Part I. Relationships emerge - not just grammatical ones but semantic ones. "King" and "queen" land near each other. "Sad" and "melancholy" are neighbours. The structure is not a tree on a page but a shape in space.
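To make "near" concrete, here is a sketch with invented three-dimensional vectors standing in for learned embeddings. Real models learn hundreds or thousands of dimensions from data, and these particular numbers are made up purely for illustration. Closeness is measured with cosine similarity, one common choice of distance measure.

```python
import math

# Invented 3-D vectors; real embeddings are learned, not hand-written.
EMBEDDINGS = {
    "king":       [0.90, 0.80, 0.10],
    "queen":      [0.85, 0.82, 0.15],
    "sad":        [0.10, 0.20, 0.90],
    "melancholy": [0.12, 0.15, 0.88],
}

def cosine(a: list[float], b: list[float]) -> float:
    # 1.0 means pointing in exactly the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(word: str) -> str:
    # The other word whose vector points in the most similar direction.
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))

for w in EMBEDDINGS:
    print(w, "->", nearest(w))
```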

Reasoning is meaning - a piece of text, generated from everything the machine has processed, that addresses the question asked. From the positioned tokens, from the relationships between them, from the context of everything else present, something is derived and expressed as text that did not exist before.
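The way new text is produced can be sketched as a loop: predict the next token from everything so far, append it, repeat. The sketch assumes a hypothetical hand-written lookup table in place of the trained network. A real model computes a probability for every token in its vocabulary from the whole context; the toy rule here looks back at most two words.

```python
# Hypothetical stand-in for a trained model: given the last word,
# a likely next word.
NEXT = {"the": "cat", "cat": "sat", "sat": "on", "on": "the", "mat": "."}

def next_token(context: list[str]) -> str:
    # Toy rule that uses a little more context: after "on the", say "mat".
    if context[-2:] == ["on", "the"]:
        return "mat"
    return NEXT.get(context[-1], ".")

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    context = list(prompt)
    for _ in range(max_tokens):
        token = next_token(context)   # predict from everything so far
        context.append(token)         # the output becomes part of the input
        if token == ".":
            break
    return context

print(" ".join(generate(["the"])))
```

The important line is the `append`: every word the machine writes is immediately something it reads.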

The question of how to define what a program means with enough precision that it can be built reliably is one I first encountered in a very different context - not in AI but in the formal study of programming languages, more than four decades ago. The question now is how to get from recognising units to deriving genuine meaning in the full complexity of human language. Both questions circle the same deep problem: the gap between describing what something should do and building a system that reliably does it.

I did not expect to meet that question again here. But here it is.

The Machine With an Open Mouth

Remember the technique from the opening. The problem that yields mid-explanation. The listener whose mouth stays open while you find the answer yourself.

Now ask: what if the listener were the scholar himself - the machine being asked the question - thinking out loud before committing to an answer?

In 2022 a team of researchers made a discovery so simple it was almost embarrassing. They found that if you asked a language model to show its working before giving an answer - to think out loud, step by step - the accuracy on complex problems improved dramatically. Not a new model. Not new training. Not an architectural change of any kind. The same frozen scholar, the same sealed study, the same accumulated knowledge.

Chain of Thought. The same Scholar - but now asked to think out loud, step by step, before committing to an answer.

To see why this matters, consider a concrete example. Ask the scholar directly: "A farmer has 17 sheep. All but 9 die. How many are left?" He answers immediately: "8." Wrong. He pattern-matched to "17 minus 9" because that is the strongest arithmetic pattern behind those words. The real answer is 9 - "all but 9" means 9 survive. The leap from question to answer skipped over the semantic interpretation entirely.

Now add the instruction: "Let's think step by step." The scholar writes: "All but 9 die means 9 sheep survive. The number remaining is 9." Correct. The instruction forced him to articulate the intermediate step - and the articulation caught the error before it became the answer.
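On the desk, the two requests differ by a single line of text, and the trap itself is plain arithmetic. This sketch only builds the two prompts and works both readings by hand; it does not call any model, and the prompt layout (the trailing `Answer:` in particular) is an assumption.

```python
QUESTION = "A farmer has 17 sheep. All but 9 die. How many are left?"

# The direct request: question in, answer out.
direct_prompt = QUESTION + "\nAnswer:"

# The chain-of-thought request: identical, plus one instruction.
cot_prompt = QUESTION + "\nLet's think step by step."

# The shortcut reading: "17 ... 9 ... how many left" looks like subtraction.
shortcut_answer = 17 - 9          # 8, wrong

# The articulated reading, one written step at a time.
total = 17
survivors = 9                     # "all but 9 die" -> 9 do not die
deaths = total - survivors        # 8 die
stepwise_answer = total - deaths  # 9 remain

print(direct_prompt, cot_prompt, sep="\n---\n")
print("shortcut:", shortcut_answer, "| step by step:", stepwise_answer)
```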

The improvement was not marginal. On mathematical problems, on logical puzzles, on multi-step reasoning tasks, the difference between asking directly and asking with that single additional instruction was often the difference between wrong and right. The effect was consistent and reproducible, and it held across problem types nobody had anticipated when the technique was discovered.

Why?

Go back to the scholar's desk. When you ask him a question and demand an immediate answer, he reaches into his vast accumulated knowledge and produces whatever response has the strongest pattern behind it. One step. Question in, answer out. Fast, fluent, and on genuinely hard problems - unreliable. The leap from question to answer skips over the gaps, papers over the ambiguities, commits to the first plausible path without checking whether it holds.

Now ask him to explain his thinking first.

Each intermediate sentence he writes becomes part of what he reads next. He is not thinking in silence any more - he is thinking on paper, and the paper is in front of him. A correct intermediate step makes the next correct step more probable. The gap between "all but nine" and "minus nine" cannot hide once it has to be written down in plain language before the arithmetic begins.

This is exactly what happens when you explain a problem to a chosen listener - someone whose role is simply to receive the explanation, whether or not they understand it. The listener's comprehension is irrelevant. The act of speaking to the listener is everything.

The context window - the desk on which the scholar writes and reads - is that listener. The machine speaks its intermediate reasoning into the context window. It then reads what it has written. Each pass makes the next more accountable. It is the open mouth made concrete: not a person sitting across the table, but the machine's own growing record of its thinking, always present, always readable, always informing what comes next.

There is a name for this in software engineering. Rubber duck debugging. You place a rubber duck on your desk. You explain the problem to the duck. The duck says nothing - it is just a duck. You find the solution before the duck has had a chance to be unhelpful. The duck's silence is the point. Your articulation is the mechanism.

Chain of Thought is rubber duck debugging, formalised and scaled. The machine is simultaneously the engineer and the duck. It speaks its reasoning into the context window, reads it back, continues from a more informed position - each iteration more refined than the last, each pass narrowing the gap between the question and the correct answer, until what remains is the answer that survives the scrutiny of its own reasoning.

But there is a limitation. And it is one that anyone who has ever tried to follow a complex set of instructions will recognise.

Imagine you are assembling a piece of furniture. The instructions are clear and you follow them one step at a time. For most of the assembly, each step follows naturally from the previous one. But then you reach a step that says: "Insert bolt A into slot B, ensuring it aligns with the bracket fitted in step 3." You look at what you built in step 3 and realise - that bracket is in the wrong position. The current step cannot be completed correctly without going back and redoing something from much earlier.

One-track, forward-only reasoning cannot handle this. You must backtrack.

Chain of Thought is like those furniture instructions - clear and effective when each step follows naturally from the last. It fails when the correct next step depends on something seen much earlier, or when a choice made halfway through turns out to be wrong only when you are almost at the end.

For those problems, you need something more. You need the ability to stop, recognise that the current path is failing, return to an earlier point, and try again.
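The difference between forward-only and backtracking can be seen on a toy problem, chosen for illustration rather than taken from anything a language model does internally: pick numbers from a list that add up to a target. The forward-only strategy commits to the largest number first and never revisits that choice. The backtracking search undoes a choice when every path beneath it fails.

```python
from typing import Optional

def greedy(numbers: list[int], target: int) -> Optional[list[int]]:
    # Forward-only: take the largest number that still fits, never look back.
    chosen, remaining = [], target
    for n in sorted(numbers, reverse=True):
        if n <= remaining:
            chosen.append(n)
            remaining -= n
    return chosen if remaining == 0 else None

def backtrack(numbers: list[int], target: int, start: int = 0) -> Optional[list[int]]:
    # Try each remaining number; if the path below it fails, undo and try the next.
    if target == 0:
        return []
    for i in range(start, len(numbers)):
        n = numbers[i]
        if n <= target:
            rest = backtrack(numbers, target - n, i + 1)
            if rest is not None:
                return [n] + rest
    return None   # every branch from here failed: report back up the tree

numbers = [5, 4, 3]
print("greedy:   ", greedy(numbers, 7))     # commits to 5, then is stuck with 2 to go
print("backtrack:", backtrack(numbers, 7))  # abandons 5, finds 4 + 3
```

The early choice of 5 looks perfectly sensible at the moment it is made. It is only at the end that it turns out to be wrong, which is exactly the furniture problem.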

The Chess Player Who Looks Ahead

A strong chess player does not simply make the move that looks best right now. Anyone can do that. What separates the strong player from the merely competent one is the discipline of sitting back before moving - mentally playing out several lines simultaneously, abandoning the ones that lead to trouble after five or six moves, and only committing to a move when they have found a line that holds under pressure from every direction they have tested.

The board has not changed. The pieces are the same. The rules are identical. What is different is the exploration before commitment.

A reasoning model, faced with a hard question, does the same. Instead of answering immediately, it generates an extended internal exploration - often invisible to the person asking - in which something recognisable as genuine problem solving takes place.

It restates the problem in its own words - checking its understanding before proceeding.

It identifies what it knows and what it needs to establish.

It tries one approach, follows it several steps, then notices - internally - that this path assumes something that may not hold. And it changes direction.

It tests competing lines of reasoning against each other, the way a chess player tests competing sequences of moves.

It checks its own conclusions by approaching from a different angle.

It commits to an answer only when the reasoning holds from every direction it has tested.

A Dedicated Reasoning Model. Not a different machine. The same Scholar - but one who has learned, through long practice, to explore before he speaks. To treat every hard question as a chess position worth studying before the piece is touched. The answer delivered is not the first answer found. It is the answer that survived the exploration.

This is qualitatively different from Chain of Thought, where the reasoning is linear and visible. A reasoning model explores a tree of possibilities - branch by branch, line by line - before committing. The answer is not the end of a line. It is the surviving branch of an exploration, the move that held up under scrutiny from every direction.
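The shape of "explore a tree, keep the branch that survives a check" can be sketched as an explicit search. This is an analogy under stated assumptions, not how reasoning models work internally: they explore in generated text, not with a procedure like this. The toy combines numbers with arithmetic operations and accepts a branch only if its result equals the target exactly, a check that is unambiguous because arithmetic is verifiable. The numbers 4, 7, 8, 8 and the target 24 are arbitrary examples.

```python
from fractions import Fraction
from itertools import combinations
from typing import Optional

def explore(items: list[tuple[Fraction, str]], target: Fraction) -> Optional[str]:
    # items: (value, expression) pairs still available to combine.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None   # the verifier: exact check
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        # Each way of combining two items opens a new branch of the tree.
        branches = [(a + b, f"({ea} + {eb})"), (a * b, f"({ea} * {eb})"),
                    (a - b, f"({ea} - {eb})"), (b - a, f"({eb} - {ea})")]
        if b != 0:
            branches.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            branches.append((b / a, f"({eb} / {ea})"))
        for value, expr in branches:
            found = explore(rest + [(value, expr)], target)
            if found is not None:
                return found   # the surviving branch
        # nothing below this pair worked: abandon it and try another pair
    return None

def solve(numbers: list[int], target: int) -> Optional[str]:
    return explore([(Fraction(n), str(n)) for n in numbers], Fraction(target))

print(solve([4, 7, 8, 8], 24))
```

Here the search is exhaustive and the check is exact, so a returned answer is guaranteed correct. The exploration of a reasoning model is neither exhaustive nor, outside verifiable domains, checkable in this way.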

There is an honest limitation worth naming. On questions with a clear, verifiable answer - mathematics, logic, code - the model can check its own work, because correctness is unambiguous. On questions where correctness cannot be verified - where the answer is a matter of judgment, or values, or genuine uncertainty - the extended exploration can produce something that looks like deep reasoning but remains, underneath, a process that cannot guarantee it has found the truth. The appearance of rigour is not the same as rigour.

This is the frontier. Not a solved problem. A direction of travel.

The Desk

Everything we have covered - the librarian's pages passed under the door, the scholar's step by step explanation, the chess player's exploration before commitment - happens in one place.

The desk.

The scholar's lifetime of accumulated knowledge is fixed and internal - everything he learned before the door closed, carried inside him, shaping how he reads and reasons. But he can only reason about what is physically in front of him right now.

The Context Window - the desk. The text the model can see in this moment. Everything outside it is invisible. Everything on it is equally present. This is where the scholar's lifetime of learning meets the specific problem in front of him right now. Everything that makes him useful happens here, on this surface, with whatever has been placed on it.

This is the spine that connects every concept in this essay.

Finding the right pages - navigating the external world to identify what belongs on the desk. Placing those pages there - fresh, current, specific, regardless of when they were written. Using the desk as a scratchpad - writing intermediate steps, reading them back, continuing from a more informed position. Managing the scratchpad across a branching exploration - filling the desk, clearing it, refilling it, until the answer that survives is the one worth committing to.

As the desk grows larger - and the desks available to current models are hundreds of times larger than those of just a few years ago - the quality of reasoning that is possible grows with it. More pages from the librarian. Longer chains of thought. More extensive explorations before commitment. The desk that once held a short conversation can now hold entire books, extended reasoning, and multiple retrieved documents simultaneously.

But the desk has edges. Beyond those edges, the model is blind. No matter how large the desk becomes, there will always be relevant information that did not fit - or that the librarian failed to select.
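The edges of the desk can be sketched as a budget. This assumes a crude token count (one token per whitespace-separated word; real tokenisers count differently) and a hypothetical budget of 20 tokens. The question always goes on the desk, retrieved passages are added in order of relevance while they fit, and whatever does not fit is simply absent from what the model sees. The `fill_desk` helper and the passages are illustrative.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: one token per word. Real tokenisers differ.
    return len(text.split())

def fill_desk(question: str, passages: list[str], budget: int) -> tuple[str, list[str]]:
    # passages are assumed to be sorted, most relevant first.
    desk = [question]
    used = count_tokens(question)
    left_out = []
    for p in passages:
        cost = count_tokens(p)
        if used + cost <= budget:
            desk.append(p)
            used += cost
        else:
            left_out.append(p)   # beyond the edge: invisible to the model
    return "\n".join(desk), left_out

question = "When did the new exhibition open?"
passages = [
    "The exhibition opened on Tuesday to a full gallery.",
    "Tickets are available online and at the door.",
    "An earlier draft of the catalogue gave the date as Monday.",
]
desk, left_out = fill_desk(question, passages, budget=20)
print(desk)
print("left out:", left_out)
```

The model gives no signal that the third passage ever existed. It answers from what is on the desk.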

And there is something more subtle. Imagine reading a very long document - a hundred pages, say. You are most alert at the beginning, when everything is fresh. You are most attentive at the end, when the conclusion is arriving. But somewhere in the middle, across pages forty to sixty, your attention drifts. You are reading, but not quite with the same sharpness. Something important on page fifty-three may pass through without fully landing.

The scholar at his desk has exactly the same habit. Evidence placed in the middle of a very large pile of documents may be effectively invisible to him - present on the desk, but not attended to with the sharpness it deserves. A crucial passage, buried in the middle of everything else, can be missed not because it was not there, but because it was too far from where his attention naturally rests.

This is a known limitation, without a clean solution yet. It is one of the places where the distance between what reasoning models can do and what careful human reasoning does remains stubbornly wide.

The desk is not a metaphor for something exotic. It is simply text. Everything on it - the question, the retrieved passages, the chain of intermediate reasoning, the conversation history - is plain text, placed in sequence. The model reads it the way you are reading this sentence. What it does with what it reads is where the intelligence lies.

Not in the desk. In the scholar sitting at it.

What Burns Away This Time

The first essay ended cleanly. The unknown became known. The black hole became a roadmap. The mystery of how machines find similar things burned away completely, leaving not a vacuum but mathematics - visible, demonstrable, elegant in its simplicity. Embeddings, vectors, distance. You could point to it. You could show it working.

This essay ends differently. And it should.

The mystery of finding burned away to geometry. The mystery of reasoning burns away only partway.

What clears is the mechanism. The librarian and her living shelves. The scholar reading fresh evidence through the gap under the door. The desk as the only workspace where thinking actually happens. The rubber duck that says nothing while the explanation finds the answer. The chess player who earns the right to move by first exhausting the alternatives. These are not metaphors decorating a technical reality - they are the technical reality, described at the level where it becomes visible.

Underneath them, the mathematics is the same patient geometry. Tokens placed in space. Distances measured. Text concatenated with text. Patterns applied to inputs, layer after layer, until what emerges is a coherent, contextually appropriate continuation of everything that came before.

Nothing mysterious. Nothing magical. The unknown, once again, burns away to reveal a mechanism.

But something remains that does not burn away. And it is the most interesting thing.

When does statistical approximation of reasoning become reasoning itself?

The question is not rhetorical and it is not philosophical decoration. It has practical consequences for every domain where we are beginning to trust these systems - medicine, law, engineering, science. A reasoning model that backtracks on instinct rather than formal rule, that checks its conclusions against patterns rather than proofs, that explores a tree guided by experience rather than guaranteed correctness - is producing something that looks like thought, behaves like thought, and in many cases produces results that thought would be proud of.

But it is not thought in the way we have always understood the word. It is something new. Something that did not exist before. Something we do not yet have the right language to describe precisely.

During my undergraduate years in computing science - studying AI, denotational semantics, and compiler writing - I thought intensely about the gap between formal description and working implementation. Those years of focused inquiry eventually led to my PhD. I did not carry the question with me consciously for the decades that followed. What I did not expect was to find it waiting here, unchanged at its core, dressed in entirely different clothes.

But the question at the centre of that work was the same question that sits at the centre of this essay. How do you get from meaning - precisely described, carefully grounded - to a system that reliably does what the meaning says?

We solved a version of it then, for a constrained and formal domain, with tools that guaranteed correctness.

We are attempting a version of it now, for the full complexity of human language and thought, with tools that guarantee nothing but have proven, repeatedly and surprisingly, to be extraordinarily useful.

That is not a failure. It is a different kind of engineering. One that trades rigour for reach, and finds, more often than it has any right to, that the reach was worth it.

The unknown burns away. What remains is not ignorance.

It is a question precise enough to be worth asking, and open enough that the answer is still ahead of us.

That, in my experience, is the most attractive place to be.

Dr. Martín Raskovsky - April 2026
