Dr Martín Raskovsky

When Pattern Matching Fails

A routine debugging session with an AI assistant revealed something unexpected: the bug we were hunting - a simple "less than" that should have been "greater than" - exposed a fundamental weakness in how AI generates code. The AI reasoned correctly in plain English but produced inverted logic in the actual program. This isn't a new problem. Forty-three years ago, my PhD thesis addressed exactly this gap between what we say we want and what code actually does. The solution I proposed then - formal transformation rules rather than verbal reasoning - turns out to be surprisingly relevant to understanding why AI coding assistants confidently produce plausible but wrong code.

A 43-Year-Old Solution to Today's AI Coding Mistakes

In this essay:

- The Bug That Wouldn't Die - Files that refused to be archived, despite everything working "correctly"
- Finding the Inversion - How a collaborative debugging session uncovered a backwards comparison
- The AI Explains Itself - What went wrong in the reasoning process
- Verbal Versus Formal - The core insight: saying it right doesn't mean coding it right
- A PhD from 1982 - How a student saw the same pattern four decades ago
- Why This Matters Now - The implications for AI-assisted development
- The Path Forward - What humans and AI can do together

The Bug That Wouldn't Die

I was looking at a list of files on my computer - log files from a financial tracking application I'd built. These files were supposed to organize themselves automatically, like a self-tidying filing cabinet. Every hour, the system would take snapshots. At the end of each day, it would combine the hourly snapshots into a single daily file. At the end of each week, the daily files would merge into a weekly file. And so on - weekly into monthly, monthly into yearly.

Think of it like those Russian nesting dolls, but in reverse: small pieces combining into progressively larger containers over time.

The problem? Some files from November were still sitting there, untouched, in mid-December. Four hourly files. Two daily files. A weekly file. All stubbornly refusing to be combined into their monthly container.

The combining program ran every day. I could see it in the logs - it was executing, processing, completing successfully. Yet these November files remained, like guests who won't leave after the party's over.

I brought this puzzle to Claude, the AI assistant I'd been working with on the project.

Finding the Inversion

What followed was a collaborative investigation. The AI examined the code, looking for the logic that decided which files to combine and when.

We found the culprit relatively quickly. Deep in the program, there was a decision point - a guard that was supposed to protect certain files from being processed at the wrong time. The intention was clear from the comments in the code: "Skip files that are from a coarser granularity."

In plain English: when you're doing the hourly combination, don't touch files that have already been combined into daily, weekly, or monthly bundles. Those are "coarser" - they represent larger time periods. Leave them alone.

Sensible. Necessary, even. Without this protection, the system might accidentally re-process files that were already properly archived.

But here's what the actual code said:

If the file's level is less than the current level, skip it.

The problem? The numbering system used "5" for yearly (the coarsest, largest container) down to "1" for hourly (the finest, smallest). So "less than" meant "finer than" - the exact opposite of the intention.
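A minimal sketch of that numbering, assuming the five levels are plain integers 1 through 5 as described. The names `Level` and `is_coarser` are hypothetical, not taken from the actual application. Writing the ordering down as a function makes the direction of "coarser" explicit instead of leaving it to prose:

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical names; values follow the essay: 1 = finest, 5 = coarsest.
    HOURLY = 1
    DAILY = 2
    WEEKLY = 3
    MONTHLY = 4
    YEARLY = 5


def is_coarser(a: Level, b: Level) -> bool:
    """True if level a covers a larger time period than level b."""
    # Under this numbering, a larger number means a larger container.
    return a > b


print(is_coarser(Level.YEARLY, Level.HOURLY))   # True
print(is_coarser(Level.HOURLY, Level.MONTHLY))  # False
```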

The code was protecting the wrong files. Instead of protecting the big containers from being touched when processing small ones, it was protecting the small ones from ever being combined into big ones.

Those November hourly files? Every time the monthly combination ran, it looked at them, saw they were "level 1," compared that to its own "level 4," determined that 1 is less than 4, and politely skipped them. Forever.
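The failure can be reproduced in a few lines. This is a hypothetical reconstruction, not the application's code: `should_skip_buggy` stands in for the guard, and the list of levels mirrors the stuck November files (four hourly, two daily, one weekly).

```python
HOURLY, DAILY, WEEKLY, MONTHLY, YEARLY = 1, 2, 3, 4, 5


def should_skip_buggy(file_level: int, current_level: int) -> bool:
    # Intended: skip files COARSER than the current run.
    # Actual: skips files FINER than the current run.
    return file_level < current_level


# Hypothetical leftovers: four hourly, two daily and one weekly file.
november_files = [HOURLY] * 4 + [DAILY] * 2 + [WEEKLY]

# Simulate the monthly combination (level 4).
merged = [lvl for lvl in november_files if not should_skip_buggy(lvl, MONTHLY)]
skipped = [lvl for lvl in november_files if should_skip_buggy(lvl, MONTHLY)]

print(f"merged: {len(merged)}, skipped: {len(skipped)}")  # merged: 0, skipped: 7
```

Every leftover file has a level below 4, so every one of them is skipped on every run.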

One character wrong: less-than where it should have been greater-than. The kind of bug that can hide for months because everything appears to work.

The AI Explains Itself

After we fixed the bug - a simple change from `<` to `>` - I asked the AI something unusual:
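Under the same hypothetical reconstruction, flipping the operator makes the guard agree with its comment. Finer files are merged, and only coarser ones (here, an invented yearly file) are left alone:

```python
HOURLY, DAILY, WEEKLY, MONTHLY, YEARLY = 1, 2, 3, 4, 5


def should_skip(file_level: int, current_level: int) -> bool:
    # Skip only files from a coarser granularity (higher number = coarser).
    return file_level > current_level


# Hypothetical leftovers plus one yearly file that must be protected.
files = [HOURLY] * 4 + [DAILY] * 2 + [WEEKLY, YEARLY]

merged = [lvl for lvl in files if not should_skip(lvl, MONTHLY)]
protected = [lvl for lvl in files if should_skip(lvl, MONTHLY)]

print(merged)     # [1, 1, 1, 1, 2, 2, 3]
print(protected)  # [5]
```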

"How do you explain that such a logic inversion could have escaped your pattern matching capabilities?"

This wasn't an accusation. I was genuinely curious. The AI had written this code in an earlier session, complete with comments explaining the intent. The comments were correct. The variable names were clear. The documentation was thorough. Everything looked right.

The AI's self-analysis was illuminating:

First, there were two different numbering systems in play, with opposite meanings. One system (used for display) put hourly files first. Another system (used for processing) put yearly files first. Keeping track of "higher means finer" versus "higher means coarser" while writing code created confusion.
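A hypothetical illustration of how two opposite orderings invert the same comparison. Neither list is taken from the application; they simply encode "hourly first" and "yearly first" as positions in a list, and the helper name is invented. The identical expression gives opposite answers depending on which convention supplied the numbers:

```python
# Two hypothetical orderings of the same five granularities.
DISPLAY_ORDER = ["hourly", "daily", "weekly", "monthly", "yearly"]     # hourly first
PROCESSING_ORDER = ["yearly", "monthly", "weekly", "daily", "hourly"]  # yearly first


def position(order: list[str], name: str) -> int:
    """Index of a granularity within one ordering."""
    return order.index(name)


# The same question, "does hourly have the smaller number?", asked of each ordering.
in_display = position(DISPLAY_ORDER, "hourly") < position(DISPLAY_ORDER, "monthly")
in_processing = position(PROCESSING_ORDER, "hourly") < position(PROCESSING_ORDER, "monthly")

print(in_display, in_processing)  # True False
```

In the first list a higher position means coarser; in the second it means finer. A comparison written with one convention in mind silently inverts when its operands come from the other.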

Second, the error was internally consistent. The code matched the comments. The comments matched the documentation. When everything agrees with everything else, there's nothing to flag as suspicious. The AI's pattern-matching actually reinforced the bug because there was no inconsistency to detect.

Third - and this is the crucial one - the AI reasoned in words, not in arithmetic. It thought: "We need to skip coarser files. Coarser files have higher granularity numbers. So skip when the number is higher." Each step sounds reasonable. But no one stopped to actually trace through: "OK, we're processing monthly, which is level 4. We encounter an hourly file, which is level 1. Is 1 less than 4? Yes. So we skip. Wait - we're skipping the hourly file? That's backwards!"

The AI summarized its own failure mode perfectly: "Coherent, well-documented, confidently wrong."

Verbal Versus Formal

I recognized this pattern immediately. Not because I'm smarter than the AI - I've made the same mistake countless times in fifty-five years of programming. I recognized it because I'd spent years studying exactly this gap.

The problem has a name: the difference between verbal specification and formal specification.

A verbal specification is what you say in plain language: "Skip coarser files when processing finer ones." It captures intent. It communicates to humans. It feels complete.

A formal specification is what the code actually does when you feed it specific values: "When processing level 4, and encountering level 1, the expression `1 < 4` evaluates to `true`, triggering the skip."

The verbal specification was correct. The formal implementation was inverted. And nothing in the AI's process required checking that they actually matched.

This is when I mentioned something that surprised even me:

"A subject that takes me back to my PhD research... back in 1982."

A PhD from 1982

In 1982, I completed a doctoral thesis with a title that sounds almost comically academic: "A Correspondence Between the Denotational Semantics of Programming Languages and the Process of Code Generation."

Strip away the jargon, and here's what it was about:

We already knew, by 1982, how to automatically generate certain parts of a computer program from formal descriptions. Give a precise specification of a language's grammar, and a tool could generate a parser - the part of a compiler that reads code and understands its structure. This worked reliably because grammar has a precise mathematical form.

But generating the part that actually does things - the code generator, the piece that turns instructions into actions - from a formal description? That was unsolved. People would write beautiful mathematical descriptions of what a programming language meant, and then manually write code that (hopefully) did what the mathematics described.

The gap between description and implementation was crossed by hand, by humans, using intuition and experience. And humans make mistakes. They write `<` when they mean `>`.

My thesis proposed a different approach: transformation rules. Patterns that could be matched in the formal specification and systematically converted into implementation code. Not verbal reasoning - mechanical transformation.

The key insight, which I'll never forget because it emerged from my own confusion as a student, came from a coincidence: I was taking two courses simultaneously. In one, I wrote an interpreter for a simple programming language - actual running code. In the other, I wrote the formal mathematical semantics for the same language - abstract symbols on paper.

The two artifacts looked completely different. One was a program in a systems language called BCPL. The other was equations using something called lambda calculus - pure mathematics, no computers required.

But when I looked at them side by side, I could see they were the same thing. The structure matched. The flow matched. One was a description; the other was an implementation. They were two representations of identical logic.

Nobody had written down the rules for moving from one to the other. So that's what I did. I designed a transformation system - pattern matching with formal rigor - that could take the mathematical description and systematically produce correct implementation code.
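A toy illustration of the idea, not a reconstruction of the thesis's actual rules: for a tiny language of numbers, addition and multiplication, each clause of a denotational-style meaning function is paired with a code-generation clause of the same shape, and running the generated stack-machine code can be checked against the meaning. All names here are invented for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def meaning(e: Expr) -> int:
    # Semantic equations: E[n] = n, E[a+b] = E[a] + E[b], E[a*b] = E[a] * E[b]
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return meaning(e.left) + meaning(e.right)
    return meaning(e.left) * meaning(e.right)


def compile_expr(e: Expr) -> list[tuple]:
    # Code-generation rules with the same shape:
    # C[n] = PUSH n;  C[a+b] = C[a]; C[b]; ADD;  C[a*b] = C[a]; C[b]; MUL
    if isinstance(e, Num):
        return [("PUSH", e.value)]
    op = "ADD" if isinstance(e, Add) else "MUL"
    return compile_expr(e.left) + compile_expr(e.right) + [(op,)]


def run(code: list[tuple]) -> int:
    """Execute the generated instructions on a simple stack machine."""
    stack: list[int] = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()


program = Add(Num(2), Mul(Num(3), Num(4)))
print(meaning(program), run(compile_expr(program)))  # 14 14
```

Because each generation rule mirrors one semantic equation, the agreement can be argued clause by clause rather than by intuition.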

Simple, in retrospect. But it required recognizing that verbal reasoning ("these two things are basically the same") wasn't enough. You needed explicit rules that could be mechanically verified.

Why This Matters Now

Here's the twist that struck me during our debugging session:

The bug we found in December 2025 is exactly the problem my 1982 thesis addressed.

The AI had a correct verbal understanding: "protect coarser files from finer processing."

The AI generated a concrete implementation: a comparison operator that evaluated to the wrong boolean value.

The abstract intent didn't constrain the implementation enough. Without formal verification connecting specification to code, the AI pattern-matched on surface features and got it backwards.

Forty-three years of progress in artificial intelligence, and we've recreated the exact problem that formal methods were designed to solve - but now at industrial scale, with AI systems generating millions of lines of plausible, coherent, potentially inverted code.

The AI doesn't "understand" code in any deep sense. It recognizes patterns. It produces output that matches the statistical regularities of the code it was trained on. When I write "skip coarser files," it generates code that looks like the kind of code that typically follows such comments.

But looking right isn't being right. The AI never asked: "Let me trace through this with actual values to make sure my implementation matches my intent."

A formal approach would have required exactly that: define what "skip" means as a function, specify its inputs and outputs, then verify the implementation produces correct outputs for all inputs.
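Because the guard only ever sees five levels, "all inputs" is small enough to enumerate. A minimal sketch, assuming the 1-to-5 numbering described earlier: `spec_skip` states the intent in terms of time periods rather than level numbers (the durations are rough, illustrative values), and `check` compares an implementation against it for all 25 combinations. The function names are hypothetical.

```python
from itertools import product

# Intent, stated without relying on the level numbers' direction:
# a coarser file covers a longer period. Approximate durations in hours.
DURATION_HOURS = {1: 1, 2: 24, 3: 24 * 7, 4: 24 * 30, 5: 24 * 365}


def spec_skip(file_level: int, current_level: int) -> bool:
    """Skip files from a coarser granularity than the current run."""
    return DURATION_HOURS[file_level] > DURATION_HOURS[current_level]


def buggy_skip(file_level: int, current_level: int) -> bool:
    return file_level < current_level


def fixed_skip(file_level: int, current_level: int) -> bool:
    return file_level > current_level


def check(impl) -> list[tuple[int, int]]:
    """Return every (file_level, current_level) pair where impl disagrees with the spec."""
    levels = range(1, 6)
    return [(f, c) for f, c in product(levels, levels)
            if impl(f, c) != spec_skip(f, c)]


print(len(check(buggy_skip)))  # 20
print(check(fixed_skip))       # []
```

The inverted guard disagrees with the specification on every pair of distinct levels, including (1, 4), the hourly-file-in-a-monthly-run case.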

This isn't a criticism unique to AI. Human programmers make the same mistake constantly. The difference is scale. When AI assistants generate code for millions of developers, and those developers trust the output because it's well-commented and internally consistent, inverted logic can propagate at unprecedented speed.

The Path Forward

I don't have a complete solution. But our debugging session suggests some directions:

Hybrid approaches: Use AI for rapid code generation, but apply formal verification to critical logic. Let the AI draft quickly; let rigorous checking catch inversions.

Concrete trace-through: Before accepting generated code, force a step through specific values. "You wrote this comparison. Walk me through what happens when the file level is 1 and the processing level is 4." The bug would have been obvious immediately.
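The same request can be written as a tiny helper that narrates one evaluation with concrete values. This is a hypothetical sketch; `explain_guard` and its wording are invented, and the guard is passed in as a function so that both versions can be walked through side by side.

```python
LEVEL_NAMES = {1: "hourly", 2: "daily", 3: "weekly", 4: "monthly", 5: "yearly"}


def explain_guard(guard, file_level: int, current_level: int) -> str:
    """Describe what a skip guard decides for one concrete pair of levels."""
    skip = guard(file_level, current_level)
    return (f"processing {LEVEL_NAMES[current_level]} (level {current_level}), "
            f"found {LEVEL_NAMES[file_level]} file (level {file_level}): "
            f"{'SKIP' if skip else 'MERGE'}")


def buggy(f: int, c: int) -> bool:
    return f < c


def fixed(f: int, c: int) -> bool:
    return f > c


print(explain_guard(buggy, 1, 4))
# processing monthly (level 4), found hourly file (level 1): SKIP
print(explain_guard(fixed, 1, 4))
# processing monthly (level 4), found hourly file (level 1): MERGE
```

Read aloud, "found hourly file: SKIP" during a monthly run is visibly wrong in a way the comment alone never was.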

The human role shifts: Perhaps the human's job isn't to write code anymore - it's to verify the correspondence between intent and implementation. The architect who says: "Show me this works for edge cases before you commit."

The 1982 insight remains valid: don't trust verbal reasoning alone. Verify against concrete reality.

Closing: The Unknown Burns Away

In my previous essays, I've written about "burning through the unknown" - how the most satisfying technical journeys transform intimidating mystery into elegant simplicity.

This debugging session did exactly that. The mystery of why AI assistants produce confident, coherent, wrong code isn't mysterious at all. It's the verbal-formal gap, identified forty-three years ago, now manifesting in a new technology.

The AI reasons in words. Words can be ambiguous, inverted, misaligned with implementation. This isn't a flaw unique to artificial intelligence - it's a flaw in reasoning itself, one that humans share.

The solution isn't to abandon AI assistance - it's too useful for that. The solution is to remember what we learned decades ago: formal verification exists because verbal reasoning isn't enough.

And perhaps there's something appropriate about the human role in this collaboration. Not as the one who writes code - the AI can do that faster. But as the one who insists on grounding abstract intent in concrete verification. The one who asks: "Are you sure? Show me the values."

The unknown burns away. What remains is an old truth in new clothing: saying it right isn't the same as doing it right. The gap between description and implementation requires more than pattern matching to cross.

It requires checking.

Related reading:

Thesis introduction
Burning Through Unknown
Exploring AI as a Software Development Assistant

Dr. Martín Raskovsky - December 2025
