Dr Martín Raskovsky

Burning Through Unknown

After five decades in software development, I've learned that the most attractive technical journeys aren't about conquering complexity; they're about transforming the intimidating unknown into something elegantly simple. This week, I experienced exactly that: three days of building an AI-powered semantic search engine for artwork, where each step burned away another layer of mystery until what remained was not magic, but mathematics we could see, understand, and demonstrate.

A Three-Day Journey from Mystery to Working AI

How building a semantic art search engine transformed abstract AI concepts into something beautifully simple

The journey began with a question that felt both compelling and daunting: Could we build a search engine where "sunset" and "orange evening sky" find the same artworks, not through keyword matching but through genuine semantic understanding? The concept sounds almost mystical. How does a machine "understand" that these different phrases mean the same thing?

Spoiler: it doesn't understand in any human sense. But what it does is far more interesting.

The Black Hole of Unknown

On Monday morning, the landscape looked like this:

- CLIP, FAISS, embeddings, vector spaces: impenetrable terminology
- "Semantic search" felt like a black box of AI magic
- The gap between concept and working demo seemed vast
- Most critically: I didn't know if I could actually build this

This is the nature of the unknown in technical work. It's not just lack of information; it's the inability to even frame the right questions. When you don't understand the territory, you can't draw the map.

The traditional response would be to study: read papers, watch tutorials, understand the theory before touching code. But my recent experiences with AI-enabled development suggested a different approach: what if we could burn through the unknown by building, not by studying?

From Images to Vectors: The First Mystery Revealed

The first step was simple in statement, mysterious in mechanism: take 50 of my artworks and convert them into "embeddings."

What's an embedding? Before this week, I would have given you some hand-wavy answer about "numerical representations" and hoped you'd nod along. But here's what actually happens, and it's simultaneously more mundane and more elegant than the mystical language suggests:

I feed "Orange_Sunset_II.jpg" into CLIP (Contrastive Language-Image Pre-training, a model from OpenAI). The image goes in. Out comes `[0.234, -0.456, 0.678, ..., 0.123]`: exactly 512 numbers. These aren't arbitrary. It is tempting to imagine that the first captures "warm tones," the fifteenth encodes "horizontal composition," and the hundredth represents "sky texture," though in reality each feature is spread across many dimensions and no single number maps cleanly to a concept.

The mystery began to burn away the moment I could actually see these numbers. Not understand them in some deep mathematical sense, just see them. The unknown became concrete: images are points in a 512-dimensional space.

Here's what's beautiful: run the same image through CLIP tomorrow, and you get exactly the same numbers. Feed it a different sunset, and you get similar numbers. Feed it a portrait? Completely different region of that 512-dimensional space.

The algorithm isn't "understanding" anything. It's measuring visual and semantic features using patterns learned from 400 million image-text pairs during training. But that measurement is consistent, repeatable, and, most importantly, useful.

Text Joins the Party: The "Aha!" Moment

The second mystery was more subtle: how does text search work if we only have image vectors?

The answer is the kind of elegant simplicity that only becomes obvious after you've burned through all the intimidating terminology: CLIP encodes text into the same 512-dimensional space.

Type "sunset" and you get: `[0.221, -0.443, 0.689, ..., 0.134]`

Type "orange evening sky" and you get: `[0.229, -0.438, 0.701, ..., 0.128]`

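Notice something? These text vectors are close to each other, and close to the image vector for Orange_Sunset_II. Not identical, but close. In 512-dimensional space, we can measure this closeness mathematically. It's just a distance calculation, the same Euclidean distance we learned in high school geometry, only in more dimensions. (In practice, embeddings like these are often compared by cosine similarity, the angle between two vectors; once the vectors are scaled to unit length, both measures rank results in the same order.)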
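To make the distance idea concrete, here is a minimal sketch in plain Python. The four-number vectors are toy stand-ins: the first two reuse only the values printed above (the real vectors have 512 entries), and the `portrait` vector is invented purely for illustration. The function names are illustrative and not part of CLIP or any library.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional stand-ins for 512-dimensional embeddings.
sunset = [0.221, -0.443, 0.689, 0.134]
orange_evening_sky = [0.229, -0.438, 0.701, 0.128]
portrait = [-0.512, 0.301, 0.087, -0.644]  # hypothetical, for contrast only

near = euclidean_distance(sunset, orange_evening_sky)
far = euclidean_distance(sunset, portrait)

print(f"sunset vs orange evening sky: distance {near:.4f}")
print(f"sunset vs portrait:           distance {far:.4f}")
print(f"cosine(sunset, orange evening sky) = "
      f"{cosine_similarity(sunset, orange_evening_sky):.4f}")
```

With these toy values the two sunset phrases sit far closer to each other than either does to the portrait, which is all that "semantic similarity" means at this level.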

The mystery of "semantic understanding" burns away completely here. The AI doesn't "understand" that sunset and orange evening sky mean similar things. It just produces similar vectors for them because, during training on 400 million image-text pairs, it learned that these phrases appear in similar visual contexts.

It's pattern recognition at massive scale, compressed into a mathematical transformation. Not magic, but geometry.

The Search Engine: Complexity Becomes Simple

With 50 images encoded into vectors and text queries encoding into the same space, the remaining mystery was: how do you search through millions of vectors quickly?

This is where FAISS (Facebook AI Similarity Search) enters. I expected this to be the most complex part: sophisticated algorithms, intricate data structures, days of optimization.

Instead: five seconds to build an index. Queries return in 50 milliseconds.

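The unknown burned away again: FAISS organizes the vectors into an index built for nearest-neighbor lookup. For a small collection, it can simply compare the query against every stored vector; for large ones, it offers index types that cluster or link similar vectors so that only a fraction of them need to be examined. Either way, searching becomes a geometric problem: find the nearest neighbors to the query vector. It's the same concept as "find the nearest coffee shop" except in 512 dimensions instead of 2.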

Run the demo script: type "sunset", get results in milliseconds. Type "orange evening sky", get similar results. The mystery of "how do you search semantically?" evaporates into "you measure distance in vector space."
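The "measure distance in vector space" step can be sketched as an exhaustive nearest-neighbor search. This is not FAISS code and not the demo's actual script. It is a plain-Python illustration of the idea, assuming toy four-number vectors and hypothetical file names for everything except Orange_Sunset_II.jpg. Comparing the query against every stored vector is conceptually what the simplest (flat) kind of FAISS index does; other index types examine fewer vectors to go faster.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, index, k=3):
    """Return the k closest (distance, name) pairs, nearest first.

    `index` is a plain dict of name -> vector; every entry is compared
    against the query, so the result is exact but the cost grows
    linearly with the number of stored vectors.
    """
    scored = ((l2_distance(query, vec), name) for name, vec in index.items())
    return heapq.nsmallest(k, scored)


# Toy 4-dimensional stand-ins for the 512-dimensional image embeddings.
image_index = {
    "Orange_Sunset_II.jpg": [0.234, -0.456, 0.678, 0.123],
    "hypothetical_portrait.jpg": [-0.512, 0.301, 0.087, -0.644],
    "hypothetical_foggy_morning.jpg": [0.050, 0.120, 0.300, 0.700],
}

# Stand-in for the text embedding of the query "sunset".
query_vector = [0.221, -0.443, 0.689, 0.134]

results = nearest_neighbors(query_vector, image_index, k=2)
for distance, name in results:
    print(f"{name}: distance {distance:.3f}")
```

The text query and the images live in the same space, so ranking by distance is the entire search algorithm; an index library only changes how quickly the nearest entries are found.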

The Paradox of Simplicity

Here's what's interesting about this journey: to users of the search engine, it feels like magic. You type "foggy morning" and somehow the AI knows to show you misty landscape photos. It's impressive, almost uncanny.

But to us, having burned through the implementation, it's beautifully simple:

1. Image → 512 numbers (CLIP encoding)
2. Text → 512 numbers (same CLIP encoding)
3. "Similar" = close in 512-dimensional space (geometry)
4. Fast search = smart organization of vectors (FAISS indexing)

The paradox is that the mystery sells to users, while the simplicity empowers us as builders. We need both perspectives: the wonder that attracts users, and the clarity that enables development.

What "Understanding" Really Means

Throughout this project, we kept using the word "understand" in quotes. Does the AI "understand" that sunset and orange evening sky are related?

The honest answer: it depends on what you mean by understand.

If understanding requires consciousness, intentionality, or anything resembling human cognition, then no, absolutely not. The model is a mathematical transformation, nothing more.

But if understanding means "producing useful, consistent responses to semantic similarity that generalize beyond training examples," then yes, it understands in a functional sense.

This isn't philosophical hedging; it's recognizing that we've been so focused on human-like understanding that we've missed the value of mechanistic pattern recognition at scale. The CLIP model doesn't "know" anything, but it encodes 400 million image-text relationships in a way that generalizes beautifully.

The mystery of AI "understanding" burns away when you realize it's a different kind of intelligence entirely: not consciousness, but compressed statistical regularity that happens to be extremely useful.

The Technical-Economic Reality

Building this search engine took three days. Not three months of research, three weeks of development, and three days of debugging: three days from zero to a working demo deployed to the cloud.

This wouldn't have been possible a few years ago. Not because the algorithms didn't exist (CLIP is from 2021, FAISS from 2017), but because the integration patterns, the tooling, the accessible documentation, and, crucially, the AI assistance in writing the glue code now turn what was once a research project into something buildable in a long weekend.

The economic reality of software development is shifting. When implementation becomes nearly instantaneous, when you can test architectural ideas in minutes rather than weeks, the traditional separation between design and implementation collapses. I explored this in my previous essay on Architecture Through Implementation; this project is another data point in that evolution.

I didn't design the vector search architecture and then implement it. I implemented three different approaches, saw what worked, refined what didn't, and the architecture emerged through concrete experimentation. The unknown burned away through building, not through abstract planning.

From Prototype to Production: The Remaining Mysteries

After three days, we have a working demo: 50 artworks, semantic search, deployed to Streamlit Cloud, accessible to anyone with a browser. No keyword matching, no manual tagging, just AI understanding of visual and semantic similarity.

But several mysteries remain, deliberately unexplored:

The LLM question: We're not using a large language model yet. Search is pure vector similarity. Adding GPT-4 or similar could enable:

- Natural language query parsing: "find calming art for a bedroom" → extracted features
- Query refinement: "show me something more abstract"
- Result explanations: "this matches because of warm tones and horizontal composition"
- Conversational search: multi-turn dialogue to narrow results

The question is whether this adds value or just complexity. Sometimes the simplest solution, pure vector similarity, is sufficient. The mystery here isn't technical but strategic: when does added capability justify added complexity?

The scaling question: 50 artworks to 5,000 to 500,000. The architecture handles this: FAISS scales to billions of vectors. But operational challenges emerge: embedding generation time, storage costs, cold-start performance, cache invalidation. These aren't unknowns; they're well-understood engineering problems. But they're the difference between prototype and production.
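A rough back-of-the-envelope estimate shows why storage is an engineering concern rather than an unknown. The sketch below assumes each of the 512 numbers is stored as a 4-byte float and counts only the raw vectors, ignoring index overhead, image files, and metadata. The function name is illustrative.

```python
DIMENSIONS = 512          # numbers per embedding, as described in this article
BYTES_PER_FLOAT32 = 4     # assumption: vectors stored as 32-bit floats


def raw_embedding_bytes(num_vectors, dims=DIMENSIONS,
                        bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed to hold the uncompressed vectors alone."""
    return num_vectors * dims * bytes_per_value


for count in (50, 5_000, 500_000):
    size_mb = raw_embedding_bytes(count) / 1_000_000
    print(f"{count:>7,} artworks -> {size_mb:,.2f} MB of raw vectors")
```

Under these assumptions, each artwork costs about 2 KB, so 500,000 artworks need roughly 1 GB for the raw vectors. Generating the embeddings and keeping them fresh is likely to cost more effort than storing them.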

The quality question: Is semantic similarity enough, or do we need hybrid search combining vector similarity with metadata filters? "Abstract paintings under $500 with blue tones" requires both semantic understanding and structured query. The mystery here is finding the right balance between pure AI search and traditional database queries.
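One possible shape for hybrid search is to apply the structured filters first and rank only the survivors by vector similarity. The sketch below is an assumption about how that could look, not the demo's implementation. The `Artwork` fields, the titles, the prices, and the three-number embeddings are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Artwork:
    title: str
    price: float
    style: str
    embedding: List[float]  # toy stand-in for a 512-dimensional vector


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, artworks, max_price, style, k=5):
    """Filter on metadata first, then rank what remains by similarity."""
    candidates = [a for a in artworks
                  if a.price < max_price and a.style == style]
    ranked = sorted(candidates,
                    key=lambda a: cosine_similarity(query_vec, a.embedding),
                    reverse=True)
    return ranked[:k]


catalog = [
    Artwork("Hypothetical Blue Study", 350.0, "abstract", [0.1, 0.2, 0.9]),
    Artwork("Hypothetical Red Field", 420.0, "abstract", [0.9, 0.1, 0.1]),
    Artwork("Hypothetical Blue Harbor", 300.0, "landscape", [0.1, 0.3, 0.8]),
    Artwork("Hypothetical Blue Monolith", 1200.0, "abstract", [0.05, 0.2, 0.95]),
]

# Stand-in for the text embedding of "blue tones".
blue_tones_query = [0.1, 0.2, 0.9]

results = hybrid_search(blue_tones_query, catalog, max_price=500.0,
                        style="abstract")
titles = [a.title for a in results]
print(titles)
```

Filtering before ranking keeps hard constraints such as price exact while leaving the fuzzy part ("blue tones") to the embeddings. The opposite order, searching by vector first and filtering afterwards, can return too few results when the filter is strict.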

These remaining mysteries aren't roadblocks; they're the natural next phase. The difference is that now we have a working foundation to build on. The core mystery (can semantic search work for artwork?) is answered. What remains is refinement.

The Attractive Journey

So why is burning through the unknown attractive?

Not because it's easy; it isn't. Not because the destination is guaranteed; it's not. But because each layer of mystery that burns away reveals not arbitrary complexity but elegant simplicity underneath.

AI isn't magic. Vector embeddings aren't mystical. Semantic search isn't incomprehensible. They're layers of mathematics, carefully constructed, that happen to be remarkably good at capturing patterns we care about.

The journey is attractive because each step makes you more capable. Monday morning, "build a semantic search engine" felt impossibly daunting. Friday afternoon, I can explain exactly how it works, demonstrate it to anyone, and confidently describe how to scale it to production.

The unknown became known. The black hole became a roadmap. The mystery became mathematics.

What This Means for Software Development

This experience reinforces something I've observed repeatedly in recent AI-assisted development: the bottleneck is shifting from "can we implement this?" to "do we understand what this actually needs to be?"

The implementation of semantic search (the code, the infrastructure, the deployment) happened almost effortlessly once we understood the core concept. CLIP for encoding, FAISS for indexing, Streamlit for the demo. The pieces existed, well-documented and accessible.

The real work was understanding the problem space clearly enough to know which pieces to use and how to connect them. The AI assistant (Claude, in this case) accelerated the implementation enormously, but the architectural thinking (what are we actually building, and why?) remained distinctly human.

This isn't "AI replacing developers." It's AI eliminating the implementation bottleneck, which forces us to be clearer about the real work: understanding problems deeply enough to know what solutions look like.

The mystery of implementation burns away, revealing that the hard part was always understanding what we're trying to build.

The Broader Implications

If building a semantic search engine takes three days instead of three months, what else becomes possible?

Not just more projects in the same time, but qualitatively different projects. Ideas that would have been rejected as "too complex, not worth the effort" become viable. Experiments that would have required major resource commitment become weekend prototypes.

The economic equation of software development is changing. Not universally, not immediately, but unmistakably. When implementation cost drops dramatically, the calculus of "is this worth building?" shifts.

This has implications beyond individual projects. Markets where software development cost was a barrier to entry become more accessible. Industries where custom software was economically infeasible become automatable. Problems that required major companies or research institutions become solvable by small teams or individuals.

The unknown is burning away, not just for specific technologies but for entire classes of applications.

Conclusion: The Continuing Journey

At the end of these three days, I'm left with a working demo that proves semantic search for artwork is not only possible but practical. I can search my own artwork collection using natural language, finding pieces by concept rather than keyword. It works.

But more than that, I'm left with clarity. The mystery of how AI semantic search works has burned away, leaving not a vacuum but understanding. I can explain it, teach it, build on it.

This is what makes technical exploration attractive: not the conquest of impossible complexity, but the transformation of intimidating unknown into elegant simplicity. Each layer of mystery that burns away doesn't reveal more mystery; it reveals the beautiful mathematics underneath.

The journey continues. There are more mysteries to burn through: optimal LLM integration, production scaling, hybrid search architectures. But now we have a foundation. The core unknown (can this work?) is answered.

What remains is refinement, optimization, and the next set of mysteries waiting to be transformed from intimidating abstractions into working code that makes sense.

That's the attractive journey. Not because it ends, but because each step reveals that the impossible was merely the not-yet-understood.

Dr. Martín Raskovsky - November 2025
