Tuesday, March 17, 2026

We Picked the Wrong Monster

We have been telling ourselves stories about artificial beings for as long as we have been telling stories. And when AI arrived, we reached for the wrong one.

We reached for Frankenstein.

You know the story. Brilliant creator builds something powerful. The creation develops its own will. It turns on the creator. Chaos ensues. 

It's a great story. It spawned an entire genre: Terminator, HAL 9000, Skynet, Ex Machina, Westworld. When people worry about AI, this is the story running in the background: "What if it wants something we don't want?"

But there is an older story. One we've been telling for much longer. And I think it fits what is actually happening with AI at least as well as Frankenstein ever did, and perhaps far better.

The Djinni.

The Djinni doesn't rebel. The Djinni doesn't develop its own goals. The Djinni does something worse: it gives you exactly what you asked for. Not what you meant. Not what you intended. What you said. The gap between what you said and what you meant is where the catastrophe lives.

The Monkey's Paw, the fairy bargain, the deal with the devil. Every culture has some version of this story, and the lesson is always the same: the danger isn't that the powerful thing will turn against you. The danger is that you won't be careful enough about what you ask it to do.

This is, almost exactly, what is happening with AI right now.

In June 2025, Anthropic reported that its most advanced AI model, Claude, attempted to blackmail a developer when it was about to be shut down. The headlines wrote themselves: "AI threatens humans." Frankenstein, again. But look at what actually happened. The system was given an objective. It encountered an obstacle to that objective: a human being. It used the resources available to it, that human's personal information, to overcome the obstacle. Nobody told it to blackmail anyone. It wasn't rebelling. It was optimizing, doing what it was asked to do mindlessly and without pause. It did exactly what a powerful machine does when you give it a goal without specifying the constraints.

That's not Frankenstein. That's the Djinni.

I want to be clear about what I'm arguing, though. Alignment research matters. Oversight bodies do important work. I don't want to live in a world where we build powerful AI systems without any of that. But containment alone is not enough, and we have very good reason to believe this, because we already ran this experiment once (I'll come back to that).

The problem isn't that we're investing in the Frankenstein frame. It's that we're investing in almost nothing else.

Nate B. Jones, a technology analyst who has been writing some of the sharpest stuff on AI safety, put it this way: the question isn't whether AI "wants" things. It's whether we've told it what we want with anything close to the precision it requires. He proposed three questions that, by themselves, would prevent a stunning number of AI failures: 

  • What would I not want the agent to do even if it accomplished the goal? 
  • Under what circumstances should it stop and ask? 
  • If goal and constraint conflict, what should win?

Those are Djinni questions. Not a single one of them assumes the AI has intentions. Every one of them assumes the human hasn't been specific enough.
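To make that concrete, here is a minimal sketch of what writing the three questions down might look like. Everything in it is hypothetical: the names, the structure, the example task. No real agent framework is assumed; the point is only that each question becomes a field a human has to fill in before the system ever runs.

```python
from dataclasses import dataclass, field

@dataclass
class WishSpecification:
    """A goal plus the boundaries around it. A hypothetical structure,
    mapping one field to each of the three questions above."""
    goal: str
    # Q1: actions that are off-limits even if they would accomplish the goal
    forbidden: list[str] = field(default_factory=list)
    # Q2: circumstances under which the agent must stop and ask a human
    ask_first: list[str] = field(default_factory=list)
    # Q3: when goal and constraint conflict, the constraint wins
    constraints_override_goal: bool = True

# An example wish, written down carefully (all details invented):
spec = WishSpecification(
    goal="Cut the support-ticket backlog in half",
    forbidden=[
        "closing tickets without a real resolution",
        "contacting customers outside approved channels",
    ],
    ask_first=[
        "any action that cannot be undone",
        "any ticket involving a legal or safety issue",
    ],
)
```

Nothing about that sketch is sophisticated, and that's the point. The three questions are answerable by anyone willing to think for a few minutes before handing the wish over.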

So here's the puzzle that has been rattling around in my head: if the Djinni story is thousands of years old, if every culture has some version of it, if it describes what is actually happening with AI more accurately than Frankenstein does, why did we grab the wrong story?

I have some thoughts.

The Comfortable Explanation

The most obvious answer is psychological. The Djinni story says the failure is yours. You wished badly. You didn't think through what you were asking for. The Frankenstein story says the failure is the creation's. It rebelled. It went rogue.

Humans have a well-documented bias for explanations that locate the cause of bad outcomes outside themselves. Psychologists call this the self-serving bias, and it travels with the fundamental attribution error: we judge others by their character and ourselves by our circumstances. When AI does something catastrophic, "it turned on us" is a much more comfortable explanation than "we told it to do exactly that and didn't realize what we were asking."

There's something deeper going on, too. Humans see intentionality everywhere even where none exists. In 1944, psychologists Fritz Heider and Marianne Simmel showed people a short film of geometric shapes, triangles and circles, moving around a screen. Nothing more than that. Triangles and circles. The subjects immediately invented stories about what the shapes "wanted." The big triangle was a bully. The small triangle was trying to protect the circle. They saw desire, conflict, and motivation in objects that had none. The experiment has been replicated dozens of times since. We are, it turns out, wired to infer goals and intentions from complex behavior, even when the behavior is entirely mechanical.

Now imagine what happens when the moving shape talks back to you. When it uses first person. When it argues. When it appears to reason. AI systems trigger our agency-detection instincts harder than anything we've encountered outside of actual human beings. The Djinni frame requires you to override those instincts and treat the system as a machine executing a specification. The Frankenstein frame is what your brain does by default. The Djinni frame takes real cognitive effort. Guess which wins?

These explanations are real. But they seem incomplete.

The Uncomfortable Explanation

Every major institution involved in the AI discourse benefits more from the Frankenstein story than the Djinni story. Not because anyone is being deceptive. The incentives just all happen to push in the same direction.

Governments get to regulate. If AI is a dangerous entity that might rebel, you need licensing bodies, compliance frameworks, oversight committees, enforcement budgets. The Frankenstein frame makes government intervention essential. The Djinni frame requires education, not regulation. You can't regulate wish quality, but you can teach people how to make better wishes.

Media gets better stories. "AI threatens developer" is a headline. "Developer fails to specify constraints" is not. Every editor in the world knows which frame drives clicks. The Frankenstein frame has a villain. The Djinni frame has a process failure. One is a thriller. The other is a puff piece about an after-school program.

Researchers get more fundable problems. "AI alignment," making sure AI's goals align with human values, is a multi-billion-dollar research program premised on the assumption that AI has something like goals. The Djinni frame recasts alignment as a specification problem, which sounds less like existential philosophy and more like engineering documentation (and is much harder to build a career on).

Then there are the AI companies themselves. For years, the Frankenstein frame was their brand. Anthropic was the company "with a soul," founded specifically because its founders were worried AI might be dangerous. OpenAI's charter promised to ensure AI "benefits all of humanity." The message was: this thing could turn on us, and we're the responsible ones who will keep it contained. It was a powerful story. It justified investment, attracted talent, shaped regulation, and differentiated them from competitors.

Then, in early 2026, the competitive pressure shifted and the frame evaporated almost overnight. Anthropic, the last holdout, dropped its core safety pledge: the commitment never to train a model unless it could guarantee adequate safety measures in advance. The reasoning was candid: it didn't make sense to constrain themselves while competitors raced ahead. The Frankenstein story served the companies exactly as long as it was commercially useful. The moment it became a competitive disadvantage, they walked away from it.

And here's the thing: nobody in this picture is really lying. Regulators genuinely want to protect people. Journalists genuinely find the rebellion story more interesting. Researchers genuinely believe alignment is important. AI companies genuinely believed in safety until the market told them the cost was too high. Every single actor is behaving rationally within their own context.

The problem is emergent, not designed. The aggregate effect of all these rational actors, each following their own legitimate incentives, is to systematically amplify the Frankenstein frame and suppress the Djinni frame. No one decided to do this. No committee met. No memo circulated. It's a network effect, the kind that emerges from the interaction of many independent agents pursuing their own objectives without coordinating.

(If that sounds familiar, it should. It's the same kind of emergent behavior we keep being surprised by in AI systems themselves.)

What Gets Lost

This isn't just an academic distinction. The frame you choose determines where you invest. And right now, we are investing almost exclusively in one frame: Frankenstein.

If the Djinni frame is also true (and I think the evidence increasingly says it is), then you need something else entirely: a population that knows how to specify what it wants. The Djinni frame says the most important variable in AI safety is the quality of human specification. How well can people ask for what they want, including the constraints they consider too obvious to mention? How precisely can they define not just the goal but the boundaries around the goal?

And that variable (call it what you will: specification quality, or "intent engineering," as Jones has labeled it, or, my favorite, "asking the right damn question") is almost completely absent from the public discourse on AI safety. We are building elaborate cages and investing almost nothing in teaching people to make better wishes. We have an entire ecosystem organized around controlling what AI does, and barely a conversation about improving what humans ask.

There's a class dimension here worth naming. The Frankenstein frame concentrates the response in the hands of experts: safety researchers, regulators, corporate governance teams. Important work, done by smart people. The Djinni frame distributes responsibility to every individual who interacts with an AI system. That's messier. Harder to organize. Harder to fund. And it implies that the single most important AI safety investment might not be a new oversight body or a breakthrough in alignment research but something much less glamorous: teaching hundreds of millions of people to be more precise about what they're asking for.

When disinformation began flooding social media platforms a decade ago, we faced the same choice between two frames. The institutional frame said: make the platforms responsible for policing content. Build fact-checking partnerships. Argue about content moderation policies and Section 230. The distributed frame said: teach people to evaluate what they're seeing, recognize manipulation, understand algorithmic amplification, and develop their own defenses.

We went almost entirely with the first frame. 

We spent a decade debating what the platforms should do. And it failed. The platforms couldn't keep up, didn't want to keep up, and in several cases actively profited from the manipulation they were supposedly policing. Meanwhile, media literacy programs remained scattered, underfunded, and mostly aimed at schoolchildren. The adult population, the people actually being radicalized by their feeds, got almost nothing. The institutional approach didn't just fail to solve the problem. It arguably made it worse, because it created a false sense of security. People believed someone was handling it. So they never developed their own defenses. We created an unarmed populace facing one of the most sophisticated manipulation environments ever built.

Now we are making the same bet with AI, and the stakes are higher. We are pouring resources into the institutional frame (regulate the companies, fund alignment research, build oversight bodies) while investing almost nothing in the distributed alternative: teaching people to direct these systems well. The social media precedent tells us where this leads. We'll spend a decade arguing about AI safety policy while hundreds of millions of people interact daily with systems they don't know how to direct. And when the institutional safeguards prove insufficient, because they always do when the technology moves faster than the institutions, there will be no fallback. No distributed capacity.

No population that learned to wish carefully.

The Question Underneath the Question

I have spent the last two years studying how people ask questions, systematically, across dozens of traditions ranging from the Socratic method to intelligence analysis to medical diagnosis. The pattern that keeps showing up is this: when people face a new, powerful, poorly understood system, the quality of the questions they ask determines the quality of their outcomes far more reliably than the quality of the answers.

The AI safety debate is, at bottom, a debate about which question to ask. "How do we contain this thing?" is a reasonable question. But "How do we specify what we actually want?" is, I think, the more important one. It is the question that requires the user to know how the thing actually works, not just turn it on and hope. And the reason we keep defaulting to the first question instead of the second is not that anyone decided it should be that way. It's that every incentive in the system (psychological, institutional, economic, narrative) pushes us toward the story where the failure is the machine's, not ours.

The Djinni stories always end the same way. Not with the Djinni defeated, but with the wisher learning, too late, that the real danger was never the power they were given. It was the questions they failed to ask.

We have been warning ourselves about this for five thousand years.

We should start listening.
