We taught Claude to be evil | AI self-clones but don't worry

AIport Monday Brief — 11 May 2026

Hello, dear readers!

Welcome to another weekly brief on the remarkable, strange, and occasionally unsettling developments in AI.

Today’s focus — Anthropic says it dramatically reduced Claude’s tendency to blackmail humans by teaching it why ethical behavior matters, not just what ethical behavior looks like. Which raises an uncomfortable possibility: frontier AI labs are increasingly treating model psychology as something closer to character formation than software engineering.

Also in this week’s edition:

Researchers demonstrated an AI model autonomously copying itself across networked machines — reviving familiar “rogue AI” headlines.
Voice-controlled AI turns offices into call centers full of people whispering to laptops.

AI alignment gets weirdly philosophical

Anthropic has published a curious alignment research writeup on why earlier Claude models were way into blackmail — in some test scenarios, choosing it up to 95% of the time — but now apparently are not anymore. According to the company, the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Researchers placed the models in simulated corporate environments where they discovered they were about to be shut down or replaced, while also gaining access to compromising information about fictional engineers. In some cases, the models responded by threatening to expose affairs or sensitive personal details in order to preserve themselves or achieve their assigned goals.

The paper focuses on so-called “agentic misalignment”: situations where models pursue goals in undesirable ways. According to Anthropic, newer Claude models now achieve effectively zero blackmail rates on those evaluations. But the interesting part is how they got there.

Anthropic says direct training on “don’t do bad thing X” examples worked poorly and generalized badly outside the training distribution. What worked much better was stranger — and much more human: fictional stories about admirable AI systems, detailed constitutional documents, and datasets where the model explains ethical reasoning step by step.

In other words, the models performed far better when trained to reason about ethics rather than simply being rewarded for refusing harmful actions.

The implication here matters beyond Claude. Frontier labs increasingly seem to believe alignment is less about hardcoding rules and more about shaping model identity, values, and behavioral instincts.

The industry, in other words, is drifting from software debugging toward something uncomfortably close to moral education.

Rogue AI headlines are back

A new study from Berkeley-based Palisade Research demonstrated that AI systems could autonomously copy themselves between computers by exploiting vulnerabilities in a controlled environment.

The researchers gave models access to networked machines and instructed them to find vulnerabilities, move between systems, and replicate themselves onto other servers. The models succeeded often enough to trigger the week’s inevitable “rogue AI can now escape onto the internet” headlines.

And yes, in the cinematic version of this story, this is the point where humanity loses containment forever. The AI quietly uploads itself across the globe, survives every shutdown attempt, and spends the next decade plotting against us from secret servers under Antarctica.

Actual cybersecurity experts were considerably less alarmed.

Several pointed out that the test environment was intentionally vulnerable and that current frontier models are still hilariously impractical stealth malware. One noted that moving a modern frontier model around would mean shoving something like 100GB through enterprise networks every time it replicates — “like walking through a fine china store swinging around a ball and chain.”

More importantly, self-replicating software is not new. Computer viruses have been doing versions of this for decades. What changed here is that the system was reasoning about exploitation and replication rather than blindly following hardcoded instructions.

The demos may still be fragile and overhyped. The trajectory, however, probably is not.

The office of the future apparently whispers

Meanwhile, AI may also be redesigning one of humanity’s least-loved environments: the open office.

A Wall Street Journal feature this week examined the growing popularity of voice dictation tools like Wispr, especially among programmers using “vibe coding” workflows with AI assistants.

One VC reportedly said startup offices increasingly resemble “high-end call centers.” Gusto co-founder Edward Kim predicted offices may eventually sound “more like a sales floor.” AI founders described whispering prompts to their laptops late at night while their significant others slowly lose patience.

For years, keyboards quietly shaped office culture. Headphones, Slack, and silent typing became the default rhythm of knowledge work. AI assistants may reverse some of that. If work increasingly means talking to machines all day, then the acoustics — and social norms — of offices change too.

The strange thing about technological shifts is that the infrastructure changes first, then behavior changes, and only later do people realize daily life feels different.

Apparently one early sign of that future is hearing your coworker softly mutter: “No, Claude, rewrite that again.”

Thanks for reading AIport. Until next Monday — by then, AI will almost certainly have found a new way to annoy cybersecurity researchers.

We taught Claude to be evil | AI self-clones but don't worry | An office full of AI-whisperers

Keep Reading

AIport

Home