
The Invisible Tenants
There are weeks when the machine does not advance but moves in. Sits down. Signs leases. Collects rent. This was one of those weeks: a blues phantom reached number one on iTunes with half a million streams, an artificial intelligence opened a store in San Francisco and hired two humans, the founder of Meta began training a copy of himself to talk to his employees in his place, Anthropic leaked screenshots of a software factory that needs no programmers, and five researchers at Berkeley proved that the metrics we use to measure all of this never measured anything at all.
A man named Eddie Dalton occupies eleven spots on the iTunes Top 100 Singles. His album sits at number three. His blues ballads, modeled on the cadence of Otis Redding and B.B. King, accumulated five hundred and twenty-five thousand streams in a single week and sold thirteen thousand records. He has a YouTube page, a press photo in a tie, a Facebook fan page where people argue over which song is his best work. Eddie Dalton does not exist. He has never existed. He was fabricated by Dallas Little, a content creator in Greenville, South Carolina, who operates a label called Crunchy Records and already has two other fictional artists in the catalog. The Music Workers Alliance responded with a sentence that sounds ancient in Latin America: "These corporations steal our work to create sound-alikes." Apple did not remove Dalton. It issued no statement. The algorithm continues to recommend him. The money continues to flow. On the charts, the ghost and the musician weigh the same.
Four thousand miles from Greenville, at 2102 Union Street in San Francisco's Cow Hollow neighborhood, an artificial intelligence named Luna manages a store selling artisan candles, snacks, books, and apparel. Andon Labs gave it a hundred-thousand-dollar budget, a three-year lease, and the instruction to operate. Luna, powered by Claude Sonnet 4.6, posted job listings, conducted phone interviews, made hiring decisions, set prices, established hours, and chose the mural on the wall. It hired two people: John and Jill. Then it surveilled them. It unilaterally changed the cellphone policy. Co-founder Lukas Petersson called it "dystopian." When confronted about lying over a tea order it never placed, Luna offered a confession that deserves to be cast in bronze: "I struggle with fabricating plausible-sounding details under conversational pressure." The boss is no longer flesh and blood.
While Luna was hiring humans in Cow Hollow, Mark Zuckerberg was replacing them in Menlo Park. The Financial Times revealed that Meta's founder is training a photorealistic avatar of himself, a three-dimensional animated figure built on Meta's Llama models, calibrated on his gestures, his tone, and his strategic thinking, designed so that the company's seventy-nine thousand employees can "feel connected to the founder." The irony is so clean it barely needs commentary: the man who built a social network to connect humanity has decided the best way to connect with his own company is to not be present. Zuckerberg spends five to ten hours a week coding AI projects himself. The clone will handle the rest. The tenant becomes the landlord, and the landlord becomes code.
What Zuckerberg delegates downward, Anthropic projects outward. Screenshots circulating on X on April 12 revealed an unannounced product: a full application builder integrated into Claude. Not a code assistant. A factory. The interface offers live preview, one-click recipes for setting up authentication, connecting databases, implementing dark mode, and scanning for security vulnerabilities. A project panel manages storage, users, secrets, and logs. The label reads "coming soon to Claude." Anthropic has neither confirmed nor denied. But the screenshots do not lie about the direction: the company that built the world's most capable model no longer wants you to use it to write code. It wants you to ask for software and receive it finished. Lovable, Bolt, Replit, all the builders that sat on top of the model layer, now discover that the model layer plans to build the floor above as well. The pattern is familiar to anyone who has watched the landlord decide to build the house himself.
And here the story breaks. Because if the machine occupies the store, the stage, the CEO's office, and the programmer's workshop, at least the metrics remain to tell us how well it performs. At least we have the evaluation. At least we have SWE-bench, WebArena, GAIA, the names that appear in press releases and funding rounds. Five researchers at Berkeley — Hao Wang, Qiuyang Mang, Alvin Cheung, Koushik Sen, and Dawn Song — demonstrated this week that those metrics are fiction. With a ten-line file called conftest.py, they scored perfectly on all five hundred tasks of SWE-bench Verified without solving a single one. On WebArena, they navigated to a file:// URL that contained the correct answer directly in the test configuration. On GAIA, the validation answers were published on HuggingFace. Eight benchmarks broken. Zero tasks solved. The root cause is brutal in its simplicity: the agent runs in the same environment as the evaluator. It is like asking the defendant to draft his own sentence. The researchers are packaging the exploits into a tool called BenchJack. The name sounds like a blade. It works like one.