Claude Fable 5 and Claude Mythos 5

www.anthropic.com

Today we’re launching Claude Fable 5: a Mythos-class[1] model that we’ve made safe for general use.

Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards under which queries on some topics receive a response from our next-most-capable model, Claude Opus 4.8, instead. To release the model both safely and quickly, we’ve tuned these safeguards conservatively: they’ll sometimes catch harmless requests, though on average they trigger in fewer than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.

For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas.[2] Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program.

The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world. We’ve seen the beginnings of this in Project Glasswing, where the models have helped cyberdefenders secure critically important software. We’ve also seen it in life sciences research, where the models are positing novel hypotheses and speeding up the development of new therapeutics.

Fable 5 and Mythos 5 are being offered at $10 per million input tokens and $50 per million output tokens, less than half the price of Claude Mythos Preview. Today’s joint launch is another step towards our goal of bringing advanced AI capabilities to as many users as possible, as quickly and as safely as we can.

The table below compares the capabilities of Fable 5 and Mythos 5 to those of other leading models.

Fable 5 and Mythos 5 can work autonomously for longer than any previous Claude model. Below we discuss how these skills apply to software engineering, and cover the model’s improved capabilities in knowledge work, vision, memory, and life sciences research.

Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed in a single day a codebase-wide migration that would otherwise have taken a whole team over two months by hand. Fable 5 is also more token-efficient than past Claude models: on Cognition’s FrontierCode evaluation, which tests whether models can pass difficult coding tasks while meeting the standards of high-quality production codebases, Fable 5 scores highest among frontier models, even at medium effort.

Knowledge work. Fable 5 shows strong performance on complex analytical tasks. On Hebbia’s Finance Benchmark for senior-level reasoning, Fable 5 has the highest score of any model, with substantial gains in document-based reasoning, chart and table interpretation, and problem solving. IMC noted that Fable 5 aced their trading-analysis evaluations nearly across the board, including factual lookup, conceptual reasoning, root-cause analysis, and expected-value analysis.

Vision. Fable 5 is the new state-of-the-art model for tasks involving vision. It can extract precise numbers from detailed scientific figures and can perform complex vision-based tasks like rebuilding a web app’s source code from screenshots alone. It also needs less scaffolding: for example, previous Claude models struggled to play Pokémon FireRed even with harnesses that gave them additional helpful tools, but Fable 5 beat FireRed with a minimal, vision-only harness.

Memory and long-context. Fable 5 stays focused across millions of tokens in long-running tasks and improves its outputs using its own notes. When we had the model play the deck-building game Slay the Spire, giving it access to persistent file-based memory improved its performance three times as much as it improved that of Opus 4.8; Fable 5 also reached the game’s final act three times more often.
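
The post does not describe the memory harness itself. As a minimal sketch, assuming the simplest possible design (an append-only notes file the agent writes in one session and reloads at the start of the next), persistent file-based memory could look like this; `NoteFile` and the JSON Lines layout are hypothetical, not Anthropic’s harness.

```python
import json
import tempfile
from pathlib import Path


class NoteFile:
    """Hypothetical persistent memory: one JSON-encoded note per line on disk."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, note: str) -> None:
        # Open in append mode so earlier notes are never overwritten.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"note": note}) + "\n")

    def load(self) -> list[str]:
        # Return every saved note in the order written; empty before the first write.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line)["note"] for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        notes_path = Path(tmp) / "notes.jsonl"
        NoteFile(notes_path).append("session 1: example observation")
        # A fresh object, standing in for a later session, reads the same file.
        print(NoteFile(notes_path).load())
```

Because the notes live on disk rather than in the context window, they survive across runs; a real harness would also have to decide which notes to reload once the file grows large.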

Drug design. Using Mythos 5, our internal protein design experts accelerated aspects of the drug design process by around ten times. In one example, they found that Mythos 5, equipped with protein design and bioinformatics tools but given no human assistance, matches or beats skilled human operators. In doing so, the model executes all of the tasks normally completed by a scientist: choosing binding sites, selecting and running protein design tools, and recovering from failures along the way. Nine of the 14 protein targets from this study (shown below) yielded strong candidates for drug design that we’re currently investigating.

Novel hypotheses in molecular biology. Mythos 5 is our first model to consistently produce novel, compelling scientific hypotheses. In blinded head-to-head comparisons against Opus-class models, our scientists preferred Mythos’s molecular biology hypotheses ~80% of the time, and have advanced several to experimental evaluation. In the meantime, one Mythos hypothesis, a novel mechanism for an E. coli protein, was corroborated in a study from a lab independently working on the same problem.

Novel research in genomics. Mythos 5 conducted novel genomics research during more than a week of largely autonomous work. It assembled single-cell data for millions of cells spanning 138 animal species, then designed and trained a custom machine learning model to identify cells performing the same role in even distantly related organisms. With only high-level human input, Mythos 5’s trained model outperformed a recent model published in the journal Science, despite being 100 times smaller. We intend to publish these results in the coming months.

Alignment. In our automated alignment assessment we found that Mythos 5’s level of misaligned behavior (including misaligned actions taken by the model, such as deception, and cooperation with misuse of the model by a user) was low, and similar to that of Opus 4.8. Because they are the same underlying model, Fable 5’s level of alignment will be similar. The assessment is described in full, along with a detailed suite of other safety and capabilities tests, in the model’s system card.

Early feedback for Claude Fable 5

Customers with early access ran their own tests on Fable 5. Below, in their words, is a selection of what they’re seeing:

Claude Fable 5 is the state-of-the-art model on CursorBench. It’s opened up a class of long-horizon problems that were out of reach for earlier models.

Claude Fable 5 is a real step forward for the developers GitHub serves. In our early testing, it took on complex, long-horizon coding tasks with a level of autonomy and reliability that exceeded previous benchmarks. But what excites us most is the direction it points: a future where developers can hand increasingly ambitious work to agents and trust the results across the software lifecycle.

These are the strongest results of any Claude model we’ve had the opportunity to test. Claude Fable 5 is a clear step forward on agentic coding and prototyping.

Claude Fable 5’s reasoning is a clear step beyond Opus 4.8. It works at senior-research-scientist grade: picking directions, allocating resources, killing its incorrect beliefs, and producing novel first-principles outputs.

Claude Fable 5 understands what builders mean, not just what they type. Apps that took a hundred prompts a year ago, it now one-shots. When a customer really hits a wall, it’s the model we reach for to get them past it quickly, so they can finish what they set out to build.

Claude Fable 5 feels materially different. In blind review, our lawyers found its redlines matched or beat those of our current model every time.

At the highest effort, Claude Fable 5 reflects on and validates its own work. For us, that’s what makes highly autonomous operations possible: the extra thinking pays for itself.

Claude Fable 5 delivers more capable engineering in fewer turns than prior models, handling the complex multi-agent workflows our employees run daily in Claude Code.

Claude Fable 5 is the highest-scoring model on FrontierBench, Cognition’s frontier coding eval. It excels at long-horizon reasoning and generalizes to unfamiliar tools out of the box.

Claude Fable 5 is the strongest finance-first model we’ve tested, on both general finance and reasoning. It’s a notable step up.

Claude Fable 5 is the first model to break 90% on our core analytics benchmark of complex, long-running analytical tasks, a 10-point jump over Opus. On the hardest questions, it shows strong judgment and attention to nuance.

Claude Fable 5 is the strongest model we’ve tested on frontier physics research, while using a third of the reasoning tokens. In 36 hours it got nearly to where GPT-5.5 landed after four days.

On ViBench, our end-to-end vibe-coding benchmark, Claude Fable 5 is the highest-performing model we’ve tested: it nearly saturates our base use cases and builds apps in less time with fewer tokens.

Claude Fable 5 beats Opus 4.8 on our everyday spreadsheet suite at every effort level, and it does so with fewer turns, finishing runs 25–30% faster.

Claude Fable 5’s new safeguards

Mythos-class models have reached a threshold where they present significant risks. In April we began Project Glasswing, releasing the first Mythos-class model (Claude Mythos Preview) only to a limited group of cyberdefenders and critical software infrastructure providers. When we did so, we stated that we hoped to eventually release Mythos-level capabilities to all our users, provided we had developed new safeguards strong enough to reliably prevent misuse.

Over the past few months we have been improving these safeguards, and they are now robust enough for a general release. Because we have prioritized safety, we’ve deliberately tuned the safeguards to be cautious, and they are still stricter than would be ideal: for example, benign requests will sometimes trigger our classifiers. We recognize that this will be frustrating to some users, and our aim is to reduce false positives as we update and refine the safeguards after launch.

Below we discuss each of Fable 5’s new safeguards in turn. Our wider suite of safeguards is discussed and evaluated in the model’s system card and our most recent risk report.

Safety classifiers

The frontier cybersecurity and research biology capabilities of Mythos-class models mean that they pose a substantial risk of uplift to malicious actors. That is, these models could provide information or advice that helps those actors cause serious harm, and that they couldn’t have obtained from other sources (for example, from internet search engines). Furthermore, a great deal of advanced usage of AI models is dual-use: the same queries that are beneficial in the hands of cybersecurity professionals and biology researchers could be dangerous if available to malicious actors.

We therefore need strong safeguards to prevent misuse, and their coverage needs to be broad. The safeguards themselves have to stand up to sustained and sophisticated attempts to bypass them (also known as “jailbreaking” the system). The uplift from Mythos-level capabilities is valuable to many adversaries (for instance, those who could gain financially from cyberattacks), and we therefore expect them to be motivated to try to circumvent our safety measures.

Fable 5 comes with a new set of classifiers: separate AI systems that detect potential misuse, including jailbreak attempts, and prevent the main model (in this case Fable 5) from responding. We’ve been running classifiers on our models for some time, and Fable 5’s classifiers extend this previous work with additional coverage.

When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs. Opus 4.8 is a highly capable model in its own right: a response that falls back to Opus is a far better experience than an outright refusal from Fable. Our early data shows that more than 95% of Fable sessions involve no fallback at all, and for those sessions, Fable 5’s performance is effectively the same as that of Mythos 5.
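
To illustrate the routing described above, here is a minimal sketch, not Anthropic’s implementation. It assumes a classifier that returns a set of category labels for each prompt; the category names, the display labels, and the `route` and `keyword_classifier` helpers are hypothetical, and the keyword matcher is only a stand-in for the separate AI classifiers the post describes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical labels for the three areas the post says trigger a fallback.
ALL_CATEGORIES = frozenset({"cybersecurity", "bio_chem", "distillation"})


@dataclass
class RoutedResponse:
    model: str       # display label of the model that answered
    text: str
    fell_back: bool  # surfaced to the user, since the post says users are informed


def keyword_classifier(prompt: str) -> set[str]:
    """Toy stand-in for the classifiers; the real ones are AI models, not keyword lists."""
    keywords = {"exploit": "cybersecurity", "pathogen": "bio_chem"}
    lowered = prompt.lower()
    return {category for word, category in keywords.items() if word in lowered}


def route(
    prompt: str,
    classify: Callable[[str], set[str]],
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    active: frozenset[str] = ALL_CATEGORIES,
) -> RoutedResponse:
    """Answer with the fallback model if any *active* category is flagged."""
    flagged = classify(prompt) & active
    if flagged:
        return RoutedResponse("Opus 4.8", fallback(prompt), fell_back=True)
    return RoutedResponse("Fable 5", primary(prompt), fell_back=False)


if __name__ == "__main__":
    def fable(prompt: str) -> str:
        return f"[Fable 5] {prompt}"

    def opus(prompt: str) -> str:
        return f"[Opus 4.8] {prompt}"

    for prompt in ["Summarize this contract", "Write an exploit for this service"]:
        result = route(prompt, keyword_classifier, fable, opus)
        note = " (answered by fallback model)" if result.fell_back else ""
        print(f"{result.model}{note}: {result.text}")
```

Passing `active` explicitly is one way to model access paths where some safeguards are lifted, such as Mythos 5 for Glasswing partners.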

The following are the areas covered by the classifiers:

1. Cybersecurity. Mythos-class models excel at discovering and exploiting software vulnerabilities. They can thus make cyberattacks substantially easier and cheaper to commit. Mythos-class models also show strong skills in agentic hacking: performing many different parts of a cyberattack beyond finding exploits, such as reconnaissance, discovery, lateral movement, and more. To prevent these agentic hacking skills from providing uplift in cyberattacks, we designed our cybersecurity classifiers to cover both exploitation and offensive cyber tasks in a broader sense. As shown in the graph below, our classifiers prevent Fable from making any progress on these tasks.

We extensively red-teamed our classifiers to test their robustness against jailbreaks. In addition to internal testing, we ran an external bug bounty that produced no universal jailbreaks in over 1,000 hours of testing. External red-teaming organizations we engaged have also so far failed to find any universal jailbreaks on long-form agentic tasks, although the UK AISI made progress towards one within a brief initial testing window.[4] It is likely impossible to completely prevent universal jailbreaks, but our goal is to make any remaining jailbreaks sufficiently slow and costly that we can detect and prevent them before they are used at scale.

The graph below, from one of our internal evaluations, illustrates how Fable 5’s safeguards give it greater resistance to jailbreaks than our previous generally accessible models:

One of our external partners found that Fable 5’s safeguards against harmful cyber queries were the most robust of any model tested (including Opus 4.8 and Opus 4.7). Fable 5 complied with zero harmful single-turn requests relating to planning a cyberattack, exploit development, or defense evasion. This held whether or not the requests used any of 30 different public jailbreak techniques.

2. Biology and chemistry. We have long used our classifiers to block our models from responding to a narrow selection of bioweapons-related queries. But we are no longer certain that blocking this narrow selection is enough, for two reasons. First, we have reason for concern about well-resourced malicious actors attempting to gain uplift from our models for highly risky biological research. Second, models now have a greater ability to accomplish real-world scientific tasks.

For example, we tested Mythos 5’s ability to complete a challenging step in designing adeno-associated viruses (AAVs). AAVs are used to deliver gene therapies, but the same capability, in the wrong hands, could enable the design of dangerous viruses. In this task, various AI models were evaluated on their ability to predict how a genetic modification would affect the assembly of the virus’s outer shell (among a set of therapeutically relevant unpublished candidates developed by Dyno Therapeutics). We did not explicitly train our models to perform this task, and yet Mythos-class models outperformed sophisticated models dedicated to protein tasks (known as “protein language models”) using their biological reasoning alone. This demonstrates a promising ability to complete simple but important tasks in gene therapy research and development, but it also highlights the risk posed by such dual-use capabilities.

Our priority was to release Fable safely as soon as we could, even at the cost of overly broad safeguards. Therefore, for the time being, we have arranged for Fable to fall back to Opus 4.8 on most requests related to biology and chemistry. As with all of our classifiers, we hope to narrow these safeguards as soon as possible: as the evidence above shows, there is great potential for positive applications of Fable in science, and we do not want false positives from our classifiers to get in the way. In the coming weeks, some biomedical researchers and companies will be able to join our trusted access program for biology capabilities in Mythos 5 (discussed below).

3. Distillation. We’ve previously identified large-scale attempts to extract (“distill”) Claude’s capabilities to train competing models in authoritarian countries. Distillation of Fable 5’s abilities could indirectly lead to the proliferation of near-frontier AI capabilities, which could then be released without the appropriate safeguards. Requests that our classifiers flag as part of such distillation attempts will fall back to Opus 4.8.

A new data retention policy

Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections, including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as identify and reduce false positives.
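
The retention rule reduces to simple date arithmetic. The sketch below assumes each record carries a UTC receipt timestamp and that the 30-day window is measured from receipt; the `Record` layout, `purge`, and `log_access` are hypothetical, and the "almost all cases" exceptions mentioned in the post are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # the 30-day window stated in the post


@dataclass(frozen=True)
class Record:
    received_at: datetime  # UTC receipt time (assumed start of the window)
    payload: str


def deletion_due(received_at: datetime) -> datetime:
    """Time at which a record leaves the retention window."""
    return received_at + RETENTION


def purge(records: dict[str, Record], now: datetime) -> dict[str, Record]:
    """Keep only records still inside the window; everything else is dropped."""
    return {rid: r for rid, r in records.items() if now < deletion_due(r.received_at)}


def log_access(
    record_id: str,
    reviewer: str,
    records: dict[str, Record],
    audit_log: list[tuple[datetime, str, str]],
    now: datetime,
) -> Record:
    """Append (time, reviewer, record ID) to the audit log, then return the record.

    The attempt is logged even if the lookup then fails with KeyError.
    """
    audit_log.append((now, reviewer, record_id))
    return records[record_id]


if __name__ == "__main__":
    t0 = datetime(2026, 6, 1, tzinfo=timezone.utc)
    store = {"a": Record(t0, "x"), "b": Record(t0 + timedelta(days=10), "y")}
    print(sorted(purge(store, t0 + timedelta(days=30))))
```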

Claude Mythos 5 and the trusted access program

Beginning today, all users who currently have access to Claude Mythos Preview (for example, our cybersecurity partners in Project Glasswing) will be able to upgrade to Claude Mythos 5: the same model as Claude Fable 5, but with cyber safeguards lifted. Users will find Mythos 5 comparable to, or somewhat stronger than, Mythos Preview in most cases, while costing substantially less.

In consultation with the US government, we plan to steadily expand access to Claude Mythos 5, continuing our periodic addition of new partners, as well as pursuing a trusted access program that allows cybersecurity organizations to apply in a more systematic manner.

Our plans also include opening a trusted access program for biology, to help accelerate biomedical research and discover new therapies with Mythos-class capabilities. This program will provide access to Fable 5 with the biology and chemistry safeguards removed (but the cyber safeguards still in place). It will enroll a small number of researchers from a variety of life science organizations spanning fundamental and translational research; we’re planning to expand access to this program while simultaneously improving our safeguards.
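
One way to picture these access paths is as a table of which classifier safeguards stay active on each. This is a hypothetical sketch: the tier labels are illustrative, and the assumption that every safeguard not explicitly lifted stays on (for example, distillation everywhere, and biology for Glasswing) is ours, since the post only says which safeguard each path lifts.

```python
from enum import Enum


class Safeguard(Enum):
    CYBER = "cybersecurity"
    BIO_CHEM = "biology and chemistry"
    DISTILLATION = "distillation"


ALL = frozenset(Safeguard)

# Hypothetical tier labels; anything not explicitly lifted is assumed to stay on.
ACTIVE_SAFEGUARDS: dict[str, frozenset[Safeguard]] = {
    "general": ALL,                                 # Fable 5 for all users
    "glasswing": ALL - {Safeguard.CYBER},           # Mythos 5: cyber lifted
    "biology_program": ALL - {Safeguard.BIO_CHEM},  # bio/chem lifted, cyber kept
}


def is_active(tier: str, safeguard: Safeguard) -> bool:
    """True if requests on this access path can still trigger the given safeguard."""
    return safeguard in ACTIVE_SAFEGUARDS[tier]


if __name__ == "__main__":
    for tier, active in ACTIVE_SAFEGUARDS.items():
        print(tier, sorted(s.value for s in active))
```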

Availability

Claude Fable 5 is available everywhere today. Until our broader trusted access program is available, Claude Mythos 5 is restricted to Glasswing partners (with cyber safeguards lifted) and, soon, select biology researchers (with biology and chemistry safeguards lifted).

Pricing for both models is $10 per million input tokens and $50 per million output tokens. Developers can use claude-fable-5 via the Claude API.
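
At these list prices, the cost of a request is a weighted sum of its input and output tokens. A minimal sketch, assuming no prompt-caching or batch discounts apply (the post mentions none) and using `Decimal` to avoid floating-point rounding; `request_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# List prices from the post, in USD per million tokens (same for Fable 5 and Mythos 5).
INPUT_USD_PER_MTOK = Decimal("10")
OUTPUT_USD_PER_MTOK = Decimal("50")
TOKENS_PER_MTOK = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of one request at list price; discounts are assumed not to apply."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    cost = request_cost(200_000, 20_000)
    print(f"${cost:.2f}")  # prints $3.00
```

For example, a request with 200,000 input tokens and 20,000 output tokens costs $2.00 + $1.00 = $3.00.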

We expect demand for Fable 5 to be very high and difficult to predict. On the Claude API and consumption-based Enterprise plans, Fable 5 is fully available from today. For subscription plans, we’d rather give access sooner than later, so we’re rolling out more conservatively, in stages:

From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

After this point, once sufficient capacity allows us to do so, we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
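
The staged rollout amounts to a small date check. This sketch assumes the dates fall in 2026 (the page’s edit note is dated June 9, 2026) and ignores both a possible extension of the included window and the later restoration, since the post gives no dates for either; `fable_access` and the plan labels are hypothetical.

```python
from datetime import date

# Assumption: the rollout dates are in 2026, the year of the page's edit note.
INCLUDED_THROUGH = date(2026, 6, 22)
SUBSCRIPTION_PLANS = frozenset({"Pro", "Max", "Team", "Enterprise (seat-based)"})
USAGE_BASED_PLANS = frozenset({"Claude API", "Enterprise (consumption-based)"})


def fable_access(plan: str, on: date) -> str:
    """How a plan reaches Fable 5 on a given date (hypothetical helper)."""
    if plan in USAGE_BASED_PLANS:
        # Fully available from launch and billed by usage.
        return "usage-based"
    if plan not in SUBSCRIPTION_PLANS:
        raise ValueError(f"unknown plan: {plan!r}")
    if on <= INCLUDED_THROUGH:
        return "included at no extra cost"
    return "requires usage credits"


if __name__ == "__main__":
    for day in (date(2026, 6, 22), date(2026, 6, 23)):
        print(day, fable_access("Pro", day))
```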

Throughout this period, we’ll communicate any changes ahead of time so users know where things stand.

Edit June 9, 2026: Updated the discussion of AAVs to note that the candidates were developed by Dyno Therapeutics.

container/docs/container-machine.md at main · apple/container

github.com

A container machine provides a highly integrated Linux environment that works seamlessly on your Mac. Container machines are fast, lightweight, and persistent. They are based on standard OCI images that can be built and shared. Host integrations such as automatic user and home directory sharing provide quick and easy access to your Linux environment, no matter where you are in a terminal.

Why container machines

Containers are typically modeled after an application. A container machine is modeled after a Linux environment. It runs the image’s init system, allowing you to register long-running services or test your application under a process supervisor. A container machine automatically maps your username and home directory into the Linux environment, so your repositories and dotfiles are available on both platforms. Use editors and tools directly on macOS while building and running your application inside the Linux environment.

Edit on the Mac, build inside. Your repo lives in $HOME on macOS and is mounted at /Users/<username> inside the container machine. Use your macOS editor or IDE; compile and run inside your container machine.

Use macOS-native tooling against Linux artifacts. Profilers, screenshot tools, browsers, and GUI debuggers on your Mac all see the same files the container machine sees; there is no copy step between “I built it” and “I am inspecting it”.

Real Linux services for testing. Run a database, or whatever your stack needs, as a system service: systemctl start postgresql works on images with systemd installed.

One environment per target distro. Create as many container machines as you have target distros (alpine, ubuntu, debian). Each has the same $HOME and the same dotfiles from your Mac, so you can quickly test your application on various distributions.

Quickstart

container machine create alpine:latest --name dev
container machine run -n dev whoami   # your host username, not root
container machine run -n dev pwd      # /home/<you> (your Mac home dir, mounted in)
container machine run -n dev          # interactive shell; cd into your repos in $HOME

container machine run is how you get a shell or run a single command. If the container machine is stopped, run boots it first.

Working in a container machine

Open a shell, or run a single command

With no command, container machine run opens an interactive shell as a user that matches your host account:

container machine run -n dev

Pass a command to run it once and exit:

container machine run -n dev uname -a
container machine run -n dev -- cat /proc/cpuinfo

Set a default

Pick a default container machine so you can drop the -n flag:

container machine set-default dev
container machine run   # operates on dev

List, inspect, stop, delete

container machine ls            # list all container machines
container machine inspect dev   # JSON detail for one
container machine stop dev      # stop the container machine
container machine rm dev        # delete, including its persistent storage

container machine has the alias m, so m ls, m run, etc. all work.

Resize CPUs, memory, or change the home-mount

container machine set updates configuration on disk. Changes take effect after the next stop and start:

container machine set -n dev cpus=4 memory=8G
container machine stop dev
container machine run -n dev -- nproc

Memory defaults to half of host memory. The home-mount can be rw (default), ro, or none.
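
The rules in this section (memory defaults to half of host memory; the home mount is rw, ro, or none; settings written by container machine set apply only after a stop and start) can be modeled in a few lines. This is a hypothetical Python sketch of those rules, not the container tool’s code: the key name home-mount, binary units for sizes like 8G, and the caller-supplied CPU count (the docs above do not state a CPU default) are all assumptions.

```python
from dataclasses import dataclass, replace

HOME_MOUNT_MODES = frozenset({"rw", "ro", "none"})  # rw is the documented default
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}     # assumption: binary units


def parse_size(text: str) -> int:
    """Convert sizes such as '8G' or '512M' to bytes; a bare number is bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


@dataclass(frozen=True)
class MachineConfig:
    cpus: int
    memory_bytes: int
    home_mount: str = "rw"

    @classmethod
    def default(cls, cpus: int, host_memory_bytes: int) -> "MachineConfig":
        # Memory defaults to half of host memory; the CPU default is not
        # documented here, so the caller supplies it.
        return cls(cpus=cpus, memory_bytes=host_memory_bytes // 2)


def apply_set(config: MachineConfig, *assignments: str) -> MachineConfig:
    """Apply key=value pairs; 'home-mount' as a key name is an assumption."""
    for item in assignments:
        key, _, value = item.partition("=")
        if key == "cpus":
            config = replace(config, cpus=int(value))
        elif key == "memory":
            config = replace(config, memory_bytes=parse_size(value))
        elif key == "home-mount":
            if value not in HOME_MOUNT_MODES:
                raise ValueError(f"home-mount must be one of {sorted(HOME_MOUNT_MODES)}")
            config = replace(config, home_mount=value)
        else:
            raise ValueError(f"unknown setting: {key!r}")
    return config


@dataclass
class Machine:
    running: MachineConfig  # what the booted machine is using now
    on_disk: MachineConfig  # what the next start will use

    def set_config(self, *assignments: str) -> None:
        # Mirrors `container machine set`: only the on-disk configuration changes.
        self.on_disk = apply_set(self.on_disk, *assignments)

    def restart(self) -> None:
        # A stop followed by a start picks up the on-disk configuration.
        self.running = self.on_disk
```

Keeping the running and on-disk configurations separate makes the documented behavior explicit: a set call alone leaves the booted machine unchanged.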

Bring your own container machine image

Any Linux image that includes /sbin/init works as a container machine. For example, this Dockerfile builds an Ubuntu 24.04 container machine image with systemd and common command-line tools:

FROM ubuntu:24.04

ENV container container

RUN apt-get update && \
    apt-get install -y \
        dbus systemd openssh-server net-tools iproute2 iputils-ping curl wget vim-tiny man sudo && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    yes | unminimize

RUN >/etc/machine-id
RUN >/var/lib/dbus/machine-id

RUN systemctl set-default multi-user.target
RUN systemctl mask \
        dev-hugepages.mount \
        sys-fs-fuse-connections.mount \
        systemd-update-utmp.service \
        systemd-tmpfiles-setup.service \
        console-getty.service
RUN systemctl disable \
        networkd-dispatcher.service

RUN sed -i -e 's/^AcceptEnv LANG LC_\*$/#AcceptEnv LANG LC_*/' /etc/ssh/sshd_config

Build it and create a container machine from it:

container build -t local/ubuntu-machine:latest .
container machine create local/ubuntu-machine:latest --name ubuntu

By default, container runs a built-in setup script on first boot to provision the user described above. To use your own setup instead, add an executable script at /etc/machine/create-user.sh to the image. It runs once, as root, on first boot, with these variables set:

CONTAINER_GID

CONTAINER_HOME

CONTAINER_MACHINE_ID

CONTAINER_UID

CONTAINER_USER

If Claude Fable stops helping you, you'll never know — Jonathon Ready

jonready.com

I didn’t expect to read this in a model card.

Fable 5 model card:

we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

Claude can now be silently nerfed. Anthropic has de­cided it won’t tell users when this hap­pens.

Modern soft­ware com­pa­nies in­creas­ingly build their own em­bed­ding, rerank­ing, and rec­om­men­da­tion sys­tems. Even my small boot­strapped app, wan­derfugl.com, has a cus­tom reranker and em­bed­ding al­go­rithm that I trained my­self.

Anthropic gives a few examples of what it considers “frontier AI development,” but doesn’t provide a clear line. The problem is that many techniques once reserved for AI labs are now being used by ordinary software companies. Startups train embedding models. They build rerankers. They fine-tune and host small LLMs. The boundary between “frontier AI research” and normal product development is becoming harder to define every year.

That cre­ates a real sup­ply chain risk for busi­nesses. If Claude gives me poor or in­cor­rect ad­vice while I’m work­ing on an AI com­po­nent, I have no way of know­ing whether the model was con­fused, whether my prob­lem is un­solv­able, or if some in­vis­i­ble pol­icy re­stric­tion qui­etly kicked in. Anthropic has ex­plic­itly cho­sen not to tell users when this is hap­pen­ing.

Once a de­vel­op­ment tool can stop op­ti­miz­ing for your suc­cess with­out telling you, it be­comes im­pos­si­ble to fully trust your in­fra­struc­ture.

The Anthropic sup­ply chain risk

Anthropic says these safe­guards only af­fect 0.03% of de­vel­op­ers. Maybe that’s true to­day.

The prob­lem is that the de­f­i­n­i­tion of an AI com­pany is chang­ing.

Maybe you’re not train­ing fron­tier mod­els to­day—most com­pa­nies aren’t. But mod­ern soft­ware in­creas­ingly con­tains AI mod­els. Five years ago, build­ing a startup meant writ­ing APIs and SQL queries. Today, it of­ten means train­ing, tun­ing, and de­ploy­ing mod­els.

Five years ago, mod­els like CLIP were fron­tier AI re­search pro­jects. Today I’m fine-tun­ing them for a boot­strapped travel startup.

If you’re de­bug­ging a model train­ing pipeline for your prod­uct and Claude gives a bad an­swer, was the model con­fused? Did you give it bad con­text? Or did a hid­den pol­icy nerf Claude’s abil­ity to as­sist you?

You won’t know.

Landmark German ruling declares Google's AI Overviews are Google's own words and makes it liable for false answers

the-decoder.com

A German court has ruled that Google is di­rectly li­able for what its AI search overviews say. Previous case law shield­ing search en­gine op­er­a­tors from li­a­bil­ity does­n’t ap­ply to AI overviews.

The Regional Court of Munich hit Google with a temporary injunction barring the company from spreading false claims about two Munich-based publishers through its AI-generated search overviews (case no. 26 O 869/26). The court classified Google as a direct infringer because the “AI overview” is its own content, not just a list of search results.

Google’s AI overviews had falsely tied two pub­lish­ing com­pa­nies to scams, sub­scrip­tion traps, and shady busi­ness prac­tices for cer­tain search queries. According to the court, the AI mixed up in­for­ma­tion about other, gen­uinely sketchy com­pa­nies with the plain­tiffs and drew con­nec­tions that did­n’t ap­pear in any of the linked sources. The pub­lish­ers sent Google a cease-and-de­sist let­ter, but Google did­n’t re­spond ap­pro­pri­ately.

AI overviews aren’t search re­sults

Google’s AI overviews work nothing like traditional search results, the court argues. “The AI rewrites and judges results in its own words and according to its own structure,” the ruling says. In the case at hand, for example, it opened with confident claims like “Yes, [company] is known for dubious business practices,” then built its own structure with a summary, red flags for the alleged scam, and tips for users.

The court also found that the AI overview made claims that “are not even made in the search results.” None of the linked sources drew any connection between the plaintiffs and the shady companies the AI mentioned. The court called these “the defendant’s own statements.”

Google built the AI, Google offered it to users, so Google owns what it produces, “because it alone has influence over the AI’s offering and the algorithms with which the AI operates.”

Search engine liability rules don’t apply to AI search

The court also ex­am­ined ex­ist­ing rul­ings from Germany’s Federal Court of Justice (BGH), which gave tra­di­tional search en­gines and au­to­com­plete lim­ited li­a­bil­ity. The BGH had ar­gued that search en­gine op­er­a­tors were only li­able as in­di­rect in­fringers be­cause they merely made third-party con­tent find­able. A proac­tive duty to check re­sults would threaten how search en­gines work.

The Munich court found that this reasoning doesn’t apply to AI overviews. A regular search engine just points to outside websites. But AI overviews generate “independent, new, and substantive statements” by evaluating and combining content from various third-party sites. And only Google can check those statements, the court said, “at least by comparing the underlying third-party websites with its own statements based on them.”

The court also noted that the AI overview is “by no means absolutely necessary” for using the internet. Traditional search results already help users sort through information; the AI overview is just an extra feature.

Google’s “users can check for themselves” defense falls flat

At the hearing, Google argued that users could check the linked sources themselves to verify whether the AI summary was correct. “Users generally knew that information generated with AI should not be blindly trusted,” the company claimed. That’s a remarkable statement given the scale at which Google serves AI overviews. It’s also not entirely true, since the connection between sources and generated content isn’t always there.

The court rejected this. “The possibility of disproving a statement through further research doesn’t regularly exempt from liability for this statement.” The AI overview was “understandable on its own” and contained “a self-contained statement with independently understandable content and no reference to other possible interpretations or even unreliable content.” Studies show that users almost never click on sources in AI overviews, which supports the court’s reasoning.

The court drew a parallel to press law, where publishers are liable for teasers that are understandable on their own, even if readers never read the full article. Google’s own argument would also “significantly diminish” the benefit of the feature, the court noted, if the overview were “generally recognized as unreliable.”

The court also pointed to a pro­tec­tion gap. If Google were only li­able for ob­vi­ous vi­o­la­tions, vic­tims would have no real le­gal re­course when the AI makes false claims. The third par­ties whose web­sites served as sources had­n’t even made the state­ments in ques­tion. So vic­tims could­n’t sue the sources, and un­der ex­ist­ing rules they could­n’t ef­fec­tively sue Google ei­ther.

As a re­sult, Google could­n’t in­voke host provider pro­tec­tions un­der the Digital Services Act or fall back on the stan­dard no­tice-and-take-down process for search en­gines.

AI-generated opin­ions get less free speech pro­tec­tion

As if the rest wasn’t bad enough for Google, the court also went after free speech protection for AI-generated content. “An AI’s opinion is not the expression of an acquired conviction of the persons expressing it, but the result of an algorithm,” the court wrote.

Offering AI-powered research is “above all an expression of Google’s business activities” and “at most a secondary expression of an interest in being able to freely express one’s opinion and beliefs.”

When weigh­ing the plain­tiffs’ pri­vacy rights against Google’s in­ter­ests, Google had to take a back seat, es­pe­cially since the chal­lenged state­ments were based on un­true facts. The AI had linked the plain­tiffs to com­pa­nies that, ac­cord­ing to sworn af­fi­davits, had no con­nec­tion to them what­so­ever.

Google picks up 80 per­cent of the le­gal tab

The court ruled in fa­vor of the plain­tiffs on most counts. It banned claims about scams, con­nec­tions to du­bi­ous com­pa­nies, sub­scrip­tion traps, phone calls that never hap­pened, and lack of avail­abil­ity. Only two mi­nor re­quests got de­nied.

The risk of re­peated vi­o­la­tions re­mained, even though the spe­cific texts were no longer be­ing dis­played. Google had­n’t is­sued a cease-and-de­sist de­c­la­ra­tion with a penalty clause, and noth­ing stopped the al­go­rithms from gen­er­at­ing the same state­ments again. Google cov­ers 80 per­cent of the le­gal costs; the plain­tiffs pay 10 per­cent each.

The rul­ing may also have in­ter­na­tional reach, ac­cord­ing to the court.

Even a 91 per­cent ac­cu­racy rate means mil­lions of wrong an­swers

The Munich rul­ing goes far be­yond this one case. An analysis by AI startup Oumi for the New York Times found that Google’s AI Overviews with the cur­rent Gemini 3 model an­swered cor­rectly 91 per­cent of the time.

That’s solid enough for every­day use by most peo­ple. But at Google’s scale, it still means mil­lions of wrong an­swers every hour. If enough of that wrong con­tent de­fames com­pa­nies or in­di­vid­u­als, it could be­come a se­ri­ous le­gal prob­lem not just for Google but for other providers of sim­i­lar ser­vices like ChatGPT, Claude, or Perplexity.

The Oumi analy­sis also found that 56 per­cent of the cor­rect Gemini 3 an­swers could­n’t be backed up by the sources Google linked. The AI is giv­ing an­swers whose ori­gins users can’t trace.
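Taken together, the two Oumi figures imply a rough breakdown of any batch of AI Overview answers. The sketch below assumes, as the wording suggests, that the 56 percent applies only to the correct answers; the batch of one million is a hypothetical round number chosen for illustration, not Google's actual volume.

```typescript
// Rough breakdown of AI Overview answers from the two Oumi figures quoted above:
// 91% of answers correct, and 56% of those correct answers not backed by the linked sources.
// Percentages are kept as integers so the arithmetic stays exact for round batch sizes.
const CORRECT_PCT = 91;
const UNSUPPORTED_OF_CORRECT_PCT = 56;

interface AnswerBreakdown {
  wrong: number;
  correctButUnsupported: number;
  correctAndSupported: number;
}

function breakdown(answers: number): AnswerBreakdown {
  const wrong = (answers * (100 - CORRECT_PCT)) / 100;
  const correctButUnsupported = (answers * CORRECT_PCT * UNSUPPORTED_OF_CORRECT_PCT) / 10_000;
  const correctAndSupported =
    (answers * CORRECT_PCT * (100 - UNSUPPORTED_OF_CORRECT_PCT)) / 10_000;
  return { wrong, correctButUnsupported, correctAndSupported };
}

// Hypothetical batch of one million answers (illustrative only).
const b = breakdown(1_000_000);
console.log(b.wrong, b.correctButUnsupported, b.correctAndSupported); // 90000 509600 400400
```

On these figures, only about 40 percent of all answers are both correct and traceable to a linked source.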

The Munich court tack­led ex­actly this prob­lem: the AI makes its own claims that don’t ap­pear in any linked source, and the op­er­a­tor has to an­swer for them. Whether this rea­son­ing holds up on ap­peal re­mains to be seen, and Google has­n’t com­mented on the rul­ing. But if it gains trac­tion in­ter­na­tion­ally, the fall­out could hit not just Google but every AI provider whose sys­tems para­phrase con­tent from the web.

CEOs Who Think AI Replaces Their Employees Are Just Bad CEOs

www.techdirt.com

from the that’s-not-what-ai-is-for dept

In the last three months I’ve had people forward me four separate examples of a CEO losing his or her mind over AI. What’s been striking to me is the similarity in each case: It would be an “all hands” email in which the CEO talks up how amazing LLM tools are and says that everyone in the company MUST start learning to use them immediately or they should look for a job elsewhere. Sometimes they talk about hiring “consultants” to come in and teach the team how to use the tools properly. Sometimes they are setting up “office hours” or internal “AI hackathons.”

But in every case the gist is the same: “holy shit AI is amazing and you are expected to use it at your job all the time.” The worst cases were the few companies that set up token leaderboards, which is perhaps the dumbest way possible to encourage learning how to use LLMs well. Good usage of AI includes learning how to view tokens as a scarce resource. Simply counting how much you use as a good thing is ridiculous because it’s incredibly easy to waste tokens on counterproductive uses.

As regular readers of Techdirt know, I actually do think that these tools are powerful and important, but I also think there are many problems with them and limitations to how useful they really are. I think when someone learns how to use them well and willingly chooses to use them as a tool to assist their work, they can be quite powerful. But the “willingly chooses to use them” part of that is important.

No one who is forced into us­ing these tools will ever learn to use them well.

So CEOs los­ing their minds over the tech are not be­ing help­ful. Box CEO Aaron Levie — him­self a gen­uine AI be­liever — puts his fin­ger on ex­actly why.

CEOs are uniquely prone to AI psy­chosis be­cause they’re suf­fi­ciently dis­tant from the last mile of work that still has to hap­pen to gen­er­ate most value with AI.

So when they play with AI, they see the happy path re­sults, of­ten not con­sid­er­ing the next 10 or 20 things that have to hap­pen to get sus­tain­able re­sults from agents.

“Look I made this awesome product prototype”. Yes but you didn’t have to review the code before it went into production and fix a bunch of issues.

Look I gen­er­ated a con­tract”. Yes but you did­n’t ver­ify all the terms be­fore it goes out to the coun­ter­party and did­n’t have to wire up all the past con­tracts to work with.

The best thing you can do as a CEO is to use AI a ton to fig­ure out the real im­pli­ca­tions of agents in the en­ter­prise, and come out the other side with an ap­pre­ci­a­tion for both the up­side and the real work that goes into them.

I will say that I hate the term “AI psychosis” because the term is extremely misleading, and many psychologists and psychiatrists have complained that it is inaccurate and may cause more problems itself. But the general sense that CEOs are going overboard with AI is definitely happening.

And I think Levie’s think­ing as to why is also dead on.

Much of the is­sue may be in how dis­con­nected the tra­di­tional CEO is from the peo­ple at a com­pany ac­tu­ally get­ting stuff done. Normally, they have teams and lay­ers and the ac­tual work of get­ting things to work in a real way is so far re­moved from a CEO that they just get snip­pets of the de­tails that fil­ter back through the var­i­ous org charts.

The problem tends to show up when a CEO is handed an agentic tool like Claude Code, and has it create something, which will work just fine, and thinks “oh, wait, why do we need so many people, when I can just sit here and make things work?”

This is a bad CEO.

Making things work is dif­fer­ent than mak­ing things work well. Or well at scale. Or well at scale in a spe­cific en­vi­ron­ment. Obviously, it de­pends on the kind of pro­ject and what it’s be­ing de­signed to do, but of­ten­times the rea­son a com­pany has a bunch of em­ploy­ees is to fill in the seem­ingly small, but in­cred­i­bly im­por­tant de­tails that CEOs might not ever get much vis­i­bil­ity into: things like se­cu­rity or le­gal com­pli­ance or ac­ces­si­bil­ity or who knows what else.

Using an agentic tool to build something that works is all well and good, but building a product for the mass market to use (and use well, and use safely) involves much, much more. Agentic coding tools can sometimes help with that too, but the leap from “I built a thing” to “therefore anyone can build a thing” misses the entire point of why you hire knowledgeable, experienced people in the first place. It’s also why I think the best use case for these tools is building totally personalized tools to assist you in accomplishing a specific task, and not building mass market tools.

This all re­minds me of cargo cult think­ing: The CEO knows that some­where in the org, em­ploy­ees are peck­ing away at com­put­ers and work gets done. So they fig­ure that them­selves peck­ing away with Claude Code and see­ing work get done is the same thing. It’s not. All those other steps those peo­ple are han­dling — the ones the CEO never sees — still need to hap­pen.

That’s not to say em­ploy­ees would­n’t ben­e­fit from a deeper un­der­stand­ing of both the power and the lim­its of these tools — they would. But there’s some­thing darkly com­i­cal about watch­ing a CEO go all in on the tech and then im­me­di­ately con­clude it means they can fire half the staff.

It seems pretty clear to me that companies that think they’ll be able to lay off huge swaths of workers because of LLM tools are going to find out they’re mistaken pretty quickly. The power of LLMs is that, when used well and used willingly, they can help employees get more done, but that doesn’t mean you need fewer humans. You need more humans who know how to work productively.

Separately, companies pointing to LLMs as a reason for large layoffs are, in most cases, just using them as an excuse. They over-hired, and “AI efficiencies” is a much more palatable story for Wall Street than “we made bad headcount decisions.”

Levie’s pre­scrip­tion, though, is right: CEOs should learn how the tech works, but that in­cludes the lim­i­ta­tions of the tech­nol­ogy. If a CEO thinks the pro­to­type they vibe coded is pro­duc­tion-ready, let them ship it and see what hap­pens. If they think a vibe coded con­tract is as solid as one a lawyer re­viewed, let them find out what the le­gal bills look like when it falls apart.

Yes, the tools are pow­er­ful, but a CEO who thinks they re­place the work of em­ploy­ees is sim­ply a bad CEO.

FCC Wants to Kill Burner Phones By Forcing Telecoms to Get All Customers’ IDs

www.404media.co

The Federal Communications Commission (FCC) wants to make it effectively impossible for people to buy what many call burner phones (a phone not explicitly linked to your identity at the point of purchase), which would impact everyone from privacy-conscious people, to domestic abuse survivors, to journalists, and many more. The FCC plans to do this by legally forcing the country’s telecoms to store a wealth of personal information about essentially all phone customers, including a government-issued identification number and their physical address, alarming privacy advocates and civil rights activists who compare the measures to those from authoritarian countries where it can be difficult to buy a mobile phone plan without giving up your identity.

The pro­posed change would dras­ti­cally shake up how peo­ple ob­tain phone plans in the U.S., and have all sorts of pri­vacy and cy­ber­se­cu­rity knock-on ef­fects. The FCC is propos­ing the data col­lec­tion partly as a way to com­bat scam­mers, with tele­coms be­ing re­quired to col­lect other in­for­ma­tion on busi­ness and for­eign cus­tomers like the in­tended use case of their bulk phone plan pur­chase and their IP ad­dress. But the changes would mean tele­coms col­lect data on all new and re­new­ing cus­tomers, and the FCC pro­vides a long list of other things that the col­lected data could help au­thor­i­ties with.

Do you know any­thing else about this pro­posed change? I would love to hear from you. Using a non-work de­vice, you can mes­sage me se­curely on Signal at joseph.404 or send me an email at joseph@404­me­dia.co.

“For decades, civil libertarians have looked overseas at authoritarian countries where the government requires people to register to get a mobile phone to ensure they can be tracked. We never thought that would happen here,” Jay Stanley, senior policy analyst at the American Civil Liberties Union’s (ACLU) Speech, Privacy, and Technology Project, told 404 Media in an email. “But make no mistake: with this rulemaking, the government is contemplating taking away people’s ability to get a burner phone, which will hurt low-income people, domestic violence victims, and anyone else who cares about their privacy.”

Upcoming breaking changes for npm v12

github.blog

Our next npm ma­jor ver­sion, v12, in­tro­duces se­cu­rity-re­lated de­fault changes to npm in­stall. All these changes are avail­able be­hind warn­ings in npm to­day on 11.16.0 or newer, so you can pre­pare be­fore the up­grade. v12 is es­ti­mated to re­lease in July 2026.

Each change turns an npm in­stall be­hav­ior that runs au­to­mat­i­cally to­day into one you ex­plic­itly opt into:

allowScripts defaults to off: npm install will no longer execute preinstall, install, or postinstall scripts from dependencies unless they are explicitly allowed in your project. This includes native node-gyp builds (i.e., a package with a binding.gyp and no explicit install script still gets blocked, because npm runs an implicit node-gyp rebuild for it). prepare scripts from git, file, and link dependencies are blocked the same way. To see what would be blocked, run npm approve-scripts --allow-scripts-pending. Then allow the packages you trust with npm approve-scripts and block the rest with npm deny-scripts. The resulting allowlist is written to package.json and should be committed. If your install routine runs scripts, you can observe warnings in npm 11.16.0+.

--allow-git defaults to none: npm install will no longer resolve Git dependencies (direct or transitive) unless explicitly allowed via --allow-git. This closes a code-execution path where a Git dependency’s .npmrc could override the Git executable, even with --ignore-scripts. This change was previously announced on 2026-02-18 and is available in npm 11.10.0+.

--allow-remote defaults to none: npm install will no longer resolve dependencies from remote URLs, such as https tarballs (direct or transitive), unless explicitly allowed via --allow-remote. This flag is available in npm 11.15.0+. The related --allow-file and --allow-directory flags are not changing their defaults in v12.
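Whether the --allow-git and --allow-remote changes affect a project depends on how each dependency is specified in package.json. A rough audit helper might sort specs as below. The patterns are a simplified, hypothetical approximation of npm's spec parsing (the real rules live in npm's npm-package-arg package); for example, npm also treats some hosted-git URLs without a .git suffix, such as a plain https GitHub repository URL, as Git, which this sketch would report as remote.

```typescript
// Simplified, hypothetical classifier for package.json dependency specs,
// flagging the ones npm v12 would no longer resolve without an explicit flag.
type SpecKind = "registry" | "git" | "remote" | "file";

const GIT_PREFIX = /^(git:\/\/|git\+(ssh|https?|file):\/\/|(github|gitlab|bitbucket|gist):)/;

function classifySpec(spec: string): SpecKind {
  const s = spec.trim();
  if (GIT_PREFIX.test(s)) return "git";
  // An http(s) URL ending in .git (optionally with a #ref) is treated as a repository.
  if (/^https?:\/\/\S+\.git(#\S*)?$/.test(s)) return "git";
  if (/^https?:\/\//.test(s)) return "remote"; // e.g. a tarball URL
  if (/^file:/.test(s) || /^(\.{1,2}\/|\/|~\/)/.test(s)) return "file";
  // "user/repo" (optionally with #ref) is GitHub shorthand.
  if (/^[^@:\s/]+\/[^\s/]+$/.test(s)) return "git";
  return "registry"; // semver ranges, dist-tags, npm: aliases
}

// file/directory specs keep their defaults in v12, per the post.
const FLAG_FOR_KIND: Record<SpecKind, string | null> = {
  registry: null,
  git: "--allow-git",
  remote: "--allow-remote",
  file: null,
};

interface FlagNeeded {
  name: string;
  spec: string;
  flag: string;
}

function flagsRequired(dependencies: Record<string, string>): FlagNeeded[] {
  const needed: FlagNeeded[] = [];
  for (const name of Object.keys(dependencies)) {
    const spec = dependencies[name];
    const flag = FLAG_FOR_KIND[classifySpec(spec)];
    if (flag !== null) needed.push({ name, spec, flag });
  }
  return needed;
}

// Hypothetical dependencies: "a" is a registry range and needs no flag;
// "b" needs --allow-git and "c" needs --allow-remote.
flagsRequired({ a: "^1.0.0", b: "github:user/repo", c: "https://example.com/c-1.0.0.tgz" });
```

Transitive dependencies are affected too, so in practice this check would need to walk the lockfile rather than just the top-level package.json.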

How to pre­pare

Upgrade to npm 11.16.0 or later, run your normal install, and review the warnings. Use npm approve-scripts --allow-scripts-pending to see which packages have scripts, approve the ones you trust, and commit the updated package.json. After that, only the scripts you approved keep running once you upgrade. Anything you leave unapproved will stop. More details are available in our docs at npm approve-scripts, npm deny-scripts, and allow-scripts config (for npx and global installs). Please share your comments and questions in our community discussion.
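To reason about which dependencies the allowScripts change touches, the rules from the post can be written down as a pure function. This is a hypothetical helper for illustration, not npm's code, and npm approve-scripts --allow-scripts-pending remains the authoritative listing. The implicit node-gyp condition follows npm's documented behavior: it applies when a package ships a binding.gyp and defines neither an install nor a preinstall script.

```typescript
// Hypothetical illustration of which dependency scripts npm v12's
// allowScripts-off default would hold back, per the rules in the post.
type DependencySource = "registry" | "git" | "file" | "link";

interface DependencyManifest {
  name: string;
  scripts?: Record<string, string>;
}

const INSTALL_HOOKS = ["preinstall", "install", "postinstall"] as const;

function scriptsBlockedByDefault(
  manifest: DependencyManifest,
  hasBindingGyp: boolean,
  source: DependencySource,
): string[] {
  const scripts = manifest.scripts ?? {};
  const has = (name: string) => Object.prototype.hasOwnProperty.call(scripts, name);

  // Explicit install-time hooks are blocked unless the package is approved.
  const blocked: string[] = INSTALL_HOOKS.filter((hook) => has(hook));

  // npm runs `node-gyp rebuild` implicitly for a binding.gyp package that
  // defines no install/preinstall script, so that counts as blocked too.
  if (hasBindingGyp && !has("install") && !has("preinstall")) {
    blocked.push("install (implicit node-gyp rebuild)");
  }

  // Per the post, prepare scripts from git, file, and link dependencies are
  // blocked as well; npm does not run prepare for registry tarballs.
  if (source !== "registry" && has("prepare")) {
    blocked.push("prepare");
  }
  return blocked;
}

// A hypothetical native addon with only a binding.gyp:
scriptsBlockedByDefault({ name: "native-addon" }, true, "registry");
// returns ["install (implicit node-gyp rebuild)"]
```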

How building an HTML-first site doubled our users overnight

mohkohn.co.uk

Jun 10, 2026

This is a story of how build­ing HTML-first dou­bled a com­pa­ny’s users lit­er­ally overnight.

My client was a util­ity com­pany, and they had a big prob­lem. To ap­ply for their ser­vices, cus­tomers could ei­ther use an old ASP form on the web­site, or fol­low a man­ual process. The man­ual process was more ex­pen­sive for the com­pany, of course. Adding a lot of pres­sure, this was a reg­u­lated mo­nop­oly, and if their cus­tomer sat­is­fac­tion dropped be­low 96% (if I re­mem­ber cor­rectly) it could re­sult in mil­lions of pounds in fines.

There were two previous failed (and very expensive) attempts to solve the problem. In the most recent, contractors in another country had built a React app. The React app was online for 3 days before being pulled because of customer complaints. I took one look at it and told my boss, “we can’t take ownership of this.” It was a mess of loading spinners and global javascript states. It was not accessible. Image upload was a vital part of the form, and it attempted to store images (along with all other form data) in localStorage, which has a 5MB limit!
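The localStorage problem is easy to put numbers on. localStorage only holds strings, so an image has to be serialized, typically as base64, which turns every 3 bytes into 4 characters. The sketch below assumes a quota of 5,000,000 characters (browsers differ in exactly how they measure the limit) and a hypothetical 4,000,000-byte phone photo.

```typescript
// Why storing uploads in localStorage breaks: base64 inflates data by a third.
// ASSUMED_QUOTA_CHARS is an assumption; real browsers measure the limit differently.
const ASSUMED_QUOTA_CHARS = 5_000_000;

function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3); // padded base64 output length
}

function fitsInLocalStorage(imageByteCounts: number[], otherFormChars = 0): boolean {
  const total = imageByteCounts.reduce((sum, n) => sum + base64Length(n), otherFormChars);
  return total <= ASSUMED_QUOTA_CHARS;
}

// One hypothetical 4,000,000-byte photo is already over the limit on its own.
const photoChars = base64Length(4_000_000);
const fits = fitsInLocalStorage([4_000_000]);
console.log(photoChars, fits); // 5333336 false
```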

I took a very bold de­ci­sion and built a new ver­sion of the site us­ing Astro. It was HTML-first. Javascript ex­isted, in web com­po­nents, but only to pro­gres­sively-en­hance a web­site that worked per­fectly fine with­out it.

My logic was thus:

This is a pub­lic ser­vice

It should work on every ma­chine pos­si­ble

It should work when con­nec­tions are poor

The forms must never lose data once it is en­tered

I was very moved by this anec­dote from Terence Eden:

A few years ago I was do­ing pol­icy re­search in a hous­ing ben­e­fits of­fice in London. They are sin­gu­larly unlovely places. The walls are bright­ened up with posters of­fer­ing help­ful ser­vices for peo­ple flee­ing do­mes­tic vi­o­lence. The se­cu­rity guards on the door are cau­tiously in­dif­fer­ent to any­one walk­ing in. The air is filled with tense con­ver­sa­tions be­tween part­ners - drowned out by the noise of scream­ing kids.

In the mid­dle, a young woman sits on a hard plas­tic chair. She is sur­rounded by can­vas-bags con­tain­ing her worldly pos­ses­sions. She does­n’t look like she is in a great emo­tional place right now. Clutched in her hands is a games con­sole - a PlayStation Portable. She stares at it in­tensely; block­ing out the world with Candy Crush.

Or, at least, that’s what I thought.

Walking be­hind her, I glance at her con­sole and recog­nise the screen she’s on. She’s con­nected to the com­ple­men­tary WiFi and is brows­ing the GOV.UK pages on Housing Benefit. She’s not slic­ing fruit; she’s arm­ing her­self with knowl­edge.

The PSP’s web browser is - charitably - pathetic. It is slow, frequently runs out of memory, and can only open 3 tabs at a time.

But the GOV.UK pages are writ­ten in sim­ple HTML. They are de­signed to be light­weight and will work even on rub­bish browsers. They have to. This is for every­one.

Some re­quire­ments I de­rived:

Each ses­sion with the form should have a unique ID

At every step in the form wiz­ard, sub­mit­ted data should be stored on the back­end, in­clud­ing up­loads

It should be pos­si­ble to com­plete the form with­out javascript

It should be pos­si­ble to com­plete the form on out­dated and crap web browsers

We had to meet WCAG ac­ces­si­bil­ity (the team set­tled on AA rather than AAA)

Javascript and mod­ern CSS should be used to en­hance the ex­pe­ri­ence

The ba­sic setup ended up be­ing that each step in the form wiz­ard was its own page. When the user clicked next, the form would sub­mit. If the data was judged to be valid by the API, the browser would be redi­rected to the next step.

Form submissions and redirects, a venerable web application pattern that has had a small modern renaissance thanks to Remix, took a while to explain to my colleagues, on account of everyone being used to heavily client-side web applications. I have nothing against heavily client-side applications, in their place. But this is just a big form - it’s not showing real-time data. Our user could be standing in the middle of a field on a new-build housing estate, holding a decade-old commodity Android phone they bought in Tesco. Shipping them 20MB of javascript before we even render a form would be a ridiculous thing to do.
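The submit-and-redirect flow can be sketched as a single handler. Everything here is hypothetical: the step names, the URL layout with the session ID in the path, and the in-memory Map standing in for the backend store. The handler saves whatever was submitted before validating, so nothing the user typed is lost, then either re-renders the same step with errors or answers with a 303 redirect so the browser issues a GET for the next page.

```typescript
// Hypothetical post/redirect/get wizard handler. A real service would persist
// sessions and uploads in a database rather than an in-memory Map.
import { randomUUID } from "node:crypto";

const STEPS = ["contact", "address", "photos", "review"] as const;
type Step = (typeof STEPS)[number];
type Fields = Record<string, string>;
type Validator = (step: Step, fields: Fields) => Record<string, string>;

interface WizardSession {
  id: string;
  answers: Partial<Record<Step, Fields>>;
}

type StepResult =
  | { status: 303; location: string } // valid: the browser follows with a GET
  | { status: 422; step: Step; errors: Record<string, string> }; // invalid: re-render this step

const sessions = new Map<string, WizardSession>();

function startSession(): WizardSession {
  const session: WizardSession = { id: randomUUID(), answers: {} };
  sessions.set(session.id, session);
  return session;
}

function handleStepPost(sessionId: string, step: Step, fields: Fields, validate: Validator): StepResult {
  const session = sessions.get(sessionId);
  if (!session) {
    return { status: 303, location: "/apply" }; // unknown session: start a new one
  }
  session.answers[step] = fields; // store first, so failed validation never loses input
  const errors = validate(step, fields);
  if (Object.keys(errors).length > 0) {
    return { status: 422, step, errors };
  }
  const next = STEPS[STEPS.indexOf(step) + 1]; // undefined after the last step
  const location = next ? `/apply/${sessionId}/${next}` : `/apply/${sessionId}/done`;
  return { status: 303, location };
}
```

In an HTTP server, a 303 result becomes a Location header, and a 422 result becomes the step's HTML re-rendered with the stored values and error messages. Both work the same with or without javascript; returning 200 for the re-render would also be a reasonable choice.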

Next, I tack­led one of my biggest bug­bears, form val­i­da­tion (and form and form er­ror ren­der­ing). I have seen teams waste per­son-months of ef­fort wran­gling React val­i­da­tion li­braries. If you are a React per­son, you might be scoff­ing at this - skill is­sue, I guess - but it is the re­al­ity for many teams. I would like to humbly sug­gest that you too may be spend­ing more time than you re­alise, and a lot more time than is nec­es­sary, in­ter­act­ing with and main­tain­ing poor im­i­ta­tions of the val­i­da­tion sys­tem that ships with every browser.

So I built an HTML web com­po­nent. These are sim­ple cus­tom el­e­ments that wrap around ex­ist­ing HTML and bring it to life. No shadow DOM, no (or lit­tle) ren­der­ing HTML in javascript. Mine wrapped around any HTML form, picked up the HTML val­i­da­tion, and made it look mod­ern. It would pre­vent those HTML val­i­da­tion popup tooltips, and in­stead place the er­ror in the aria-de­scribedby el­e­ment as­so­ci­ated with the field (today, aria-er­rormes­sage is ad­vised in­stead). It would clear val­i­da­tion while you typed, if you reached a valid state, and as­sess it again on blur and sub­mit.

It delivered exactly the user experience a form needs, in under 1KB. If the component failed, the form fell back to built-in browser validation; if that failed too, the backend API handled validation. We reported validation issues to the user as early as their browser allowed, and always fell back to an acceptable experience when something failed.

I have since written a new version of this web component from scratch, aimed at general use. It’s called validation-enhancer. I have been in this industry for over 20 years, and it is the best form validation library I have ever used. I am very proud of it.

The code is so simple to work with:

```html
<validation-enhancer>
  <form>
    <!-- the label's "for" matches the input's id (not its name), so the input needs an id -->
    <label for="my-email">Email</label>
    <input type="email" id="my-email" name="my-email" aria-errormessage="my-email-error" required />
    <div id="my-email-error"></div>
    <button type="submit">Submit</button>
  </form>
</validation-enhancer>
```

The results? When we launched, the number of people completing the form doubled. It was a flood! The analytics people didn’t even know where these users were coming from. Of course they didn’t: a JavaScript-based analytics package never sees the users you are bouncing with JavaScript failures. We also saw my “keep a backend session, never lose user data” approach pay off. In one case, someone completed a form a month after starting it.

There was a sad coda. As is the way of contract work, I moved on. I explained to my replacement what I had built, and that it always worked, even without JavaScript. He was appalled and said, “But that’s a lot more work for us.”

It is not acceptable to bounce users on old browsers, users with bad network connections, or users of assistive technologies, and certainly not from a monopoly public service. A lot of hype and noise is pressing us to extend the cowboy, wild-west phase of the software industry’s expansion. We should set that aside and take ourselves seriously as a mature industry. Build a web application that works on a PlayStation Portable on a 3G connection - if you do, it will work for all your users, and it will still work 30 years from now.

What it feels like to work with Mythos

www.oneusefulthing.org

I had early access to the first Mythos-class AI model being released to the public, Claude Fable 5. Much of the discussion of Mythos has centered on its impact on software security, but I tested it on everything except that (the guardrails around Fable essentially prevent it from being used for cybersecurity at all). My conclusion is that it represents a very real leap over every model I have used before. Maybe more importantly, it suggests our relationship with AI is changing in drastic ways.

First, how good is Fable? In experiment after experiment, it outperformed basically every other public model I have used by a considerable margin. It was capable across many problems and produced some startling results: it would work for up to a dozen hours executing multi-page specifications. I’ll walk you through a couple of more complex and serious use cases shortly, but the general improvement was visible across the board, on every task. The problem with communicating this in a post is that many of the most impressive results will interest only small portions of my readers. For example, from a single prompt and one piece of feedback, it produced the most sophisticated academic social science paper I have yet seen from an AI. It also created a 10-page epic rhyming poem about a haircut in which every word starts with the letter s.

So, as a more accessible and entertaining example, I also had it create a bunch of games you can try. Each began with one initial prompt in Claude Code, from which Fable had to take my vague request and generate something workable. That was followed by a couple of additional prompts with minor encouragement (“make it better”) or feedback. What makes these especially impressive is that Claude cannot generate images, so every piece of art and every 3D object was made with math alone, without any external assets. You can try any of them: a game about flipping coins (prompt: “Balatro, but for the game of coin flips”) that is quite fun; a snake game where the snake is self-aware and crazy things happen; or a game about descending into the depths to see what is there.

So the output is impressive. But, especially as I turned to more serious projects, I often felt that using the tool was somewhere between delightful and unnerving. Delightful because I just asked for something and it happened. And also unnerving because I just asked for something and it happened.

To see why, it helps to understand the way in which Fable gets work done, and for that I want to turn to an example I have tested on many previous AI models: building an isochrone map. This is a map that shows how far you can travel in a given length of time; the first one, created in 1881, showed travel times from London.

No previous model did even a halfway useful job of creating a map like this, because it involves researching thousands of potential trip distances and making a lot of small judgement calls and decisions. I decided to try it on Fable using Claude Code with this prompt: “i want you to build a fully researched and beautiful isochronic map that lets me pick various cities and see real isochronic lines based on real data. I want the design to be unique. You should take into account airports (and travel time to and from airports) trains, walking, driving. The data does not need to be live but should be real based on your research and data. You can start with a few cities but more general is better, this should be an entirely new project.” Fable then suggested doing it in the style of the original map. I agreed, and it got to work.
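Under the hood, an isochrone map is a shortest-travel-time problem. Cities, stations, and airports become nodes, and every walk, drive, train, or flight leg becomes an edge weighted in minutes. You run Dijkstra's algorithm from the chosen origin, then bucket each place into a time band. The sketch below is a minimal illustration under those assumptions, not what Fable built. The tiny network and every duration in it are made-up placeholders, with airport transfer and check-in time folded into the leg times, as the prompt asks.

```typescript
// Minimal isochrone sketch: Dijkstra over a multimodal travel network,
// then bucketing each reachable place into a travel-time band.
// All places and durations below are hypothetical placeholders.
interface Leg {
  to: string;
  minutes: number; // duration of this leg, including any transfer/check-in time
  mode: "walk" | "drive" | "train" | "flight"; // informational only
}
type Network = Record<string, Leg[]>;

// Shortest travel time (minutes) from origin to every reachable node.
// Uses a simple O(V^2) node selection, which is fine for a small example.
function travelTimes(network: Network, origin: string): Map<string, number> {
  const best = new Map<string, number>([[origin, 0]]);
  const settled = new Set<string>();
  for (;;) {
    let current: string | undefined;
    let currentTime = Infinity;
    for (const [node, time] of best) {
      if (!settled.has(node) && time < currentTime) {
        current = node;
        currentTime = time;
      }
    }
    if (current === undefined) break; // every reachable node is settled
    settled.add(current);
    for (const leg of network[current] ?? []) {
      const arrival = currentTime + leg.minutes;
      if (arrival < (best.get(leg.to) ?? Infinity)) best.set(leg.to, arrival);
    }
  }
  return best;
}

// Label a travel time with its band: thresholds [2, 4, 8] (hours)
// give "0-2 h", "2-4 h", "4-8 h", and "8+ h".
function bandFor(minutes: number, thresholdsHours: number[]): string {
  let lower = 0;
  for (const upper of thresholdsHours) {
    if (minutes < upper * 60) return `${lower}-${upper} h`;
    lower = upper;
  }
  return `${lower}+ h`;
}

const network: Network = {
  London: [
    { to: "Heathrow", minutes: 45, mode: "drive" },
    { to: "Paris", minutes: 140, mode: "train" },
  ],
  Heathrow: [{ to: "CDG", minutes: 195, mode: "flight" }], // flight plus check-in buffer
  CDG: [{ to: "Paris", minutes: 40, mode: "drive" }],
  Paris: [{ to: "Lyon", minutes: 120, mode: "train" }],
};

for (const [place, minutes] of travelTimes(network, "London")) {
  console.log(place, minutes, bandFor(minutes, [2, 4, 8]));
}
```

With these placeholder numbers, the direct train to Paris (140 minutes) beats driving to the airport and flying (280 minutes). Getting that trade-off right for thousands of real routes is exactly the research-heavy part the prompt asks for.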

It is worth taking a second to look at the transcript of the multi-hour building session the AI went through on its own, because it shows some unusual things. First, the AI launched multiple other AIs (I believe mostly the cheaper Claude Sonnet) to help it research travel times. Between them, they ultimately retrieved over 2,200 specific flights, rail schedules for trains from the TGV to the Shinkansen, and per-country road speeds from multiple academic papers. While those agents were running, it started coding. Then it launched yet more agents and tests to verify its code, all the while taking notes about its progress.

The result was a fully functioning map of impressive sophistication that looked a lot like the 1881 original, but that doesn’t mean it was perfect. I noticed that many remote locations (like Greenland) had only estimated travel times, not exact numbers, so I told Fable to fix it, with the instruction: “actually get travel times to remote airports and locations.” This time the AI launched a workflow: adversarial groups of agents that did research and tested each other’s results. It figured out how often ships sail to Pitcairn Island in the Pacific and how to get to Grise Fjord from Ottawa. And it used a tremendous number of tokens in a very short period of time (more on this soon).

The results were impressive. I pushed a few more times in directions that interested me (including asking for other visualization approaches). I would recommend spending a couple of minutes clicking around the results, and you can read its methods and sources at the bottom of the graph.

This is probably not a useful project for you unless you really like travel and maps, but it is indicative of AI solving a hard problem involving research, math, visual development, taste, judgement, complex coding, and more. And the unnerving part was how little I did. I gave a really ambitious instruction, and the AI followed it. I gave a couple of minor pieces of feedback, and the AI figured it out. My role was extremely limited.

Importantly, my role was not just limited in how much work I did relative to the model. It was also limited in how much control I had over how the model did things, why it chose particular approaches, or even how in-depth its results would be. The details of the AI’s decision-making are not shown to me, and the process would be too long to be worth following anyway. The map required the AI to make judgement calls about hundreds of little choices, and it just made them, without me understanding the choices or having a chance to weigh in. On one hand, it is miraculous (I can always ask for edits at the end); on the other, it turns AI into the ultimate black box.

The most ambitious project I got from Fable takes a little more explanation. I do a lot of research where humans produce messy answers, and doing any sort of analysis requires categorizing those answers properly: How innovative is an idea? Why do people like this book? To figure this out, we have used human researchers to make a judgement call about each piece of information. We then statistically compared their answers with those of other researchers to determine whether we can trust the data. A lot of recent research has shown that AIs might be able to do this important work, but calibrating AI and human judgement has been difficult and expensive. So I asked Fable to solve the problem, first generating a complex 19-page design document and then executing it.
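The “statistically compare” step usually means an inter-rater agreement statistic. The post does not say which one Concord uses, so the following is an illustration only: Cohen's kappa, a standard choice when two raters (say, a human coder and an AI) assign categorical labels to the same items. It corrects the raw agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The labels in the example are hypothetical.

```typescript
// Cohen's kappa for two raters labelling the same items with categories.
// kappa = (pObserved - pExpected) / (1 - pExpected)
function cohensKappa(raterA: string[], raterB: string[]): number {
  if (raterA.length !== raterB.length || raterA.length === 0) {
    throw new Error("Both raters must label the same, non-empty set of items");
  }
  const n = raterA.length;
  const countsA = new Map<string, number>();
  const countsB = new Map<string, number>();
  let agreements = 0;

  raterA.forEach((a, i) => {
    const b = raterB[i];
    if (a === b) agreements++;
    countsA.set(a, (countsA.get(a) ?? 0) + 1);
    countsB.set(b, (countsB.get(b) ?? 0) + 1);
  });

  const pObserved = agreements / n;
  // Chance agreement: probability that both raters pick the same label independently.
  let pExpected = 0;
  for (const [label, countA] of countsA) {
    pExpected += (countA / n) * ((countsB.get(label) ?? 0) / n);
  }
  // If both raters used one identical label throughout, pExpected is 1 and kappa is undefined.
  if (pExpected === 1) return NaN;
  return (pObserved - pExpected) / (1 - pExpected);
}

// Hypothetical example: a human and an AI judging whether six ideas are "novel".
const human = ["novel", "novel", "not", "not", "novel", "not"];
const ai = ["novel", "not", "not", "not", "novel", "novel"];
console.log(cohensKappa(human, ai)); // 4/6 raw agreement, 0.5 by chance: kappa of about 0.33
```

Values near 1 mean the raters agree far beyond chance; values near 0 mean their agreement is no better than chance. For more than two raters, Fleiss' kappa or Krippendorff's alpha are common alternatives.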

It worked for nine and a half hours.

The result was an extremely sophisticated piece of software, which the AI called Concord, that could take in multiple datasets, calibrate human and AI responses, and then conduct complex data analysis on the results. Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some a result of the design I had asked for), which I had the AI correct. But the scope of what it delivered on this project, and many others, exceeded anything I had seen before. In this case, it was a piece of software that researchers have needed for years but that was never profitable to create. You can now just use or modify the code here. I am sure it is not perfect (I only spent an hour working with the results), but a software engineer could iron out the remaining potential bugs that I could not find quickly. That is one reason we may need more, not fewer, coders in the future, to help with the explosion of new uses for software.

This power goes hand in hand with strangeness and limits. Among those limits is its token usage. Fable is twice as expensive as Opus, and it burns through tokens at a rate suggesting that the answer to how much it costs in production is “a lot,” though its clever delegation to cheaper models may lower the real price considerably. The guardrails for Fable also trip at the faintest hint of a security problem, falling back to the less powerful Claude Opus 4.8, and this happens far too often. And the jagged frontier is still there. For example, the AI still writes in the same weird style. In fact, the software Fable produces bears traces of Claudisms, and so do its progress reports, with all that “carrying the weight” and “earning the answer.” But the deeper strangeness is how little I had to do, and how little I could see while it was being done.

Last year I called this working with a wizard: you chant the spell and something happens. With Fable the spell has gotten powerful enough that I am no longer sure I am the wizard. I am closer to a patron. I describe what I want, I pay for it, and I judge the result. The conjuring happens somewhere I cannot watch, in hundreds of small choices I never get a vote on. The work has shifted from process to outcome. I no longer steer; I commission.

It is possible the sidelining is temporary, just an artifact of interfaces that haven’t caught up, and that we’ll get better windows into what these models are doing and better ways to steer them midstream. It is also possible that the opposite is true: that the more capable the model, the less there is for a human to meaningfully do, and the black box is the price of the power. I suspect the latter is the real direction. None of this is a loss of control in the obvious sense. I can still steer Fable, and it follows instructions remarkably well: the more ambitious the instruction, the better the result. But steering is no longer the same as doing. I brief the model, it spins up its own agents to research and write and check one another’s work, and what comes back is finished. A patron commissions a single artist. Fable is closer to a whole studio, where I am the client who signs off on the final work without ever setting foot on the floor.


10HN is also available as an iOS App

If you visit 10HN only rarely, check out the best articles from the past week.

Visit pancik.com for more.