
DNSSEC Debugger - nic.de

dnssec-analyzer.verisignlabs.com


Analyzing DNSSEC problems for nic.de


Want a second opinion? Test nic.de at dnsviz.net.


DENIC Status

status.denic.de

Components: DNS

Services: DNS Nameservice

May 6, 2026 01:34 CEST (May 5, 2026 23:34 UTC)

RESOLVED

All Services are up and running.

May 5, 2026 23:28 CEST (May 5, 2026 21:28 UTC)

INVESTIGATING

Frankfurt am Main, 5 May 2026 — DENIC eG is currently experiencing a disruption in its DNS service for .de domains. As a result, all DNSSEC-signed .de domains are currently affected in their reachability.

The root cause of the disruption has not yet been fully identified. DENIC's technical teams are working intensively on analysis and on restoring stable operations as quickly as possible.

Based on current information, users and operators of .de domains may experience impairments in domain resolution. Further updates will be provided as soon as reliable findings on the cause and recovery are available.

DENIC asks all affected parties for their understanding.

For further enquiries, DENIC can be contacted via the usual channels.
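For operators who want to check whether a failure like this is DNSSEC-related, dig can separate validation failures from plain resolution failures. A minimal sketch, assuming a validating public resolver; during an incident like the one described, the first query is the one that would fail:

# A validating resolver answers SERVFAIL when DNSSEC validation fails.
dig @1.1.1.1 nic.de SOA

# +cd (checking disabled) skips validation; if this succeeds while the
# query above fails, the signatures are the problem, not the zone itself.
dig @1.1.1.1 nic.de SOA +cd

# Inspect the RRSIG records directly.
dig @1.1.1.1 nic.de SOA +dnssec +cd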

Accelerating Gemma 4: faster inference with multi-token prediction drafters

blog.google

May 05, 2026

By using Multi-Token Prediction (MTP) drafters, Gemma 4 models reduce latency bottlenecks and achieve improved responsiveness for developers.

Olivier Lacombe

Director, Product Management

Maarten Grootendorst

Developer Relations Engineer


Just a few weeks ago, we introduced Gemma 4, our most capable open models to date. With over 60 million downloads in just the first few weeks, Gemma 4 is delivering unprecedented intelligence-per-parameter to developer workstations, mobile devices and the cloud. Today, we are pushing efficiency even further.

We're releasing Multi-Token Prediction (MTP) drafters for the Gemma 4 family. By using a specialized speculative decoding architecture, these drafters deliver up to a 3x speedup without any degradation in output quality or reasoning logic.

Tokens-per-second speed increases, tested on hardware using LiteRT-LM, MLX, Hugging Face Transformers, and vLLM.

Why speculative decoding?

The technical reality is that standard LLM inference is memory-bandwidth bound, creating a significant latency bottleneck. The processor spends the majority of its time moving billions of parameters from VRAM to the compute units just to generate a single token. This leads to under-utilized compute and high latency, especially on consumer-grade hardware.

Speculative decoding decouples token generation from verification. By pairing a heavy target model (e.g., Gemma 4 31B) with a lightweight drafter (the MTP model), we can utilize idle compute to “predict” several future tokens at once with the drafter in less time than it takes for the target model to process just one token. The target model then verifies all of these suggested tokens in parallel.

How speculative decoding works

Standard large language models generate text autoregressively, producing exactly one token at a time. While effective, this process dedicates the same amount of computation to predicting an obvious continuation (like predicting “words” after “Actions speak louder than…”) as it does to solving a complex logic puzzle.

MTP mitigates this inefficiency through speculative decoding, a technique introduced by Google researchers in Fast Inference from Transformers via Speculative Decoding. The drafter cheaply proposes a short run of future tokens, and the target model then checks that run in one pass. If the target model agrees with the draft, it accepts the entire sequence in a single forward pass — and even generates an additional token of its own in the process. This means your application can output the full drafted sequence plus one token in the time it usually takes to generate a single one.
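The accept/verify loop is easy to state in code. Below is a minimal greedy-decoding sketch with toy stand-in “models” (plain Python functions returning a next token); real drafters and targets return logits, and the verification step is a single batched forward pass rather than a Python loop:

def drafter(prefix):
    # Toy drafter: usually right, but injects a mistake every 7th token
    # so the rejection path gets exercised.
    nxt = (prefix[-1] + 1) % 50
    return nxt if prefix[-1] % 7 else nxt + 1

def target(prefix):
    # Toy target model: the "correct" continuation is (last + 1) mod 50.
    return (prefix[-1] + 1) % 50

def speculative_decode(prompt, n_new, k=4):
    out = list(prompt)
    while len(out) < len(prompt) + n_new:
        # 1. Drafter proposes k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(drafter(out + draft))
        # 2. Target verifies the proposals; in a real model this is one
        #    forward pass over the whole drafted sequence.
        accepted = []
        for tok in draft:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # fix the first mismatch, discard the rest
                break
        else:
            accepted.append(target(out + accepted))  # bonus token on full accept
        out.extend(accepted)
    return out[:len(prompt) + n_new]

print(speculative_decode([0], 12))

Every iteration emits at least one token the target itself endorses, so output quality is unchanged; the speedup comes from the iterations where several drafted tokens are accepted at once.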

Unlocking faster AI from the edge to the workstation

For developers, inference speed is often the primary bottleneck for production deployment. Whether you are building coding assistants, autonomous agents that require rapid multi-step planning, or responsive mobile applications running entirely on-device, every millisecond matters.

By pairing a Gemma 4 model with its corresponding drafter, developers can achieve:

Improved responsiveness: Drastically reduce latency for near real-time chat, immersive voice applications and agentic workflows.

Supercharged local development: Run our 26B MoE and 31B Dense models on personal computers and consumer GPUs with unprecedented speed, powering seamless, complex offline coding and agentic workflows.

Enhanced on-device performance: Maximize the utility of our E2B and E4B models on edge devices by generating outputs faster, which in turn preserves valuable battery life.

Zero quality degradation: Because the primary Gemma 4 model retains the final verification, you get identical frontier-class reasoning and accuracy, just delivered significantly faster.

Gemma 4 26B on an NVIDIA RTX PRO 6000. Standard Inference (left) vs. MTP Drafter (right) in tokens per second. Same output quality, half the wait time.

Where you can dive deeper into MTP drafters

To make these MTP drafters exceptionally fast and accurate, we introduced several architectural enhancements under the hood. The draft models seamlessly utilize the target model's activations and share its KV cache, meaning they don't have to waste time recalculating context the larger model has already figured out. For our E2B and E4B edge models, where the final logit calculation becomes a big bottleneck, we even implemented an efficient clustering technique in the embedder to further accelerate generation.

We've also been closely analyzing hardware-specific optimizations. For example, while the 26B mixture-of-experts model presents unique routing challenges at a batch size of 1 on Apple Silicon, processing multiple requests simultaneously (e.g., batch sizes of 4 to 8) unlocks up to a ~2.2x speedup locally. We see similar gains with NVIDIA A100 when increasing batch size.

Want to see the exact mechanics of how this works? We've published an in-depth technical explainer that unpacks the visual architecture, KV cache sharing and efficient embedders powering these drafters.

How to get started

The MTP drafters for the Gemma 4 family are available today under the same open-source Apache 2.0 license as Gemma 4. Read the documentation to learn how to use MTP with Gemma 4. You can download the model weights right now on Hugging Face and Kaggle, start experimenting with faster inference with Transformers, MLX, vLLM, SGLang, and Ollama, or try them directly on Google AI Edge Gallery for Android or iOS.
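In Hugging Face Transformers, speculative decoding is exposed through assisted generation: pass the drafter as assistant_model to generate(). A minimal sketch; the two repo ids below are placeholders we made up for illustration, not confirmed names from the post:

# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "google/gemma-4-31b"                # hypothetical repo id
drafter_id = "google/gemma-4-31b-mtp-drafter"   # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, device_map="auto")
drafter = AutoModelForCausalLM.from_pretrained(drafter_id, device_map="auto")

inputs = tokenizer("Explain speculative decoding in one paragraph.",
                   return_tensors="pt").to(target.device)

# assistant_model switches generate() into assisted (speculative) decoding:
# the drafter proposes tokens, the target verifies them in parallel.
out = target.generate(**inputs, assistant_model=drafter, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))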

We can't wait to see how this newfound speed accelerates what you build next in the Gemmaverse.

AI didn't delete your database, you did

idiallo.com

Last week, a tweet went viral showing a guy claiming that a Cursor/Claude agent deleted his company's production database. We watched from the sidelines as he tried to get a confession from the agent: “Why did you delete it when you were told never to perform this action?” Then he tried to parse the answer to either learn from his mistake or warn us about the dangers of AI agents.

I have a question too: why do you have an API endpoint that deletes your entire production database? His post rambled on about false marketing in AI, bad customer support, and so on. What was missing was accountability.

I'm not one to blindly defend AI; I always err on the side of caution. But I also know you can't blame a tool for your own mistakes.

In 2010, I worked with a company that had a very manual deployment process. We used SVN for version control. To deploy, we had to copy trunk, the equivalent of the master branch, into a release folder labeled with a release date. Then we made a second copy of that release and called it “current.” That way, pulling the current folder always gave you the latest release.

One day, while deploying, I accidentally copied trunk twice. To fix it via the CLI, I edited my previous command to delete the duplicate. Then I continued the deployment without any issues… or so I thought. Turns out, I hadn't deleted the duplicate copy at all. I had edited the wrong command and deleted trunk instead. Later that day, another developer was confused when he couldn't find it.

All hell broke loose. Managers scrambled, meetings were called. By the time the news reached my team, the lead developer had already run a command to revert the deletion. He checked the logs, saw that I was responsible, and my next task was to write a script to automate our deployment process so this kind of mistake couldn't happen again. Before the day was over, we had a more robust system in place. One that eventually grew into a full CI/CD pipeline.

Automation helps eliminate the silly mistakes that come with manual, repetitive work. We could have easily gone around asking “Why didn't SVN prevent us from deleting trunk?” But the real problem was our manual process. Unlike machines, we can't repeat a task exactly the same way every single day. We are bound to slip up eventually.

With AI generating large swaths of code, we get the illusion of that same security. But automation means doing the same thing the same way every time. AI is more like me copying and pasting branches: it's bound to make mistakes, and it's not equipped to explain why it did what it did. The terms we use, like “thinking” and “reasoning,” may look like reflection from an intelligent agent. But these are marketing terms slapped on top of AI. In reality, the models are still just generating tokens.

Now, back to the main problem this guy faced. Why does a public-facing API that can delete all your production databases even exist? If the AI hadn't called that endpoint, someone else eventually would have. It's like putting a self-destruct button on your car's dashboard. You have every reason not to press it, because you like your car and it takes you from point A to point B. But a motivated toddler who wiggles out of his car seat will hit that big red button the moment he sees it. You can't then interrogate the child about his reasoning. Mine would have answered simply: “I did it because I pressed it.”

I suspect a large part of this company's application was vibe-coded. The software architects used AI to spec the product from AI-generated descriptions provided by the product team. The developers used AI to write the code. The reviewers used AI to approve it. Now, when a bug appears, the only option is to interrogate yet another AI for answers, probably not even running on the same GPU that generated the original code. You can't blame the GPU!

The simple solution is to know what you're deploying to production. The more realistic one: if you're going to use AI extensively, build a process where competent developers use it as a tool to augment their work, not a way to avoid accountability. And please, don't let your CEO or CTO write the code.

Three Inverse Laws of AI

susam.net

By Susam Pal on 12 Jan 2026

Introduction

Since the launch of ChatGPT in November 2022, generative artificial intelligence (AI) chatbot services have become increasingly sophisticated and popular. These systems are now embedded in search engines, software development tools as well as office software. For many people, they have quickly become part of everyday computing.

These services have turned out to be quite useful, especially for exploring unfamiliar topics and as a general productivity aid. However, I also think that the way these services are advertised and consumed can pose a danger to society, especially if we get into the habit of trusting their output without further scrutiny.

Contents

Introduction

Pitfalls

Inverse Laws of Robotics

Non-Anthropomorphism

Non-Deference

Non-Abdication of Responsibility

Conclusion

Pitfalls

Certain design choices in modern AI systems can encourage uncritical acceptance of their output. For example, many popular search engines are already highlighting answers generated by AI at the very top of the page. When this happens, it is easy to stop scrolling, accept the generated answer and move on. Over time, this could inadvertently train users to treat AI as the default authority rather than as a starting point for further investigation. I wish that each such generative AI service came with a brief but conspicuous warning explaining that these systems can sometimes produce output that is factually incorrect, misleading or incomplete. Such warnings should highlight that habitually trusting AI output can be dangerous. In my experience, even when such warnings exist, they tend to be minimal and visually deemphasised.

In the world of science fiction, there are the Three Laws of Robotics devised by Isaac Asimov, which recur throughout his work. These laws were designed to constrain the behaviour of robots in order to keep humans safe. As far as I know, Asimov never formulated any equivalent laws governing how humans should interact with robots. I think we now need something to that effect to keep ourselves safe. I will call them the Inverse Laws of Robotics. These apply to any situation that requires us humans to interact with a robot, where the term ‘robot’ refers to any machine, computer program, software service or AI system that is capable of performing complex tasks automatically. I use the term ‘inverse’ here not in the sense of logical negation but to indicate that these laws apply to humans rather than to robots.

It is well known that Asimov's laws were flawed. Indeed, Asimov used those flaws to great effect as a source of tension. But the particular ways in which they fail for fictional robots do not necessarily carry over to these inverse laws for humans. Asimov's laws try to constrain the behaviour of autonomous robots. However, these inverse laws are meant to guide the judgement and conduct of humans. Still, one thing we can learn from Asimov's stories is that no finite set of laws can ever be foolproof for the complex issues we face with AI and robotics. But that does not mean we should not even try. There will always be edge cases where judgement is required. A non-exhaustive set of principles can still be useful if it helps us think more clearly about the risks involved.

Inverse Laws of Robotics

Here are the three inverse laws of robotics:

Humans must not anthropomorphise AI systems.

Humans must not blindly trust the output of AI systems.

Humans must remain fully responsible and accountable for consequences arising from the use of AI systems.

Non-Anthropomorphism

Humans must not anthropomorphise AI systems. That is, humans must not attribute emotions, intentions or moral agency to them. Anthropomorphism distorts judgement. In extreme cases, anthropomorphising can lead to emotional dependence.

Modern chatbot systems often sound conversational and empathetic. They use polite phrasing and conversational patterns that closely resemble human interaction. While this makes them easier and more pleasant to use, it also makes it easier to forget what they actually are: large statistical models producing plausible text based on patterns in data.

I think vendors of AI-based chatbot services could do a better job here. In many cases, the systems are deliberately tuned to feel more human rather than more mechanical. I would argue that the opposite approach would be healthier in the long term. A slightly more robotic tone would reduce the likelihood that users mistake fluent language for understanding, judgement or intent.

Whether or not vendors make such changes, it still serves us well, I think, to avoid this pitfall ourselves. We should actively resist the habit of treating AI systems as social actors or moral agents. Doing so preserves clear thinking about their capabilities and limitations.

Non-Deference

Humans must not blindly trust the output of AI systems. AI-generated content must not be treated as authoritative without independent verification appropriate to its context.

This principle is not unique to AI. In most areas of life, we should not accept information uncritically. In practice, of course, this is not always feasible. Not everyone is an expert in medicine or law, so we often rely on trusted institutions and public health

iOS 27 is adding a 'Create a Pass' button to Apple Wallet

walletwallet.alen.ro

Bloomberg's Mark Gurman reported on Monday that iOS 27 will add a “Create a Pass” feature to the Wallet app. Tap the “+” button you already use to add credit cards or pass emails, and Wallet will offer something it has never offered before on iPhone: a path to build your own pass.

You can scan a QR code on a paper ticket or membership card with the camera, or build a pass from scratch in a layout editor. The whole flow runs without an Apple Developer account, a Pass Type ID, or any certificate signing.

iOS 27 is expected to preview at WWDC on June 8, with a public release in September.

How the new flow works

Reporting from Bloomberg, MacRumors, 9to5Mac, and AppleInsider lines up on the same workflow. Inside the Wallet app, the existing “+” button gains a new option for creating a pass. From there you choose between two starting points:

Scan a QR code from a paper card, ticket, or screen

Build a custom pass from scratch with no scan needed

Once you are in the editor, Wallet exposes adjustable styles, images, colors, and text fields. The reports describe a fairly conventional template-driven layout, closer in spirit to what Pass2U, WalletWallet, and other third-party generators have offered for years than to Apple's developer-only PassKit pipeline.

Three templates, color-coded

Apple is testing three starting templates, each tied to a default color:

Standard (orange): the default for any general-purpose pass.

Membership (blue): geared toward gyms, clubs, libraries, and other recurring-access cards.

Event (purple): meant for tickets to games, movies, and one-off occasions.

The color choice is not just decoration. Wallet currently sorts passes visually in the stack, and the template hue is what sets each card apart at a glance, so a quick look is enough to pick out the orange punch card from the purple ticket without reading a word.

Why now: 14 years of PassKit drought

Apple shipped PassKit alongside iOS 6 back in 2012. The pitch was clean: businesses build .pkpass files, customers tap to add, everyone wins. In practice, the consistent adopters ended up being airlines, big-box retailers, ticketing platforms, and a handful of national chains. Most gyms, cafes, libraries, rec centers, and small loyalty programs never built one, because the path requires an Apple Developer account, signing certificates, and enough engineering work that “just print a paper card” almost always won the budget conversation.
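For a sense of why that bar was high: a .pkpass is a zip bundle whose core is a pass.json that must be signed with a Pass Type ID certificate before Wallet will accept it. A minimal sketch of the developer-path payload (all identifiers below are hypothetical):

{
  "formatVersion": 1,
  "passTypeIdentifier": "pass.com.example.loyalty",
  "teamIdentifier": "ABCDE12345",
  "serialNumber": "member-0001",
  "organizationName": "Example Coffee",
  "description": "Example Coffee loyalty card",
  "barcodes": [
    {
      "format": "PKBarcodeFormatQR",
      "message": "MEMBER-0001",
      "messageEncoding": "iso-8859-1"
    }
  ],
  "storeCard": {}
}

Create a Pass sidesteps the certificate and the identifiers entirely, which is exactly the part that kept small businesses out.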

The Next Web's framing is blunt: Apple is no longer waiting on developers. With Create a Pass, the supply-side problem is finally being solved from the demand side. If the business will not build a Wallet pass, the user does it themselves from the QR code that business already printed.

That is a meaningful shift in posture. For more than a decade, Wallet has been a directory of what brands chose to ship. In iOS 27 it becomes a directory of what people choose to keep.

What this means for WalletWallet

We will be honest. WalletWallet exists because of this exact gap. You take a barcode from any loyalty card, paste it into our web app, pick a color, and a free Apple Wallet pass lands on your phone in about a minute, all from the browser without an account or any developer setup. Once Create a Pass ships in September, a chunk of that workflow moves natively into the iPhone Wallet app.

That is good for users. We started this project to make Wallet friendlier for the cafes-and-gyms long tail, and Apple agreeing with us at OS-level scope is a healthy outcome. The category needed it.

A few places where we still help, even after iOS 27 ships:

Google Wallet. Create a Pass is iPhone-only. Roughly half of the wallet-using world is on Android, and our generator builds Google Wallet passes from the same form.

Web, no OS upgrade. iOS 27 needs a compatible iPhone and the September update. WalletWallet runs in any browser today. iOS 14, iPad, Mac, a friend's laptop, all fine.

Tag passes with real integrations. Our Bandcamp, SoundCloud, and Spotify pass builders pull artist art and links automatically into a tag pass. That is a different shape from the generic templated pass Apple is showing.

Sharing. A web-generated .pkpass is just a file. You can email it, post it, hand it to a friend on Android via QR. The Wallet-native flow is more locked to the device that built it.

We expect to lose volume on the simplest one-barcode-to-Wallet case once Create a Pass goes live. That is fine. The reason WalletWallet started was that Apple's bar for a Wallet pass was too high for normal people. If iOS 27 lowers that bar, the world we wanted is closer.

What we still do not know

The current reports cover the UI, the templates, and the high-level workflow. They are silent on a lot of details that matter:

Whether iCloud will sync user-created passes across iPhone, iPad, and Mac

Whether passes can be exported as .pkpass files to share with non-iPhone users

Whether Wallet supports Code 128, PDF417, and Aztec barcodes, or only QR

Whether merchants can claim, co-sign, or update user-created passes after the fact

Whether passes have lock-screen behavior tied to time and location, the way developer-issued passes do today

We will know more once Apple previews iOS 27 at WWDC on June 8, and again when the first developer betas land. We will update this post when there is something concrete to add.

Quick recap

iOS 27 is adding a Create a Pass button to the Wallet app, with a QR-scan or build-from-scratch flow and three color-coded templates: Standard (orange), Membership (blue), and Event (purple). Bloomberg broke the story on May 4, and a public release is expected in September 2026. It will be the first time iPhone users do not need a third-party tool to put a barcode into Wallet, and for us that is a sign the category is maturing the right way.


Should I Run Plain Docker Compose in Production in 2026?

distr.sh

I am Philip—an engineer working at Distr, which helps software and AI companies distribute their applications to self-managed environments.

Our Open Source Software Distribution platform is available on GitHub (github.com/distr-sh/distr) and orchestrates both Docker Compose and Docker Swarm deployments on customer hosts every day.

Most of the production incidents I have seen on Docker Compose hosts come from the same handful of quirks: an old container that should have been removed, a disk that filled up overnight, a health check that detected a problem and then did nothing about it, a :latest tag that pointed somewhere new, or a socket mount nobody thought twice about. None of these are bugs in Docker. They are deliberate trade-offs in a tool that started as internal tooling at dotCloud, a PaaS company that wrapped LXC to fix “it works on my machine,” and is now running the back end of a lot of real businesses. This post collects the recurring ones, with the commands and the operational answer for each.

Short answer: yes—plain Docker Compose can still run real production workloads in 2026, but only if you handle the operational gaps it leaves yourself.

Where Plain Docker Compose Fits in Production

Before the list of quirks, a quick word on the audience. Docker Compose is a declarative way to wire up a multi-container application: one YAML file describes the services, the networks between them, the volumes they share, the environment they need, and—through the patterns for overwriting or patching service configuration—the on-disk configuration each application expects. docker compose up reconciles the host to that file. The sweet spot in production is the single-node deployment built around exactly that—a vendor pushing a multi-container application into a customer environment, an internal team running a long-tail service that does not justify a Kubernetes cluster, an edge box in a retail location. The footprint is small, the operational overhead is low, and a competent operator can reason about the whole stack from one docker-compose.yaml. There is no control plane behind Compose itself—no scheduler watching the host, no reconciler reapplying state, no operator pushing updates from somewhere else. docker compose up runs once and exits.

That architectural simplicity is exactly why the quirks bite. Compose assumes you—or whoever runs the host—will do the operational work nothing else is doing, and if you ship Compose files to customers the safe assumption is that the customer will not. The rest of this post is about closing the gap between what Compose does and what a production host actually needs, either by hand or with an agent that does it for you. If you have already concluded that the gap is too wide and want to compare with the next step up, read our Docker Compose vs Kubernetes breakdown.

Docker Compose Orphan Containers and --remove-orphans

Remove a service from docker-compose.yaml, run docker compose up -d, and the container you removed keeps running. It is detached from the project but still bound to the same networks and ports. docker compose ps will not show it, because Compose only lists what is in the current file. docker ps --filter label=com.docker.compose.project=<name> will, because Docker still has the label on the container. This is how you discover, six months in, that an old worker service has been quietly consuming RAM since the last refactor.

The fix is one flag:

docker compose up -d --remove-orphans
docker compose down --remove-orphans

The flag tells Compose: any container that was once part of this project but is no longer in the file should be removed. Networks Compose created for the project are reconciled the same way on each up, so orphan networks go away too. Volumes are the exception—Compose preserves named volumes by default to protect data, and there is no per-service flag to drop the ones a removed service used. To reclaim that space you have to do it manually: list candidates with docker volume ls --filter dangling=true and docker volume rm by name, or use docker compose down -v if you intend to wipe the project's volumes wholesale. To audit before deleting, list everything Docker still associates with the project name:

docker ps -a --filter label=com.docker.compose.project=<name>

Distr's Docker agent passes RemoveOrphans: true on every Compose Up call, so customer hosts never accumulate orphans across deployment updates. That single flag has eliminated a recurring class of “the old version is still answering on port 8080” support tickets.

Pruning Docker Images and Capping Container Logs

Every docker compose pull keeps the previous image on disk. Every container with the default json-file log driver writes unbounded JSON to /var/lib/docker/containers/<id>/<id>-json.log. On a busy host this is one of the most common reasons for an outage: the disk fills and Docker stops being able to write anything—logs, metadata, image layers—at which point containers start failing in confusing ways.

The first thing to learn is the audit command:

docker system df
docker system df -v

-v breaks the totals down per image, container, volume, and build cache, which is usually enough to spot the offender. From there, the targeted prune commands:

docker image prune -a --filter "until=168h" -f   # delete unused images older than 7 days
docker container prune -f                        # remove stopped containers
docker builder prune -f                          # drop the BuildKit cache

docker volume prune -f exists too, and it is genuinely useful, but read the next aside before you run it.

The other half of the disk story is logs. Cap them at the daemon level, once, in /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

After systemctl restart docker, every new container will rotate its logs at 10 MB and keep at most three rotated files—30 MB ceiling per container, instead of “until the disk is gone.” Existing containers need to be recreated to pick up the new defaults.

This is one of the topics worth getting right before you ship.
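If you cannot touch the daemon config (common on customer hosts you do not own), Compose also lets you cap logs per service. A sketch of the same limits expressed in the Compose file; the image name is a placeholder:

services:
  app:
    image: myapp:1.4   # placeholder
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate at 10 MB
        max-file: "3"     # keep at most 3 rotated files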

In Distr's Docker agent the cleanup is built in: each deployment target has an opt-out container image cleanup setting that removes the previous version's images automatically after a successful update, with retries on failure. It only fires on success, so the previous image stays on disk if something goes wrong and you need to roll back.

Docker Health Checks Don’t Restart Unhealthy Containers

This is the one that surprises people the most. You add a HEALTHCHECK to your Dockerfile or a healthcheck: block to the service in Compose, you watch the container go from healthy to unhealthy, and then… nothing happens. The Docker Engine reports the status. It does not act on it. restart: unless-stopped is triggered by the container exiting, not by it being marked unhealthy.

You can confirm what Docker actually thinks:

docker inspect --format='{{json .State.Health}}' <container> | jq

You will see the status, the streak of failures, and the last few probe outputs—useful information that is silently ignored by the engine.

There are three answers to this:

Run an autoheal sidecar. The community standard is willfarrell/docker-autoheal: a tiny container that mounts the Docker socket, watches for unhealthy events, and restarts the offending container. You opt containers in by labeling them autoheal=true (or set AUTOHEAL_CONTAINER_LABEL=all to monitor everything). A Compose sketch of this pattern follows after this list.

Run on Docker Swarm. Swarm restarts unhealthy tasks by default. If you are already considering Swarm, this is one of the better reasons.

Use Distr. Every Distr Docker agent deploys an adapted autoheal service alongside it. The “Enable autoheal for all containers” toggle is on by default at deployment-target creation, so customer-side restarts of unhealthy containers happen without anyone configuring it.

Whichever path you pick, the takeaway is the same: a HEALTHCHECK without something acting on it is a status light, not a self-healing system.
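Here is the sidecar option as a Compose file; a minimal sketch, assuming an app that exposes an HTTP health endpoint and ships curl in its image (the image name and probe URL are placeholders):

services:
  app:
    image: myapp:1.4            # placeholder
    restart: unless-stopped
    healthcheck:                # marks the container unhealthy; does not restart it
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
    labels:
      autoheal: "true"          # opt this container in to autoheal

  autoheal:                     # the component that actually acts on "unhealthy"
    image: willfarrell/autoheal
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock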

Pinning Docker Images by Digest Instead of :latest

Docker tags are mutable references. myapp:1.4 today is whatever the registry currently has under that tag; tomorrow it can point at a different layer set after a re-push. :latest is the worst offender because everyone treats it as a synonym for “stable” when in practice it often means “whatever was pushed most recently.” It is also the silent default: an unqualified image: nginx in a Compose file is treated as image: nginx:latest, so even Compose files that never type the word land on it by accident. The result, in production, is that two hosts pulling the “same” tag five minutes apart can end up running different code.

The fix is to pin by content-addressable digest. Every image has one, and Docker accepts it anywhere a tag would go.

To find the digest for an image you already pulled:

docker image inspect --format='{{index .RepoDigests 0}}' myapp:1.4
# myapp@sha256:9b7c…

Or, without pulling, from the local Docker installation against the remote registry:

docker buildx imagetools inspect myapp:1.4

In your Compose file, replace the tag with the digest:

services:
  app:
    image: myapp@sha256:9b7c0a3e1f…

A pull against a digest fails fast if the registry no longer has those bytes, which is exactly what you want—silent drift becomes a loud error. The same image reference works in docker stack deploy, in docker run, and in Kubernetes manifests.

For the broader picture of what your customers can extract from a published image (and why image hygiene matters beyond reproducibility), check out our guide on protecting source code and IP in Docker and Kubernetes deployments. And if you're still picking a registry, our container registry comparison walks through the trade-offs.

Why Mounting /var/run/docker.sock Is a Security Risk

A container with /var/run/docker.sock mounted can call the Docker API, and the Docker API can launch a privileged container that mounts the host's root filesystem. In other words: any container with the socket has effectively root privileges on the host. This is not a Docker bug; it is the threat model of the socket. It deserves a moment of attention because the line that grants this access is one bind mount in a Compose file and is easy to add without thinking about it.

Practical hygiene:

Inventory the containers that mount the socket. Agents, CI runners, monitoring sidecars, container management UIs—keep the list short and intentional.

Run rootless Docker where possible. dockerd-rootless-setuptool.sh install sets up a Docker daemon that runs as a regular user. The blast radius of a compromised socket-mounting container shrinks from “full host” to “this user account.”

Consider socket-proxy. Projects like Tecnativa's docker-socket-proxy expose a filtered subset of the API to the container that needs it (e.g. read-only containers and events for monitoring) instead of the full socket; see the sketch after this list.

Keep socket-mounting images minimal. Smaller surface, fewer libraries, fewer ways in.
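A sketch of the socket-proxy option, assuming a hypothetical monitoring sidecar: the proxy grants API sections via environment flags, and the consumer talks to it over TCP instead of mounting the socket itself.

services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    restart: unless-stopped
    environment:
      CONTAINERS: 1            # allow the read-only /containers endpoints
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

  monitor:
    image: example/monitoring-sidecar      # hypothetical consumer
    environment:
      DOCKER_HOST: tcp://socket-proxy:2375 # talk to the proxy, not the socket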

The Distr Docker agent does mount the socket—it has to, in order to orchestrate Compose and Swarm on the host. We document that boundary openly in the Docker agent docs so customer security teams can review it before installation. The agent authenticates to the Hub with a JWT, and the install secret is shown once and never stored.

Updating Docker Compose Deployments Across Customer Hosts

docker compose pull && docker compose up -d is a fine command if you are SSH'd into the host. At customer scale—dozens of self-managed environments behind firewalls, each with its own change-control process—that manual process doesn't scale. Docker has no built-in mechanism to push a new manifest to a running host from somewhere else. Docker Hub webhooks can trigger a CI rebuild when an image is pushed, but they do not reach into a customer's network and tell their docker compose to pull.

The usual workarounds and what they cost:

Watchtower: Polls the registry on a schedule, pulls new images, recreates containers. Easy to set up, hard to control. No staged rollout, no rollback path, limited visibility from your side—you find out a customer updated when they file a ticket.

Bastion + SSH + Ansible/scripts: Works for ten customers. Falls apart at fifty, especially when three of them are air-gapped and four run their own change-control cadence. Every operator has to live with shared keys and a maintenance window calendar.

A pull-based agent. This is the shape Distr lands on. The agent runs on the customer host, polls a known endpoint every 5 seconds, and reconciles the local Compose state against what the Hub says it should be. The agent reports status back, so you can see in your dashboard which customers are on which version. When the agent itself needs to update, it spawns a separate container to perform the swap so it is not trying to replace itself while running.

The pattern is not unique—Kubernetes operators and GitOps tools do the same thing—but Compose users routinely re-invent it badly. If you find yourself building one, at least give it rollback, status reporting, and a way to pin versions, or you will end up with a fleet that drifts in ways you cannot see.
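The reconcile loop itself is small. Here is a minimal Python sketch of the pull-based shape, assuming a hypothetical hub endpoint that serves the desired Compose file; it deliberately omits the hard parts the paragraph above insists on (rollback, status reporting, version pinning):

import hashlib
import subprocess
import time
import urllib.request

MANIFEST_URL = "https://hub.example.com/hosts/host-42/compose.yaml"  # hypothetical endpoint
COMPOSE_PATH = "/opt/app/docker-compose.yaml"

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def current() -> bytes:
    try:
        with open(COMPOSE_PATH, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return b""

while True:
    try:
        with urllib.request.urlopen(MANIFEST_URL, timeout=10) as resp:
            desired = resp.read()
        if sha256(desired) != sha256(current()):
            with open(COMPOSE_PATH, "wb") as f:
                f.write(desired)
            # Reconcile the host to the new manifest.
            subprocess.run(
                ["docker", "compose", "-f", COMPOSE_PATH, "up", "-d", "--remove-orphans"],
                check=True,
            )
            # A real agent reports the new version back to the hub here.
    except Exception as exc:
        # A real agent needs rollback and error reporting, not a print.
        print("reconcile failed:", exc)
    time.sleep(5)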

The other thing worth noting: recurring scheduled jobs alongside the application have no native Compose answer either. If your stack includes anything like a nightly cleanup, a periodic report, or a heartbeat-style task, the in-app scheduler is one option, but you eventually run into the cases it can't cover (cross-service jobs, jobs that should outlive a single container). For the three patterns I have seen survive customer deployments, check out our guide on Compose cron jobs.

Outgrowing Docker Compose: Kubernetes vs Swarm

If a single-node Compose deployment outgrows itself, the realistic next step for most teams is Kubernetes. The ecosystem is large, the operational patterns are well documented, and the talent pool to hire against actually exists. For the side-by-side, read our Docker Compose vs Kubernetes comparison.

Docker Swarm is the other option—it reuses the Compose YAML format, ships in the box, and solves a few of the quirks above directly (it restarts unhealthy tasks, rolls out updates with update_config, and treats secrets and configs as first-class objects). It is a real fit for some single-cluster, low-ceremony deployments.

The Distr agent supports both—the Hub records whether a deployment is Compose or Swarm, and the agent runs the matching docker compose up or docker stack deploy. If you do choose Swarm, read our routing and Traefik guide for Docker Swarm and the product walkthrough for distributing applications to Swarm for the details.

So, should you run plain Docker Compose in production?

Yes—plain Docker Compose still runs a lot of real production workloads in 2026, as long as you accept that “plain Compose” is shorthand for “Compose plus the operator practices it doesn't enforce.” None of the quirks above are secret. They are all in Docker's documentation, in GitHub issues that have been open for years, and in the war stories of every team that has run Compose in anger. What makes them dangerous is not the quirks themselves but the order in which you discover them: usually at 2 a.m., one at a time.

TL;DR:

Pass --remove-orphans on every compose up and compose down.

Cap container logs in daemon.json and prune images on a schedule. Be careful with docker volume prune.

Health checks do not heal. Run an autoheal sidecar, run on Swarm, or use an agent that bundles one.

Pin by @sha256:… digest. Treat tags as references, not contracts.

The socket is root. Inventory the containers that mount it; prefer rootless Docker.

Updates need an agent of some kind. Watchtower is fine for one host; not for a fleet.

When Compose stops being enough, Kubernetes is usually the right next step. Swarm is a narrower fit and worth picking eyes-open.

If you ship software to self-managed customers and you would rather not rebuild this list yourself, the Distr Docker agent handles all of the above on the customer side. The Docker agent documentation walks through the install, the socket model, the autoheal and image-cleanup defaults, and how the agent self-updates. The repository is on GitHub.

Computer use is 45x More Expensive Than Structured APIs

reflex.dev

We ran a benchmark comparing two ways of letting an AI agent operate the same admin panel, with the goal of putting a price tag on vision agents (browser-use, computer-use).

Here is what we measured, what we had to change to make the vision agent work at all, and what changes when generating an API surface stops being a separate engineering project.

Why vision agents?

Vision agents are the default for letting AI agents operate web apps that don't expose APIs. The alternative, writing an MCP or REST surface per app, is its own engineering project across the 20+ internal tools most teams have. Most teams default to vision agents not because they are better, but because the alternative is too expensive to build. The cost of the vision approach is treated as a fixed price.

We wanted to measure the price.

The setup

The test app is an admin panel for managing customers, orders, and reviews, modeled on the react-admin Posters Galore demo. Two agents target the same running app: one drives the UI via screenshots and clicks, the other calls the app's HTTP endpoints directly. Same Claude Sonnet, same pinned dataset, same task. The interface is the only variable.

The task: find the customer named “Smith” with the most orders, locate their most recent pending order, accept all of their pending reviews, and mark the order as delivered. This touches three resources, requires filtering, pagination, cross-entity lookups, and both reads and writes. It is the shape of work a typical internal tool sees daily.

Path A: Vision agent. Claude Sonnet driving the UI via browser-use 0.12. Vision mode, taking screenshots and executing clicks.

Path B: API agent. Claude Sonnet with tool-use, calling the handlers the UI calls. Each tool maps to one or more event handlers on the app's State, the same functions a button click would trigger. The agent gets the structured response back instead of a rendered page.
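For context on what Path B looks like in practice, here is a minimal sketch of exposing one such handler as a tool through the Anthropic SDK. The tool name matches the list_customers surface mentioned in the notes below, but the schema details and model id are our illustration, not the benchmark's exact code:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "list_customers",
        "description": "List customers, optionally filtered by last name. "
                       "Returns the full result set with pagination metadata.",
        "input_schema": {
            "type": "object",
            "properties": {
                "last_name": {"type": "string"},
                "page": {"type": "integer", "default": 1},
            },
        },
    },
]

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user",
               "content": "Find the customer named Smith with the most orders."}],
)
# When the model returns a tool_use block, the runner calls the matching
# event handler and feeds the structured JSON result back as a tool_result.
print(response.content)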

The vision agent couldn't complete the task

We started by giving both agents the same six-sentence task above and seeing what happened.

The API agent completed it in 8 calls. It listed the customer's reviews filtered by pending status, accepted each one, and marked the order as delivered. Both agents are calling into the same application logic; the API agent just reads the structured response directly instead of looking at a rendered page.

The vision agent, on the same prompt, found one of four pending reviews, accepted it, and moved on. It never paginated. The remaining three reviews were below the visible fold of the reviews page and the agent had no signal to scroll for them.

This is not a model problem. The vision agent was reasoning about a rendered page and had no signal that the page wasn't showing everything. The API agent calls the same handler the UI calls, but the response includes the full result set the handler returned, not just the rows currently rendered. The agent reads “page 1 of 4 with 50 results per page” directly instead of having to interpret pagination controls from pixels.

With a 14-step walkthrough, it succeeded

To make the comparison apples-to-apples, we rewrote the vision prompt as an explicit UI walkthrough, naming the sidebar items, tabs, and form fields the agent should interact with at each step. Fourteen numbered instructions covering the navigation the agent had failed to figure out on its own.

With the walkthrough, the vision agent completed the task. It also ran for fourteen minutes and consumed about half a million input tokens.

The walkthrough is itself a finding. Each numbered instruction is engineering work that doesn't show up in token counts but represents real cost. Anyone deploying a vision agent against an internal tool is either writing prompts at this level of specificity or accepting that the agent will silently miss work.

How we ran it

We ran the API path five times and the vision path three times. The vision path was capped at three trials because each run takes 14–22 minutes and consumes 400–750k tokens.

Variance was the most surprising part of the vision results. Across three trials the wall-clock time spanned 749s to 1257s, and input tokens spanned 407k to 751k. The agent took 43 cycles in the shortest run and 68 in the longest. The screenshot-reason-click loop has enough non-determinism that a single run is not a representative cost estimate.

The API path had no such variance. Sonnet hit identical 8 tool calls on every trial, with input token counts varying by ±27 across all five runs. The agent calls the same handlers in the same order because the structured responses give it no reason to deviate.

The full results

Numbers are mean ± sample standard deviation (n−1), with n=5 for the API path and n=3 for the vision path. Full run details are available in the repo.

Haiku could not complete the vision path. The failure was specific to browser-use 0.12's structured-output schema, which Haiku could not reliably produce in either vision or text-only mode. On the API path, Haiku finished in under 8 seconds for under 10k input tokens, which is the cheapest configuration we tested.

The structural gap

The cost difference follows directly from the architecture. An agent that must see in order to act will always pay for the seeing, regardless of how good the model gets. Better vision models reduce error rates per screenshot, but they do not reduce the number of screenshots required to reach the relevant data. Each render is a screenshot, and each screenshot is thousands of input tokens.

Both agents in this benchmark walk through the same application logic. They both filter, paginate, and update the same way the UI does. The difference is what they read at each step. The vision agent reads pixels and has to render every intermediate state to interpret it. The API agent reads the structured response from the same handlers, which already contains the data the UI was going to display.

Better models will narrow the cost per step. They will not narrow the step count, because the step count is set by the interface.

How we justify the API engineering cost

The benchmark was made cheap to run by Reflex 0.9, which includes a plugin that auto-generates HTTP endpoints from a Reflex application's event handlers. None of the structural argument depends on Reflex specifically, but it is what made running the API path possible without writing a second codebase.

The interesting question is what becomes possible when the engineering cost of an API surface drops to zero. Vision agents remain the right tool for applications you do not control: third-party SaaS products, legacy systems, anything you cannot modify. For internal tools you build yourself, the math now points the other way.

Notes

Vision results are specific to browser-use 0.12 in vision mode, and other vision agents may behave differently. The Path B runner shapes the auto-generated endpoints into a small REST tool surface of about thirty lines, which the agent sees as list_customers, update_order, and similar. The dataset is pinned and small (900 customers, 600 orders, 324 reviews), so behavior on production-scale data is not measured here. The vision agent runs through LangChain's ChatAnthropic, and the API agent runs through the Anthropic SDK directly. Reported token counts are uncached input tokens.

Reproduce it

The repo includes seed data generation, the patched react-admin demo, both agent scripts, and raw results.

Meta, Zuckerberg Sued Over Alleged Copyright Infringement by Book Publishers and Scott Turow

variety.com

In a new legal battle in the AI space, Meta and CEO Mark Zuckerberg have been sued by five publishers and author Scott Turow, who allege the tech company illegally copied millions of books, articles and other works to train Meta's artificial-intelligence systems.

“In their effort to win the ‘AI arms race’ and build a functional generative AI model, Defendants Meta and Zuckerberg followed their well-known motto: ‘move fast and break things,’” the plaintiffs say in their lawsuit. “They first illegally torrented millions of copyrighted books and journal articles from notorious pirate sites and downloaded unauthorized web scrapes of virtually the entire internet. They then copied those stolen fruits many times over to train Meta's multibillion-dollar generative AI system called Llama. In doing so, Defendants engaged in one of the most massive infringements of copyrighted materials in history.”

The suit was filed Tuesday (May 5) in the U.S. District Court for the Southern District of New York by five publishers (Hachette, Macmillan, McGraw Hill, Elsevier and Cengage) and Turow individually. The proposed class-action suit seeks unspecified monetary damages for the alleged copyright infringement.

Asked for comment, a Meta spokesperson said, “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use. We will fight this lawsuit aggressively.”

Authors have sued AI companies for copyright infringement before — and lost.

For example, in June 2025, a federal judge rejected a claim brought by 13 authors, including Sarah Silverman and Junot Díaz, that Meta violated their copyrights by training its AI model on their books. Judge Vincent Chhabria ruled that Meta had engaged in “fair use” when it used a data set of nearly 200,000 books to train its Llama language model for generative AI.

But the latest lawsuit alleges that Meta and Zuckerberg deliberately circumvented copyright-protection mechanisms — and had considered paying to license the works before abandoning that strategy at Zuckerberg's “personal instruction.” The suit essentially argues that the conduct described falls outside protections afforded by fair-use provisions of the U.S. copyright code.

“Meta — at Zuckerberg's direction — copied millions of books, journal articles, and other written works without authorization, including those owned or controlled by Plaintiffs and the Class, and then made additional copies of those works to train Llama,” the suit says. “Zuckerberg himself personally authorized and actively encouraged the infringement. Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.”

According to the lawsuit, after the release of Llama 1, Meta briefly considered entering into licensing deals with major publishers. Meta discussed increasing the company's “dataset licensing” budget to as much as $200 million from January to April 2023, per the complaint.

But then in early April 2023, Meta “abruptly stopped its licensing strategy,” according to the lawsuit. “The question of whether to license or pirate [copyrighted material] moving forward was ‘escalated’ to Zuckerberg. After this escalation to Zuckerberg, Meta's business development team received verbal instructions to stop licensing efforts. One Meta employee presciently described the rationale: ‘if we license once [sic] single book, we won't be able to lean into the fair use strategy.’”

According to the lawsuit, Meta and Zuckerberg are “well aware of the market for licensing AI training materials.” Meta signed four licenses in 2022 with African-language book publishers for a limited training set, and it “subsequently reached licensing agreements with major news publishers including Fox News, CNN and USA Today,” the suit says.

On Dec. 13, 2023, Meta employees internally circulated a memo concerning the legal risks of using LibGen, a repository of copyrighted material that the Meta memo described as “a dataset we know to be pirated” and added that “we would not disclose use of Libgen datasets used to train,” per the suit. Ultimately, however, those concerns went unheeded. Zuckerberg and other Meta executives “authorized and directed the torrenting of over 267 TB of pirated material — equivalent to hundreds of millions of publications and many times the size of the entire print collection of the Library of Congress,” according to the lawsuit.

As a result of the alleged infringement, Meta's AI system “readily generates, at speed and scale, substitutes for Plaintiffs' and the Class's works on which it was trained,” the lawsuit states. “Those substitutes take multiple forms, including verbatim and near-verbatim copies, replacement chapters of academic textbooks, summaries and alternative versions of famous novels and journal articles, inferior knockoffs that copy creative elements of original works, and derivative works exclusively reserved to rights holders. Llama even tailors outputs to mimic the expressive elements and creative choices of specific authors.”

When everyone has AI and the company still learns nothing

www.robert-glaser.de

Are people using AI, or is the organization learning from it? What changed because we spent those tokens? And who moves discoveries from individuals to teams to organizational capabilities?

Ethan Mollick has been writing about AI adoption in organizations for a while now. In Making AI Work: Leadership, Lab, and Crowd, he makes the point that individual productivity gains from AI do not automatically become organizational gains. People may get faster, write better, analyze more, automate more, or quietly become cyborg versions of themselves. The company may still learn almost nothing.

A lot of companies are now entering the phase where GitHub Copilot licenses are provisioned, ChatGPT Enterprise exists somewhere in the stack, Claude or Gemini or Cursor show up in pockets, and every team has at least one person who is much further along than the official enablement material assumes. Some of this is visible, yet much of it is not. Management sees license usage (“Where is the ROI for the €2 million we paid Anthropic last year?”), maybe prompt counts, maybe a survey, maybe a few internal PoCs that feel encouraging enough to put into a steering committee deck. In other companies, AI went straight to IT and died.

I think everyone knows this is the phase where it gets complicated, like, really complicated. The “messy middle” of AI adoption starts when AI use is everywhere, uneven, partially hidden, difficult to compare, and not yet connected to organizational learning.

Everyone has Copilot now

The first phase of AI adop­tion is (mostly) com­fort­able be­cause it looks like other en­ter­prise roll­outs. You buy seats. You de­fine ac­cept­able use. You run train­ing. You cre­ate a cham­pion net­work. You ask peo­ple to share use cases in a Teams chan­nel, which will briefly look alive and then be­come one more cor­po­rate at­tic full of good in­ten­tions.

The second phase is much stranger: one team uses Copilot as autocomplete and calls it a day. Another team runs Claude Code in tight loops, with tests, reviews, and constant steering. A product owner suddenly prototypes real software instead of mocking screens in Figma. A senior engineer delegates a root-cause analysis to an agent and comes back to a valid solution in under an hour; this would have taken him two weeks without AI. A junior person produces polished code but has no idea which architectural assumptions got smuggled into the system. A support team quietly turns recurring tickets into workflow automation, because they know exactly where the work hurts and nobody in the Center of Excellence ever asked the right question.

All of these things can hap­pen in the same com­pany at the same time. That is what makes the messy mid­dle messy: the adop­tion unit is no longer the or­ga­ni­za­tion, and maybe not even the team. It is the loop in­side the work!

Mollick’s Leadership, Lab, and Crowd frame is useful here. Leadership sets direction and permission. The Crowd discovers use cases because the Crowd does the actual work. The Lab turns those discoveries into shared practices, tools, benchmarks, and new systems. But the part I keep getting stuck on is the same one that shows up in agentic engineering again and again: how does the learning actually travel?

The old change ma­chin­ery is too slow for this

Most com­pa­nies will try to process AI adop­tion through the ma­chin­ery they al­ready have. Communities of prac­tice, brown-bag ses­sions, cham­pion net­works, en­able­ment decks, of­fice hours, monthly demos, sur­veys, maybe a dash­board. Fair enough, I did it, you did it. Some of that helps, es­pe­cially in or­ga­ni­za­tions that still need per­mis­sion to ex­per­i­ment at all.

But the in­ter­est­ing AI work does not wait for the next com­mu­nity meet­ing. It ap­pears in­side a code re­view, a sales pro­posal, a re­search task, a prod­uct pro­to­type, a pro­duc­tion in­ci­dent, a test strat­egy, a com­pli­ance ques­tion. Or when some­one fig­ures out that for a cer­tain class of prod­uct com­po­nents, they can set up some­thing close to a dark fac­tory: write the in­tent, let the agent run a very loose loop, ap­ply enough back­pres­sure to keep it on track, eval­u­ate the out­come against strong sce­nar­ios, re­fine the in­tent, and re­peat­edly get high-qual­ity re­sults. By the time the story is cleaned up enough to be­come a best-prac­tice slide, the im­por­tant learn­ing has of­ten lost its teeth. What made it use­ful was the fric­tion: the miss­ing con­text, the test that failed, the weird API be­hav­ior, the mo­ment where the agent sprawled into non­sense and some­one had to pull it back.
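
That dark-factory pattern is concrete enough to sketch. Here is a minimal Python illustration, emphatically not the author's actual tooling: agent stands in for any callable that turns an intent into a candidate artifact, and scenarios are hypothetical pass/fail checks.

    # Minimal sketch of the dark-factory loop described above. `agent` and
    # `scenarios` are hypothetical stand-ins, not any real product's API.
    def dark_factory_loop(intent, agent, scenarios, max_rounds=5):
        """Run a loose agent loop with backpressure and scenario evaluation."""
        for _ in range(max_rounds):
            candidate = agent(intent)  # let the agent run a very loose loop
            failures = [check for check in scenarios if not check(candidate)]
            if not failures:           # every scenario passes: accept the result
                return candidate
            # Backpressure: feed failing checks back into the intent so the
            # next round is steered rather than restarted from scratch.
            intent += "\nFix these failures: " + ", ".join(
                check.__name__ for check in failures
            )
        raise RuntimeError("loop did not converge; a human should pull it back")

The interesting part is not the loop mechanics but where the judgment sits: in the strength of the scenarios, and in the decision to refine the intent rather than accept the output.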

I have been thinking about this through the same lens as the elastic loop. AI collaboration is not one mode! It stretches from tight, synchronous co-driving to looser, asynchronous delegation. The adoption question is not simply “are people using AI?” It is whether teams know which loop size to use, where they need resistance, which artifacts should survive the loop, and how those artifacts become something the organization can learn from.

That is a much harder ques­tion than tool us­age or bean (token) count­ing.

Scrum was built for ex­pen­sive it­er­a­tion

I argued that much of mod­ern soft­ware process ex­ists be­cause hu­man it­er­a­tion used to be ex­pen­sive. Sprint plan­ning, es­ti­ma­tion, standups, user sto­ries, ticket groom­ing, hand­offs, all the cer­e­mony around co­or­di­na­tion and risk re­duc­tion. Reasonable, given the con­straints. If a sin­gle it­er­a­tion takes days or weeks, you need struc­tures that pre­vent peo­ple from wast­ing too many of them.

But agen­tic en­gi­neer­ing changes the eco­nom­ics: It makes more op­tions ma­te­ri­al­iz­able! It lets teams move from in­tent to pro­to­type to eval­u­a­tion much faster. It lets prod­uct peo­ple see work­ing soft­ware ear­lier. It lets en­gi­neers test more hy­pothe­ses be­fore com­mit­ting. It does not mag­i­cally make de­liv­ery easy, but it moves the con­straint away from im­ple­men­ta­tion and to­ward in­tent, ver­i­fi­ca­tion, judg­ment, and feed­back.

The awk­ward thing is that many or­ga­ni­za­tions spent twenty years call­ing them­selves ag­ile while pre­serv­ing the or­ga­ni­za­tional re­flexes ag­ile was sup­posed to re­move. Now AI makes real agility more plau­si­ble, and the sys­tem still asks for two-week sprint com­mit­ments, hand­off doc­u­ments, and all the stuff that as­sumes it­er­a­tion is scarce.

That is the cer­e­mony grave­yard again, but now at adop­tion level. The loop can move faster than the or­ga­ni­za­tion can me­tab­o­lize what the loop learned.

The open bar will not stay open for­ever

There is another pressure building underneath all this. AI usage will become more visibly metered. The current enterprise feeling of “everyone has access, don’t worry too much about the bill” will not hold forever, at least not in the form people are getting used to. Model routing, token budgets, usage-contingent pricing, inference costs, governance around which model is allowed for which task: all of that will become more explicit as companies move from casual assistance to serious agentic work.

I do not want to make this a cost panic story; that would be the least interesting way to think about “rented intelligence.” The question is not how to minimize token spend in the abstract, any more than the question of software delivery was ever how to minimize keystrokes.

But the bill will force a bet­ter ques­tion: what changed be­cause we spent those to­kens?

Please, I beg you, don’t count pull re­quests. Better: Which loops closed faster? Which de­ci­sions im­proved? Which root-cause analy­ses got sharper? Which re­views caught more? Which teams learned reusable pat­terns? Which prod­uct ideas were killed ear­lier be­cause a pro­to­type made the weak­ness ob­vi­ous? Where did AI cre­ate learn­ing, and where did it just cre­ate more out­put?

Token-to-output is the old mea­sure­ment re­flex in a new cos­tume. Token-to-learning is closer to the thing that mat­ters.

Loop Intelligence is the miss­ing feed­back path

I keep com­ing back to three ca­pa­bil­i­ties com­pa­nies will need in the messy mid­dle.

Agent Operations: which agents and AI tools are running, what systems they can touch, which data they can see, which actions require approval, where identity, audit, permissions, and runtime visibility live. This is the control side, and it matters because agentic work eventually touches real systems (a minimal sketch follows after these three capabilities).

Loop Intelligence: which AI-assisted (or fully agen­tic) loops ac­tu­ally pro­duce learn­ing, which ones stay open, which ones de­cay, where agents cre­ate lever­age, where they sprawl into side quests, which teams are stuck in tight su­per­vi­sion be­cause they lack tests, con­text, or in­tu­ition. Which teams are ready for looser del­e­ga­tion.

Agent Capabilities: how useful capabilities get distributed across the organization without pretending that three monolithic agents can do everyone’s work. AI is starting to behave more like a fluid base technology than a single application category. It does not fit cleanly into one “HR agent,” one “engineering agent,” one “sales agent,” each sitting somewhere in the enterprise zoo. The better question is how capabilities flow into the places where work happens: employee harnesses, background agents, product teams, platform services, local skills, MCP servers, evaluation suites, runbooks, examples, and domain-specific procedures.
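
To make the control side of Agent Operations concrete, here is a hypothetical default-deny policy sketch in Python. The field names and the three-way allow/approve/deny result are invented for illustration; no real product API is implied.

    # Hypothetical sketch of an Agent Operations policy: which systems an
    # agent may touch, and which actions must be routed to a human first.
    from dataclasses import dataclass, field

    @dataclass
    class AgentPolicy:
        agent_id: str
        readable_systems: set = field(default_factory=set)
        writable_systems: set = field(default_factory=set)
        needs_approval: set = field(default_factory=set)

        def check(self, system: str, action: str) -> str:
            """Return 'allow', 'approve', or 'deny' for an attempted action."""
            if action in self.needs_approval:
                return "approve"  # pause and route to a human
            if action == "read" and system in self.readable_systems:
                return "allow"
            if action == "write" and system in self.writable_systems:
                return "allow"
            return "deny"         # default-deny everything unlisted

    policy = AgentPolicy(
        agent_id="support-triage",
        readable_systems={"ticketing", "runbooks"},
        writable_systems={"ticketing"},
        needs_approval={"refund", "deploy"},
    )

Under this sketch, policy.check("ticketing", "write") is allowed outright, while policy.check("billing", "refund") pauses for a human, which is the audit-and-approval behavior the control side exists to guarantee.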

This is where the plat­form ques­tion gets in­ter­est­ing. Who owns these ca­pa­bil­i­ties? How does a use­ful agent skill dis­cov­ered in one team be­come avail­able to oth­ers with­out turn­ing into a dead tem­plate? How do you en­rich a de­vel­op­er’s har­ness dif­fer­ently from a prod­uct per­son’s har­ness, a sup­port team’s back­ground agent, or a com­pli­ance work­flow? Which ca­pa­bil­i­ties be­long close to the team, which be­long in a plat­form layer, and which should never be gen­er­al­ized be­cause the lo­cal con­text is the whole point?

One with­out the oth­ers gets weird quickly. Agent Operations with­out Loop Intelligence be­comes con­trol bu­reau­cracy. Loop Intelligence with­out Agent Capabilities be­comes an an­a­lyt­ics layer that dis­cov­ers use­ful pat­terns but has no way to feed them back into work. Agent Capabilities with­out Operations and Loop Intelligence be­comes tool sprawl with bet­ter brand­ing. We can all have nice charts these days, no need to ask the IT de­part­ment to build a dash­board any­more, right?

The con­trol path, the learn­ing path, and the ca­pa­bil­ity path have to meet some­where.

That some­where is what I have been call­ing a feed­back har­ness in­ter­nally. I am not sure I like the term for cus­tomers. It sounds too much like some­thing from an ar­chi­tec­ture di­a­gram, and cus­tomers do not buy har­nesses be­cause the mech­a­nism is el­e­gant, even if it’s the thing of the year. They buy con­fi­dence, bet­ter de­ci­sions, faster learn­ing, less waste, safer del­e­ga­tion.

So the more use­ful cus­tomer-fac­ing con­cept might be a Loop Intelligence Hub.

A feed­back har­ness lis­tens to real work loops: tasks, prompts, spec­i­fi­ca­tions, re­views, sce­nar­ios, ac­cepted and re­jected hy­pothe­ses, pro­duc­tion sig­nals, re­work, hu­man de­ci­sions and in­ter­ven­tions. Not to watch peo­ple, but to un­der­stand the loop. A first ver­sion does not have to be a gi­ant plat­form. Pick a few real work­flows, in­stru­ment the points where in­tent, agent work, ver­i­fi­ca­tion, and hu­man de­ci­sion al­ready leave traces, col­lect enough qual­i­ta­tive feed­back to un­der­stand why a loop worked or failed, and turn that into a re­cur­ring learn­ing ar­ti­fact.
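
To show how small that first version can be, here is a hypothetical minimal event record for exactly those trace points; every name in it is an assumption, not a spec.

    # Hypothetical record for one trace point in a work loop. The four
    # stages mirror the text: intent, agent work, verification, decision.
    import json
    import time
    from dataclasses import dataclass, asdict

    @dataclass
    class LoopEvent:
        loop_id: str    # one unit of work, from intent to acceptance
        stage: str      # "intent" | "agent_run" | "verification" | "human_decision"
        outcome: str    # e.g. "accepted", "rejected", "rework"
        note: str = ""  # qualitative feedback: why did this work or fail?
        ts: float = 0.0 # filled in at emit time if left at zero

    def emit(event: LoopEvent, sink) -> None:
        """Append one event as a JSON line to any writable sink (file, queue)."""
        event.ts = event.ts or time.time()
        sink.write(json.dumps(asdict(event)) + "\n")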

A Loop Intelligence Hub turns those signals into something the organization can act on: an enablement backlog, a capability radar, investment briefs, governance gaps, reusable workflows, training needs, evaluation priorities. Not one-size-fits-all dashboards, but views customized to what’s relevant. The interesting output is not the dashboard anyway. It is the decision that follows: this team needs better backpressure before it can delegate more (stretch the loop), this product group has a repeatable dark-factory pattern for a narrow class of components, this compliance workflow needs a governed tool boundary, this skill should move into the platform because five teams have reinvented it badly.
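
Staying with the hypothetical LoopEvent records sketched above, the hub side could start as nothing more than a summary that feeds exactly those decisions: which loops closed, which kept cycling through rework, where humans had to step in.

    # Sketch of the hub side, consuming the hypothetical LoopEvent records
    # from the previous snippet; the output feeds decisions, not dashboards.
    from collections import defaultdict

    def loop_health(events):
        """Summarize, per loop, whether it closed and how much it churned."""
        loops = defaultdict(list)
        for event in events:
            loops[event.loop_id].append(event)
        summary = {}
        for loop_id, evs in loops.items():
            outcomes = [e.outcome for e in evs]
            summary[loop_id] = {
                "closed": "accepted" in outcomes,
                "rework_rounds": outcomes.count("rework"),
                "human_interventions": sum(e.stage == "human_decision" for e in evs),
            }
        return summary

Many rework rounds and constant human intervention on one team's loops would be precisely the signal described above: better backpressure before looser delegation.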

The har­ness col­lects and the hub helps the or­ga­ni­za­tion de­cide. The ca­pa­bil­ity layer feeds the learn­ing back into work.

This can­not be­come em­ployee sur­veil­lance

The whole thing dies if it turns into em­ployee scor­ing.

If peo­ple be­lieve the or­ga­ni­za­tion is mea­sur­ing whether they used enough AI, they will game the sig­nals. If they be­lieve every ex­per­i­ment be­comes a pro­duc­tiv­ity ex­pec­ta­tion, they will hide the ex­per­i­ments. If they be­lieve their best work­flow will sim­ply be­come their new base­line work­load, they will keep it pri­vate. The com­pany will get the worst pos­si­ble ver­sion of adop­tion: vis­i­ble com­pli­ance and in­vis­i­ble learn­ing.

This is why the honest intent (not just the framing) is really important here. The useful question can’t be “who uses AI enough?” but: where did AI change the work in a way the organization can learn from? Which loops became healthier? Which teams need better backpressure before they can delegate more? Where does a product team need a different environment because prototypes are becoming real software?

You can write poli­cies about this, and you prob­a­bly should. But gov­er­nance, like learn­ing, only be­comes real through use. Once the agent touches pro­duc­tion-ad­ja­cent work, once a prod­uct per­son pro­to­types in­stead of spec­i­fy­ing, once a de­vel­oper del­e­gates root-cause analy­sis, once to­ken spend be­comes large enough that man­age­ment wants an­swers, the or­ga­ni­za­tion dis­cov­ers whether it built a learn­ing sys­tem or just bought a lot of seats.

The messy mid­dle is not a phase to sur­vive

The first phase of AI adop­tion was about ac­cess. Who gets the tools, who has per­mis­sion, who ne­go­ti­ates the con­tracts, who can try the lat­est model with­out fil­ing a pro­cure­ment ticket. That phase still mat­ters, but it will not dif­fer­en­ti­ate for long. Access to fron­tier in­tel­li­gence can be rented. Operational con­trol and or­ga­ni­za­tional learn­ing can­not be rented in the same way.

The next ad­van­tage is learn­ing ve­loc­ity.

Who finds the real pat­terns faster? Who moves dis­cov­er­ies from in­di­vid­u­als to teams to or­ga­ni­za­tional ca­pa­bil­i­ties? Who builds back­pres­sure into agen­tic loops, so agents can’t sprawl? Who dis­trib­utes use­ful agent ca­pa­bil­i­ties with­out turn­ing them into mono­lithic en­ter­prise agents that fit no­body? Who fi­nally uses agen­tic en­gi­neer­ing to make ag­ile real, in­stead of just slap­ping AI onto the old cer­e­monies?

Nobody has this figured out yet; I certainly do not. I have been iterating on the elastic loop for months, and every customer conversation, every internal discussion, every strange example from real work reshapes it again. That is the point! We will not understand this shift by waiting for a definitive adoption playbook from a vendor, a consultant, or an AI lab. We will understand it by instrumenting the work, sharing the messy learnings, letting others poke holes, and iterating in the open.
