10 interesting stories served every morning and every evening.
Analyzing DNSSEC problems for nic.de
Components: DNS
Services: DNS Nameservice
May 6, 2026 01:34 CEST (May 5, 2026 23:34 UTC)
RESOLVED
All Services are up and running.
May 5, 2026 23:28 CEST (May 5, 2026 21:28 UTC)
INVESTIGATING
Frankfurt am Main, 5 May 2026 — DENIC eG is currently experiencing a disruption in its DNS service for .de domains. As a result, the reachability of all DNSSEC-signed .de domains is currently impaired.
The root cause of the disruption has not yet been fully identified. DENIC’s technical teams are working intensively on analysis and on restoring stable operations as quickly as possible.
Based on current information, users and operators of .de domains may experience impairments in domain resolution. Further updates will be provided as soon as reliable findings on the cause and recovery are available.
DENIC asks all affected parties for their understanding.
For further enquiries, DENIC can be contacted via the usual channels.
May 05, 2026
By using Multi-Token Prediction (MTP) drafters, Gemma 4 models reduce latency bottlenecks and achieve improved responsiveness for developers.
Olivier Lacombe
Director, Product Management
Maarten Grootendorst
Developer Relations Engineer
Just a few weeks ago, we introduced Gemma 4, our most capable open models to date. With over 60 million downloads in just the first few weeks, Gemma 4 is delivering unprecedented intelligence-per-parameter to developer workstations, mobile devices and the cloud. Today, we are pushing efficiency even further.
We’re releasing Multi-Token Prediction (MTP) drafters for the Gemma 4 family. By using a specialized speculative decoding architecture, these drafters deliver up to a 3x speedup without any degradation in output quality or reasoning logic.
Tokens-per-second speed increases, tested on hardware using LiteRT-LM, MLX, Hugging Face Transformers, and vLLM.
Why speculative decoding?
The technical reality is that standard LLM inference is memory-bandwidth bound, creating a significant latency bottleneck. The processor spends the majority of its time moving billions of parameters from VRAM to the compute units just to generate a single token. This leads to under-utilized compute and high latency, especially on consumer-grade hardware.
Speculative decoding decouples token generation from verification. By pairing a heavy target model (e.g., Gemma 4 31B) with a lightweight drafter (the MTP model), we can utilize idle compute to “predict” several future tokens at once with the drafter in less time than it takes for the target model to process just one token. The target model then verifies all of these suggested tokens in parallel.
How speculative decoding works
Standard large language models generate text autoregressively, producing exactly one token at a time. While effective, this process dedicates the same amount of computation to predicting an obvious continuation (like predicting “words” after “Actions speak louder than…”) as it does to solving a complex logic puzzle.
MTP mitigates this inefficiency through speculative decoding, a technique introduced by Google researchers in Fast Inference from Transformers via Speculative Decoding. If the target model agrees with the draft, it accepts the entire sequence in a single forward pass —and even generates an additional token of its own in the process. This means your application can output the full drafted sequence plus one token in the time it usually takes to generate a single one.
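To make the mechanics concrete, here is a minimal sketch of how speculative decoding is typically wired up with Hugging Face Transformers' assisted generation. The checkpoint names below are placeholders rather than confirmed Gemma 4 model IDs, and the officially supported setup is whatever the Gemma 4 documentation describes:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder IDs: substitute the actual Gemma 4 target and MTP drafter checkpoints.
target_id = "google/gemma-4-31b-it"
drafter_id = "google/gemma-4-31b-it-drafter"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.bfloat16, device_map="auto")
drafter = AutoModelForCausalLM.from_pretrained(drafter_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Explain speculative decoding in one paragraph.", return_tensors="pt").to(target.device)

# assistant_model switches generate() into assisted (speculative) decoding:
# the drafter proposes several tokens, the target verifies them in one forward pass.
outputs = target.generate(**inputs, assistant_model=drafter, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Because the target model still performs the final verification, the output matches what it would have produced on its own, which is where the "zero quality degradation" claim below comes from.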
Unlocking faster AI from the edge to the workstation
For developers, inference speed is often the primary bottleneck for production deployment. Whether you are building coding assistants, autonomous agents that require rapid multi-step planning, or responsive mobile applications running entirely on-device, every millisecond matters.
By pairing a Gemma 4 model with its corresponding drafter, developers can achieve:
Improved responsiveness: Drastically reduce latency for near real-time chat, immersive voice applications and agentic workflows.
Supercharged local development: Run our 26B MoE and 31B Dense models on personal computers and consumer GPUs with unprecedented speed, powering seamless, complex offline coding and agentic workflows.
Enhanced on-device performance: Maximize the utility of our E2B and E4B models on edge devices by generating outputs faster, which in turn preserves valuable battery life.
Zero quality degradation: Because the primary Gemma 4 model retains the final verification, you get identical frontier-class reasoning and accuracy, just delivered significantly faster.
Gemma 4 26B on an NVIDIA RTX PRO 6000. Standard Inference (left) vs. MTP Drafter (right) in tokens per second. Same output quality, half the wait time.
Where you can dive deeper into MTP drafters
To make these MTP drafters exceptionally fast and accurate, we introduced several architectural enhancements under the hood. The draft models seamlessly utilize the target model’s activations and share its KV cache, meaning they don’t have to waste time recalculating context the larger model has already figured out. For our E2B and E4B edge models, where the final logit calculation becomes a big bottleneck, we even implemented an efficient clustering technique in the embedder to further accelerate generation.
We’ve also been closely analyzing hardware-specific optimizations. For example, while the 26B mixture-of-experts model presents unique routing challenges at a batch size of 1 on Apple Silicon, processing multiple requests simultaneously (e.g., batch sizes of 4 to 8) unlocks up to a ~2.2x speedup locally. We see similar gains with an NVIDIA A100 when increasing batch size.
Want to see the exact mechanics of how this works? We’ve published an in-depth technical explainer that unpacks the visual architecture, KV cache sharing and efficient embedders powering these drafters.
How to get started
The MTP drafters for the Gemma 4 family are available today under the same open-source Apache 2.0 license as Gemma 4. Read the documentation to learn how to use MTP with Gemma 4. You can download the model weights right now on Hugging Face and Kaggle, start experimenting with faster inference in Transformers, MLX, vLLM, SGLang, and Ollama, or try them directly in the Google AI Edge Gallery for Android or iOS.
We can’t wait to see how this newfound speed accelerates what you build next in the Gemmaverse.
Last week, a tweet went viral showing a guy claiming that a Cursor/Claude agent deleted his company’s production database. We watched from the sidelines as he tried to get a confession from the agent: “Why did you delete it when you were told never to perform this action?” Then he tried to parse the answer to either learn from his mistake or warn us about the dangers of AI agents.
I have a question too: why do you have an API endpoint that deletes your entire production database? His post rambled on about false marketing in AI, bad customer support, and so on. What was missing was accountability.
I’m not one to blindly defend AI; I always err on the side of caution. But I also know you can’t blame a tool for your own mistakes.
In 2010, I worked with a company that had a very manual deployment process. We used SVN for version control. To deploy, we had to copy trunk, the equivalent of the master branch, into a release folder labeled with a release date. Then we made a second copy of that release and called it “current.” That way, pulling the current folder always gave you the latest release.
One day, while deploying, I accidentally copied trunk twice. To fix it via the CLI, I edited my previous command to delete the duplicate. Then I continued the deployment without any issues… or so I thought. Turns out, I hadn’t deleted the duplicate copy at all. I had edited the wrong command and deleted trunk instead. Later that day, another developer was confused when he couldn’t find it.
All hell broke loose. Managers scrambled, meetings were called. By the time the news reached my team, the lead developer had already run a command to revert the deletion. He checked the logs, saw that I was responsible, and my next task was to write a script to automate our deployment process so this kind of mistake couldn’t happen again. Before the day was over, we had a more robust system in place. One that eventually grew into a full CI/CD pipeline.
Automation helps eliminate the silly mistakes that come with manual, repetitive work. We could have easily gone around asking “Why didn’t SVN prevent us from deleting trunk?” But the real problem was our manual process. Unlike machines, we can’t repeat a task exactly the same way every single day. We are bound to slip up eventually.
With AI generating large swaths of code, we get the illusion of that same security. But automation means doing the same thing the same way every time. AI is more like me copying and pasting branches: it’s bound to make mistakes, and it’s not equipped to explain why it did what it did. The terms we use, like “thinking” and “reasoning,” may look like reflection from an intelligent agent. But these are marketing terms slapped on top of AI. In reality, the models are still just generating tokens.
Now, back to the main problem this guy faced. Why does a public-facing API that can delete all your production databases even exist? If the AI hadn’t called that endpoint, someone else eventually would have. It’s like putting a self-destruct button on your car’s dashboard. You have every reason not to press it, because you like your car and it takes you from point A to point B. But a motivated toddler who wiggles out of his car seat will hit that big red button the moment he sees it. You can’t then interrogate the child about his reasoning. Mine would have answered simply: “I did it because I pressed it.”
I suspect a large part of this company’s application was vibe-coded. The software architects used AI to spec the product from AI-generated descriptions provided by the product team. The developers used AI to write the code. The reviewers used AI to approve it. Now, when a bug appears, the only option is to interrogate yet another AI for answers, probably not even running on the same GPU that generated the original code. You can’t blame the GPU!
The simple solution is to know what you’re deploying to production. The more realistic one is, if you’re going to use AI extensively, to build a process where competent developers use it as a tool to augment their work, not a way to avoid accountability. And please, don’t let your CEO or CTO write the code.
By Susam Pal on 12 Jan 2026
Introduction
Since the launch of ChatGPT in November 2022, generative artificial
intelligence (AI) chatbot services have become increasingly
sophisticated and popular. These systems are now embedded in search
engines, software development tools as well as office software. For
many people, they have quickly become part of everyday computing.
These services have turned out to be quite useful, especially for
exploring unfamiliar topics and as a general productivity aid.
However, I also think that the way these services are advertised and
consumed can pose a danger to society, especially if we get into the
habit of trusting their output without further scrutiny.
Contents
Introduction
Pitfalls
Inverse Laws of Robotics
Non-Anthropomorphism
Non-Deference
Non-Abdication of Responsibility
Conclusion
Pitfalls
Certain design choices in modern AI systems can encourage uncritical
acceptance of their output. For example, many popular search
engines are already highlighting answers generated by AI at the very
top of the page. When this happens, it is easy to stop scrolling,
accept the generated answer and move on. Over time, this could
inadvertently train users to treat AI as the default authority
rather than as a starting point for further investigation. I wish
that each such generative AI service came with a brief but
conspicuous warning explaining that these systems can sometimes
produce output that is factually incorrect, misleading or
incomplete. Such warnings should highlight that habitually trusting
AI output can be dangerous. In my experience, even when such
warnings exist, they tend to be minimal and visually deemphasised.
In the world of science fiction, there are the Three Laws of
Robotics devised by Isaac Asimov, which recur throughout
his work. These laws were designed to constrain the behaviour of
robots in order to keep humans safe. As far as I know, Asimov never
formulated any equivalent laws governing how humans should interact
with robots. I think we now need something to that effect to keep
ourselves safe. I will call them the Inverse Laws of
Robotics. These apply to any situation that requires us humans
to interact with a robot, where the term ‘robot’ refers to any
machine, computer program, software service or AI system that is
capable of performing complex tasks automatically. I use the term
‘inverse’ here not in the sense of logical negation but to indicate
that these laws apply to humans rather than to robots.
It is well known that Asimov’s laws were flawed. Indeed, Asimov
used those flaws to great effect as a source of tension. But the
particular ways in which they fail for fictional robots do not
necessarily carry over to these inverse laws for humans. Asimov’s
laws try to constrain the behaviour of autonomous robots. However,
these inverse laws are meant to guide the judgement and conduct of
humans. Still, one thing we can learn from Asimov’s stories is that
no finite set of laws can ever be foolproof for the complex issues
we face with AI and robotics. But that does not mean we should not
even try. There will always be edge cases where judgement is
required. A non-exhaustive set of principles can still be useful if
it helps us think more clearly about the risks involved.
Inverse Laws of Robotics
Here are the three inverse laws of robotics:
Humans must not anthropomorphise AI systems.
Humans must not blindly trust the output of AI systems.
Humans must remain fully responsible and accountable for
consequences arising from the use of AI systems.
Non-Anthropomorphism
Humans must not anthropomorphise AI systems. That is, humans must
not attribute emotions, intentions or moral agency to them.
Anthropomorphism distorts judgement. In extreme cases,
anthropomorphising can lead to emotional dependence.
Modern chatbot systems often sound conversational and empathetic.
They use polite phrasing and conversational patterns that closely
resemble human interaction. While this makes them easier and more
pleasant to use, it also makes it easier to forget what they
actually are: large statistical models producing plausible text
based on patterns in data.
I think vendors of AI-based chatbot services could do a better job
here. In many cases, the systems are deliberately tuned to feel more
human rather than more mechanical. I would argue that the opposite
approach would be healthier in the long term. A slightly more
robotic tone would reduce the likelihood that users mistake fluent
language for understanding, judgement or intent.
Whether or not vendors make such changes, it still serves us well, I
think, to avoid this pitfall ourselves. We should actively resist
the habit of treating AI systems as social actors or moral agents.
Doing so preserves clear thinking about their capabilities and
limitations.
Non-Deference
Humans must not blindly trust the output of AI systems.
AI-generated content must not be treated as authoritative without
independent verification appropriate to its context.
This principle is not unique to AI. In most areas of life, we
should not accept information uncritically. In practice, of course,
this is not always feasible. Not everyone is an expert in medicine
or law, so we often rely on trusted institutions and public health
Bloomberg’s Mark Gurman reported on Monday that iOS 27 will add a “Create a Pass” feature to the Wallet app. Tap the “+” button you already use to add credit cards or pass emails, and Wallet will offer something it has never offered before on iPhone: a path to build your own pass.
You can scan a QR code on a paper ticket or membership card with the camera, or build a pass from scratch in a layout editor. The whole flow runs without an Apple Developer account, a Pass Type ID, or any certificate signing.
iOS 27 is expected to preview at WWDC on June 8, with a public release in September.
How the new flow works
Reporting from Bloomberg, MacRumors, 9to5Mac, and AppleInsider lines up on the same workflow. Inside the Wallet app, the existing “+” button gains a new option for creating a pass. From there you choose between two starting points:
Scan a QR code from a paper card, ticket, or screen
Build a custom pass from scratch with no scan needed
Once you are in the editor, Wallet exposes adjustable styles, images, colors, and text fields. The reports describe a fairly conventional template-driven layout, closer in spirit to what Pass2U, WalletWallet, and other third-party generators have offered for years than to Apple’s developer-only PassKit pipeline.
Three templates, color-coded
Apple is testing three starting templates, each tied to a default color:
Standard (orange): the default for any general-purpose pass.
Membership (blue): geared toward gyms, clubs, libraries, and other recurring-access cards.
Event (purple): meant for tickets to games, movies, and one-off occasions.
The color choice is not just decoration. Wallet currently sorts passes visually in the stack, and the template hue is what sets each card apart at a glance, so a quick look is enough to pick out the orange punch card from the purple ticket without reading a word.
Why now: 14 years of PassKit drought
Apple shipped PassKit alongside iOS 6 back in 2012. The pitch was clean: businesses build .pkpass files, customers tap to add, everyone wins. In practice, the consistent adopters ended up being airlines, big-box retailers, ticketing platforms, and a handful of national chains. Most gyms, cafes, libraries, rec centers, and small loyalty programs never built one, because the path requires an Apple Developer account, signing certificates, and enough engineering work that “just print a paper card” almost always won the budget conversation.
The Next Web’s framing is blunt: Apple is no longer waiting on developers. With Create a Pass, the supply-side problem is finally being solved from the demand side. If the business will not build a Wallet pass, the user does it themselves from the QR code that business already printed.
That is a meaningful shift in posture. For more than a decade, Wallet has been a directory of what brands chose to ship. In iOS 27 it becomes a directory of what people choose to keep.
What this means for WalletWallet
We will be honest. WalletWallet exists because of this exact gap. You take a barcode from any loyalty card, paste it into our web app, pick a color, and a free Apple Wallet pass lands on your phone in about a minute, all from the browser without an account or any developer setup. Once Create a Pass ships in September, a chunk of that workflow moves natively into the iPhone Wallet app.
That is good for users. We started this project to make Wallet friendlier for the cafes-and-gyms long tail, and Apple agreeing with us at OS-level scope is a healthy outcome. The category needed it.
A few places where we still help, even after iOS 27 ships:
Google Wallet. Create a Pass is iPhone-only. Roughly half of the wallet-using world is on Android, and our generator builds Google Wallet passes from the same form.
Web, no OS upgrade. iOS 27 needs a compatible iPhone and the September update. WalletWallet runs in any browser today. iOS 14, iPad, Mac, a friend’s laptop, all fine.
Tag passes with real integrations. Our Bandcamp, SoundCloud, and Spotify pass builders pull artist art and links automatically into a tag pass. That is a different shape from the generic templated pass Apple is showing.
Sharing. A web-generated .pkpass is just a file. You can email it, post it, hand it to a friend on Android via QR. The Wallet-native flow is more locked to the device that built it.
We expect to lose volume on the simplest one-barcode-to-Wallet case once Create a Pass goes live. That is fine. The reason WalletWallet started was that Apple’s bar for a Wallet pass was too high for normal people. If iOS 27 lowers that bar, the world we wanted is closer.
What we still do not know
The current reports cover the UI, the templates, and the high-level workflow. They are silent on a lot of details that matter:
Whether iCloud will sync user-created passes across iPhone, iPad, and Mac
Whether passes can be exported as .pkpass files to share with non-iPhone users
Whether Wallet supports Code 128, PDF417, and Aztec barcodes, or only QR
Whether merchants can claim, co-sign, or update user-created passes after the fact
Whether passes have lock-screen behavior tied to time and location, the way developer-issued passes do today
We will know more once Apple previews iOS 27 at WWDC on June 8, and again when the first developer betas land. We will update this post when there is something concrete to add.
Quick recap
iOS 27 is adding a Create a Pass button to the Wallet app, with a QR-scan or build-from-scratch flow and three color-coded templates: Standard (orange), Membership (blue), and Event (purple). Bloomberg broke the story on May 4, and a public release is expected in September 2026. It will be the first time iPhone users do not need a third-party tool to put a barcode into Wallet, and for us that is a sign the category is maturing the right way.
I am Philip—an engineer working at Distr, which helps software and AI companies distribute their applications to self-managed environments.
Our Open Source Software Distribution platform is available on GitHub (github.com/distr-sh/distr) and orchestrates both Docker Compose and Docker Swarm deployments on customer hosts every day.
Most of the production incidents I have seen on Docker Compose hosts come from the same handful of quirks: an old container that should have been removed, a disk that filled up overnight, a health check that detected a problem and then did nothing about it, a :latest tag that pointed somewhere new, or a socket mount nobody thought twice about. None of these are bugs in Docker. They are deliberate trade-offs in a tool that started as internal tooling at dotCloud, a PaaS company that wrapped LXC to fix “it works on my machine,” and is now running the back end of a lot of real businesses. This post collects the recurring ones, with the commands and the operational answer for each.
Short answer: yes—plain Docker Compose can still run real production workloads in 2026, but only if you handle the operational gaps it leaves yourself.
Where Plain Docker Compose Fits in Production
Before the list of quirks, a quick word on the audience. Docker Compose is a declarative way to wire up a multi-container application: one YAML file describes the services, the networks between them, the volumes they share, the environment they need, and—through the patterns for overwriting or patching service configuration—the on-disk configuration each application expects. docker compose up reconciles the host to that file. The sweet spot in production is the single-node deployment built around exactly that—a vendor pushing a multi-container application into a customer environment, an internal team running a long-tail service that does not justify a Kubernetes cluster, an edge box in a retail location. The footprint is small, the operational overhead is low, and a competent operator can reason about the whole stack from one docker-compose.yaml. There is no control plane behind Compose itself—no scheduler watching the host, no reconciler reapplying state, no operator pushing updates from somewhere else. docker compose up runs once and exits.
That architectural simplicity is exactly why the quirks bite. Compose assumes you—or whoever runs the host—will do the operational work nothing else is doing, and if you ship Compose files to customers the safe assumption is that the customer will not. The rest of this post is about closing the gap between what Compose does and what a production host actually needs, either by hand or with an agent that does it for you. If you have already concluded that the gap is too wide and want to compare with the next step up, read our Docker Compose vs Kubernetes breakdown.
Docker Compose Orphan Containers and –remove-orphans
Remove a service from docker-compose.yaml, run docker compose up -d, and the container you removed keeps running. It is detached from the project but still bound to the same networks and ports. docker compose ps will not show it, because Compose only lists what is in the current file. docker ps -a --filter label=com.docker.compose.project=<name> will, because Docker still has the label on the container. This is how you discover, six months in, that an old worker service has been quietly consuming RAM since the last refactor.
The fix is one flag:
docker compose up -d --remove-orphans
docker compose down --remove-orphans
The flag tells Compose: any container that was once part of this project but is no longer in the file should be removed. Networks Compose created for the project are reconciled the same way on each up, so orphan networks go away too. Volumes are the exception—Compose preserves named volumes by default to protect data, and there is no per-service flag to drop the ones a removed service used. To reclaim that space you have to do it manually: list candidates with docker volume ls --filter dangling=true and docker volume rm by name, or use docker compose down -v if you intend to wipe the project’s volumes wholesale. To audit before deleting, list everything Docker still associates with the project name:
docker ps -a --filter label=com.docker.compose.project=<name>
Distr’s Docker agent passes RemoveOrphans: true on every Compose Up call, so customer hosts never accumulate orphans across deployment updates. That single flag has eliminated a recurring class of “the old version is still answering on port 8080” support tickets.
Pruning Docker Images and Capping Container Logs
Every docker compose pull keeps the previous image on disk. Every container with the default json-file log driver writes unbounded JSON to /var/lib/docker/containers/<id>/<id>-json.log. On a busy host this is one of the most common reasons for an outage: the disk fills and Docker stops being able to write anything—logs, metadata, image layers—at which point containers start failing in confusing ways.
The first thing to learn is the audit command:
docker system df
docker system df -v
-v breaks the totals down per image, container, volume, and build cache, which is usually enough to spot the offender. From there, the targeted prune commands:
docker image prune -a --filter "until=168h" -f   # delete unused images older than 7 days
docker container prune -f                        # remove stopped containers
docker builder prune -f                          # drop the BuildKit cache
docker volume prune -f exists too, and it is genuinely useful, but read the next aside before you run it.
The other half of the disk story is logs. Cap them at the daemon level, once, in /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
After systemctl restart docker, every new container will rotate its logs at 10 MB and keep at most three rotated files—30 MB ceiling per container, instead of “until the disk is gone.” Existing containers need to be recreated to pick up the new defaults.
This is one of the topics worth getting right before you ship.
In Distr’s Docker agent the cleanup is built in: each deployment target has an opt-out container image cleanup setting that removes the previous version’s images automatically after a successful update, with retries on failure. It only fires on success, so the previous image stays on disk if something goes wrong and you need to roll back.
Docker Health Checks Don’t Restart Unhealthy Containers
This is the one that surprises people the most. You add a HEALTHCHECK to your Dockerfile or a healthcheck: block to the service in Compose, you watch the container go from healthy to unhealthy, and then… nothing happens. The Docker Engine reports the status. It does not act on it. restart: unless-stopped is triggered by the container exiting, not by it being marked unhealthy.
You can confirm what Docker actually thinks:
docker inspect --format='{{json .State.Health}}' <container> | jq
You will see the status, the streak of failures, and the last few probe outputs—useful information that is silently ignored by the engine.
There are three answers to this:
Run an autoheal sidecar. The community standard is willfarrell/docker-autoheal: a tiny container that mounts the Docker socket, watches for unhealthy events, and restarts the offending container. You opt containers in by labeling them autoheal=true (or set AUTOHEAL_CONTAINER_LABEL=all to monitor everything).
Run on Docker Swarm. Swarm restarts unhealthy tasks by default. If you are already considering Swarm, this is one of the better reasons.
Use Distr. Every Distr Docker agent deploys an adapted autoheal service alongside it. The “Enable autoheal for all containers” toggle is on by default at deployment-target creation, so customer-side restarts of unhealthy containers happen without anyone configuring it.
Whichever path you pick, the takeaway is the same: a HEALTHCHECK without something acting on it is a status light, not a self-healing system.
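For illustration, here is roughly what an autoheal sidecar does under the hood, sketched with the docker Python SDK. In production you would run the willfarrell/docker-autoheal image (or the Distr agent's bundled service) rather than a hand-rolled loop, and the label convention shown is an assumption borrowed from autoheal:
import docker

client = docker.from_env()  # needs access to the Docker socket (see the docker.sock section below)

# Watch the event stream for containers whose health status flips to unhealthy.
for event in client.events(decode=True, filters={"type": "container"}):
    if event.get("Action", "").startswith("health_status: unhealthy"):
        container = client.containers.get(event["id"])
        # Mirror autoheal's opt-in: only touch containers labeled autoheal=true.
        if container.labels.get("autoheal") == "true":
            print(f"restarting unhealthy container {container.name}")
            container.restart(timeout=10)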
Pinning Docker Images by Digest Instead of :latest
Docker tags are mutable references. myapp:1.4 today is whatever the registry currently has under that tag; tomorrow it can point at a different layer set after a re-push. :latest is the worst offender because everyone treats it as a synonym for “stable” when in practice it often means “whatever was pushed most recently.” It is also the silent default: an unqualified image: nginx in a Compose file is treated as image: nginx:latest, so even Compose files that never type the word land on it by accident. The result, in production, is that two hosts pulling the “same” tag five minutes apart can end up running different code.
The fix is to pin by content-addressable digest. Every image has one, and Docker accepts it anywhere a tag would go.
To find the digest for an image you already pulled:
docker image inspect --format='{{index .RepoDigests 0}}' myapp:1.4
# myapp@sha256:9b7c…
Or, without pulling, query the remote registry from your local Docker installation:
docker buildx imagetools inspect myapp:1.4
In your Compose file, replace the tag with the digest:
services:
  app:
    image: myapp@sha256:9b7c0a3e1f…
A pull against a digest fails fast if the registry no longer has those bytes, which is exactly what you want—silent drift becomes a loud error. The same image reference works in docker stack deploy, in docker run, and in Kubernetes manifests.
For the broader picture of what your customers can extract from a published image (and why image hygiene matters beyond reproducibility), check out our guide on protecting source code and IP in Docker and Kubernetes deployments. And if you’re still picking a registry, our container registry comparison walks through the trade-offs.
Why Mounting /var/run/docker.sock Is a Security Risk
A container with /var/run/docker.sock mounted can call the Docker API, and the Docker API can launch a privileged container that mounts the host’s root filesystem. In other words: any container with the socket has effectively root privileges on the host. This is not a Docker bug; it is the threat model of the socket. It deserves a moment of attention because the line that grants this access is one bind mount in a Compose file and is easy to add without thinking about it.
Practical hygiene:
Inventory the containers that mount the socket. Agents, CI runners, monitoring sidecars, container management UIs—keep the list short and intentional. (A quick audit sketch follows this list.)
Run rootless Docker where possible. dockerd-rootless-setuptool.sh install sets up a Docker daemon that runs as a regular user. The blast radius of a compromised socket-mounting container shrinks from “full host” to “this user account.”
Consider socket-proxy. Projects like Tecnativa’s docker-socket-proxy expose a filtered subset of the API to the container that needs it (e.g. read-only containers and events for monitoring) instead of the full socket.
Keep socket-mounting images minimal. Smaller surface, fewer libraries, fewer ways in.
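A quick way to do that inventory, sketched with the docker Python SDK; adjust the socket path if you run rootless Docker or a non-default daemon:
import docker

SOCKET = "/var/run/docker.sock"
client = docker.from_env()

# Print every running container that bind-mounts the Docker socket.
for container in client.containers.list():
    mounts = container.attrs.get("Mounts", [])
    if any(mount.get("Source") == SOCKET for mount in mounts):
        print(f"{container.name} mounts {SOCKET}")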
The Distr Docker agent does mount the socket—it has to, in order to orchestrate Compose and Swarm on the host. We document that boundary openly in the Docker agent docs so customer security teams can review it before installation. The agent authenticates to the Hub with a JWT, and the install secret is shown once and never stored.
Updating Docker Compose Deployments Across Customer Hosts
docker compose pull && docker compose up -d is a fine command if you are SSH’d into the host. At customer scale—dozens of self-managed environments behind firewalls, each with its own change-control process—that manual process doesn’t scale. Docker has no built-in mechanism to push a new manifest to a running host from somewhere else. Docker Hub webhooks can trigger a CI rebuild when an image is pushed, but they do not reach into a customer’s network and tell their docker compose to pull.
The usual workarounds and what they cost:
Watchtower: Polls the registry on a schedule, pulls new images, recreates containers. Easy to set up, hard to control. No staged rollout, no rollback path, limited visibility from your side—you find out a customer updated when they file a ticket.
Bastion + SSH + Ansible/scripts: Works for ten customers. Falls apart at fifty, especially when three of them are air-gapped and four run their own change-control cadence. Every operator has to live with shared keys and a maintenance window calendar.
A pull-based agent. This is the shape Distr lands on. The agent runs on the customer host, polls a known endpoint every 5 seconds, and reconciles the local Compose state against what the Hub says it should be. The agent reports status back, so you can see in your dashboard which customers are on which version. When the agent itself needs to update, it spawns a separate container to perform the swap so it is not trying to replace itself while running.
The pattern is not unique—Kubernetes operators and GitOps tools do the same thing—but Compose users routinely re-invent it badly. If you find yourself building one, at least give it rollback, status reporting, and a way to pin versions, or you will end up with a fleet that drifts in ways you cannot see.
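For a sense of how small the core loop is (and how much is missing from it), here is a bare-bones sketch. The Hub endpoint, payload shape, and file path are hypothetical, and everything the previous paragraph asks for (rollback, status reporting, version pinning) is deliberately left out:
import subprocess
import time

import requests

HUB_URL = "https://hub.example.com/api/deployments/host-42"  # hypothetical endpoint
COMPOSE_FILE = "/opt/app/docker-compose.yaml"
current_revision = None

while True:
    # Ask the hub what this host should be running, e.g. {"revision": "...", "compose": "..."}
    desired = requests.get(HUB_URL, timeout=10).json()
    if desired["revision"] != current_revision:
        with open(COMPOSE_FILE, "w") as f:
            f.write(desired["compose"])
        # --remove-orphans so services removed from the file do not linger (see the first section)
        subprocess.run(
            ["docker", "compose", "-f", COMPOSE_FILE, "up", "-d", "--remove-orphans"],
            check=True,
        )
        current_revision = desired["revision"]
    time.sleep(5)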
The other thing worth noting: recurring scheduled jobs alongside the application have no native Compose answer either. If your stack includes anything like a nightly cleanup, a periodic report, or a heartbeat-style task, the in-app scheduler is one option, but you eventually run into the cases it can’t cover (cross-service jobs, jobs that should outlive a single container). For the three patterns I have seen survive customer deployments, check out our guide on Compose cron jobs.
Outgrowing Docker Compose: Kubernetes vs Swarm
If a single-node Compose deployment outgrows itself, the realistic next step for most teams is Kubernetes. The ecosystem is large, the operational patterns are well documented, and the talent pool to hire against actually exists. For the side-by-side, read our Docker Compose vs Kubernetes comparison.
Docker Swarm is the other option—it reuses the Compose YAML format, ships in the box, and solves a few of the quirks above directly (it restarts unhealthy tasks, rolls out updates with update_config, and treats secrets and configs as first-class objects). It is a real fit for some single-cluster, low-ceremony deployments.
The Distr agent supports both—the Hub records whether a deployment is Compose or Swarm, and the agent runs the matching docker compose up or docker stack deploy. If you do choose Swarm, read our routing and Traefik guide for Docker Swarm and the product walkthrough for distributing applications to Swarm for the details.
So, should you run plain Docker Compose in production?
Yes—plain Docker Compose still runs a lot of real production workloads in 2026, as long as you accept that “plain Compose” is shorthand for “Compose plus the operator practices it doesn’t enforce.” None of the quirks above are secret. They are all in Docker’s documentation, in GitHub issues that have been open for years, and in the war stories of every team that has run Compose in anger. What makes them dangerous is not the quirks themselves but the order in which you discover them: usually at 2 a.m., one at a time.
TL;DR:
Pass --remove-orphans on every compose up and compose down.
Cap container logs in daemon.json and prune images on a schedule. Be careful with docker volume prune.
Health checks do not heal. Run an autoheal sidecar, run on Swarm, or use an agent that bundles one.
Pin by @sha256:… digest. Treat tags as references, not contracts.
The socket is root. Inventory the containers that mount it; prefer rootless Docker.
Updates need an agent of some kind. Watchtower is fine for one host; not for a fleet.
When Compose stops being enough, Kubernetes is usually the right next step. Swarm is a narrower fit and worth choosing with eyes open.
If you ship software to self-managed customers and you would rather not rebuild this list yourself, the Distr Docker agent handles all of the above on the customer side. The Docker agent documentation walks through the install, the socket model, the autoheal and image-cleanup defaults, and how the agent self-updates. The repository is on GitHub.
We ran a benchmark comparing two ways of letting an AI agent operate the same admin panel, with the goal of putting a price tag on vision agents (browser-use, computer-use).
Here is what we measured, what we had to change to make the vision agent work at all, and what changes when generating an API surface stops being a separate engineering project.
Why vision agents?
Vision agents are the default for letting AI agents operate web apps that don’t expose APIs. The alternative, writing an MCP or REST surface per app, is its own engineering project across the 20+ internal tools most teams have. Most teams default to vision agents not because they are better, but because the alternative is too expensive to build. The cost of the vision approach is treated as a fixed price.
We wanted to measure the price.
The setup
The test app is an admin panel for managing customers, orders, and reviews, modeled on the react-admin Posters Galore demo. Two agents target the same running app: one drives the UI via screenshots and clicks, the other calls the app’s HTTP endpoints directly. Same Claude Sonnet, same pinned dataset, same task. The interface is the only variable.
The task: find the customer named “Smith” with the most orders, locate their most recent pending order, accept all of their pending reviews, and mark the order as delivered. This touches three resources, requires filtering, pagination, cross-entity lookups, and both reads and writes. It is the shape of work a typical internal tool sees daily.
Path A: Vision agent. Claude Sonnet driving the UI via browser-use 0.12. Vision mode, taking screenshots and executing clicks.
Path B: API agent. Claude Sonnet with tool-use, calling the handlers the UI calls. Each tool maps to one or more event handlers on the app’s State, the same functions a button click would trigger. The agent gets the structured response back instead of a rendered page.
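For context on what Path B looks like in practice, here is a minimal sketch of exposing handlers as tools through the Anthropic Messages API. The tool names echo the ones mentioned in the notes below, but the schemas, model ID, and handler wiring are assumptions, not the benchmark's actual code:
import anthropic

client = anthropic.Anthropic()

# Each tool maps onto an app handler; the schemas here are illustrative only.
tools = [
    {
        "name": "list_customers",
        "description": "List customers, optionally filtered by last name.",
        "input_schema": {
            "type": "object",
            "properties": {"last_name": {"type": "string"}},
        },
    },
    {
        "name": "update_order",
        "description": "Update an order's status, e.g. to 'delivered'.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "status": {"type": "string"},
            },
            "required": ["order_id", "status"],
        },
    },
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; the benchmark just says "Claude Sonnet"
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Find the customer named 'Smith' with the most orders."}],
)

# A real runner loops: dispatch each tool_use block to the matching app handler,
# return the structured result as a tool_result message, and repeat until done.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)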
The vision agent couldn’t complete the task
We started by giving both agents the same six-sentence task above and seeing what happened.
The API agent completed it in 8 calls. It listed the customer’s reviews filtered by pending status, accepted each one, and marked the order as delivered. Both agents are calling into the same application logic; the API agent just reads the structured response directly instead of looking at a rendered page.
The vision agent, on the same prompt, found one of four pending reviews, accepted it, and moved on. It never paginated. The remaining three reviews were below the visible fold of the reviews page and the agent had no signal to scroll for them.
This is not a model problem. The vision agent was reasoning about a rendered page and had no signal that the page wasn’t showing everything. The API agent calls the same handler the UI calls, but the response includes the full result set the handler returned, not just the rows currently rendered. The agent reads “page 1 of 4 with 50 results per page” directly instead of having to interpret pagination controls from pixels.
With a 14-step walkthrough, it succeeded
To make the comparison apples-to-apples, we rewrote the vision prompt as an explicit UI walkthrough, naming the sidebar items, tabs, and form fields the agent should interact with at each step. Fourteen numbered instructions covering the navigation the agent had failed to figure out on its own.
With the walkthrough, the vision agent completed the task. It also ran for fourteen minutes and consumed about half a million input tokens.
The walkthrough is itself a finding. Each numbered instruction is engineering work that doesn’t show up in token counts but represents real cost. Anyone deploying a vision agent against an internal tool is either writing prompts at this level of specificity or accepting that the agent will silently miss work.
How we ran it
We ran the API path five times and the vision path three times. The vision path was capped at three trials because each run takes 14 to 22 minutes and consumes 400k to 750k tokens.
Variance was the most surprising part of the vision results. Across three trials the wall-clock time spanned 749s to 1257s, and input tokens spanned 407k to 751k. The agent took 43 cycles in the shortest run and 68 in the longest. The screenshot-reason-click loop has enough non-determinism that a single run is not a representative cost estimate.
The API path had no such variance. Sonnet hit identical 8 tool calls on every trial, with input token counts varying by ±27 across all five runs. The agent calls the same handlers in the same order because the structured responses give it no reason to deviate.
The full results
Numbers are mean ± sample standard deviation (n−1), with n=5 for the API path and n=3 for the vision path. Full run details are available in the repo.
Haiku could not complete the vision path. The failure was specific to browser-use 0.12’s structured-output schema, which Haiku could not reliably produce in either vision or text-only mode. On the API path, Haiku finished in under 8 seconds for under 10k input tokens, which is the cheapest configuration we tested.
The structural gap
The cost difference follows directly from the architecture. An agent that must see in order to act will always pay for the seeing, regardless of how good the model gets. Better vision models reduce error rates per screenshot, but they do not reduce the number of screenshots required to reach the relevant data. Each render is a screenshot is thousands of input tokens.
Both agents in this benchmark walk through the same application logic. They both filter, paginate, and update the same way the UI does. The difference is what they read at each step. The vision agent reads pixels and has to render every intermediate state to interpret it. The API agent reads the structured response from the same handlers, which already contains the data the UI was going to display.
Better models will narrow the cost per step. They will not narrow the step count, because the step count is set by the interface.
How we justify the API engineering cost
The benchmark was made cheap to run by Reflex 0.9, which includes a plugin that auto-generates HTTP endpoints from a Reflex application’s event handlers. None of the structural argument depends on Reflex specifically, but it is what made running the API path possible without writing a second codebase.
The interesting question is what becomes possible when the engineering cost of an API surface drops to zero. Vision agents remain the right tool for applications you do not control: third-party SaaS products, legacy systems, anything you cannot modify. For internal tools you build yourself, the math now points the other way.
Notes
Vision results are specific to browser-use 0.12 in vision mode, and other vision agents may behave differently. The Path B runner shapes the auto-generated endpoints into a small REST tool surface of about thirty lines, which the agent sees as list_customers, update_order, and similar. The dataset is pinned and small (900 customers, 600 orders, 324 reviews), so behavior on production-scale data is not measured here. The vision agent runs through LangChain’s ChatAnthropic, and the API agent runs through the Anthropic SDK directly. Reported token counts are uncached input tokens.
Reproduce it
The repo includes seed data generation, the patched react-admin demo, both agent scripts, and raw results.
In a new legal battle in the AI space, Meta and CEO Mark Zuckerberg have been sued by five publishers and author Scott Turow, who allege the tech company illegally copied millions of books, articles and other works to train Meta’s artificial-intelligence systems.
“In their effort to win the AI ‘arms race’ and build a functional generative AI model, Defendants Meta and Zuckerberg followed their well-known motto: ‘move fast and break things,’” the plaintiffs say in their lawsuit. “They first illegally torrented millions of copyrighted books and journal articles from notorious pirate sites and downloaded unauthorized web scrapes of virtually the entire internet. They then copied those stolen fruits many times over to train Meta’s multibillion-dollar generative AI system called Llama. In doing so, Defendants engaged in one of the most massive infringements of copyrighted materials in history.”
The suit was filed Tuesday (May 5) in the U.S. District Court for the Southern District of New York by five publishers (Hachette, Macmillan, McGraw Hill, Elsevier and Cengage) and Turow individually. The proposed class-action suit seeks unspecified monetary damages for the alleged copyright infringement. A copy of the lawsuit is available at this link.
Asked for comment, a Meta spokesperson said, “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use. We will fight this lawsuit aggressively.”
Authors have sued AI companies for copyright infringement before — and lost.
For example, in June 2025, a federal judge rejected a claim brought by 13 authors, including Sarah Silverman and Junot Díaz, that Meta violated their copyrights by training its AI model on their books. Judge Vincent Chhabria ruled that Meta had engaged in “fair use” when it used a data set of nearly 200,000 books to train its Llama language model for generative AI.
But the latest lawsuit alleges that Meta and Zuckerberg deliberately circumvented copyright-protection mechanisms — and had considered paying to license the works before abandoning that strategy at “Zuckerberg’s personal instruction.” The suit essentially argues that the conduct described falls outside protections afforded by fair-use provisions of the U.S. copyright code.
“Meta — at Zuckerberg’s direction — copied millions of books, journal articles, and other written works without authorization, including those owned or controlled by Plaintiffs and the Class, and then made additional copies of those works to train Llama,” the suit says. “Zuckerberg himself personally authorized and actively encouraged the infringement. Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.”
According to the lawsuit, after the release of Llama 1, Meta briefly considered entering into licensing deals with major publishers. Meta discussed increasing the company’s “dataset licensing” budget to as much as $200 million from January to April 2023, per the complaint.
But then in early April 2023, “Meta abruptly stopped its licensing strategy,” according to the lawsuit. “The question of whether to license or pirate [copyrighted material] moving forward was ‘escalated’ to Zuckerberg. After this escalation to Zuckerberg, Meta’s business development team received verbal instructions to stop licensing efforts. One Meta employee presciently described the rationale: ‘if we license once [sic] single book, we won’t be able to lean into the fair use strategy.’”
According to the lawsuit, Meta and Zuckerberg “are well aware of the market for licensing AI training materials.” Meta signed four licenses in 2022 with African-language book publishers for “a limited training set, and it subsequently reached licensing agreements with major news publishers including Fox News, CNN and USA Today,” the suit says.
On Dec. 13, 2023, Meta employees internally circulated a memo concerning the legal risks of using LibGen, a repository of copyrighted material that the Meta memo described as “a dataset we know to be pirated” and added that “we would not disclose use of Libgen datasets used to train,” per the suit. “Ultimately, however, those concerns went unheeded. Zuckerberg and other Meta executives authorized and directed the torrenting of over 267 TB of pirated material — equivalent to hundreds of millions of publications and many times the size of the entire print collection of the Library of Congress,” according to the lawsuit.
As a result of the alleged infringement, Meta’s AI system “readily generates, at speed and scale, substitutes for Plaintiffs’ and the Class’s works on which it was trained,” the lawsuit states. “Those substitutes take multiple forms, including verbatim and near-verbatim copies, replacement chapters of academic textbooks, summaries and alternative versions of famous novels and journal articles, inferior knockoffs that copy creative elements of original works, and derivative works exclusively reserved to rights holders. Llama even tailors outputs to mimic the expressive elements and creative choices of specific authors.”
Are people using AI, or is the organization learning from it? What changed because we spent those tokens? And who moves discoveries from individuals to teams to organizational capabilities?
Ethan Mollick has been writing about AI adoption in organizations for a while now. In Making AI Work: Leadership, Lab, and Crowd, he makes the point that individual productivity gains from AI do not automatically become organizational gains. People may get faster, write better, analyze more, automate more, or quietly become cyborg versions of themselves. The company may still learn almost nothing.
A lot of companies are now entering the phase where GitHub Copilot licenses are provisioned, ChatGPT Enterprise exists somewhere in the stack, Claude or Gemini or Cursor show up in pockets, and every team has at least one person who is much further along than the official enablement material assumes. Some of this is visible, yet much of it is not. Management sees license usage (“Where is the ROI for the €2 million we paid Anthropic last year?”), maybe prompt counts, maybe a survey, maybe a few internal PoCs that feel encouraging enough to put into a steering committee deck. In other companies, AI went straight to IT and died.
I think everyone knows this is the phase where it gets complicated, like, really complicated. The “messy middle” of AI adoption starts when AI use is everywhere, uneven, partially hidden, difficult to compare, and not yet connected to organizational learning.
Everyone has Copilot now
The first phase of AI adoption is (mostly) comfortable because it looks like other enterprise rollouts. You buy seats. You define acceptable use. You run training. You create a champion network. You ask people to share use cases in a Teams channel, which will briefly look alive and then become one more corporate attic full of good intentions.
The second phase is much stranger: one team uses Copilot as autocomplete and calls it a day. Another team runs Claude Code in tight loops, with tests, reviews, and constant steering. A product owner suddenly prototypes real software instead of mocking screens in Figma. A senior engineer delegates a root-cause analysis to an agent and comes back to a valid solution in under an hour; this would’ve taken him two weeks without AI. A junior person produces polished code but has no idea which architectural assumptions got smuggled into the system. A support team quietly turns recurring tickets into workflow automation, because they know exactly where the work hurts and nobody in the Center of Excellence ever asked the right question.
All of these things can happen in the same company at the same time. That is what makes the messy middle messy: the adoption unit is no longer the organization, and maybe not even the team. It is the loop inside the work!
Mollick’s Leadership, Lab, and Crowd frame is useful here. Leadership sets direction and permission. The Crowd discovers use cases because the Crowd does the actual work. The Lab turns those discoveries into shared practices, tools, benchmarks, and new systems. But the part I keep getting stuck on is the same one that shows up in agentic engineering again and again: how does the learning actually travel?
The old change machinery is too slow for this
Most companies will try to process AI adoption through the machinery they already have. Communities of practice, brown-bag sessions, champion networks, enablement decks, office hours, monthly demos, surveys, maybe a dashboard. Fair enough, I did it, you did it. Some of that helps, especially in organizations that still need permission to experiment at all.
But the interesting AI work does not wait for the next community meeting. It appears inside a code review, a sales proposal, a research task, a product prototype, a production incident, a test strategy, a compliance question. Or when someone figures out that for a certain class of product components, they can set up something close to a dark factory: write the intent, let the agent run a very loose loop, apply enough backpressure to keep it on track, evaluate the outcome against strong scenarios, refine the intent, and repeatedly get high-quality results. By the time the story is cleaned up enough to become a best-practice slide, the important learning has often lost its teeth. What made it useful was the friction: the missing context, the test that failed, the weird API behavior, the moment where the agent sprawled into nonsense and someone had to pull it back.
I have been thinking about this through the same lens as the elastic loop. AI collaboration is not one mode! It stretches from tight, synchronous co-driving to looser, asynchronous delegation. The adoption question is not simply “are people using AI?” It is whether teams know which loop size to use, where they need resistance, which artifacts should survive the loop, and how those artifacts become something the organization can learn from.
That is a much harder question than tool usage or bean (token) counting.
Scrum was built for expensive iteration
I argued that much of modern software process exists because human iteration used to be expensive. Sprint planning, estimation, standups, user stories, ticket grooming, handoffs, all the ceremony around coordination and risk reduction. Reasonable, given the constraints. If a single iteration takes days or weeks, you need structures that prevent people from wasting too many of them.
But agentic engineering changes the economics: It makes more options materializable! It lets teams move from intent to prototype to evaluation much faster. It lets product people see working software earlier. It lets engineers test more hypotheses before committing. It does not magically make delivery easy, but it moves the constraint away from implementation and toward intent, verification, judgment, and feedback.
The awkward thing is that many organizations spent twenty years calling themselves agile while preserving the organizational reflexes agile was supposed to remove. Now AI makes real agility more plausible, and the system still asks for two-week sprint commitments, handoff documents, and all the stuff that assumes iteration is scarce.
That is the ceremony graveyard again, but now at adoption level. The loop can move faster than the organization can metabolize what the loop learned.
The open bar will not stay open forever
There is another pressure building underneath all this. AI usage will become more visibly metered. The current enterprise feeling of “everyone has access, don’t worry too much about the bill” will not hold forever, at least not in the form people are getting used to. Model routing, token budgets, usage-contingent pricing, inference costs, governance around which model is allowed for which task: all of that will become more explicit as companies move from casual assistance to serious agentic work.
I do not want to make this a cost panic story; that would be the least interesting way to think about “rented intelligence”. The question is not how to minimize token spend in the abstract, any more than the question of software delivery was ever how to minimize keystrokes.
But the bill will force a better question: what changed because we spent those tokens?
Please, I beg you, don’t count pull requests. Better: Which loops closed faster? Which decisions improved? Which root-cause analyses got sharper? Which reviews caught more? Which teams learned reusable patterns? Which product ideas were killed earlier because a prototype made the weakness obvious? Where did AI create learning, and where did it just create more output?
Token-to-output is the old measurement reflex in a new costume. Token-to-learning is closer to the thing that matters.
Loop Intelligence is the missing feedback path
I keep coming back to three capabilities companies will need in the messy middle.
Agent Operations: which agents and AI tools are running, what systems they can touch, which data they can see, which actions require approval, where identity, audit, permissions, and runtime visibility live. This is the control side, and it matters because agentic work eventually touches real systems.
Loop Intelligence: which AI-assisted (or fully agentic) loops actually produce learning, which ones stay open, which ones decay, where agents create leverage, where they sprawl into side quests, which teams are stuck in tight supervision because they lack tests, context, or intuition. Which teams are ready for looser delegation.
Agent Capabilities: how useful capabilities get distributed across the organization without pretending that three monolithic agents can do everyone’s work. AI is starting to behave more like a fluid base technology than a single application category. It does not fit cleanly into one “HR agent,” one “engineering agent,” one “sales agent,” each sitting somewhere in the enterprise zoo. The better question is how capabilities flow into the places where work happens: employee harnesses, background agents, product teams, platform services, local skills, MCP servers, evaluation suites, runbooks, examples, and domain-specific procedures.
This is where the platform question gets interesting. Who owns these capabilities? How does a useful agent skill discovered in one team become available to others without turning into a dead template? How do you enrich a developer’s harness differently from a product person’s harness, a support team’s background agent, or a compliance workflow? Which capabilities belong close to the team, which belong in a platform layer, and which should never be generalized because the local context is the whole point?
One without the others gets weird quickly. Agent Operations without Loop Intelligence becomes control bureaucracy. Loop Intelligence without Agent Capabilities becomes an analytics layer that discovers useful patterns but has no way to feed them back into work. Agent Capabilities without Operations and Loop Intelligence becomes tool sprawl with better branding. We can all have nice charts these days, no need to ask the IT department to build a dashboard anymore, right?
The control path, the learning path, and the capability path have to meet somewhere.
That somewhere is what I have been calling a feedback harness internally. I am not sure I like the term for customers. It sounds too much like something from an architecture diagram, and customers do not buy harnesses because the mechanism is elegant, even if it’s the thing of the year. They buy confidence, better decisions, faster learning, less waste, safer delegation.
So the more useful customer-facing concept might be a Loop Intelligence Hub.
A feedback harness listens to real work loops: tasks, prompts, specifications, reviews, scenarios, accepted and rejected hypotheses, production signals, rework, human decisions and interventions. Not to watch people, but to understand the loop. A first version does not have to be a giant platform. Pick a few real workflows, instrument the points where intent, agent work, verification, and human decision already leave traces, collect enough qualitative feedback to understand why a loop worked or failed, and turn that into a recurring learning artifact.
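As a hypothetical illustration of what such a harness might actually record per loop (not a schema from any product; every field name here is invented), something as small as this already keeps intent, agent work, verification, and the human decision together:

```python
from dataclasses import dataclass, field

# Hypothetical record of a single work loop as a feedback harness might capture it.
# Field names are illustrative only; the point is keeping the traces in one place.

@dataclass
class LoopRecord:
    intent: str                                             # what was asked for (spec, prompt, task)
    loop_mode: str                                          # tight co-driving vs. loose delegation
    agent_actions: list[str] = field(default_factory=list)  # what the agent touched or produced
    verification: list[str] = field(default_factory=list)   # tests, reviews, scenarios applied
    human_decision: str = ""                                 # accepted, reworked, rejected, escalated
    rework_rounds: int = 0                                    # how often the loop was pulled back
    learning_note: str = ""                                   # qualitative: why it worked or failed
```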
A Loop Intelligence Hub turns those signals into something the organization can act on: an enablement backlog, a capability radar, investment briefs, governance gaps, reusable workflows, training needs, evaluation priorities. Not one-size-fits-all dashboards, but views customized to what’s relevant. The interesting output is not the dashboard anyway. It is the decision that follows: this team needs better backpressure before it can delegate more (stretch the loop), this product group has a repeatable dark-factory pattern for a narrow class of components, this compliance workflow needs a governed tool boundary, this skill should move into the platform because five teams have reinvented it badly.
The harness collects and the hub helps the organization decide. The capability layer feeds the learning back into work.
This cannot become employee surveillance
The whole thing dies if it turns into employee scoring.
If people believe the organization is measuring whether they used enough AI, they will game the signals. If they believe every experiment becomes a productivity expectation, they will hide the experiments. If they believe their best workflow will simply become their new baseline workload, they will keep it private. The company will get the worst possible version of adoption: visible compliance and invisible learning.
This is why the honest intent (not just the framing) is really important here. The useful question is not “who uses AI enough?” but: where did AI change the work in a way the organization can learn from? Which loops became healthier? Which teams need better backpressure before they can delegate more? Where does a product team need a different environment because prototypes are becoming real software?
You can write policies about this, and you probably should. But governance, like learning, only becomes real through use. Once the agent touches production-adjacent work, once a product person prototypes instead of specifying, once a developer delegates root-cause analysis, once token spend becomes large enough that management wants answers, the organization discovers whether it built a learning system or just bought a lot of seats.
The messy middle is not a phase to survive
The first phase of AI adoption was about access. Who gets the tools, who has permission, who negotiates the contracts, who can try the latest model without filing a procurement ticket. That phase still matters, but it will not differentiate for long. Access to frontier intelligence can be rented. Operational control and organizational learning cannot be rented in the same way.
The next advantage is learning velocity.
Who finds the real patterns faster? Who moves discoveries from individuals to teams to organizational capabilities? Who builds backpressure into agentic loops, so agents can’t sprawl? Who distributes useful agent capabilities without turning them into monolithic enterprise agents that fit nobody? Who finally uses agentic engineering to make agile real, instead of just slapping AI onto the old ceremonies?
Nobody has this figured out yet; I certainly do not. I have been iterating on the elastic loop for months, and every customer conversation, every internal discussion, every strange example from real work reshapes it again. That is the point! We will not understand this shift by waiting for a definitive adoption playbook from a vendor, a consultant, or an AI lab. We will understand it by instrumenting the work, sharing the messy learnings, letting others poke holes, and iterating in the open.