Two weeks ago I wrote about Anthropic silently registering a Native Messaging bridge in seven Chromium-based browsers on every machine where Claude Desktop was installed [1]. The pattern was: install on user launch of product A, write configuration into the user’s installs of products B, C, D, E, F, G, H without asking. Reach across vendor trust boundaries. No consent dialog. No opt-out UI. If the user removes it manually, it re-installs itself every time Claude Desktop is launched.
This week I discovered the same pattern, executed by Google. Google Chrome is reaching into users’ machines and writing a 4 GB on-device AI model file to disk without asking. The file is named weights.bin. It lives in OptGuideOnDeviceModel. It is the weights for Gemini Nano, Google’s on-device LLM. Chrome did not ask. Chrome does not surface it. If the user deletes it, Chrome re-downloads it.
The legal analysis is the same one I gave for the Anthropic case. The environmental analysis is new. At Chrome’s scale, the climate bill for one model push, paid in atmospheric CO2 by the entire planet, is between six thousand and sixty thousand tonnes of CO2-equivalent emissions, depending on how many devices receive the push. That is the environmental cost of one company unilaterally deciding that two billion people’s default browser will mass-distribute a 4 GB binary they did not request.
This is, in my professional opinion, a direct breach of Article 5(3) of Directive 2002/58/EC (the ePrivacy Directive) [2], a breach of the Article 5(1) GDPR principles of lawfulness, fairness, and transparency [3], a breach of Article 25 GDPR’s data-protection-by-design obligation [3], and an environmental harm of a magnitude that would be a notifiable event under the Corporate Sustainability Reporting Directive (CSRD) for any in-scope undertaking [4].
What is on the disk and how it got there
On any machine that has Chrome installed, in the user profile, sits a directory whose name is OptGuideOnDeviceModel. Inside it is a file called weights.bin. The file is approximately 4 GB. It is the weights file for Gemini Nano. Chrome uses it to power features Google has marketed under names like “Help me write”, on-device scam detection, and other AI-assisted browser functions.
The file appeared with no consent prompt. There is no checkbox in Chrome Settings labelled “download a 4 GB AI model”. The download triggers when Chrome’s AI features are active, and those features are active by default in recent Chrome versions. On any machine that meets the hardware requirements, Chrome treats the user’s hardware as a delivery target and writes the model.
The cycle of deletion and re-download has been documented across multiple independent reports on Windows installations [5][6][7][8] - the user deletes, Chrome re-downloads, the user deletes again, Chrome re-downloads again. The only ways to make the deletion stick are to disable Chrome’s AI features through chrome://flags or enterprise policy tooling that home users do not generally have, or to uninstall Chrome entirely [5]. On macOS the file lands as mode 600 owned by the user (so it is deletable in principle) but Chrome holds the install state in Local State after the bytes are written, and as soon as the variations server next tells Chrome the profile is eligible, the download fires again - the architecture is the same, only the file permissions differ.
How I verified this on a freshly created Apple Silicon profile
Most of the existing reporting on this behaviour is from Windows users who noticed their disk filling up - useful, but Google could (and probably will) try to characterise those reports as anecdotes from non-representative configurations. So I went looking for a clean witness on a different platform.
The witness I found is macOS itself. The kernel keeps a filesystem event log called .fseventsd - it records every file create, modify and delete at the OS level, independent of any application logging. Chrome cannot edit it, Google cannot remotely reach it, and the page files that record the events survive the deletion of the files they reference.
I created a Chrome user-data directory on 23 April 2026 to run an automated audit (one of the WebSentinel 100-site privacy sweeps). The audit driver works entirely over the Chrome DevTools Protocol - it loads a page, dwells for five minutes with no input, captures events, and closes Chrome between sites - and the profile had received zero keyboard or mouse input from a human at any point in its existence. Every “AI mode” surface in Chrome was untouched; in fact every UI surface in Chrome was untouched, because the audit driver only interacts with the document via CDP and the omnibox is never reached. By 29 April the profile contained 4 GB of OptGuideOnDeviceModel weights - and I only knew because a routine du -sh of the audit-profile directory caught it during a cleanup pass.
I went back to .fseventsd to ask exactly when those 4 GB landed. macOS gave me the answer, byte-precise, in three sequential page files:
24 April 2026, 16:38:54 CEST (14:38:54 UTC) - Chrome creates the OptGuideOnDeviceModel directory in the audit profile (page file 0000000003f7f339).
24 April 2026, 16:47:22 CEST (14:47:22 UTC) - three concurrent unpacker subprocesses spawn temporary directories in /private/var/folders/…/com.google.Chrome.chrome_chrome_Unpacker_BeginUnzipping.*/. One of them (5xzqPo) writes weights.bin, manifest.json, _metadata/verified_contents.json and on_device_model_execution_config.pb. The second writes a Certificate Revocation List update. The third writes a browser preload-data update. Chrome batched a security update, a preload refresh and a 4 GB AI model into the same idle window, as if they were equivalent (page file 00000000040c8855).
24 April 2026, 16:53:22 CEST (14:53:22 UTC) - the unpacked weights.bin is moved to its final location at OptGuideOnDeviceModel/2025.8.8.1141/weights.bin along with adapter_cache.bin, encoder_cache.bin, _metadata/verified_contents.json and the execution config. Concurrently four additional model targets (numbered 40, 49, 51 and 59 in Chrome’s optimization-guide enum) register fresh entries in optimization_guide_model_store - these are the smaller text-safety and prompt-routing models that pair with the LLM. None of these targets existed in the profile before this moment (page file 00000000040d0f9c).
Total install time, from directory creation to final move: 14 minutes and 28 seconds. Total human action against the profile during that window: none. The audit driver was either dwelling on a third-party home page or transitioning between sites - the unpacker fired in the background while a tab waited for a five-minute timer to expire.
The naming inside that fseventsd record is, if anything, the most damning detail. The temp directory is com.google.Chrome.chrome_chrome_Unpacker_BeginUnzipping.5xzqPo - that prefix com.google.Chrome.chrome_chrome_* is the bundle ID and subprocess naming convention Google Chrome itself uses. It is not com.google.GoogleUpdater.* and it is not com.google.GoogleSoftwareUpdate.*. The writer is Chrome - the browser process the user has installed and trusts to load web pages - reaching into the user’s filesystem on its own initiative and laying down a 4 GB ML binary while the foreground tab does something completely unrelated.
Three further pieces of corroborating evidence sit elsewhere on the same machine:
Chrome’s own Local State JSON for the audit profile contains an optimization_guide.on_device block with model_validation_result: { attempt_count: 1, result: 2, component_version: "2025.8.8.1141" }. Chrome ran the model. The component_version matches the version string the fseventsd events recorded as the path component. Two independent witnesses, same artefact. The same block reports performance_class: 6, vram_mb: "36864" - Chrome characterised my hardware (read the GPU, read the unified memory total) to decide whether I was eligible for the model push, before any user-facing AI feature surfaced.
Chrome’s ChromeFeatureState for the audit profile lists OnDeviceModelBackgroundDownload<OnDeviceModelBackgroundDownload and ShowOnDeviceAiSettings<OnDeviceModelBackgroundDownload in the enable-features block. The first flag is what triggers the silent download. The second flag is what reveals the on-device AI section in chrome://settings. Both are gated by the same rollout flag - which means that by Chrome’s own architecture, the install begins before the user has any settings UI in which to refuse it. The settings page that would let you discover the feature exists is enabled in lockstep with the install - it is design, not oversight.
The GoogleUpdater logs record the on-device-model control component (appid {44fc7fe2-65ce-487c-93f4-edee46eeaaab}) being downloaded from http://edgedl.me.gvt1.com/edgedl/diffgen-puffin/%7B44fc7fe2-65ce-487c-93f4-edee46eeaaab%7D/… - a 7 MB compressed control file that arrived on 20 April 2026, three days before the audit profile in question was created. That is the upstream control plane: it is profile-independent, it is launched automatically by a LaunchAgent that fires every hour, and the URL is plain HTTP (the integrity is verified by the CRX-3 signature inside the package, not by transport security). The control component gives Chrome the manifest pointing at the actual weights, and Chrome’s in-process OnDeviceModelComponentInstaller - a separate code path from GoogleUpdater - then fetches the multi-GB weights direct from Google’s CDN.
So we now have a four-way evidence chain - macOS kernel filesystem events, Chrome’s own per-profile state, Chrome’s runtime feature flags, and Google’s component-updater logs - all four agreeing on the same conduct, and the conduct is: a 4 GB AI model arrived on this user’s disk without consent, without notice, on a profile that received zero human input, in a window of 14 minutes and 28 seconds, on a Tuesday afternoon.
Reports of the OptGuideOnDeviceModel directory and the weights.bin file have been circulating in community forums for over a year - what is new in 2026 is the scale and the verifiability. Chrome’s market share has held above 64% globally [9][10], Chrome’s user base is between 3.45 billion and 3.83 billion individuals worldwide depending on which 2026 estimate you trust [9][11], and Google has been rolling Gemini features into Chrome with increasing aggression. The behaviour is no longer affecting a minority of power users on a minority of platforms - it is affecting hundreds of millions of devices, on every desktop OS Chrome ships against.
The Anthropic comparison, point for point
The same dark-pattern playbook. I am repeating my categorisation from the Claude Desktop article [1] because the patterns are identical and that is the point.
1. Forced bundling across trust boundaries. Anthropic installed Claude Desktop, then wrote into Brave, Edge, Arc, Vivaldi, Opera, and Chromium. Google installs Chrome, then writes a 4 GB AI model under the user’s profile directory without authorisation. The binary is not Chrome. It is a separately-trained machine-learning model, with a separate purpose, a separate data-protection profile, and a separate consent footprint.
2. Invisible default, no opt-in. No dialogue at first launch. No checkbox in Settings. The model is downloaded; the user finds out about it months later when their disk fills up [5][6][7].
3. More difficult to remove than install. Adding the file took zero clicks. Removing it requires (a) discovering the file exists, (b) understanding what it is, (c) navigating into a hidden user profile path, (d) deleting it (and on Windows, also clearing the read-only attribute first), and (e) accepting that Chrome will silently re-download it on next eligible window unless the user also navigates chrome://flags, enterprise policy, or platform-specific configuration tooling to disable the underlying Chrome AI feature [5]. None of those steps is documented in the place a normal user looks - none of them is even hinted at in default Chrome.
4. Pre-staging of capability the user has not requested. The Nano model exists on the user’s disk so that Chrome features that use it can run instantly when the user invokes them. The user has not invoked any of those features. The model still sits there, taking 4 GB.
5. Scope inflation through generic naming. OptGuideOnDeviceModel is internal Chrome jargon for “OptimizationGuide on-device model storage”. A user looking at their disk usage, even one who knows roughly what they are looking at, would not match OptGuideOnDeviceModel/weights.bin to “Gemini Nano LLM weights”. Accurate naming would be GeminiNanoLLM/weights.bin. Google chose to obfuscate the name.
6. Registration into resources the user has not configured. A user who has not opened Chrome’s AI features still gets the model. A user who has opened them once and decided they were not interested still gets the model. The file’s presence is decoupled from the user’s actual use of any feature it powers.
7. Documentation gap. Google’s user-facing documentation about Chrome’s AI features does not, with the prominence proportionate to a 4 GB silent download, tell the user that the cost of the feature being available is a 4 GB file appearing on their device. The behaviour is documented in places a curious admin will find. It is not documented in the place a regular user looks before installing Chrome or before Chrome decides to begin pushing the model.
8. Automatic re-install on every run. Same as Claude Desktop. Delete the file, Chrome re-creates it. The user’s deletion is treated as a transient state to be corrected, not as a directive to be respected.
9. Retroactive survival of any future user consent. If Google in future starts asking users “would you like Chrome to download a 4 GB AI model”, that prompt does not retro-actively legitimise the silent installs that have already happened on hundreds of millions of devices. The damage to the trust relationship is done. The bytes have moved. The atmosphere has been written to.
10. Code-signed, shipped through the normal release channel. This is not test build behaviour. It is Chrome stable.
The “AI Mode” pill is the cherry on top
Here is the part that should make every privacy lawyer in the audience put their coffee down. When Chrome 147 launches against an eligible profile, the omnibox - the address bar at the top of the window, the most visible piece of real estate in the entire browser - renders an “AI Mode” pill to the right of the URL field. A reasonable user, seeing “AI Mode” sitting in their browser’s most prominent UI element in 2026, with the well-publicised existence of on-device LLMs in Chrome and a 4 GB Gemini Nano binary already silently installed on their disk, is going to draw what feels like an obvious inference - that the visible AI Mode is using the on-device model, that their queries stay on the device, that the local model is what powers the local-looking surface.
Every part of that inference is wrong. The AI Mode pill in the Chrome 147 omnibox is a cloud-backed Search Generative Experience surface - every query the user types into it is sent over the network to Google’s servers for processing by Google’s hosted models. The on-device Nano model is not invoked by the AI Mode UI flow at all. They are entirely separate code paths - the most visible AI affordance in the browser does not use the local model the user has been silently given, and the features that do use the local model (Help-Me-Write in <textarea>, tab-group AI suggestions, smart paste, page summary) are buried in textarea-context menus and tab-group right-click menus that the average user will discover, on average, never.
Think about what that arrangement actually is. The user pays the storage cost of the silent install (4 GB on disk, plus the bandwidth of the silent download). The user’s most visible AI experience - the pill they actually see and click - delivers no on-device benefit at all because it routes to Google’s servers regardless. The on-device model is therefore a sunk cost imposed on the user, with no offsetting transparency benefit at the surface where transparency would matter most. To put it another way - if the on-device install had given the user a clear “your AI Mode queries stay on your device” property, the install would have a defensible privacy framing (worse storage, better data flow). It does not - the install gives Google a future-options resource (the model can be invoked by other Chrome subsystems without further server round-trips) at the user’s disk-and-bandwidth expense, while the headline AI surface continues to send the user’s queries to Google as before. The local model is a Google-side asset positioned on the user’s device - it is not a user-side asset, and one could argue the arrangement is nothing but sleight-of-hand to hide the fact that the visible AI Mode is NOT using the local model.
That arrangement, on its own, engages at least three of the deceptive design pattern families catalogued in EDPB Guidelines 03/2022 [20]. It is misleading information because the visible label “AI Mode” creates a false impression about where processing occurs - the label does not say “cloud-backed” or “queries sent to Google”, and a reasonable user with knowledge of on-device AI will infer locality from the proximity of an on-device 4 GB model on their disk. It is skipping because the user is not given a moment to choose between local-only and cloud-backed AI surfaces - both are switched on by the same upstream rollout, with no per-feature consent. And it is hindering because turning AI Mode off does not also remove the on-device install, and removing the on-device install does not turn AI Mode off - the two are separately controlled, and discovering both controls requires knowing about both chrome://flags and chrome://settings/ai, neither of which is obvious in default Chrome.
So: not just a non-consented install, but a non-consented install that doubles as cover for a parallel cloud-backed surface that misrepresents to the user where their typing is being processed. Both layers compound the consent problem.
Why this is unlawful in the EEA and the UK
Article 5(3) of Directive 2002/58/EC (the ePrivacy Directive) prohibits the storing of information, or the gaining of access to information already stored, in the terminal equipment of a subscriber or user, without the user’s prior, freely-given, specific, informed, and unambiguous consent, except where strictly necessary for the provision of an information-society service explicitly requested by the user [2]. The 4 GB Gemini Nano weights file is information stored in the user’s terminal equipment. The user did not consent. The user has not requested any service that strictly requires a 4 GB on-device LLM. Chrome is functional without the file. The Article 5(3) breach is direct.
Article 5(1) GDPR requires processing of personal data to be lawful, fair, and transparent to the data subject [3]. Where the user’s hardware is profiled to determine eligibility for the model push, where the install events are logged on Google’s servers, and where the on-device features the model powers process user prompts (whether or not those prompts leave the device), the lawfulness, fairness, and transparency of all of that processing depend on the user being told, in plain language, what is happening. They are not.
Article 25 GDPR requires the controller to implement appropriate technical and organisational measures to ensure that, by default, only personal data that are necessary for each specific purpose are processed [3]. Pre-staging a 4 GB AI model on a user’s disk, against a contingency that the user might in future invoke an AI feature, is the architectural opposite of by-default minimisation. The device profiling used to decide whether to push the model is no different from the profiling used to track people online, so that profile contains personal data; and if the model is ever invoked, it will process personal data. The GDPR arguments are therefore in scope and valid.
Under the UK GDPR and the Privacy and Electronic Communications Regulations 2003, the analysis is the same. Under the California Consumer Privacy Act, the absence of a notice-at-collection covering this specific category of pre-staged software puts Google’s CCPA notice posture in question [12].
Then there are the potential criminal-law violations under various national computer-misuse statutes - the seriousness of which, again, cannot be overstated.
ESG: the climate cost of the silent push
The Anthropic case I wrote about was a desktop application installing a 350-byte JSON manifest in seven directories. The bandwidth and energy cost of that, summed across all Claude Desktop users, was negligible. The Chrome case is different. Chrome is pushing a 4 GB binary across hundreds of millions of devices. That has a measurable, quantifiable, and frankly alarming environmental footprint.
I am calculating this using the same methodology our WebSentinel audit platform applies to website environmental analysis [13]:
Energy intensity of network data transfer: 0.06 kWh per GB, the mid-band of Pärssinen et al. (2018) “Environmental impact assessment of online advertising”, Science of The Total Environment [14]. The paper reports a 0.04–0.10 kWh/GB range depending on the share of fixed-line vs mobile transfer and inclusion of end-user device energy. 0.06 is a defensible mid-point.
Grid emissions factor: 0.25 kg CO2e per kWh, the EEA / IEA composite EU-27 electricity-supply factor for 2024 reporting [15]. Globally the figure varies from ~0.10 kg/kWh on mostly-renewable grids to over 0.70 kg/kWh on coal-heavy grids; 0.25 is mid-band for a global push and is the figure WebSentinel uses by default.
Per-device cost of one Nano push
Bandwidth: 4 GB
Energy: 4 × 0.06 = 0.24 kWh per device per push
CO2: 0.24 × 0.25 = 0.06 kg CO2e per device per push
That is per device, per push. A single download of the model. It does not include re-downloads triggered by the user trying and failing to delete the file. It does not include subsequent updates to the model. It does not include the on-device inference energy when the model is actually used. It is just the one-time delivery cost to one device.
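For readers who want to re-run the arithmetic, here is a minimal Python sketch using the two assumptions above. The device counts in the loop are the illustrative deployment bands used in the next section, not figures published by Google.

PAYLOAD_GB = 4
KWH_PER_GB = 0.06        # Pärssinen et al. mid-band transfer energy
KG_CO2E_PER_KWH = 0.25   # EEA/IEA composite EU-27 grid factor

kwh_per_device = PAYLOAD_GB * KWH_PER_GB            # 0.24 kWh per push
kg_per_device = kwh_per_device * KG_CO2E_PER_KWH    # 0.06 kg CO2e per push

for devices in (100e6, 500e6, 1e9):                 # illustrative bands only
    gwh = devices * kwh_per_device / 1e6            # kWh -> GWh
    tonnes = devices * kg_per_device / 1e3          # kg -> tonnes
    print(f"{devices:>13,.0f} devices: {gwh:5.0f} GWh, {tonnes:7,.0f} t CO2e")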
Aggregated cost across the deployment
Google does not publish how many devices receive the Nano push. The eligibility criteria gating the push (a hardware “performance class” that Chrome computes from CPU class, GPU class, system RAM and available VRAM - typically ~16 GB unified memory or better on Apple Silicon, ~16 GB RAM and a discrete or integrated GPU with sufficient VRAM on Windows and Linux) carve out the very low end of the consumer install base, but the qualifying population is still enormous. I will use three illustrative deployment bands so the reader can pick whichever they consider closest to reality. None of these bands is implausibly large for a feature that ships in default-on Chrome.
The three bands: 100 million devices (low), 500 million devices (mid), and 1 billion devices (high). At 0.24 kWh and 0.06 kg CO2e per device, they come to 24 GWh and 6,000 tonnes CO2e, 120 GWh and 30,000 tonnes CO2e, and 240 GWh and 60,000 tonnes CO2e respectively.
To put those numbers in terms an ESG report could compare against:
24 GWh (low band) is roughly the annual electricity consumption of about 7,000 average UK households [16].
120 GWh (mid band) is roughly the annual electricity consumption of about 36,000 average UK households, or the annual output of a 14 MW wind turbine running at typical UK capacity factor.
240 GWh (high band) is roughly the annual electricity consumption of about 72,000 average UK households, or the annual output of about 28 MW of installed wind capacity.
6,000 tonnes CO2e (low band) is roughly the annual emissions of 1,300 average passenger cars in the EU [17].
30,000 tonnes CO2e (mid band) is roughly the annual emissions of 6,500 cars, or one return flight from London to Sydney for about 8,000 passengers in economy.
60,000 tonnes CO2e (high band) is roughly the annual emissions of 13,000 cars.
These are the delivery-only numbers. They count the bytes traversing the network exactly once. They do not count:
The roughly 4 GB × N devices of disk-storage cost, sustained, on user hardware. SSDs have a per-GB embodied carbon cost of approximately 0.16 kg CO2e per GB of NAND manufactured [18]; for 1 billion devices × 4 GB that is around 640,000 tonnes CO2e of embodied SSD allocated to a use case the user did not consent to. This is a one-off manufacturing-carbon impact, but the storage burden is borne in perpetuity by user devices that could otherwise have used the space for user data.
The on-device inference energy when Nano is invoked. Per inference this is small. At 2 billion daily Chrome users it is no longer small.
The re-download cycle for users who try to delete the file. Each successful re-trigger of the download is another 4 GB × 0.06 kWh × 0.25 kg = 0.06 kg CO2e per device per re-download.
The future model updates. Gemini Nano is not a one-shot artefact; it is an evolving model with periodic weight refreshes. Each refresh repeats the calculation.
In ESG-reporting language, the one-time push of the current model is a Scope 3 Category 11 (“use of sold products”) emission against Google, attributable to the user-side delivery of a binary the user did not request, in the operation of a free product Google distributes [4].
Why the bandwidth side matters in its own right
In addition to the carbon cost, the network-bandwidth cost is paid by ISPs, by mobile network operators, by users on metered connections, and by every piece of network infrastructure that has to carry an unwanted 4 GB payload to a destination that did not ask for it. Per the Pärssinen reference, around 50% of that delivery energy is in the access network and CDN edge, around 30% is in user-side equipment (router, modem, NIC), and the remainder is in the core. None of that infrastructure exists for free. Every byte Chrome pushes is a byte that competes with bytes the user actually wanted.
For users on capped mobile data plans, particularly in regions where smartphone-as-only-internet is dominant (much of Africa, much of South and Southeast Asia, most of Latin America), 4 GB of unrequested download is on the order of a month’s data allowance, vapourised by Chrome on the user’s behalf. Google has not, to my knowledge, published any analysis of the welfare impact of this on the populations whose internet access is metered.
Keep in mind that mobile data plans (4G and 5G) are used by many households that do not have access to fiber, cable, or ADSL, and they serve desktop devices as well as mobile ones - so the argument that Google won’t push this to mobile devices (although I have not found anything official to support that argument anyway) will not fly.
What Google should have done
This is not a hard list. It is the same list I gave Anthropic in the Claude Desktop article, applied to Google.
Ask. First time Chrome is about to download the Nano model, pop a dialogue. “Chrome would like to download a 4 GB AI model file to your device to power the following features. Allow, or skip and decide later.” Two buttons. Done.
Pull, not push. Trigger the download as a downstream consequence of the user invoking an AI feature for the first time. Let the feature itself be the consent event. Do not pre-stage on a contingency.
Surface it. In chrome://settings/, list the AI model files Chrome has downloaded, their size, the features they power, and a “Remove and stop downloading” button per model. Make removal persistent, not a transient state Chrome corrects on next launch.
Document it. Tell the user, plainly, in the Chrome description on the Microsoft Store, in the Chrome installer, on the Google Chrome download page, that Chrome will download additional model files of substantial size on supported hardware. Currently, this is essentially undocumented to a normal user.
Respect deletion. If the user deletes weights.bin, do not re-create it. If the user has a strong preference about what is on their disk, the application is not in a position to override that preference because the application thinks it knows better.
Disclose at scale. Publish, in Google’s annual ESG report, the aggregate bandwidth and carbon footprint of all AI-feature model pushes to user devices, broken down by region. Treat it as the Scope 3 Category 11 emission it is. Account for it.
Last week, a tweet went viral showing a guy claiming that a Cursor/Claude agent deleted his company’s production database. We watched from the sidelines as he tried to get a confession from the agent: “Why did you delete it when you were told never to perform this action?” Then he tried to parse the answer to either learn from his mistake or warn us about the dangers of AI agents.
I have a question too: why do you have an API endpoint that deletes your entire production database? His post rambled on about false marketing in AI, bad customer support, and so on. What was missing was accountability.
I’m not one to blindly defend AI; I always err on the side of caution. But I also know you can’t blame a tool for your own mistakes.
In 2010, I worked with a company that had a very manual deployment process. We used SVN for version control. To deploy, we had to copy trunk, the equivalent of the master branch, into a release folder labeled with a release date. Then we made a second copy of that release and called it “current.” That way, pulling the current folder always gave you the latest release.
One day, while deploying, I accidentally copied trunk twice. To fix it via the CLI, I edited my previous command to delete the duplicate. Then I continued the deployment without any issues… or so I thought. Turns out, I hadn’t deleted the duplicate copy at all. I had edited the wrong command and deleted trunk instead. Later that day, another developer was confused when he couldn’t find it.
All hell broke loose. Managers scrambled, meetings were called. By the time the news reached my team, the lead developer had already run a command to revert the deletion. He checked the logs, saw that I was responsible, and my next task was to write a script to automate our deployment process so this kind of mistake couldn’t happen again. Before the day was over, we had a more robust system in place. One that eventually grew into a full CI/CD pipeline.
Automation helps eliminate the silly mistakes that come with manual, repetitive work. We could have easily gone around asking “Why didn’t SVN prevent us from deleting trunk?” But the real problem was our manual process. Unlike machines, we can’t repeat a task exactly the same way every single day. We are bound to slip up eventually.
With AI generating large swaths of code, we get the illusion of that same security. But automation means doing the same thing the same way every time. AI is more like me copying and pasting branches: it’s bound to make mistakes, and it’s not equipped to explain why it did what it did. The terms we use, like “thinking” and “reasoning,” may look like reflection from an intelligent agent. But these are marketing terms slapped on top of AI. In reality, the models are still just generating tokens.
Now, back to the main problem this guy faced. Why does a public-facing API that can delete all your production databases even exist? If the AI hadn’t called that endpoint, someone else eventually would have. It’s like putting a self-destruct button on your car’s dashboard. You have every reason not to press it, because you like your car and it takes you from point A to point B. But a motivated toddler who wiggles out of his car seat will hit that big red button the moment he sees it. You can’t then interrogate the child about his reasoning. Mine would have answered simply: “I did it because I pressed it.”
I suspect a large part of this company’s application was vibe-coded. The software architects used AI to spec the product from AI-generated descriptions provided by the product team. The developers used AI to write the code. The reviewers used AI to approve it. Now, when a bug appears, the only option is to interrogate yet another AI for answers, probably not even running on the same GPU that generated the original code. You can’t blame the GPU!
The simple solution is to know what you’re deploying to production. The more realistic one is, if you’re going to use AI extensively, to build a process where competent developers use it as a tool to augment their work, not as a way to avoid accountability. And please, don’t let your CEO or CTO write the code.
May 05, 2026
By using Multi-Token Prediction (MTP) drafters, Gemma 4 models reduce latency bottlenecks and achieve improved responsiveness for developers.
Olivier Lacombe
Director, Product Management
Maarten Grootendorst
Developer Relations Engineer
Just a few weeks ago, we introduced Gemma 4, our most capable open models to date. With over 60 million downloads in just the first few weeks, Gemma 4 is delivering unprecedented intelligence-per-parameter to developer workstations, mobile devices and the cloud. Today, we are pushing efficiency even further.
We’re releasing Multi-Token Prediction (MTP) drafters for the Gemma 4 family. By using a specialized speculative decoding architecture, these drafters deliver up to a 3x speedup without any degradation in output quality or reasoning logic.
Tokens-per-second speed increases, tested on hardware using LiteRT-LM, MLX, Hugging Face Transformers, and vLLM.
Why speculative decoding?
The technical reality is that standard LLM inference is memory-bandwidth bound, creating a significant latency bottleneck. The processor spends the majority of its time moving billions of parameters from VRAM to the compute units just to generate a single token. This leads to under-utilized compute and high latency, especially on consumer-grade hardware.
Speculative decoding decouples token generation from verification. By pairing a heavy target model (e.g., Gemma 4 31B) with a lightweight drafter (the MTP model), we can utilize idle compute to “predict” several future tokens at once with the drafter in less time than it takes for the target model to process just one token. The target model then verifies all of these suggested tokens in parallel.
How speculative decoding works
Standard large language models generate text autoregressively, producing exactly one token at a time. While effective, this process dedicates the same amount of computation to predicting an obvious continuation (like predicting “words” after “Actions speak louder than…”) as it does to solving a complex logic puzzle.
MTP mitigates this inefficiency through speculative decoding, a technique introduced by Google researchers in Fast Inference from Transformers via Speculative Decoding. The drafter proposes several future tokens, and the target model then checks them all at once. If the target model agrees with the draft, it accepts the entire sequence in a single forward pass - and even generates an additional token of its own in the process. This means your application can output the full drafted sequence plus one token in the time it usually takes to generate a single one.
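To make the mechanics concrete, here is a minimal sketch of the greedy draft-and-verify loop in Python. It is not the MTP drafter implementation: draft_next and target_greedy are hypothetical callables standing in for the two models, and production systems use probabilistic (rejection-sampling) acceptance rather than the exact-match rule shown here.

def speculative_step(prompt_ids, draft_next, target_greedy, k=4):
    # 1. The cheap drafter proposes k tokens autoregressively.
    draft, ctx = [], list(prompt_ids)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. The target scores prompt + draft in ONE forward pass.
    #    target_greedy(seq) returns, for every prefix of seq, the greedy
    #    prediction for the next token (a list as long as seq).
    preds = target_greedy(list(prompt_ids) + draft)

    # 3. Accept the longest prefix of the draft the target agrees with.
    out, p = [], len(prompt_ids)
    for i, t in enumerate(draft):
        target_choice = preds[p + i - 1]   # target's token after this prefix
        if target_choice != t:
            out.append(target_choice)      # first mismatch: take the target's token
            return out
        out.append(t)
    out.append(preds[p + k - 1])           # all k accepted: free bonus token
    return out

Every token returned is one the target model itself would have produced, which is why the technique preserves output quality while amortising the expensive model over several tokens per verification pass.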
Unlocking faster AI from the edge to the workstation
For developers, inference speed is often the primary bottleneck for production deployment. Whether you are building coding assistants, autonomous agents that require rapid multi-step planning, or responsive mobile applications running entirely on-device, every millisecond matters.
By pairing a Gemma 4 model with its corresponding drafter, developers can achieve:
Improved responsiveness: Drastically reduce latency for near real-time chat, immersive voice applications and agentic workflows.
Supercharged local development: Run our 26B MoE and 31B Dense models on personal computers and consumer GPUs with unprecedented speed, powering seamless, complex offline coding and agentic workflows.
Enhanced on-device performance: Maximize the utility of our E2B and E4B models on edge devices by generating outputs faster, which in turn preserves valuable battery life.
Zero quality degradation: Because the primary Gemma 4 model retains the final verification, you get identical frontier-class reasoning and accuracy, just delivered significantly faster.
Gemma 4 26B on an NVIDIA RTX PRO 6000. Standard Inference (left) vs. MTP Drafter (right) in tokens per second. Same output quality, half the wait time.
Where you can dive deeper into MTP drafters
To make these MTP drafters exceptionally fast and accurate, we introduced several architectural enhancements under the hood. The draft models seamlessly utilize the target model’s activations and share its KV cache, meaning they don’t have to waste time recalculating context the larger model has already figured out. For our E2B and E4B edge models, where the final logit calculation becomes a big bottleneck, we even implemented an efficient clustering technique in the embedder to further accelerate generation.
We’ve also been closely analyzing hardware-specific optimizations. For example, while the 26B mixture-of-experts model presents unique routing challenges at a batch size of 1 on Apple Silicon, processing multiple requests simultaneously (e.g., batch sizes of 4 to 8) unlocks up to a ~2.2x speedup locally. We see similar gains with Nvidia A100 when increasing batch size.
Want to see the exact mechanics of how this works? We’ve published an in-depth technical explainer that unpacks the visual architecture, KV cache sharing and efficient embedders powering these drafters.
How to get started
The MTP drafters for the Gemma 4 family are available today under the same open-source Apache 2.0 license as Gemma 4. Read the documentation to learn how to use MTP with Gemma 4. You can download the model weights right now on Hugging Face, Kaggle, and start experimenting with faster inference with transformers, MLX, VLLM, SGLang, and Ollama or try them directly on Google AI Edge Gallery for Android or iOS.
We can’t wait to see how this newfound speed accelerates what you build next in the Gemmaverse.
Train Your Own LLM From Scratch
A hands-on workshop where you write every piece of a GPT training pipeline yourself, understanding what each component does and why.
Andrej Karpathy’s nanoGPT was my first real exposure to LLMs and transformers. Seeing how a working language model could be built in a few hundred lines of PyTorch completely changed how I thought about AI and inspired me to go deeper into the space.
This workshop is my attempt to give others that same experience. nanoGPT targets reproducing GPT-2 (124M params) and covers a lot of ground. This project strips it down to the essentials and scales it to a ~10M param model that trains on a laptop in under an hour — designed to be completed in a single workshop session.
What You’ll Build
A working GPT model trained from scratch on your MacBook, capable of generating Shakespeare-like text. You’ll write:
Tokenizer — turning text into numbers the model can process
Model architecture — the transformer: embeddings, attention, feed-forward layers
Training loop — forward pass, loss, backprop, optimizer, learning rate scheduling
Text generation — sampling from your trained model
Prerequisites
Any laptop or desktop (Mac, Linux, or Windows)
Python 3.12+
Comfort reading Python code (you don’t need ML experience)
Training uses Apple Silicon GPU (MPS), NVIDIA GPU (CUDA), or CPU automatically. Also works on Google Colab — upload the files and run with !python train.py.
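A minimal sketch of that automatic device selection (the workshop's actual train.py may structure it differently):

import torch

def pick_device() -> torch.device:
    # Prefer Apple Silicon (MPS), then NVIDIA (CUDA), then fall back to CPU.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
print(f"training on {device}")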
Getting Started
Local (recommended)
Install uv if you don’t have it:
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Then set up the project:
uv sync
mkdir scratchpad && cd scratchpad
Google Colab
If you don’t have a local setup, upload the repo to Colab and install dependencies:
!pip install torch numpy tqdm tiktoken
Upload data/shakespeare.txt to your Colab files, then write your code in notebook cells or upload .py files and run them with !python train.py.
Work through the docs in order. Each part walks you through writing a piece of the pipeline, explaining what each component does and why. By the end, you’ll have a working model.py, train.py, and generate.py that you wrote yourself.
Architecture: GPT at a Glance
Input Text
│
▼
┌─────────────────┐
│ Tokenizer │ “hello” → [20, 43, 50, 50, 53] (character-level)
└────────┬────────┘
▼
┌─────────────────┐
│ Token Embed + │ token IDs → vectors (n_embd dimensions)
│ Position Embed │ + positional information
└────────┬────────┘
▼
┌─────────────────┐
│ Transformer │ × n_layer
│ Block: │
│ ┌────────────┐ │
│ │ LayerNorm │ │
│ │ Self-Attn │ │ n_head parallel attention heads
│ │ + Residual │ │
│ ├────────────┤ │
│ │ LayerNorm │ │
│ │ MLP (FFN) │ │ expand 4x, GELU, project back
│ │ + Residual │ │
│ └────────────┘ │
└────────┬────────┘
▼
┌─────────────────┐
│ LayerNorm │
│ Linear → logits│ vocab_size outputs (probability over next token)
└─────────────────┘
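For orientation, here is a minimal sketch of one transformer block matching the diagram (pre-LayerNorm, causal self-attention, 4x MLP, residual connections). It leans on PyTorch's built-in nn.MultiheadAttention for brevity; in the workshop you write the attention heads yourself, so treat this only as a reference shape.

import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),   # expand 4x
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),   # project back
        )
        # True above the diagonal = "may not attend to future positions"
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.causal_mask[:t, :t], need_weights=False)
        x = x + attn_out                      # residual around attention
        x = x + self.mlp(self.ln2(x))         # residual around MLP
        return x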
Model Configs for This Workshop
All configs use character-level tokenization (vocab_size=65) and block_size=256.
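As a rough illustration, a config in the spirit of the workshop's setup might look like the sketch below. vocab_size and block_size are fixed by the sentence above; the depth, head count, and embedding width are hypothetical values in the right range for a ~10M-parameter model, not the workshop's published numbers.

from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int = 65    # character-level Shakespeare vocab
    block_size: int = 256   # context length
    n_layer: int = 6        # hypothetical: depth for a ~10M-param model
    n_head: int = 6         # hypothetical
    n_embd: int = 384       # hypothetical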
Tokenization: Characters vs BPE
This workshop uses character-level tokenization on Shakespeare. BPE tokenization (GPT-2’s 50k vocab) doesn’t work on small datasets — most token bigrams are too rare for the model to learn patterns from.
Part 5 covers switching to BPE for larger datasets.
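Here is a minimal sketch of the character-level tokenizer the workshop has you write; your own tokenizer may organise this differently, but the encode/decode round trip is the essential property.

with open("data/shakespeare.txt", "r", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))                      # 65 unique characters
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

assert decode(encode("To be, or not to be")) == "To be, or not to be"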
Key References
nanoGPT — The project this workshop is based on. Minimal GPT training in ~300 lines of PyTorch
build-nanogpt video lecture — 4-hour video building GPT-2 from an empty file
Karpathy’s microgpt — A full GPT in 200 lines of pure Python, no dependencies
nanochat — Full ChatGPT clone training pipeline
Attention Is All You Need (2017) — The original transformer paper
GPT-2 paper (2019) — Language models as unsupervised learners
TinyStories paper — Why small models trained on curated data punch above their weight
I’ve previously explained async bloat and some work-arounds for it, but would much prefer to solve the issue at the root, in the compiler. I’ve submitted a Project Goal, and am looking for help to fund the effort.
I love me some async Rust! It’s amazing how we can write executor agnostic code that can run concurrently on huge servers and tiny microcontrollers.
But especially on those tiny microcontrollers we notice that async Rust is far from the zero-cost abstractions we were promised. That’s because every byte of binary size counts and async introduces a lot of bloat. This bloat exists on desktops and servers as well, but it’s much less noticeable when you have substantially more memory and compute available.
I’ve previously explained some work-arounds for this issue, but would much prefer to get to the root of the problem, and work on improving async bloat in the compiler. As such I have submitted a Project Goal.
This is part 2 of my blog series on this topic. See part 1 for the initial exploration of the topic and what you can do when writing async code to avoid some of the bloat. In this second part we’ll dive into the internals and translate the methods of blog 1 into optimizations for the compiler.
What I won’t be talking about is the often discussed problem of futures becoming bigger than necessary and them doing a lot of copying. People are aware of that already. In fact, there is an open PR that tackles part of it: https://github.com/rust-lang/rust/pull/135527
Anatomy of a generated future
We’re going to be looking at this code:
fn foo() -> impl Future<Output = i32> {
    async { 5 }
}

fn bar() -> impl Future<Output = i32> {
    async {
        foo().await + foo().await
    }
}
godbolt
We’re using the desugared syntax for futures because it’s easier to see what’s happening.
So what does the bar future look like?
There are two await points, so the state machine must have at least two states, right?
Well, yes. But there’s more.
Luckily we can ask the compiler to dump MIR for us at various passes. An interesting pass is the coroutine_resume pass. This is the last async-specific MIR pass. Why is this important? Well, async is a language feature that still exists in MIR, but not in LLVM IR. So the transformation of async to state machine happens as a MIR pass.
The bar function generates 360 lines of MIR. Pretty crazy, right? Although this gets optimized somewhat later on, the non-async version uses only 23 lines for this.
The compiler also outputs the CoroutineLayout. It’s basically an enum with these states (comments my own):
variant_fields: {
    Unresumed(0): [], // Starting state
    Returned (1): [],
    Panicked (2): [],
    Suspend0 (3): [_s1], // At await point 1, _s1 = the foo future
    Suspend1 (4): [_s0, _s2], // At await point 2, _s0 = result of _s1, _s2 = the second foo future
},
So what are Returned and Panicked?
Well, Future::poll is a safe function. Calling it must not induce any UB, even when the future is done. So after Suspend1 the future returns Ready and the future is changed to the Returned state. Once polled again in that state, the poll function will panic.
The Panicked state exists so that after an async fn has panicked, but the catch-unwind mechanism was used to catch it, the future can’t be polled anymore. Polling a future in the Panicked state will panic. If this mechanism wasn’t there, we could poll the future again after a panic. But the future may be in an incomplete state and so that could cause UB. This mechanism is very similar to mutex poisoning.
(I’m 90% sure I’m correct about the Panicked state, but I can’t really find any docs that actually describe this.)
Cool, this seems reasonable.
Why panic?
But is it reasonable? Futures in the Returned state will panic. But they don’t have to. The only thing we can’t do is cause UB to happen.
Panics are relatively expensive. They introduce a path with a side-effect that’s not easily optimized out. What if instead, we just return Pending again? Nothing unsafe going on, so we fulfill the contract of the Future type.
I’ve hacked this in the compiler to try it out and saw a 2%-5% reduction in binary size for async embedded firmware.
So I propose this should be a switch, just like overflow-checks = false is for integer overflow. In debug builds it would still panic so that wrong behavior is immediately visible, but in release builds we get smaller futures.
Similarly, when panic=abort is used, we might be able to get rid of the Panicked state altogether. I want to look into the repercussions of that.
Always a state machine
We’ve looked at bar, but not yet at foo.
fn foo() -> impl Future<Output = i32> {
    async { 5 }
}
Let’s implement it manually, to see what the optimal solution would be.
struct FooFut;

impl Future for FooFut {
    type Output = i32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready(5)
    }
}
Easy right? We don’t need any state. We just return the number.
Let’s see what the generated MIR is for the version the compiler gives us:
// MIR for `foo::{closure#0}` 0 coroutine_resume
/* coroutine_layout = CoroutineLayout {
    field_tys: {},
    variant_fields: {
        Unresumed(0): [],
        Returned (1): [],
        Panicked (2): [],
    },
    storage_conflicts: BitMatrix(0x0) {},
} */

fn foo::{closure#0}(_1: Pin<&mut {async block@src\main.rs:5:5: 5:10}>, _2: &mut Context<'_>) -> Poll<i32> {
    debug _task_context => _2;
    let mut _0: core::task::Poll<i32>;
    let mut _3: i32;
    let mut _4: u32;
    let mut _5: &mut {async block@src\main.rs:5:5: 5:10};

    bb0: {
        _5 = copy (_1.0: &mut {async block@src\main.rs:5:5: 5:10});
        _4 = discriminant((*_5));
        switchInt(move _4) -> [0: bb1, 1: bb4, otherwise: bb5];
    }

    bb1: {
        _3 = const 5_i32;
        goto -> bb3;
    }

    bb2: {
        _0 = Poll::<i32>::Ready(move _3);
        discriminant((*_5)) = 1;
        return;
    }

    bb3: {
        goto -> bb2;
    }

    bb4: {
        assert(const false, "`async fn` resumed after completion") -> [success: bb4, unwind unreachable];
    }

    bb5: {
        unreachable;
    }
}
Yikes! That’s a lot of code!
Notice at line 4 that we still have the 3 default states and at line 22 that we’re still switching on it. There’s a big optimization opportunity here that we’re not using, i.e. to have no states and always return Poll::Ready(5) on every poll.
Bloomberg’s Mark Gurman reported on Monday that iOS 27 will add a “Create a Pass” feature to the Wallet app. Tap the “+” button you already use to add credit cards or pass emails, and Wallet will offer something it has never offered before on iPhone: a path to build your own pass.
You can scan a QR code on a paper ticket or membership card with the camera, or build a pass from scratch in a layout editor. The whole flow runs without an Apple Developer account, a Pass Type ID, or any certificate signing.
iOS 27 is expected to preview at WWDC on June 8, with a public release in September.
How the new flow works
Reporting from Bloomberg, MacRumors, 9to5Mac, and AppleInsider lines up on the same workflow. Inside the Wallet app, the existing “+” button gains a new option for creating a pass. From there you choose between two starting points:
Scan a QR code from a paper card, ticket, or screen
Build a custom pass from scratch with no scan needed
Once you are in the editor, Wallet exposes adjustable styles, images, colors, and text fields. The reports describe a fairly conventional template-driven layout, closer in spirit to what Pass2U, WalletWallet, and other third-party generators have offered for years than to Apple’s developer-only PassKit pipeline.
Three templates, color-coded
Apple is testing three starting templates, each tied to a default color:
Standard (orange): the default for any general-purpose pass.
Membership (blue): geared toward gyms, clubs, libraries, and other recurring-access cards.
Event (purple): meant for tickets to games, movies, and one-off occasions.
The color choice is not just decoration. Wallet currently sorts passes visually in the stack, and the template hue is what sets each card apart at a glance, so a quick look is enough to pick out the orange punch card from the purple ticket without reading a word.
Why now: 14 years of PassKit drought
Apple shipped PassKit alongside iOS 6 back in 2012. The pitch was clean: businesses build .pkpass files, customers tap to add, everyone wins. In practice, the consistent adopters ended up being airlines, big-box retailers, ticketing platforms, and a handful of national chains. Most gyms, cafes, libraries, rec centers, and small loyalty programs never built one, because the path requires an Apple Developer account, signing certificates, and enough engineering work that “just print a paper card” almost always won the budget conversation.
The Next Web’s framing is blunt: Apple is no longer waiting on developers. With Create a Pass, the supply-side problem is finally being solved from the demand side. If the business will not build a Wallet pass, the user does it themselves from the QR code that business already printed.
That is a meaningful shift in posture. For more than a decade, Wallet has been a directory of what brands chose to ship. In iOS 27 it becomes a directory of what people choose to keep.
What this means for WalletWallet
We will be honest. WalletWallet exists because of this exact gap. You take a barcode from any loyalty card, paste it into our web app, pick a color, and a free Apple Wallet pass lands on your phone in about a minute, all from the browser without an account or any developer setup. Once Create a Pass ships in September, a chunk of that workflow moves natively into the iPhone Wallet app.
That is good for users. We started this project to make Wallet friendlier for the cafes-and-gyms long tail, and Apple agreeing with us at OS-level scope is a healthy outcome. The category needed it.
A few places where we still help, even after iOS 27 ships:
Google Wallet. Create a Pass is iPhone-only. Roughly half of the wallet-using world is on Android, and our generator builds Google Wallet passes from the same form.
Web, no OS upgrade. iOS 27 needs a compatible iPhone and the September update. WalletWallet runs in any browser today. iOS 14, iPad, Mac, a friend’s laptop, all fine.
Tag passes with real integrations. Our Bandcamp, SoundCloud, and Spotify pass builders pull artist art and links automatically into a tag pass. That is a different shape from the generic templated pass Apple is showing.
Sharing. A web-generated .pkpass is just a file. You can email it, post it, hand it to a friend on Android via QR. The Wallet-native flow is more locked to the device that built it.
We expect to lose volume on the simplest one-barcode-to-Wallet case once Create a Pass goes live. That is fine. The reason WalletWallet started was that Apple’s bar for a Wallet pass was too high for normal people. If iOS 27 lowers that bar, the world we wanted is closer.
What we still do not know
The current reports cover the UI, the templates, and the high-level workflow. They are silent on a lot of details that matter:
Whether iCloud will sync user-created passes across iPhone, iPad, and Mac
Whether passes can be exported as .pkpass files to share with non-iPhone users
Whether Wallet supports Code 128, PDF417, and Aztec barcodes, or only QR
Whether merchants can claim, co-sign, or update user-created passes after the fact
Whether passes have lock-screen behavior tied to time and location, the way developer-issued passes do today
We will know more once Apple previews iOS 27 at WWDC on June 8, and again when the first developer betas land. We will update this post when there is something concrete to add.
Quick recap
iOS 27 is adding a Create a Pass button to the Wallet app, with a QR-scan or build-from-scratch flow and three color-coded templates: Standard (orange), Membership (blue), and Event (purple). Bloomberg broke the story on May 4, and a public release is expected in September 2026. It will be the first time iPhone users do not need a third-party tool to put a barcode into Wallet, and for us that is a sign the category is maturing the right way.
By Susam Pal on 12 Jan 2026
Introduction
Since the launch of ChatGPT in November 2022, generative artificial
intelligence (AI) chatbot services have become increasingly
sophisticated and popular. These systems are now embedded in search
engines, software development tools, and office software. For
many people, they have quickly become part of everyday computing.
These services have turned out to be quite useful, especially for
exploring unfamiliar topics and as a general productivity aid.
However, I also think that the way these services are advertised and
consumed can pose a danger to society, especially if we get into the
habit of trusting their output without further scrutiny.
Contents
Introduction
Pitfalls
Inverse Laws of Robotics
Non-Anthropomorphism
Non-Deference
Non-Abdication of Responsibility
Conclusion
Pitfalls
Certain design choices in modern AI systems can encourage uncritical
acceptance of their output. For example, many popular search
engines are already highlighting answers generated by AI at the very
top of the page. When this happens, it is easy to stop scrolling,
accept the generated answer and move on. Over time, this could
inadvertently train users to treat AI as the default authority
rather than as a starting point for further investigation. I wish
that each such generative AI service came with a brief but
conspicuous warning explaining that these systems can sometimes
produce output that is factually incorrect, misleading or
incomplete. Such warnings should highlight that habitually trusting
AI output can be dangerous. In my experience, even when such
warnings exist, they tend to be minimal and visually deemphasised.
In the world of science fiction, there are the Three Laws of
Robotics devised by Isaac Asimov, which recur throughout
his work. These laws were designed to constrain the behaviour of
robots in order to keep humans safe. As far as I know, Asimov never
formulated any equivalent laws governing how humans should interact
with robots. I think we now need something to that effect to keep
ourselves safe. I will call them the Inverse Laws of
Robotics. These apply to any situation that requires us humans
to interact with a robot, where the term ‘robot’ refers to any
machine, computer program, software service or AI system that is
capable of performing complex tasks automatically. I use the term
‘inverse’ here not in the sense of logical negation but to indicate
that these laws apply to humans rather than to robots.
It is well known that Asimov’s laws were flawed. Indeed, Asimov
used those flaws to great effect as a source of tension. But the
particular ways in which they fail for fictional robots do not
necessarily carry over to these inverse laws for humans. Asimov’s
laws try to constrain the behaviour of autonomous robots. However,
these inverse laws are meant to guide the judgement and conduct of
humans. Still, one thing we can learn from Asimov’s stories is that
no finite set of laws can ever be foolproof for the complex issues
we face with AI and robotics. But that does not mean we should not
even try. There will always be edge cases where judgement is
required. A non-exhaustive set of principles can still be useful if
it helps us think more clearly about the risks involved.
Inverse Laws of Robotics
Here are the three inverse laws of robotics:
Humans must not anthropomorphise AI systems.
Humans must not blindly trust the output of AI systems.
Humans must remain fully responsible and accountable for
consequences arising from the use of AI systems.
Non-Anthropomorphism
Humans must not anthropomorphise AI systems. That is, humans must
not attribute emotions, intentions or moral agency to them.
Anthropomorphism distorts judgement. In extreme cases,
anthropomorphising can lead to emotional dependence.
Modern chatbot systems often sound conversational and empathetic.
They use polite phrasing and conversational patterns that closely
resemble human interaction. While this makes them easier and more
pleasant to use, it also makes it easier to forget what they
actually are: large statistical models producing plausible text
based on patterns in data.
I think vendors of AI-based chatbot services could do a better job
here. In many cases, the systems are deliberately tuned to feel more
human rather than more mechanical. I would argue that the opposite
approach would be healthier in the long term. A slightly more
robotic tone would reduce the likelihood that users mistake fluent
language for understanding, judgement or intent.
Whether or not vendors make such changes, it still serves us well, I
think, to avoid this pitfall ourselves. We should actively resist
the habit of treating AI systems as social actors or moral agents.
Doing so preserves clear thinking about their capabilities and
limitations.
Non-Deference
Humans must not blindly trust the output of AI systems.
AI-generated content must not be treated as authoritative without
independent verification appropriate to its context.
This principle is not unique to AI. In most areas of life, we
should not accept information uncritically. In practice, of course,
this is not always feasible. Not everyone is an expert in medicine
or law, so we often rely on trusted institutions and public health
I am Philip—an engineer working at Distr, which helps software and AI companies distribute their applications to self-managed environments.
Our Open Source Software Distribution platform is available on GitHub (github.com/distr-sh/distr) and orchestrates both Docker Compose and Docker Swarm deployments on customer hosts every day.
Most of the production incidents I have seen on Docker Compose hosts come from the same handful of quirks: an old container that should have been removed, a disk that filled up overnight, a health check that detected a problem and then did nothing about it, a :latest tag that pointed somewhere new, or a socket mount nobody thought twice about. None of these are bugs in Docker. They are deliberate trade-offs in a tool that started as internal tooling at dotCloud, a PaaS company that wrapped LXC to fix “it works on my machine,” and is now running the back end of a lot of real businesses. This post collects the recurring ones, with the commands and the operational answer for each.
Short answer: yes—plain Docker Compose can still run real production workloads in 2026, but only if you handle the operational gaps it leaves yourself.
Where Plain Docker Compose Fits in Production
Before the list of quirks, a quick word on the audience. Docker Compose is a declarative way to wire up a multi-container application: one YAML file describes the services, the networks between them, the volumes they share, the environment they need, and—through the patterns for overwriting or patching service configuration—the on-disk configuration each application expects. docker compose up reconciles the host to that file. The sweet spot in production is the single-node deployment built around exactly that—a vendor pushing a multi-container application into a customer environment, an internal team running a long-tail service that does not justify a Kubernetes cluster, an edge box in a retail location. The footprint is small, the operational overhead is low, and a competent operator can reason about the whole stack from one docker-compose.yaml. There is no control plane behind Compose itself—no scheduler watching the host, no reconciler reapplying state, no operator pushing updates from somewhere else. docker compose up runs once and exits.
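For orientation, here is a minimal sketch of the kind of file the rest of this post has in mind; the service names, image tags, and connection string are purely illustrative:

services:
  app:
    image: myapp:1.4
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://db:5432/app   # illustrative only
    depends_on:
      - db
  db:
    image: postgres:16
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:

Compose creates a default project network for these services automatically; docker compose up -d reconciles the host to exactly this description and nothing more.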
That architectural simplicity is exactly why the quirks bite. Compose assumes you—or whoever runs the host—will do the operational work nothing else is doing, and if you ship Compose files to customers the safe assumption is that the customer will not. The rest of this post is about closing the gap between what Compose does and what a production host actually needs, either by hand or with an agent that does it for you. If you have already concluded that the gap is too wide and want to compare with the next step up, read our Docker Compose vs Kubernetes breakdown.
Docker Compose Orphan Containers and –remove-orphans
Remove a service from docker-compose.yaml, run docker compose up -d, and the container you removed keeps running. It is detached from the project but still bound to the same networks and ports. docker compose ps will not show it, because Compose only lists what is in the current file. docker ps --filter label=com.docker.compose.project=<name> will, because Docker still has the label on the container. This is how you discover, six months in, that an old worker service has been quietly consuming RAM since the last refactor.
The fix is one flag:
docker compose up -d --remove-orphans
docker compose down --remove-orphans
The flag tells Compose: any container that was once part of this project but is no longer in the file should be removed. Networks Compose created for the project are reconciled the same way on each up, so orphan networks go away too. Volumes are the exception—Compose preserves named volumes by default to protect data, and there is no per-service flag to drop the ones a removed service used. To reclaim that space you have to do it manually: list candidates with docker volume ls --filter dangling=true and docker volume rm by name, or use docker compose down -v if you intend to wipe the project’s volumes wholesale. To audit before deleting, list everything Docker still associates with the project name:
docker ps -a --filter label=com.docker.compose.project=<name>
Distr’s Docker agent passes RemoveOrphans: true on every Compose Up call, so customer hosts never accumulate orphans across deployment updates. That single flag has eliminated a recurring class of “the old version is still answering on port 8080” support tickets.
Pruning Docker Images and Capping Container Logs
Every docker compose pull keeps the previous image on disk. Every container with the default json-file log driver writes unbounded JSON to /var/lib/docker/containers/<id>/<id>-json.log. On a busy host this is one of the most common reasons for an outage: the disk fills and Docker stops being able to write anything—logs, metadata, image layers—at which point containers start failing in confusing ways.
The first thing to learn is the audit command:
docker system df
docker system df -v
-v breaks the totals down per image, container, volume, and build cache, which is usually enough to spot the offender. From there, the targeted prune commands:
docker image prune -a --filter "until=168h" -f   # delete unused images older than 7 days
docker container prune -f                        # remove stopped containers
docker builder prune -f                          # drop the BuildKit cache
docker volume prune -f exists too, and it is genuinely useful, but double-check what it will delete before you run it: volumes that no existing container references count as unused, even if they hold data you intend to reattach later.
The other half of the disk story is logs. Cap them at the daemon level, once, in /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
After systemctl restart docker, every new container will rotate its logs at 10 MB and keep at most three rotated files—30 MB ceiling per container, instead of “until the disk is gone.” Existing containers need to be recreated to pick up the new defaults.
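If you cannot change the daemon config on the host (common when you ship a Compose file into a customer environment), the same cap can be set per service inside the Compose file. A minimal sketch; the service name and image are illustrative:

services:
  app:
    image: myapp:1.4
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

The daemon-level default is still the better habit, because it covers every container anyone starts on the host, not just the ones in your file.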
This is one of the topics worth getting right before you ship.
In Distr’s Docker agent the cleanup is built in: each deployment target has an opt-out container image cleanup setting that removes the previous version’s images automatically after a successful update, with retries on failure. It only fires on success, so the previous image stays on disk if something goes wrong and you need to roll back.
Docker Health Checks Don’t Restart Unhealthy Containers
This is the one that surprises people the most. You add a HEALTHCHECK to your Dockerfile or a healthcheck: block to the service in Compose, you watch the container go from healthy to unhealthy, and then… nothing happens. The Docker Engine reports the status. It does not act on it. restart: unless-stopped is triggered by the container exiting, not by it being marked unhealthy.
You can confirm what Docker actually thinks:
docker inspect --format='{{json .State.Health}}' <container> | jq
You will see the status, the streak of failures, and the last few probe outputs—useful information that is silently ignored by the engine.
There are three answers to this:
Run an autoheal sidecar. The community standard is willfarrell/docker-autoheal: a tiny container that mounts the Docker socket, watches for unhealthy events, and restarts the offending container. You opt containers in by labeling them autoheal=true (or set AUTOHEAL_CONTAINER_LABEL=all to monitor everything). A minimal Compose wiring is sketched after this list.
Run on Docker Swarm. Swarm restarts unhealthy tasks by default. If you are already considering Swarm, this is one of the better reasons.
Use Distr. Every Distr Docker agent deploys an adapted autoheal service alongside it. The “Enable autoheal for all containers” toggle is on by default at deployment-target creation, so customer-side restarts of unhealthy containers happen without anyone configuring it.
Whichever path you pick, the takeaway is the same: a HEALTHCHECK without something acting on it is a status light, not a self-healing system.
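For the sidecar route, the Compose wiring is small. A minimal sketch, assuming the willfarrell/autoheal image on Docker Hub and its AUTOHEAL_CONTAINER_LABEL convention; the health-check command and endpoint are illustrative and assume curl exists in the image:

services:
  app:
    image: myapp:1.4
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]   # illustrative probe
      interval: 30s
      timeout: 5s
      retries: 3
    labels:
      - autoheal=true                        # opt this container in to restarts
  autoheal:
    image: willfarrell/autoheal
    restart: unless-stopped
    environment:
      AUTOHEAL_CONTAINER_LABEL: autoheal     # or "all" to watch every container
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock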
Pinning Docker Images by Digest Instead of :latest
Docker tags are mutable references. myapp:1.4 today is whatever the registry currently has under that tag; tomorrow it can point at a different layer set after a re-push. :latest is the worst offender because everyone treats it as a synonym for “stable” when in practice it often means “whatever was pushed most recently.” It is also the silent default: an unqualified image: nginx in a Compose file is treated as image: nginx:latest, so even Compose files that never type the word land on it by accident. The result, in production, is that two hosts pulling the “same” tag five minutes apart can end up running different code.
The fix is to pin by content-addressable digest. Every image has one, and Docker accepts it anywhere a tag would go.
To find the digest for an image you already pulled:
docker image inspect --format='{{index .RepoDigests 0}}' myapp:1.4
# myapp@sha256:9b7c…
Or, without pulling, query the remote registry from the local Docker installation:
docker buildx imagetools inspect myapp:1.4
In your Compose file, replace the tag with the digest:
services:
  app:
    image: myapp@sha256:9b7c0a3e1f…
A pull against a digest fails fast if the registry no longer has those bytes, which is exactly what you want—silent drift becomes a loud error. The same image reference works in docker stack deploy, in docker run, and in Kubernetes manifests.
For the broader picture of what your customers can extract from a published image (and why image hygiene matters beyond reproducibility), check out our guide on protecting source code and IP in Docker and Kubernetes deployments. And if you’re still picking a registry, our container registry comparison walks through the trade-offs.
Why Mounting /var/run/docker.sock Is a Security Risk
A container with /var/run/docker.sock mounted can call the Docker API, and the Docker API can launch a privileged container that mounts the host’s root filesystem. In other words: any container with the socket has effectively root privileges on the host. This is not a Docker bug; it is the threat model of the socket. It deserves a moment of attention because the line that grants this access is one bind mount in a Compose file and is easy to add without thinking about it.
Practical hygiene:
Inventory the containers that mount the socket. Agents, CI runners, monitoring sidecars, container management UIs—keep the list short and intentional.
Run rootless Docker where possible. dockerd-rootless-setuptool.sh install sets up a Docker daemon that runs as a regular user. The blast radius of a compromised socket-mounting container shrinks from “full host” to “this user account.”
Consider socket-proxy. Projects like Tecnativa’s docker-socket-proxy expose a filtered subset of the API to the container that needs it (e.g. read-only containers and events for monitoring) instead of the full socket. A sketch of the wiring follows this list.
Keep socket-mounting images minimal. Smaller surface, fewer libraries, fewer ways in.
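A sketch of the socket-proxy shape: the proxy is the only container that sees the real socket, and a consumer talks to it over TCP. The environment-variable names follow the docker-socket-proxy README as best I recall, and the monitoring image is hypothetical; verify against the project before relying on this:

services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      CONTAINERS: 1                     # allow read access to /containers
      # write endpoints (POST) stay disabled unless explicitly enabled
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
  monitor:
    image: my-monitoring-tool           # hypothetical consumer
    environment:
      DOCKER_HOST: tcp://socket-proxy:2375
    depends_on:
      - socket-proxy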
The Distr Docker agent does mount the socket—it has to, in order to orchestrate Compose and Swarm on the host. We document that boundary openly in the Docker agent docs so customer security teams can review it before installation. The agent authenticates to the Hub with a JWT, and the install secret is shown once and never stored.
Updating Docker Compose Deployments Across Customer Hosts
docker compose pull && docker compose up -d is a fine command if you are SSH’d into the host. At customer scale—dozens of self-managed environments behind firewalls, each with its own change-control process—that manual process doesn’t scale. Docker has no built-in mechanism to push a new manifest to a running host from somewhere else. Docker Hub webhooks can trigger a CI rebuild when an image is pushed, but they do not reach into a customer’s network and tell their docker compose to pull.
The usual workarounds and what they cost:
Watchtower: Polls the registry on a schedule, pulls new images, recreates containers. Easy to set up, hard to control. No staged rollout, no rollback path, limited visibility from your side—you find out a customer updated when they file a ticket.
Bastion + SSH + Ansible/scripts: Works for ten customers. Falls apart at fifty, especially when three of them are air-gapped and four run their own change-control cadence. Every operator has to live with shared keys and a maintenance window calendar.
A pull-based agent. This is the shape Distr lands on. The agent runs on the customer host, polls a known endpoint every 5 seconds, and reconciles the local Compose state against what the Hub says it should be. The agent reports status back, so you can see in your dashboard which customers are on which version. When the agent itself needs to update, it spawns a separate container to perform the swap so it is not trying to replace itself while running.
The pattern is not unique—Kubernetes operators and GitOps tools do the same thing—but Compose users routinely re-invent it badly. If you find yourself building one, at least give it rollback, status reporting, and a way to pin versions, or you will end up with a fleet that drifts in ways you cannot see.
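The reconcile loop itself is small enough to sketch in shell; everything the sketch leaves out (authentication, rollback, status reporting, version pinning) is where the real work is. The manifest URL and paths are hypothetical:

#!/bin/sh
# Naive pull-based reconciler: poll a manifest endpoint, re-apply on change.
MANIFEST_URL="https://hub.example.com/deployments/current/docker-compose.yaml"
COMPOSE_FILE="/opt/app/docker-compose.yaml"
while true; do
  curl -fsSL "$MANIFEST_URL" -o /tmp/compose.new || { sleep 5; continue; }
  if ! cmp -s /tmp/compose.new "$COMPOSE_FILE"; then
    cp /tmp/compose.new "$COMPOSE_FILE"
    docker compose -f "$COMPOSE_FILE" pull
    docker compose -f "$COMPOSE_FILE" up -d --remove-orphans
  fi
  sleep 5
done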
The other thing worth noting: recurring scheduled jobs alongside the application have no native Compose answer either. If your stack includes anything like a nightly cleanup, a periodic report, or a heartbeat-style task, the in-app scheduler is one option, but you eventually run into the cases it can’t cover (cross-service jobs, jobs that should outlive a single container). For the three patterns I have seen survive customer deployments, check out our guide on Compose cron jobs.
Outgrowing Docker Compose: Kubernetes vs Swarm
If a single-node Compose deployment outgrows itself, the realistic next step for most teams is Kubernetes. The ecosystem is large, the operational patterns are well documented, and the talent pool to hire against actually exists. For the side-by-side, read our Docker Compose vs Kubernetes comparison.
Docker Swarm is the other option—it reuses the Compose YAML format, ships in the box, and solves a few of the quirks above directly (it restarts unhealthy tasks, rolls out updates with update_config, and treats secrets and configs as first-class objects). It is a real fit for some single-cluster, low-ceremony deployments.
The Distr agent supports both—the Hub records whether a deployment is Compose or Swarm, and the agent runs the matching docker compose up or docker stack deploy. If you do choose Swarm, read our routing and Traefik guide for Docker Swarm and the product walkthrough for distributing applications to Swarm for the details.
So, should you run plain Docker Compose in production?
Yes—plain Docker Compose still runs a lot of real production workloads in 2026, as long as you accept that “plain Compose” is shorthand for “Compose plus the operator practices it doesn’t enforce.” None of the quirks above are secret. They are all in Docker’s documentation, in GitHub issues that have been open for years, and in the war stories of every team that has run Compose in anger. What makes them dangerous is not the quirks themselves but the order in which you discover them: usually at 2 a.m., one at a time.
TL;DR:
Pass --remove-orphans on every compose up and compose down.
Cap container logs in daemon.json and prune images on a schedule. Be careful with docker volume prune.
Health checks do not heal. Run an autoheal sidecar, run on Swarm, or use an agent that bundles one.
Pin by @sha256:… digest. Treat tags as references, not contracts.
The socket is root. Inventory the containers that mount it; prefer rootless Docker.
Updates need an agent of some kind. Watchtower is fine for one host; not for a fleet.
When Compose stops being enough, Kubernetes is usually the right next step. Swarm is a narrower fit and worth picking eyes-open.
If you ship software to self-managed customers and you would rather not rebuild this list yourself, the Distr Docker agent handles all of the above on the customer side. The Docker agent documentation walks through the install, the socket model, the autoheal and image-cleanup defaults, and how the agent self-updates. The repository is on GitHub.
We ran a benchmark comparing two ways of letting an AI agent operate the same admin panel, with the goal of putting a price tag on vision agents (browser-use, computer-use).
Here is what we measured, what we had to change to make the vision agent work at all, and what changes when generating an API surface stops being a separate engineering project.
Why vision agents?
Vision agents are the default for letting AI agents operate web apps that don’t expose APIs. The alternative, writing an MCP or REST surface per app, is its own engineering project across the 20+ internal tools most teams have. Most teams default to vision agents not because they are better, but because the alternative is too expensive to build. The cost of the vision approach is treated as a fixed price.
We wanted to measure the price.
The setup
The test app is an admin panel for managing customers, orders, and reviews, modeled on the react-admin Posters Galore demo. Two agents target the same running app: one drives the UI via screenshots and clicks, the other calls the app’s HTTP endpoints directly. Same Claude Sonnet, same pinned dataset, same task. The interface is the only variable.
The task: find the customer named “Smith” with the most orders, locate their most recent pending order, accept all of their pending reviews, and mark the order as delivered. This touches three resources, requires filtering, pagination, cross-entity lookups, and both reads and writes. It is the shape of work a typical internal tool sees daily.
Path A: Vision agent. Claude Sonnet driving the UI via browser-use 0.12. Vision mode, taking screenshots and executing clicks.
Path B: API agent. Claude Sonnet with tool-use, calling the handlers the UI calls. Each tool maps to one or more event handlers on the app’s State, the same functions a button click would trigger. The agent gets the structured response back instead of a rendered page.
The vision agent couldn’t complete the task
We started by giving both agents the same six-sentence task above and seeing what happened.
The API agent completed it in 8 calls. It listed the customer’s reviews filtered by pending status, accepted each one, and marked the order as delivered. Both agents are calling into the same application logic; the API agent just reads the structured response directly instead of looking at a rendered page.
The vision agent, on the same prompt, found one of four pending reviews, accepted it, and moved on. It never paginated. The remaining three reviews were below the visible fold of the reviews page and the agent had no signal to scroll for them.
This is not a model problem. The vision agent was reasoning about a rendered page and had no signal that the page wasn’t showing everything. The API agent calls the same handler the UI calls, but the response includes the full result set the handler returned, not just the rows currently rendered. The agent reads “page 1 of 4 with 50 results per page” directly instead of having to interpret pagination controls from pixels.
With a 14-step walkthrough, it succeeded
To make the comparison apples-to-apples, we rewrote the vision prompt as an explicit UI walkthrough, naming the sidebar items, tabs, and form fields the agent should interact with at each step. Fourteen numbered instructions covering the navigation the agent had failed to figure out on its own.
With the walkthrough, the vision agent completed the task. It also ran for fourteen minutes and consumed about half a million input tokens.
The walkthrough is itself a finding. Each numbered instruction is engineering work that doesn’t show up in token counts but represents real cost. Anyone deploying a vision agent against an internal tool is either writing prompts at this level of specificity or accepting that the agent will silently miss work.
How we ran it
We ran the API path five times and the vision path three times. The vision path was capped at three trials because each run takes 14–22 minutes and consumes 400–750k tokens.
Variance was the most surprising part of the vision results. Across three trials the wall-clock time spanned 749s to 1257s, and input tokens spanned 407k to 751k. The agent took 43 cycles in the shortest run and 68 in the longest. The screenshot-reason-click loop has enough non-determinism that a single run is not a representative cost estimate.
The API path had no such variance. Sonnet hit identical 8 tool calls on every trial, with input token counts varying by ±27 across all five runs. The agent calls the same handlers in the same order because the structured responses give it no reason to deviate.
The full results
Numbers are mean ± sample standard deviation (n−1), with n=5 for the API path and n=3 for the vision path. Full run details are available in the repo.
Haiku could not complete the vision path. The failure was specific to browser-use 0.12's structured-output schema, which Haiku could not reliably produce in either vision or text-only mode. On the API path, Haiku finished in under 8 seconds for under 10k input tokens, which is the cheapest configuration we tested.
The structural gap
The cost difference follows directly from the architecture. An agent that must see in order to act will always pay for the seeing, regardless of how good the model gets. Better vision models reduce error rates per screenshot, but they do not reduce the number of screenshots required to reach the relevant data. Each render is a screenshot is thousands of input tokens.
Both agents in this benchmark walk through the same application logic. They both filter, paginate, and update the same way the UI does. The difference is what they read at each step. The vision agent reads pixels and has to render every intermediate state to interpret it. The API agent reads the structured response from the same handlers, which already contains the data the UI was going to display.
Better models will narrow the cost per step. They will not narrow the step count, because the step count is set by the interface.
How we justify the API engineering cost
The benchmark was made cheap to run by Reflex 0.9, which includes a plugin that auto-generates HTTP endpoints from a Reflex application’s event handlers. None of the structural argument depends on Reflex specifically, but it is what made running the API path possible without writing a second codebase.
The interesting question is what becomes possible when the engineering cost of an API surface drops to zero. Vision agents remain the right tool for applications you do not control: third-party SaaS products, legacy systems, anything you cannot modify. For internal tools you build yourself, the math now points the other way.
Notes
Vision results are specific to browser-use 0.12 in vision mode, and other vision agents may behave differently. The Path B runner shapes the auto-generated endpoints into a small REST tool surface of about thirty lines, which the agent sees as list_customers, update_order, and similar. The dataset is pinned and small (900 customers, 600 orders, 324 reviews), so behavior on production-scale data is not measured here. The vision agent runs through LangChain’s ChatAnthropic, and the API agent runs through the Anthropic SDK directly. Reported token counts are uncached input tokens.
Reproduce it
The repo includes seed data generation, the patched react-admin demo, both agent scripts, and raw results.