10 interesting stories served every morning and every evening.

Google Chrome silently installs a 4 GB AI model on your device without consent. At a billion-device scale the climate costs are insane.

www.thatprivacyguy.com

Two weeks ago I wrote about Anthropic silently reg­is­ter­ing a Native Messaging bridge in seven Chromium-based browsers on every ma­chine where Claude Desktop was in­stalled [1]. The pat­tern was: in­stall on user launch of prod­uct A, write con­fig­u­ra­tion into the user’s in­stalls of prod­ucts B, C, D, E, F, G, H with­out ask­ing. Reach across ven­dor trust bound­aries. No con­sent di­a­log. No opt-out UI. Re-installs it­self if the user re­moves it man­u­ally, every time Claude Desktop is launched.

This week I dis­cov­ered the same pat­tern, ex­e­cuted by Google. Google Chrome is reach­ing into users’ ma­chines and writ­ing a 4 GB on-de­vice AI model file to disk with­out ask­ing. The file is named weights.bin. It lives in OptGuideOnDeviceModel. It is the weights for Gemini Nano, Google’s on-de­vice LLM. Chrome did not ask. Chrome does not sur­face it. If the user deletes it, Chrome re-down­loads it.

The legal analysis is the same one I gave for the Anthropic case. The environmental analysis is new. At Chrome's scale, the climate bill for one model push, paid in atmospheric CO2 by the entire planet, is between six thousand and sixty thousand tonnes of CO2-equivalent emissions, depending on how many devices receive the push. That is the environmental cost of one company unilaterally deciding that two billion people's default browser will mass-distribute a 4 GB binary they did not request.

This is, in my professional opinion, a direct breach of Article 5(3) of Directive 2002/58/EC (the ePrivacy Directive) [2], a breach of the Article 5(1) GDPR principles of lawfulness, fairness, and transparency [3], a breach of Article 25 GDPR's data-protection-by-design obligation [3], and an environmental harm of a magnitude that would be a notifiable event under the Corporate Sustainability Reporting Directive (CSRD) for any in-scope undertaking [4].

What is on the disk and how it got there

On any machine that has Chrome installed, in the user profile, sits a directory whose name is OptGuideOnDeviceModel. Inside it is a file called weights.bin. The file is approximately 4 GB. It is the weights file for Gemini Nano. Chrome uses it to power features Google has marketed under names like “Help me write”, on-device scam detection, and other AI-assisted browser functions.

The file appeared with no consent prompt. There is no checkbox in Chrome Settings labelled “download a 4 GB AI model”. The download triggers when Chrome's AI features are active, and those features are active by default in recent Chrome versions. On any machine that meets the hardware requirements, Chrome treats the user's hardware as a delivery target and writes the model.

The cy­cle of dele­tion and re-down­load has been doc­u­mented across mul­ti­ple in­de­pen­dent re­ports on Windows in­stal­la­tions [5][6][7][8] - the user deletes, Chrome re-down­loads, the user deletes again, Chrome re-down­loads again. The only ways to make the dele­tion stick are to dis­able Chrome’s AI fea­tures through chrome://​flags or en­ter­prise pol­icy tool­ing that home users do not gen­er­ally have, or to unin­stall Chrome en­tirely [5]. On ma­cOS the file lands as mode 600 owned by the user (so it is deletable in prin­ci­ple) but Chrome holds the in­stall state in Local State af­ter the bytes are writ­ten, and as soon as the vari­a­tions server next tells Chrome the pro­file is el­i­gi­ble, the down­load fires again - the ar­chi­tec­ture is the same, only the file per­mis­sions dif­fer.

How I ver­i­fied this on a freshly cre­ated Apple Silicon pro­file

Most of the ex­ist­ing re­port­ing on this be­hav­iour is from Windows users who no­ticed their disk fill­ing up - use­ful, but Google could (and prob­a­bly will) try to char­ac­terise those re­ports as anec­dotes from non-rep­re­sen­ta­tive con­fig­u­ra­tions. So I went look­ing for a clean wit­ness on a dif­fer­ent plat­form.

The witness I found is macOS itself. The system keeps a filesystem event log in .fseventsd - the kernel reports every file create, modify and delete, and the fseventsd daemon records them, independent of any application logging. Chrome cannot edit it, Google cannot remotely reach it, and the page files that record the events survive the deletion of the files they reference.

I created a Chrome user-data directory on 23 April 2026 to run an automated audit (one of the WebSentinel 100-site privacy sweeps). The audit driver works entirely through the Chrome DevTools Protocol - it loads a page, dwells for five minutes with no input, captures events, and closes Chrome between sites - and the profile had received zero keyboard or mouse input from a human at any point in its existence. Every “AI mode” surface in Chrome was untouched - in fact every UI surface in Chrome was untouched; the audit driver only interacts with the document via CDP and the omnibox is never reached. By 29 April the profile contained 4 GB of OptGuideOnDeviceModel weights - and I knew it because a routine du -sh of the audit-profile directory caught it during a cleanup pass.

I went back to .fseventsd to ask ex­actly when those 4 GB landed. ma­cOS gave me the an­swer, byte-pre­cise, in three se­quen­tial page files:

24 April 2026, 16:38:54 CEST (14:38:54 UTC) - Chrome cre­ates the OptGuideOnDeviceModel di­rec­tory in the au­dit pro­file (page file 0000000003f7f339).

24 April 2026, 16:47:22 CEST (14:47:22 UTC) - three concurrent unpacker subprocesses spawn temporary directories in /private/var/folders/…/com.google.Chrome.chrome_chrome_Unpacker_BeginUnzipping.*/. One of them (5xzqPo) writes weights.bin, manifest.json, _metadata/verified_contents.json and on_device_model_execution_config.pb. The second writes a Certificate Revocation List update. The third writes a browser preload-data update. Chrome batched a security update, a preload refresh and a 4 GB AI model into the same idle window, as if they were equivalent (page file 00000000040c8855).

24 April 2026, 16:53:22 CEST (14:53:22 UTC) - the unpacked weights.bin is moved to its final location at OptGuideOnDeviceModel/2025.8.8.1141/weights.bin along with adapter_cache.bin, encoder_cache.bin, _metadata/verified_contents.json and the execution config. Concurrently four additional model targets (numbered 40, 49, 51 and 59 in Chrome's optimization-guide enum) register fresh entries in optimization_guide_model_store - these are the smaller text-safety and prompt-routing models that pair with the LLM. None of these targets existed in the profile before this moment (page file 00000000040d0f9c).

Total in­stall time, from di­rec­tory cre­ation to fi­nal move: 14 min­utes and 28 sec­onds. Total hu­man ac­tion against the pro­file dur­ing that win­dow: none. The au­dit dri­ver was ei­ther dwelling on a third-party home page or tran­si­tion­ing be­tween sites - the un­packer fired in the back­ground while a tab waited for a five-minute timer to ex­pire.
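For anyone who wants to reproduce the check, here is a minimal sketch (not the audit tooling itself) that scans the fseventsd page files for the directory name. It assumes the default data-volume location, assumes a raw byte scan is enough for triage - full parsing of the DLS2 record format is out of scope - and it must run as root:

import glob
import gzip

NEEDLE = b"OptGuideOnDeviceModel"

# Page files are gzip-compressed on current macOS; skip anything that is not.
for page in sorted(glob.glob("/System/Volumes/Data/.fseventsd/*")):
    try:
        with gzip.open(page, "rb") as f:
            data = f.read()
    except OSError:
        continue  # e.g. fseventsd-uuid, which is not a gzip page file
    if NEEDLE in data:
        print(page, "-", data.count(NEEDLE), "matching record(s)")

Matching page files can then be handed to a full fseventsd parser to recover the per-record event IDs and flags for the paths of interest.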

The naming inside that fseventsd record is, if anything, the most damning detail. The temp directory is com.google.Chrome.chrome_chrome_Unpacker_BeginUnzipping.5xzqPo - that prefix com.google.Chrome.chrome_chrome_* is the bundle ID and subprocess naming convention Google Chrome itself uses. It is not com.google.GoogleUpdater.* and it is not com.google.GoogleSoftwareUpdate.*. The writer is Chrome - the browser process the user has installed and trusts to load web pages - reaching into the user's filesystem on its own initiative and laying down a 4 GB ML binary while the foreground tab does something completely unrelated.

Three fur­ther pieces of cor­rob­o­rat­ing ev­i­dence sit else­where on the same ma­chine:

Chrome's own Local State JSON for the audit profile contains an optimization_guide.on_device block with model_validation_result: { attempt_count: 1, result: 2, component_version: "2025.8.8.1141" }. Chrome ran the model. The component_version matches the version string the fseventsd events recorded as the path component. Two independent witnesses, same artefact. The same block reports performance_class: 6, vram_mb: 36864 - Chrome characterised my hardware (read the GPU, read the unified memory total) to decide whether I was eligible for the model push, before any user-facing AI feature surfaced.
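Checking this on your own machine takes a few lines. A minimal sketch, assuming the default macOS profile location and the key names as observed in this audit (both may differ across Chrome versions and platforms):

import json
from pathlib import Path

# Default macOS location; adjust for Windows/Linux or non-default profiles.
local_state = Path.home() / "Library/Application Support/Google/Chrome/Local State"
state = json.loads(local_state.read_text(encoding="utf-8"))

# The block may be nested or flattened to a dotted key depending on version.
block = state.get("optimization_guide", state.get("optimization_guide.on_device", {}))
if isinstance(block, dict) and "on_device" in block:
    block = block["on_device"]
print(json.dumps(block, indent=2))  # model_validation_result, performance_class, ...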

Chrome's ChromeFeatureState for the audit profile lists OnDeviceModelBackgroundDownload<OnDeviceModelBackgroundDownload and ShowOnDeviceAiSettings<OnDeviceModelBackgroundDownload in the enable-features block. The first flag is what triggers the silent download. The second flag is what reveals the on-device AI section in chrome://settings. Both are gated by the same rollout flag - which means that by Chrome's own architecture, the install begins before the user has any settings UI in which to refuse it. The settings page that would let you discover the feature exists is enabled in lockstep with the install - it is design, not oversight.

The GoogleUpdater logs record the on-device-model control component (appid {44fc7fe2-65ce-487c-93f4-edee46eeaaab}) being downloaded from http://edgedl.me.gvt1.com/edgedl/diffgen-puffin/%7B44fc7fe2-65ce-487c-93f4-edee46eeaaab%7D/… - a 7 MB compressed control file that arrived on 20 April 2026, three days before the audit profile in question was created. That is the upstream control plane: it is profile-independent, it is launched automatically by a LaunchAgent that fires every hour, and the URL is plain HTTP (the integrity is verified by the CRX-3 signature inside the package, not by transport security). The control component gives Chrome the manifest pointing at the actual weights, and Chrome's in-process OnDeviceModelComponentInstaller - a separate code path from GoogleUpdater - then fetches the multi-GB weights direct from Google's CDN.
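Finding the same entries on another machine is a text search. A minimal sketch - the log locations below are assumptions, since they vary by platform and updater version:

from pathlib import Path

APPID = "44fc7fe2-65ce-487c-93f4-edee46eeaaab"

# Assumed updater log locations on macOS; adjust for your installation.
candidates = [
    Path.home() / "Library/Application Support/Google/GoogleUpdater",
    Path("/Library/Application Support/Google/GoogleUpdater"),
]
for root in candidates:
    if not root.exists():
        continue
    for log in root.rglob("*.log"):
        for n, line in enumerate(log.read_text(errors="replace").splitlines(), 1):
            if APPID in line.lower():
                print(f"{log}:{n}: {line.strip()}")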

So we now have a four-way evidence chain - macOS kernel filesystem events, Chrome's own per-profile state, Chrome's runtime feature flags, and Google's component-updater logs - all four agreeing on the same conduct, and the conduct is: a 4 GB AI model arrived on this user's disk without consent, without notice, on a profile that received zero human input, in a window of 14 minutes and 28 seconds, on a Friday afternoon.

Reports of the OptGuideOnDeviceModel di­rec­tory and the weights.bin file have been cir­cu­lat­ing in com­mu­nity fo­rums for over a year - what is new in 2026 is the scale and the ver­i­fi­a­bil­ity. Chrome’s mar­ket share has held above 64% glob­ally [9][10], Chrome’s user base is be­tween 3.45 bil­lion and 3.83 bil­lion in­di­vid­u­als world­wide de­pend­ing on which 2026 es­ti­mate you trust [9][11], and Google has been rolling Gemini fea­tures into Chrome with in­creas­ing ag­gres­sion. The be­hav­iour is no longer af­fect­ing a mi­nor­ity of power users on a mi­nor­ity of plat­forms - it is af­fect­ing hun­dreds of mil­lions of de­vices, on every desk­top OS Chrome ships against.

The Anthropic com­par­i­son, point for point

The same dark-pat­tern play­book. I am re­peat­ing my cat­e­gori­sa­tion from the Claude Desktop ar­ti­cle [1] be­cause the pat­terns are iden­ti­cal and that is the point.

1. Forced bundling across trust bound­aries. Anthropic in­stalled Claude Desktop, then wrote into Brave, Edge, Arc, Vivaldi, Opera, and Chromium. Google in­stalls Chrome, then writes a 4 GB AI model un­der the user’s pro­file di­rec­tory with­out au­tho­ri­sa­tion. The bi­nary is not Chrome. It is a sep­a­rately-trained ma­chine-learn­ing model, with a sep­a­rate pur­pose, a sep­a­rate data-pro­tec­tion pro­file, and a sep­a­rate con­sent foot­print.

2. Invisible de­fault, no opt-in. No di­a­logue at first launch. No check­box in Settings. The model is down­loaded; the user finds out about it months later when their disk fills up [5][6][7].

3. More dif­fi­cult to re­move than in­stall. Adding the file took zero clicks. Removing it re­quires (a) dis­cov­er­ing the file ex­ists, (b) un­der­stand­ing what it is, (c) nav­i­gat­ing into a hid­den user pro­file path, (d) delet­ing it (and on Windows, also clear­ing the read-only at­tribute first), and (e) ac­cept­ing that Chrome will silently re-down­load it on next el­i­gi­ble win­dow un­less the user also nav­i­gates chrome://​flags, en­ter­prise pol­icy, or plat­form-spe­cific con­fig­u­ra­tion tool­ing to dis­able the un­der­ly­ing Chrome AI fea­ture [5]. None of those steps is doc­u­mented in the place a nor­mal user looks - none of them is even hinted at in de­fault Chrome.

4. Pre-staging of ca­pa­bil­ity the user has not re­quested. The Nano model ex­ists on the user’s disk so that Chrome fea­tures that use it can run in­stantly when the user in­vokes them. The user has not in­voked any of those fea­tures. The model still sits there, tak­ing 4 GB.

5. Scope inflation through generic naming. OptGuideOnDeviceModel is internal Chrome jargon for “OptimizationGuide on-device model storage”. A user looking at their disk usage, even one who knows roughly what they are looking at, would not match OptGuideOnDeviceModel/weights.bin to “Gemini Nano LLM weights”. Accurate naming would be GeminiNanoLLM/weights.bin. Google chose to obfuscate the name.

6. Registration into re­sources the user has not con­fig­ured. A user who has not opened Chrome’s AI fea­tures still gets the model. A user who has opened them once and de­cided they were not in­ter­ested still gets the model. The file’s pres­ence is de­cou­pled from the user’s ac­tual use of any fea­ture it pow­ers.

7. Documentation gap. Google’s user-fac­ing doc­u­men­ta­tion about Chrome’s AI fea­tures does not, with the promi­nence pro­por­tion­ate to a 4 GB silent down­load, tell the user that the cost of the fea­ture be­ing avail­able is a 4 GB file ap­pear­ing on their de­vice. The be­hav­iour is doc­u­mented in places a cu­ri­ous ad­min will find. It is not doc­u­mented in the place a reg­u­lar user looks be­fore in­stalling Chrome or be­fore Chrome de­cides to be­gin push­ing the model.

8. Automatic re-in­stall on every run. Same as Claude Desktop. Delete the file, Chrome re-cre­ates it. The user’s dele­tion is treated as a tran­sient state to be cor­rected, not as a di­rec­tive to be re­spected.

9. Retroactive survival of any future user consent. If Google in future starts asking “would you like Chrome to download a 4 GB AI model”, that prompt does not retro-actively legitimise the silent installs that have already happened on hundreds of millions of devices. The damage to the trust relationship is done. The bytes have moved. The atmosphere has been written to.

10. Code-signed, shipped through the nor­mal re­lease chan­nel. This is not test build be­hav­iour. It is Chrome sta­ble.

The “AI Mode” pill is the cherry on top

Here is the part that should make every privacy lawyer in the audience put their coffee down. When Chrome 147 launches against an eligible profile, the omnibox - the address bar at the top of the window, the most visible piece of real estate in the entire browser - renders an “AI Mode” pill to the right of the URL field. A reasonable user, seeing “AI Mode” sitting in their browser's most prominent UI element in 2026, with the well-publicised existence of on-device LLMs in Chrome and a 4 GB Gemini Nano binary already silently installed on their disk, is going to draw what feels like an obvious inference - that the visible AI Mode is using the on-device model, that their queries stay on the device, that the local model is what powers the local-looking surface.

Every part of that in­fer­ence is wrong. The AI Mode pill in the Chrome 147 om­ni­box is a cloud-backed Search Generative Experience sur­face - every query the user types into it is sent over the net­work to Google’s servers for pro­cess­ing by Google’s hosted mod­els. The on-de­vice Nano model is not in­voked by the AI Mode UI flow at all. They are en­tirely sep­a­rate code paths - the most vis­i­ble AI af­for­dance in the browser does not use the lo­cal model the user has been silently given, and the fea­tures that do use the lo­cal model (Help-Me-Write in <textarea>, tab-group AI sug­ges­tions, smart paste, page sum­mary) are buried in textarea-con­text menus and tab-group right-click menus that the av­er­age user will dis­cover, on av­er­age, never.

Think about what that arrangement actually is. The user pays the storage cost of the silent install (4 GB on disk, plus the bandwidth of the silent download). The user's most visible AI experience - the pill they actually see and click - delivers no on-device benefit at all because it routes to Google's servers regardless. The on-device model is therefore a sunk cost imposed on the user, with no offsetting transparency benefit at the surface where transparency would matter most. To put it another way - if the on-device install had given the user a clear “your AI Mode queries stay on your device” property, the install would have a defensible privacy framing (worse storage, better data flow). It does not - the install gives Google a future-options resource (the model can be invoked by other Chrome subsystems without further server round-trips) at the user's disk-and-bandwidth expense, while the headline AI surface continues to send the user's queries to Google as before. The local model is a Google-side asset positioned on the user's device - it is not a user-side asset, and one could argue the whole arrangement is nothing but sleight-of-hand to hide the fact that the visible AI Mode is NOT using the local model.

That arrangement, on its own, engages at least three of the deceptive design pattern families catalogued in EDPB Guidelines 03/2022 [20]. It is “misleading information” because the visible label “AI Mode” creates a false impression about where processing occurs - the label does not say “cloud-backed” or “queries sent to Google”, and a reasonable user with knowledge of on-device AI will infer locality from the proximity of an on-device 4 GB model on their disk. It is “skipping” because the user is not given a moment to choose between local-only and cloud-backed AI surfaces - both are switched on by the same upstream rollout, with no per-feature consent. And it is “hindering” because turning AI Mode off does not also remove the on-device install, and removing the on-device install does not turn AI Mode off - the two are separately controlled, and discovering both controls requires knowing about both chrome://flags and chrome://settings/ai, neither of which is obvious in default Chrome.

So: not just a non-con­sented in­stall, but a non-con­sented in­stall that dou­bles as cover for a par­al­lel cloud-backed sur­face that mis­rep­re­sents to the user where their typ­ing is be­ing processed. Both lay­ers com­pound the con­sent prob­lem.

Why this is un­law­ful in the EEA and the UK

Article 5(3) of Directive 2002/58/EC (the ePri­vacy Directive) pro­hibits the stor­ing of in­for­ma­tion, or the gain­ing of ac­cess to in­for­ma­tion al­ready stored, in the ter­mi­nal equip­ment of a sub­scriber or user, with­out the user’s prior, freely-given, spe­cific, in­formed, and un­am­bigu­ous con­sent, ex­cept where strictly nec­es­sary for the pro­vi­sion of an in­for­ma­tion-so­ci­ety ser­vice ex­plic­itly re­quested by the user [2]. The 4 GB Gemini Nano weights file is in­for­ma­tion stored in the user’s ter­mi­nal equip­ment. The user did not con­sent. The user has not re­quested any ser­vice that strictly re­quires a 4 GB on-de­vice LLM. Chrome is func­tional with­out the file. The Article 5(3) breach is di­rect.

Article 5(1) GDPR re­quires pro­cess­ing of per­sonal data to be law­ful, fair, and trans­par­ent to the data sub­ject [3]. Where the user’s hard­ware is pro­filed to de­ter­mine el­i­gi­bil­ity for the model push, where the in­stall events are logged on Google’s servers, and where the on-de­vice fea­tures the model pow­ers process user prompts (whether or not those prompts leave the de­vice), the law­ful­ness, fair­ness, and trans­parency of all of that pro­cess­ing de­pend on the user be­ing told, in plain lan­guage, what is hap­pen­ing. They are not.

Article 25 GDPR requires the controller to implement appropriate technical and organisational measures to ensure that, by default, only personal data that are necessary for each specific purpose are processed [3]. Pre-staging a 4 GB AI model on a user's disk, against a contingency that the user might in future invoke an AI feature, is the architectural opposite of by-default minimisation. And the profiling of the device to decide whether to push the model is no different in kind from the profiling used to track you online: that profile contains personal data, and the AI model, if used, will process personal data, so the GDPR arguments are in scope and valid.

Under the UK GDPR and the Privacy and Electronic Communications Regulations 2003, the analy­sis is the same. Under the California Consumer Privacy Act, the ab­sence of a no­tice-at-col­lec­tion cov­er­ing this spe­cific cat­e­gory of pre-staged soft­ware puts Google’s CCPA no­tice pos­ture in ques­tion [12].

Then there are the crim­i­nal-law vi­o­la­tions un­der var­i­ous na­tional com­puter-mis­use statutes - which again can­not be over­stated.

ESG: the cli­mate cost of the silent push

The Anthropic case I wrote about was a desk­top ap­pli­ca­tion in­stalling a 350-byte JSON man­i­fest in seven di­rec­to­ries. The band­width and en­ergy cost of that, summed across all Claude Desktop users, was neg­li­gi­ble. The Chrome case is dif­fer­ent. Chrome is push­ing a 4 GB bi­nary across hun­dreds of mil­lions of de­vices. That has a mea­sur­able, quan­tifi­able, and frankly alarm­ing en­vi­ron­men­tal foot­print.

I am cal­cu­lat­ing this us­ing the same method­ol­ogy our WebSentinel au­dit plat­form ap­plies to web­site en­vi­ron­men­tal analy­sis [13]:

Energy intensity of network data transfer: 0.06 kWh per GB, the mid-band of Pärssinen et al. (2018), “Environmental impact assessment of online advertising”, Science of The Total Environment [14]. The paper reports a 0.04–0.10 kWh/GB range depending on the share of fixed-line vs mobile transfer and inclusion of end-user device energy. 0.06 is a defensible mid-point.

Grid emis­sions fac­tor: 0.25 kg CO2e per kWh, the EEA / IEA com­pos­ite EU-27 elec­tric­ity-sup­ply fac­tor for 2024 re­port­ing [15]. Globally the fig­ure varies from ~0.10 kg/​kWh on mostly-re­new­able grids to over 0.70 kg/​kWh on coal-heavy grids; 0.25 is mid-band for a global push and is the fig­ure WebSentinel uses by de­fault.

Per-device cost of one Nano push

Bandwidth: 4 GB

Energy: 4 GB × 0.06 kWh/GB = 0.24 kWh per device per push

CO2: 0.24 kWh × 0.25 kg CO2e/kWh = 0.06 kg CO2e per device per push

That is per de­vice, per push. A sin­gle down­load of the model. It does not in­clude re-down­loads trig­gered by the user try­ing and fail­ing to delete the file. It does not in­clude sub­se­quent up­dates to the model. It does not in­clude the on-de­vice in­fer­ence en­ergy when the model is ac­tu­ally used. It is just the one-time de­liv­ery cost to one de­vice.

Aggregated cost across the de­ploy­ment

Google does not publish how many devices receive the Nano push. The eligibility criteria gating the push (a hardware “performance class” that Chrome computes from CPU class, GPU class, system RAM and available VRAM - typically ~16 GB unified memory or better on Apple Silicon, ~16 GB RAM and a discrete or integrated GPU with sufficient VRAM on Windows and Linux) carve out the very low end of the consumer install base, but the qualifying population is still enormous. I will use three illustrative deployment bands so the reader can pick whichever they consider closest to reality. None of these bands is implausibly large for a feature that ships in default-on Chrome.
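A minimal sketch of the arithmetic, using the article's own factors; the band sizes (100 million, 500 million, 1 billion devices) are illustrative assumptions chosen to produce the totals below, since Google publishes no figure:

# Delivery-only cost of one model push, per device and per deployment band.
GB_PER_PUSH = 4
KWH_PER_GB = 0.06            # Pärssinen et al. mid-band [14]
KG_CO2E_PER_KWH = 0.25       # EU-27 composite grid factor [15]

kwh_per_device = GB_PER_PUSH * KWH_PER_GB          # 0.24 kWh per push
kg_per_device = kwh_per_device * KG_CO2E_PER_KWH   # 0.06 kg CO2e per push

for label, devices in [("low", 100e6), ("mid", 500e6), ("high", 1e9)]:
    gwh = devices * kwh_per_device / 1e6            # kWh -> GWh
    tonnes = devices * kg_per_device / 1e3          # kg -> tonnes
    print(f"{label:>4} band: {gwh:,.0f} GWh, {tonnes:,.0f} t CO2e")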

The low band, 100 million devices, works out to 24 GWh and 6,000 tonnes CO2e; the mid band, 500 million devices, to 120 GWh and 30,000 tonnes; the high band, 1 billion devices, to 240 GWh and 60,000 tonnes. To put those numbers in terms an ESG report would use:

24 GWh (low band) is roughly the an­nual elec­tric­ity con­sump­tion of about 7,000 av­er­age UK house­holds [16].

120 GWh (mid band) is roughly the annual electricity consumption of about 36,000 average UK households, or the annual output of roughly 40 MW of installed wind capacity at a typical UK capacity factor.

240 GWh (high band) is roughly the annual electricity consumption of about 72,000 average UK households, or the annual output of roughly 80 MW of installed wind capacity.

6,000 tonnes CO2e (low band) is roughly the an­nual emis­sions of 1,300 av­er­age pas­sen­ger cars in the EU [17].

30,000 tonnes CO2e (mid band) is roughly the an­nual emis­sions of 6,500 cars, or one re­turn flight from London to Sydney for about 8,000 pas­sen­gers in econ­omy.

60,000 tonnes CO2e (high band) is roughly the an­nual emis­sions of 13,000 cars.

These are the de­liv­ery-only num­bers. They count the bytes tra­vers­ing the net­work ex­actly once. They do not count:

The roughly 4 GB × N de­vices of disk-stor­age cost, sus­tained, on user hard­ware. SSDs have a per-GB em­bod­ied car­bon cost of ap­prox­i­mately 0.16 kg CO2e per GB of NAND man­u­fac­tured [18]; for 1 bil­lion de­vices × 4 GB that is around 640,000 tonnes CO2e of em­bod­ied SSD al­lo­cated to a use case the user did not con­sent to. This is a one-off man­u­fac­tur­ing-car­bon im­pact, but the stor­age bur­den is borne in per­pe­tu­ity by user de­vices that could oth­er­wise have used the space for user data.

The on-de­vice in­fer­ence en­ergy when Nano is in­voked. Per in­fer­ence this is small. At 2 bil­lion daily Chrome users it is no longer small.

The re-download cycle for users who try to delete the file. Each successful re-trigger of the download is another 4 GB × 0.06 kWh/GB × 0.25 kg/kWh = 0.06 kg CO2e per device per re-download.

The fu­ture model up­dates. Gemini Nano is not a one-shot arte­fact; it is an evolv­ing model with pe­ri­odic weight re­freshes. Each re­fresh re­peats the cal­cu­la­tion.

In ESG-reporting lan­guage, the one-time push of the cur­rent model is a Scope 3 Category 11 (“use of sold prod­ucts”) emis­sion against Google, at­trib­ut­able to the user-side de­liv­ery of a bi­nary the user did not re­quest, in the op­er­a­tion of a free prod­uct Google dis­trib­utes [4].

Why the band­width side mat­ters in its own right

In ad­di­tion to the car­bon cost, the net­work-band­width cost is paid by ISPs, by mo­bile net­work op­er­a­tors, by users on me­tered con­nec­tions, and by every piece of net­work in­fra­struc­ture that has to carry an un­wanted 4 GB pay­load to a des­ti­na­tion that did not ask for it. Per the Pärssinen ref­er­ence, around 50% of that de­liv­ery en­ergy is in the ac­cess net­work and CDN edge, around 30% is in user-side equip­ment (router, mo­dem, NIC), and the re­main­der is in the core. None of that in­fra­struc­ture ex­ists for free. Every byte Chrome pushes is a byte that com­petes with bytes the user ac­tu­ally wanted.

For users on capped mo­bile data plans, par­tic­u­larly in re­gions where smart­phone-as-only-in­ter­net is dom­i­nant (much of Africa, much of South and Southeast Asia, most of Latin America), 4 GB of un­re­quested down­load is on the or­der of a mon­th’s data al­lowance, vapourised by Chrome on the user’s be­half. Google has not, to my knowl­edge, pub­lished any analy­sis of the wel­fare im­pact of this on the pop­u­la­tions whose in­ter­net ac­cess is me­tered.

Keep in mind that mobile data plans (4G and 5G) are used by many households who do not have access to fibre, cable or ADSL, and they serve desktop devices as well as mobile ones - so the argument that Google won't push this to mobile connections (and I have found nothing official to support that argument anyway) will not fly.

What Google should have done

This is not a hard list. It is the same list I gave Anthropic in the Claude Desktop ar­ti­cle, ap­plied to Google.

Ask. First time Chrome is about to download the Nano model, pop a dialogue. “Chrome would like to download a 4 GB AI model file to your device to power the following features. Allow, or skip and decide later.” Two buttons. Done.

Pull, not push. Trigger the down­load as a down­stream con­se­quence of the user in­vok­ing an AI fea­ture for the first time. Let the fea­ture it­self be the con­sent event. Do not pre-stage on a con­tin­gency.

Surface it. In chrome://settings/, list the AI model files Chrome has downloaded, their size, the features they power, and a “Remove and stop downloading” button per model. Make removal persistent, not a transient state Chrome corrects on next launch.

Document it. Tell the user, plainly, in the Chrome de­scrip­tion on the Microsoft Store, in the Chrome in­staller, on the Google Chrome down­load page, that Chrome will down­load ad­di­tional model files of sub­stan­tial size on sup­ported hard­ware. Currently, this is es­sen­tially un­doc­u­mented to a nor­mal user.

Respect dele­tion. If the user deletes weights.bin, do not re-cre­ate it. If the user has a strong pref­er­ence about what is on their disk, the ap­pli­ca­tion is not in a po­si­tion to over­ride that pref­er­ence be­cause the ap­pli­ca­tion thinks it knows bet­ter.

Disclose at scale. Publish, in Google’s an­nual ESG re­port, the ag­gre­gate band­width and car­bon foot­print of all AI-feature model pushes to user de­vices, bro­ken down by re­gion. Treat it as the Scope 3 Category 11 emis­sion it is. Account for it.

add Phase-A porting guide · oven-sh/bun@46d3bc2

github.com


AI didn't delete your database, you did

idiallo.com

Last week, a tweet went viral showing a guy claiming that a Cursor/Claude agent deleted his company's production database. We watched from the sidelines as he tried to get a confession from the agent: “Why did you delete it when you were told never to perform this action?” Then he tried to parse the answer to either learn from his mistake or warn us about the dangers of AI agents.

I have a ques­tion too: why do you have an API end­point that deletes your en­tire pro­duc­tion data­base? His post ram­bled on about false mar­ket­ing in AI, bad cus­tomer sup­port, and so on. What was miss­ing was ac­count­abil­ity.

I’m not one to blindly de­fend AI, I al­ways err on the side of cau­tion. But I also know you can’t blame a tool for your own mis­takes.

In 2010, I worked with a company that had a very manual deployment process. We used SVN for version control. To deploy, we had to copy trunk, the equivalent of the master branch, into a release folder labeled with a release date. Then we made a second copy of that release and called it “current.” That way, pulling the current folder always gave you the latest release.

One day, while de­ploy­ing, I ac­ci­den­tally copied trunk twice. To fix it via the CLI, I edited my pre­vi­ous com­mand to delete the du­pli­cate. Then I con­tin­ued the de­ploy­ment with­out any is­sues… or so I thought. Turns out, I had­n’t deleted the du­pli­cate copy at all. I had edited the wrong com­mand and deleted trunk in­stead. Later that day, an­other de­vel­oper was con­fused when he could­n’t find it.

All hell broke loose. Managers scram­bled, meet­ings were called. By the time the news reached my team, the lead de­vel­oper had al­ready run a com­mand to re­vert the dele­tion. He checked the logs, saw that I was re­spon­si­ble, and my next task was to write a script to au­to­mate our de­ploy­ment process so this kind of mis­take could­n’t hap­pen again. Before the day was over, we had a more ro­bust sys­tem in place. One that even­tu­ally grew into a full CI/CD pipeline.

Automation helps eliminate the silly mistakes that come with manual, repetitive work. We could have easily gone around asking “Why didn't SVN prevent us from deleting trunk?” But the real problem was our manual process. Unlike machines, we can't repeat a task exactly the same way every single day. We are bound to slip up eventually.

With AI generating large swaths of code, we get the illusion of that same security. But automation means doing the same thing the same way every time. AI is more like me copying and pasting branches: it's bound to make mistakes, and it's not equipped to explain why it did what it did. The terms we use, like “thinking” and “reasoning,” may look like reflection from an intelligent agent. But these are marketing terms slapped on top of AI. In reality, the models are still just generating tokens.

Now, back to the main problem this guy faced. Why does a public-facing API that can delete all your production databases even exist? If the AI hadn't called that endpoint, someone else eventually would have. It's like putting a self-destruct button on your car's dashboard. You have every reason not to press it, because you like your car and it takes you from point A to point B. But a motivated toddler who wiggles out of his car seat will hit that big red button the moment he sees it. You can't then interrogate the child about his reasoning. Mine would have answered simply: “I did it because I pressed it.”

I sus­pect a large part of this com­pa­ny’s ap­pli­ca­tion was vibe-coded. The soft­ware ar­chi­tects used AI to spec the prod­uct from AI-generated de­scrip­tions pro­vided by the prod­uct team. The de­vel­op­ers used AI to write the code. The re­view­ers used AI to ap­prove it. Now, when a bug ap­pears, the only op­tion is to in­ter­ro­gate yet an­other AI for an­swers, prob­a­bly not even run­ning on the same GPU that gen­er­ated the orig­i­nal code. You can’t blame the GPU!

The sim­ple so­lu­tion is know what you’re de­ploy­ing to pro­duc­tion. The more re­al­is­tic one is, if you’re go­ing to use AI ex­ten­sively, build a process where com­pe­tent de­vel­op­ers use it as a tool to aug­ment their work, not a way to avoid ac­count­abil­ity. And please, don’t let your CEO or CTO write the code.

GitHub - angelos-p/llm-from-scratch

github.com

Train Your Own LLM From Scratch

A hands-on work­shop where you write every piece of a GPT train­ing pipeline your­self, un­der­stand­ing what each com­po­nent does and why.

Andrej Karpathy’s nanoGPT was my first real ex­po­sure to LLMs and trans­form­ers. Seeing how a work­ing lan­guage model could be built in a few hun­dred lines of PyTorch com­pletely changed how I thought about AI and in­spired me to go deeper into the space.

This work­shop is my at­tempt to give oth­ers that same ex­pe­ri­ence. nanoGPT tar­gets re­pro­duc­ing GPT-2 (124M params) and cov­ers a lot of ground. This pro­ject strips it down to the es­sen­tials and scales it to a ~10M param model that trains on a lap­top in un­der an hour — de­signed to be com­pleted in a sin­gle work­shop ses­sion.

What You’ll Build

A work­ing GPT model trained from scratch on your MacBook, ca­pa­ble of gen­er­at­ing Shakespeare-like text. You’ll write:

Tokenizer — turn­ing text into num­bers the model can process

Model ar­chi­tec­ture — the trans­former: em­bed­dings, at­ten­tion, feed-for­ward lay­ers

Training loop — for­ward pass, loss, back­prop, op­ti­mizer, learn­ing rate sched­ul­ing

Text gen­er­a­tion — sam­pling from your trained model

Prerequisites

Any lap­top or desk­top (Mac, Linux, or Windows)

Python 3.12+

Comfort read­ing Python code (you don’t need ML ex­pe­ri­ence)

Training uses Apple Silicon GPU (MPS), NVIDIA GPU (CUDA), or CPU au­to­mat­i­cally. Also works on Google Colab — up­load the files and run with !python train.py.
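The automatic device pick amounts to a few lines of PyTorch. A minimal sketch (illustrative, not the workshop's actual train.py):

import torch

# Prefer CUDA, then Apple Silicon (MPS), then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"training on {device}")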

Getting Started

Local (recommended)

Install uv if you don’t have it:

# ma­cOS / Linux

curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Then set up the pro­ject:

uv sync

mkdir scratch­pad && cd scratch­pad

Google Colab

If you don’t have a lo­cal setup, up­load the repo to Colab and in­stall de­pen­den­cies:

!pip install torch numpy tqdm tiktoken

Upload data/shakespeare.txt to your Colab files, then write your code in notebook cells or upload .py files and run them with !python train.py.

Work through the docs in or­der. Each part walks you through writ­ing a piece of the pipeline, ex­plain­ing what each com­po­nent does and why. By the end, you’ll have a work­ing model.py, train.py, and gen­er­ate.py that you wrote your­self.

Architecture: GPT at a Glance

Input Text
┌─────────────────┐
│    Tokenizer    │  “hello” → [20, 43, 50, 50, 53]  (character-level)
└────────┬────────┘
┌─────────────────┐
│  Token Embed +  │  token IDs → vectors (n_embd dimensions)
│  Position Embed │  + positional information
└────────┬────────┘
┌─────────────────┐
│  Transformer    │  × n_layer
│  Block:         │
│  ┌────────────┐ │
│  │ LayerNorm  │ │
│  │ Self-Attn  │ │  n_head parallel attention heads
│  │ + Residual │ │
│  ├────────────┤ │
│  │ LayerNorm  │ │
│  │ MLP (FFN)  │ │  expand 4x, GELU, project back
│  │ + Residual │ │
│  └────────────┘ │
└────────┬────────┘
┌─────────────────┐
│   LayerNorm     │
│ Linear → logits │  vocab_size outputs (probability over next token)
└─────────────────┘

Model Configs for This Workshop

All con­figs use char­ac­ter-level to­k­eniza­tion (vocab_size=65) and block­_­size=256.

Tokenization: Characters vs BPE

This workshop uses character-level tokenization on Shakespeare. BPE tokenization (GPT-2's 50k vocab) doesn't work on small datasets — most token bigrams are too rare for the model to learn patterns from.

Part 5 cov­ers switch­ing to BPE for larger datasets.
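For reference, the character-level scheme fits in a dozen lines. A minimal sketch (illustrative names, not the workshop's actual tokenizer code):

# Build a character-level tokenizer from the corpus (vocab_size=65 here).
text = open("data/shakespeare.txt").read()
chars = sorted(set(text))                     # unique characters in the corpus
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("hello")
assert decode(ids) == "hello"
print(len(chars), ids)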

Key References

nanoGPT — The pro­ject this work­shop is based on. Minimal GPT train­ing in ~300 lines of PyTorch

build-nanogpt video lec­ture — 4-hour video build­ing GPT-2 from an empty file

Karpathy’s mi­crogpt — A full GPT in 200 lines of pure Python, no de­pen­den­cies

nanochat — Full ChatGPT clone train­ing pipeline

Attention Is All You Need (2017) — The orig­i­nal trans­former pa­per

GPT-2 pa­per (2019) — Language mod­els as un­su­per­vised learn­ers

TinyStories pa­per — Why small mod­els trained on cu­rated data punch above their weight

Async Rust never left the MVP state - Blog - Tweede golf

tweedegolf.nl

I’ve pre­vi­ously ex­plained async bloat and some work-arounds for it, but would much pre­fer to solve the is­sue at the root, in the com­piler. I’ve sub­mit­ted a Project Goal, and am look­ing for help to fund the ef­fort.

I love me some async Rust! It’s amaz­ing how we can write ex­ecu­tor ag­nos­tic code that can run con­cur­rently on huge servers and tiny mi­cro­con­trollers.

But especially on those tiny microcontrollers we notice that async Rust is far from the zero-cost abstraction we were promised. That's because every byte of binary size counts and async introduces a lot of bloat. This bloat exists on desktops and servers as well, but it's much less noticeable when you have substantially more memory and compute available.

I’ve pre­vi­ously ex­plained some work-arounds for this is­sue, but would much pre­fer to get to the root of the prob­lem, and work on im­prov­ing async bloat in the com­piler. As such I have sub­mit­ted a Project Goal.

This is part 2 of my blog se­ries on this topic. See part 1 for the ini­tial ex­plo­ration of the topic and what you can do when writ­ing async code to avoid some of the bloat. In this sec­ond part we’ll dive into the in­ter­nals and trans­late the meth­ods of blog 1 into op­ti­miza­tions for the com­piler.

What I won’t be talk­ing about is the of­ten dis­cussed prob­lem of fu­tures be­com­ing big­ger than nec­es­sary and them do­ing a lot of copy­ing. People are aware of that al­ready. In fact, there is an open PR that tack­les part of it: https://​github.com/​rust-lang/​rust/​pull/​135527

Anatomy of a gen­er­ated fu­ture

We’re go­ing to be look­ing at this code:

fn foo() -> impl Future<Output = i32> {
    async { 5 }
}

fn bar() -> impl Future<Output = i32> {
    async {
        foo().await + foo().await
    }
}

god­bolt

We’re us­ing the desug­ared syn­tax for fu­tures be­cause it’s eas­ier to see what’s hap­pen­ing.

So what does the bar fu­ture look like?

There are two await points, so the state ma­chine must have at least two states, right?

Well, yes. But there’s more.

Luckily we can ask the com­piler to dump MIR for us at var­i­ous passes. An in­ter­est­ing pass is the corou­tine_re­sume pass. This is the last async-spe­cific MIR pass. Why is this im­por­tant? Well, async is a lan­guage fea­ture that still ex­ists in MIR, but not in LLVM IR. So the trans­for­ma­tion of async to state ma­chine hap­pens as a MIR pass.

The bar func­tion gen­er­ates 360 lines of MIR. Pretty crazy, right? Although this gets op­ti­mized some­what later on, the non-async ver­sion uses only 23 lines for this.

The com­piler also out­puts the CoroutineLayout. It’s ba­si­cally an enum with these states (comments my own):

variant_fields: {
    Unresumed(0): [],          // Starting state
    Returned (1): [],
    Panicked (2): [],
    Suspend0 (3): [_s1],       // At await point 1, _s1 = the foo future
    Suspend1 (4): [_s0, _s2],  // At await point 2, _s0 = result of _s1, _s2 = the second foo future
},

So what are Returned and Panicked?

Well, Future::poll is a safe func­tion. Calling it must not in­duce any UB, even when the fu­ture is done. So af­ter Suspend1 the fu­ture re­turns Ready and the fu­ture is changed to the Returned state. Once polled again in that state, the poll func­tion will panic.

The Panicked state ex­ists so that af­ter an async fn has pan­icked, but the catch-un­wind mech­a­nism was used to catch it, the fu­ture can’t be polled any­more. Polling a fu­ture in the Panicked state will panic. If this mech­a­nism was­n’t there, we could poll the fu­ture again af­ter a panic. But the fu­ture may be in an in­com­plete state and so that could cause UB. This mech­a­nism is very sim­i­lar to mu­tex poi­son­ing.

(I’m 90% sure I’m cor­rect about the Panicked state, but I can’t re­ally find any docs that ac­tu­ally de­scribe this.)

Cool, this seems rea­son­able.

Why panic?

But is it rea­son­able? Futures in the Returned state will panic. But they don’t have to. The only thing we can’t do is cause UB to hap­pen.

Panics are rel­a­tively ex­pen­sive. They in­tro­duce a path with a side-ef­fect that’s not eas­ily op­ti­mized out. What if in­stead, we just re­turn Pending again? Nothing un­safe go­ing on, so we ful­fill the con­tract of the Future type.

I’ve hacked this in the com­piler to try it out and saw a 2%-5% re­duc­tion in bi­nary size for async em­bed­ded firmware.

So I pro­pose this should be a switch, just like over­flow-checks = false is for in­te­ger over­flow. In de­bug builds it would still panic so that wrong be­hav­ior is im­me­di­ately vis­i­ble, but in re­lease builds we get smaller fu­tures.

Similarly, when panic=abort is used, we might be able to get rid of the Panicked state al­to­gether. I want to look into the reper­cus­sions of that.

Always a state ma­chine

We’ve looked at bar, but not yet at foo.

fn foo() -> impl Future<Output = i32> {
    async { 5 }
}

Let’s im­ple­ment it man­u­ally, to see what the op­ti­mal so­lu­tion would be.

struct FooFut;

impl Future for FooFut {
    type Output = i32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready(5)
    }
}

Easy right? We don’t need any state. We just re­turn the num­ber.

Let’s see what the gen­er­ated MIR is for the ver­sion the com­piler gives us:

// MIR for `foo::{closure#0}` 0 coroutine_resume

/* coroutine_layout = CoroutineLayout {
    field_tys: {},
    variant_fields: {
        Unresumed(0): [],
        Returned (1): [],
        Panicked (2): [],
    },
    storage_conflicts: BitMatrix(0x0) {},
} */

fn foo::{closure#0}(_1: Pin<&mut {async block@src\main.rs:5:5: 5:10}>, _2: &mut Context<'_>) -> Poll<i32> {
    debug _task_context => _2;
    let mut _0: core::task::Poll<i32>;
    let mut _3: i32;
    let mut _4: u32;
    let mut _5: &mut {async block@src\main.rs:5:5: 5:10};

    bb0: {
        _5 = copy (_1.0: &mut {async block@src\main.rs:5:5: 5:10});
        _4 = discriminant((*_5));
        switchInt(move _4) -> [0: bb1, 1: bb4, otherwise: bb5];
    }

    bb1: {
        _3 = const 5_i32;
        goto -> bb3;
    }

    bb2: {
        _0 = Poll::<i32>::Ready(move _3);
        discriminant((*_5)) = 1;
        return;
    }

    bb3: {
        goto -> bb2;
    }

    bb4: {
        assert(const false, "`async fn` resumed after completion") -> [success: bb4, unwind unreachable];
    }

    bb5: {
        unreachable;
    }
}

Yikes! That’s a lot of code!

Notice in the coroutine_layout comment that we still have the 3 default states, and in bb0 that we're still switching on the discriminant. There's a big optimization opportunity here that we're not using, i.e. to have no states and always return Poll::Ready(5) on every poll.

DNSSEC Debugger - nic.de

dnssec-analyzer.verisignlabs.com

Analyzing DNSSEC prob­lems for nic.de


Accelerating Gemma 4: faster inference with multi-token prediction drafters

blog.google

May 05, 2026

By us­ing Multi-Token Prediction (MTP) drafters, Gemma 4 mod­els re­duce la­tency bot­tle­necks and achieve im­proved re­spon­sive­ness for de­vel­op­ers.

Olivier Lacombe

Director, Product Management

Maarten Grootendorst

Developer Relations Engineer


Just a few weeks ago, we in­tro­duced Gemma 4, our most ca­pa­ble open mod­els to date. With over 60 mil­lion down­loads in just the first few weeks, Gemma 4 is de­liv­er­ing un­prece­dented in­tel­li­gence-per-pa­ra­me­ter to de­vel­oper work­sta­tions, mo­bile de­vices and the cloud. Today, we are push­ing ef­fi­ciency even fur­ther.

We’re re­leas­ing Multi-Token Prediction (MTP) drafters for the Gemma 4 fam­ily. By us­ing a spe­cial­ized spec­u­la­tive de­cod­ing ar­chi­tec­ture, these drafters de­liver up to a 3x speedup with­out any degra­da­tion in out­put qual­ity or rea­son­ing logic.

Tokens-per-second speed in­creases, tested on hard­ware us­ing LiteRT-LM, MLX, Hugging Face Transformers, and vLLM.

Why spec­u­la­tive de­cod­ing?

The tech­ni­cal re­al­ity is that stan­dard LLM in­fer­ence is mem­ory-band­width bound, cre­at­ing a sig­nif­i­cant la­tency bot­tle­neck. The proces­sor spends the ma­jor­ity of its time mov­ing bil­lions of pa­ra­me­ters from VRAM to the com­pute units just to gen­er­ate a sin­gle to­ken. This leads to un­der-uti­lized com­pute and high la­tency, es­pe­cially on con­sumer-grade hard­ware.

Speculative de­cod­ing de­cou­ples to­ken gen­er­a­tion from ver­i­fi­ca­tion. By pair­ing a heavy tar­get model (e.g., Gemma 4 31B) with a light­weight drafter (the MTP model), we can uti­lize idle com­pute to predict” sev­eral fu­ture to­kens at once with the drafter in less time than it takes for the tar­get model to process just one to­ken. The tar­get model then ver­i­fies all of these sug­gested to­kens in par­al­lel.

How spec­u­la­tive de­cod­ing works

Standard large language models generate text autoregressively, producing exactly one token at a time. While effective, this process dedicates the same amount of computation to predicting an obvious continuation (like predicting “words” after “Actions speak louder than…”) as it does to solving a complex logic puzzle.

MTP mitigates this inefficiency through speculative decoding, a technique introduced by Google researchers in Fast Inference from Transformers via Speculative Decoding. If the target model agrees with the draft, it accepts the entire sequence in a single forward pass — and even generates an additional token of its own in the process. This means your application can output the full drafted sequence plus one token in the time it usually takes to generate a single one.
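A toy sketch of that accept/verify loop, in the greedy case. The draft and target_next functions below are stand-ins for the real models, and real implementations verify all drafted positions in one batched forward pass rather than a Python loop:

def draft(prefix, k):
    # Stand-in: a cheap drafter proposes k future tokens.
    return ["the", "cat", "sat", "down"][:k]

def target_next(prefix):
    # Stand-in: the expensive target model's greedy next token.
    truth = ["the", "cat", "sat", "on"]
    return truth[len(prefix)] if len(prefix) < len(truth) else "<eos>"

def speculative_step(prefix, k=4):
    proposed = draft(prefix, k)
    accepted = []
    for tok in proposed:                  # verified in parallel in practice
        if target_next(prefix + accepted) == tok:
            accepted.append(tok)
        else:
            break                         # first mismatch rejects the rest
    # The verify pass also yields one token of the target's own "for free".
    accepted.append(target_next(prefix + accepted))
    return accepted

print(speculative_step([]))  # ['the', 'cat', 'sat', 'on']: 3 accepted + 1 target token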

Unlocking faster AI from the edge to the work­sta­tion

For de­vel­op­ers, in­fer­ence speed is of­ten the pri­mary bot­tle­neck for pro­duc­tion de­ploy­ment. Whether you are build­ing cod­ing as­sis­tants, au­tonomous agents that re­quire rapid multi-step plan­ning, or re­spon­sive mo­bile ap­pli­ca­tions run­ning en­tirely on-de­vice, every mil­lisec­ond mat­ters.

By pair­ing a Gemma 4 model with its cor­re­spond­ing drafter, de­vel­op­ers can achieve:

Improved re­spon­sive­ness: Drastically re­duce la­tency for near real-time chat, im­mer­sive voice ap­pli­ca­tions and agen­tic work­flows.

Supercharged lo­cal de­vel­op­ment: Run our 26B MoE and 31B Dense mod­els on per­sonal com­put­ers and con­sumer GPUs with un­prece­dented speed, pow­er­ing seam­less, com­plex of­fline cod­ing and agen­tic work­flows.

Enhanced on-de­vice per­for­mance: Maximize the util­ity of our E2B and E4B mod­els on edge de­vices by gen­er­at­ing out­puts faster, which in turn pre­serves valu­able bat­tery life.

Zero qual­ity degra­da­tion: Because the pri­mary Gemma 4 model re­tains the fi­nal ver­i­fi­ca­tion, you get iden­ti­cal fron­tier-class rea­son­ing and ac­cu­racy, just de­liv­ered sig­nif­i­cantly faster.

Gemma 4 26B on an NVIDIA RTX PRO 6000. Standard Inference (left) vs. MTP Drafter (right) in tokens per second. Same output quality, half the wait time.

Where you can dive deeper into MTP drafters

To make these MTP drafters ex­cep­tion­ally fast and ac­cu­rate, we in­tro­duced sev­eral ar­chi­tec­tural en­hance­ments un­der the hood. The draft mod­els seam­lessly uti­lize the tar­get mod­el’s ac­ti­va­tions and share its KV cache, mean­ing they don’t have to waste time re­cal­cu­lat­ing con­text the larger model has al­ready fig­ured out. For our E2B and E4B edge mod­els, where the fi­nal logit cal­cu­la­tion be­comes a big bot­tle­neck, we even im­ple­mented an ef­fi­cient clus­ter­ing tech­nique in the em­bed­der to fur­ther ac­cel­er­ate gen­er­a­tion.

We’ve also been closely an­a­lyz­ing hard­ware-spe­cific op­ti­miza­tions. For ex­am­ple, while the 26B mix­ture-of-ex­perts model pre­sents unique rout­ing chal­lenges at a batch size of 1 on Apple Silicon, pro­cess­ing mul­ti­ple re­quests si­mul­ta­ne­ously (e.g., batch sizes of 4 to 8) un­locks up to a ~2.2x speedup lo­cally. We see sim­i­lar gains with Nvidia A100 when in­creas­ing batch size.

Want to see the ex­act me­chan­ics of how this works? We’ve pub­lished an in-depth tech­ni­cal ex­plainer that un­packs the vi­sual ar­chi­tec­ture, KV cache shar­ing and ef­fi­cient em­bed­ders pow­er­ing these drafters.

How to get started

The MTP drafters for the Gemma 4 fam­ily are avail­able to­day un­der the same open-source Apache 2.0 li­cense as Gemma 4. Read the doc­u­men­ta­tion to learn how to use MTP with Gemma 4. You can down­load the model weights right now on Hugging Face, Kaggle, and start ex­per­i­ment­ing with faster in­fer­ence with trans­form­ers, MLX, VLLM, SGLang, and Ollama or try them di­rectly on Google AI Edge Gallery for Android or iOS.

We can’t wait to see how this new­found speed ac­cel­er­ates what you build next in the Gemmaverse.

Y Combinator’s Stake in OpenAI

daringfireball.net

Speaking of com­pa­nies with valu­able mi­nor­ity stakes in AI com­pa­nies, there’s one thing that stuck in my craw about the block­buster Ronan Farrow / Andrew Marantz in­ves­tiga­tive piece on Sam Altman and OpenAI last month for The New Yorker. It did­n’t come up dur­ing Nilay Patel’s ex­cel­lent in­ter­view with Farrow on Decoder, ei­ther.

Sam Altman was the pres­i­dent of Y Combinator for sev­eral years, and left to be­come the full-time CEO of OpenAI. The New Yorker quotes Y Combinator co-founder Paul Graham mul­ti­ple times, in the con­text of Altman’s trust­wor­thi­ness. (Some of those quotes are first­hand, oth­ers sec­ond­hand.) Graham’s role in the story — par­tic­u­larly his pub­lic re­marks af­ter pub­li­ca­tion — com­prised an en­tire sec­tion in my own take on the New Yorker piece, wherein I con­cluded:

I would characterize Graham’s tweets re: Altman this week as emphasizing only that Altman was not fired or otherwise forced from YC, and could have stayed as CEO at YC if he’d found another CEO for OpenAI. But for all of Graham’s elucidating engagement on Twitter/X this week regarding this story, he’s dancing around the core question of the Farrow/Marantz investigation, the one right there in The New Yorker’s headline: Can Sam Altman be trusted?

“We didn’t ‘remove’ Sam Altman” and “We didn’t want him to leave” are not the same things as saying, say, “I think Sam Altman is honest and trustworthy” or “Sam Altman is a man of integrity”. If Paul Graham were to say such things, clearly and unambiguously, those remarks would carry tremendous weight. But — rather conspicuously to my eyes — he’s not saying such things.

The thing that stuck in my craw is this: Does Y Combinator own a stake in OpenAI? And if they do, given OpenAI’s sky-high val­u­a­tion, is­n’t that stake worth bil­lions of dol­lars?

OpenAI was seeded by an off­shoot of Y Combinator called YC Research in 2016 — when Altman was run­ning YC. In December 2023, the well-known AI ex­pert (and AI-hype skep­tic) Gary Marcus wrote the fol­low­ing, in a piece on Altman’s trust­wor­thi­ness in the wake of the OpenAI board saga that saw Altman fired, re-hired, and the board purged in the course of a tu­mul­tuous week:

After poking around, I found out that “I have no equity in OpenAI” was only half the truth; while Altman to my knowledge holds no direct equity in OpenAI, he does have an indirect stake in OpenAI, and that fact should have been disclosed.

In particular, he owns a stake of Y Combinator, and Y Combinator owns a stake in OpenAI. It may well be worth tens of millions of dollars; even for Altman, that’s not trivial. Since he was President of Y Combinator, and CEO of OpenAI; he surely was aware of this.

So it’s well known that Y Combinator owns some stake in OpenAI. But how big is that stake? This seems like dev­il­ishly dif­fi­cult in­for­ma­tion to ob­tain. I asked around and a lit­tle birdie who knows sev­eral OpenAI in­vestors came back with an an­swer: Y Combinator owns about 0.6 per­cent of OpenAI. At OpenAI’s cur­rent $852 bil­lion val­u­a­tion, that’s worth over $5 bil­lion.

Graham and his wife Jessica Livingston are two of Y Combinator’s four found­ing part­ners. The fact that Paul Graham per­son­ally has bil­lions of dol­lars at stake with OpenAI does­n’t mean that his pub­lic opin­ion on Sam Altman’s trust­wor­thi­ness and lead­er­ship is in­valid. But it cer­tainly seems like the sort of thing that ought to be dis­closed when quot­ing Graham as an Altman char­ac­ter ref­er­ence. A bil­lion dol­lars here, a bil­lion there — that adds up to the sort of money that might skew a fel­low’s opin­ion.

Agent Skills

addyosmani.com

A se­nior en­gi­neer’s job is mostly the parts that don’t show up in the diff. Specs. Tests. Reviews. Scope dis­ci­pline. Refusing to ship what can’t be ver­i­fied. AI cod­ing agents skip those parts by de­fault. Agent Skills is my at­tempt to make them not op­tional.

The default behaviour of any AI coding agent is to take the shortest path to “done”. Ask for a feature and it writes the feature. It does not ask whether you have a spec, write a test before the implementation, consider whether the change crosses a trust boundary, or check what the PR will look like to a reviewer. It produces code, declares victory, and moves on.

This is the same fail­ure mode every se­nior en­gi­neer has spent their ca­reer learn­ing to avoid. The se­nior ver­sion of any task in­cludes work that does­n’t show up in the diff: sur­fac­ing as­sump­tions, writ­ing the spec, break­ing the work into re­view­able chunks, choos­ing the bor­ing de­sign, leav­ing ev­i­dence that the re­sult is cor­rect, siz­ing the change so a hu­man can ac­tu­ally re­view it. Those steps are most of what sep­a­rates en­gi­neers who ship re­li­able soft­ware at scale from peo­ple who push code that breaks.

Agents skip those steps for the same reason any junior would. They’re invisible. The reward signal points at “task complete”, not “task complete and the design doc exists.” So we have to bolt the senior-engineer scaffolding back on.

Agent Skills is my at­tempt at that scaf­fold­ing. It just crossed 27K stars, so ap­par­ently I’m not alone in want­ing it. This post is the part the README does­n’t quite cover: why each de­sign choice ex­ists, how it maps onto stan­dard SDLC and Google’s pub­lished en­gi­neer­ing prac­tices, and what you should steal from the pro­ject even if you never in­stall a sin­gle skill.

What a “skill” actually is

The word “skill” is doing a lot of work in the Claude Code / Anthropic vocabulary, and it helps to be precise. A skill is a markdown file with frontmatter that gets injected into the agent’s context when the situation calls for it. Somewhere between a system-prompt fragment and a runbook.

A skill is not reference documentation. It is not “everything you should know about testing.” It is a workflow: a sequence of steps the agent follows, with checkpoints that produce evidence, ending in a defined exit criterion.

That dis­tinc­tion is the whole game. If you put a 2,000-word es­say on test­ing best prac­tices into the agen­t’s con­text, the agent reads it, gen­er­ates plau­si­ble-look­ing text, and skips the ac­tual test­ing. If you put a work­flow there (write the fail­ing test first, run it, watch it fail, write the min­i­mum code to pass, watch it pass, refac­tor), the agent has some­thing to do, and you have some­thing to ver­ify.
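To make the shape concrete, here is a hypothetical skill in the markdown-with-frontmatter format the project uses. The field names and wording are illustrative, not copied from the repo:

```markdown
---
name: test-driven-development
description: Enforce red-green-refactor before any implementation code is written.
---

When a change alters behaviour:

1. Write the failing test first. Run it. Confirm it fails.
2. Write the minimum code that makes it pass. Run it. Confirm it passes.
3. Refactor only while the tests stay green.

Exit criterion: a test run showing red -> green, attached as evidence.
```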

Process over prose. Workflows over reference. Steps with exit criteria over essays without them. That single distinction separates a useful skill from a pretty markdown file. It also explains why so many “AI rules” repos end up doing nothing in practice. The rules are essays.

The SDLC the skills en­code

The twenty skills in the repo or­gan­ise around six life­cy­cle phases, with seven slash com­mands sit­ting on top. Define (/spec) is where you de­cide what you’re ac­tu­ally build­ing. Plan (/plan) breaks the work down. Build (/build) im­ple­ments it in ver­ti­cal slices. Verify (/test) proves it works. Review (/review) catches what slipped through. Ship (/ship) gets it to users safely. /code-simplify sits across the bot­tom of the whole thing.

This is­n’t a co­in­ci­dence. It’s the same SDLC every func­tion­ing en­gi­neer­ing or­gan­i­sa­tion runs, just in dif­fer­ent vo­cab­u­lary. Google calls it de­sign doc → re­view → im­ple­men­ta­tion → read­abil­ity re­view → launch check­list. Amazon calls it the work­ing-back­wards memo and the bar raiser. Every healthy team has some ver­sion of this loop.

What’s new with AI cod­ing agents is that most agents skip most of these phases by de­fault. You ask for a fea­ture, you get an im­ple­men­ta­tion, and the spec, plan, tests, re­view, and launch check­list all just don’t hap­pen. Skills push the agent through the same phases a se­nior en­gi­neer forces them­selves through, be­cause ship­ping the code with­out them is how you pro­duce in­ci­dents.

A com­plex fea­ture might ac­ti­vate eleven skills in se­quence. A small bug fix might use three. The router (using-agent-skills) de­cides which ap­ply. The point is that the work­flow scales to the ac­tual scope, not to the as­sumed scope.

Five prin­ci­ples that are do­ing the work

Five de­sign de­ci­sions in the pro­ject are the load-bear­ing ones. The rest of the sys­tem fol­lows from them.

1. Process over prose

Already cov­ered. Workflows are agent-ac­tion­able; es­says are not. The same is true for hu­man teams. If your team hand­book is 200 pages, no one reads it un­der time pres­sure. If it’s a small set of work­flows with check­points, peo­ple ac­tu­ally run them.

2. Anti-rationalization ta­bles

This is the most dis­tinc­tive de­sign de­ci­sion in the pro­ject, and the one I most want other teams to steal.

Each skill in­cludes a table of com­mon ex­cuses an agent (or a tired en­gi­neer) might use to skip the work­flow, paired with a writ­ten re­but­tal. A few ex­am­ples close to the orig­i­nals:

“This task is too simple to need a spec.” → Acceptance criteria still apply. Five lines is fine. Zero lines is not.

“I’ll write tests later.” → Later is the load-bearing word. There is no later. Write the failing test first.

“Tests pass, ship it.” → Passing tests are evidence, not proof. Did you check the runtime? Did you verify user-visible behaviour? Did a human read the diff?

The rea­son this works is that LLMs are ex­cel­lent at ra­tio­nal­i­sa­tion. They will pro­duce a plau­si­ble-sound­ing para­graph ex­plain­ing why this par­tic­u­lar task does­n’t need a spec, or why this par­tic­u­lar change is fine to merge with­out re­view. Anti-rationalization ta­bles are pre-writ­ten re­but­tals to lies the agent has­n’t yet told.

The pat­tern is just as good for hu­man teams. Most en­gi­neer­ing de­cay is­n’t any­one choos­ing to do bad work. It’s peo­ple ac­cept­ing plau­si­ble-sound­ing jus­ti­fi­ca­tions for skip­ping the parts they don’t feel like do­ing. A team that writes down its anti-ra­tio­nal­iza­tions is a team that has fewer of them.

3. Verification is non-ne­go­tiable

Every skill terminates in concrete evidence. Tests pass. Build output is clean. The runtime trace shows the expected behaviour. A reviewer signs off. “Seems right” is never sufficient.

This is the same prin­ci­ple that makes Anthropic’s har­ness re­cover from fail­ures, that makes Cursor’s plan­ner/​worker/​judge split ac­tu­ally catch bugs, that makes any long-run­ning agent re­cov­er­able. The agent is a gen­er­a­tor. You need a sep­a­rate sig­nal that the work is done. Skills bake that sig­nal into every work­flow.

4. Progressive dis­clo­sure

Do not load all twenty skills into con­text at ses­sion start. Activate them based on the phase. A small meta-skill (using-agent-skills) acts as a router that de­cides which skill ap­plies to the cur­rent task.

This is the har­ness en­gi­neer­ing les­son ap­plied at skill gran­u­lar­ity. Every to­ken loaded into con­text de­grades per­for­mance some­where, so you load what’s rel­e­vant and leave the rest on disk. Progressive dis­clo­sure is how you get a twenty-skill li­brary into a 5K-token slot with­out poi­son­ing the well.
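As a toy illustration of that loading discipline (not the project’s actual router, which is itself a markdown meta-skill), the mechanics look roughly like this:

```python
# Toy sketch of progressive disclosure: skills live on disk, and only
# the ones relevant to the current task get injected into the prompt.
# The keyword routing here is deliberately naive and hypothetical.
from pathlib import Path

SKILL_TRIGGERS = {
    "spec": ["build", "feature", "implement"],
    "test-driven-development": ["fix", "bug", "test"],
    "code-review-and-quality": ["review", "pr", "merge"],
}

def relevant_skills(task: str, skills_dir: Path) -> str:
    task_lower = task.lower()
    chosen = [
        name for name, triggers in SKILL_TRIGGERS.items()
        if any(word in task_lower for word in triggers)
    ]
    # Only the matching skill files are read into context; everything
    # else stays on disk, keeping the prompt small.
    return "\n\n".join(
        (skills_dir / name / "SKILL.md").read_text()
        for name in chosen
        if (skills_dir / name / "SKILL.md").exists()
    )
```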

5. Scope dis­ci­pline

The meta-skill encodes a non-negotiable I’d staple to every agent if I could: “touch only what you’re asked to touch.” Don’t refactor adjacent systems. Don’t remove code you don’t fully understand. Don’t brush against a TODO and decide to rewrite the file.

This sounds ob­vi­ous un­til you watch an agent de­cide that fix­ing one bug re­quires mod­ern­iz­ing three un­re­lated files. Scope dis­ci­pline is the sin­gle biggest de­ter­mi­nant of whether an agen­t’s PR is merge­able or has to be un­wound. It’s also the prin­ci­ple that maps most cleanly onto Google’s code re­view norms, where re­view­ers will block a PR for do­ing more than one thing.

The Google DNA

The skills are sat­u­rated with prac­tices from Software Engineering at Google and Google’s pub­lic en­gi­neer­ing cul­ture. This is in­ten­tional. Most of what makes Google-scale soft­ware work is doc­u­mented and pub­lic, and it is ex­actly the part agents are most likely to skip.

A par­tial map of which skill en­codes which prac­tice:

Hyrum’s Law in api-and-in­ter­face-de­sign. Every ob­serv­able be­hav­iour of your API will even­tu­ally be de­pended on by some­one, so de­sign with that in mind.

The test pyramid (~80/15/5) and the Beyoncé Rule in test-driven-development. “If you liked it, you should have put a test on it.” Infrastructure changes don’t catch bugs; tests do.

DAMP over DRY in tests. Google’s test­ing phi­los­o­phy is ex­plicit that test code should read like a spec­i­fi­ca­tion even at the cost of some du­pli­ca­tion. Over-abstracted tests are a known anti-pat­tern.

~100-line PR siz­ing, with Critical / Nit / Optional / FYI sever­ity la­bels in code-re­view-and-qual­ity. Straight from Google’s code re­view norms. Big PRs don’t get re­viewed; they get rub­ber-stamped.

Chesterton’s Fence in code-sim­pli­fi­ca­tion. Don’t re­move a thing un­til you un­der­stand why it was put there.

Trunk-based de­vel­op­ment and atomic com­mits in git-work­flow-and-ver­sion­ing.

Shift Left and fea­ture flags in ci-cd-and-au­toma­tion. Catch prob­lems as early as pos­si­ble, de­cou­ple de­ploy from re­lease.

Code-as-liability in dep­re­ca­tion-and-mi­gra­tion. Every line you keep is one you have to main­tain for­ever, so pre­fer the smaller sur­face.

None of these are new ideas. The point is that none of them are in the agent by default. A frontier model has read the phrase “Hyrum’s Law” in its training data, but it does not apply Hyrum’s Law when it’s designing your API at 3am. Skills are how you make sure it does.

How to ac­tu­ally use it

Three modes, in roughly in­creas­ing com­mit­ment.

Mode 1: in­stall via mar­ket­place. If you’re us­ing Claude Code:

/plugin marketplace add addyosmani/agent-skills

/plugin install agent-skills@addy-agent-skills

You get the slash com­mands (/spec, /plan, /build, /test, /review, /ship, /code-simplify) and the agent ac­ti­vates the rel­e­vant skills au­to­mat­i­cally based on con­text. This is the path I’d rec­om­mend most peo­ple start on.

Mode 2: drop the mark­down into your tool of choice. The skills are plain mark­down with front­mat­ter. Cursor users put them in .cursor/rules/. Gemini CLI has its own in­stall path. Codex, Aider, Windsurf, OpenCode, any­thing that ac­cepts a sys­tem prompt can read them. The tool­ing mat­ters less than the work­flow un­der­neath.

Mode 3: read them as a spec. Even if you never in­stall any­thing, the skills are a doc­u­mented de­scrip­tion of what good en­gi­neer­ing with AI agents looks like. Read code-re­view-and-qual­ity.md and ap­ply the five-axis frame­work to your team’s re­view process. Read test-dri­ven-de­vel­op­ment.md and use it to set­tle the next do we need to write the test first” ar­gu­ment with a ju­nior. Read the meta-skill and steal the five non-ne­go­tiables for your own AGENTS.md.

This third mode is where I’d ac­tu­ally start. Pick the four or five skills clos­est to your cur­rent pain. Decide which work­flows you want en­forced. Then in­stall the run­time, or roll your own, to do the en­forc­ing.

What to steal even if you never in­stall

A few pat­terns from the pro­ject I’d steal re­gard­less of whether you use AI cod­ing agents at all.

Anti-rationalization as a team practice. Write down the lies your team tells itself. “We’ll fix the tests after launch.” “This change is too small for a design doc.” “It’s fine, we have monitoring.” Pair each with the rebuttal. Put it in your AGENTS.md or your engineering wiki. It will save you arguments and it will catch the next tired Friday-afternoon shortcut.

Process over prose for anything you write internally. If you find yourself writing a 2,000-word doc titled “how we approach X”, you’ve written reference material. Convert it to a workflow with checkpoints. The doc shrinks to 400 words and people actually run it. This applies as much to onboarding guides and runbooks as it does to agent skills.

Verification as a hard exit criterion. Make “produce evidence” the exit step of every task. For agents, for engineers, for yourself. Evidence is whatever proves the work is done: a green test run, a screenshot, a log, a review approval. Without it, the task is not done. “Seems right” never closes the loop.

Progressive dis­clo­sure for any rule­book. Do not write a 50-page hand­book. Write a small router that points to the right small chap­ter for the sit­u­a­tion. This is true for AGENTS.md, for run­books, for in­ci­dent play­books, for any­thing any­one will read un­der time pres­sure.

Five non-ne­go­tiables, lifted from the meta-skill, that I’d put in any AGENTS.md to­mor­row:

Surface as­sump­tions be­fore build­ing. Wrong as­sump­tions held silently are the most com­mon fail­ure mode.

Stop and ask when re­quire­ments con­flict. Don’t guess.

Push back when war­ranted. The agent (or en­gi­neer) is not a yes-ma­chine.

Prefer the bor­ing, ob­vi­ous so­lu­tion. Cleverness is ex­pen­sive.

Touch only what you’re asked to touch.

That’s a worth­while en­gi­neer­ing cul­ture in five lines, and you don’t need to in­stall any­thing to adopt it.
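If you want them in file form, here is a minimal sketch of how those five lines might sit in an AGENTS.md; the heading and phrasing are mine, not lifted from the repo:

```markdown
## Non-negotiables

1. Surface assumptions before building; wrong assumptions held silently
   are the most common failure mode.
2. Stop and ask when requirements conflict. Don't guess.
3. Push back when warranted. You are not a yes-machine.
4. Prefer the boring, obvious solution. Cleverness is expensive.
5. Touch only what you're asked to touch.
```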

Where this fits in the har­ness

In the broader pic­ture, skills are one layer of agent har­ness en­gi­neer­ing. The har­ness is the model plus every­thing you build around it; skills are the reusable work­flow chunks that get pro­gres­sively dis­closed into the sys­tem prompt. They sit along­side AGENTS.md (the rolling rule­book), hooks (the de­ter­min­is­tic en­force­ment layer), tools (the ac­tions the agent can take), and the ses­sion log (the durable mem­ory). Each layer has a spe­cific job. Skills do the se­nior-en­gi­neer-process job.

Skills mat­ter more for long-run­ning agents than they do for chat-style ones, be­cause long runs am­plify every short­cut. An agent that skips the test in a 10-minute ses­sion pro­duces one bug. An agent that skips the test in a 30-hour ses­sion pro­duces a de­bug­ging ar­chae­ol­ogy pro­ject at the end of the run, when no one re­mem­bers what the orig­i­nal in­tent was. The longer the run, the more the se­nior-en­gi­neer scaf­fold­ing has to be en­forced rather than sug­gested.

The portability of the skills format matters too. The same SKILL.md file works in Claude Code, Cursor (with rules), Gemini CLI, Codex, and any other harness that accepts system-prompt content. Write the workflow once; the runtime enforces it. That’s the thing the markdown-with-frontmatter format buys you that bespoke prompt engineering does not.

Closing

The thing I most want peo­ple to take from this pro­ject, more than the skills them­selves, is the fram­ing.

AI cod­ing agents are ex­tremely ca­pa­ble ju­nior en­gi­neers with no in­stinct for the parts of the job that don’t show up in the diff. The se­nior-en­gi­neer­ing work (surfacing as­sump­tions, siz­ing changes, writ­ing the spec, leav­ing ev­i­dence, re­fus­ing to merge what can’t be re­viewed) is ex­actly what an agent will skip un­less you make it im­pos­si­ble to skip. The job, in­creas­ingly, is to en­code that dis­ci­pline as some­thing the agent can­not talk it­self out of.

Skills are one shape of that. Anti-rationalization ta­bles. Progressive dis­clo­sure. Process over prose. Verification as the load-bear­ing exit cri­te­rion. The Google prac­tices that al­ready work, made portable.

You can in­stall my ver­sion. You can roll your own. The les­son stands ei­ther way: the se­nior-en­gi­neer parts of the job are no longer op­tional, even when the en­gi­neer is a model.

The repo is at github.com/​ad­dyos­mani/​agent-skills (MIT). For the broader scaf­fold­ing pic­ture, see Agent Harness Engineering and Long-running Agents.

iOS 27 is adding a 'Create a Pass' button to Apple Wallet

walletwallet.alen.ro

Bloomberg’s Mark Gurman reported on Monday that iOS 27 will add a “Create a Pass” feature to the Wallet app. Tap the “+” button you already use to add credit cards or passes from emails, and Wallet will offer something it has never offered before on iPhone: a path to build your own pass.

You can scan a QR code on a pa­per ticket or mem­ber­ship card with the cam­era, or build a pass from scratch in a lay­out ed­i­tor. The whole flow runs with­out an Apple Developer ac­count, a Pass Type ID, or any cer­tifi­cate sign­ing.

iOS 27 is ex­pected to pre­view at WWDC on June 8, with a pub­lic re­lease in September.

How the new flow works

Reporting from Bloomberg, MacRumors, 9to5Mac, and AppleInsider lines up on the same workflow. Inside the Wallet app, the existing “+” button gains a new option for creating a pass. From there you choose between two starting points:

Scan a QR code from a pa­per card, ticket, or screen

Build a cus­tom pass from scratch with no scan needed

Once you are in the ed­i­tor, Wallet ex­poses ad­justable styles, im­ages, col­ors, and text fields. The re­ports de­scribe a fairly con­ven­tional tem­plate-dri­ven lay­out, closer in spirit to what Pass2U, WalletWallet, and other third-party gen­er­a­tors have of­fered for years than to Apple’s de­vel­oper-only PassKit pipeline.

Three tem­plates, color-coded

Apple is test­ing three start­ing tem­plates, each tied to a de­fault color:

Standard (orange): the de­fault for any gen­eral-pur­pose pass.

Membership (blue): geared to­ward gyms, clubs, li­braries, and other re­cur­ring-ac­cess cards.

Event (purple): meant for tick­ets to games, movies, and one-off oc­ca­sions.

The color choice is not just dec­o­ra­tion. Wallet cur­rently sorts passes vi­su­ally in the stack, and the tem­plate hue is what sets each card apart at a glance, so a quick look is enough to pick out the or­ange punch card from the pur­ple ticket with­out read­ing a word.

Why now: 14 years of PassKit drought

Apple shipped PassKit alongside iOS 6 back in 2012. The pitch was clean: businesses build .pkpass files, customers tap to add, everyone wins. In practice, the consistent adopters ended up being airlines, big-box retailers, ticketing platforms, and a handful of national chains. Most gyms, cafes, libraries, rec centers, and small loyalty programs never built one, because the path requires an Apple Developer account, signing certificates, and enough engineering work that “just print a paper card” almost always won the budget conversation.
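For contrast with the new no-certificate flow, this is roughly the artifact the developer pipeline produces: a .pkpass is a signed zip archive whose core is a pass.json manifest. The values below are illustrative placeholders, and a real pass additionally needs icon images plus a signature generated with the Pass Type ID certificate:

```json
{
  "formatVersion": 1,
  "passTypeIdentifier": "pass.com.example.loyalty",
  "teamIdentifier": "ABCDE12345",
  "serialNumber": "member-0001",
  "organizationName": "Example Coffee",
  "description": "Example Coffee loyalty card",
  "barcodes": [
    {
      "format": "PKBarcodeFormatQR",
      "message": "member-0001",
      "messageEncoding": "iso-8859-1"
    }
  ],
  "storeCard": {
    "primaryFields": [
      { "key": "stamps", "label": "STAMPS", "value": "7 of 10" }
    ]
  }
}
```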

The Next Web’s fram­ing is blunt: Apple is no longer wait­ing on de­vel­op­ers. With Create a Pass, the sup­ply-side prob­lem is fi­nally be­ing solved from the de­mand side. If the busi­ness will not build a Wallet pass, the user does it them­selves from the QR code that busi­ness al­ready printed.

That is a mean­ing­ful shift in pos­ture. For more than a decade, Wallet has been a di­rec­tory of what brands chose to ship. In iOS 27 it be­comes a di­rec­tory of what peo­ple choose to keep.

What this means for WalletWallet

We will be hon­est. WalletWallet ex­ists be­cause of this ex­act gap. You take a bar­code from any loy­alty card, paste it into our web app, pick a color, and a free Apple Wallet pass lands on your phone in about a minute, all from the browser with­out an ac­count or any de­vel­oper setup. Once Create a Pass ships in September, a chunk of that work­flow moves na­tively into the iPhone Wallet app.

That is good for users. We started this pro­ject to make Wallet friend­lier for the cafes-and-gyms long tail, and Apple agree­ing with us at OS-level scope is a healthy out­come. The cat­e­gory needed it.

A few places where we still help, even af­ter iOS 27 ships:

Google Wallet. Create a Pass is iPhone-only. Roughly half of the wal­let-us­ing world is on Android, and our gen­er­a­tor builds Google Wallet passes from the same form.

Web, no OS up­grade. iOS 27 needs a com­pat­i­ble iPhone and the September up­date. WalletWallet runs in any browser to­day. iOS 14, iPad, Mac, a friend’s lap­top, all fine.

Tag passes with real in­te­gra­tions. Our Bandcamp, SoundCloud, and Spotify pass builders pull artist art and links au­to­mat­i­cally into a tag pass. That is a dif­fer­ent shape from the generic tem­plated pass Apple is show­ing.

Sharing. A web-gen­er­ated .pkpass is just a file. You can email it, post it, hand it to a friend on Android via QR. The Wallet-native flow is more locked to the de­vice that built it.

We ex­pect to lose vol­ume on the sim­plest one-bar­code-to-Wal­let case once Create a Pass goes live. That is fine. The rea­son WalletWallet started was that Apple’s bar for a Wallet pass was too high for nor­mal peo­ple. If iOS 27 low­ers that bar, the world we wanted is closer.

What we still do not know

The cur­rent re­ports cover the UI, the tem­plates, and the high-level work­flow. They are silent on a lot of de­tails that mat­ter:

Whether iCloud will sync user-cre­ated passes across iPhone, iPad, and Mac

Whether passes can be ex­ported as .pkpass files to share with non-iPhone users

Whether Wallet sup­ports Code 128, PDF417, and Aztec bar­codes, or only QR

Whether mer­chants can claim, co-sign, or up­date user-cre­ated passes af­ter the fact

Whether passes have lock-screen be­hav­ior tied to time and lo­ca­tion, the way de­vel­oper-is­sued passes do to­day

We will know more once Apple pre­views iOS 27 at WWDC on June 8, and again when the first de­vel­oper be­tas land. We will up­date this post when there is some­thing con­crete to add.

Quick re­cap

iOS 27 is adding a Create a Pass but­ton to the Wallet app, with a QR-scan or build-from-scratch flow and three color-coded tem­plates: Standard (orange), Membership (blue), and Event (purple). Bloomberg broke the story on May 4, and a pub­lic re­lease is ex­pected in September 2026. It will be the first time iPhone users do not need a third-party tool to put a bar­code into Wallet, and for us that is a sign the cat­e­gory is ma­tur­ing the right way.
