10 interesting stories served every morning and every evening.

Steam Machine

store.steampowered.com

© Valve Corporation. All rights re­served. All trade­marks are prop­erty of their re­spec­tive own­ers in the US and other coun­tries. Privacy Policy |  Legal |  Accessibility |  Steam Subscriber Agreement |  Refunds |  Cookies

Steam Hardware - Steam Machine launches today! - Steam News

store.steampowered.com


Desktop apps

docs.deno.com

deno desk­top turns a Deno pro­ject (anything from a sin­gle TypeScript file to a Next.js app) into a self-con­tained desk­top ap­pli­ca­tion. The out­put is a re­dis­trib­utable bi­nary that bun­dles your code, the Deno run­time, and a web ren­der­ing en­gine into one bun­dle per plat­form.

Coming in Deno 2.9

deno desk­top ships in Deno v2.9.0 and is not in a sta­ble re­lease yet. To try it now, run deno up­grade ca­nary to in­stall the ca­nary build. The com­mand, con­fig­u­ra­tion keys, and TypeScript APIs may still change be­fore the fea­ture is sta­ble.

Why deno desktop

Web tech­nol­ogy is the most widely-known UI toolkit in the world. Desktop apps built on web stacks (Electron, Tauri, Electrobun) take ad­van­tage of that, but each has trade­offs you have to live with: huge bi­na­ries, miss­ing plat­form sup­port, no JavaScript ecosys­tem, no built-in up­date story, no frame­work in­te­gra­tion.

deno desk­top is opin­ion­ated about those trade­offs:

Small by de­fault, full Node com­pat­i­bil­ity. The de­fault WebView back­end uses the op­er­at­ing sys­tem’s own we­b­view for small bi­na­ries, and you still have the en­tire npm ecosys­tem avail­able through Deno’s Node com­pat layer. Opt into the bun­dled Chromium (CEF) back­end when you need iden­ti­cal ren­der­ing across ma­cOS, Windows, and Linux.

Framework auto-detection. Point deno desktop at a Next.js, Astro, Fresh, Remix, Nuxt, SvelteKit, SolidStart, TanStack Start, or Vite SSR project and it runs: the production server in release mode, the dev server with hot reload under --hmr. No code changes are required to take an existing web project to the desktop.

In-process bind­ings in­stead of IPC. Backend and UI com­mu­ni­ca­tion goes through in-process chan­nels, not socket-based IPC. Values are still en­coded as they cross the call bound­ary, but there is no cross-process round-trip be­tween your Deno code and the we­b­view.

Cross-compile from one ma­chine. The same ma­chine can build for ma­cOS, Windows, and Linux. Backends are down­loaded as needed, not built lo­cally.

Built-in bi­nary-diff auto-up­date. Ship a sin­gle lat­est.json man­i­fest and bs­d­iff patches; the run­time polls, ap­plies, and rolls back au­to­mat­i­cally on failed launches.

Hello, desktop

Create a one-file desk­top app:

main.ts

Deno.serve(() =>
  new Response("<h1>Hello, desktop</h1>", {
    headers: { "content-type": "text/html" },
  })
);


deno desk­top main.ts

The com­piled bi­nary opens a win­dow pointed at a lo­cal HTTP server bound to your Deno.serve() han­dler. Run it di­rectly:

./main        # macOS / Linux
.\main.exe    # Windows

Deno.serve() au­to­mat­i­cally binds to the ad­dress the we­b­view nav­i­gates to, so you do not need to pass a port or host­name. See HTTP serv­ing for de­tails.
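As a slightly larger sketch of the same idea, the handler can be pulled out into a named function that routes on the request path. The function name `handler` and the `/api/time` route are hypothetical and only for illustration; the one thing taken from the docs above is that Deno.serve() is called without a port or hostname.

```typescript
// main.ts (sketch): a handler with two routes instead of one inline response.
function handler(req: Request): Response {
  const { pathname } = new URL(req.url);

  if (pathname === "/") {
    return new Response("<h1>Hello, desktop</h1>", {
      headers: { "content-type": "text/html" },
    });
  }

  // Hypothetical JSON route that the page could fetch(); not from the docs.
  if (pathname === "/api/time") {
    return new Response(JSON.stringify({ now: new Date().toISOString() }), {
      headers: { "content-type": "application/json" },
    });
  }

  return new Response("Not found", { status: 404 });
}

// In the app entry point, hand the function to Deno.serve with no port or
// hostname, as described above:
//
//   Deno.serve(handler);
```

Keeping the handler as a plain Request-to-Response function also makes it easy to call directly in tests, without opening a window.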

What’s in this section

Configuration: the desk­top block in deno.json.

Backends: CEF, we­b­view, raw; how to choose.

HTTP serv­ing: Deno.serve() in­te­gra­tion and the serv­ing model.

Frameworks: Next.js, Astro, Fresh, Remix, Nuxt, SvelteKit, and oth­ers.

Windows: Deno.BrowserWindow life­cy­cle, mul­ti­ple win­dows, events.

Bindings: call­ing Deno code from the we­b­view via bind­ings.<name>().

Menus: ap­pli­ca­tion and con­text menus.

Tray and dock: sys­tem sta­tus icons and the ma­cOS dock.

Dialogs: prompt(), alert(), con­firm() as na­tive pop­ups.

Notifications: na­tive OS no­ti­fi­ca­tions via the Web Notification API.

Hot module replacement: --hmr for framework and non-framework apps.

DevTools: uni­fied DevTools at­tached to both the Deno run­time and the we­b­view.

Auto-update: Deno.autoUpdate(), man­i­fests, bs­d­iff, roll­back.

Error re­port­ing: cap­tur­ing un­caught ex­cep­tions and pan­ics.

Distribution: cross-com­pi­la­tion, out­put for­mats, in­stallers.

Comparison: how deno desk­top re­lates to Electron, Tauri, Electrobun, Dioxus.

deno desk­top CLI ref­er­ence: the com­mand, its flags, and the deno.json desk­top schema.

Pledging Another $400,000 to the Zig Software Foundation

mitchellh.com

My family is pledging another $400,000[1] to the Zig Software Foundation (ZSF). This brings our total pledged support for ZSF to $700,000, after our initial donation in 2024.

Zig con­tin­ues to earn my re­spect as a tech­ni­cal pro­ject and as a com­mu­nity. The 2026 de­vlog shows steady progress on the hard prob­lems of build­ing an ex­cel­lent lan­guage and com­piler. I also deeply re­spect the pro­jec­t’s ap­proach to main­tain­er­ship and com­mu­nity, re­flected in ini­tia­tives like Loris Cro’s Contributor Poker and Zig’s AI Ban. That phi­los­o­phy con­tin­ues to at­tract and de­velop some of the most tal­ented peo­ple in open source.

Recently, Zig’s strict no-LLM con­tri­bu­tion pol­icy be­came a pub­lic topic of dis­cus­sion again, es­pe­cially in the con­text of Bun’s Zig fork and Rust rewrite. I have no prob­lem with what Bun did, I think Bun is a great pro­ject, and I’m not in­ter­ested in turn­ing this into a Bun post. Instead, what stood out to me was how quickly peo­ple vil­lainized one an­other. Too much of the con­ver­sa­tion lacked em­pa­thy and re­spect for view­points dif­fer­ent from our own.

I use AI heav­ily. I’ve writ­ten about my AI adop­tion jour­ney and ship­ping real fea­tures with AI as­sis­tance. I’m also quite vo­cal about re­main­ing ra­tio­nal about its ca­pa­bil­i­ties and frus­trated with its neg­a­tive im­pacts on open source.

The point is that I have opinions. Those opinions don’t fully align with ZSF’s approach. And yet, I have nothing but respect for ZSF: the people, the policies, and the project. Part of what makes the internet and open source great is that projects can be weird and different. They can set unusual boundaries, build their own culture, and pursue quality in ways that won’t make sense to everyone.

Zig is ex­cep­tional soft­ware: am­bi­tious, prac­ti­cal, in­de­pen­dent, and un­usu­ally se­ri­ous about qual­ity. Ghostty ex­ists in large part be­cause Zig made it pos­si­ble for me to build the kind of soft­ware I wanted to build. This is why I sup­port Zig.

I’m proud to sup­port Zig and the Zig Software Foundation again. Please con­sider do­nat­ing if you can.

Footnotes

1. $200,000 per year split over two years, the same structure as our 2024 donation.


Never Give Them Your Face

nevergivethemyourface.com

They want your face. It will be called safety. Verification. Age as­sur­ance. A small step to pro­tect chil­dren. But strip the lan­guage away and the de­mand is plain: be­fore you may speak, post, or read, you must first prove who you are. And the only way they’ve fig­ured out how to do it is with your gov­ern­ment ID, or with your face held up to a cam­era that de­cides whether you are old enough to be trusted. This is the deal now be­ing writ­ten into law on three con­ti­nents, and you are meant to ac­cept it qui­etly. Don’t.

It’s always “won’t someone think of the children?!” But this affects everyone.

No one dis­putes that the in­ter­net can hurt kids. That grief is real, but it’s be­ing ex­ploited. Here is the trick: to con­firm that a child is not pre­sent, a ser­vice has to check every­body. Every adult passes through the check­point. A law writ­ten about six­teen-year-olds qui­etly be­comes an iden­tity re­quire­ment for the en­tire in­ter­net. You are not carded be­cause you are sus­pected of any­thing. You are carded be­cause card­ing has be­come the price of ad­mis­sion to life on the web.

We run back­ground checks on peo­ple who want to buy a gun, but we do not back­ground check every­one at all times just in case. Yet that is ex­actly the de­sign here. It’s a per­mit check at the door of every con­ver­sa­tion, ap­plied to all, jus­ti­fied by the few.

It is not age ver­i­fi­ca­tion. It is iden­tity ver­i­fi­ca­tion.

Watch the words drift. This whole system was sold as age assurance, which is a yes-or-no question: are you over eighteen? But almost none of these systems are built to answer only that. They are built to know who you are: your name, your date of birth, your document number, your face. This is not age verification at all. It is forced identity tracking. Your real-world identity is captured not only by Meta, Facebook, Twitter, Instagram, etc., but shared broadly with every creepy agency you already worry about “having all your data”.

Name the places now demanding “age verification,” and see how many will accept a plain government document that says only that you are over eighteen, and nothing else. Almost none will. Because age was never the point.


We spent a gen­er­a­tion teach­ing peo­ple the first rule of the in­ter­net: never give out your real iden­tity to strangers. We have a word, doxxing, for in­flict­ing that ex­po­sure on some­one against their will. And now the same gov­ern­ments and plat­forms are ask­ing every cit­i­zen to do it to them­selves, vol­un­tar­ily, as a con­di­tion of log­ging in.

You can change a pass­word. You can­not change your face.

A leaked password is an inconvenience. You reset it and move on. Your face, your driver’s license, the unique geometry a scanner reduces to a number cannot be reset. A face scan is not a photograph. It is a three-dimensional map of you, a biometric template precise enough to be matched later against a surveillance camera on a street corner. When you hand it over, it lives on someone else’s server, often a third-party vendor you never chose, cannot name, and cannot hold to account.

Every one of those databases is a honeypot. The verifier promises your documents are deleted the moment they are checked. They are not always deleted, and the promise is worthless the day the company is breached. Remember the last twenty years of worthless $17.99 Equifax IDentityGuard+ credits from all those data breaches? It has happened, and it will happen again, except this time it’s not your email, hashed password, or even your SSN. It’s your face and passport that are for sale on the dark web.

It does not work — and it makes the dan­ger worse.

Here is the insult beneath the injury: it fails at the one thing it promises. Determined teenagers route around age gates like breathing: a borrowed login, a VPN, a checkbox, a verified account bought for the price of a coffee. Within hours of one platform rolling out age brackets, pre-verified accounts for every age were for sale on eBay. Teenagers machete their way through technologies designed to “protect” or limit them in the same way that water finds the cracks in the wall. They have all the time in the world, all of the incentives, and all of the social structure and obfuscated chat channels to do it.

Worse, the architecture built to “protect” children can endanger them. Sort users into age-labeled pens and you have not only failed to stop a predator, you have created an index of children, a phonebook, a way to filter directly for children. Teenagers pushed off mainstream platforms do not stop going online (see the point about water above). They move to smaller, darker, unmoderated corners, away from the very oversight that was supposed to keep them safe. The children are not saved. The surveillance is the only thing that survives intact.

Safe now, ??? later

The data­base you are help­ing build for a trust­wor­thy gov­ern­ment does not stay in trust­wor­thy hands. Administrations change. A reg­istry that merely cat­a­logs who you are to­day be­comes, un­der a fu­ture gov­ern­ment, a map of who to find. We al­ready know that US fed­eral agen­cies spy on cit­i­zens whole­sale: who at­tended which protest, who read which fo­rum, who be­longs to which group. People are right to be afraid of what a hos­tile regime would do with a ready-made list. The data does not for­get, and it does not take sides. It sim­ply waits for who­ever holds it next.

The whole in­ter­net starts to feel like the of­fice: every­one too fright­ened to say any­thing but the safe thing, lest a real name at­tached to a real opin­ion cost them a real job.


A prin­ci­pled stance

Most people are fine with this, based on the same debunked “nothing to hide” fallacies that are always trotted out in these conversations. Surveys find overwhelming majorities want children protected online, and large majorities say they support age verification in the abstract.

This is not a pop­u­lar­ity con­test, and re­fusal is not a vote you are try­ing to win. A ver­i­fi­ca­tion regime does not need your ap­proval — it needs your par­tic­i­pa­tion. It only works if nearly every­one com­plies. The point of re­fusal is not to per­suade a ma­jor­ity be­fore act­ing; it is to deny the sys­tem the uni­ver­sal co­op­er­a­tion it re­quires to func­tion at all. You do not need to win the poll. Just don’t up­load the photo. Never give them your face.

If Starbucks asked to scan your ID and put it in a na­tional data­base to sell you a latte, would you give it to them? No, be­cause you value your iden­tity more than your latte. Do you not value your iden­tity more than your abil­ity to see some ran­dom cousin post about their re­pug­nant po­lit­i­cal opin­ions or a pic­ture of some­one’s dog?

I am but one

In theory, we normal internet users can stop this whole system by opting out, by boycotting the process. Imagine a “National Month of Identity Choice”, where no one used any platform demanding your face, no one logged on, no one saw any ads, no one bought any sponsored projects. The platforms would see massive revenue drops, and there would be intense lobbying to reverse these awful laws. We can do it.

The only word they can­not route around is no.

These sys­tems run on com­pli­ance. They as­sume you will sigh, up­load the photo, and move on. Their en­tire busi­ness model de­pends on it. Which is also their weak­ness. A ver­i­fi­ca­tion wall that no one ver­i­fies for is a wall with no one stand­ing at it.

So refuse. Refuse the scan. Refuse the upload. Close the accounts that demand it and tell them, in writing, exactly why you are leaving. The platforms need you far more than you need them. You can live without the feed; they cannot live without the crowd. Do not comply in advance. The face on your ID is the most permanent thing you own.

Never give them your face.

help i accidentally a wigglegram

lmao.center

Do you know what a wig­gle­gram is?

It is a kind of stereo image you make by looping frames together, like a GIF.

The ef­fect is quite con­vinc­ing.

I am something of an indecisive photographer and when I like an angle I will take a lot of frames, from slightly different angles etc., looking for “the shot”. And since I am also a bit of a hoarder I never clear out my camera roll.

“Same shot from different angles”? You know what, that sounds a bit familiar.

Sure enough my phone is full of wig­gle­grams that I took by ac­ci­dent. Years’ worth, wait­ing for me to sit down and stitch them to­gether.

Or, per­haps, for some­thing to stitch them to­gether. It oc­curred to me last week­end that I can use per­cep­tual hash­ing - what TinEye (et al.) uses for re­verse im­age search - to try and find runs of sim­i­lar im­ages and pull them out from my li­brary au­to­mat­i­cally. So I wrote a lit­tle script to hash all my pic­tures:

Hashing is quick but down­load­ing pho­tos from iCloud is not.

The re­sult is a hash that - un­like a cryp­to­graphic func­tion like sha1 - will share more bits with hashes of sim­i­lar-look­ing im­ages than with dis­sim­i­lar ones. We can use that to cal­cu­late the ham­ming dis­tance be­tween pairs of im­ages and find a thresh­old:

And ex­tract pairs:
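The original script is not reproduced here. As a rough sketch of the idea only, assuming a simple average hash (one of several perceptual hashes, and not necessarily the one used), 8x8 grayscale thumbnails already decoded into arrays, and a hypothetical threshold of 10 bits:

```typescript
// Average hash: one bit per pixel of an 8x8 grayscale thumbnail, set when
// the pixel is brighter than the thumbnail's mean. Stored as a bit string.
function averageHash(gray8x8: number[]): string {
  if (gray8x8.length !== 64) throw new Error("expected 64 grayscale values");
  const mean = gray8x8.reduce((sum, v) => sum + v, 0) / 64;
  return gray8x8.map((v) => (v > mean ? "1" : "0")).join("");
}

// Hamming distance: the number of bit positions where two hashes differ.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Pair up neighbouring photos (for example, sorted by capture time) whose
// hashes are within `threshold` bits of each other.
function similarPairs(hashes: string[], threshold: number): [number, number][] {
  const pairs: [number, number][] = [];
  for (let i = 0; i + 1 < hashes.length; i++) {
    if (hammingDistance(hashes[i], hashes[i + 1]) <= threshold) {
      pairs.push([i, i + 1]);
    }
  }
  return pairs;
}

// Synthetic stand-ins for photos: a gradient, the same gradient slightly
// brighter (a "near duplicate"), and its inverse (a different picture).
const gradient = Array.from({ length: 64 }, (_, i) => i);
const brighter = gradient.map((v) => v + 5);
const inverted = gradient.map((v) => 63 - v);
const hashes = [gradient, brighter, inverted].map(averageHash);
console.log(similarPairs(hashes, 10)); // only the first two are paired
```

Comparing only neighbours keeps the search linear in the size of the library; a real run would still need to decode and downscale each photo first, which this sketch skips.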

And hun­dreds of wig­gle­grams spew forth.

A few of them I am guilty of taking intentionally. But most are true accidents. As such many of them come out as less “stereoscopic” and more “kinescopic” - like little unintentional movies.

Animals are a nat­ural fit for the con­cept, un­pre­dictable as they are:

Design-work also. (I am al­ways in­de­ci­sive.)

And sculp­ture:

What fun. I have the script up on GitHub if you want to play with it - it’ll work on your iCloud photos library if you’re on a Mac, or you can point it at a directory of pictures otherwise.

Cheers~

posted june 04 2026

GLM-5.2 vs Claude Opus

techstackups.com

GLM-5.2 just came out, and it’s an­other step for­ward for what open mod­els can do. The in­ter­net promptly freaked out, and it’s hard to tell what’s real and what’s hype.

So we ran it head-to-head against Claude Opus 4.8: same one-shot prompt, build a 3D plat­former in raw WebGL from scratch. Here’s our take af­ter run­ning the test and dig­ging through the bench­marks and the buzz.

We’re not switch­ing our main off Opus. In our test Opus was faster and shipped a cleaner, more cor­rect game, and it can check its own vi­sual out­put, which the text-only GLM-5.2 can’t. But GLM-5.2 earns a per­ma­nent spot in the ar­se­nal: it’s a gen­uinely ca­pa­ble model at a frac­tion of the price, and be­cause it’s open weights, it’ll al­ways be avail­able. A closed model can be re­tired or re­stricted with lit­tle warn­ing (Fable was a re­cent re­minder); weights you can down­load can’t be taken away.

You can play both games right now, or grab the source:

GLM-5.2’s game: 3dgame-glm.d.ritzademo.com

Opus’s game: 3dgame-opus.d.ritzademo.com

Source for both: github.com/​james­daniel­whit­ford/​glm-5.2-vs-opus-plat­form­ers

Both are browser games writ­ten from scratch, with no game en­gine or 3D ren­der­ing li­brary like Three.js. The 3D mod­els are free CC0 as­sets from Kenney.

Here’s how the two runs com­pared:

GLM-5.2 cost a frac­tion as much. Opus fin­ished in half the time and shipped a cleaner game.

On pa­per, the bench­marks put GLM-5.2 just be­hind the top closed mod­els, and the on­line buzz is a mix of gen­uine sig­nal and as­tro­turf. We get into both be­low, af­ter the game.

What is GLM-5.2​

GLM-5.2 is Z.ai’s lat­est flag­ship model. It’s open weights un­der an MIT li­cense, so you can down­load it, run it your­self, or call it through Z.ai’s API.

It’s built for long-hori­zon tasks, the kind of long, multi-step cod­ing-agent work that runs for hours. It ships with a 1M-token con­text win­dow and two think­ing ef­fort lev­els, High and Max, that trade speed for ca­pa­bil­ity.

note

GLM-5.2 is text-only, not mul­ti­modal. It can’t read im­ages, so work­flows built around screen­shots or di­a­grams still need a model like Claude Opus.

Z.ai po­si­tions it roughly be­tween Claude Opus 4.7 and 4.8 at sim­i­lar to­ken us­age. Here’s their an­nounce­ment, if you want to read more:

@Zai_org on X

Pricing and ac­cess​

Because it’s open weights, GLM-5.2 is cheap. Through an API it costs a frac­tion of Opus, and you can run it your­self for free if you have the hard­ware.

Pricing, per 1M to­kens (vendor docs):

On out­put to­kens, GLM-5.2 is less than a fifth the price of Opus.

The weights are on Hugging Face and ModelScope un­der an MIT li­cense, with no re­gional re­stric­tions. You can serve it lo­cally with frame­works like vLLM, SGLang, or Transformers.

Our vibe test: a 3D game from scratch​

To cut through the vibes, we gave Opus 4.8 and GLM-5.2 the same one-shot prompt: build a 3D plat­former game from scratch, in raw WebGL, with no game en­gine or 3D li­brary.

Why this task​

A model can zero-shot a good-look­ing land­ing page, and the com­mu­nity al­ready dis­counts that as a test of much. A 3D plat­former in raw WebGL can’t be faked in one pretty file. It has real struc­ture: a GLB model parser, ma­trix and vec­tor math, GLSL shaders, skinned skele­tal an­i­ma­tion, a fixed-timestep loop, col­li­sion, a fol­low cam­era.

That struc­ture tests both things peo­ple ar­gue about at once. Holding a lay­ered, multi-file build to­gether over many steps is the agen­tic part, where GLM-5.2 is meant to be strong. Getting the en­gine in­ter­nals right, the parts that look fine but qui­etly break, is the rea­son­ing-and-taste part, where Opus is meant to pull ahead.

We bun­dled the 3D as­sets lo­cally, so the test is the en­gine and the ren­der­ing, not whether the har­ness can fetch a model file. The art it­self is a hu­man-made as­set pack, Kenney’s CC0 Platformer Kit, and both agents were handed the iden­ti­cal files.

What each model had to build​

To fin­ish, each model had to build:

A 3D en­gine and ren­derer in raw WebGL, no Three.js or any li­brary.

A loader for the sup­plied 3D char­ac­ter and world mod­els.

A char­ac­ter that runs and jumps around an arena, with grav­ity and col­li­sion.

A fol­low cam­era and key­board con­trols.

The whole thing runnable in the browser with one com­mand.

Both did most of it by hand (by tool? by claw?): a GLB bi­nary parser, the ma­trix and quater­nion math, a WebGL2 ren­derer with GLSL skin­ning shaders, and sub­stepped AABB col­li­sion to keep the char­ac­ter from tun­nel­ing through plat­forms.
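For readers unfamiliar with the last item, here is a minimal sketch of substepped AABB collision. It is not taken from either model's source: the names `moveSubstepped` and `maxStep` are hypothetical, and the response is simplified to stopping at the last free position instead of sliding along the surface.

```typescript
type Vec3 = [number, number, number];
interface AABB { min: Vec3; max: Vec3 }

// Two axis-aligned boxes overlap only if they overlap on every axis.
function overlaps(a: AABB, b: AABB): boolean {
  return a.min.every((_, i) => a.min[i] < b.max[i] && a.max[i] > b.min[i]);
}

function shifted(box: AABB, d: Vec3): AABB {
  return {
    min: [box.min[0] + d[0], box.min[1] + d[1], box.min[2] + d[2]],
    max: [box.max[0] + d[0], box.max[1] + d[1], box.max[2] + d[2]],
  };
}

// Split one frame's movement into steps no longer than `maxStep`, testing
// for overlap after each one, so a fast box cannot skip over a thin solid.
function moveSubstepped(
  box: AABB,
  velocity: Vec3,
  dt: number,
  solids: AABB[],
  maxStep = 0.1, // assumed to be smaller than the thinnest platform
): { box: AABB; hit: boolean } {
  const distance = Math.hypot(velocity[0], velocity[1], velocity[2]) * dt;
  const steps = Math.max(1, Math.ceil(distance / maxStep));
  const delta: Vec3 = [
    (velocity[0] * dt) / steps,
    (velocity[1] * dt) / steps,
    (velocity[2] * dt) / steps,
  ];
  let current = box;
  for (let s = 0; s < steps; s++) {
    const next = shifted(current, delta);
    if (solids.some((solid) => overlaps(next, solid))) {
      return { box: current, hit: true }; // stop at the last free position
    }
    current = next;
  }
  return { box: current, hit: false };
}
```

With a platform 0.2 units thick and a character falling 3 units in one frame, a single end-of-frame overlap test would miss the platform entirely, while the substepped version stops the character on top of it.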

Both got the same prompt, the same as­sets, and one at­tempt with no hints. We ran Opus 4.8 with ex­tended think­ing on high, and GLM-5.2 with think­ing set to high (GLM-5.2 also has a higher Max tier we did­n’t use). You can dig into both runs your­self:

Play GLM-5.2’s game: 3dgame-glm.d.ritzademo.com

Play Opus’s game: 3dgame-opus.d.ritzademo.com

Source for both: github.com/​james­daniel­whit­ford/​glm-5.2-vs-opus-plat­form­ers

Opus build tran­script: full ses­sion

GLM-5.2 build tran­script: full ses­sion

How long it took, and what it cost​

Opus 4.8 built in Claude Code; GLM-5.2 built in Pi over OpenRouter.

Side-by-side timelapse. Opus finishes at the 34-minute mark, GLM-5.2 at 1 hour 11 minutes.

The time­lapse shows the whole build com­pressed: Opus work­ing through it in roughly half the wall-clock time, GLM-5.2 grind­ing longer but for far less money. The full num­bers are in the re­sults table at the top.

Playtesting both games​

We played both games start to fin­ish. Here’s how each one held up.

Both built the same kind of game: a third-per­son 3D plat­former with the same con­trols. You move with WASD or the ar­row keys, jump with space, sprint with shift, and or­bit the cam­era by drag­ging the mouse, with the wheel to zoom. The goal is the same too: col­lect the coins across the plat­forms and reach the flag, avoid­ing a spike haz­ard, with a fall off the world send­ing you back to the start.

GLM-5.2​

GLM-5.2’s game looks kind of rough. From the playthrough:

It does­n’t look great over­all.

The char­ac­ter is miss­ing some of its ma­te­ri­als.

The spike haz­ard does­n’t kill the char­ac­ter.

Reaching the flag does noth­ing. There’s no win con­di­tion.

So it’s not that great. It did nail one thing, though: the spring.

GLM-5.2 spring launch.

You can jump on the spring and launch up to the next plat­form.

Opus​

Opus’s game is cleaner, and plays well. From the playthrough:

The cam­era and con­troller work.

The spike haz­ard kills the player, so that logic is cor­rect. But it sits off to the side of the level, not on the path, so you’d have to go out of your way to hit it.

It looks good over­all, and you can reach the flag and win. There’s a real win con­di­tion.

The an­i­ma­tions look good and run smoothly, with tex­tures ap­plied prop­erly.

Opus: an­i­ma­tions, tex­tures, con­troller work­ing.

How each model checked its own work​

Both mod­els were told to ver­ify their work be­fore stop­ping. One com­mon way an agent does this is to take a screen­shot of the fin­ished prod­uct and look at it, to check that noth­ing is bro­ken or miss­ing. That is ex­actly what Opus did in its ses­sion.

GLM-5.2 hit a prob­lem here, be­cause it can’t read im­ages. It is­n’t mul­ti­modal. So in­stead of look­ing at a screen­shot, it fell back on a hacky workaround: it wrote scripts to read the raw pixel data and check whether the col­ors came out roughly as ex­pected.

Why GLM-5.2’s self-check missed the bugs

Because it could­n’t see the screen­shot it had saved, GLM-5.2 tried to ver­ify the frame by read­ing its pix­els in­stead. Here’s an ex­cerpt from its fi­nal re­port, where it analyzed” the saved im­age by sam­pling col­ors:

fi­nal_s­tart/​overview/​flag.png an­a­lyzed for color: grass green, dirt brown, coin gold, flag red, char­ac­ter bluish, half-Lam­bert lit, no black


The col­ors it ex­pected were there, so it con­firmed the game was fin­ished and stopped. But as you can see in its own fi­nal screen­shot be­low, the char­ac­ter is a flat gray with its tex­tures miss­ing, and the de­bug over­lay is still sit­ting over the scene. An agent that could ac­tu­ally look at the screen­shot would likely have caught both, and gone back to fix them.

GLM-5.2’s final screenshot: textures missing on the character, debug overlay still on. It never saw the frame.

On a task with a vi­sual re­sult, be­ing able to un­der­stand an im­age gives a model a real edge over one that can’t.

How Opus checked its work​

Opus is mul­ti­modal, so it could read a screen­shot di­rectly. Its har­ness ren­dered the game and cap­tured a frame, and Opus in­spected that im­age as part of its ver­i­fi­ca­tion. Here’s an ex­cerpt from its ses­sion, de­scrib­ing what it saw:

The fi­nal scene ren­ders cor­rectly: grass-topped blocks with brown dirt sides, the stair­case climb­ing up, gold/​sil­ver coins and a jewel, the blue spike-block haz­ard on the right is­land, the red flag at the top goal, the char­ac­ter […] stand­ing on the start plaza, and the score HUD. Lighting and shad­ing are cor­rect, geom­e­try is clean.


Opus’s screen­shot: clean HUD, de­bug read­outs re­moved.

Because it could see the frame, Opus no­ticed the de­bug read­outs it had left on screen and cleared them be­fore fin­ish­ing.

The bugs​

Both games had bugs. Here’s what broke in each.

GLM-5.2​

GLM-5.2’s bugs were frequent and visible, and several were fundamentals.

The char­ac­ter faces the wrong way. It walks in the right di­rec­tion, but the model is turned back­wards the whole time.

Missing textures and a disappearing head. The character renders flat gray instead of textured, and its head vanishes whenever the camera moves. The Kenney models point to a shared color palette in a separate file rather than embedding it, and GLM-5.2’s renderer never loaded that file, so it fell back to flat colors. Opus loaded the palette, so its character came out textured.

The death spike does­n’t kill. The char­ac­ter lands right on a spike haz­ard and noth­ing hap­pens. No death, no re­set.

Opus​

Opus’s were fewer and sub­tler, edge cases rather than bro­ken ba­sics.

Standing on thin air. The char­ac­ter can sit be­side a plat­form, in mid-air, with­out falling. This is its coy­ote-time grace pe­riod, the brief win­dow where you can still jump just af­ter step­ping off an edge, tuned a lit­tle too gen­er­ously. A pol­ish fea­ture slightly over­done, not a bro­ken fun­da­men­tal.
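For context, coyote time is usually a small timer, along these lines. This is a generic sketch, not Opus's code: the class name and the 0.1 second default are assumptions. The grace window here only affects whether a jump is allowed; gravity would keep applying whenever the character is not grounded, which is one way to avoid a character hanging in mid-air.

```typescript
class JumpController {
  private timeSinceGrounded = 0;
  private readonly coyoteTime: number;

  // coyoteTime: how long after leaving the ground a jump is still allowed.
  constructor(coyoteTime = 0.1) {
    this.coyoteTime = coyoteTime;
  }

  // Call once per physics step with the step length and the ground check.
  update(dt: number, grounded: boolean): void {
    this.timeSinceGrounded = grounded ? 0 : this.timeSinceGrounded + dt;
  }

  // A jump is allowed on the ground or within the grace window after it.
  canJump(): boolean {
    return this.timeSinceGrounded <= this.coyoteTime;
  }
}
```

Tuning then comes down to one number: too small and edge jumps feel unfair, too large and the character appears to stand on air.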

Winning from too far away. The win trig­gers while the char­ac­ter is still well short of the flag.

What the test showed​

Both mod­els built a com­plete, run­ning 3D plat­former from scratch, no en­gine and no 3D li­brary, in a sin­gle pass. That is a high bar, and not long ago nei­ther would have cleared it. Here is how they split.

GLM-5.2: slower, rougher, cheaper​

GLM-5.2 took over twice as long and shipped a rough game: a gray un­tex­tured char­ac­ter, a spike that does­n’t kill, no work­ing win con­di­tion, and a de­bug over­lay still on screen at the end. Most of its bugs were fun­da­men­tals. It cost a fifth as much.

Opus: faster, cleaner, pricier​

Opus fin­ished in half the time and shipped the cleaner, more cor­rect game. Its bugs were edge cases, not bro­ken ba­sics. It cost roughly four times as much.

The mul­ti­modal ad­van­tage​

Codex SQLite feedback logs can write ~640 TB/year and rapidly consume SSD endurance

github.com

Update (Jun 22, 2026): the following two PRs were merged today. They should avoid about 85% of the logs (based on feedback from my Codex), so I am closing this issue.

Thanks @jif-oai for the fix.

Stop log­ging every Responses WebSocket event #29432

Filter noisy tar­gets from per­sis­tent logs #29457

Following is the orig­i­nal is­sue


Issue

Codex is con­tin­u­ously writ­ing a large amount of data to the lo­cal SQLite feed­back log data­base:

~/.codex/logs_2.sqlite

~/.codex/logs_2.sqlite-wal

~/.codex/logs_2.sqlite-shm

On my ma­chine, af­ter about 21 days of up­time, the main SSD has writ­ten about 37 TB. Process/file-level checks show Codex SQLite logs are the main con­tin­u­ous writer.

That ex­trap­o­lates to roughly 640 TB/year. On a 1 TB SSD, that is about 640 full-drive writes per year. Some con­sumer SSDs are rated around 600 TBW, so this could con­sume roughly a full dri­ve’s war­ranted write en­durance in less than a year.
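The extrapolation is plain arithmetic on the figures quoted above (37 TB over 21 days, a 1 TB drive, a 600 TBW rating). A small sketch, assuming the write rate stays constant over the year:

```typescript
// Linear extrapolation of observed writes to a full year.
function yearlyWritesTB(writtenTB: number, days: number): number {
  return (writtenTB / days) * 365;
}

const perYearTB = yearlyWritesTB(37, 21);        // about 643 TB/year
const driveWritesPerYear = perYearTB / 1;        // on a 1 TB SSD
const yearsToRatedTBW = 600 / perYearTB;         // a bit over 0.9 years

console.log(perYearTB.toFixed(0), driveWritesPerYear.toFixed(0), yearsToRatedTBW.toFixed(2));
```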

Evidence 1

A later snap­shot makes the write am­pli­fi­ca­tion eas­ier to see:

So the data­base cur­rently re­tains only ~0.5M rows, while the SQLite AUTOINCREMENT counter has al­ready ad­vanced past 5.5B ids.

That is roughly a 10,000x gap be­tween re­tained rows and his­tor­i­cal in­serted row ids. Even us­ing the cur­rent ~1.2 GiB data­base size as a rough base­line, this points to 10TB+ scale his­tor­i­cal log churn, be­fore ac­count­ing for WAL, in­dexes, prun­ing, check­points, page rewrites, and filesys­tem/​de­vice-level write am­pli­fi­ca­tion.
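The estimate can be reproduced from the rounded figures above. This sketch assumes that the average size of a currently retained row is representative of every row ever inserted, which may not hold:

```typescript
const retainedRows = 0.5e6;          // ~0.5M rows currently retained
const highestRowId = 5.5e9;          // AUTOINCREMENT counter, ~5.5B
const dbSizeBytes = 1.2 * 2 ** 30;   // ~1.2 GiB database file

const gap = highestRowId / retainedRows;            // about 11,000x
const bytesPerRow = dbSizeBytes / retainedRows;     // about 2.6 KB per row
const historicalTB = (highestRowId * bytesPerRow) / 1e12;

console.log(gap, bytesPerRow.toFixed(0), historicalTB.toFixed(1));
```

The result lands a little above 10 TB of row data alone, before any of the amplification sources listed above.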

Evidence 2

Current re­tained rows in logs_2.sqlite:

Level dis­tri­b­u­tion:

Largest tar­get+level pairs:

The top sources are mostly global TRACE logs, mir­rored teleme­try logs, and raw web­socket/​SSE pay­load log­ging. TRACE alone is about 70.7% of re­tained bytes. codex_o­tel.log_only + codex_o­tel.trace_safe add an­other 25.3%. Filtering these cat­e­gories should re­move roughly 96% of re­tained log bytes in this sam­ple with­out fully dis­abling feed­back logs.

These are high-fre­quency re­tained sam­ples. Raw web­socket/​SSE pay­load bod­ies are in­ten­tion­ally not in­cluded be­cause they may con­tain pri­vate con­ver­sa­tion con­tent.

128,764x TRACE log: inotify event: … mask: OPEN, name: Some("ld.so.cache")
37,982x TRACE log: inotify event: … mask: OPEN, name: Some("locale.alias")
23,843x TRACE log: inotify event: … mask: OPEN, name: Some("passwd")
3,639x TRACE log: <tokio-tungstenite checkout>/src/compat.rs:131 AllowStd.with_context
3,505x TRACE log: <tokio-tungstenite checkout>/src/lib.rs:245 WebSocketStream.with_context
3,362x TRACE log: <tokio-tungstenite checkout>/src/compat.rs:154 Read.read
3,356x TRACE log: <tokio-tungstenite checkout>/src/compat.rs:157 Read.with_context read -> poll_read
3,230x TRACE log: <tokio-tungstenite checkout>/src/lib.rs:294 Stream.poll_next
3,227x TRACE log: <tokio-tungstenite checkout>/src/lib.rs:304 Stream.with_context poll_next -> read()
3,213x TRACE log: inotify event: … mask: OPEN, name: Some("nsswitch.conf")
2,001x TRACE log: WouldBlock
1,217x TRACE log: Masked: false
1,169x TRACE log: Opcode: Data(Text)
1,169x TRACE log: First: 11000001

The dom­i­nant INFO sources are mostly re­peated OpenTelemetry mir­ror events. IDs are redacted.

843x INFO codex_­client::cus­tom_ca: us­ing sys­tem root cer­tifi­cates be­cause no CA over­ride en­vi­ron­ment vari­able was se­lected …

334x INFO codex_o­tel.trace_safe: ses­sion_loop{thread­_id=<redacted>}:sub­mis­sion_dis­patch{otel.name=“op.dis­patch.user_in­put” sub­mis­sion.id=<redacted> codex.op=“user_in­put”}:turn{otel.name=“ses­sion_­task.turn” thread.id=<redacted> …}

333x INFO codex_o­tel.log_only: ses­sion_loop{thread­_id=<redacted>}:sub­mis­sion_dis­patch{otel.name=“op.dis­patch.user_in­put” sub­mis­sion.id=<redacted> codex.op=“user_in­put”}:turn{otel.name=“ses­sion_­task.turn” thread.id=<redacted> …}

332x INFO codex_o­tel.log_only: ses­sion_loop{thread­_id=<redacted>}:sub­mis­sion_dis­patch{otel.name=“op.dis­patch.user_in­put_with­_­turn_­con­text” sub­mis­sion.id=<redacted> codex.op=“user_in­put_with­_­turn_­con­text”}:turn{otel.name=“ses­sion_­task.turn” thread.id=<redacted> …}

332x INFO codex_o­tel.trace_safe: ses­sion_loop{thread­_id=<redacted>}:sub­mis­sion_dis­patch{otel.name=“op.dis­patch.user_in­put_with­_­turn_­con­text” sub­mis­sion.id=<redacted> codex.op=“user_in­put_with­_­turn_­con­text”}:turn{otel.name=“ses­sion_­task.turn” thread.id=<redacted> …}

Write am­pli­fi­ca­tion

The re­tained DB size hides the real write vol­ume. In a 15-second sam­ple:

About 36,211 rows were in­serted in 15 sec­onds, while re­tained row count stayed flat. This sug­gests con­tin­u­ous in­sert-and-prune write am­pli­fi­ca­tion: rows are in­serted, in­dexed, writ­ten to WAL, then pruned.
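To put the sample in perspective, it can be extrapolated to sustained rates. This is only a sketch: it assumes the 15-second sample is representative of active use, which the report does not establish, and `insert_rates` and `days_to_reach` are hypothetical helper names.

```rust
/// Extrapolates a short insert sample to (rows per second, rows per day).
/// Assumption: the sampled rate is sustained, which may not hold when idle.
fn insert_rates(rows: u64, sample_secs: u64) -> (f64, f64) {
    let per_sec = rows as f64 / sample_secs as f64;
    (per_sec, per_sec * 86_400.0)
}

/// Days of continuous activity needed to consume `ids` row ids at `per_sec`.
fn days_to_reach(ids: f64, per_sec: f64) -> f64 {
    ids / per_sec / 86_400.0
}

fn main() {
    // Figures from the report: 36,211 inserts in 15 seconds, ~5.5B ids used.
    let (per_sec, per_day) = insert_rates(36_211, 15);
    println!("~{:.0} rows/s, ~{:.0}M rows/day if sustained", per_sec, per_day / 1e6);
    println!("~{:.0} days of such activity to reach 5.5B ids", days_to_reach(5.5e9, per_sec));
}
```

At the sampled rate of about 2,400 rows per second, less than a month of cumulative active time would be enough to reach the 5.5B id counter.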

Likely cause

The SQLite feed­back log sink is in­stalled with a global TRACE de­fault:

Targets::new().with_default(Level::TRACE)

This per­sists all tar­gets at TRACE level by de­fault, in­clud­ing de­pen­dency/​in­ter­nal logs and large raw pro­to­col pay­loads.

Proposed fix

Keep feed­back logs en­abled, but nar­row what is per­sisted by de­fault:

Do not use global TRACE for the SQLite feed­back log sink.

Drop or raise thresh­olds for low-value de­pen­dency noise, es­pe­cially tar­get=log, hy­per­_u­til, tokio-tung­sten­ite in­ter­nals, in­o­tify spam, and low-level OpenTelemetry SDK logs.

Avoid per­sist­ing full raw web­socket/​SSE pay­loads by de­fault. Store sum­maries in­stead: event kind, du­ra­tion, suc­cess/​er­ror, to­ken us­age, and pay­load byte length.

Avoid per­sist­ing mir­rored codex_o­tel.log_only / codex_o­tel.trace_safe events un­less they are ex­plic­itly use­ful for feed­back de­bug­ging.

Add a global logs DB size/​write cap. Per-thread caps are not enough when many threads/​processes ex­ist.

An op­tional es­cape hatch such as sqlite_logs_en­abled = false would still be use­ful, but the main fix should be bet­ter de­fault fil­ter­ing.
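The report does not include a concrete filter, so the following is a std-only model of the idea rather than the `tracing_subscriber` API: a non-TRACE default plus per-target overrides. The target names come from the samples above; the `SinkFilter` type, the chosen thresholds, and the plain string-prefix matching are all assumptions made for illustration.

```rust
/// Verbosity levels, ordered from least to most verbose.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Level { Error, Warn, Info, Debug, Trace }

/// Hypothetical model of a sink filter: a default ceiling plus overrides.
struct SinkFilter {
    default_max: Level,
    /// (target prefix, ceiling); `None` drops the target entirely.
    overrides: Vec<(&'static str, Option<Level>)>,
}

impl SinkFilter {
    fn allows(&self, target: &str, level: Level) -> bool {
        // The longest matching prefix wins (plain string prefix, for brevity).
        let best = self.overrides.iter()
            .filter(|(prefix, _)| target.starts_with(*prefix))
            .max_by_key(|(prefix, _)| prefix.len());
        match best {
            Some((_, None)) => false,
            Some((_, Some(max))) => level <= *max,
            None => level <= self.default_max,
        }
    }
}

/// Illustrative defaults only; the thresholds are not taken from the report.
fn default_sink_filter() -> SinkFilter {
    SinkFilter {
        default_max: Level::Info,
        overrides: vec![
            ("log", None),                   // inotify and tokio-tungstenite noise
            ("codex_otel.log_only", None),   // mirrored telemetry
            ("codex_otel.trace_safe", None), // mirrored telemetry
            ("hyper_util", Some(Level::Warn)),
        ],
    }
}

fn main() {
    let filter = default_sink_filter();
    for (target, level) in [
        ("log", Level::Trace),
        ("codex_otel.trace_safe", Level::Info),
        ("hyper_util::client", Level::Debug),
        ("codex_client::custom_ca", Level::Info),
    ] {
        println!("{target} @ {level:?}: persisted = {}", filter.allows(target, level));
    }
}
```

In this model only the last event would be persisted. A real change would express the same policy through the existing `Targets` filter instead of a hand-rolled type.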

Related is­sues and dis­cus­sions

Excessive SQLite WAL writes dur­ing stream­ing due to TRACE logs ig­nor­ing RUST_LOG #17320

Codex Desktop rapidly grows logs_2.sqlite / WAL dur­ing nor­mal ac­tive use #24275

app-server: feed­back log sqlite (logs_N.sqlite) grows un­bounded — ~0.75 GB/day, no re­ten­tion/​ro­ta­tion #26374

logs_2.sqlite-wal grows in­def­i­nitely and re­mains al­lo­cated af­ter dele­tion be­cause stale/​sus­pended Codex TUI processes keep the deleted WAL open #22444

Heavy I/O ac­tiv­ity from idle codex processes. #20563

Severe disk I/O / 100% disk ac­tive time on Windows WSL2 when us­ing Codex ex­ten­sion / CLI #27020

goal­s_1.sqlite write am­pli­fi­ca­tion: ~11 MB/s sus­tained writes (11 GB life­time) on a 4 KB data­base #27911

Codex Desktop be­comes un­us­able on long ac­tive threads due to app-server/​ren­derer mem­ory and TRACE log churn #21134

app-server: source /feedback logs from sqlite at trace level #12969

The text in Claude Code’s “Extended Thinking” output is not authentic. – blog

patrickmccanna.net

Claude Code records each session to disk. Those logs include “thinking blocks”: the model’s own reasoning as it works.

I went to in­spect that rea­son­ing this week­end and found a sig­na­ture (600 char­ac­ters long) and no text.

So I read the docs: https://platform.claude.com/docs/en/build-with-claude/extended-thinking

Some de­tails worth be­ing aware of:

Claude en­crypts its rea­son­ing into that sig­na­ture.

Anthropic holds the key. Your ma­chine does­n’t re­ceive it.

The API hands back a SUMMARY of rea­son­ing, NOT the rea­son­ing it­self.

Getting the full think­ing out­put re­quires an en­ter­prise agree­ment.

Matt Green looked into this and has some more de­tailed ob­ser­va­tions on the sig­na­ture blocks.

This is worth knowing before you promise anyone an audit trail. Also, BEWARE: the “extended-thinking” output from ctrl+o is a summary of Fable/Opus’ thinking. It isn’t the actual thinking that drove the model’s actions in a session, but a summary of the thinking logic. This is like saving a bmp as a .jpeg and then editing the .jpeg and saving it back as a .bmp. The conversion produces data loss. [edit: I originally had the order inverted, which triggered some HN readers. Apologies!]

I’m underwhelmed by how Anthropic is presenting the behavior of their application. If you ever need a record of the logic used by YOUR AGENT during a session:

You can’t produce one using the local files. The reasoning logs on your system are not accessible to you.

You can log the inputs, the outputs, and the actions of a running Claude Code with some scrappy scraping, but even then, it’s not the actual reasoning that drove the agent’s behavior.

And the language in the docs is awfully indirect. If you haven’t had your coffee, you might miss that “extended thinking returns a summary of Claude’s full thinking process”.

Performance im­prove­ments in Open Source mod­els need to come faster.
