10 interesting stories served every morning and every evening.




1 859 shares, 46 trendiness

mitchellh/vouch: A community trust management system based on explicit vouches to participate.

People must be vouched for before interacting with certain parts of a project (the exact parts are configurable by the project enforcing them). People can also be explicitly denounced to block them from interacting with the project.

The implementation is generic and can be used by any project on any code forge, but we provide GitHub integration out of the box via GitHub Actions and the CLI.

The vouch list is maintained in a single flat file using a minimal format that can be trivially parsed using standard POSIX tools and any programming language without external libraries.

Vouch lists can also form a web of trust. You can configure Vouch to read other projects' lists of vouched or denounced users. This way, projects with shared values can share their trust decisions with each other and create a larger, more comprehensive web of trust across the ecosystem. Users already proven to be trustworthy in one project can automatically be assumed trustworthy in another project, and so on.

Open source has always worked on a system of trust and verify.

Historically, the effort required to understand a codebase, implement a change, and submit that change for review was high enough that it naturally filtered out many low-quality contributions from unqualified people. For over 20 years of my life, this was enough for my projects as well as for most others.

Unfortunately, the landscape has changed, particularly with the advent of AI tools that allow people to trivially create plausible-looking but extremely low-quality contributions with little to no true understanding. Contributors can no longer be trusted based on the minimal barrier to entry to simply submit a change.

But open source still works on trust! And every project has a definite group of trusted individuals (maintainers) and a larger group of probably trusted individuals (active members of the community in any form). So, let's move to an explicit trust model where trusted individuals can vouch for others, and those vouched individuals can then contribute.

Who is vouched or denounced, and how, is left entirely up to the project integrating the system. Additionally, what consequences follow for a vouched or denounced person is also fully up to the project. Implement a policy that works for your project and community.

Integrating vouch into a GitHub project is easy with the provided GitHub Actions. By choosing which actions to use, you can fully control how users are vouched and what they can or can't do.

For an example, look at this repository! It fully integrates vouch.

Below is a list of the actions and a brief description of their function. See the linked README in the action directory for full usage details.

The CLI is implemented as a Nushell module and only requires Nushell to run. There are no other external dependencies.

This is Nushell, so you can get help on any command:

use vouch *

help add

help check

help denounce

help gh-check-pr

help gh-manage-by-issue

vouch check

# Preview new file contents (default)

vouch add someuser

# Write the file in-place

vouch add someuser --write

# Preview new file contents (default)

vouch denounce badactor

# With a reason

vouch denounce badactor --reason "Submitted AI slop"

# Write the file in-place

vouch denounce badactor --write

Requires the GITHUB_TOKEN environment variable. If not set and gh is available, the token from gh auth token is used.

# Check PR author status (dry run)

vouch gh-check-pr 123 --repo owner/repo

# Auto-close unvouched PRs (dry run)

vouch gh-check-pr 123 --repo owner/repo --auto-close

# Actually close unvouched PRs

vouch gh-check-pr 123 --repo owner/repo --auto-close --dry-run=false

# Allow unvouched users, only block denounced

vouch gh-check-pr 123 --repo owner/repo --require-vouch=false --auto-close

# Dry run (default)

vouch gh-manage-by-issue 123 456789 --repo owner/repo

# Actually perform the action

vouch gh-manage-by-issue 123 456789 --repo owner/repo --dry-run=false

Responds to comments from collaborators with write access:

* vouch - vouches for the issue author with a reason

Keywords are customizable via --vouch-keyword and --denounce-keyword.

The module also exports a lib submodule for scripting:

use vouch/lib.nu *

let records = open VOUCHED.td

$records | check-user "mitchellh" --default-platform github # "vouched", "denounced", or "unknown"

$records | add-user "newuser" # returns updated table

$records | denounce-user "badactor" "reason" # returns updated table

$records | remove-user "olduser" # returns updated table

The vouch list is stored in a .td file. See VOUCHED.example.td for an example. The file is looked up at VOUCHED.td or .github/VOUCHED.td by default.

* One handle per line (without @), sorted alphabetically.

* Optionally add details after a space following the handle.

The from td and to td commands are exported by the module, so Nushell's open command works natively with .td files to decode into structured tables and encode back to the file format with comments and whitespace preserved.
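As a rough illustration of the "trivially parsed" claim, here is a minimal Python sketch based on the format described above. Treating lines starting with # as comments is my assumption (the exact comment syntax isn't specified here); VOUCHED.example.td is the authority.

def parse_td(text):
    records = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):  # assumed comment syntax
            continue
        # One handle per line; optional details follow the first space.
        handle, _, details = stripped.partition(" ")
        records.append((handle, details))
    return records

records = parse_td(open("VOUCHED.td").read())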

...

Read the original on github.com »

2 424 shares, 16 trendiness

AI fatigue is real and nobody talks about it

You’re us­ing AI to be more pro­duc­tive. So why are you more ex­hausted than ever? The para­dox every en­gi­neer needs to con­front.

You’re us­ing AI to be more pro­duc­tive. So why are you more ex­hausted than ever? The para­dox every en­gi­neer needs to con­front.

I shipped more code last quarter than any quarter in my career. I also felt more drained than any quarter in my career. These two facts are not unrelated.

I build AI agent infrastructure for a living. I'm one of the core maintainers of OpenFGA (CNCF Incubating), I built agentic-authz for agent authorization, I built Distill for context deduplication, I shipped MCP servers. I'm not someone who dabbles with AI on the side. I'm deep in it. I build the tools that other engineers use to make AI agents work in production.

And yet, I hit a wall. The kind of exhaustion that no amount of tooling or workflow optimization could fix.

If you're an engineer who uses AI daily - for design reviews, code generation, debugging, documentation, architecture decisions - and you've noticed that you're somehow more tired than before AI existed, this post is for you. You're not imagining it. You're not weak. You're experiencing something real that the industry is aggressively pretending doesn't exist. And if someone who builds agent infrastructure full-time can burn out on AI, it can happen to anyone.

I want to talk about it honestly. Not the "AI is amazing and here's my workflow" version. The real version. The one where you stare at your screen at 11pm, surrounded by AI-generated code you still need to review, wondering why the tool that was supposed to save you time has consumed your entire day.

Here’s the thing that broke my brain for a while: AI gen­uinely makes in­di­vid­ual tasks faster. That’s not a lie. What used to take me 3 hours now takes 45 min­utes. Drafting a de­sign doc, scaf­fold­ing a new ser­vice, writ­ing test cases, re­search­ing an un­fa­mil­iar API. All faster.

But my days got harder. Not eas­ier. Harder.

The rea­son is sim­ple once you see it, but it took me months to fig­ure out. When each task takes less time, you don’t do fewer tasks. You do more tasks. Your ca­pac­ity ap­pears to ex­pand, so the work ex­pands to fill it. And then some. Your man­ager sees you ship­ping faster, so the ex­pec­ta­tions ad­just. You see your­self ship­ping faster, so your own ex­pec­ta­tions ad­just. The base­line moves.

Before AI, I might spend a full day on one de­sign prob­lem. I’d sketch on pa­per, think in the shower, go for a walk, come back with clar­ity. The pace was slow but the cog­ni­tive load was man­age­able. One prob­lem. One day. Deep fo­cus.

Now? I might touch six dif­fer­ent prob­lems in a day. Each one only takes an hour with AI.” But con­text-switch­ing be­tween six prob­lems is bru­tally ex­pen­sive for the hu­man brain. The AI does­n’t get tired be­tween prob­lems. I do.

This is the para­dox: AI re­duces the cost of pro­duc­tion but in­creases the cost of co­or­di­na­tion, re­view, and de­ci­sion-mak­ing. And those costs fall en­tirely on the hu­man.

Before AI, my job was: think about a problem, write code, test it, ship it. I was the creator. The maker. That's what drew most of us to engineering in the first place - the act of building.

After AI, my job increasingly became: prompt, wait, read output, evaluate output, decide if output is correct, decide if output is safe, decide if output matches the architecture, fix the parts that don't, re-prompt, repeat. I became a reviewer. A judge. A quality inspector on an assembly line that never stops.

This is a fundamentally different kind of work. Creating is energizing. Reviewing is draining. There's research on this - the psychological difference between generative tasks and evaluative tasks. Generative work gives you flow states. Evaluative work gives you decision fatigue.

I noticed it first during a week where I was using AI heavily for a new microservice. By Wednesday, I couldn't make simple decisions anymore. What should this function be named? I didn't care. Where should this config live? I didn't care. My brain was full. Not from writing code - from judging code. Hundreds of small judgments, all day, every day.

The cruel irony is that AI-generated code requires more careful review than human-written code. When a colleague writes code, I know their patterns, their strengths, their blind spots. I can skim the parts I trust and focus on the parts I don't. With AI, every line is suspect. The code looks confident. It compiles. It might even pass tests. But it could be subtly wrong in ways that only surface in production, under load, at 3am.

So you read every line. And reading code you didn't write, that was generated by a system that doesn't understand your codebase's history or your team's conventions, is exhausting work.

This is also why I think agent security and authorization matter so much. If we can't review everything AI produces - and we can't, not at scale - then we need systems that constrain what agents can do in the first place. Least-privilege access, scoped tokens, audit trails. The less you have to worry about "did the AI do something dangerous," the more cognitive budget you have for the work that actually matters. This isn't just a security problem. It's a human sustainability problem.

Engineers are trained on determinism. Same input, same output. That's the contract. That's what makes debugging possible. That's what makes reasoning about systems possible.

I had a prompt that worked perfectly on Monday. It generated clean, well-structured code for an API endpoint. I used the same prompt on Tuesday for a similar endpoint. The output was structurally different, used a different error handling pattern, and introduced a dependency I didn't ask for.

Why? No reason. Or rather, no reason I can access. There's no stack trace for "the model decided to go a different direction today." There's no log that says "temperature sampling chose path B instead of path A." It just… happened differently.

For someone whose entire career is built on "if it broke, I can find out why," this is deeply unsettling. Not in a dramatic way. In a slow, grinding, background-anxiety way. You can never fully trust the output. You can never fully relax. Every interaction requires vigilance.

I tried to fight this. I version-controlled my prompts. I built elaborate system messages. I created templates. Some of it helped. None of it solved the fundamental problem: you are collaborating with a probabilistic system, and your brain is wired for deterministic ones. That mismatch is a constant, low-grade source of stress.

This frustration is actually what led me to build Distill - deterministic context deduplication for LLMs. No LLM calls, no embeddings, no probabilistic heuristics. Pure algorithms that clean your context in ~12ms. I wanted at least one part of the AI pipeline to be something I could reason about, debug, and trust. If the model's output is going to be nondeterministic, the least I can do is make sure the input is clean and predictable.

The engineers I've talked to who handle this best are the ones who've made peace with it. They treat AI output like a first draft from a smart but unreliable intern. They expect to rewrite 30% of it. They budget time for that rewriting. They don't get frustrated when the output is wrong because they never expected it to be right. They expected it to be useful. There's a difference.

Take a breath and try to keep up with just the last few months. Claude Code ships sub-agents, then skills, then an Agent SDK, then Claude Cowork. OpenAI launches Codex CLI, then GPT-5.3-Codex - a model that literally helped code itself. New coding agents announce background mode with hundreds of concurrent autonomous sessions. Google drops Gemini CLI. GitHub adds an MCP Registry. Acquisitions happen weekly. Amazon Q Developer gets agentic upgrades. CrewAI, AutoGen, LangGraph, MetaGPT - pick your agent framework, there's a new one every week. Google announces A2A (Agent-to-Agent protocol) to compete with Anthropic's MCP. OpenAI ships its own Swarm framework. Kimi K2.5 drops with an agent swarm architecture orchestrating 100 parallel agents. "Vibe coding" becomes a thing. OpenClaw launches a skills marketplace and within one week, researchers find 400+ malicious agent skills uploaded to ClawHub. And somewhere in the middle of all this, someone on LinkedIn posts "if you're not using AI agents with sub-agent orchestration in 2026, you're already obsolete."

That’s not a year. That’s a few months. And I’m leav­ing stuff out.

I fell into this trap hard. I was spend­ing week­ends eval­u­at­ing new tools. Reading every changelog. Watching every demo. Trying to stay at the fron­tier be­cause I was ter­ri­fied of falling be­hind.

Here’s what that ac­tu­ally looked like: I’d spend Saturday af­ter­noon set­ting up a new AI cod­ing tool. By Sunday I’d have a ba­sic work­flow. By the fol­low­ing Wednesday, some­one would post about a dif­fer­ent tool that was way bet­ter.” I’d feel a pang of anx­i­ety. By the next week­end, I’d be set­ting up the new thing. The old thing would sit un­used. One cod­ing as­sis­tant to the next to the next and back to the first one. Each mi­gra­tion cost me a week­end and gave me maybe a 5% im­prove­ment that I could­n’t even mea­sure prop­erly.

Multiply this by every cat­e­gory - cod­ing as­sis­tants, chat in­ter­faces, agent frame­works, multi-agent or­ches­tra­tion plat­forms, MCP servers, con­text man­age­ment tools, prompt li­braries, swarm ar­chi­tec­tures, skills mar­ket­places - and you get a per­son who is per­pet­u­ally learn­ing new tools and never get­ting deep with any of them. The Hacker News front page alone is enough to give you whiplash. One day it’s Show HN: Autonomous Research Swarm” and the next it’s Ask HN: How will AI swarms co­or­di­nate?” Nobody knows. Everyone’s build­ing any­way.

The worst part is the knowl­edge de­cay. I spent two weeks build­ing a so­phis­ti­cated prompt en­gi­neer­ing work­flow in early 2025. Carefully crafted sys­tem prompts, few-shot ex­am­ples, chain-of-thought tem­plates. It worked well. Three months later, the model up­dated, the prompt­ing best prac­tices shifted, and half my tem­plates pro­duced worse re­sults than a sim­ple one-liner. Those two weeks were gone. Not in­vested. Spent. The same thing hap­pened with my MCP server setup - I built five cus­tom servers (Dev.to pub­lisher, Apple Notes in­te­gra­tion, Python and TypeScript sand­boxes, more), then the pro­to­col evolved, then the MCP Registry launched on GitHub and sud­denly there were thou­sands of pre-built ones. Some of my cus­tom work be­came re­dun­dant overnight.

The agent frame­work churn is even worse. I watched teams go from LangChain to CrewAI to AutoGen to cus­tom or­ches­tra­tion in the span of a year. Each mi­gra­tion meant rewrit­ing in­te­gra­tions, re­learn­ing APIs, re­build­ing work­flows. The peo­ple who waited and did noth­ing of­ten ended up in a bet­ter po­si­tion than the peo­ple who adopted early and had to mi­grate twice.

I’ve since adopted a dif­fer­ent ap­proach. Instead of chas­ing every new tool, I go deep on the in­fra­struc­ture layer un­der­neath them. Tools come and go. The prob­lems they solve don’t. Context ef­fi­ciency, agent au­tho­riza­tion, au­dit trails, run­time se­cu­rity - these are durable prob­lems re­gard­less of which frame­work is trend­ing this month. That’s why I built agen­tic-au­thz on OpenFGA in­stead of ty­ing it to any spe­cific agent frame­work. That’s why Distill works at the con­text level, not the prompt level. Build on the layer that does­n’t churn.

I still track the land­scape closely - you have to when you’re build­ing in­fra­struc­ture for it. But I track it to un­der­stand where the ecosys­tem is go­ing, not to adopt every new thing. There’s a dif­fer­ence be­tween be­ing in­formed and be­ing re­ac­tive.

This one is insidious. You're trying to get AI to generate something specific. The first output is 70% right. So you refine your prompt. The second output is 75% right but broke something the first one had correct. Third attempt: 80% right but now the structure is different. Fourth attempt: you've been at this for 45 minutes and you could have written the thing from scratch in 20.

I call this the prompt spiral. It's the AI equivalent of yak shaving. You started with a clear goal. Thirty minutes later you're debugging your prompt instead of debugging your code. You're optimizing your instructions to a language model instead of solving the actual problem.

The prompt spiral is especially dangerous because it feels productive. You're iterating. You're getting closer. Each attempt is slightly better. But the marginal returns are diminishing fast, and you've lost sight of the fact that the goal was never "get the AI to produce perfect output." The goal was to ship the feature.

I now have a hard rule: three attempts. If the AI doesn't get me to 70% usable in three prompts, I write it myself. No exceptions. This single rule has saved me more time than any prompting technique I've ever learned.

Engineers tend toward perfectionism. We like clean code. We like tests that pass. We like systems that behave predictably. This is a feature, not a bug - it's what makes us good at building reliable software.

AI output is never perfect. It's always "pretty good." 70-80% there. The variable names are slightly off. The error handling is incomplete. The edge cases are ignored. The abstraction is wrong for your codebase. It works, but it's not right.

For a perfectionist, this is torture. Because "almost right" is worse than "completely wrong." Completely wrong, you throw away and start over. Almost right, you spend an hour tweaking. And tweaking AI output is uniquely frustrating because you're fixing someone else's design decisions - decisions that were made by a system that doesn't share your taste, your context, or your standards.

I had to learn to let go. Not of quality - I still care about quality. But of the expectation that AI would produce quality. I now treat every AI output as a rough draft. A starting point. Raw material. I mentally label it "draft" the moment it appears, and that framing change alone reduced my frustration by half.

The engineers who struggle most with AI are often the best engineers. The ones with the highest standards. The ones who notice every imperfection. AI rewards a different skill: the ability to extract value from imperfect output quickly, without getting emotionally invested in making it perfect.

This is the one that scares me most.

I noticed it during a design review meeting. Someone asked me to reason through a concurrency problem on the whiteboard. No laptop. No AI. Just me and a marker. And I struggled. Not because I didn't know the concepts - I did. But because I hadn't exercised that muscle in months. I'd been outsourcing my first-draft thinking to AI for so long that my ability to think from scratch had degraded.

It's like GPS and navigation. Before GPS, you built mental maps. You knew your city. You could reason about routes. After years of GPS, you can't navigate without it. The skill atrophied because you stopped using it.

The same thing is happening with AI and engineering thinking. When you always ask AI first, you stop building the neural pathways that come from struggling with a problem yourself. The struggle is where learning happens. The confusion is where understanding forms. Skip that, and you get faster output but shallower understanding.

I now deliberately spend the first hour of my day without AI. I think on paper. I sketch architectures by hand. I reason through problems the slow way. It feels inefficient. It is inefficient. But it keeps my thinking sharp, and that sharpness pays dividends for the rest of the day when I do use AI - because I can evaluate its output better when my own reasoning is warmed up.

Social media is full of people who seem to have AI figured out. They post their workflows. Their productivity numbers. Their "I built this entire app in 2 hours with AI" threads. And you look at your own experience - the failed prompts, the wasted time, the code you had to rewrite - and you think: what's wrong with me?

Nothing is wrong with you. Those threads are highlight reels. Nobody posts "I spent 3 hours trying to get Claude to understand my database schema and eventually gave up and wrote the migration by hand." Nobody posts "AI-generated code caused a production incident because it silently swallowed an error." Nobody posts "I'm tired."

The comparison trap is amplified by the fact that AI skill is hard to measure. With traditional engineering, you can look at someone's code and roughly gauge their ability. With AI, the output depends on the model, the prompt, the context, the temperature, the phase of the moon. Someone's impressive demo might not reproduce on your machine with your codebase.

I became much more selective about AI content on social media. I still follow the space closely - I have to, it's my job. But I shifted from consuming everyone's hot takes to focusing on people who are actually building and shipping, not just demoing. The ratio of signal to anxiety matters. If a feed is making you feel behind instead of informed, it's not serving you.

I’ll be spe­cific about what changed my re­la­tion­ship with AI from ad­ver­sar­ial to sus­tain­able.

Time-boxing AI ses­sions. I don’t use AI in an open-ended way any­more. I set a timer. 30 min­utes for this task with AI. When the timer goes off, I ship what I have or switch to writ­ing it my­self. This pre­vents the prompt spi­ral and the per­fec­tion­ism trap si­mul­ta­ne­ously.

Separating AI time from think­ing time. Morning is for think­ing. Afternoon is for AI-assisted ex­e­cu­tion. This is­n’t rigid - some­times I break the rule. But hav­ing a de­fault struc­ture means my brain gets both ex­er­cise and as­sis­tance in the right pro­por­tions.

Accepting 70% from AI. I stopped try­ing to get per­fect out­put. 70% us­able is the bar. I’ll fix the rest my­self. This ac­cep­tance was the sin­gle biggest re­ducer of AI-related frus­tra­tion in my work­flow.

Being strate­gic about the hype cy­cle. I track the AI land­scape be­cause I build in­fra­struc­ture for it. But I stopped adopt­ing every new tool the week it launches. I use one pri­mary cod­ing as­sis­tant and know it deeply. I eval­u­ate new tools when they’ve proven them­selves over months, not days. Staying in­formed and stay­ing re­ac­tive are dif­fer­ent things.

Logging where AI helps and where it does­n’t. I kept a sim­ple log for two weeks: task, used AI (yes/no), time spent, sat­is­fac­tion with re­sult. The data was re­veal­ing. AI saved me sig­nif­i­cant time on boil­er­plate, doc­u­men­ta­tion, and test gen­er­a­tion. It cost me time on ar­chi­tec­ture de­ci­sions, com­plex de­bug­ging, and any­thing re­quir­ing deep con­text about my code­base. Now I know when to reach for it and when not to.

Not re­view­ing every­thing AI pro­duces. This was hard to ac­cept. But if you’re us­ing AI to gen­er­ate large amounts of code, you phys­i­cally can­not re­view every line with the same rigor. I fo­cus my re­view en­ergy on the parts that mat­ter most - se­cu­rity bound­aries, data han­dling, er­ror paths - and rely on au­to­mated tests and sta­tic analy­sis for the rest. Some rough­ness in non-crit­i­cal code is ac­cept­able.

The tech industry has a burnout problem that predates AI. AI is making it worse, not better. Not because AI is bad, but because AI removes the natural speed limits that used to protect us.

Before AI, there was a ceiling on how much you could produce in a day. That ceiling was set by typing speed, thinking speed, the time it takes to look things up. It was frustrating sometimes, but it was also a governor. You couldn't work yourself to death because the work itself imposed limits.

AI removed the governor. Now the only limit is your cognitive endurance. And most people don't know their cognitive limits until they've blown past them.

I burned out in late 2025. Not dramatically - I didn't quit or have a breakdown. I just stopped caring. Code reviews became rubber stamps. Design decisions became "whatever AI suggests." I was going through the motions, producing more than ever, feeling less than ever. It took me a month to realize what had happened and another month to recover.

The recovery wasn't about using less AI. It was about using AI differently. With boundaries. With intention. With the understanding that I am not a machine and I don't need to keep pace with one. Working at Ona helped me see this clearly - when you're building AI agent infrastructure for enterprise customers, you see the human cost of unsustainable AI workflows at scale. The problems aren't just personal. They're systemic. And they need to be solved at the tooling level, not just the individual level.

Ironically, the burnout period is when some of my best work happened. When I stopped trying to use every AI tool and started thinking about what was actually broken, I saw the problems clearly for the first time. Context windows filling up with garbage - that became Distill. Agents with all-or-nothing API key access - that became agentic-authz. The inability to audit what an agent actually did - that's becoming AgentTrace. The fatigue forced me to stop consuming and start building. Not building more features faster, but building the right things deliberately.

Here’s what I think the real skill of the AI era is. It’s not prompt en­gi­neer­ing. It’s not know­ing which model to use. It’s not hav­ing the per­fect work­flow.

It’s know­ing when to stop.

Knowing when the AI out­put is good enough. Knowing when to write it your­self. Knowing when to close the lap­top. Knowing when the mar­ginal im­prove­ment is­n’t worth the cog­ni­tive cost. Knowing that your brain is a fi­nite re­source and that pro­tect­ing it is not lazi­ness - it’s en­gi­neer­ing.

We op­ti­mize our sys­tems for sus­tain­abil­ity. We add cir­cuit break­ers. We im­ple­ment back­pres­sure. We de­sign for grace­ful degra­da­tion. We should do the same for our­selves.

AI is the most pow­er­ful tool I’ve ever used. It’s also the most drain­ing. Both things are true. The en­gi­neers who thrive in this era won’t be the ones who use AI the most. They’ll be the ones who use it the most wisely.

If you’re tired, it’s not be­cause you’re do­ing it wrong. It’s be­cause this is gen­uinely hard. The tool is new, the pat­terns are still form­ing, and the in­dus­try is pre­tend­ing that more out­put equals more value. It does­n’t. Sustainable out­put does.

I’m still build­ing in this space every day. Agent au­tho­riza­tion, con­text en­gi­neer­ing, au­dit trails, run­time se­cu­rity - the in­fra­struc­ture that makes AI agents ac­tu­ally work in pro­duc­tion. I’m more com­mit­ted to AI than ever. But I’m com­mit­ted on my terms, at my pace, build­ing things that mat­ter in­stead of chas­ing things that trend.

Take care of your brain. It’s the only one you’ve got, and no AI can re­place it.

If this res­onated, I’d love to hear your ex­pe­ri­ence. What does AI fa­tigue look like for you? Find me on X or LinkedIn, or join the dis­cus­sion on Hacker News.

I write about AI agent in­fra­struc­ture, se­cu­rity, con­text en­gi­neer­ing, and the hu­man side of build­ing with AI. You can find all my writ­ing on my writ­ing page.

...

Read the original on siddhantkhare.com »

3 370 shares, 11 trendiness

I Am Happier Writing Code by Hand

I felt the familiar feeling of depression and lethargy creep in while my eyes darted between watching claude-code work and my phone. "What's the point of it all?" I thought. LLMs can generate decent-ish and correct-ish looking code while I have more time to do what? Doomscroll? This was the third time I gave claude-code a try. I felt the same feelings every single time and ended up deleting claude-code after 2-3 weeks, and whaddyouknow? Every. Single. Time. I rediscovered the joy of coding.

Yes, coding is not software engineering, but for me, it is a fun and essential part of it. In order to be effective at software engineering, you must be familiar with the problem space, and this requires thinking and wrestling with the problem. You can't truly know the pain of using an API by just reading its documentation or implementation. You have to use it to experience it. The act of writing code, despite being slower, was a way for me to wrestle with the problem space, a way for me to find out that my initial ideas didn't work, a way of thinking. Vibe coding interfered with that.

If you're thinking without writing, you only think you're thinking.

The other major part of the job is to ensure correctness. For me, it is much harder to verify the correctness of code I didn't write compared to code I wrote. The process of writing code helps internalize the context and makes it easier for my brain to think deeply about it. If I outsource this to an LLM, I skip over the process of internalizing the problem domain and I can't be certain that the generated code is correct.

By design, vibe coding has an addictive nature to it: you write some instructions, and code that looks correct is generated. Bam! Dopamine hit! If the code isn't correct, then it's just one prompt away from being correct, right? Right?

Vibe coding also has the profound effect of turning my brain off and making me passively accept changes. When it is time to use my brain, the inertia is much harder to overcome and it is easy to choose the lazy way out. At my lowest point, I even asked it to do a find-and-replace in a file. Something that takes a few seconds now took minutes and a network call.

Even if I generate a 1,000-line PR in 30 minutes, I still need to understand and review it. Since I am responsible for the code I ship, this makes me the bottleneck.

The common view of vibe coding is that it is neither good nor bad, it is a tool. But tools shape your workflow and your thought process, and if a tool prevents you from thinking deeply, I don't think it is a good tool. If you are a knowledge worker, your core competency is your ability to think, and if a tool interferes with that, be afraid, be very afraid.

Now, I would be lying if I said I didn't use LLMs to generate code. I still use Claude, but I do so in a more controlled manner. I copy-paste files that I think are necessary to provide the context, and then I copy-paste code and ask it to make changes to it or write tests for it. This friction has several benefits. I can't make changes that span multiple files, which means the generated diff isn't too large, and if I have to manually change other files I know how the code fits in. Manually giving Claude the context forces me to be familiar with the codebase myself, rather than telling it to just "cook". It turns code generation from a passive action into a deliberate, thoughtful action. It also keeps my brain engaged and active, which means I can still enter the flow state. I have found this to be the best of both worlds and a way to preserve my happiness at work.

Ultimately, life is too short not to optimize for happiness. Maybe (a big maybe) generating entire features would make me more productive, but if it causes existential dread and makes me depressed, I don't see it being productive in the long run. Maybe you relate to some of the feelings. Maybe you don't. But don't be afraid to choose differently.

...

Read the original on abhinavomprakash.com »

4 368 shares, 15 trendiness

(AI) Slop Terrifies Me – ezhik.jp

What if this is as good as software is ever going to be? What if AI stops getting better and what if people stop caring?

Imagine if this is as good as AI gets. If this is where it stops, you'd still have models that can almost code a web browser, almost code a compiler - and can even present a pretty cool demo if allowed to take a few shortcuts. You'd still get models that can kinda-sorta simulate worlds and write kinda-sorta engaging stories. You'd still get self-driving cars that almost work, except when they don't. You get AI that can make you like 90% of a thing!

90% is a lot. Will you care about the last 10%?

I'm terrified of "good enough to ship" - and I'm terrified of nobody else caring. I'm less afraid of AI agents writing apps that they will never experience than I am of the AI herders who won't care enough to actually learn what they ship. And I sure as hell am afraid of the people who will experience the slop and will be fine with it.

As a woodworking enthusiast, I am slowly making my peace with standing in the middle of an IKEA. But at the rate things are going in this dropshipping hell, IKEA would be the dream. Software temufication stings much more than software commoditization.

I think Claude and friends can help with crafting good software and with learning new technologies and programming languages - though I sure as hell move slower when I stop to learn and understand than the guy playing Dwarf Fortress with 17 agents. But at the same time, AI models seem to constantly nudge towards that same median Next-React-Tailwind, good-enough app. These things just don't handle going off the beaten path well.

Spend all the tokens you want: trying to make something unique like Paper by FiftyThree with AI tools will just end up looking normal and uninspired.

Mind you, it's not like slop is anything new. A lot of human decisions had to happen before your backside ended up in an extremely uncomfortable chair, your search results got polluted by poorly-written SEO-optimized articles, and your brain had to deal with a ticket booking website with a user interface so poorly designed that it made you cry. So it's a people problem. Incentives just don't seem to align to make good software. Move fast and break things, etc, etc. You'll make a little artisan app, and if it's any good, Google will come along with a free clone, kill you, then kill its clone - and the world will be left with net zero new good software. And now, with AI agents, it gets even worse, as agent herders can do the same thing much faster.

Developers aside, there's also the users. AI models can't be imaginative, and the developers can't afford to be, but surely with AI tools, the gap between users and developers will be bridged, ChatGPT will become the new HyperCard, and people will turn their ideas into reality with just a few sentences? There are so many people out there who are coding without knowing it, from Carol in Accounting making insane Excel spreadsheets to all the kids on TikTok automating their phones with Apple Shortcuts and hacking up cool Notion notebooks.

But what if those people are an aberration? What if this state of tech learned helplessness cannot be fixed? What if people really do just want a glorified little TV in their pocket? What if most people truly just don't care about tech problems, about privacy, about Liquid Glass, about Microsoft's upsells, about constantly dealing with apps and features which just don't work? What if there will be nobody left to carry the torch? What if the future of computing belongs not to artisan developers or Carol from Accounting, but to whoever can churn out the most software the fastest? What if "good enough" really is good enough for most people?

I'm terrified that our craft will die, and nobody will even care to mourn it.

...

Read the original on ezhik.jp »

5 339 shares, 37 trendiness

Art of Roads in Games

Not sure if it’s just me, but I of­ten get a pri­mal sat­is­fac­tion when­ever I see in­tri­cate pat­terns emerg­ing out of seem­ingly dis­or­dered en­vi­ron­ments.

Think about the gal­leries of ant colonies, the ab­surdly per­fect hexa­gons of hon­ey­combs, or the veins on a leaf. No ar­chi­tect, no blue­print. Just sim­ple rules stack­ing on each other that re­sult in beau­ti­ful pat­terns. I can’t ex­plain why, but see­ing those struc­tures al­ways felt good.

Humans do this too. And for me, one of the most fas­ci­nat­ing pat­terns we’ve come up with is the roads.

Sometimes I imag­ine aliens from far­away galax­ies dis­cov­er­ing Earth long af­ter we’re gone. Forests re­claimed by na­ture, cities re­duced to rub­ble, yet be­tween them, a faintly pat­tern is still vis­i­ble - the road net­work. I like to think they will feel the same way I do when look­ing at na­ture pat­terns. - Man, some­one re­ally thought this through.”

I’ve got to say, roads have fas­ci­nated me since I was a kid.

I still re­mem­ber play­ing SimCity 2000 for the first time when I was about five or six years old. I did­n’t un­der­stand much. Definitely did­n’t know what zon­ing, taxes, or de­mand were. But roads fas­ci­nated me from the start.

I think roads lie at the heart of every city builder. It’s the fab­ric on which cities are built. Since that mo­ment, I’ve played al­most every mod­ern-themed city builder out there. In the mean­time, I’ve also started notic­ing them in the real world. Examining them in more de­tail.

Despite every game bring­ing an im­prove­ment over the one be­fore, some­thing al­ways felt… off.

SimCity 4 added el­e­va­tion and di­ag­o­nal roads. SimCity 2013 in­tro­duced curved roads. Then came Cities: Skylines with a ton of free­dom. You could know freeplace roads and merge them into in­ter­sec­tions at any an­gle, build fly­overs at dif­fer­ent el­e­va­tions to con­struct crazy, yet un­re­al­is­tic, in­ter­changes. I think this was the largest break­through.

But some­thing was still nag­ging me. Highway ramps were un­re­al­is­ti­cally sharp or wob­bly, lanes that were sup­posed to be high-speed bent too sharply at cer­tain points, and the cor­ner radii of in­ter­sec­tions looked strange.

I mean, look at this. This is probably what highway engineers have nightmares about.

And then came the mods. Mods changed everything. The great community enabled a new kind of freedom. One could build almost anything: perfect merge lanes, realistic markings, and smooth transitions. It was a total game-changer. I am particularly proud of this 5-lane turbo roundabout:

But even then, mods didn't feel completely natural. They were still limited by the game's original system.

Cities: Skylines 2 pushed it even further, with lanes and markings becoming even more realistic. I think at this point, a non-trained eye won't know the difference from reality.

Then I stopped stumbling around and started asking why. I tried to understand how engineers design roads and how game developers code them.

That's when I ran straight into the fundamental issue - right at the base of it. And it comes down to something every developer knows about and loves:

If you're a Unity or Unreal developer or have played with basically any vector graphics editing software, you already know them well. Bezier curves are an elegant, intuitive, and incredibly powerful way to smoothly interpolate between two points while taking into consideration some direction of movement (the tangent).

That's exactly what roads are supposed to do, right? Of course, developers naturally think they are the perfect tool.

They've got their beauty, I need to admit. But hidden beneath the surface lies an uncomfortable truth.

You see, the shapes of roads in real life come from an underlying essential fact: the wheel axles of a vehicle. No matter how you drive a car, the distance between the left and right wheels remains constant. You can notice this in tyre tracks in snow or sand. Two perfectly parallel paths, always the same distance apart, maintaining a consistent curved shape.

Here's the issue with Bezier splines: they don't preserve shape and curvature when offset.

At gentle curves, they kinda look fine, but once you have tighter bends, the math falls apart. In mathy terms: the offset of a Bezier curve is not a Bezier curve.

When game engines try to generate a road mesh along a Bezier spline, the geometry often fails at tight angles. The inner edge curves at a different rate than the outer edge. This creates "pinching," self-intersecting geometry.

Here is the best example of how they start to fail in extreme scenarios.

To sum up: Bezier curves are unconstrained. The freedom they enable is exactly their "Achilles' heel". Real roads are engineered with the constraints of real motion in mind. A car's path can't magically self-intersect.
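To make the pinching concrete: a parallel offset folds back on itself wherever the offset distance exceeds the local radius of curvature. Here is a small numpy sketch (my own illustration with made-up control points, not code from the article) that estimates that limit for a tight cubic:

import numpy as np

# A tight S-bend cubic Bezier (hypothetical control points).
p0, p1, p2, p3 = map(np.array, [(0., 0.), (1., 2.), (2., -2.), (3., 0.)])
t = np.linspace(0.0, 1.0, 2000)
u = 1.0 - t

# First derivative of the cubic Bezier.
d = (3*u**2)[:, None]*(p1 - p0) + (6*u*t)[:, None]*(p2 - p1) + (3*t**2)[:, None]*(p3 - p2)

# Curvature = rate of change of the tangent angle per unit of arc length.
angle = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
ds = np.linalg.norm(d, axis=1) * (t[1] - t[0])
kappa = np.abs(np.gradient(angle) / ds)

# Any inner offset wider than the minimum radius of curvature self-intersects.
print("min radius of curvature:", 1.0 / kappa.max())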

Ok, so what preserves parallelism? If you've already been through kindergarten, you're already familiar with it: the CIRCLE.

It has an almost magical property: no matter how much you offset it, the result is still a circular arc. Perfectly parallel with the initial one. So satisfying.

Scrapping Bezier curves for circle arcs also yields a nice, unexpected bonus. To procedurally build intersections, the engine has to perform many curve-curve intersection operations multiple times per frame. The intersection between two Bezier curves is notoriously complex. On one side, you have polynomial root finding, iterative numerical methods, de Casteljau's method plus bounding boxes, and multiple convergence checks, versus a simple, plain O(1) formula for circle arcs.
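For reference, the circle-circle case really is a couple of lines of closed-form math. A hedged Python sketch of the standard construction (the arc-range checks a real road engine would still need are omitted):

import math

def circle_circle(c0, r0, c1, r1):
    # Closed-form intersection of two circles; returns 0, 1, or 2 points.
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []  # coincident centers, too far apart, or one inside the other
    a = (r0*r0 - r1*r1 + d*d) / (2*d)      # distance from c0 to the chord midpoint
    h = math.sqrt(max(r0*r0 - a*a, 0.0))   # half the chord length
    mx, my = c0[0] + a*dx/d, c0[1] + a*dy/d
    if h == 0.0:
        return [(mx, my)]                  # circles are tangent
    return [(mx - h*dy/d, my + h*dx/d),
            (mx + h*dy/d, my - h*dx/d)]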

By stitching together circular arcs of different radii, you can create any shape while adhering to proper engineering principles.

But this is not the end of the story. Circle arcs have issues as well (oh no). The problem with circles in infrastructure is that they have constant curvature. What this means is that when entering a circular curve from a straight line, the lateral force jumps from 0 to a fixed constant value (determined by the radius of the circle). If you were in a car or train entering this kind of curve at high speed, it would feel terrible.

Civil engineers have to account for this as well. So then, what curve maintains parallelism when offset and has a smoothly increasing curvature?

Introducing: transition curves - most famously, the clothoid.

A clothoid gradually increases curvature over distance. You start almost straight, then slowly turn tighter and tighter. The steering wheel rotates smoothly. The forces ramp up naturally, and a passenger's body barely notices the transition.

These curves provide comfortable rides at high speeds by maintaining parallel offsets and continuous curvature changes.
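The underlying rule is simple even if the closed form isn't: curvature grows linearly with arc length, kappa(s) = a*s, so the heading is theta(s) = a*s^2/2. A minimal numpy sketch (my own, with illustrative parameters) that traces a clothoid by integrating the heading numerically:

import numpy as np

a, length, n = 0.5, 6.0, 5000       # curvature rate, total arc length, samples
s = np.linspace(0.0, length, n)
theta = 0.5 * a * s**2              # heading = integral of curvature a*s
ds = s[1] - s[0]
x = np.cumsum(np.cos(theta)) * ds   # these cumulative sums approximate the
y = np.cumsum(np.sin(theta)) * ds   # Fresnel integrals that define the spiral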

And they are also… a math nightmare. Differential geometry. Integrals. Oh my… Which is probably why most games don't even dare.

Vehicles move slowly on city streets. For intersections of urban roads, circular arcs are more than a decent choice.

Does everything I just rambled about matter? Do 99% of city-builder players care what shape the corner radius of an intersection has? Most likely, no. Then why bother?

First, because of curiosity. As any other nerd overly obsessed with the nitty-gritty details of a very specific subject, I just wanted to see how I would implement it. Like challenging the status quo.

Second, even if established titles might not accurately render roads, they are still light-years ahead of the solutions an indie developer can find online. The tutorials and assets for this are just sad. I personally got bored with grids, and I just wanted to build a better solution to share with anyone who wants to make a city builder.

In the next blog post, I'll discuss more technicalities and dive into how I've built my own solution. If you want to follow along or get notified when I release this asset, scribble your email below.

...

Read the original on sandboxspirit.com »

6 333 shares, 27 trendiness

AI Makes the Easy Part Easier and the Hard Part Harder for Developers

A friend of mine recently attended an open forum panel about how engineering orgs can better support their engineers. The themes that came up were not surprising:

* Sacrificing quality makes it hard to feel proud of the work.

* No acknowledgement of current velocity.

* If we sprint to deliver, the expectation becomes to keep sprinting, forever.

I've been hearing variations of this for a while now, but now I'm also hearing and agreeing with "AI doesn't always speed us up".

Developers used to google things. You'd read a StackOverflow answer, or an article, or a GitHub issue. You did some research, verified it against your own context, and came to your own conclusion. Nobody said "Google did it for me" or "it was the top result so it must be true."

Now I'm starting to hear "AI did it for me."

That's either overhyping what happened, or it means the developer didn't come to their own conclusion. Both are bad. If someone on my team ever said Google wrote their code because they copied a StackOverflow answer, I'd be worried about the same things I'm worried about now with AI: did you actually understand what you pasted?

Vibe coding is fun. At first. For prototyping or low-stakes personal projects, it's useful. But when the stakes are real, every line of code has consequences.

On a personal project, I asked an AI agent to add a test to a specific file. The file was 500 lines before the request and 100 lines after. I asked why it deleted all the other content. It said it didn't. Then it said the file didn't exist before. I showed it the git history and it apologised, said it should have checked whether the file existed first. (Thank you, git.)

Now imagine that in a healthcare codebase instead of a side project.

AI assistance can cost more time than it saves. That sounds backwards, but it's what happened here. I spent longer arguing with the agent and recovering the file than I would have spent writing the test myself.

Using AI as an investigation tool, and not jumping straight to AI as solution provider, is a step that some people skip. AI-assisted investigation is an underrated skill that's not easy, and it takes practice to know when AI is wrong. Using AI-generated code can be effective, but if we give AI more of the easy code-writing tasks, we can fall into the trap where AI assistance costs more time than it saves.

Most people miss this about AI-assisted development. Writing code is the easy part of the job. It always has been. The hard part is investigation, understanding context, validating assumptions, and knowing why a particular approach is the right one for this situation. When you hand the easy part to AI, you're not left with less work. You're left with only the hard work. And if you skipped the investigation because AI already gave you an answer, you don't have the context to evaluate what it gave you.

Reading and understanding other people's code is much harder than writing code. AI-generated code is other people's code. So we've taken the part developers are good at (writing), offloaded it to a machine, and left ourselves with the part that's harder (reading and reviewing), but without the context we'd normally build up by doing the writing ourselves.

My friend’s panel raised a point I keep com­ing back to: if we sprint to de­liver some­thing, the ex­pec­ta­tion be­comes to keep sprint­ing. Always. Tired en­gi­neers miss edge cases, skip tests, ship bugs. More in­ci­dents, more pres­sure, more sprint­ing. It feeds it­self.

This is a man­age­ment prob­lem, not an en­gi­neer­ing one. When lead­er­ship sees a team de­liver fast once (maybe with AI help, maybe not), that be­comes the new base­line. The con­ver­sa­tion shifts from how did they do that?” to why can’t they do that every time?”

My friend was say­ing:

When peo­ple claim AI makes them 10x more pro­duc­tive, maybe it’s turn­ing them from a 0.1x en­gi­neer to a 1x en­gi­neer. So tech­ni­cally yes, they’ve been 10x’d. The ques­tion is whether that’s a pro­duc­tiv­ity gain or an ex­po­sure of how lit­tle in­ves­ti­gat­ing they were do­ing be­fore.

Burnout and ship­ping slop will eat what­ever pro­duc­tiv­ity gains AI gives you. You can’t op­ti­mise your way out of peo­ple be­ing too tired to think clearly.

I’ve used the phrase AI is se­nior skill, ju­nior trust” to ex­plain how AI cod­ing agents work in prac­tice. They’re highly skilled at writ­ing code but we have to trust their out­put like we would a ju­nior en­gi­neer. The code looks good and prob­a­bly works, but we should check more care­fully be­cause they don’t have the ex­pe­ri­ence.

Another way to look at it: an AI cod­ing agent is like a bril­liant per­son who reads re­ally fast and just walked in off the street. They can help with in­ves­ti­ga­tions and could write some code, but they did­n’t go to that meet­ing last week to dis­cuss im­por­tant back­ground and con­text.

Developers need to take re­spon­si­ble own­er­ship of every line of code they ship. Not just the lines they wrote, the AI-generated ones too.

If you’re cut­ting and past­ing AI out­put be­cause some­one set an un­re­al­is­tic ve­loc­ity tar­get, you’ve got a prob­lem 6 months from now when a new team mem­ber is try­ing to un­der­stand what that code does. Or at 2am when it breaks. AI wrote it” is­n’t go­ing to help you in ei­ther sit­u­a­tion.

The other day there was a pro­duc­tion bug. A user sent an en­quiry to the ser­vice team a cou­ple of hours af­ter a big re­lease. There was an edge case time­zone dis­play bug. The de­vel­oper who made the change had 30 min­utes be­fore they had to leave to teach a class, and it was late enough for me to al­ready be at home. So I used AI to help in­ves­ti­gate, let­ting it know the bug must be based on re­cent changes and ex­plain­ing how we could re­pro­duce. Turned out some dep­re­cated meth­ods were tak­ing pri­or­ity over the cur­rent time­zone-aware ones, so the time­zone was never con­vert­ing cor­rectly. Within 15 min­utes I had the root cause, a so­lu­tion idea, and in­ves­ti­ga­tion notes in the GitHub is­sue. The de­vel­oper con­firmed the fix, oth­ers tested and de­ployed, and I went down­stairs to grab my DoorDash din­ner.

No fire drill. No stay­ing late. AI did the in­ves­ti­ga­tion grunt work, I pro­vided the con­text and ver­i­fied, the de­vel­oper con­firmed the so­lu­tion. That’s AI help­ing with the hard part.

...

Read the original on www.blundergoat.com »

7 302 shares, 15 trendiness

I put a real-time 3D shader on the Game Boy Color

Contents: Check out the code, download the ROMs · Making it work on the Game Boy · The Game Boy has no multiply instruction · All scalars and lookups are 8-bit fractions · How fast is it? · An overall failed attempt at using AI

I made a Game Boy Color game that renders images in real time. The player controls an orbiting light and spins an object.

Before really diving into this project, I experimented with the look in Blender to see if it would even look good. IMO it did, so I went ahead with it!

I experimented with a "pseudo-dither" on the Blender monkey by adding a small random vector to each normal.

It doesn't really matter what software I used to produce the normal maps. Blender was the path of least resistance for me, so I chose that.

For the teapot, I simply put in a teapot, rotated a camera around it, and exported the normal AOV as a PNG sequence. Pretty straightforward.

For the spinning Game Boy Color, I wanted to ensure that certain colors were solid, so I used cryptomattes in the compositor to identify specific geometry and output hard-coded values in the output.

The geometry in the screen was done by rendering a separate scene, then compositing it into the final render using a cryptomatte for the screen.

The above animations are normal map frames that are used to solve the value of each pixel.

Normal maps are a core concept of this project. They're already used everywhere in 3D graphics.

And indeed, normal map images are secretly a vector field. The reason normal maps tend to have a blue-ish baseline color is because everyone likes to associate XYZ with RGB, and +Z is the forward vector by convention.

In a typical 3D workflow, a normal map is used to encode the normal vector at any given point on a textured mesh.

The simplest way to shade a 3D object is using the dot product:

$$v = \vec{N} \cdot \vec{L}$$

where $\vec{N}$ is the normal vector, and $\vec{L}$ is the light position when it points towards the origin (or equivalently: the negative light direction).

Expanded out component-wise, this is:

$$v = N_x L_x + N_y L_y + N_z L_z$$

When the light vector is constant for all pixels, it models what most 3D graphics software calls a "distant light", or a "sun light".
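As a quick illustration (mine, not from the post), shading a whole normal map with a distant light is just this dot product per pixel; random unit vectors stand in for decoded normals:

import numpy as np

n = np.random.randn(64, 64, 3)                  # stand-in for a decoded normal map
n /= np.linalg.norm(n, axis=2, keepdims=True)   # normalize to unit normals

light = np.array([0.3, 0.5, 0.81])              # L, pointing towards the origin
light /= np.linalg.norm(light)

v = np.clip((n * light).sum(axis=2), 0.0, 1.0)  # v = N . L for every pixel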

To speed up computation on the Game Boy, I use an alternate version of the dot product, using spherical coordinates.

A spherical coordinate is a point represented by a radius $r$, a primary angle $\theta$ ("theta"), and a secondary angle $\phi$ ("phi"). This is represented as a tuple:

$$(r, \theta, \phi)$$

The dot product of two spherical coordinates $A$ and $B$:

$$A \cdot B = r_A r_B \left( \sin\theta_A \sin\theta_B \cos(\phi_A - \phi_B) + \cos\theta_A \cos\theta_B \right)$$

Because all normal vectors are unit length, and the light vector is unit length, we can just assume the radius is equal to 1. This simplifies to:

$$A \cdot B = \sin\theta_A \sin\theta_B \cos(\phi_A - \phi_B) + \cos\theta_A \cos\theta_B$$

And using the previous variable names, we get the formula:

$$v = \sin\theta_N \sin\theta_L \cos(\phi_N - \phi_L) + \cos\theta_N \cos\theta_L$$

In the ROM, I decided to fix $\theta_L$ ("L-theta") to a constant value for performance reasons. The player gets to control $\phi_L$ ("L-phi"), creating an orbiting light effect.

This means that we can extract constant coefficients $C_1 = \sin\theta_L$ and $C_2 = \cos\theta_L$ and rewrite the formula:

$$v = C_1 \sin\theta_N \cos(\phi_N - \phi_L) + C_2 \cos\theta_N$$

The ROM encodes each pixel as a 3-byte tuple of per-pixel terms from this formula.

Not only does the SM83 CPU not sup­port mul­ti­pli­ca­tion, but it also does­n’t sup­port floats. That’s a real bum­mer.

We have to get re­ally cre­ative when the en­tire math­e­mat­i­cal foun­da­tion of this pro­ject in­volves mul­ti­ply­ing non-in­te­ger num­bers.

What do we do in­stead? We use log­a­rithms and lookup ta­bles!

Logarithms have this nice property of being able to factor products to outside the log: log(a·b) = log(a) + log(b). This way, we can add values instead!

This re­quires two lookups: a log lookup, and a pow lookup.

In pseudocode, mul­ti­ply­ing 0.3 and 0.5 looks like this:

pow = [ … ]  # A 256-entry lookup table

# float_to_logspace() is compile-time. Accepts -1.0 to +1.0.
# x and y are 8-bit values in log-space
x = float_to_logspace(0.3)
y = float_to_logspace(0.5)
result = pow[x + y]

One limitation of this is that it’s not possible to take the log of a negative number; e.g. log(−1) has no real solution.

We can overcome this by encoding a “sign” bit in the MSB of the log-space value. When adding two log-space values together, the sign bit is effectively XOR’d (toggled). We just need to ensure the remaining bits don’t overflow into it. We ensure this by keeping the remaining bits small enough.

The pow lookup ac­counts for this bit and re­turns a pos­i­tive or neg­a­tive re­sult based on it.
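Putting the pieces together, here is a runnable sketch of the whole scheme. The article does not state its exact log base, so BASE = 0.875 below is an assumption, picked only to demonstrate the mechanics:

import math

SIGN = 0x80    # bit 7 of the byte carries the sign
BASE = 0.875   # assumed base; magnitudes are capped at 42 as in the article

def float_to_logspace(f):
    # Compile-time encode: -1.0..+1.0 -> 8-bit log-space value.
    sign = SIGN if f < 0 else 0
    mag = abs(f)
    if mag < BASE ** 42:           # too small to represent: clamp to ~0
        return sign | 42
    return sign | min(42, round(math.log(mag, BASE)))

# 256-entry pow table, indexed by the byte SUM of two log-space values.
# Bit 7 of the index selects the sign; the low 7 bits are the exponent.
pow_table = [(-1.0 if i & SIGN else 1.0) * BASE ** (i & 0x7F)
             for i in range(256)]

def logspace_mul(x, y):
    return pow_table[(x + y) & 0xFF]  # 8-bit add; sign bits effectively XOR

x = float_to_logspace(0.3)
y = float_to_logspace(-0.5)
print(logspace_mul(x, y))  # about -0.154: close to -0.15, lossy as promised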

It’s ad­van­ta­geous to re­strict num­bers to a sin­gle byte, for both run-time per­for­mance and ROM size. 8-bit frac­tions are pretty ex­treme by to­day’s stan­dards, but be­lieve it or not, it works. It’s lossy as hell, but it works!

All scalars we’re work­ing with are be­tween -1.0 and +1.0.

Addition and mul­ti­pli­ca­tion both use… ad­di­tion!

Consider adding the two bytes: 5 + 10 = 15. Interpreted as fractions, that is 5/127 + 10/127 = 15/127, so plain byte addition doubles as fraction addition.

Why is the de­nom­i­na­tor 127 in­stead of 128? It’s be­cause I needed to rep­re­sent both pos­i­tive and neg­a­tive 1. In a two’s-com­ple­ment en­cod­ing, signed pos­i­tive 128 does­n’t ex­ist.
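In Python, the encoding and its additive property look like this (the helper names are mine):

# Signed 8-bit fractions: byte v represents the value v / 127,
# so the full range -1.0..+1.0 fits in -127..+127.
def to_frac8(f):
    return max(-127, min(127, round(f * 127)))

def from_frac8(v):
    return v / 127

a, b = to_frac8(0.25), to_frac8(0.5)  # 32 and 64
print(a + b, from_frac8(a + b))       # 96 -> ~0.756 (0.25 rounds to 32/127)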

You might notice that the log-space values cycle and become negative at byte 128. The log-space values use bit 7 of the byte to encode the “sign” bit. As mentioned in the previous section, this is important for toggling the sign during multiplication.

The log-space values also use a base I chose to be sufficiently small to meet the requirement that adding 3 of these log-space values won’t overflow into the sign bit (42+42+42 = 126). Bytes 43 thru 127 are near 0, so in practice the ROM doesn’t encode these values.

The lookup ta­bles look like this:

Reconstructed functions look like this. The precision error is shown in the jagged “staircase” patterns:

It may look like there’s a lot of er­ror, but it’s fast and it’s pass­able enough to look al­right! ;)

It’s basically a combined cosine-and-multiply lookup. This exists because in practice, cosine is always used with a multiplication.

The core calculation for the shader is the formula derived earlier:

s = a · sin Nθ · cos(Nφ − Lφ) + b · cos Nθ

And we can rewrite it so that the cosine and its multiplication go through the combined lookup.

The pro­ce­dure processes 15 tiles per frame. It can process more if some of the tile’s rows are empty (all 0), but it’s guar­an­teed to process at least 15.

Figure: Mesen’s “Event Viewer” window, showing a dot for each iteration (tile row) of the shader’s critical loop.

There’s some in­ten­tional vi­sual tear­ing as well. The im­age it­self is more than 15 tiles, so the ROM ac­tu­ally switches to ren­der­ing dif­fer­ent por­tions of the im­age for each frame. The tear­ing is less no­tice­able be­cause of ghost­ing on the LCD dis­play, so I thought it was ac­cept­able.

A pixel takes about 130 cy­cles, and an empty row’s pixel takes about 3 cy­cles.

At one point I had cal­cu­lated 15 tiles ren­der­ing at ex­actly 123,972 cy­cles, in­clud­ing the call and branch over­head. This is an over­es­ti­mate now, be­cause I since added an op­ti­miza­tion for empty rows.

The Game Boy Color’s CPU runs up to 8.388608 MHz, or roughly 139,810 T-cycles per frame (1/60 of a sec­ond).

About 89% of a frame’s avail­able CPU time goes to ren­der­ing the 15 tiles per frame. The re­main­ing time goes to other func­tion­al­ity like re­spond­ing to user in­put and per­form­ing hard­ware IO.
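Those numbers line up with a quick back-of-the-envelope check (assuming the usual 8×8-pixel tiles):

# Rough frame-budget check, assuming 8x8-pixel tiles.
cycles_per_frame = 8_388_608 / 60  # ~139,810 T-cycles per frame
pixels = 15 * 8 * 8                # 15 tiles/frame = 960 pixels
shader_cycles = pixels * 130       # ~130 cycles per pixel = 124,800
print(shader_cycles / cycles_per_frame)  # ~0.89, i.e. about 89% of a frame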

Figure: A hex rep­re­sen­ta­tion of the shader sub­rou­tine in­struc­tions in RAM. The blue dig­its show a patch to change sub a, 0 into sub a, 8.

The core shader sub­rou­tine con­tains a hot path that processes about 960 pix­els per frame. It’s re­ally im­por­tant to make this as fast as pos­si­ble!

Self-modifying code is a su­per-ef­fec­tive way to make code fast. But most mod­ern de­vel­op­ers don’t do this any­more, and there are good rea­sons: It’s dif­fi­cult, rarely portable, and it’s hard to do it right with­out in­tro­duc­ing se­ri­ous se­cu­rity vul­ner­a­bil­i­ties. Modern de­vel­op­ers are spoiled by an abun­dance of pro­cess­ing power, su­per-scalar proces­sors that take op­ti­mal paths, and mod­ern JIT (Just-In-Time) run­times that gen­er­ate code on the fly. But we’re on the Game Boy, bay­beee, so we don’t have those op­tions.

If you’re a de­vel­oper who uses higher-level lan­guages like Python and JavaScript, the clos­est equiv­a­lent to self-mod­i­fy­ing code is eval(). Think about how ner­vous eval() makes you feel. That’s al­most ex­actly how na­tive de­vel­op­ers feel about mod­i­fy­ing in­struc­tions.

On the Game Boy’s SM83 proces­sor, it’s faster to add and sub­tract by a hard-coded num­ber than it is to load that num­ber from mem­ory.

unsigned char Ltheta = 8;

// Slower
v = (*in++) - Ltheta;

// Faster
v = (*in++) - 8;

In SM83 as­sem­bly, this looks like:

; Slower: 28 cycles
ld a, [Ltheta]  ; 12 cycles: Read variable “Ltheta” from HRAM
ld b, a         ; 4 cycles: Move value to B register
ld a, [hl+]     ; 8 cycles: Read from the HL pointer
sub a, b        ; 4 cycles: A = A - B

; Faster: 16 cycles
ld a, [hl+]     ; 8 cycles: Read from the HL pointer
sub a, 8        ; 8 cycles: A = A - 8

The faster way shaves off 12 cy­cles. If we’re ren­der­ing 960 pix­els, this saves a to­tal of 11,520 cy­cles. This does­n’t sound like a lot, but it’s roughly 10% of the shader’s run­time!
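To visualize the patch from the figure above, here is a toy model of the idea in Python (on the SM83, opcode 0xD6 is sub a, d8 with the immediate in the following byte; the helper function is hypothetical):

# The hot loop's "sub a, <constant>" lives in RAM as two bytes:
# the opcode (0xD6) followed by its immediate operand.
code = bytearray([0xD6, 0x00])  # sub a, 0

def set_ltheta(code, value):
    code[1] = value & 0xFF      # overwrite the immediate operand in place

set_ltheta(code, 8)             # the routine now executes "sub a, 8"
print(code.hex())               # d608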

“AI Will Be Writing 90% of Code in 3 to 6 Months”

— Dario Amodei, CEO of Anthropic (March 2025 - 9 months ago as of writ­ing)

95% of this pro­ject was made by hand. Large lan­guage mod­els strug­gle to write Game Boy as­sem­bly. I don’t blame them.

Update (2026-02-03): I attempted to use AI to try out the process, mostly because 1) the industry won’t shut up about AI, and 2) I wanted a grounded opinion of it for novel projects, so I have a concrete and personal reference point when talking about it in the wild. At the end of the day, this is still a hobbyist project, so AI really isn’t the point! But still…

I be­lieve in dis­clos­ing all at­tempts or ac­tual uses of gen­er­a­tive AI out­put, be­cause I think it’s un­eth­i­cal to de­ceive peo­ple about the process of your work. Not do­ing so un­der­mines trust, and amounts to dis­in­for­ma­tion or pla­gia­rism. Disclosure also in­vites peo­ple who have dis­agree­ments to en­gage with the work, which they should be able to. I’m open to feed­back, btw.

I’ll prob­a­bly write some­thing about my ex­pe­ri­ences with AI in the fu­ture.

As far as dis­clo­sures go, I used AI for:

Python: Reading OpenEXR lay­ers, as part of a con­ver­sion script to read nor­mal map data

Python/Blender: Some Python scripts for pop­u­lat­ing Blender scenes, to demo the process in Blender

SM83 as­sem­bly: Snippets for Game Boy Color fea­tures like dou­ble-speed and VRAM DMA. Unsurprising, be­cause these are likely avail­able some­where else.

I at­tempted - and failed - to use AI for:

SM83 as­sem­bly: (Unused) Generating an ini­tial re­vi­sion of the shader code

I’ll also choose to dis­close what I did NOT use AI for:

The al­go­rithms, lookups, all other SM83 as­sem­bly

The soul 🌟 (AI tech­bros are groan­ing right now)

Just to see what it would do, I fed pseudocode into Claude Sonnet 4 (the in­dus­try claims that it’s the best AI model for cod­ing in 2025), and got it to gen­er­ate SM83 as­sem­bly:

It was an in­ter­est­ing process. To start, I chewed Claude’s food and gave it pseudocode, be­cause I had a data for­mat in mind, and I as­sumed it’d strug­gle with a higher-level de­scrip­tion.

I was skeptical that it would do well, but it did better than I thought it would. It even produced code that worked once I persisted and guided it enough. However, it wasn’t very fast, and it made some initial mistakes by assuming the SM83 processor was the Z80 processor. I attempted to get Claude to optimize it by offering suggestions. It did well initially, but it introduced errors until I reached the conversation limit.

After that point, I man­u­ally rewrote every­thing. My fi­nal im­ple­men­ta­tion is ag­gres­sively op­ti­mized and barely has any re­sem­blance to Claude’s take.

And it loved telling me how “absolutely right” I always was. 🥺

...

Read the original on blog.otterstack.com »

8 277 shares, 14 trendiness

Blood omega-3 is inversely related to risk of early-onset dementia


Blood omega-3 is in­versely re­lated to risk of early-on­set de­men­tia

2 Fatty Acid Research Institute, Sioux Falls, SD, USA; Department of Population Health Nursing Science, College of Nursing, University of Illinois-Chicago, Chicago, IL, USA.

4 Fatty Acid Research Institute, Sioux Falls, SD, USA; Department of Internal Medicine, Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA.


Background & aims:

Early-onset de­men­tia (EOD, de­fined as di­ag­no­sis < age 65) im­poses a high so­cio-eco­nomic bur­den. It is less preva­lent and less in­ves­ti­gated than late-on­set de­men­tia (LOD). Observational data in­di­cate that many EOD cases are as­so­ci­ated with po­ten­tially mod­i­fi­able risk fac­tors, yet the re­la­tion­ship be­tween diet and EOD has been un­der-ex­plored. Omega-3 fatty acids are promis­ing di­etary fac­tors for de­men­tia pre­ven­tion; how­ever, ex­ist­ing re­search has pri­mar­ily fo­cused on co­horts aged >65. We ex­am­ined the as­so­ci­a­tions be­tween omega-3 blood lev­els (which ob­jec­tively re­flect di­etary in­take) and in­ci­dent EOD by lever­ag­ing data from the UK Biobank co­hort.

Methods:

We included participants aged 40-64, free of dementia at baseline and for whom plasma omega-3 levels and relevant covariates were available. We modeled the relationships between the three omega-3 exposures (total omega-3, DHA, and non-DHA omega-3) and incident EOD with quintiles (Q) and continuous linear relationships. We constructed Cox proportional hazards models adjusting for sex, age at baseline and APOE-ε4 allele load, in addition to other lifestyle variables reported to relate to incident EOD. We also assessed the interaction between each exposure of interest and APOE-ε4 allele load.

Results:

The study included 217,122 participants. During the mean follow-up of 8.3 years, 325 incident EOD cases were ascertained. Compared to participants at Q1 of total omega-3, those at Q4 and Q5 showed a statistically significantly lower risk of EOD (Q4, hazard ratio [95 % confidence interval] = 0.62 [0.43, 0.89]; Q5, 0.60 [0.42, 0.86]). A statistically significant inverse association was also observed for total omega-3 as a continuous variable. Compared to participants at Q1 of DHA, those at Q5 showed a significantly lower risk of EOD. A statistically significantly lower risk was also observed at Q3, Q4 and Q5 of non-DHA omega-3. Finally, we observed no evidence of an omega-3 × APOE-ε4 allele load interaction.

Conclusions:

This study ex­pands the ev­i­dence of a ben­e­fi­cial as­so­ci­a­tion of omega-3 and LOD to EOD as well. These find­ings sug­gest that an in­creased in­take of omega-3 fatty acids ear­lier in life may slow the de­vel­op­ment of EOD. Additional re­search is needed to con­firm our find­ings, par­tic­u­larly in more di­verse pop­u­la­tions.

Copyright © 2025 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights re­served.

...

Read the original on pubmed.ncbi.nlm.nih.gov »

9 264 shares, 48 trendiness

CCC vs GCC

Anthropic recently published a blog post about building a C compiler entirely with Claude. They called it CCC (Claude’s C Compiler) and claimed it could compile the Linux kernel. 100% of the code was written by Claude Opus 4.6; a human only guided the process by writing test cases. That sounded interesting enough to test the claim and benchmark CCC against the industry standard GCC.

The source code of CCC is available at claudes-c-compiler. It is written entirely in Rust, targeting x86-64, i686, AArch64 and RISC-V 64. The frontend, SSA-based IR, optimizer, code generator, peephole optimizers, assembler, linker and DWARF debug info generation are all implemented from scratch with zero compiler-specific dependencies. That is a lot of work for an AI to do.

Before we jump into the com­par­i­son, it helps to un­der­stand what hap­pens when you com­pile a C pro­gram. There are four stages in­volved.

Image credit: The four stages of the gcc com­piler

Preprocessor: Handles #include, #define and other di­rec­tives. It takes the source code and pro­duces ex­panded source code.

Compiler: Takes the pre­processed source code and trans­lates it into as­sem­bly lan­guage. This is where the real heavy lift­ing hap­pens, un­der­stand­ing the C lan­guage, type check­ing, op­ti­miza­tions, reg­is­ter al­lo­ca­tion and so on.

Assembler: Converts the as­sem­bly lan­guage into ma­chine code (object files). It has to know the ex­act in­struc­tion en­cod­ing for the tar­get CPU ar­chi­tec­ture.

Linker: Takes one or more ob­ject files and com­bines them into a sin­gle ex­e­cutable. It re­solves ref­er­ences be­tween files, sets up mem­ory lay­out and pro­duces the fi­nal bi­nary.

Writing a programming language is hard (even before vibe coding). Writing a compiler is on another level entirely. A programming language defines the rules. A compiler has to understand those rules, translate them into machine instructions, optimize the output for speed and size, handle edge cases across different CPU architectures and produce correct code every single time.

GCC has been in de­vel­op­ment since 1987. That is close to 40 years of work by thou­sands of con­trib­u­tors. It sup­ports dozens of ar­chi­tec­tures, hun­dreds of op­ti­miza­tion passes and mil­lions of edge cases that have been dis­cov­ered and fixed over the decades. The op­ti­miza­tion passes alone (register al­lo­ca­tion, func­tion in­lin­ing, loop un­rolling, vec­tor­iza­tion, dead code elim­i­na­tion, con­stant prop­a­ga­tion) rep­re­sent years of PhD-level re­search. This is one of the rea­sons why it’s ubiq­ui­tous.

This is why CCC be­ing able to com­pile real C code at all is note­wor­thy. But it also ex­plains why the out­put qual­ity is far from what GCC pro­duces. Building a com­piler that parses C cor­rectly is one thing. Building one that pro­duces fast and ef­fi­cient ma­chine code is a com­pletely dif­fer­ent chal­lenge.

Ironically, among the four stages, the com­piler (translation to as­sem­bly) is the most ap­proach­able one for an AI to build. It is mostly about pat­tern match­ing and rule ap­pli­ca­tion: take C con­structs and map them to as­sem­bly pat­terns.
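As a toy illustration of that pattern-matching view (the templates and names here are made up, and real code generation is far more involved):

# Map a few C-like constructs onto x86-64 assembly templates.
TEMPLATES = {
    "assign_const": "    mov dword ptr [{dst}], {value}",
    "add_vars":     "    mov eax, dword ptr [{a}]\n"
                    "    add eax, dword ptr [{b}]\n"
                    "    mov dword ptr [{dst}], eax",
}

def emit(op, **operands):
    return TEMPLATES[op].format(**operands)

print(emit("assign_const", dst="x", value=42))  # x = 42;
print(emit("add_vars", dst="x", a="y", b="z"))  # x = y + z;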

The as­sem­bler is harder than it looks. It needs to know the ex­act bi­nary en­cod­ing of every in­struc­tion for the tar­get ar­chi­tec­ture. x86-64 alone has thou­sands of in­struc­tion vari­ants with com­plex en­cod­ing rules (REX pre­fixes, ModR/M bytes, SIB bytes, dis­place­ment sizes). Getting even one bit wrong means the CPU will do some­thing com­pletely un­ex­pected.

The linker is ar­guably the hard­est. It has to han­dle re­lo­ca­tions, sym­bol res­o­lu­tion across mul­ti­ple ob­ject files, dif­fer­ent sec­tion types, po­si­tion-in­de­pen­dent code, thread-lo­cal stor­age, dy­namic link­ing and for­mat-spe­cific de­tails of ELF bi­na­ries. The Linux ker­nel linker script alone is hun­dreds of lines of lay­out di­rec­tives that the linker must get ex­actly right.

The Linux ker­nel is one of the most com­plex C code­bases in the world. It has mil­lions of lines of code, uses GCC-specific ex­ten­sions, in­line as­sem­bly, linker scripts and count­less tricks that push the com­piler to its lim­its. It is not a good first test for a new com­piler.

SQLite, on the other hand, is dis­trib­uted as a sin­gle amal­ga­ma­tion file (one big .c file). It is stan­dard C, well-tested and self-con­tained. If your com­piler can han­dle SQLite, it can han­dle a lot. If it can­not han­dle SQLite cor­rectly, there is no point test­ing any­thing big­ger.

That is why I tested both. SQLite tells us about cor­rect­ness and run­time per­for­mance. The ker­nel tells us about scale and com­pat­i­bil­ity.

CCC was built with the gcc_m16 Cargo feature, which delegates 16-bit real-mode boot code (-m16 flag) to GCC. This is needed because CCC’s i686 backend produces code too large for the 32KB real-mode limit. The x86_64 C code is compiled entirely by CCC.

A cc­c_wrap­per.sh script routes .S as­sem­bly files to GCC (CCC does not process as­sem­bly) and all .c files to CCC.

Compilers are usually measured on the scenarios below, so the tests are designed around them.

Same hard­ware — iden­ti­cal VM specs for both com­pil­ers

Both run to com­ple­tion — no tests killed pre­ma­turely

CCC gets help where needed — gc­c_m16 fea­ture for boot code, wrap­per for as­sem­bly files

Same bench­mark script — bench­mark_sqlite.sh runs iden­ti­cally on both VMs

The bench­mark was de­signed to be CPU-bound:

* No cor­re­lated sub­queries (O(n^2) queries were re­placed with GROUP BY)

The fair comparison is CCC vs GCC at -O0 (no optimization): CCC takes 87s vs GCC’s 65s — CCC is 1.3x slower. The “5x faster” number only appears because GCC is doing 7 minutes of optimization work that CCC simply skips.

CCC com­piled every sin­gle C source file in the Linux 6.9 ker­nel with­out a sin­gle com­piler er­ror (0 er­rors, 96 warn­ings). This is gen­uinely im­pres­sive for a com­piler built en­tirely by an AI.

However, the build failed at the linker stage with around 40,784 un­de­fined ref­er­ence er­rors. The er­rors fol­low two pat­terns:

__jump_table re­lo­ca­tions — CCC gen­er­ates in­cor­rect re­lo­ca­tion en­tries for ker­nel jump la­bels (used for sta­tic keys/​tra­ce­points)

These are linker-visible bugs in CCC’s relocation/symbol generation, not C language compilation bugs. This is a good example of why the linker is the hardest part. The compiler did its job fine, but the generated relocations were not quite right for the kernel’s complex linker script.

CCC -O0 and -O2 pro­duce byte-iden­ti­cal bi­na­ries (4,374,024 bytes). CCC has 15 SSA op­ti­miza­tion passes, but they all run at every op­ti­miza­tion level. There is no tiered op­ti­miza­tion — the -O flag is ac­cepted but com­pletely ig­nored.

When you ask GCC to com­pile with -O2, it per­forms dozens of ex­tra op­ti­miza­tion passes:

* Register al­lo­ca­tion: fit­ting vari­ables into CPU reg­is­ters so they do not spill to slow mem­ory

* Vectorization: us­ing SIMD in­struc­tions (SSE/AVX) to process mul­ti­ple val­ues at once

GCC’s -O2 spends 7 minutes doing this work, and the payoff is clear: the resulting binary runs 1.7x faster (6.1s vs 10.3s).

CCC does none of this at any optimization level. Comparing “CCC compile time vs GCC -O2 compile time” is like comparing a printer that only prints in black-and-white vs one that does full color. The black-and-white printer is faster, but it isn’t doing the same job.

CCC-compiled SQLite is func­tion­ally cor­rect — it pro­duces the same query re­sults as GCC-compiled SQLite. All 5 crash/​edge-case tests passed. But it is very slow.

No fail­ures ob­served dur­ing these tests:

The per-query breakdown shows that CCC’s slowdown is not uniform. Simple queries are only 1-7x slower, but complex operations involving nested loops blow up:

The pat­tern is clear: op­er­a­tions that in­volve nested it­er­a­tion (subqueries, JOINs) are or­ders of mag­ni­tude slower, while sim­ple se­quen­tial op­er­a­tions are only slightly slower.

Modern CPUs have a small set of fast stor­age lo­ca­tions called reg­is­ters. A good com­piler tries to keep fre­quently used vari­ables in these reg­is­ters. When there are more vari­ables than reg­is­ters, the com­piler spills” them to the stack (regular RAM), which is much slower.

CCC’s biggest performance problem is excessive register spilling. SQLite’s core execution engine, sqlite3VdbeExec, is a single function with 100+ local variables and a massive switch statement. CCC does not have good register allocation, so it spills almost all variables to the stack.

movq -0x1580(%rbp), %rax  ; load from deep stack offset
movq %rax, -0x2ae8(%rbp)  ; store to another deep stack offset
movq -0x1588(%rbp), %rax  ; load next value
movq %rax, -0x2af0(%rbp)  ; store to next offset
; … dozens more memory-to-memory copies

CCC uses stack off­sets up to -0x2ae8 (11,000 bytes deep) for a func­tion with 32 vari­ables. Every op­er­a­tion goes: stack -> rax -> stack, us­ing %rax as a shut­tle reg­is­ter.

CCC is 4.2x slower than GCC -O0 for register-heavy code. In sqlite3VdbeExec, with 100+ variables and 200+ switch cases, this ratio compounds to 100x+.

CCC runs the same 15-pass SSA pipeline at all op­ti­miza­tion lev­els:

This means -O2 pro­vides zero ben­e­fit. Every bi­nary CCC pro­duces is ef­fec­tively -O0 qual­ity, re­gard­less of what flag you pass.

The 2.78x code bloat means more in­struc­tion cache misses, which com­pounds the reg­is­ter spilling penalty.

CCC-compiled binaries lack internal function symbols (nm reports 0 symbols, readelf shows only 90 PLT stubs vs GCC’s 1,500+ functions). This makes profiling and debugging impossible.

The NOT IN (subquery) pat­tern causes SQLite to ex­e­cute a nested loop: for each of the around 100,000 rows in the outer table, it scans through around 10,000 rows in the in­ner table. That is roughly 1 bil­lion it­er­a­tions through SQLite’s main ex­e­cu­tion func­tion (sqlite3VdbeExec), which is ba­si­cally a gi­ant switch state­ment.

With CCC’s roughly 4x per-iteration overhead from register spilling, plus extra cache misses from the 2.78x larger binary (the CPU cannot keep all the instructions in its fast cache), the slowdown compounds:

* Cache pres­sure: around 2-3x ad­di­tional penalty (instructions do not fit in L1/L2 cache)

This is why sim­ple queries (INSERT, DROP TABLE) are only 1-2x slower, but nested op­er­a­tions blow up to 100,000x+ slower.

Correctness: Compiled every C file in the ker­nel (0 er­rors) and pro­duced cor­rect SQLite out­put for all queries

Stability: Zero crashes, zero seg­faults across all tests

Memory us­age: 5.9x more RAM for com­pi­la­tion (1.6 GB vs 272 MB for SQLite)

Compilation speed: Could only be compared at -O0, as CCC does not do anything beyond this. CCC is around 1.3x slower than GCC (87s vs 65s)

Within hours of Anthropic releasing CCC, someone opened issue #1, “Hello world does not compile”. The example straight from the README did not work on a fresh Fedora or Ubuntu install:

$ ./target/release/ccc -o hello hello.c
/usr/include/stdio.h:34:10: error: stddef.h: No such file or directory
/usr/include/stdio.h:37:10: error: stdarg.h: No such file or directory
ccc: error: 2 preprocessor error(s) in hello.c

Meanwhile, GCC compiled it just fine. The issue was that CCC’s preprocessor did not search the right system include paths for stddef.h and stdarg.h (these come from the compiler, not the C library). It got 288 thumbs-up reactions, over 200 comments and turned into one of those legendary GitHub threads where people tag @claude asking it to fix the bug, ask @grok for summaries and post comments like “my job is safe”.

Someone got it working on Compiler Explorer and remarked that the assembly output “reminds me of the quality of an undergraduate’s compiler assignment”. Which, to be fair, is both harsh and not entirely wrong when you look at the register spilling patterns.

The is­sue is still open at the time of writ­ing.

Claude’s C Compiler is a re­mark­able achieve­ment. It is a work­ing C com­piler built en­tirely by an AI that can cor­rectly com­pile 2,844 files from the Linux ker­nel with­out a sin­gle er­ror. It pro­duces func­tion­ally cor­rect code (verified with SQLite — all queries re­turn cor­rect re­sults, all crash tests pass).

But it is not ready for real use:

The out­put code is very slow. CCC-compiled SQLite takes 2 hours to run a bench­mark that GCC fin­ishes in 10 sec­onds. The root cause is poor reg­is­ter al­lo­ca­tion — CCC uses a sin­gle reg­is­ter as a shut­tle to move val­ues be­tween stack lo­ca­tions, turn­ing every op­er­a­tion into mul­ti­ple mem­ory ac­cesses.

The “compiles the kernel” claim needs a footnote. CCC compiles all the C source files, but the final binary cannot be produced because CCC generates incorrect relocations for kernel data structures (__jump_table, __ksymtab).

Optimization flags are dec­o­ra­tive. Passing -O2 or -O3 to CCC does lit­er­ally noth­ing — the out­put bi­nary is byte-iden­ti­cal to -O0.

For Anthropic’s stated goal of demon­strat­ing that Claude can build com­plex soft­ware, CCC is a gen­uine suc­cess. For any­one want­ing to com­pile soft­ware to ac­tu­ally run ef­fi­ciently, GCC (or Clang, or any pro­duc­tion com­piler) re­mains the only real op­tion.

All scripts, re­sults and graphs are avail­able at com­pare-claude-com­piler

Part of this work was as­sisted by AI. The Python scripts used to gen­er­ate bench­mark re­sults and graphs were writ­ten with AI as­sis­tance. The bench­mark de­sign, test ex­e­cu­tion, analy­sis and writ­ing were done by a hu­man with AI help­ing where needed.

...

Read the original on harshanu.space »

10 258 shares, 12 trendiness

GitHub Agentic Workflows

Imagine a world where im­prove­ments to your repos­i­to­ries are au­to­mat­i­cally de­liv­ered as pull re­quests each morn­ing, ready for you to re­view. Issues are au­to­mat­i­cally triaged, CI fail­ures an­a­lyzed, doc­u­men­ta­tion main­tained, test cov­er­age im­proved and com­pli­ance mon­i­tored - all de­fined via sim­ple mark­down files.

GitHub Agentic Workflows de­liver this: repos­i­tory au­toma­tion, run­ning the cod­ing agents you know and love, in GitHub Actions, with strong guardrails and se­cu­rity-first de­sign prin­ci­ples.

Use GitHub Copilot, Claude by Anthropic or OpenAI Codex for event-triggered, recurring and scheduled jobs to improve, document and analyze your repository. GitHub Agentic Workflows are designed to augment your existing, deterministic CI/CD with Continuous AI capabilities.

GitHub Agentic Workflows has been de­vel­oped by GitHub Next and Microsoft Research with guardrails in mind. Agentic work­flows run with min­i­mal per­mis­sions by de­fault, with ex­plicit al­lowlist­ing for write op­er­a­tions and sand­boxed ex­e­cu­tion to help keep your repos­i­tory safe.

Workflows run with read-only per­mis­sions by de­fault. Write op­er­a­tions re­quire ex­plicit ap­proval through san­i­tized safe out­puts (pre-approved GitHub op­er­a­tions), with sand­boxed ex­e­cu­tion, tool al­lowlist­ing, and net­work iso­la­tion en­sur­ing AI agents op­er­ate within con­trolled bound­aries.

Write - Create a .md file with your au­toma­tion in­struc­tions in nat­ural lan­guage

Compile - Run gh aw com­pile to trans­form it into a GitHub Actions work­flow with guardrails (.lock.yml)

Run - GitHub Actions ex­e­cutes your work­flow au­to­mat­i­cally based on your trig­gers

Here’s a sim­ple work­flow that runs daily to cre­ate an up­beat sta­tus re­port:

The gh aw CLI converts this into a GitHub Actions workflow (.yml) that runs an AI agent (Copilot, Claude, Codex, …) in a containerized environment on a schedule or manually.

The AI cod­ing agent reads your repos­i­tory con­text, an­a­lyzes is­sues, gen­er­ates vi­su­al­iza­tions, and cre­ates re­ports - all de­fined in nat­ural lan­guage rather than com­plex code.

Install the ex­ten­sion, add a sam­ple work­flow, and trig­ger your first run - all from the com­mand line in min­utes.

Create cus­tom agen­tic work­flows di­rectly from the GitHub web in­ter­face us­ing nat­ural lan­guage.

...

Read the original on github.github.io »
