Valve open source the Steam Machine e-ink screen so you can make your own

www.gamingonlinux.com

While Valve will not be mak­ing and pro­vid­ing their own e-ink dis­play for the Steam Machine, they have opened it up so any­one can now do it. Valve orig­i­nally teased it with the first lot of re­view­ers that got their hands on it.

All of it is avail­able on their GitLab un­der the MIT li­cense, which goes over every­thing you need to make your own and stick it on the front of your fancy new Steam Machine.

Image Credit - Gamers Nexus

They’re now calling it the “Inkterface” and there’s a good few things you’ll need to make it, including:

1 x Adafruit ESP32 Feather with 2MB PSRAM.

1 x Adafruit eInk Breakout Friend.

1 x Adafruit 5.83″ Monochrome eInk Panel.

13 x M2.5 x 5mm Pan Head Machine Screws.

4 x 1/4″ x 1/4″ x 3/16″ Stepped Magnet SB443-OUT.

Valve even provided a video on the GitLab showing it being put together.

Pretty cool to see.

Maybe we will see some other vendors actually do them pre-built for us. JSAUX teased they would be doing it back in November 2025, and checking back today they’ve said they still plan to do “Ink & Pixel versions”. If the Steam Machine is popular enough, no doubt we’ll have other accessory brands do various versions of their own.

🌐 External Sources: git­lab.steamos.cloud, x.com/​jsaux­of­fi­cial, gamer­snexus.net Article taken from GamingOnLinux.com.

The Anti-Amazon

phenomenalworld.org

We are in a new age of lo­gis­ti­cal prowess, led by the dy­namism of Amazon as it strives to carry out dizzy­ingly com­plex forms of or­der ful­fill­ment and de­liv­ery. With the age of agen­tic com­merce just around the cor­ner—think of go­ing to ChatGPT and hav­ing an AI agent scour every web­site for the cheap­est of­fer­ing of the spe­cific dog food you buy—there is an ex­pec­ta­tion that the fu­ture of re­tail is near in­fi­nite as­sort­ment and ul­tra-fast de­liv­ery. Consumers want the ex­act fla­vor of the ex­act thing that they’re look­ing for, and they want it at their doorstep now. It seems some­times that we are test­ing the bounds of in­fra­struc­tural ca­pac­ity and au­toma­tion in lo­gis­tics to ful­fill this dream.

There are a few things wrong with this dream, how­ever, and the first, as I’ll re­view in a mo­ment, is sim­ply that it might not be so de­sir­able. Even if you think it is prefer­able at an in­di­vid­ual level, there are good rea­sons to ques­tion the so­cial value of the lo­gis­ti­cal com­plex­ity that it ne­ces­si­tates. Home de­liv­ery of sin­gle-pack­aged items en­tails an en­tirely dif­fer­ent cost struc­ture than freight trucks dri­ving to con­sumer-fac­ing ware­houses de­liv­er­ing en­tire pal­lets of goods to be dri­ven home by cus­tomers them­selves. Two com­pa­nies have emerged with ideal-type busi­ness mod­els that dra­ma­tize the dif­fer­ent economies at each end of this spec­trum: Amazon and Costco. Late to the e-com­merce game, min­i­mally in­vested in their dis­tri­b­u­tion net­work, and com­mit­ted as ever to an ar­ti­fi­cially-lim­ited as­sort­ment, Costco is the anti-Ama­zon. It em­bod­ies the pre­cise op­po­site of every­thing imag­ined by the e-com­merce fu­tur­ists—and yet some­how its rev­enue has grown by an av­er­age of more than 10 per­cent every year for the last five years.

Constraint and so­cial­ity

In some cases, consumers might want access to full product assortment: when, for instance, there’s a spot in the home that only fits a furnishing of certain dimensions, or when making a major electronics purchase. But in general, scrolling through options and reading through reviews online for every consumption choice is overwhelming and anxiety-producing: “infinite, meaningless options can result in something like a consumer fugue state,” The Atlantic once argued.

One bril­liant fea­ture of the Costco ex­pe­ri­ence is, para­dox­i­cally, the con­straint: as op­posed to Amazon, with its near in­fi­nite as­sort­ment, or even Walmart, which has ap­prox­i­mately 130,000 SKUs (stock keep­ing units, or dis­tinct items) in the av­er­age Supercenter, any given Costco will only hold 4,000 SKUs to choose from. While most re­tail­ers to­day as­sume that con­sumers want ever greater as­sort­ment, Costco’s pop­u­lar­ity speaks to a coun­ter­vail­ing de­sire for less choice. Indeed, the pre-se­lec­tion of items for sale in their ware­houses is part of the value propo­si­tion: not only are you go­ing to get a lot of a par­tic­u­lar thing for a good price, but you also won’t have to de­lib­er­ate over mi­cro-dif­fer­ences in a more ro­bust as­sort­ment.

In other words, win­now­ing se­lec­tion is a ser­vice, not a lim­i­ta­tion—es­pe­cially with Costco’s prod­uct cat­a­log. Costco is not known for hav­ing the cheap­est goods, but it is known for hav­ing the cheap­est price on its goods, and that is be­cause its buy­ing team has closer re­la­tion­ships with sup­pli­ers than any other big re­tailer. Such scrutiny and com­mu­ni­ca­tion point away from low-road sup­pli­ers. This is a struc­tural ef­fect of Costco’s con­scious choice to of­fer a low SKU count: fewer prod­ucts to in­ves­ti­gate means more time to in­ves­ti­gate each prod­uct, and a nat­ural grav­i­ta­tion away from the bar­gain base­ment. That its mem­ber-cus­tomers have come to ex­pect a cer­tain qual­ity of every­thing in their stores re­in­forces this dy­namic.

The low SKU count also al­lows Costco nat­u­rally to do some­thing that Amazon does by squeez­ing sup­pli­ers: a low or even neg­a­tive cash con­ver­sion cy­cle (CCC). The CCC is a cor­po­rate fi­nance mea­sure of how long it takes to turn in­ven­tory into cash through sales. Amazon of­ten ne­go­ti­ates de­layed pay­ment terms with sup­pli­ers, lean­ing on them to al­low pay­ment win­dows longer than the thirty-day in­dus­try norm. Meanwhile, given the speed of its e-com­merce busi­ness, Amazon is of­ten re­ceiv­ing pay­ment from con­sumers way be­fore it has to pay sup­pli­ers, es­sen­tially giv­ing the re­tailer in­ter­est-free cash. Costco en­joys the same ben­e­fit of a short or neg­a­tive CCC, but with­out hav­ing to anger sup­pli­ers sim­ply be­cause fewer SKUs means a faster-mov­ing in­ven­tory for the SKUs that they do carry. In other words, when a Costco store re­ceives a ship­ment of a par­tic­u­lar item from a sup­plier, it is of­ten go­ing to sell every unit in that ship­ment in less than a month, thanks to its scale and the sim­ple fact that that par­tic­u­lar item is go­ing to be the only va­ri­ety in store.
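The cash conversion cycle described above is conventionally computed as days of inventory outstanding plus days of sales outstanding minus days of payables outstanding. The sketch below works through that arithmetic. The class name and every figure in it are hypothetical illustrations, not numbers reported by Costco or Amazon; the only detail taken from the text is the thirty-day payment window.

```python
from dataclasses import dataclass


@dataclass
class WorkingCapital:
    """Inputs to the standard cash conversion cycle formula (all in days)."""
    days_inventory_outstanding: float  # how long stock sits before it sells
    days_sales_outstanding: float      # how long customers take to pay
    days_payables_outstanding: float   # how long the retailer takes to pay suppliers

    def cash_conversion_cycle(self) -> float:
        # CCC = DIO + DSO - DPO; a negative value means cash from sales
        # arrives before the supplier invoice falls due.
        return (
            self.days_inventory_outstanding
            + self.days_sales_outstanding
            - self.days_payables_outstanding
        )


# Hypothetical retailer with few SKUs: stock sells in under a month,
# and suppliers are paid on the thirty-day norm.
fast_turn = WorkingCapital(
    days_inventory_outstanding=25,
    days_sales_outstanding=3,
    days_payables_outstanding=30,
)

# Hypothetical retailer with a broad assortment: the same payment terms,
# but inventory sits on the shelf for two months.
slow_turn = WorkingCapital(
    days_inventory_outstanding=60,
    days_sales_outstanding=3,
    days_payables_outstanding=30,
)

print(fast_turn.cash_conversion_cycle())  # -2
print(slow_turn.cash_conversion_cycle())  # 33
```

With payment terms held at thirty days in both cases, the only lever that changes is how fast inventory turns. That is the route to a negative cycle that runs through a low SKU count rather than through longer supplier payment windows.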

Costco’s in-store experience is another draw for customers, and this too runs counter to the prevailing view in the “future of retail” conversation. While the e-commerce share of retail has been steadily growing, it’s still under 17 percent in the United States, and one wonders in a society as anti-social as our own if customers find in-person shopping, degraded a “social” venture as it is, desirable even in its inconvenience.

Shopping at Costco is always somewhat harried: no shopper can avoid lines at the registers or traffic jams in the aisles, even on the weekdays. It is the precise opposite of e-commerce convenience. And yet members not only don’t seem to mind the nuisance, they positively embrace it. Costco notably spends very little on advertising, but it doesn’t really need to, given the remarkable amount of free attention it gets by word of mouth and on social media from enthusiastic shoppers “talkin’ deals.” Costco has become a retail destination with a very loyal membership base (its annual membership renewal rate is typically above 90 percent) while offering a sparse, no-frills retail experience.

Benefits of sim­plic­ity

But consumer preference is only one metric by which to judge the desirability of “assortment and home delivery” vs. “constraint and in-person shopping.” It should be remembered that the term “logistics” comes from a military context (the French word for “the art of moving, quartering, and supplying troops”), and logistical success in business means most fundamentally the success with which goods are supplied to the customers who need them. Applied socially, our understanding of logistical success must be based not only on how goods are being supplied to any particular person, with their own smattering of individual preferences, but also on how they are being supplied as a whole.

At the so­cial level, lo­gis­ti­cal suc­cess can be mea­sured in terms of cost ef­fi­ciency. This cost can be un­der­stood in ac­count­ing terms as over­head: the ware­houses, the ve­hi­cle fleet, fuel costs, fork­lifts. An en­ter­prise is more ef­fi­cient when it can spread these costs over a larger vol­ume of goods. A cost-ef­fi­cient op­er­a­tion is also sim­ple, in that it’s re­li­able and not prone to dis­rup­tion. The more com­pli­cated an op­er­a­tion, the more likely it is to fail. A sim­ple op­er­a­tion also puts fewer de­mands on trans­porta­tion in­fra­struc­ture—an ur­gent ques­tion in con­gested ur­ban en­vi­ron­ments.

To put it crudely, having someone in a Sprinter van deliver a recently-purchased toothbrush to your doorstep is simply not a universalizable action, from either a business or logistical standpoint. It is a modern feat that Amazon is capable of doing this, but that it can be done does not mean that it should, nor even that it can be done writ large. For most consumption, it is far more efficient for people to handle the “last-mile delivery” themselves by going to stores and buying a good amount of stuff when they do so. This keeps delivery vans off the road, and it minimizes car trips for necessary purchases. For the retailers, it suppresses unnecessary mark-up both by keeping overhead costs low and by simplifying overall logistical operations.

Costco’s income statements as reported in its Form 10-K include the standard operating expense category for “Selling, General, and Administrative” costs. This category is consistently and qualitatively lower than any of its competitors’: 10 percent of sales, compared to Amazon’s delivery costs of 40 percent of non-AWS sales. One of the reasons for this is the bare-bones nature of the Costco distribution network. At Costco’s “depots” (as opposed to their stores, which the company calls “warehouses”), all in- and outbound inventory is cross-docked in pallet quantities: full pallets come in from suppliers on one side of the building, workers on electric pallet jacks move those pallets from one side to the other, and full pallets are loaded onto trucks bound for stores. There is no pallet breakdown at a Costco depot, no conveyor belts, no fancy automation.

Such low overhead not only allows Costco to deliver on low prices to their customers; it also allows the company to pay relatively high wages to their workers. According to Indeed, Walmart pays retail sales associates an average of $16.23 an hour, and Amazon pays warehouse associates an average of $19.14 an hour. Costco pays front end associates an average of $21.29 an hour. This has allowed Costco to achieve an astonishingly low rate of labor turnover: compared to 60 percent turnover in retail generally and 150 percent in Amazon warehouses, the annual workforce turnover rate at Costco is just 6 percent. This is often treated in the business press as a matter of company “culture,” but it has a clear economic underpinning. When you minimize overhead, you can simply pay workers more without squeezing your overall margin.

As I’ve said else­where, it’s ironic that Jeff Bezos orig­i­nally got the idea for Prime, Amazon’s mem­ber­ship model, from for­mer Costco CEO Jim Sinegal. Prime en­ti­tles mem­bers to free two-day de­liv­ery on over 300 mil­lion prod­ucts (in ad­di­tion to stream­ing ser­vices). With such a wide range of pos­si­ble sin­gle-item or­ders, free de­liv­ery en­cour­ages less bundling of cus­tomer pur­chases. Whereas Costco mem­ber­ship helps to re­duce over­head, Amazon mem­ber­ship in­creases it. Meeting two-day de­liv­ery de­mand re­quires dra­matic in­vest­ments in their dis­tri­b­u­tion net­work, which is re­flected in the higher share of sales ac­counted for by Amazon’s de­liv­ery costs. Lower over­head means more for work­ers, but it also means less or­ga­ni­za­tional stress on those work­ers, as Costco em­ploy­ees are not sub­ject to the quo­tas and sur­veil­lance that Amazon’s e-com­merce busi­ness de­mands.

We don’t typ­i­cally praise Costco for its lo­gis­tics in the way that we do Amazon. But the for­mer in fact of­fers a far more lo­gis­ti­cally el­e­gant and so­cially ben­e­fi­cial model of goods pro­vi­sion than the lat­ter. Amazon dom­i­nates when it comes to lo­gis­ti­cally-com­plex op­er­a­tions, but there is no in­her­ent rea­son to pre­fer com­pli­cated op­er­a­tions to sim­ple ones. If any­thing, sim­plic­ity should be the rule. Why do you need to fig­ure out how to in­te­grate ro­botic arms into a ful­fill­ment op­er­a­tion dom­i­nated by au­tonomous mo­bile units when you can cross-dock full pal­lets? Is it more lo­gis­ti­cally im­pres­sive to solve dif­fi­cult prob­lems or to elim­i­nate the need to solve them in the first place?

That said, there is no question that, in a better society than the one we have, key parts of Amazon’s operation would be retained for offering functions that contribute to the social good. The capacity to deliver prescription medicines same-day to the elderly is a genuine social contribution. (Academics who like to talk about “counter-logistics,” a political orientation aimed at dismantling logistical power, tend to ignore the positive social benefits that modern logistics provides.)

But if we’re look­ing for a gen­er­al­iz­able model for the so­cial pro­vi­sion of goods, Costco of­fers a foun­da­tion­ally use­ful blue­print for every­day per­sonal con­sump­tion while Amazon does not. Amazon is hop­ing that its foray into gro­cery and every­day es­sen­tials will en­cour­age more or­der bundling, and given the im­por­tance it’s ac­cord­ing to this seg­ment, it will no doubt con­tinue to make head­way there. But to date it has still not been able to make the con­ver­sion away from be­ing an on­line con­ve­nience store, which tells you some­thing im­por­tant about its model: Amazon is there to fill in the gaps of a dom­i­nant mode of goods pro­cure­ment, not to re­place it.

Lessons for pub­lic gro­cery

In May, New York City mayor Zohran Mamdani re­it­er­ated his ded­i­ca­tion to cre­at­ing a pub­lic op­tion in gro­cery, an­nounc­ing plans to roll out one pub­lic gro­cery store in each bor­ough, with two lo­ca­tions in the Bronx and Manhattan al­ready scouted. The pub­lic gro­cery store is a model long over­due for wide­spread test­ing out, and as ex-Whole Foods Vice President Errol Schweizer has force­fully ar­gued, there is al­ready a shin­ing ex­am­ple of it in the mil­i­tary com­mis­sary sys­tem. Commissary prices are typ­i­cally 25 – 30 per­cent lower for vet­er­ans and mil­i­tary fam­i­lies.

The may­or’s crit­ics have un­sur­pris­ingly set­tled on the cri­tique that Mamdani’s plan will use tax­payer dol­lars to make life even harder for strug­gling gro­cers in NYC—conceding the idea that it will in­deed re­sult in cheaper gro­ceries for New Yorkers. This is the ter­rain on which they want the dis­course to play out be­cause they ex­pect, not with­out some jus­ti­fi­ca­tion, that spend­ing on this pro­ject will in­volve an in­ef­fi­cient use of re­sources that is not worth the so­cial ben­e­fit it pro­vides. They will be comb­ing through the re­ceipts to find that one ex­pense that il­lus­trates gov­ern­ment in­ef­fi­ciency or even graft.

One simple way to keep overhead low and stay cash positive is to follow the Costco model: low SKU count, high volume. The low SKU count is not only a way to have a desirable cash conversion cycle (an important way of beating back critics on the right); it also creates the opportunity to develop relationships with good suppliers. There will be a temptation to invest in giving these stores “that retail look,” with consultants jumping in to emphasize the importance of shelf placement and signage. To my mind, the aisles could look much more like Costco warehouses, but with full case stacks instead of pallets, provided that shoppers know the city has made the effort on the front end to work with high-road suppliers. A solid marketing campaign around the relationships that the city has developed with local suppliers will do more to drive traffic than the usual retail gimmicks.

Volume is the other key consideration, and if I were on Mamdani’s team, I would deemphasize the “one in each borough” line. Mamdani has said that these stores will “buy and sell at wholesale prices” in part by “centraliz[ing] warehousing and distribution,” but central warehousing and distribution on five stores doesn’t mean much. When Costco had five stores, they had no central distribution network because they didn’t need it. Volume will be what makes centralizing warehousing and distribution worthwhile, and for that Mamdani’s team will need to be open to achieving the scale economies that the math will point to. Errol Schweizer and food systems expert Raj Patel have argued as much elsewhere, suggesting at least twenty stores.

There are many other lessons to learn from Costco, but one that sticks out to me as perfectly reproducible within the context of public grocery stores is choosing a single loss leader that the system becomes known for: in Costco’s case, the $1.50 hot dog and soda combo (that price has not changed for over forty years). For NYC’s public grocery stores, how about a $2.12 halal wrap?

It’s worth not­ing that Costco traces its lin­eage back to Fedco, or the Federal Employees Distributing Company, a mem­ber­ship store started by post of­fice em­ploy­ees in 1948. Fedco was es­sen­tially copied by Sol Price when he cre­ated Price Club, Costco’s pri­mary com­peti­tor un­til the two merged in 1993. The ba­sic idea be­hind Fedco was that fed­eral em­ploy­ees could lever­age their col­lec­tive buy­ing power to elim­i­nate tra­di­tional re­tail store markup. It was, in essence, a re­mark­able tes­ta­ment to the power of the pub­lic purse, one that in­spired the cre­ation of a re­tail be­he­moth that nat­u­rally holds lessons for ex­er­cis­ing that power once more.

Espionage Against the European Parliament: Member of Committee Investigating Spyware Hacked with Pegasus - The Citizen Lab

citizenlab.ca

Key Findings

Former Member of the European Parliament, Stelios Kouloglou, was re­peat­edly hacked with NSO Group’s Pegasus spy­ware while on the com­mit­tee in­ves­ti­gat­ing Pegasus spy­ware abuses.

Kouloglou was in­fected dur­ing key pe­ri­ods of PEGA com­mit­tee ac­tiv­ity, and the spy­ware would have likely cap­tured non-pub­lic in­for­ma­tion about com­mit­tee ac­tiv­i­ties, pos­si­bly breach­ing EU par­lia­men­tary con­fi­den­tial­ity and priv­i­lege frame­works.

We are not at­tribut­ing these in­fec­tions to a par­tic­u­lar gov­ern­ment at this time, and found no in­di­ca­tions that the Greek Government is re­spon­si­ble. Instead, we note an over­lap be­tween the first in­fec­tion and a pre­vi­ously iden­ti­fied Pegasus cam­paign tar­get­ing Russian and Belarusian-speaking ex­iled jour­nal­ists and ac­tivists in Europe, sug­gest­ing a Pegasus cus­tomer with au­tho­riza­tion to spy in mul­ti­ple European coun­tries is re­spon­si­ble.

Background

Stelios Kouloglou is a promi­nent Greek in­ves­tiga­tive jour­nal­ist who was elected as a Member of the European Parliament in 2015. He re­ported for Greek ra­dio and TV from Paris (1983 – 84), Moscow (1989 – 93), and Yugoslavia (1992 – 95). He later founded and re­ported for Television Without Borders (TVXS) start­ing in 2008.

Kouloglou was elected to the European par­lia­ment as an in­de­pen­dent in the Syriza par­ty’s elec­toral list (affiliated with the Left). He was elected to the next par­lia­men­tary term in the 2019 European elec­tions.

Kouloglou was a substitute member of the European Parliament’s Committee of Inquiry to investigate the use of Pegasus and equivalent surveillance spyware (PEGA Committee) from March 24, 2022 to July 18, 2023. The PEGA Committee was established on March 10, 2022 following the 2021 publication of the Pegasus Project and other reporting which revealed European governments used spyware to surveil journalists, activists, politicians, and other citizens. Led by MEP Sophie in ’t Veld, the PEGA Committee was tasked to “investigate the scope of spyware usage in contravention of EU law, focusing on Pegasus and equivalent surveillance spyware.”

While sit­ting as an MEP, Kouloglou con­tin­ued to write opin­ion pieces and re­port for TVXS. He left the Syriza party in October 2023 and sat as an in­de­pen­dent un­til the elec­tions of June 2024, af­ter which he served as a mem­ber of the New Left. His par­lia­men­tary term ended in July 2024.

Kouloglou Infected with Pegasus Spyware

In May 2026, Kouloglou con­tacted the Citizen Lab and we con­ducted a foren­sic analy­sis of ar­ti­facts from his iPhone. We found with high con­fi­dence that his de­vice was suc­cess­fully in­fected with Pegasus spy­ware on or around October 21, 2022, and again on March 6 and 7, 2023.

On 2022-10-21 10:16, there was a lookup for a HomeKit email address rauharepo888[@]gmail.com. Two minutes later, a Pegasus process used mobile data. We assess that the phone was hacked with the PWNYOURHOME zero-click exploit at this point. PWNYOURHOME appeared to first involve the attacker sending a specially crafted NSKeyedArchive that landed in HomeKit, followed by malicious content that landed in MessagesBlastDoorService. Apple mitigated the first issue with a change to HomeKit in iOS 16.3.1, though we assess that they fixed the MessagesBlastDoorService issue earlier, likely in iOS 16.1.

We additionally saw Pegasus activity on Kouloglou’s device between 2023-03-06 09:49 and 2023-03-07 07:30 that we assess is likely linked to the same exploit. On the 2022 and 2023 dates, we assess that the device was running iOS 15.5 (19F77).

These find­ings do not pre­clude the pos­si­bil­ity of ad­di­tional in­fec­tions that we have been un­able to cap­ture due to lim­i­ta­tions of avail­able foren­sic data.

Apple Notifications

Further val­i­dat­ing our find­ing of tar­get­ing, our foren­sic analy­sis shows Kouloglou re­ceived mul­ti­ple Apple threat no­ti­fi­ca­tions about tar­get­ing with mer­ce­nary spy­ware on three oc­ca­sions: March 2, 2023, August 29, 2023, and April 10, 2024. It is im­por­tant to note that threat no­ti­fi­ca­tions from Apple and other com­pa­nies are not real-time alerts. They are typ­i­cally sent to users in batches, of­ten months or more af­ter tar­get­ing takes place.

Kouloglou re­ports to us that he did not re­call re­ceiv­ing the Apple no­ti­fi­ca­tions we ob­served.

Targeting Context and PEGA Committee Activities

Kouloglou helped the Citizen Lab re­con­struct his ac­tiv­i­ties dur­ing the pe­ri­ods when he was tar­geted with Pegasus spy­ware (see the de­tailed time­line in the Appendix). Throughout the pe­riod un­der con­sid­er­a­tion, Kouloglou wrote nu­mer­ous ar­ti­cles and gave fre­quent in­ter­views about spy­ware abuses. We sum­ma­rize the key con­tex­tual de­tails in the fol­low­ing sec­tions.

First Pegasus Infection Period: PEGA Hearing Prep, Country Visits

The date of the first known Pegasus in­fec­tion of Kouloglou’s de­vice — October 21, 2022 — aligns with a par­tic­u­larly in­tense pe­riod of ac­tiv­ity around the PEGA Committee’s de­lib­er­a­tions and in­ves­ti­ga­tions.

First, a series of PEGA Committee hearings were about to commence following the infection date, including “Big Tech and Spyware” (October 26), “Spyware and e-privacy” (October 26), and spyware and fundamental rights (October 27).

Importantly, the PEGA Committee was also in the midst of preparations for the publication of its first draft report. Drafts of the report were being discussed and circulating among PEGA Committee members and their staff in the weeks leading up to this publication. Kouloglou confirms that the first infection date (October 21, 2022) coincided with a period of intense discussion and exchange that primarily took place over text messages and email. The first draft of the PEGA Committee Report was delivered by MEP in ’t Veld on November 8, 2022. The draft focused on allegations of spyware in Poland, Hungary, Greece, Cyprus, and Spain.

In addition to the hearing and report drafting, PEGA Committee members had visited several European countries as part of their mission. Throughout October, the PEGA Committee was planning its research visits to Greece and Cyprus scheduled for November 1 to 4, 2022. Kouloglou helped with planning and participated in both visits as part of the PEGA Committee deliberations. Kouloglou’s device was hacked ten days prior to the start of this trip, at a time when communications were being exchanged about the visits.

Meeting with Thanasis Koukakis

On October 21, 2022, the ex­act date of the in­fec­tion, Kouloglou was in the hos­pi­tal for elec­tive surgery. He was vis­ited in his hos­pi­tal room by Greek in­ves­tiga­tive jour­nal­ist Thanasis Koukakis, who has worked closely on mer­ce­nary spy­ware is­sues in Greece, and had tes­ti­fied to the PEGA Committee the pre­vi­ous month. In March 2022, the Citizen Lab had con­firmed Koukakis was him­self tar­geted with Intellexa’s Predator spy­ware and he was at this time pur­su­ing le­gal reme­dies and for­mal com­plaints with rel­e­vant au­thor­i­ties in Greece about the spy­ing. Koukakis memo­ri­al­ized his meet­ing with Kouloglou with a pho­to­graph (Figure 2).

Given that the in­fec­tion took place while Kouloglou was a pa­tient at a Greek hos­pi­tal, it is pos­si­ble that con­fi­den­tial med­ical in­for­ma­tion could have been in­ter­cepted from his de­vice, in­clud­ing dis­cus­sions go­ing on in his room. If the spy­ware cap­tured con­ver­sa­tions be­tween Kouloglou and med­ical staff, or de­tails stored on the phone con­cern­ing ap­point­ments, med­ical re­sults, di­ag­noses, and other health re­lated in­for­ma­tion, then the hack­ing of his de­vice may im­pli­cate Greece’s laws con­cern­ing con­fi­den­tial­ity of health-re­lated data, which are con­sid­ered a spe­cial cat­e­gory of per­sonal data and are sub­ject to en­hanced pro­tec­tions (Law 4624/2019 un­der the Greek Penal Code).

Second Pegasus Infection Period: Intense PEGA Deliberations

Kouloglou’s de­vice was hacked with Pegasus spy­ware a sec­ond time, on March 6 and 7, 2023. According to Kouloglou, dur­ing this time frame, the PEGA com­mit­tee was en­gaged in in­tense dis­cus­sions re­lated to the fi­nal draft­ing process. On March 6, 2023 Kouloglou trav­eled from Athens to Brussels and was in Brussels on March 6 and 7 dur­ing the time­frame of the in­fec­tion.

It may also be significant that, at this time, PEGA Rapporteur MEP in ’t Veld was in Greece as part of a mission with the LIBE Committee (Committee on Civil Liberties, Justice and Home Affairs), which is a standing committee of the European Parliament primarily responsible for drafting legislation and providing democratic oversight around issues concerning human rights, data protection, asylum, immigration, and anti-discrimination. On that mission, the LIBE delegation questioned the Greek Director of the National Transparency Authority and other officials on the Greek spyware scandal (report para 183).

As with the pre­vi­ous in­fec­tion dates, the date of this in­fec­tion was also fol­lowed by a string of PEGA hear­ings and a re­search trip to Spain (although Kouloglou did not par­tic­i­pate in that trip him­self). This in­fec­tion took place ap­prox­i­mately two months prior to the adop­tion of the first PEGA Committee re­port (May 8, 2023).
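The timing relationships described in these sections can be checked with simple date arithmetic. The sketch below uses only dates stated in this report (the committee membership window, the first day of each infection period, and three committee milestones). The variable and function names are illustrative, not part of any Citizen Lab tooling.

```python
from datetime import date

# Kouloglou's substitute membership of the PEGA Committee, as stated in the report.
membership_start = date(2022, 3, 24)
membership_end = date(2023, 7, 18)

# First day of each infection period identified by the forensic analysis.
first_infection = date(2022, 10, 21)
second_infection = date(2023, 3, 6)

# Committee milestones mentioned in the report.
hearings_begin = date(2022, 10, 26)
draft_report_delivered = date(2022, 11, 8)
report_adopted = date(2023, 5, 8)


def days_until(infection: date, event: date) -> int:
    """Number of days from an infection date to a later committee event."""
    return (event - infection).days


def during_membership(day: date) -> bool:
    """True if the given day falls inside the committee membership window."""
    return membership_start <= day <= membership_end


print(during_membership(first_infection))                   # True
print(during_membership(second_infection))                  # True
print(days_until(first_infection, hearings_begin))          # 5
print(days_until(first_infection, draft_report_delivered))  # 18
print(days_until(second_infection, report_adopted))         # 63
```

Both infection periods fall inside the membership window, and the 63-day gap before adoption of the report matches the "approximately two months" given above.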

Separately, Kouloglou and Thanasis Koukakis had made ten­ta­tive plans over WhatsApp to meet on or around March 6 and 7, 2023, but ul­ti­mately their in per­son meet­ing did not take place.

The European Parliament Under Surveillance

This is the first time a mem­ber of the PEGA Committee has been pub­licly iden­ti­fied as a vic­tim of Pegasus spy­ware while serv­ing on the Committee.

There have been a few pub­lic cases of MEP tar­get­ing prior to the cre­ation of the PEGA Committee. Four Catalan MEPs were ei­ther di­rectly or in­di­rectly tar­geted with Pegasus: MEP Diana Riba’s de­vices were in­fected in October 2019. Catalan MEP Jordi Solé was tar­geted in June 2020, just prior to tak­ing his seat in the European Parliament. Two other Catalan MEPs were tar­geted through their staff or fam­ily mem­bers: Clara Ponsati (July 2020) and Carles Puigdemont (October 2019 and July 2020). Riba, Solé, and Puigdemont all joined the PEGA Committee: Riba as Vice-Chair, Puigdemont as a mem­ber, and Solé as a sub­sti­tute. They tes­ti­fied about their ex­pe­ri­ences to the PEGA Committee, along­side Antoni Comín and Nikos Androulakis (whose de­vice was tar­geted with Predator spy­ware).

Outside of the PEGA Committee, in February 2024, Politico reported that MEPs on the security and defence subcommittee were asked to have their phones checked after traces of spyware were found on two devices. French MEP Nathalie Loiseau, chair of the committee, confirmed she was targeted with Pegasus. The European Parliament’s IT Services informed Bulgarian MEP Elena Yoncheva that her device had been targeted in late October 2023. In May 2024, following the conclusion of the PEGA Committee proceedings, German MEP Daniel Freund announced he had been targeted with Candiru’s mercenary spyware.

Attribution

While we assess with high confidence that Kouloglou was targeted and infected with NSO Group’s Pegasus mercenary spyware, we are not attributing these infections to a specific NSO Group customer.

Since 2022, when the Citizen Lab first dis­cov­ered the hack­ing of Thanasis Koukakis’ de­vice with Predator spy­ware, the Greek gov­ern­ment has been em­broiled in a grow­ing sur­veil­lance scan­dal in­volv­ing abu­sive tar­get­ing of civil so­ci­ety. However, we have no in­di­ca­tions that this hack­ing was the work of the Greek gov­ern­ment. There are no re­ports that Greece is or was a cus­tomer of NSO Group or a user of Pegasus spy­ware. While the Greek gov­ern­ment is known to have ex­ten­sively abused Intellexa’s Predator mer­ce­nary spy­ware, the Citizen Lab is un­aware of any tech­ni­cal in­di­ca­tors sug­gest­ing Greek se­cu­rity and in­tel­li­gence ser­vices had ac­cess to NSO Group’s Pegasus spy­ware.

However, we be­lieve that the same op­er­a­tor tar­geted both Kouloglou in 2022 and the tar­gets we high­lighted in our May 2024 joint re­port with Access Now. In that re­port, we found that seven Russian and Belarusian-speaking in­de­pen­dent jour­nal­ists and op­po­si­tion ac­tivists based in Europe were tar­geted and/​or in­fected with NSO Group’s Pegasus mer­ce­nary spy­ware. One of the redacted Apple IDs (Email 1) from that re­port is rauhare­po888[@]gmail.com, the same HomeKit email that tar­geted Kouloglou. In our un­der­stand­ing of Pegasus in­fec­tion in­fra­struc­ture dur­ing this pe­riod, we be­lieve that these emails are unique to spe­cific op­er­a­tors. We are un­able to say whether the sec­ond in­fec­tion in 2023 is sim­i­larly con­nected to this op­er­a­tor, or a dif­fer­ent op­er­a­tor.

We further note that infections appear to have been present on his phone in at least two European jurisdictions (Greece and Belgium). Based on what we know of NSO Group’s licensing, this would likely indicate that the customer had a license that enabled infections in multiple EU jurisdictions, narrowing the list of potential Pegasus operators that could be responsible for this case.

Conclusion

The in­fec­tion of a European MEP and PEGA Committee mem­ber’s de­vice is a sig­nif­i­cant and trou­bling find­ing. It is made more trou­bling by the fact that we are un­sure of the sta­tus of the phones of many of the other Committee mem­bers dur­ing the time of its pro­ceed­ings. Short of a com­pre­hen­sive screen­ing, there is no way to know whether any other PEGA Committee mem­bers or their staff may have been sim­i­larly in­fected.

Whichever en­tity is re­spon­si­ble for the hack­ing, the in­fec­tion could have ex­posed strictly con­fi­den­tial ex­changes among PEGA Committee mem­bers and their staff, and other sen­si­tive and con­fi­den­tial par­lia­men­tary pro­ceed­ings, in­clud­ing to par­ties un­der in­ves­ti­ga­tion by the Committee it­self.

The find­ing that a PEGA Committee mem­ber was tar­geted with Pegasus spy­ware dur­ing the Committee’s work high­lights the se­ri­ous threat that mer­ce­nary spy­ware poses to the in­tegrity of de­mo­c­ra­tic processes.

As outlined above, this case is not the first time that a Member of the European Parliament has had their devices either targeted or hacked with mercenary spyware, which illustrates the corrosive nature of unregulated mercenary hacking.

While we are un­able to con­clu­sively at­tribute these in­fec­tions to a par­tic­u­lar gov­ern­ment agency at this time, and we have no ev­i­dence that the Greek gov­ern­ment was re­spon­si­ble for this case, over­lap with an op­er­a­tor re­spon­si­ble for hack­ing the de­vices of ex­iled Russian and Belarusian-speaking jour­nal­ists and ac­tivists based in Europe war­rants fur­ther in­ves­ti­ga­tion.

Recommendations

In light of our dis­cov­ery, we rec­om­mend that European Union in­sti­tu­tions open im­me­di­ate in­ves­ti­ga­tions to de­ter­mine the scope and scale of this breach of EU pri­vacy and process.

MEPs and Staff: Get Screened

We urge MEPs and their staff that par­tic­i­pated in the PEGA Committee to im­me­di­ately seek foren­sic screen­ing for signs of spy­ware in­fec­tion, and pre­serve work and per­sonal de­vices that may have been tar­geted.

The Directorate-General for Information Technologies and Cybersecurity (DG ITEC) of­fers this spy­ware screen­ing.

Exercise vig­i­lance for state-spon­sored at­tack warn­ings, and seek prompt ex­pert as­sis­tance when such warn­ings are re­ceived.

The work of MEPs exposes them to more sophisticated threats. MEPs should enable Lockdown Mode (iPhone) or Advanced Protection (Android). These modes strongly increase the protection of a device against mercenary spyware. The DG ITEC may be able to provide additional cybersecurity guidance.

European Parliament: Investigate, Increase Reporting & Screening

The European Parliament should con­duct an im­me­di­ate in­ves­ti­ga­tion into spy­ware at­tacks tar­get­ing MEPs and par­lia­men­tary processes, given the sur­veil­lance of the PEGA com­mit­tee de­tailed in this re­port.

Since some time has passed since this par­tic­u­lar at­tack, prompt in­ves­ti­ga­tion is a mat­ter of ur­gency to en­sure that foren­sic traces are not lost.

We urge the Parliament to com­mis­sion an an­nual re­port on cy­ber and sur­veil­lance threats to the Parliament and mem­bers to iden­tify ar­eas of vul­ner­a­bil­i­ties, and make rec­om­men­da­tions on how to in­crease par­lia­men­tary se­cu­rity.

Such a re­port could be pro­duced by the European Parliamentary Research Service (EPRS) or other en­ti­ties.

DG ITEC of­fers op­tional screen­ing for spy­ware for MEPs and staff. This case sug­gests that this ca­pa­bil­ity may be un­der­used. We urge DG ITEC to de­velop a plan to achieve sub­stan­tially higher screen­ing rates, and pub­lish yearly sta­tis­tics on the num­ber of de­vices screened and rates of dis­cov­ery.

We also urge DG ITEC to reg­u­larly cir­cu­late spe­cific guid­ance for MEPs and staff about vig­i­lance to state spon­sored at­tack warn­ings from com­pa­nies like Apple and Google.

We recommend that DG ITEC consider (optionally) collecting and providing to large platforms account information associated with MEPs and their staff. DG ITEC could request additional scrutiny be devoted to threats against these accounts, and that platforms inform them when they send these accounts state-sponsored threat warnings. This information could ensure that state-sponsored attack warnings are quickly acted on and that patterns are observed.

European Commission: Screen Commissioners & Staff for Spyware

We urge the European Commission to un­der­take their own in­ves­ti­ga­tion and screen­ing to de­ter­mine whether com­mis­sion­ers or com­mis­sion staff have been tar­geted with mer­ce­nary spy­ware.

We urge the Commission’s Directorate-General for Digital Services (DG DIGIT) to de­velop a com­pre­hen­sive spy­ware screen­ing and re­sponse ca­pa­bil­ity, along­side reg­u­lar screen­ings, and we note that DG ITEC may be a use­ful in­ter­locu­tor in this en­deavor.

Parliamentary Assembly of the Council of Europe (PACE): Screen Members & Staff for Spyware

Given past PACE com­mit­tee work on mer­ce­nary spy­ware abuses in Europe, we urge an in­ves­ti­ga­tion into whether Members or staff have been tar­geted with mer­ce­nary spy­ware.

We urge the Council’s Directorate of Information Technology (DIT) to con­sult with peer or­ga­ni­za­tions and con­duct reg­u­lar screen­ing of PACE Members and their staff for signs of mer­ce­nary spy­ware tar­get­ing.

National Parliaments: Screen and Secure Your Members

While this case concerns an MEP, there is a long history of lawmakers targeted with Pegasus and similar mercenary spyware. We commend the DG ITEC for having developed techniques for screening members and we encourage the security services and oversight bodies of national parliaments to copy this model.

Tech Companies: Make Your Threat Warnings Count

This case il­lus­trates that a re­cip­i­ent of mul­ti­ple threat warn­ings failed to no­tice them. Since the goal of these no­ti­fi­ca­tions should be for a tar­get to take ac­tions, en­sur­ing that the tar­get sees and un­der­stands them is crit­i­cal. UX re­search, in­clud­ing with past no­ti­fi­ca­tion re­cip­i­ents, will be key to cre­at­ing no­ti­fi­ca­tions that cap­ture a re­cip­i­en­t’s at­ten­tion, ex­plain the is­sue, and make the next steps easy.

Note on Research Ethics

All re­search in­volv­ing hu­man sub­jects con­ducted at the Citizen Lab is gov­erned un­der re­search ethics pro­to­cols re­viewed and ap­proved by the University of Toronto’s Research Ethics Board.

Acknowledgements

Thanks to Rebekah Brown and Adam Senft for care­ful re­view, and to Anna Mackay and Claire Posno for com­mu­ni­ca­tions and lay­out sup­port. We are very grate­ful to Stelios Kouloglou and Thanasis Koukakis for con­sent­ing to be named in and par­tic­i­pat­ing in this re­search. Special thanks to Zacharias Kesses and TNG.

Appendix: Timeline of PEGA Deliberations and Infection Dates

On March 20 – 23, PEGA does a re­search mis­sion to Spain. Kouloglou does not travel with this del­e­ga­tion.

GitHub - jamesob/local-llm: Everything I know about running LLMs locally

github.com

jame­sob’s guide to run­ning SOTA LLMs lo­cally

Note: noth­ing in this README aside from the ta­bles was writ­ten by AI.

Have $2k burn­ing a hole in your pocket and want some lo­cal, state-of-the-art ma­chine in­tel­li­gence?

How about $40k?

If Dario and Altman are giv­ing you heart­burn (they should be), read on to fig­ure out how to run this new kind of com­put­ing lo­cally.

In this repo you’ll find

the hard­ware I use to run SOTA lo­cally,

why I bought what and lit­tle-known se­crets for con­fig­ur­ing it,

how I run speech-to-text (STT) lo­cally,

ready-to-run con­fig­u­ra­tion for run­ning mod­els I think are good within Docker con­tain­ers.

Contents

My setup

I was lucky/​dumb enough to buy 4x RTX Pro 6000s back when they were cheaper. Because RAM is now so ex­pen­sive, I opted to build a last-gen DDR4 sys­tem to host these cards, the parts for which I got off eBay. This al­lowed me to keep base sys­tem cost rea­son­able while still get­ting a lot of VRAM.

Another somewhat unusual thing I did was to use PCIe4 switches (from c-payne.com). This allows the GPUs to communicate with one another “directly” at wire speeds during the allreduce step in tensor parallelism, rather than having to send all data through the PCI root complex. The upshot of this is reduced latency between the cards with less of a need for expensive PCIe5 hardware.

Consequently, I’m spend­ing money on VRAM (where it counts) rather than on a PCIe5/DDR5 base sys­tem, which is ter­rif­i­cally ex­pen­sive as of July 2026.

My par­tic­u­lar BOM is de­tailed be­low.

How much are you will­ing to spend?

~$2k

A great way to go is 2x RTX 3090s for a total of 48GB of VRAM. You can then run Qwen3.6-27B, which is an awesome model.

You can also run SOTA speech-to-text (STT) with whis­per-large-v3, which I find very use­ful. That’s the model - you’d then ac­cess it with my cross-plat­form stt har­ness.

I’ve found lo­cal STT sur­pris­ingly use­ful - and I feel com­fort­able us­ing it, un­like a hosted equiv­a­lent. You can find a ready-to-run con­fig in ./runners/stt that only as­sumes the pres­ence of ~11GB of VRAM on an Nvidia GPU.

~$40k

At this price level, you get the next step up in model in­tel­li­gence. Something pretty close to Claude Opus.

You’d buy 4x RTX 6000 Pros for a to­tal of 384GB of VRAM.

Current best mod­els for 4x RTX6kPRO

Other ap­proaches

Note: these are my recommendations, but there are other completely valid ways to spend your money. For example, there’s probably also some regime where rather than getting 4 rtx6kpros, you allocate most of your money to building out a linked 4x DGX Spark cluster for a total of 512GB VRAM and use that as the slow, big brain to drive Qwen3.7-27b to do the rote tasks quickly.

Hardware

Here’s the hard­ware I wound up pur­chas­ing for the 4x RTX 6000 pro ma­chine.

Base sys­tem

A mod­est, last-gen EPYC sys­tem pur­chased in parts al­most en­tirely from eBay.

GPUs

c-payne PCIe Gen4 Switch Sub-BOM (c-payne.com)

GPU mount

I had to cus­tom fab­ri­cate a wood en­clo­sure for the PCI switch and GPUs, which took about a day.

I found the PCI switch’s builtin fan very loud and seem­ingly use­less, so I sim­ply un­plugged that from the board.

Hoarding model weights

I save all model weights lo­cally on a ZFS filesys­tem that’s repli­cated across the two 8TB dri­ves, which is mounted at ~/storage.

For any model I want to run, I first down­load the model us­ing

hf download <model-name> --local-dir ~/storage/<model-name>

Running mod­els

Once the model weights are cached locally, I have a specific directory for each model that contains a docker-compose.yml file that cordons off the running of each model to its own Docker container.

You can find these con­fig­u­ra­tions in ./runners/.

Each con­tainer mounts in ~/storage/models in read-only mode to ob­tain the weights that I’ve cached lo­cally.

I then use open­code hosted on a VM on an­other ma­chine to ac­cess the mod­els once they’re serv­ing on http://​clank.j.co:5000.

I use a network-internal DNS server to point clank.j.co to the LLM machine, but you could simply do http://<llm-machine-ip>:5000 too.

The har­ness it­self

I created a VM and clanked up an application that basically just creates a tmux session for each directory within the VM’s ~/src tree, which then runs an opencode instance that backs up to the inference machine’s HTTP API (http://clank.j.co:5000).

One key to mak­ing the open­source mod­els good is tool­ing them prop­erly; a sum­mary of my skills/ is:

camo­fox, kagi.com API key, and searXNG for web brows­ing and search,

Telegram bot for com­mu­ni­ca­tion and alert­ing,

a lo­cal pri­vate Gitea in­stance for col­lab­o­rat­ing on source code.

The clanker will ei­ther work with me in­ter­ac­tively in a ses­sion, or can be farmed off to work on Gitea is­sues and file PRs there.

All this hap­pens in a sand­boxed VM where the only com­mu­ni­ca­tion back to the host sys­tem hap­pens via a shared filesys­tem mount, so the thing can go ham and in­stall what­ever it wants.

Getting the PCI switches to work prop­erly

There was a lot of fid­dling with the BIOS in or­der to make sure the moth­er­board was­n’t down­reg­u­lat­ing the PCI switch speeds.

BIOS Configuration (ROMED8-2T)

Reducing gain on the redriver

Per c-payne’s advice, I did reduce the gain to “lvl 3” using his tool, which was probably the most finicky part of the process.

The gain level is go­ing to be a func­tion of how long your SAS con­nec­tor ca­bles are.

Picking the right SAS ca­bles

I screwed up and or­dered too few of the ca­bles from c-payne di­rectly, so I bought what I thought was the same SAS ca­ble off of Amazon. There was ac­tu­ally a slight dif­fer­ence that was caus­ing is­sues, and I had to re­order ca­bles - so dou­ble-check that you’re get­ting the right stuff!

Kernel / GRUB Parameters

# /etc/default/grub
GRUB_CMDLINE_LINUX="iommu=off amd_iommu=off nomodeset"

sudo update-grub

# nvidia_uvm P2P fix
echo 'options nvidia_uvm uvm_disable_hmm=1' | sudo tee /etc/modprobe.d/uvm.conf
sudo update-initramfs -u

Without iommu=off, NCCL hangs on multi-GPU P2P.

ACS Disable (critical for switch P2P)

With ACS en­abled (default), P2P traf­fic gets bounced through the CPU root port in­stead of stay­ing in­side the switch fab­ric, negat­ing the switch en­tirely. pcie_ac­s_over­ride re­quires a patched ker­nel, so we dis­able via set­pci at run­time.

#!/bin/bash
# /usr/local/bin/disable-acs.sh
if [ "$EUID" -ne 0 ]; then
  echo "ERROR: must be run as root"
  exit 1
fi

# Walk every PCI device; skip those that do not expose the ACS capability.
for BDF in $(lspci -d "*:*:*" | awk '{print $1}'); do
  setpci -v -s ${BDF} ECAP_ACS+0x6.w > /dev/null 2>&1
  if [ $? -ne 0 ]; then
    continue
  fi
  echo "Disabling ACS on $(lspci -s ${BDF})"
  setpci -v -s ${BDF} ECAP_ACS+0x6.w=0000
done

Run on every boot via sys­temd oneshot:

# /etc/systemd/system/disable-acs.service
[Unit]
Description=Disable PCIe ACS for GPU P2P
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/disable-acs.sh

[Install]
WantedBy=multi-user.target

Verify: lspci -vvv | grep ACSCtl should show all mi­nus signs, and nvidia-smi topo -m should show PIX be­tween all four GPUs (not PHB/NODE).

Use ./tools/measure-gpu-speed.sh to mea­sure this eas­ily.
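
The ACSCtl check described above can also be scripted. The TypeScript sketch below scans captured `lspci -vvv` output and reports whether any ACS control flag is still enabled. The function name `acsFullyDisabled` and the sample lines are illustrative assumptions, not part of this repo, and the sketch assumes lspci's usual `Flag+` / `Flag-` notation on `ACSCtl:` lines.

```typescript
// Hypothetical helper (not from the repo): returns true when every ACSCtl
// line in `lspci -vvv` output shows only disabled ("-") flags.
function acsFullyDisabled(lspciOutput: string): boolean {
  const ctlLines = lspciOutput
    .split("\n")
    .filter((line) => line.includes("ACSCtl:"));
  // No ACSCtl lines means nothing was verified, so report failure.
  if (ctlLines.length === 0) {
    return false;
  }
  return ctlLines.every((line) => {
    const flags = line.slice(line.indexOf("ACSCtl:") + "ACSCtl:".length);
    // An enabled flag carries a trailing "+", e.g. "SrcValid+".
    return !flags.includes("+");
  });
}

// Illustrative sample lines, assumed to match lspci's ACSCtl formatting.
const acsOffSample =
  "ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-";
const acsOnSample =
  "ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-";

console.log(acsFullyDisabled(acsOffSample)); // true
console.log(acsFullyDisabled(acsOnSample)); // false
```

Treating "no ACSCtl lines found" as a failure is a deliberate choice here: an empty capture (for example, lspci run without root) should not read as a pass.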

GPU Power Limiting

In or­der to avoid in­stalling a 220V cir­cuit, I (probably un­wisely) run this rig on a sin­gle 110V cir­cuit, but I power reg­u­late the cards.

Persistence mode + power cap ap­plied at boot via sys­temd (install-gpu-power-limit.sh):

sudo nvidia-smi -pm 1
sudo nvidia-smi -pl 350 # 350W per GPU (default 600W)

350W/GPU = 1,400W GPU load, sized for the PSU bud­get. During the in­terim sin­gle-1700W-PSU phase (before the 240V cir­cuit), cards ran at ~260W (4×260 = 1,040W GPUs + ~280W sys­tem ≈ 1,320W to­tal).
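
The budget arithmetic above is easy to re-run for other caps. This is a minimal sketch, assuming the ~280W system draw and the 1700W PSU figure quoted in the text; the helper names `rigPowerWatts` and `maxCapWatts` are hypothetical and not part of the repo.

```typescript
// Hypothetical helper: total draw for identical GPUs at a given power cap.
function rigPowerWatts(gpuCount: number, capWattsPerGpu: number, systemWatts: number): number {
  return gpuCount * capWattsPerGpu + systemWatts;
}

// Hypothetical helper: largest per-GPU cap (whole watts) that fits a supply budget.
function maxCapWatts(budgetWatts: number, gpuCount: number, systemWatts: number): number {
  return Math.floor((budgetWatts - systemWatts) / gpuCount);
}

const GPU_COUNT = 4;
const SYSTEM_WATTS = 280; // approximate non-GPU draw quoted above

console.log(rigPowerWatts(GPU_COUNT, 260, SYSTEM_WATTS)); // 1320 (interim phase)
console.log(rigPowerWatts(GPU_COUNT, 350, SYSTEM_WATTS)); // 1680 (350W cap)
console.log(maxCapWatts(1700, GPU_COUNT, SYSTEM_WATTS)); // 355
```

Note that `maxCapWatts` leaves zero headroom; a real build would subtract a safety margin from the budget before dividing.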

Verify: nvidia-smi --query-gpu=index,power.limit,power.draw --format=csv

Result

Upstream: Gen4 x16 (~30 GB/s to CPU). P2P through switch: 27.5 GB/s unidirectional / 50.4 GB/s bidirectional, 0.37 – 0.45 µs latency, i.e. Gen4 line rate. Note: lspci may still show downstream GPU links as “2.5GT/s (downgraded)” at idle if ASPM is active anywhere; this is cosmetic. Links retrain to Gen4 under load.
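
To put the measured figure in context, here is a rough sketch of the theoretical ceiling for a Gen4 x16 link. It relies only on the standard PCIe Gen4 parameters (16 GT/s per lane, 128b/130b encoding); the constant and function names are illustrative.

```typescript
// PCIe Gen4 signals at 16 GT/s per lane with 128b/130b line encoding.
const GEN4_GT_PER_S = 16;
const ENCODING_EFFICIENCY = 128 / 130;

// Theoretical one-direction bandwidth in GB/s for a given lane count.
function gen4BandwidthGBps(lanes: number): number {
  return (GEN4_GT_PER_S * ENCODING_EFFICIENCY * lanes) / 8; // bits to bytes
}

const theoreticalX16 = gen4BandwidthGBps(16); // about 31.5 GB/s
const measuredP2P = 27.5; // GB/s unidirectional, from the result above
const linkUtilization = measuredP2P / theoreticalX16; // about 0.87

console.log(theoreticalX16.toFixed(1), (linkUtilization * 100).toFixed(0) + "%");
```

The remaining gap to the raw ceiling is typically attributed to packet and protocol overhead, so a result near 87% is consistent with the link running at Gen4 speed.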

Resources

A frequently updated repo on getting the most out of 4, 6, or 8 RTX 6000 Pro cards: https://github.com/local-inference-lab/rtx6kpro

Indie PCI switches that I use: https://c-payne.com

RTX6kPRO discord server; lotta guys benching and testing new models: https://discord.gg/QMNvFkuDN

Wordgard

wordgard.net

Wordgard [wɜrd-gɑrd] noun A gar­den for cul­ti­vat­ing words. Open-source JavaScript li­brary im­ple­ment­ing an in-browser rich-text ed­i­tor.

Semantic Rich Text Editor System

Wordgard pro­vides a set of tools for build­ing con­tent ed­i­tors. It is not a free-form HTML ed­i­tor, but one where you con­trol pre­cisely what kind of con­tent you sup­port.

Its main dis­tin­guish­ing fea­ture is a pow­er­ful pro­gram­ming in­ter­face that makes the li­brary a good foun­da­tion for cus­tomized ed­i­tors—even com­plex, de­mand­ing ones.

Wordgard is open source un­der a per­mis­sive li­cense (MIT). It is be­ing de­vel­oped on code.haver­beke.berlin. Bug re­ports are very wel­come. Pull re­quests are not ac­cepted.

If you are us­ing Wordgard com­mer­cially, there is a so­cial (but no le­gal) ex­pec­ta­tion that you help fund its main­te­nance. Start here.

Discussing the pro­ject or ask­ing ques­tions is best done on the fo­rum. Bugs should be re­ported through the is­sue tracker.

GitHub - teamchong/pxpipe: cut Fable 5 token usage by rendering text context as images

github.com

Cut Claude Code’s in­put to­kens by ren­der­ing bulky con­text as im­ages — the same sys­tem prompt, tool docs, and his­tory, in a frac­tion of the to­kens.

An im­age’s to­ken cost is fixed by its pixel di­men­sions, not by how much text is in­side it. Dense con­tent (code, JSON, tool out­put) packs ~3.1 chars per im­age-to­ken vs ~1 char per text-to­ken on real Claude Code traf­fic. px­pipe is a lo­cal proxy that ex­ploits the gap: it rewrites the bulky parts of each re­quest into com­pact PNGs be­fore it leaves your ma­chine. At cur­rent Fable list prices that lands as a ~59 – 70% lower end-to-end bill — but prices move and work­loads dif­fer, so the durable num­ber is the to­ken cut it­self, mea­sured per-re­quest against a free coun­t_­to­kens coun­ter­fac­tual in ~/.pxpipe/events.jsonl.

This is what the model sees in­stead of text:

~48k chars of sys­tem prompt + tool docs: ≈25k to­kens as text, ≈2.7k im­age to­kens as this page. Real pipeline out­put; the model reads ren­ders like this at 100/100 (see bench­marks).

Demo

Fable 5 (the de­fault, 100/100 reader) — plain left, px­pipe right:

px­pipe counts an ex­act to­ken 10/10 across 39 im­aged filler files (matches grep line-for-line), gets the multi-step ledger arith­metic right, and ends the ses­sion at $6.06 with con­text to spare (73.5k/1M) vs $42.21 at 96% full. One caveat vis­i­ble in the clip: the px­pipe arm needed a nudge to match the re­quested one-line out­put for­mat.

Opus 4.8 (disabled by de­fault) — same lay­out:

Text nee­dles read fine on both arms; the im­aged phrase-count does­n’t read on Opus — and px­pipe says so in­stead of fab­ri­cat­ing a num­ber. That mis­read rate is why Opus is opt-in.

Try it (30 sec­onds)

npx pxpipe-proxy # proxy on 127.0.0.1:47821
ANTHROPIC_BASE_URL=http://127.0.0.1:47821 claude # point Claude Code at it

Dashboard at http://​127.0.0.1:47821/: to­kens saved, every text→im­age con­ver­sion side by side, kill switch, live model chips. Responses stream nor­mally — px­pipe com­presses the re­quest only, never the mod­el’s out­put. Recent turns stay text; the sys­tem prompt, tool docs, and older bulk his­tory are im­aged.

The hon­est part

It is lossy. Exact 12-char hex strings in dense im­aged con­tent: 13/15 on Fable 5, 0/15 on Opus — and misses are silent con­fab­u­la­tions, not er­rors. Byte-exact val­ues (IDs, hashes, se­crets) must stay text; re­cent turns do. A ded­i­cated ver­ba­tim-risk guard is not built yet.

Escape hatch: subagents on non-allowlisted models pass through as text; route byte-exact work there (CLAUDE_CODE_SUBAGENT_MODEL=claude-sonnet-4-6, or model: sonnet in agent frontmatter).

Real work: SWE-bench Lite pi­lot 10/10 both arms at −65% re­quest size; SWE-bench Pro 14/19 ON vs 15/19 OFF at −60%, ver­dicts agree 18/19, and the sin­gle split re-re­solved 3/3 on repli­ca­tion — run-to-run vari­ance, not com­pres­sion. Small n; re­ceipts in eval/.

Workload-dependent. Wins on to­ken-dense con­tent (~1 char/​to­ken), loses money on sparse prose (~3.5 chars/​to­ken); a prof­itabil­ity gate (calibrated on N=391 pro­duc­tion rows) im­ages only where the math wins.

Model scope: de­fault PXPIPE_MODELS=claude-fable-5,gpt-5.6. Opus 4.7/4.8 mis­read ~7% of ren­ders and GPT 5.5 de­grades on im­aged con­text, so both are opt-in via PXPIPE_MODELS or the dash­board chips. PXPIPE_MODELS=off dis­ables imag­ing. Everything else passes through byte-iden­ti­cal. On the GPT path, tool de­f­i­n­i­tions stay na­tive JSON and no Anthropic cache_­con­trol mark­ers are used.

Benchmarks (reproducible)

Measured with novel ran­dom-num­ber prob­lems the model can­not have mem­o­rized:

SWE-bench run to­tals, re­ceipts, and caveats: eval/​swe-bench/ · eval/​swe-bench-pro/ · eval/​nee­dle-haystack/ · eval/​gist-re­call/ · analy­sis in FINDINGS.md. (GSM8K scored 96% im­aged, but it’s in train­ing data — mem­o­rized an­swers sur­vive mis­reads — so we lead with the novel-num­ber evals.)

How it works

tool_re­sult string ──► wrap at 1928px-wide columns ──► pack ~92,000 chars/​page ──► PNG[]

The proxy in­ter­cepts /v1/messages, rewrites el­i­gi­ble bulk into im­age blocks, splices them back cache-friendly (static pre­fix pre­served, prompt caching keeps work­ing), and for­wards. A 1928×1928 im­age costs ≈4,761 vi­sion to­kens and holds ≈92,000 chars, so text wins only above ~19 chars/​to­ken — Claude Code traf­fic runs ~1.91 (N=391). A per-re­quest es­ti­ma­tor de­cides; sparse prose stays text. Events log to ~/.pxpipe/events.jsonl.
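
The break-even figure follows directly from the two page constants above. The sketch below is a simplified stand-in for the per-request estimator, not pxpipe's actual gate: `imagingWins` is a hypothetical name, and it charges a full page even when the page is only partly filled, so it is more conservative than the real renderer (which can emit smaller pages).

```typescript
const PAGE_VISION_TOKENS = 4761; // cost of one 1928x1928 page, from the text
const PAGE_CHAR_CAPACITY = 92000; // chars packed per page, from the text

// Chars per text token above which plain text is the cheaper encoding.
const breakEvenCharsPerToken = PAGE_CHAR_CAPACITY / PAGE_VISION_TOKENS; // about 19.3

// Hypothetical gate: image a block only if full pages cost fewer tokens than text.
function imagingWins(charCount: number, textTokens: number): boolean {
  const pages = Math.ceil(charCount / PAGE_CHAR_CAPACITY);
  return pages * PAGE_VISION_TOKENS < textTokens;
}

// 48k chars at ~1.91 chars/token is roughly 25k text tokens vs one page.
console.log(imagingWins(48000, Math.round(48000 / 1.91))); // true
// A 2,000-char block at ~4 chars/token is ~500 tokens, far below a page.
console.log(imagingWins(2000, 500)); // false
```

The second case shows why small blocks stay text: a page has a fixed token cost, so imaging only pays off once the text it replaces costs more than the page.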

Library use (no proxy)

import { renderTextToPngs, transformAnthropicMessages } from "pxpipe";

const imgs = await renderTextToPngs(toolResultText); // RenderedImage[]
const { body, applied, info } = await transformAnthropicMessages({
  body: requestBytes,
  model: "claude-fable-5",
});

op­tions.keepSharp(block) pins blocks as text; op­tions.emitRe­cov­er­able re­turns the orig­i­nals of im­aged blocks. Pure-JS run­time (Node and edge/​Work­ers); @napi-rs/canvas is build-time only. Full API: src/​core/​in­dex.ts.

Development

pnpm install && pnpm test
pnpm run build # regenerates dist/

FAQ

Is the head­line end-to-end, or only on the re­quests you touched? End-to-end, the whole bill. Most com­pres­sion tools re­port sav­ings only on the in­put slice they touched, which flat­ters the num­ber. The end-to-end de­nom­i­na­tor is every pro­duc­tion re­quest: the small ones px­pipe cor­rectly left un­touched, all cache writes and reads, and all out­put to­kens (which the proxy never com­presses). On a 13,709-request snap­shot that was 59% ($100 → ~$41); a later 8,904-compressed-request trace mea­sured ~70%. Compressed-only runs higher (~72 – 74%) and is quoted sep­a­rately, never as the head­line. The ex­act fig­ure is work­load-de­pen­dent — re­pro­duce it on your own log.

How is the math measured? Both sides of the same request, at the same moment. For every /v1/messages POST the proxy fires a free count_tokens probe on the original uncompressed body (the counterfactual) in parallel with the real forward, and reads Anthropic’s actually-billed usage block off the response. Both land in the same row of ~/.pxpipe/events.jsonl, so there is no turn-count or run-to-run confound. Dollar conversion uses Fable 5 list ratios: input ×1.0, cache write ×1.25, cache read ×0.1, output ×5. Cache pricing is applied identically to both sides, so the caching discount cancels and cannot be double-counted as “savings”. Re-derive it yourself from the events log: the formula and field names are documented in src/core/baseline.ts.
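
As a rough illustration of the dollar conversion, the sketch below applies the four list ratios to a counterfactual and a billed usage record and reports the fraction saved. The `TokenUsage` field names and the sample numbers are made up for illustration; the real field names and formula are the ones documented in src/core/baseline.ts.

```typescript
// Field names are illustrative; the real ones live in src/core/baseline.ts.
interface TokenUsage {
  input: number;
  cacheWrite: number;
  cacheRead: number;
  output: number;
}

// Price ratios relative to one input token, as listed above.
const PRICE_RATIOS: TokenUsage = { input: 1.0, cacheWrite: 1.25, cacheRead: 0.1, output: 5 };

// Cost in "input-token units" for one request.
function costUnits(u: TokenUsage): number {
  return (
    u.input * PRICE_RATIOS.input +
    u.cacheWrite * PRICE_RATIOS.cacheWrite +
    u.cacheRead * PRICE_RATIOS.cacheRead +
    u.output * PRICE_RATIOS.output
  );
}

// Fraction of the counterfactual bill saved by the request actually sent.
function savingsFraction(counterfactual: TokenUsage, billed: TokenUsage): number {
  return 1 - costUnits(billed) / costUnits(counterfactual);
}

// Made-up numbers for one request; output tokens are identical on both sides.
const counterfactualUsage: TokenUsage = { input: 10000, cacheWrite: 0, cacheRead: 20000, output: 500 };
const billedUsage: TokenUsage = { input: 3000, cacheWrite: 0, cacheRead: 6000, output: 500 };

console.log(savingsFraction(counterfactualUsage, billedUsage).toFixed(2)); // 0.58
```

In this example the input side shrinks by 70%, but the uncompressed output tokens (weighted ×5) pull the whole-request saving down to about 58%, which is why an end-to-end figure runs lower than a compressed-only one.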

What does it ac­tu­ally com­press? Three kinds of in­put blocks, each be­hind a prof­itabil­ity gate:

large tool_re­sult bod­ies (file reads, com­mand out­put, logs) above ~6k chars of to­ken-dense con­tent

older col­lapsed his­tory: turns be­hind the live tail get re-ren­dered as im­age pages, re­cent turns al­ways stay text

the sta­tic sys­tem prompt + tool docs slab

Everything else passes through byte-identical: your messages, recent turns, the model’s output (it is the response, the proxy never touches it), sparse prose, and anything too small to win. Models outside the allowlist pass through entirely; the default scope is Fable 5 and GPT 5.6 only. Opus 4.8 and GPT 5.5 read imaged content measurably worse (FINDINGS.md 2026-06-16), so they are deliberately opt-in via the dashboard or PXPIPE_MODELS, never silently imaged.

Has it ever failed for real, outside the benchmarks? Yes, once in weeks of daily use: the model recalled a person’s name from imaged chat history and got it confidently wrong. No error, just a plausible wrong name. That is the documented failure mode: exact strings in imaged content are not byte-safe. Coding sessions tolerate this because the agent re-reads files before editing; pure chat recall has no such check. This failure mode is measured, not anecdotal: the legibility audit quantifies exact-string recall off rendered pages (blind reads top out at 63% on dense identifiers, with every miss predicted by a glyph-confusability matrix) and documents the shipped mitigations: page geometry clamped to the API’s resample cap so billed pixels actually reach the vision encoder, and exact identifiers (SHAs, numbers) riding alongside as text.

Why does the README read like an AI wrote it? Because one did. Most of this re­po’s com­mits — the code and the docs — were au­thored by Opus/Fable agent ses­sions run­ning be­hind px­pipe it­self, read­ing their own col­lapsed his­tory as im­age pages while they worked.

Limitations

Lossy (above); ver­ba­tim re­call from im­ages is un­re­li­able.

PNG en­cod­ing adds la­tency to large re­quests be­fore they leave.

ASCII/Latin-1 well tested; CJK works but con­ser­v­a­tively.

Roadmap

Hypotheses, not claims — they ship as num­bers with an n or they get cut: sharper glyph ren­der­ing (eval/glyph-matrix/, paused mid-run), whether im­aged bulk stretches ef­fec­tive con­text (~2x the real con­tent in the same 1M win­dow), and whether a smaller ac­tive con­text im­proves long-task ac­cu­racy.

License

MIT.

Factories are just rooms

interconnected.org

I went into my kid’s school a cou­ple months back and spoke to the year group about man­u­fac­tur­ing.

Honestly it was the most re­ward­ing speak­ing gig I’ve done all year.

It was about the process of mak­ing my AI clock and I have a ton of pics from my fac­tory visit to Shenzhen (mostly pics that I have only shared with Kickstarter back­ers).

I talked about where ideas come from and the value of play­ing around, and how it’s neat to learn new tech­niques that you can com­bine to­gether.

I talked about prototyping and design, and was sure to use the words “prototyping” and “design”. I showed exploratory sketches and what CAD looks like.

I handed round var­i­ous it­er­a­tions of e-pa­per screens, and elec­tron­ics from bread­board to PCB, and var­i­ous it­er­a­tions of plas­tic parts.

It’s in­ter­est­ing to see how a plas­tic en­clo­sure comes apart, and to con­nect that to what an in­jec­tion mould­ing ma­chine is do­ing.

(A lot of the kids are fa­mil­iar with 3D print­ers, so I showed a time­lapse of a 3D print — it would take a year to print all my clocks! And then a real-time video of in­jec­tion mould­ing, and how that would only take a day.)

And then pho­tos of fac­tory floors, and here’s the team, and as­sem­bly lines and what a page from an as­sem­bly pro­ce­dure looks like, and pack­ag­ing too.

7 year olds have great ques­tions.

Like: how does it not break in the post?

Well here’s a vi­bra­tion ma­chine in ac­tion and that’s how we test it.

And, look, in this card­board pack­ag­ing, here’s a cra­dle, and this was made by a pack­ag­ing de­signer — you could be a pack­ag­ing de­signer too if you want.

Like: how does the but­ton work?

Well you’re right I did­n’t pass round the sep­a­rate but­ton piece, good spot, it’s small and I did­n’t want to lose it. So now let’s talk about as­sem­bly and about in­dus­trial de­sign­ers…

I don’t like those videos of fac­to­ries that are sup­posed to in­spire awe.

You know the ones I mean: you see a thousand products a second whizzing by on 20 parallel belts. You come away saying wow. When they showed manufacturing on kids’ TV when I was growing up, that was what they showed.

“Awe” is the opposite of what I want to convey.

Except for a very specific type of person, when you show something with the expectation that “awe” is the appropriate response, you are implicitly saying to your audience: you should step back here and appreciate this from a distance. Like looking at a great work of art. Gasp but do not place yourself in the picture.

Whereas!

I want to re-home man­u­fac­tur­ing. I want these kids to be­come de­sign­ers, en­gi­neers, in­ven­tors, fac­tory own­ers, and all the rest. Makers of any kind; par­tic­i­pants in the on­go­ing mak­ing of our world.

So my mes­sage is: sure this is com­pli­cated but it’s fine, we can do com­pli­cated.

Factories are just rooms.

The stuff around us is­n’t di­vine - these chairs we’re sit­ting on, the TV at the front of the class­room, the pots for the plants - all this stuff was in­vented and fig­ured out and made by peo­ple.

p.s. you can be one of those peo­ple.

So when I heard the class was learning about inventing, I offered to go in and show that scrappy dead ends are cool actually (it was amazing to speak with a class that already knows the word “prototyping”) and this is electronics and this is going from sketching to plastic and this is what it means to make a product and to sell it.

I deeply feel this mis­sion to nor­malise get­ting our hands dirty with the world — when they’re 7 years old, while their brains are still es­tab­lish­ing what’s nor­mal.

(This is con­nected with what I was say­ing about train­ing for col­lec­tive ef­fi­cacy.)

And I’m just some­one’s dad, you know? So if this guy can do it…

If you have the op­por­tu­nity to go into your lo­cal school and talk about mak­ing things too, please do. You will be re­warded with won­der­ful cu­rios­ity, en­gage­ment and ques­tions from the kids.

Hopefully one of them one day will look around them, think “someone should do something about that”, remember back, and say - oh that someone can be ME.

Performance per dollar is getting faster and cheaper | Wafer

www.wafer.ai

Have you no­ticed we like AMD?

The de­mand for in­fer­ence is sky­rock­et­ing and out­pac­ing sup­ply. With fron­tier mod­els be­ing re­leased al­most every other week — Claude Fable, GLM5.2, and Minimax M3, to name a few — the to­ken craze is only get­ting cra­zier, and there aren’t enough Blackwells go­ing around to sup­port it. Thus, NVIDIA GPU prices are climb­ing fast, and to­kens are get­ting re­ally ex­pen­sive.

In comes AMD. At around 2.75x cheaper per GPU on average (MI355X vs B300) with comparable hardware specs, the solution to cheap inference is hiding in plain sight, a message we at Wafer have been preaching for months. But although AMD's Instinct MI350 series competes with Blackwells at the silicon level, NVIDIA's software advantage and day-0 support typically allow providers to serve inference much faster on their hardware with much less friction.

Conversely, on the MI355X / ROCm stack SOTA per­for­mance rarely comes out of the box for these fron­tier mod­els (sometimes it does!). In fact, you’re lucky if you can find an im­age that runs them at all. Without this day-0 sup­port, build­ing and op­ti­miz­ing for the newest mod­els can re­quire weeks of en­gi­neer­ing and com­pute. By then, the newest model has al­ready been re­leased, mak­ing it so AMD is al­ways play­ing catch-up.

But as agents im­prove at ker­nel and model op­ti­miza­tion, this gap is clos­ing in real time. At Wafer, we’ve proven this time and time again.

And again: on a workload of 20k input tokens / 1k output tokens with a 60% cache hit rate, we hit an aggregate throughput of 2626 tok/s/node at 2.4 RPS with a defined knee of ≤5s TTFT. That is roughly 80% of the performance we measured on a B200, on hardware that is over 2x cheaper.

We also hit 213 tok/​s on GLM5.2 on 10k in­put to­kens / 1.5k out­put to­kens sin­gle stream, fol­low­ing Artificial Analysis stan­dards, served on AMD MI355X ca­pac­ity from TensorWave. Though this num­ber does­n’t top the AA leader­board, it still wins on per­for­mance per dol­lar.

How we did it

The first step with any model work is to choose a quan­ti­za­tion and frame­work. We quan­tized the base bf16 GLM-5.2 to MXFP4 with AMD Quark. In com­par­i­son to z-ai’s of­fi­cial FP8 quan­ti­za­tion, our MXFP4 was loss­less (GPQA-Diamond, tau2, GSM8K).
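For readers unfamiliar with the format, here is a minimal sketch of what MXFP4 does to one block of weights: up to 32 values share a single power-of-two scale, and each value is rounded to the nearest 4-bit E2M1 magnitude. This is a simplified illustration, not AMD Quark's implementation; the function name is hypothetical, and tie-breaking and special values are ignored.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (the sign is a separate bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2  # exponent of the largest E2M1 magnitude: 6.0 = 1.5 * 2**2


def fake_quantize_mxfp4_block(block):
    """Quantize then dequantize one block (up to 32 floats), MXFP4-style.

    Hypothetical helper for illustration only.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0] * len(block)
    # One shared power-of-two scale per block, derived from the largest magnitude.
    scale = 2.0 ** (math.floor(math.log2(max_abs)) - E2M1_EMAX)
    out = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_GRID[-1])  # clamp to the largest FP4 value
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # nearest grid point
        out.append(math.copysign(q * scale, x))
    return out


if __name__ == "__main__":
    print(fake_quantize_mxfp4_block([6.0, 2.4, -0.3, 0.0]))
```

Values that already sit on the scaled grid survive unchanged; everything else is rounded, which is why an accuracy check like the one above (GPQA-Diamond, tau2, GSM8K) is needed before trusting the quantized weights.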

As for the inference framework, we had three options: vLLM, ATOM, and sglang. We chose sglang: vLLM had no working MXFP4 + GlmMoeDsa path, so the MXFP4 weights provided no benefit, and ATOM's output degraded at long context. Sglang was the inference engine with the least friction to native support, able to take advantage of the quantization while remaining coherent.

The next nat­ural step to im­prov­ing through­put was en­abling spec­u­la­tive de­code on sglang. However, the sglang ROCm im­age does not sup­port this out of the box. There were two fixes needed be­fore MTP worked prop­erly.

First, the MTP head, like every other layer, keeps its single shared expert stored in bf16, not MXFP4. However, the MTP head is registered under a different module prefix than the main decoder stack (Quark names its bf16 shared expert model.layers.78.mlp.shared_experts.*, while the MTP layer's real prefix is model.decoder.*). Because of the mismatch, sglang's quantization lookup fails and defaults to building that shared expert as MXFP4. At load it then tries to read a full-width bf16 weight into a half-width 4-bit slot, and the init crashes on a shape mismatch. Quark records which weights to leave un-quantized as a list of layer names, so we copied the layer 78 entries into that list a second time under the decoder name sglang actually uses. This fix unblocked speculative decode, netting us close to a 3x gain in single-stream throughput.
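The exclude-list fix can be sketched as follows. This is a hedged illustration, not the actual patch: the function name, the sample layer names after the prefix, and the assumption that the un-quantized list is a flat list of strings are all hypothetical.

```python
def alias_excluded_layers(exclude, src_prefix, dst_prefix):
    """Return `exclude` plus a renamed copy of every entry under `src_prefix`.

    Hypothetical helper: entries starting with `src_prefix` are duplicated
    with `dst_prefix` substituted, so a lookup by either name succeeds.
    """
    aliased = [
        dst_prefix + name[len(src_prefix):]
        for name in exclude
        if name.startswith(src_prefix)
    ]
    seen = set(exclude)
    return list(exclude) + [name for name in aliased if name not in seen]


if __name__ == "__main__":
    # Illustrative entries only; the real list comes from the Quark export.
    exclude = [
        "lm_head",
        "model.layers.78.mlp.shared_experts.gate_proj",
        "model.layers.78.mlp.shared_experts.down_proj",
    ]
    for name in alias_excluded_layers(exclude, "model.layers.78.", "model.decoder."):
        print(name)
```

Keeping the original entries and appending aliases means the same checkpoint config still works with loaders that use the layer-78 naming.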

Second, deep spec­u­la­tive de­code (such as the 5/1/6 con­fig z-ai sug­gests) was still blocked. The fused multi-step meta­data ker­nel needed for draft depth ≥4 writes #include <cuda_runtime.h> with no ROCm guard. Fix: one #ifdef USE_ROCM guard.

Two trivial but necessary changes to take full advantage of speculative decode. With spec dec working properly, alongside a few config optimizations (such as --kv-cache-dtype fp8_e4m3 and --enable-aiter-allreduce-fusion), we reached our headline single-stream decode number of 213 tok/s.

But for ag­gre­gate through­put, es­pe­cially with our de­fined work­load, de­code op­ti­miza­tions are nec­es­sary but in­suf­fi­cient. At 20k in @ 60% cache, the work­load is pri­mar­ily pre­fill bound.

At TP8, which was the configuration optimized for single-stream decode, the MI355X can run GLM5.2-MXFP4 at 1461 tok/s/node. Switching to TP4×DP2 netted a massive improvement on this workload, getting us to 1944 tok/s/node at 2.0 RPS. That is still relatively slow compared to our measured Blackwell performance, which hit 3192 tok/s/node at 3.0 RPS. A big reason for the poor prefill performance on the MI355X is that on the sglang image, GLM-5.2's fp4 MoE was silently on a slow FlyDSL heuristic fallback (aiter only shipped tuned configs for the a8w8/fp8 path). We tuned the MoE kernel selection ourselves on GLM's fp4 shapes (model_dim 6144, moe_inter 2048, E=256, topk=8), which allowed us to reach 2626 tok/s/node at 2.4 RPS. Much better.
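To make the performance-per-dollar claim concrete, here is a small worked calculation using the throughput figures quoted above. The price ratio is an assumption: the text only says "over 2x cheaper", so 2.0 is used as a conservative lower bound, and the function name is hypothetical.

```python
def relative_perf_per_dollar(tput, baseline_tput, price_ratio):
    """Perf-per-dollar relative to a baseline.

    price_ratio = baseline price / our price (how many times cheaper we are).
    """
    return (tput / baseline_tput) * price_ratio


MI355X_TPUT = 2626.0        # tok/s/node after MoE kernel tuning (from the text)
BLACKWELL_TPUT = 3192.0     # tok/s/node measured on Blackwell (from the text)
ASSUMED_PRICE_RATIO = 2.0   # assumption: lower bound for "over 2x cheaper"

relative_throughput = MI355X_TPUT / BLACKWELL_TPUT
advantage = relative_perf_per_dollar(MI355X_TPUT, BLACKWELL_TPUT, ASSUMED_PRICE_RATIO)

if __name__ == "__main__":
    print(f"relative throughput: {relative_throughput:.1%}")
    print(f"perf per dollar vs Blackwell: {advantage:.2f}x")
```

Under that assumption the MI355X node delivers a bit over 80% of the Blackwell throughput but roughly 1.6x the throughput per dollar; a larger real price gap would widen the advantage proportionally.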

Why this mat­ters

Although there was some de­gree of fric­tion, achiev­ing the best per­for­mance per dol­lar ra­tio on the MI355X was­n’t par­tic­u­larly hard — though there were some frame­work re­lated bugs, un­like our work with Qwen3.5 397B, you’ll no­tice that we did­n’t ac­tu­ally write any cus­tom ker­nels this time. Though this study does­n’t take multi-node per­for­mance into con­sid­er­a­tion, sin­gle-node de­ploy­ments still re­main highly preva­lent in prac­tice.

SOTA on AMD is be­com­ing more a mat­ter of sup­port, not soft­ware. The CUDA moat is erod­ing in real time.

Zuckerberg ‘Admits’ Meta’s Layoffs Were Ineffective

eshumarneedi.com

Mark Zuckerberg, Meta’s chief ex­ec­u­tive, at a Meta town hall re­ported by Katie Paul and Courtney Rozen at Reuters:

In retrospect, he said, the “trajectory of the agentic development over at least the last four months hasn’t really accelerated in the way that we expected,” and that the company’s bets on the new structure “haven’t come to fruition yet.” Zuckerberg was referring to AI agents, automated systems that can execute tasks on behalf of a user.

Conversations he was having with “our top people” when they started planning the restructuring in January and February were that they were worried that “we weren’t going to move fast enough to adapt,” Zuckerberg said.

At the time, he said, executives were “super optimistic” about tools like Claude Code from AI startup Anthropic.

The self-created tragedy of Meta is that the company is loath to invent new products. Instead, Meta’s management more or less relies on “vibes” to govern its decisions, and those vibes are often either wrong or far too late. The most pertinent example of the former is the ill-fated metaverse, which was developed solely on the (unbelievable) whim that the pandemic would last forever, or at least far longer than it actually did, and people would become accustomed to replacing in-person interaction with virtual reality. It was precisely at this moment, roughly around mid-2020, that Meta (then Facebook) disintegrated from a social media company into a Ship of Theseus that still technically operated its core social platforms but fundamentally was distracted by a red herring. Vibes-based management.

As I wrote in my now infamous “Meta-stasizing Cancer of Indirection” piece, Zuckerberg did not learn from this disastrous failure as the artificial intelligence boom kicked off in 2023. Long story short: Zuckerberg threw his company into turmoil because he was too late to identify that the metaverse was an abysmal failure. By the time he did, the AI boom was already in full swing, and Meta was thoroughly left out. This strategic failure, coupled with Zuckerberg’s arguably incompetent management style, left employees either out of employment, directionless, or both. It is just impossible to run a company on a whim: the metaverse was a distraction, and so was AI because Meta was far too late and improperly organized. Vibes-based management.

Zuckerberg yet again plunged his com­pany into chaos af­ter the suc­cess of Claude Code in December 2025. Knowing the com­pany was be­hind in de­vel­op­ing AI prod­ucts af­ter ob­serv­ing the rise of agen­tic cod­ing, Zuckerberg ef­fec­tively put Alexandr Wang, the chief of Meta’s AI di­vi­sion, in charge of the en­tire com­pany. The only thing Wang did was wrongly de­ter­mine that all hu­man pro­gram­mers were a waste of time and money and that it would be bet­ter to fire them and spend the freed-up cash on tal­ented AI en­gi­neers who would un­wit­tingly de­velop their own re­place­ments. So that’s ex­actly what Zuckerberg did, per Wang’s hunch: he fired thou­sands of em­ploy­ees, put AI in charge of con­tent mod­er­a­tion, and man­dated that the re­main­ing Meta work­ers in­stall spy­ware that would track their com­puter use to train an agent that could take their job. Vibes-based man­age­ment.

I can’t tell if Zuckerberg is dimwitted or just evil. The problem during the first era of the AI boom (circa 2023) was indeed that Meta was too slow to identify the metaverse flub. But that was no longer Meta’s problem entering the agentic coding era: The problem, rather, was that Meta had no coherent strategy. The last thing it should’ve done was “move fast enough to adopt” because “adopting” was not the answer to Meta’s problems. AI-assisted programming has developed in the last six months, contrary to Zuckerberg’s claim that it “hasn’t really accelerated”, but indeed not in the way Meta expected, because Meta’s vibes-based management this time was just plain wrong. AI never had the potential to replace so many workers in an instant. The vibe was, unlike in 2023, not late but wrong entirely. And I’m confident in saying only a fool could have lent credence to that laughably incorrect theory.

I’d say get­ting fired by Meta is like catch­ing the last plane out of Vietnam, even in this ruth­less job mar­ket.

Leanstral 1.5: Proof Abundance for All

mistral.ai

Summary

Leanstral 1.5, a free Apache-2.0 li­censed model with 6B ac­tive pa­ra­me­ters, de­liv­ers a ma­jor per­for­mance up­grade in for­mal ver­i­fi­ca­tion, sat­u­rat­ing miniF2F, solv­ing 587/672 PutnamBench prob­lems, and achiev­ing state-of-the-art re­sults on FATE-H (87%) and FATE-X (34%). Trained through mid-train­ing, su­per­vised fine-tun­ing, and re­in­force­ment learn­ing with CISPO, it ex­cels in agen­tic proof en­gi­neer­ing and real-world code ver­i­fi­ca­tion, un­cov­er­ing 5 pre­vi­ously un­known bugs across 57 repos­i­to­ries tested. Fully open-sourced and avail­able via Hugging Face and a free API, Leanstral 1.5 is now ac­ces­si­ble for prac­ti­cal proof en­gi­neer­ing in Lean 4.

Since its launch, Leanstral has of­fered an open, prac­ti­cal ap­proach to proof en­gi­neer­ing in Lean 4. Today, we are re­leas­ing Leanstral 1.5, a free Apache-2.0 li­censed model with 119B to­tal and only 6B ac­tive pa­ra­me­ters, de­liv­er­ing a per­for­mance up­grade that makes for­mal ver­i­fi­ca­tion more pow­er­ful and ac­ces­si­ble than ever.

Leanstral 1.5 saturates miniF2F, solves 587/672 PutnamBench problems, and achieves a new state-of-the-art of 87% on FATE-H and 34% on FATE-X. Beyond benchmarks, it verifies complex code properties and uncovers previously unknown bugs in open-source repositories, proving that rigorous formal methods can be both effective and practical for real-world use.

Training Leanstral

Leanstral 1.5 goes through a three-stage process: mid-train­ing, su­per­vised fine-tun­ing, and re­in­force­ment learn­ing with CISPO. Leanstral 1.5 lever­ages ex­ten­sive train­ing on two RL en­vi­ron­ments:

In the mul­ti­turn en­vi­ron­ment, the model is given a the­o­rem state­ment and must ei­ther prove or dis­prove it. The model sub­mits a proof, re­ceives Lean com­piler feed­back, and re­fines its ap­proach with each at­tempt. If the proof com­piles it suc­ceeds; oth­er­wise the loop con­tin­ues un­til the model ei­ther solves the prob­lem or ex­hausts its bud­get.

In the code agent en­vi­ron­ment, Leanstral op­er­ates like a de­vel­oper in a raw filesys­tem: it ed­its files, runs bash com­mands, and uses the Lean lan­guage server to in­spect goals, er­rors, and type in­for­ma­tion in real time. This al­lows it to tackle long-hori­zon tasks like com­plet­ing par­tial proofs in a repos­i­tory, build­ing aux­il­iary lem­mas, and per­sist­ing through mul­ti­ple rounds of con­text com­paction. The model learns to nav­i­gate the full proof-en­gi­neer­ing work­flow and is fi­nally ver­i­fied by our fork of SafeVerify for cor­rect­ness given a list of tar­get the­o­rems.
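The multiturn environment described above boils down to a propose, check, refine loop. The sketch below is an assumption-laden illustration, not Mistral's training code: `propose` stands in for the model, `check` stands in for the Lean compiler, and the attempt budget is a simple counter (the real budget may be measured differently).

```python
from typing import Callable, Optional, Tuple


def multiturn_prove(
    theorem: str,
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[str, str], Tuple[bool, str]],
    max_attempts: int,
) -> Optional[str]:
    """Return the first proof that compiles, or None once the budget runs out."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        proof = propose(theorem, feedback)     # model drafts or revises a proof
        ok, feedback = check(theorem, proof)   # compiler accepts or reports errors
        if ok:
            return proof
    return None


if __name__ == "__main__":
    # Toy stand-ins: the "model" only gets it right after seeing feedback.
    def toy_propose(theorem, feedback):
        return "by rfl" if feedback else "by sorry"

    def toy_check(theorem, proof):
        return (proof == "by rfl", "" if proof == "by rfl" else "error: sorry used")

    print(multiturn_prove("1 + 1 = 2", toy_propose, toy_check, max_attempts=3))
```

Passing the compiler feedback back into `propose` is the part that distinguishes this setup from independent resampling.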

Evaluation

We eval­u­ate Leanstral on the fol­low­ing bench­marks:

miniF2F is a cross-sys­tem bench­mark for for­mal math­e­mat­ics, rang­ing from el­e­men­tary prob­lems to IMO-level chal­lenges, test­ing di­verse proof abil­i­ties across al­ge­bra, com­bi­na­torics, and num­ber the­ory.

PutnamBench con­sists of 672 prob­lems from the Putnam Mathematical Competition, re­quir­ing deep rea­son­ing and long proof chains to solve chal­leng­ing math­e­mat­i­cal prob­lems.

FATE-H and FATE-X are ab­stract al­ge­bra bench­marks for grad­u­ate and PhD-level prob­lems, re­spec­tively, test­ing ad­vanced rea­son­ing in ar­eas like group the­ory, ring the­ory, and mod­ule the­ory.

FLTEval is based on real pull re­quests from the Fermat’s Last Theorem repos­i­tory, test­ing prac­ti­cal proof en­gi­neer­ing with real-world com­plex­ity.

We saturate miniF2F completely, reaching 100% on both the validation and test sets. On PutnamBench and FATE-H/X, we compare Leanstral 1.5 against Goedel-Architect without natural-language guidance, Seed-Prover 1.5 at its high setting, and AxProverBase. Leanstral reaches a new state-of-the-art on FATE-H/X, solving 87 and 34 problems respectively. On PutnamBench, it edges out Seed-Prover 1.5 high by 7 problems at far lower cost: about $4 per problem, against an estimated $300 or more for Seed-Prover, whose high setting runs with a budget of 10 H20-days per problem. The only provers ranked higher operate under different conditions: some receive natural-language proof guidance, others cost far more to run, like Aleph Prover at $54–68 per problem.

Leanstral 1.5 shows the strongest test-time scal­ing we have seen from a for­mal-rea­son­ing model. The fig­ure be­low tracks Pass@8 on PutnamBench as we raise the to­ken bud­get per at­tempt from 25k to 4M: per­for­mance climbs smoothly and mo­not­o­n­i­cally the whole way, from 44 prob­lems solved at 50k to 244 at 200k, 493 at 1M, and 587 at 4M. Rather than giv­ing up when a proof runs long, Leanstral keeps rea­son­ing, edit­ing files, and re­vis­ing across mil­lions of to­kens, turn­ing that bud­get di­rectly into solved prob­lems—the same be­hav­ior be­hind the AVL-tree proof be­low, which ran for over 2.7 mil­lion to­kens across 22 com­pactions.

With this release, we also fully open source FLTEval. Leanstral 1.5 lifts pass@1 on the benchmark from 21.9 to 28.9 and pass@8 from 31.9 to 43.2, surpassing Opus 4.6’s 39.6 at one-seventh the cost. It also widens its lead over open-source models 3–10× larger, as shown in the figure below.
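For readers unfamiliar with the pass@k metric quoted above, here is a minimal sketch of the commonly used unbiased estimator: given n sampled attempts on a problem of which c succeed, it estimates the chance that at least one of k randomly chosen attempts succeeds. The post does not say how its pass@1 and pass@8 numbers were computed, so treat this as background, not as the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts with c successes: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # One success out of eight attempts on a single problem.
    print(pass_at_k(n=8, c=1, k=1))
    print(pass_at_k(n=8, c=1, k=8))
```

A benchmark-level score is then the average of this per-problem value over all problems.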

Code Verification Case Studies

While be­ing pri­mar­ily trained for math­e­mat­ics, Leanstral 1.5 ex­hibits strong abil­i­ties in code ver­i­fi­ca­tion. We pre­sent 2 crit­i­cal case stud­ies to demon­strate its im­pact.

AVL Trees: Proving Time Complexity

AVL trees are self-bal­anc­ing bi­nary search trees that main­tain O(log n) height through re­bal­anc­ing dur­ing in­ser­tions and dele­tions. Leanstral 1.5 proved these time com­plex­ity guar­an­tees for a real im­ple­men­ta­tion—a task that re­quired struc­tural in­duc­tion to mir­ror the tree’s re­cur­sive struc­ture, care­ful han­dling of monadic time track­ing, and ex­haus­tive case analy­sis for re­bal­anc­ing paths. Over 2.7 mil­lion to­kens and 22 com­pactions, Leanstral sys­tem­at­i­cally un­folded each layer of the TimeM monad, ex­pos­ing the un­der­ly­ing com­pu­ta­tions de­spite their in­ter­leav­ing with con­trol flow. It es­tab­lished an al­most tight bound of 48 steps per height unit plus a con­stant for in­ser­tion, then con­nected height to tree size via a log­a­rith­mic re­la­tion­ship, de­liv­er­ing com­plete, ver­i­fied proofs that in­ser­tion and dele­tion are in­deed O(log n).
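The shape of that argument can be written out in a few lines. This is a sketch under stated assumptions: $c$ stands for the unspecified constant mentioned above, $h$ is the tree height, $n$ the number of nodes, and the height bound used is the classical one for AVL trees, which may differ in constants from the relation actually proved in Lean.

```latex
% Step bound per insertion, as stated: 48 steps per height unit plus a constant c.
T_{\mathrm{insert}} \le 48\,h + c
% Classical AVL height bound (assumed here; \varphi is the golden ratio).
h \le \log_{\varphi}(n + 2) \approx 1.4405\,\log_2(n + 2)
% Combining the two gives the logarithmic guarantee.
T_{\mathrm{insert}} \le 48 \cdot 1.4405\,\log_2(n + 2) + c = O(\log n)
```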

Bug Discovery: Finding Hidden Flaws

To test Leanstral’s bug-catch­ing abil­i­ties, we built an au­to­mated pipeline: Aeneas trans­lates Rust code to Lean, while Leanstral in­fers the user in­tent and gen­er­ates cor­rect­ness prop­er­ties from the code. Leanstral then at­tempts to prove each prop­erty in four at­tempts. If they all fail, it tries to prove the nega­tion in­stead, also with four at­tempts. Across 57 tested repos­i­to­ries, this process flagged 47 vi­o­lated prop­er­ties, with 11 point­ing to gen­uine bugs—5 of them pre­vi­ously un­re­ported on GitHub.
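The prove-then-refute step of that pipeline can be sketched as below. This is an illustration under assumptions: `try_prove` is a hypothetical stand-in for one Leanstral proof attempt, and the label names are invented for readability.

```python
from typing import Callable


def classify_property(
    prop: str,
    try_prove: Callable[[str, bool], bool],
    attempts: int = 4,
) -> str:
    """Try to prove `prop`; if every attempt fails, try to prove its negation.

    try_prove(prop, negate) returns True when a proof of the property
    (or of its negation, when negate=True) is found.
    """
    for _ in range(attempts):
        if try_prove(prop, False):
            return "proved"
    for _ in range(attempts):
        if try_prove(prop, True):
            return "violated"   # negation proved: candidate bug, needs triage
    return "unknown"


if __name__ == "__main__":
    # Toy prover: only the negation of "never_overflows" is provable.
    toy = lambda prop, negate: negate and prop == "never_overflows"
    print(classify_property("never_overflows", toy))
```

A "violated" label only means the generated property is false; as the numbers above show, most such flags reflect a wrong guess about intent, so a review step is still needed to separate genuine bugs.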

One such bug was in the sign func­tion for zigzag de­cod­ing of the da­trs/​var­in­te­ger li­brary. On in­put Std.U64.MAX, the ex­pres­sion (value + 1) over­flowed, caus­ing crashes in de­bug mode and silent cor­rup­tion in re­lease mode—an edge case that test­ing and fuzzing would typ­i­cally miss. Leanstral’s pipeline caught it au­to­mat­i­cally, demon­strat­ing that for­mal ver­i­fi­ca­tion can al­ready be ap­plied to real-world code­bases and find bugs that some tra­di­tional meth­ods over­look.
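The failure mode is easy to reproduce outside Rust. The sketch below simulates 64-bit wrap-around in Python; the decode body is a hypothetical reconstruction built around the `(value + 1)` expression mentioned above, not the library's actual source, and both function names are invented.

```python
U64_MAX = 2**64 - 1


def zigzag_decode_wrapping(value: int) -> int:
    """Hypothetical decode using (value + 1), with u64 wrap-around as in a release build."""
    if value & 1:
        # (value + 1) wraps to 0 when value == U64_MAX.
        return -(((value + 1) & U64_MAX) // 2)
    return value // 2


def zigzag_decode_safe(value: int) -> int:
    """Overflow-free zigzag decode: shift first, then flip bits for odd inputs."""
    return (value >> 1) ^ -(value & 1)


if __name__ == "__main__":
    print(zigzag_decode_wrapping(U64_MAX))  # silently wrong result
    print(zigzag_decode_safe(U64_MAX))      # smallest 64-bit signed integer
```

The two functions agree on every input except the single maximum value, which is why random testing is unlikely to hit the bug while a proof attempt over all inputs exposes it.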

Get Started

Leanstral 1.5 has an Apache-2.0 license. The weights can be found on Hugging Face, and the model is also available now as a free API endpoint under the name leanstral-1-5. We recommend using it in Mistral Vibe. To begin your journey, grab an API Key, and:

1. Set up Mistral Vibe

uv tool install mistral-vibe
uv tool upgrade mistral-vibe
vibe --setup

2. Install Leanstral 1.5

/leanstall
exit

3. Launch the agent

vibe --agent lean

4. Install Lean LSP MCP (Optional)

It is highly recommended to install Lean LSP MCP by adding the following to your ~/.vibe/config.toml:

[[mcp_servers]]
name = "lean-lsp"
transport = "stdio"
command = "uvx"
args = ["lean-lsp-mcp"]
tool_timeout_sec = 600

If there are no ex­ist­ing MCP servers, you may have to re­move mcp_servers = [].

5. Start prov­ing

Ask Leanstral to tackle a the­o­rem, de­bug a proof, or con­tribute to a repos­i­tory. It’s that sim­ple.
