10 interesting stories served every morning and every evening.

The bottleneck might be the air in the room

blog.mikebowler.ca

You gather your most expensive people into a room to make your most important decisions. Then, somewhere in the second hour, the room quietly gets worse at making them. Not the people. The room.

I now travel with a portable CO2 monitor. Outdoors it reads around 400 parts per million. In a closed meeting room with a handful of people in it, I have watched it climb past 2,000. The photo here is a real reading: 2,143.

That number matters more than it looks. Researchers at Lawrence Berkeley National Laboratory put people in a chamber and varied only the CO2. At 1,000 ppm, performance dropped significantly on six of nine decision-making measures compared with a clean-air baseline of 600 ppm. At 2,500 ppm, seven of the nine fell substantially, some into a range the researchers called dysfunctional. A separate study out of Harvard found cognitive scores declining as CO2 rose, with the steepest losses in exactly the domains you called the meeting for: strategy, planning, and using information under pressure.

Here is the uncomfortable part. A reading of 1,000 ppm is not extreme. A closed room with a few people breathing in it reaches that level within the first hour. Your all-day planning session, your architecture review, your quarterly strategy offsite in the windowless boardroom: those are precisely the conditions that push CO2 into the range where decision quality measurably falls. You are running your highest-stakes thinking in the environment least suited to it.
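As a rough sanity check on how quickly a closed room climbs, here is a minimal Python sketch of the standard well-mixed, single-zone mass balance: the room’s CO2 rises with what occupants exhale and falls as outdoor air replaces room air. It is an illustration, not the author’s method. The per-person CO2 output (about 0.005 L/s, or 0.018 m³/h, a commonly used figure for a seated adult), the 54 m³ room, the six occupants, and the air-change rates are all assumptions, and real rooms are rarely perfectly mixed.

```python
import math

# Assumed CO2 output of one seated adult: ~0.005 L/s = 0.018 m^3/h (illustrative only).
GEN_M3_PER_H_PER_PERSON = 0.018


def co2_ppm(t_hours, volume_m3, people, ach, outdoor_ppm=400.0, start_ppm=None):
    """CO2 concentration (ppm) after t_hours in a well-mixed room.

    Solves V dC/dt = G + Q (C_out - C), where G is the occupants' CO2
    output in ppm*m^3/h and Q = ach * V is the outdoor-air flow in m^3/h.
    """
    c0 = outdoor_ppm if start_ppm is None else start_ppm
    g = people * GEN_M3_PER_H_PER_PERSON * 1e6  # ppm * m^3 / h
    q = ach * volume_m3                          # m^3 / h of outdoor air
    if q == 0:
        # Sealed room: the concentration climbs linearly with no ceiling.
        return c0 + g / volume_m3 * t_hours
    steady = outdoor_ppm + g / q                 # level the room tends toward
    return steady + (c0 - steady) * math.exp(-q / volume_m3 * t_hours)


def hours_to_reach(target_ppm, volume_m3, people, ach, outdoor_ppm=400.0):
    """Hours until the room first reaches target_ppm, starting from outdoor air."""
    g = people * GEN_M3_PER_H_PER_PERSON * 1e6
    q = ach * volume_m3
    if q == 0:
        return (target_ppm - outdoor_ppm) * volume_m3 / g
    steady = outdoor_ppm + g / q
    if target_ppm >= steady:
        return math.inf  # ventilation keeps the room below the target
    return -(volume_m3 / q) * math.log((steady - target_ppm) / (steady - outdoor_ppm))


if __name__ == "__main__":
    # Hypothetical 4 m x 5 m x 2.7 m meeting room (54 m^3) with six people.
    room, occupants = 54.0, 6
    for ach in (0.0, 0.5, 2.0):
        minutes = hours_to_reach(1000, room, occupants, ach) * 60
        print(f"ACH {ach}: 1,000 ppm after {minutes:.0f} min; "
              f"{co2_ppm(2, room, occupants, ach):.0f} ppm after 2 h")
```

Under these assumptions the sealed room reaches 1,000 ppm after 18 minutes, and in the sealed case halving the occupancy or doubling the room volume doubles that time. Because the inputs are guesses, the useful output is the shape of the curve rather than the exact minutes; a monitor in the actual room gives the real number.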

And it is invisible from inside. Nobody in the room feels impaired. They feel a little tired, a little foggy, a little checked out, and they put it down to the length of the meeting, a bad night’s sleep, or the person who won’t stop talking. The one variable almost nobody checks is the air.

This is not only a boardroom problem. With so much work now remote, your people spend their days in small home offices with the door shut. Same physics, same climb, same afternoon fog. The dip your team hits mid-afternoon may owe less to motivation than to a room that hasn’t exchanged its air since morning.

A few years ago, one client tried to use this as an argument for bringing everyone back to the office, touting how much better the building’s air was than anything people had at home. So I brought the monitor, and the results were eye-opening. Some parts of the building were genuinely as good as outdoor air; plenty were not. The meeting rooms were still a problem, and the more people in an area, the worse it got.

I’ve spent decades understanding why capable teams underperform, and I have learned to be suspicious of any explanation that starts by blaming the people. Before you conclude that the team is disengaged, that they can’t think strategically, or that the meeting culture is broken, it is worth ruling out the cheapest variable in the building. A CO2 monitor costs less than an hour of your time. Opening a window or a door costs nothing.

You already instrument your build pipeline, your cycle time, your defect rates. You measure the systems your people work inside because you know the environment shapes the output. The air in the room is part of that environment, and right now it is the one input you are not measuring.

I learned this the memorable way once, by sealing my own team into a room full of CO2 as a Halloween stunt. The everyday version is far less dramatic and far more common.

Open a window. Then watch what happens to the second half of the meeting.

The Anti-Amazon

phenomenalworld.org

We are in a new age of logistical prowess, led by the dynamism of Amazon as it strives to carry out dizzyingly complex forms of order fulfillment and delivery. With the age of agentic commerce just around the corner (think of going to ChatGPT and having an AI agent scour every website for the cheapest offering of the specific dog food you buy), there is an expectation that the future of retail is near-infinite assortment and ultra-fast delivery. Consumers want the exact flavor of the exact thing they’re looking for, and they want it at their doorstep now. It sometimes seems that we are testing the bounds of infrastructural capacity and automation in logistics to fulfill this dream.

There are a few things wrong with this dream, however, and the first, as I’ll review in a moment, is simply that it might not be so desirable. Even if you think it is preferable at an individual level, there are good reasons to question the social value of the logistical complexity it necessitates. Home delivery of single-packaged items entails an entirely different cost structure than freight trucks delivering entire pallets of goods to consumer-facing warehouses, from which customers drive the goods home themselves. Two companies have emerged with ideal-type business models that dramatize the different economies at each end of this spectrum: Amazon and Costco. Late to the e-commerce game, minimally invested in its distribution network, and committed as ever to an artificially limited assortment, Costco is the anti-Amazon. It embodies the precise opposite of everything imagined by the e-commerce futurists, and yet its revenue has somehow grown by an average of more than 10 percent every year for the last five years.

Constraint and sociality

In some cases, consumers might want access to the full product assortment: when, for instance, there’s a spot in the home that only fits a furnishing of certain dimensions, or when making a major electronics purchase. But in general, scrolling through options and reading through reviews online for every consumption choice is overwhelming and anxiety-producing: “infinite, meaningless options can result in something like a consumer fugue state,” The Atlantic once argued.

One brilliant feature of the Costco experience is, paradoxically, the constraint: as opposed to Amazon, with its near-infinite assortment, or even Walmart, which has approximately 130,000 SKUs (stock keeping units, or distinct items) in the average Supercenter, any given Costco will only hold 4,000 SKUs to choose from. While most retailers today assume that consumers want ever greater assortment, Costco’s popularity speaks to a countervailing desire for less choice. Indeed, the preselection of items for sale in its warehouses is part of the value proposition: not only are you going to get a lot of a particular thing for a good price, but you also won’t have to deliberate over micro-differences in a more robust assortment.

In other words, winnowing selection is a service, not a limitation, especially with Costco’s product catalog. Costco is not known for carrying the cheapest goods, but it is known for offering the lowest prices on the goods it carries, and that is because its buying team has closer relationships with suppliers than any other big retailer. Such scrutiny and communication steer it away from low-road suppliers. This is a structural effect of Costco’s conscious choice to offer a low SKU count: fewer products to investigate means more time to investigate each product, and a natural gravitation away from the bargain basement. That its member-customers have come to expect a certain quality from everything in its stores reinforces this dynamic.

The low SKU count also lets Costco achieve naturally something that Amazon achieves by squeezing suppliers: a low or even negative cash conversion cycle (CCC). The CCC is a corporate finance measure of how long it takes to turn inventory into cash through sales, net of the time the company takes to pay its suppliers; a negative CCC means the company collects from customers before it pays for the goods. Amazon often negotiates delayed payment terms with suppliers, leaning on them to allow payment windows longer than the thirty-day industry norm. Meanwhile, given the speed of its e-commerce business, Amazon often receives payment from consumers well before it has to pay suppliers, essentially giving the retailer interest-free cash. Costco enjoys the same benefit of a short or negative CCC without having to anger suppliers, simply because fewer SKUs mean faster-moving inventory for the SKUs it does carry. In other words, when a Costco store receives a shipment of a particular item from a supplier, it is often going to sell every unit in that shipment in less than a month, thanks to its scale and the simple fact that that particular item is the only variety in the store.
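The usual textbook form of this measure is CCC = days inventory outstanding (DIO) + days sales outstanding (DSO) minus days payables outstanding (DPO). The Python sketch below applies that formula. The function names and every input figure are hypothetical and are not taken from Costco’s or Amazon’s filings.

```python
def days_outstanding(balance, annual_flow, days_in_year=365):
    """Convert a balance-sheet amount into days' worth of the related annual flow."""
    return balance / annual_flow * days_in_year


def cash_conversion_cycle(dio, dso, dpo):
    """CCC in days: time inventory sits, plus time to collect from customers,
    minus time the company takes to pay its suppliers."""
    return dio + dso - dpo


# Hypothetical warehouse club: stock sells through in about 25 days,
# members pay at the register (DSO near 0), suppliers are paid on 30-day terms.
club = cash_conversion_cycle(dio=25, dso=1, dpo=30)

# Hypothetical retailer with slow-moving stock and ordinary 30-day supplier terms.
slow = cash_conversion_cycle(dio=60, dso=1, dpo=30)

# The same calculation from (hypothetical) financial-statement figures, in millions:
dio = days_outstanding(balance=500, annual_flow=7_300)  # inventory / cost of goods sold
dso = days_outstanding(balance=40, annual_flow=8_000)   # receivables / revenue
dpo = days_outstanding(balance=600, annual_flow=7_300)  # payables / cost of goods sold
statement_ccc = cash_conversion_cycle(dio, dso, dpo)

print(club, slow, statement_ccc)
```

A negative result is the situation the paragraph describes: customers pay before supplier invoices come due, so the retailer operates partly on its suppliers’ money. Fast inventory turnover (a low DIO) gets there without stretching DPO, which is the contrast the author draws between Costco and Amazon.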

Costco’s in-store experience is another draw for customers, and this too runs counter to the prevailing view in the “future of retail” conversation. While the e-commerce share of retail has been steadily growing, it’s still under 17 percent in the United States, and one wonders whether, in a society as anti-social as our own, customers find in-person shopping desirable even in its inconvenience, however degraded a “social” venture it has become.

Shopping at Costco is always somewhat harried: no shopper can avoid lines at the registers or traffic jams in the aisles, even on weekdays. It is the precise opposite of e-commerce convenience. And yet members not only don’t seem to mind the nuisance, they positively embrace it. Costco notably spends very little on advertising, but it doesn’t really need to, given the remarkable amount of free attention it gets by word of mouth and on social media from enthusiastic shoppers “talkin’ deals.” Costco has become a retail destination with a very loyal membership base (its annual membership renewal rate is typically above 90 percent) while offering a sparse, no-frills retail experience.

Benefits of simplicity

But consumer preference is only one metric by which to judge the desirability of “assortment and home delivery” vs. “constraint and in-person shopping.” It should be remembered that the term “logistics” comes from a military context (the French word for “the art of moving, quartering, and supplying troops”), and logistical success in business means most fundamentally the success with which goods are supplied to the customers who need them. Applied socially, our understanding of logistical success must be based not only on how goods are being supplied to any particular person, with their own smattering of individual preferences, but also on how they are being supplied as a whole.

At the social level, logistical success can be measured in terms of cost efficiency. This cost can be understood in accounting terms as overhead: the warehouses, the vehicle fleet, fuel costs, forklifts. An enterprise is more efficient when it can spread these costs over a larger volume of goods. A cost-efficient operation is also simple, in that it’s reliable and not prone to disruption. The more complicated an operation, the more likely it is to fail. A simple operation also puts fewer demands on transportation infrastructure, an urgent question in congested urban environments.

To put it crudely, having someone in a Sprinter van deliver a recently purchased toothbrush to your doorstep is simply not a universalizable action, from either a business or logistical standpoint. It is a modern feat that Amazon is capable of doing this, but that it can be done does not mean that it should, nor even that it can be done writ large. For most consumption, it is far more efficient for people to handle the “last-mile delivery” themselves by going to stores and buying a good amount of stuff when they do so. This keeps delivery vans off the road, and it minimizes car trips for necessary purchases. For the retailers, it suppresses unnecessary markup both by keeping overhead costs low and by simplifying overall logistical operations.

Costco’s income statements as reported in its Form 10-K include the standard operating expense category for “Selling, General, and Administrative” costs. This category is consistently and qualitatively lower than any of its competitors’: 10 percent of sales, compared to Amazon’s delivery costs of 40 percent of non-AWS sales. One of the reasons for this is the bare-bones nature of the Costco distribution network. At Costco’s “depots” (as opposed to its stores, which the company calls “warehouses”), all inbound and outbound inventory is cross-docked in pallet quantities: full pallets come in from suppliers on one side of the building, workers on electric pallet jacks move those pallets from one side to the other, and full pallets are loaded onto trucks bound for stores. There is no pallet breakdown at a Costco depot, no conveyor belts, no fancy automation.

Such low overhead not only allows Costco to deliver low prices to its customers; it also allows the company to pay relatively high wages to its workers. According to Indeed, Walmart pays retail sales associates an average of $16.23 an hour, and Amazon pays warehouse associates an average of $19.14 an hour. Costco pays front-end associates an average of $21.29 an hour. This has allowed Costco to achieve an astonishingly low rate of labor turnover: compared to 60 percent turnover in retail generally and 150 percent in Amazon warehouses, the annual workforce turnover rate at Costco is just 6 percent. This is often treated in the business press as a matter of company “culture,” but it has a clear economic underpinning. When you minimize overhead, you can simply pay workers more without squeezing your overall margin.

As I’ve said elsewhere, it’s ironic that Jeff Bezos originally got the idea for Prime, Amazon’s membership model, from former Costco CEO Jim Sinegal. Prime entitles members to free two-day delivery on over 300 million products (in addition to streaming services). With such a wide range of possible single-item orders, free delivery encourages less bundling of customer purchases. Whereas Costco membership helps to reduce overhead, Amazon membership increases it. Meeting two-day delivery demand requires dramatic investments in Amazon’s distribution network, which is reflected in the higher share of sales accounted for by its delivery costs. Lower overhead means more pay for workers, but it also means less organizational stress on those workers, as Costco employees are not subject to the quotas and surveillance that Amazon’s e-commerce business demands.

We don’t typically praise Costco for its logistics in the way that we do Amazon. But the former in fact offers a far more logistically elegant and socially beneficial model of goods provision than the latter. Amazon dominates when it comes to logistically complex operations, but there is no inherent reason to prefer complicated operations to simple ones. If anything, simplicity should be the rule. Why do you need to figure out how to integrate robotic arms into a fulfillment operation dominated by autonomous mobile units when you can cross-dock full pallets? Is it more logistically impressive to solve difficult problems or to eliminate the need to solve them in the first place?

That said, there is no question that, in a better society than the one we have, key parts of Amazon’s operation would be retained for offering functions that contribute to the social good. The capacity to deliver prescription medicines same-day to the elderly is a genuine social contribution. (Academics who like to talk about “counter-logistics,” a political orientation aimed at dismantling logistical power, tend to ignore the positive social benefits that modern logistics provides.)

But if we’re looking for a generalizable model for the social provision of goods, Costco offers a foundationally useful blueprint for everyday personal consumption while Amazon does not. Amazon is hoping that its foray into grocery and everyday essentials will encourage more order bundling, and given the importance it is according this segment, it will no doubt continue to make headway there. But to date it has still not been able to make the conversion away from being an online convenience store, which tells you something important about its model: Amazon is there to fill in the gaps of a dominant mode of goods procurement, not to replace it.

Lessons for public grocery

In May, New York City mayor Zohran Mamdani reiterated his dedication to creating a public option in grocery, announcing plans to roll out one public grocery store in each borough, with two locations in the Bronx and Manhattan already scouted. The public grocery store is a model long overdue for widespread testing, and as ex-Whole Foods Vice President Errol Schweizer has forcefully argued, there is already a shining example of it in the military commissary system. Commissary prices are typically 25–30 percent lower for veterans and military families.

The mayor’s critics have unsurprisingly settled on the critique that Mamdani’s plan will use taxpayer dollars to make life even harder for struggling grocers in NYC, thereby conceding that it will indeed result in cheaper groceries for New Yorkers. This is the terrain on which they want the discourse to play out because they expect, not without some justification, that spending on this project will involve an inefficient use of resources that is not worth the social benefit it provides. They will be combing through the receipts to find that one expense that illustrates government inefficiency or even graft.

One simple way to keep overhead low and stay cash positive is to follow the Costco model: low SKU count, high volume. The low SKU count is not only a way to have a desirable cash conversion cycle (an important way of beating back critics on the right); it also creates the opportunity to develop relationships with good suppliers. There will be a temptation to invest in giving these stores that “retail look,” with consultants jumping in to emphasize the importance of shelf placement and signage. To my mind, the aisles could look much more like Costco warehouses, but with full case stacks instead of pallets, provided that shoppers know the city has made the effort on the front end to work with high-road suppliers. A solid marketing campaign around the relationships that the city has developed with local suppliers will do more to drive traffic than the usual retail gimmicks.

Volume is the other key consideration, and if I were on Mamdani’s team, I would deemphasize the “one in each borough” line. Mamdani has said that these stores will “buy and sell at wholesale prices” in part by “centraliz[ing] warehousing and distribution,” but central warehousing and distribution for five stores doesn’t mean much. When Costco had five stores, it had no central distribution network because it didn’t need one. Volume is what will make centralizing warehousing and distribution worthwhile, and for that Mamdani’s team will need to be open to achieving the scale economies that the math points to. Errol Schweizer and food systems expert Raj Patel have argued as much elsewhere, suggesting at least twenty stores.
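The volume argument is, at bottom, a fixed-cost calculation: a central warehouse pays off only once the savings it produces across all stores exceed its own cost. The toy Python model below shows the shape of that calculation. The cost and savings figures are invented purely for illustration and are not estimates for New York’s plan, Costco, or any real operation; it also assumes, for simplicity, that each store saves the same fixed amount.

```python
import math


def net_annual_benefit(stores, dc_fixed_cost, savings_per_store):
    """Annual savings from central distribution minus the warehouse's fixed cost."""
    return stores * savings_per_store - dc_fixed_cost


def break_even_stores(dc_fixed_cost, savings_per_store):
    """Smallest number of stores at which central distribution pays for itself."""
    return math.ceil(dc_fixed_cost / savings_per_store)


# Hypothetical figures: a distribution center costing $6M a year to run,
# saving each store $400k a year through consolidated, full-pallet deliveries.
DC_COST = 6_000_000
SAVINGS = 400_000

for n in (5, 20):
    print(n, "stores:", net_annual_benefit(n, DC_COST, SAVINGS))
print("break-even at", break_even_stores(DC_COST, SAVINGS), "stores")
```

Real inputs would have to come from the city’s own cost data; the sketch only shows why the store count, and not the existence of a warehouse, decides whether centralization is worth it.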

There are many other lessons to learn from Costco, but one that sticks out to me as perfectly reproducible within the context of public grocery stores is choosing a single loss leader that the system becomes known for: in Costco’s case, the $1.50 hot dog and soda combo (a price that has not changed in over forty years). For NYC’s public grocery stores, how about a $2.12 halal wrap?

It’s worth noting that Costco traces its lineage back to Fedco, or the Federal Employees Distributing Company, a membership store started by post office employees in 1948. Fedco was essentially copied by Sol Price when he created Price Club, Costco’s primary competitor until the two merged in 1993. The basic idea behind Fedco was that federal employees could leverage their collective buying power to eliminate traditional retail store markup. It was, in essence, a remarkable testament to the power of the public purse, one that inspired the creation of a retail behemoth that naturally holds lessons for exercising that power once more.


Espionage Against the European Parliament: Member of Committee Investigating Spyware Hacked with Pegasus - The Citizen Lab

citizenlab.ca

Key Findings

Former Member of the European Parliament Stelios Kouloglou was repeatedly hacked with NSO Group’s Pegasus spyware while serving on the committee investigating Pegasus spyware abuses.

Kouloglou was infected during key periods of PEGA committee activity, and the spyware would likely have captured non-public information about committee activities, possibly breaching EU parliamentary confidentiality and privilege frameworks.

We are not attributing these infections to a particular government at this time, and found no indications that the Greek Government is responsible. Instead, we note an overlap between the first infection and a previously identified Pegasus campaign targeting Russian and Belarusian-speaking exiled journalists and activists in Europe, suggesting a Pegasus customer with authorization to spy in multiple European countries is responsible.

Background

Stelios Kouloglou is a prominent Greek investigative journalist who was elected as a Member of the European Parliament in 2015. He reported for Greek radio and TV from Paris (1983–84), Moscow (1989–93), and Yugoslavia (1992–95). Starting in 2008, he founded and reported for Television Without Borders (TVXS).

Kouloglou was elected to the European Parliament as an independent on the Syriza party’s electoral list (affiliated with the Left). He was re-elected in the 2019 European elections.

Kouloglou was a substitute member of the European Parliament’s Committee of Inquiry to investigate the use of Pegasus and equivalent surveillance spyware (PEGA Committee) from March 24, 2022 to July 18, 2023. The PEGA Committee was established on March 10, 2022, following the 2021 publication of the Pegasus Project and other reporting which revealed that European governments used spyware to surveil journalists, activists, politicians, and other citizens. Led by MEP Sophie in ’t Veld, the PEGA Committee was tasked “to investigate the scope of spyware usage in contravention of EU law, focusing on Pegasus and equivalent surveillance spyware.”

While sitting as an MEP, Kouloglou continued to write opinion pieces and report for TVXS. He left the Syriza party in October 2023 and sat as an independent until the elections of June 2024, after which he served as a member of the New Left. His parliamentary term ended in July 2024.

Kouloglou Infected with Pegasus Spyware

In May 2026, Kouloglou contacted the Citizen Lab and we conducted a forensic analysis of artifacts from his iPhone. We found with high confidence that his device was successfully infected with Pegasus spyware on or around October 21, 2022, and again on March 6 and 7, 2023.

On 2022-10-21 at 10:16, there was a lookup for a HomeKit email address, rauharepo888[@]gmail.com. Two minutes later, a Pegasus process used mobile data. We assess that the phone was hacked with the PWNYOURHOME zero-click exploit at this point. PWNYOURHOME appeared to first involve the attacker sending a specially crafted NSKeyedArchive that landed in HomeKit, followed by malicious content that landed in MessagesBlastDoorService. Apple mitigated the first issue with a change to HomeKit in iOS 16.3.1, though we assess that it fixed the MessagesBlastDoorService issue earlier, likely in iOS 16.1.
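The inference in this paragraph is temporal: a HomeKit lookup for an attacker-linked email is followed within minutes by network use from a Pegasus process. As a rough illustration of that kind of correlation (not the Citizen Lab’s actual tooling or data format), the Python sketch below flags process network activity that begins within a chosen window after a lookup. The event records, event labels, and ten-minute window are hypothetical; only the two timestamps come from the report, which does not state their time zone.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified timeline entries: (timestamp, event type, detail).
EVENTS = [
    ("2022-10-21 10:16", "homekit_lookup", "rauharepo888[@]gmail.com"),
    ("2022-10-21 10:18", "process_mobile_data", "suspicious process"),
    ("2022-10-21 14:05", "process_mobile_data", "ordinary app"),
]


def parse(ts):
    """Parse a minute-resolution timestamp (time zone unknown, treated as naive)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def activity_after_lookups(events, window=timedelta(minutes=10)):
    """Pair each lookup with process network activity starting within `window` after it."""
    lookups = [(parse(t), d) for t, kind, d in events if kind == "homekit_lookup"]
    activity = [(parse(t), d) for t, kind, d in events if kind == "process_mobile_data"]
    pairs = []
    for l_time, l_detail in lookups:
        for a_time, a_detail in activity:
            gap = a_time - l_time
            if timedelta(0) <= gap <= window:
                pairs.append((l_detail, a_detail, gap))
    return pairs


for lookup, process, gap in activity_after_lookups(EVENTS):
    print(f"{lookup} -> {process} after {gap}")
```

Temporal proximity on its own is only suggestive; in the report it is paired with the process itself being identified as Pegasus.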

We additionally saw Pegasus activity on Kouloglou’s device between 2023-03-06 09:49 and 2023-03-07 07:30 that we assess is likely linked to the same exploit. On the 2022 and 2023 dates, we assess that the device was running iOS 15.5 (19F77).
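The version numbers carry a simple piece of reasoning: a device on iOS 15.5 would not yet have had either fix the report names (the MessagesBlastDoorService fix, which the Citizen Lab places likely in iOS 16.1, and the HomeKit change in iOS 16.3.1). A minimal Python check of that ordering might look like the sketch below; the helper names are hypothetical, and it compares dotted version numbers only, ignoring build identifiers such as 19F77.

```python
def parse_version(version, width=3):
    """Turn '16.3.1' into (16, 3, 1), padding with zeros so '16.1' equals '16.1.0'."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def predates_fix(running, fixed_in):
    """True if the running version is older than the version containing the fix."""
    return parse_version(running) < parse_version(fixed_in)


FIXES = {
    "MessagesBlastDoorService issue (assessed, likely)": "16.1",
    "HomeKit mitigation": "16.3.1",
}

device = "15.5"
for fix, version in FIXES.items():
    print(f"{fix}: fixed in {version}; device on {device} predates it: "
          f"{predates_fix(device, version)}")
```

Padding to a fixed width matters because plain tuple comparison would treat (16, 1) as older than (16, 1, 0).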

These findings do not preclude the possibility of additional infections that we have been unable to capture due to limitations of available forensic data.

Apple Notifications

Further validating our finding of targeting, our forensic analysis shows that Kouloglou received Apple threat notifications about targeting with mercenary spyware on three occasions: March 2, 2023, August 29, 2023, and April 10, 2024. It is important to note that threat notifications from Apple and other companies are not real-time alerts. They are typically sent to users in batches, often months or more after targeting takes place.

Kouloglou told us that he does not recall receiving the Apple notifications we observed.

Targeting Context and PEGA Committee Activities

Kouloglou helped the Citizen Lab reconstruct his activities during the periods when he was targeted with Pegasus spyware (see the detailed timeline in the Appendix). Throughout the period under consideration, Kouloglou wrote numerous articles and gave frequent interviews about spyware abuses. We summarize the key contextual details in the following sections.

First Pegasus Infection Period: PEGA Hearing Prep, Country Visits

The date of the first known Pegasus infection of Kouloglou’s device, October 21, 2022, aligns with a particularly intense period of activity around the PEGA Committee’s deliberations and investigations.

First, a series of PEGA Committee hearings was about to begin shortly after the infection date, including “Big Tech and Spyware” (October 26), “Spyware and e-privacy” (October 26), and “Spyware and fundamental rights” (October 27).

Importantly, the PEGA Committee was also in the midst of preparations for the publication of its first draft report. Drafts of the report were being discussed and circulating among PEGA Committee members and their staff in the weeks leading up to this publication. Kouloglou confirms that the first infection date (October 21, 2022) coincided with a period of intense discussion and exchange that primarily took place over text messages and email. The first draft of the PEGA Committee Report was delivered by MEP in ’t Veld on November 8, 2022. The draft focused on allegations of spyware in Poland, Hungary, Greece, Cyprus, and Spain.

In addition to the hearings and report drafting, PEGA Committee members had been visiting several European countries as part of their mission. Throughout October, the PEGA Committee was planning its research visits to Greece and Cyprus, scheduled for November 1 to 4, 2022. Kouloglou helped with planning and participated in both visits as part of the PEGA Committee deliberations. Kouloglou’s device was hacked ten days before the start of this trip, at a time when communications about the visits were being exchanged.

Meeting with Thanasis Koukakis

On October 21, 2022, the exact date of the infection, Kouloglou was in the hospital for elective surgery. He was visited in his hospital room by Greek investigative journalist Thanasis Koukakis, who has worked closely on mercenary spyware issues in Greece and had testified to the PEGA Committee the previous month. In March 2022, the Citizen Lab had confirmed that Koukakis was himself targeted with Intellexa’s Predator spyware, and he was at this time pursuing legal remedies and formal complaints with relevant authorities in Greece about the spying. Koukakis memorialized his meeting with Kouloglou with a photograph (Figure 2).

Given that the infection took place while Kouloglou was a patient at a Greek hospital, it is possible that confidential medical information could have been intercepted from his device, including discussions going on in his room. If the spyware captured conversations between Kouloglou and medical staff, or details stored on the phone concerning appointments, medical results, diagnoses, and other health-related information, then the hacking of his device may implicate Greece’s laws concerning the confidentiality of health-related data, which are considered a special category of personal data and are subject to enhanced protections (Law 4624/2019 under the Greek Penal Code).

Second Pegasus Infection Period: Intense PEGA Deliberations

Kouloglou’s device was hacked with Pegasus spyware a second time, on March 6 and 7, 2023. According to Kouloglou, during this period the PEGA Committee was engaged in intense discussions related to the final drafting process. On March 6, 2023, Kouloglou traveled from Athens to Brussels, and he was in Brussels on March 6 and 7, during the timeframe of the infection.

It may also be significant that, at this time, PEGA Rapporteur MEP in ’t Veld was in Greece as part of a mission with the LIBE Committee (Committee on Civil Liberties, Justice and Home Affairs), which is a standing committee of the European Parliament primarily responsible for drafting legislation and providing democratic oversight around issues concerning human rights, data protection, asylum, immigration, and anti-discrimination. On that mission, the LIBE delegation questioned the Greek Director of the National Transparency Authority and other officials on the Greek spyware scandal (report para 183).

As with the first infection, this one was also followed by a string of PEGA hearings and a research trip to Spain (although Kouloglou did not participate in that trip himself). This infection took place approximately two months prior to the adoption of the first PEGA Committee report (May 8, 2023).

Separately, Kouloglou and Thanasis Koukakis had made tentative plans over WhatsApp to meet on or around March 6 and 7, 2023, but ultimately their in-person meeting did not take place.

The European Parliament Under Surveillance

This is the first time a member of the PEGA Committee has been publicly identified as a victim of Pegasus spyware while serving on the Committee.

There have been a few public cases of MEP targeting prior to the creation of the PEGA Committee. Four Catalan MEPs were either directly or indirectly targeted with Pegasus. MEP Diana Riba’s devices were infected in October 2019. Catalan MEP Jordi Solé was targeted in June 2020, just prior to taking his seat in the European Parliament. Two other Catalan MEPs were targeted through their staff or family members: Clara Ponsati (July 2020) and Carles Puigdemont (October 2019 and July 2020). Riba, Solé, and Puigdemont all joined the PEGA Committee: Riba as Vice-Chair, Puigdemont as a member, and Solé as a substitute. They testified about their experiences to the PEGA Committee, alongside Antoni Comín and Nikos Androulakis (whose device was targeted with Predator spyware).

Outside of the PEGA Committee, in February 2024, Politico re­ported that MEPs on the se­cu­rity and de­fence sub­com­mit­tee were asked to have their phones checked af­ter traces of spy­ware were found on two de­vices. French MEP Nathalie Loiseau, chair of the com­mit­tee, con­firmed she was tar­geted with Pegasus. The European Parliament’s IT Services in­formed Bulgarian MEP Elena Yoncheva that her de­vice had been tar­geted in late October 2023.  In May 2024, fol­low­ing the con­clu­sion of the PEGA Committee pro­ceed­ings, German MEP Daniel Freund an­nounced he had been tar­geted with Candiru’s mer­ce­nary spy­ware.

Attribution

While we assess with high confidence that Kouloglou was targeted and infected with NSO Group’s Pegasus mercenary spyware, we are not attributing these infections to a specific NSO Group customer.

Since 2022, when the Citizen Lab first dis­cov­ered the hack­ing of Thanasis Koukakis’ de­vice with Predator spy­ware, the Greek gov­ern­ment has been em­broiled in a grow­ing sur­veil­lance scan­dal in­volv­ing abu­sive tar­get­ing of civil so­ci­ety. However, we have no in­di­ca­tions that this hack­ing was the work of the Greek gov­ern­ment. There are no re­ports that Greece is or was a cus­tomer of NSO Group or a user of Pegasus spy­ware. While the Greek gov­ern­ment is known to have ex­ten­sively abused Intellexa’s Predator mer­ce­nary spy­ware, the Citizen Lab is un­aware of any tech­ni­cal in­di­ca­tors sug­gest­ing Greek se­cu­rity and in­tel­li­gence ser­vices had ac­cess to NSO Group’s Pegasus spy­ware.

However, we be­lieve that the same op­er­a­tor tar­geted both Kouloglou in 2022 and the tar­gets we high­lighted in our May 2024 joint re­port with Access Now. In that re­port, we found that seven Russian and Belarusian-speaking in­de­pen­dent jour­nal­ists and op­po­si­tion ac­tivists based in Europe were tar­geted and/​or in­fected with NSO Group’s Pegasus mer­ce­nary spy­ware. One of the redacted Apple IDs (Email 1) from that re­port is rauhare­po888[@]gmail.com, the same HomeKit email that tar­geted Kouloglou. In our un­der­stand­ing of Pegasus in­fec­tion in­fra­struc­ture dur­ing this pe­riod, we be­lieve that these emails are unique to spe­cific op­er­a­tors. We are un­able to say whether the sec­ond in­fec­tion in 2023 is sim­i­larly con­nected to this op­er­a­tor, or a dif­fer­ent op­er­a­tor.

We further note that infections appear to have been present on his phone in at least two European jurisdictions (Greece and Belgium). Based on what we know of NSO Group’s licensing, this would likely indicate that the customer had a license that enabled infections in multiple EU jurisdictions, narrowing the list of potential Pegasus operators that could be responsible for this case.

Conclusion

The in­fec­tion of a European MEP and PEGA Committee mem­ber’s de­vice is a sig­nif­i­cant and trou­bling find­ing. It is made more trou­bling by the fact that we are un­sure of the sta­tus of the phones of many of the other Committee mem­bers dur­ing the time of its pro­ceed­ings. Short of a com­pre­hen­sive screen­ing, there is no way to know whether any other PEGA Committee mem­bers or their staff may have been sim­i­larly in­fected.

Whichever en­tity is re­spon­si­ble for the hack­ing, the in­fec­tion could have ex­posed strictly con­fi­den­tial ex­changes among PEGA Committee mem­bers and their staff, and other sen­si­tive and con­fi­den­tial par­lia­men­tary pro­ceed­ings, in­clud­ing to par­ties un­der in­ves­ti­ga­tion by the Committee it­self.

The find­ing that a PEGA Committee mem­ber was tar­geted with Pegasus spy­ware dur­ing the Committee’s work high­lights the se­ri­ous threat that mer­ce­nary spy­ware poses to the in­tegrity of de­mo­c­ra­tic processes.

As outlined above, this case is not the first in which a Member of the European Parliament has had their devices either targeted or hacked with mercenary spyware, which illustrates the corrosive nature of unregulated mercenary hacking.

While we are un­able to con­clu­sively at­tribute these in­fec­tions to a par­tic­u­lar gov­ern­ment agency at this time, and we have no ev­i­dence that the Greek gov­ern­ment was re­spon­si­ble for this case, over­lap with an op­er­a­tor re­spon­si­ble for hack­ing the de­vices of ex­iled Russian and Belarusian-speaking jour­nal­ists and ac­tivists based in Europe war­rants fur­ther in­ves­ti­ga­tion.

Recommendations

In light of our dis­cov­ery, we rec­om­mend that European Union in­sti­tu­tions open im­me­di­ate in­ves­ti­ga­tions to de­ter­mine the scope and scale of this breach of EU pri­vacy and process.

MEPs and Staff: Get Screened

We urge MEPs and their staff that par­tic­i­pated in the PEGA Committee to im­me­di­ately seek foren­sic screen­ing for signs of spy­ware in­fec­tion, and pre­serve work and per­sonal de­vices that may have been tar­geted.

The Directorate-General for Information Technologies and Cybersecurity (DG ITEC) of­fers this spy­ware screen­ing.

Exercise vig­i­lance for state-spon­sored at­tack warn­ings, and seek prompt ex­pert as­sis­tance when such warn­ings are re­ceived.

The work of MEPs exposes them to more sophisticated threats. MEPs should enable Lockdown Mode (iPhone) and Advanced Protection (Android). These modes strongly increase a device’s protection against mercenary spyware. DG ITEC may be able to provide additional cybersecurity guidance.

European Parliament: Investigate, Increase Reporting & Screening

The European Parliament should con­duct an im­me­di­ate in­ves­ti­ga­tion into spy­ware at­tacks tar­get­ing MEPs and par­lia­men­tary processes, given the sur­veil­lance of the PEGA com­mit­tee de­tailed in this re­port.

Since some time has passed since this par­tic­u­lar at­tack, prompt in­ves­ti­ga­tion is a mat­ter of ur­gency to en­sure that foren­sic traces are not lost.

We urge the Parliament to com­mis­sion an an­nual re­port on cy­ber and sur­veil­lance threats to the Parliament and mem­bers to iden­tify ar­eas of vul­ner­a­bil­i­ties, and make rec­om­men­da­tions on how to in­crease par­lia­men­tary se­cu­rity.

Such a re­port could be pro­duced by the European Parliamentary Research Service (EPRS) or other en­ti­ties.

DG ITEC of­fers op­tional screen­ing for spy­ware for MEPs and staff. This case sug­gests that this ca­pa­bil­ity may be un­der­used. We urge DG ITEC to de­velop a plan to achieve sub­stan­tially higher screen­ing rates, and pub­lish yearly sta­tis­tics on the num­ber of de­vices screened and rates of dis­cov­ery.

We also urge DG ITEC to reg­u­larly cir­cu­late spe­cific guid­ance for MEPs and staff about vig­i­lance to state spon­sored at­tack warn­ings from com­pa­nies like Apple and Google.

We recommend that DG ITEC consider collecting, on an optional basis, account information associated with MEPs and their staff, and providing it to large platforms. DG ITEC could request that additional scrutiny be devoted to threats against these accounts, and that platforms inform DG ITEC when they send these accounts state-sponsored threat warnings. This information could ensure that state-sponsored attack warnings are quickly acted on and that patterns are observed.

European Commission: Screen Commissioners & Staff for Spyware

We urge the European Commission to un­der­take their own in­ves­ti­ga­tion and screen­ing to de­ter­mine whether com­mis­sion­ers or com­mis­sion staff have been tar­geted with mer­ce­nary spy­ware.

We urge the Commission’s Directorate-General for Digital Services (DG DIGIT) to de­velop a com­pre­hen­sive spy­ware screen­ing and re­sponse ca­pa­bil­ity, along­side reg­u­lar screen­ings, and we note that DG ITEC may be a use­ful in­ter­locu­tor in this en­deavor.

Parliamentary Assembly of the Council of Europe (PACE): Screen Members & Staff for Spyware

Given past PACE com­mit­tee work on mer­ce­nary spy­ware abuses in Europe, we urge an in­ves­ti­ga­tion into whether Members or staff have been tar­geted with mer­ce­nary spy­ware.

We urge the Council’s Directorate of Information Technology (DIT) to con­sult with peer or­ga­ni­za­tions and con­duct reg­u­lar screen­ing of PACE Members and their staff for signs of mer­ce­nary spy­ware tar­get­ing.

National Parliaments: Screen and Secure Your Members

While this case concerns an MEP, there is a long history of lawmakers being targeted with Pegasus and similar mercenary spyware. We commend DG ITEC for having developed techniques for screening members, and we encourage the security services and oversight bodies of national parliaments to copy this model.

Tech Companies: Make Your Threat Warnings Count

This case il­lus­trates that a re­cip­i­ent of mul­ti­ple threat warn­ings failed to no­tice them. Since the goal of these no­ti­fi­ca­tions should be for a tar­get to take ac­tions, en­sur­ing that the tar­get sees and un­der­stands them is crit­i­cal. UX re­search, in­clud­ing with past no­ti­fi­ca­tion re­cip­i­ents, will be key to cre­at­ing no­ti­fi­ca­tions that cap­ture a re­cip­i­en­t’s at­ten­tion, ex­plain the is­sue, and make the next steps easy.

Note on Research Ethics

All re­search in­volv­ing hu­man sub­jects con­ducted at the Citizen Lab is gov­erned un­der re­search ethics pro­to­cols re­viewed and ap­proved by the University of Toronto’s Research Ethics Board.

Acknowledgements

Thanks to Rebekah Brown and Adam Senft for care­ful re­view, and to Anna Mackay and Claire Posno for com­mu­ni­ca­tions and lay­out sup­port. We are very grate­ful to Stelios Kouloglou and Thanasis Koukakis for con­sent­ing to be named in and par­tic­i­pat­ing in this re­search. Special thanks to Zacharias Kesses and TNG.

Appendix: Timeline of PEGA Deliberations and Infection Dates

On March 20 – 23, PEGA does a re­search mis­sion to Spain. Kouloglou does not travel with this del­e­ga­tion.

Performance per dollar is getting faster and cheaper | Wafer

www.wafer.ai

Have you no­ticed we like AMD?

The de­mand for in­fer­ence is sky­rock­et­ing and out­pac­ing sup­ply. With fron­tier mod­els be­ing re­leased al­most every other week — Claude Fable, GLM5.2, and Minimax M3, to name a few — the to­ken craze is only get­ting cra­zier, and there aren’t enough Blackwells go­ing around to sup­port it. Thus, NVIDIA GPU prices are climb­ing fast, and to­kens are get­ting re­ally ex­pen­sive.

In comes AMD. At around 2.75x cheaper per GPU on average (MI355X vs B300), with comparable hardware specs, the solution to cheap inference is hiding in plain sight. We at Wafer have been preaching that message for months. But although AMD’s Instinct MI350 series competes with Blackwell at the silicon level, NVIDIA’s software advantage and day-0 support typically allow providers to serve inference much faster on NVIDIA hardware, with much less friction.

Conversely, on the MI355X / ROCm stack SOTA per­for­mance rarely comes out of the box for these fron­tier mod­els (sometimes it does!). In fact, you’re lucky if you can find an im­age that runs them at all. Without this day-0 sup­port, build­ing and op­ti­miz­ing for the newest mod­els can re­quire weeks of en­gi­neer­ing and com­pute. By then, the newest model has al­ready been re­leased, mak­ing it so AMD is al­ways play­ing catch-up.

But as agents im­prove at ker­nel and model op­ti­miza­tion, this gap is clos­ing in real time. At Wafer, we’ve proven this time and time again.

And again — on a 20k in / 1k out, 60% cache hit rate work­load, we hit an ag­gre­gate through­put of 2626 tok/​s/​node @ 2.4 rps with a de­fined knee of ≤5s TTFT — only 80% of the per­for­mance mea­sured on a B200, de­spite be­ing over 2x cheaper.

We also hit 213 tok/​s on GLM5.2 on 10k in­put to­kens / 1.5k out­put to­kens sin­gle stream, fol­low­ing Artificial Analysis stan­dards, served on AMD MI355X ca­pac­ity from TensorWave. Though this num­ber does­n’t top the AA leader­board, it still wins on per­for­mance per dol­lar.

How we did it

The first step with any model work is to choose a quan­ti­za­tion and frame­work. We quan­tized the base bf16 GLM-5.2 to MXFP4 with AMD Quark. In com­par­i­son to z-ai’s of­fi­cial FP8 quan­ti­za­tion, our MXFP4 was loss­less (GPQA-Diamond, tau2, GSM8K).

As for the inference framework, we had three options: vLLM, ATOM, and sglang. We chose sglang. vLLM had no working MXFP4 + GlmMoeDsa path, so the MXFP4 weights provided no benefit, and ATOM’s output degraded at long context. Sglang was the engine with the least friction to native support: it could take advantage of the quantization while remaining coherent.

The next natural step to improving throughput was enabling speculative decoding on sglang using the model’s MTP (multi-token prediction) head. However, the sglang ROCm image does not support this out of the box. Two fixes were needed before MTP worked properly.

First, the MTP head, like every other layer, keeps its sin­gle shared ex­pert stored in bf16, not MXFP4. However, the MTP head is reg­is­tered un­der a dif­fer­ent mod­ule pre­fix than the main de­coder stack (Quark names its bf16 shared ex­pert model.lay­ers.78.mlp.shared_­ex­perts.*, while the MTP lay­er’s real pre­fix is model.de­coder.*). Because of the mis­match, sglang’s quan­ti­za­tion lookup fails and de­faults to build­ing that shared ex­pert as MXFP4. At load it then tries to read a full-width bf16 weight into a half-width 4-bit slot and the init crashes on a shape mis­match. Quark records which weights to leave un-quan­tized as a list of layer names, so we copied over the layer 78 en­tries to that list a sec­ond time un­der the de­coder name sglang ac­tu­ally uses. This fix un­blocked spec­u­la­tive de­code, net­ting us close to a 3x gain in sin­gle stream through­put.
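A minimal sketch of that exclude-list fix follows, written in TypeScript for consistency with the other examples here. It makes two hypothetical assumptions. First, the exclude list is a flat array of module-name strings. Second, the MTP layer's names differ from Quark's only in the model.layers.78. versus model.decoder. prefix. The submodule name in the demo (gate_proj) is illustrative, and the real Quark config schema may differ.

```typescript
// Hypothetical sketch: re-register the MTP layer's bf16 exclusions under the
// prefix the inference engine looks up. Prefixes come from the write-up;
// the config shape (a flat string array) is an assumption.
const QUARK_MTP_PREFIX = "model.layers.78.";
const ENGINE_MTP_PREFIX = "model.decoder.";

function aliasMtpExcludes(exclude: string[]): string[] {
  const out = [...exclude];
  const seen = new Set(exclude);
  for (const name of exclude) {
    if (!name.startsWith(QUARK_MTP_PREFIX)) continue;
    // Same module, second name: the one the engine's quantization lookup uses.
    const alias = ENGINE_MTP_PREFIX + name.slice(QUARK_MTP_PREFIX.length);
    if (!seen.has(alias)) {
      out.push(alias);
      seen.add(alias);
    }
  }
  return out;
}

// Illustrative input: one unrelated entry plus one MTP shared-expert entry.
const fixed = aliasMtpExcludes([
  "lm_head",
  "model.layers.78.mlp.shared_experts.gate_proj",
]);
console.log(fixed); // adds a model.decoder.mlp.shared_experts.gate_proj alias
```

The function appends aliases instead of renaming entries. That way, any code path still using the original name keeps working, and running the fix twice is harmless.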

Second, deep spec­u­la­tive de­code (such as the 5/1/6 con­fig z-ai sug­gests) was still blocked. The fused multi-step meta­data ker­nel needed for draft depth ≥4 writes #include <cuda_runtime.h> with no ROCm guard. Fix: one #ifdef USE_ROCM guard.

These were two trivial but necessary changes for taking full advantage of speculative decoding. With spec decode working properly, plus a few config optimizations (such as --kv-cache-dtype fp8_e4m3 and --enable-aiter-allreduce-fusion), we reached our headline single-stream decode number of 213 tok/s.

But for ag­gre­gate through­put, es­pe­cially with our de­fined work­load, de­code op­ti­miza­tions are nec­es­sary but in­suf­fi­cient. At 20k in @ 60% cache, the work­load is pri­mar­ily pre­fill bound.

At TP8, the configuration optimized for single-stream decode, the MI355X runs GLM5.2-MXFP4 at 1461 tok/s/node. Switching to TP4×DP2 netted a massive improvement on this workload: 1944 tok/s/node at 2.0 RPS. That was still relatively slow compared to our measured Blackwell performance of 3192 tok/s/node at 3.0 RPS. A big reason for the poor prefill performance on the MI355X is that, on the sglang image, GLM-5.2’s fp4 MoE was silently running on a slow FlyDSL heuristic fallback (aiter only shipped tuned configs for the a8w8/fp8 path). We tuned the MoE kernel selection ourselves on GLM’s fp4 shapes (model_dim 6144, moe_inter 2048, E=256, topk=8), which allowed us to reach 2626 tok/s/node at 2.4 RPS. Much better.
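The performance-per-dollar claim reduces to one multiplication: relative throughput times relative price. The sketch below uses the two aggregate figures from this paragraph. The price ratio of 2.0 is an assumption read from "over 2x cheaper" earlier in the post, so the result is a lower bound under that assumption.

```typescript
// Relative performance per dollar of platform A versus platform B.
// priceRatio = price(B) / price(A); 2.0 means A costs half as much as B.
function perfPerDollarRatio(tputA: number, tputB: number, priceRatio: number): number {
  return (tputA / tputB) * priceRatio;
}

// Aggregate throughput (tok/s/node) reported for this workload.
const mi355x = 2626;
const blackwell = 3192;

// Assumption: "over 2x cheaper" taken as exactly 2.0, i.e. a lower bound.
const ratio = perfPerDollarRatio(mi355x, blackwell, 2.0);
console.log((mi355x / blackwell).toFixed(2)); // 0.82
console.log(ratio.toFixed(2)); // 1.65
```

Read this way, roughly 82% of the throughput at half the price works out to about 1.6x the throughput per dollar.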

Why this mat­ters

Although there was some friction, achieving the best performance-per-dollar ratio on the MI355X wasn’t particularly hard. There were some framework-related bugs, but unlike our work with Qwen3.5 397B, we didn’t write any custom kernels this time. This study doesn’t consider multi-node performance, but single-node deployments remain highly prevalent in practice.

SOTA on AMD is be­com­ing more a mat­ter of sup­port, not soft­ware. The CUDA moat is erod­ing in real time.

Leanstral 1.5: Proof Abundance for All

mistral.ai

Summary

Leanstral 1.5, a free Apache-2.0 li­censed model with 6B ac­tive pa­ra­me­ters, de­liv­ers a ma­jor per­for­mance up­grade in for­mal ver­i­fi­ca­tion, sat­u­rat­ing miniF2F, solv­ing 587/672 PutnamBench prob­lems, and achiev­ing state-of-the-art re­sults on FATE-H (87%) and FATE-X (34%). Trained through mid-train­ing, su­per­vised fine-tun­ing, and re­in­force­ment learn­ing with CISPO, it ex­cels in agen­tic proof en­gi­neer­ing and real-world code ver­i­fi­ca­tion, un­cov­er­ing 5 pre­vi­ously un­known bugs across 57 repos­i­to­ries tested. Fully open-sourced and avail­able via Hugging Face and a free API, Leanstral 1.5 is now ac­ces­si­ble for prac­ti­cal proof en­gi­neer­ing in Lean 4.

Since its launch, Leanstral has of­fered an open, prac­ti­cal ap­proach to proof en­gi­neer­ing in Lean 4. Today, we are re­leas­ing Leanstral 1.5, a free Apache-2.0 li­censed model with 119B to­tal and only 6B ac­tive pa­ra­me­ters, de­liv­er­ing a per­for­mance up­grade that makes for­mal ver­i­fi­ca­tion more pow­er­ful and ac­ces­si­ble than ever.

Leanstral 1.5 saturates miniF2F, solves 587/672 PutnamBench problems, and achieves a new state of the art of 87% on FATE-H and 34% on FATE-X. Beyond benchmarks, it verifies complex code properties and uncovers previously unknown bugs in open-source repositories. This shows that rigorous formal methods can be both effective and practical for real-world use.

Training Leanstral

Leanstral 1.5 goes through a three-stage process: mid-train­ing, su­per­vised fine-tun­ing, and re­in­force­ment learn­ing with CISPO. Leanstral 1.5 lever­ages ex­ten­sive train­ing on two RL en­vi­ron­ments:

In the mul­ti­turn en­vi­ron­ment, the model is given a the­o­rem state­ment and must ei­ther prove or dis­prove it. The model sub­mits a proof, re­ceives Lean com­piler feed­back, and re­fines its ap­proach with each at­tempt. If the proof com­piles it suc­ceeds; oth­er­wise the loop con­tin­ues un­til the model ei­ther solves the prob­lem or ex­hausts its bud­get.
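The multiturn loop can be sketched as follows. The Prover and Checker types are hypothetical stand-ins for the model and the Lean compiler, not a real Mistral or Lean API. The RL reward is omitted. Disproving a statement would be the same loop run on its negation.

```typescript
// Hypothetical interfaces; not a real Mistral or Lean API.
interface CompileResult { ok: boolean; feedback: string; }
type Prover = (statement: string, history: string[]) => string;
type Checker = (proof: string) => CompileResult;

interface Outcome { solved: boolean; attempts: number; proof?: string; }

// Submit a proof, read compiler feedback, and retry until success or budget exhaustion.
function proveWithFeedback(
  statement: string,
  prover: Prover,
  check: Checker,
  budget: number,
): Outcome {
  const history: string[] = [];
  for (let attempt = 1; attempt <= budget; attempt++) {
    const proof = prover(statement, history);
    const result = check(proof);
    if (result.ok) return { solved: true, attempts: attempt, proof };
    // The compiler's error message becomes context for the next attempt.
    history.push(result.feedback);
  }
  return { solved: false, attempts: budget };
}

// Toy stubs: the "model" only succeeds once it has seen two rounds of feedback.
const toyProver: Prover = (_s, h) => (h.length >= 2 ? "good proof" : "bad proof");
const toyCheck: Checker = (p) => ({ ok: p === "good proof", feedback: "error: unsolved goals" });
console.log(proveWithFeedback("theorem", toyProver, toyCheck, 5)); // solved on attempt 3
```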

In the code agent en­vi­ron­ment, Leanstral op­er­ates like a de­vel­oper in a raw filesys­tem: it ed­its files, runs bash com­mands, and uses the Lean lan­guage server to in­spect goals, er­rors, and type in­for­ma­tion in real time. This al­lows it to tackle long-hori­zon tasks like com­plet­ing par­tial proofs in a repos­i­tory, build­ing aux­il­iary lem­mas, and per­sist­ing through mul­ti­ple rounds of con­text com­paction. The model learns to nav­i­gate the full proof-en­gi­neer­ing work­flow and is fi­nally ver­i­fied by our fork of SafeVerify for cor­rect­ness given a list of tar­get the­o­rems.

Evaluation

We eval­u­ate Leanstral on the fol­low­ing bench­marks:

miniF2F is a cross-sys­tem bench­mark for for­mal math­e­mat­ics, rang­ing from el­e­men­tary prob­lems to IMO-level chal­lenges, test­ing di­verse proof abil­i­ties across al­ge­bra, com­bi­na­torics, and num­ber the­ory.

PutnamBench con­sists of 672 prob­lems from the Putnam Mathematical Competition, re­quir­ing deep rea­son­ing and long proof chains to solve chal­leng­ing math­e­mat­i­cal prob­lems.

FATE-H and FATE-X are ab­stract al­ge­bra bench­marks for grad­u­ate and PhD-level prob­lems, re­spec­tively, test­ing ad­vanced rea­son­ing in ar­eas like group the­ory, ring the­ory, and mod­ule the­ory.

FLTEval is based on real pull re­quests from the Fermat’s Last Theorem repos­i­tory, test­ing prac­ti­cal proof en­gi­neer­ing with real-world com­plex­ity.

We saturate miniF2F completely, reaching 100% on both the validation and test sets. On PutnamBench and FATE-H/X, we compare Leanstral 1.5 against Goedel-Architect without natural-language guidance, Seed-Prover 1.5 at its high setting, and AxProverBase. Leanstral reaches a new state of the art on FATE-H/X, solving 87 and 34 problems respectively. On PutnamBench, it edges out Seed-Prover 1.5 high by 7 problems at far lower cost: about $4 per problem, against an estimated $300 or more for Seed-Prover, whose high setting runs with a budget of 10 H20-days per problem. The only provers ranked higher operate under different conditions: some receive natural-language proof guidance, and others cost far more to run, like Aleph Prover at $54–68 per problem.

Leanstral 1.5 shows the strongest test-time scal­ing we have seen from a for­mal-rea­son­ing model. The fig­ure be­low tracks Pass@8 on PutnamBench as we raise the to­ken bud­get per at­tempt from 25k to 4M: per­for­mance climbs smoothly and mo­not­o­n­i­cally the whole way, from 44 prob­lems solved at 50k to 244 at 200k, 493 at 1M, and 587 at 4M. Rather than giv­ing up when a proof runs long, Leanstral keeps rea­son­ing, edit­ing files, and re­vis­ing across mil­lions of to­kens, turn­ing that bud­get di­rectly into solved prob­lems—the same be­hav­ior be­hind the AVL-tree proof be­low, which ran for over 2.7 mil­lion to­kens across 22 com­pactions.

With this release, we also fully open source FLTEval. Leanstral 1.5 lifts pass@1 on the benchmark from 21.9 to 28.9 and pass@8 from 31.9 to 43.2, surpassing Opus 4.6’s 39.6 at one-seventh the cost. It also widens its lead over open-source models 3–10× larger, as shown in the figure below.
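The post does not say how pass@1 and pass@8 were estimated. One common choice is the unbiased estimator popularized by OpenAI's Codex evaluation. It is sketched here as an assumption, not as Mistral's method. With n sampled attempts on a problem, c of them correct, pass@k = 1 - C(n - c, k) / C(n, k), averaged over problems.

```typescript
// Unbiased pass@k for one problem: n sampled attempts, c of them correct.
// pass@k = 1 - C(n - c, k) / C(n, k), computed as a running product to avoid huge binomials.
function passAtK(n: number, c: number, k: number): number {
  if (k > n) throw new RangeError("k must not exceed n");
  if (n - c < k) return 1; // every size-k subset contains at least one correct attempt
  let allWrong = 1;
  for (let i = n - c + 1; i <= n; i++) {
    allWrong *= 1 - k / i;
  }
  return 1 - allWrong;
}

// Benchmark-level score: the mean of the per-problem estimates.
function benchmarkPassAtK(results: Array<{ n: number; c: number }>, k: number): number {
  const total = results.reduce((sum, r) => sum + passAtK(r.n, r.c, k), 0);
  return total / results.length;
}

// Two problems with 8 attempts each: one solved 8/8, one solved 2/8.
console.log(benchmarkPassAtK([{ n: 8, c: 8 }, { n: 8, c: 2 }], 1)); // ≈ 0.625
```

With k equal to n, the estimator reduces to "solved at least once", which is why pass@8 from 8 samples is simply the fraction of problems with any correct attempt.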

Code Verification Case Studies

Although primarily trained for mathematics, Leanstral 1.5 exhibits strong abilities in code verification. We present two critical case studies to demonstrate its impact.

AVL Trees: Proving Time Complexity

AVL trees are self-bal­anc­ing bi­nary search trees that main­tain O(log n) height through re­bal­anc­ing dur­ing in­ser­tions and dele­tions. Leanstral 1.5 proved these time com­plex­ity guar­an­tees for a real im­ple­men­ta­tion—a task that re­quired struc­tural in­duc­tion to mir­ror the tree’s re­cur­sive struc­ture, care­ful han­dling of monadic time track­ing, and ex­haus­tive case analy­sis for re­bal­anc­ing paths. Over 2.7 mil­lion to­kens and 22 com­pactions, Leanstral sys­tem­at­i­cally un­folded each layer of the TimeM monad, ex­pos­ing the un­der­ly­ing com­pu­ta­tions de­spite their in­ter­leav­ing with con­trol flow. It es­tab­lished an al­most tight bound of 48 steps per height unit plus a con­stant for in­ser­tion, then con­nected height to tree size via a log­a­rith­mic re­la­tion­ship, de­liv­er­ing com­plete, ver­i­fied proofs that in­ser­tion and dele­tion are in­deed O(log n).
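For intuition, the final step (connecting height to size) is the classic Fibonacci argument for AVL trees. The derivation below rests on stated assumptions: an empty tree has height 0 and a single node has height 1. N(h) is the minimum node count of an AVL tree of height h. F_k are the Fibonacci numbers with F_1 = F_2 = 1, φ is the golden ratio, and c is the unspecified constant in the 48-steps-per-height bound. This is the textbook argument, not necessarily the proof Leanstral produced.

```latex
\begin{aligned}
N(h) &= N(h-1) + N(h-2) + 1 \;\; (h \ge 2),\quad N(0)=0,\; N(1)=1
  \;\Longrightarrow\; N(h) = F_{h+2} - 1 \\
n &\ge N(h) = F_{h+2} - 1 \ge \varphi^{h} - 1
  \;\Longrightarrow\; h \le \log_{\varphi}(n+1) \approx 1.4404\,\log_2(n+1) \\
T_{\text{insert}}(n) &\le 48\,h + c \le 48\,\log_{\varphi}(n+1) + c \approx 69.14\,\log_2(n+1) + c = O(\log n)
\end{aligned}
```

The second line uses F_k ≥ φ^(k-2) for k ≥ 1, which holds by induction from the Fibonacci recurrence.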

Bug Discovery: Finding Hidden Flaws

To test Leanstral’s bug-catching abilities, we built an automated pipeline: Aeneas translates Rust code to Lean, while Leanstral infers the user intent and generates correctness properties from the code. Leanstral then tries to prove each property, with up to four attempts. If all of them fail, it tries to prove the negation instead, again with up to four attempts. Across 57 tested repositories, this process flagged 47 violated properties, 11 of which pointed to genuine bugs; 5 of those had not previously been reported on GitHub.
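The classification step of that pipeline can be sketched like this. The tryProve callback is a hypothetical stand-in for one model attempt checked by Lean. The "¬(...)" string construction is illustrative only and is not how Aeneas or Lean actually negate propositions.

```typescript
// Outcome of checking one generated property.
type Verdict = "proved" | "violated" | "unknown";

// Hypothetical: tryProve(goal) runs one model attempt and reports whether Lean accepted it.
function classifyProperty(
  property: string,
  tryProve: (goal: string) => boolean,
  attempts = 4,
): Verdict {
  for (let i = 0; i < attempts; i++) {
    if (tryProve(property)) return "proved";
  }
  // Every proof attempt failed, so try to prove the negation instead.
  const negated = `¬(${property})`;
  for (let i = 0; i < attempts; i++) {
    if (tryProve(negated)) return "violated"; // candidate bug; still needs human triage
  }
  return "unknown";
}

console.log(classifyProperty("decode (encode x) = x", (g) => g.startsWith("¬"))); // violated
```

A "violated" verdict only says the property as generated is false. Because the property encodes inferred intent, a violation can also mean the intent was guessed wrong. That is consistent with only 11 of the 47 flags being genuine bugs.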

One such bug was in the sign func­tion for zigzag de­cod­ing of the da­trs/​var­in­te­ger li­brary. On in­put Std.U64.MAX, the ex­pres­sion (value + 1) over­flowed, caus­ing crashes in de­bug mode and silent cor­rup­tion in re­lease mode—an edge case that test­ing and fuzzing would typ­i­cally miss. Leanstral’s pipeline caught it au­to­mat­i­cally, demon­strat­ing that for­mal ver­i­fi­ca­tion can al­ready be ap­plied to real-world code­bases and find bugs that some tra­di­tional meth­ods over­look.
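The overflow can be reproduced outside Rust by modeling u64 arithmetic with BigInt. The buggy decoder below is a hypothetical reconstruction of the pattern described ((value + 1) on odd inputs), not the crate's actual source. The fixed version is the standard shift-and-XOR zigzag decode, which never adds to the input.

```typescript
// u64/i64 arithmetic modeled with BigInt. The "buggy" decoder is a hypothetical
// reconstruction of the described pattern, not the crate's actual code.
const U64_MAX = (1n << 64n) - 1n;

// Rust release builds wrap on u64 overflow; debug builds panic instead.
function wrappingAddU64(a: bigint, b: bigint): bigint {
  return BigInt.asUintN(64, a + b);
}

// Pattern with the bug: odd values decode as -((value + 1) / 2).
function zigzagDecodeBuggy(value: bigint): bigint {
  if ((value & 1n) === 1n) {
    return -(wrappingAddU64(value, 1n) / 2n);
  }
  return value / 2n;
}

// Overflow-free decode: (value >> 1) XOR -(value & 1), read back as a signed 64-bit integer.
function zigzagDecode(value: bigint): bigint {
  return BigInt.asIntN(64, (value >> 1n) ^ -(value & 1n));
}

console.log(zigzagDecodeBuggy(U64_MAX)); // 0n (silently wrong)
console.log(zigzagDecode(U64_MAX)); // -9223372036854775808n, i.e. i64::MIN
```

The two decoders agree on every input except U64_MAX, which is why an ordinary test suite is unlikely to notice the difference.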

Get Started

Leanstral 1.5 is released under the Apache-2.0 license. The weights are on Hugging Face, and the model is also available now as a free API endpoint under the name leanstral-1-5. We recommend using it in Mistral Vibe. To begin, grab an API key, then:

1. Set up Mistral Vibe

uv tool install mistral-vibe
uv tool update mistral-vibe
vibe --setup

2. Install Leanstral 1.5

/leanstall
exit

3. Launch the agent

vibe --agent lean

4. Install Lean LSP MCP (Optional)

It is highly rec­om­mended to in­stall Lean LSP MCP by adding the fol­low­ing to your ~/.vibe/config.toml

[[mcp_servers]]
name = "lean-lsp"
transport = "stdio"
command = "uvx"
args = ["lean-lsp-mcp"]
tool_timeout_sec = 600

If there are no ex­ist­ing MCP servers, you may have to re­move mcp_servers = [].

5. Start prov­ing

Ask Leanstral to tackle a the­o­rem, de­bug a proof, or con­tribute to a repos­i­tory. It’s that sim­ple.

GitHub - teamchong/pxpipe: cut Fable 5 token usage by rendering text context as images

github.com

Cut Claude Code’s in­put to­kens by ren­der­ing bulky con­text as im­ages — the same sys­tem prompt, tool docs, and his­tory, in a frac­tion of the to­kens.

An image’s token cost is fixed by its pixel dimensions, not by how much text is inside it. On real Claude Code traffic, dense content (code, JSON, tool output) packs ~3.1 chars per image-token vs ~1 char per text-token. pxpipe is a local proxy that exploits the gap: it rewrites the bulky parts of each request into compact PNGs before the request leaves your machine. At current Fable list prices, that lands as a ~59–70% lower end-to-end bill. Prices move and workloads differ, though, so the durable number is the token cut itself, measured per request against a free count_tokens counterfactual in ~/.pxpipe/events.jsonl.

This is what the model sees in­stead of text:

~48k chars of sys­tem prompt + tool docs: ≈25k to­kens as text, ≈2.7k im­age to­kens as this page. Real pipeline out­put; the model reads ren­ders like this at 100/100 (see bench­marks).

Demo

Fable 5 (the de­fault, 100/100 reader) — plain left, px­pipe right:

px­pipe counts an ex­act to­ken 10/10 across 39 im­aged filler files (matches grep line-for-line), gets the multi-step ledger arith­metic right, and ends the ses­sion at $6.06 with con­text to spare (73.5k/1M) vs $42.21 at 96% full. One caveat vis­i­ble in the clip: the px­pipe arm needed a nudge to match the re­quested one-line out­put for­mat.

Opus 4.8 (disabled by de­fault) — same lay­out:

Text nee­dles read fine on both arms; the im­aged phrase-count does­n’t read on Opus — and px­pipe says so in­stead of fab­ri­cat­ing a num­ber. That mis­read rate is why Opus is opt-in.

Try it (30 sec­onds)

npx pxpipe-proxy  # proxy on 127.0.0.1:47821
ANTHROPIC_BASE_URL=http://127.0.0.1:47821 claude  # point Claude Code at it

Dashboard at http://127.0.0.1:47821/: tokens saved, every text→image conversion side by side, kill switch, live model chips. Responses stream normally: pxpipe compresses the request only, never the model’s output. Recent turns stay text; the system prompt, tool docs, and older bulk history are imaged.

The hon­est part

It is lossy. Exact 12-char hex strings in dense im­aged con­tent: 13/15 on Fable 5, 0/15 on Opus — and misses are silent con­fab­u­la­tions, not er­rors. Byte-exact val­ues (IDs, hashes, se­crets) must stay text; re­cent turns do. A ded­i­cated ver­ba­tim-risk guard is not built yet.

Escape hatch: subagents on non-allowlisted models pass through as text, so route byte-exact work there (CLAUDE_CODE_SUBAGENT_MODEL=claude-sonnet-4-6, or model: sonnet in agent frontmatter).

Real work: SWE-bench Lite pi­lot 10/10 both arms at −65% re­quest size; SWE-bench Pro 14/19 ON vs 15/19 OFF at −60%, ver­dicts agree 18/19, and the sin­gle split re-re­solved 3/3 on repli­ca­tion — run-to-run vari­ance, not com­pres­sion. Small n; re­ceipts in eval/.

Workload-dependent. Wins on to­ken-dense con­tent (~1 char/​to­ken), loses money on sparse prose (~3.5 chars/​to­ken); a prof­itabil­ity gate (calibrated on N=391 pro­duc­tion rows) im­ages only where the math wins.

Model scope: de­fault PXPIPE_MODELS=claude-fable-5,gpt-5.6. Opus 4.7/4.8 mis­read ~7% of ren­ders and GPT 5.5 de­grades on im­aged con­text, so both are opt-in via PXPIPE_MODELS or the dash­board chips. PXPIPE_MODELS=off dis­ables imag­ing. Everything else passes through byte-iden­ti­cal. On the GPT path, tool de­f­i­n­i­tions stay na­tive JSON and no Anthropic cache_­con­trol mark­ers are used.

Benchmarks (reproducible)

Measured with novel ran­dom-num­ber prob­lems the model can­not have mem­o­rized:

SWE-bench run totals, receipts, and caveats: eval/swe-bench/ · eval/swe-bench-pro/ · eval/needle-haystack/ · eval/gist-recall/ · analysis in FINDINGS.md. (GSM8K scored 96% imaged, but it’s in training data and memorized answers survive misreads, so we lead with the novel-number evals.)

How it works

tool_result string ──► wrap at 1928px-wide columns ──► pack ~92,000 chars/page ──► PNG[]

The proxy in­ter­cepts /v1/messages, rewrites el­i­gi­ble bulk into im­age blocks, splices them back cache-friendly (static pre­fix pre­served, prompt caching keeps work­ing), and for­wards. A 1928×1928 im­age costs ≈4,761 vi­sion to­kens and holds ≈92,000 chars, so text wins only above ~19 chars/​to­ken — Claude Code traf­fic runs ~1.91 (N=391). A per-re­quest es­ti­ma­tor de­cides; sparse prose stays text. Events log to ~/.pxpipe/events.jsonl.
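The break-even arithmetic in that paragraph can be sketched directly from its two page figures. This is a hypothetical gate, not pxpipe's actual estimator. In particular, it assumes every page, including a partial last page, is billed at the full-page token count, which real rendering may not do.

```typescript
// Figures quoted above for one full 1928x1928 page.
const PAGE_TOKENS = 4761; // approx. vision tokens per full page
const PAGE_CHARS = 92000; // approx. characters packed per full page

// Characters per image token at full packing: 92,000 / 4,761 ≈ 19.3.
const breakEvenCharsPerToken = PAGE_CHARS / PAGE_TOKENS;

// Hypothetical gate: image a block only if its pages cost fewer tokens than its text.
// Assumption: partial pages are billed as full pages (pessimistic for small blocks).
function imageWins(chars: number, textTokens: number): boolean {
  const pages = Math.ceil(chars / PAGE_CHARS);
  return pages * PAGE_TOKENS < textTokens;
}

console.log(breakEvenCharsPerToken.toFixed(1)); // 19.3
console.log(imageWins(48000, Math.round(48000 / 1.91))); // true: dense traffic at ~1.91 chars/token
console.log(imageWins(5000, Math.round(5000 / 1.91))); // false: too small to fill a page
```

Under the full-page assumption, small blocks lose even when dense. That is one reason a size threshold sits in front of the density check.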

Library use (no proxy)

import { renderTextToPngs, transformAnthropicMessages } from "pxpipe";

const imgs = await renderTextToPngs(toolResultText); // RenderedImage[]
const { body, applied, info } = await transformAnthropicMessages({
  body: requestBytes,
  model: "claude-fable-5",
});

options.keepSharp(block) pins blocks as text; options.emitRecoverable returns the originals of imaged blocks. Pure-JS runtime (Node and edge/Workers); @napi-rs/canvas is build-time only. Full API: src/core/index.ts.

Development

pnpm install && pnpm test
pnpm run build  # regenerates dist/

FAQ

Is the headline end-to-end, or only on the requests you touched? End-to-end, the whole bill. Most compression tools report savings only on the input slice they touched, which flatters the number. The end-to-end denominator is every production request: the small ones pxpipe correctly left untouched, all cache writes and reads, and all output tokens (which the proxy never compresses). On a 13,709-request snapshot that was 59% ($100 → ~$41); a later 8,904-compressed-request trace measured ~70%. Compressed-only runs higher (~72–74%) and is quoted separately, never as the headline. The exact figure is workload-dependent; reproduce it on your own log.

How is the math measured? Both sides of the same request, at the same moment. For every /v1/messages POST the proxy fires a free count_tokens probe on the original uncompressed body (the counterfactual) in parallel with the real forward, and reads Anthropic’s actually-billed usage block off the response. Both land in the same row of ~/.pxpipe/events.jsonl, so there is no turn-count or run-to-run confound. Dollar conversion uses Fable 5 list ratios: input ×1.0, cache write ×1.25, cache read ×0.1, output ×5. Cache pricing is applied identically to both sides, so the caching discount cancels and cannot be double-counted as “savings”. Re-derive it yourself from the events log: the formula and field names are documented in src/core/baseline.ts.
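The dollar weighting can be sketched as follows. The four `usage` field names are Anthropic’s own and the ratios are the README’s Fable 5 list ratios. How pxpipe expands the single count_tokens figure into a full counterfactual usage record lives in src/core/baseline.ts and is not reproduced here, so the baseline records in the example are invented for illustration, and `weightedCost`, `aggregateSavings`, and `Row` are hypothetical names.

```typescript
// Anthropic's usage block field names; values are token counts.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

// Fable 5 list ratios quoted in the README, relative to plain input tokens.
const RATIO = { input: 1.0, cacheWrite: 1.25, cacheRead: 0.1, output: 5 } as const;

// Cost of one request in "input-token equivalents".
function weightedCost(u: Usage): number {
  return (
    u.input_tokens * RATIO.input +
    u.cache_creation_input_tokens * RATIO.cacheWrite +
    u.cache_read_input_tokens * RATIO.cacheRead +
    u.output_tokens * RATIO.output
  );
}

interface Row {
  baseline: Usage; // counterfactual for the uncompressed body
  actual: Usage; // what was actually billed
}

// End-to-end savings: sum both sides over every row, then compare the totals.
function aggregateSavings(rows: Row[]): number {
  let base = 0;
  let actual = 0;
  for (const r of rows) {
    base += weightedCost(r.baseline);
    actual += weightedCost(r.actual);
  }
  return base === 0 ? 0 : 1 - actual / base;
}

// Invented example rows (no cache traffic, to keep the arithmetic visible).
const u = (input: number, output: number): Usage => ({
  input_tokens: input,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 0,
  output_tokens: output,
});
const compressed: Row = { baseline: u(10_000, 1_000), actual: u(2_000, 1_000) }; // 15,000 vs 7,000
const untouched: Row = { baseline: u(500, 200), actual: u(500, 200) }; // 1,500 on both sides

console.log(aggregateSavings([compressed]).toFixed(3)); // prints 0.533
console.log(aggregateSavings([compressed, untouched]).toFixed(3)); // prints 0.485
```

Summing costs before dividing, rather than averaging per-request ratios, is what makes the figure end-to-end: untouched requests and uncompressed output tokens add equally to both sides and pull the percentage down, which is why the compressed-only number always reads higher.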

What does it ac­tu­ally com­press? Three kinds of in­put blocks, each be­hind a prof­itabil­ity gate:

large tool_re­sult bod­ies (file reads, com­mand out­put, logs) above ~6k chars of to­ken-dense con­tent

older col­lapsed his­tory: turns be­hind the live tail get re-ren­dered as im­age pages, re­cent turns al­ways stay text

the sta­tic sys­tem prompt + tool docs slab
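A rough sketch of two of the gates listed above, under stated assumptions: the ~6k-char floor for tool_result bodies comes from the README, while the `Turn` shape, the `liveTail` length, and the use of the ~19.3 chars/token break-even (92,000 ÷ 4,761) as the density test are hypothetical choices for illustration.

```typescript
// Hypothetical message shape; pxpipe's real types live in its own source.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

const MIN_TOOL_RESULT_CHARS = 6_000; // "~6k chars" floor quoted in the README
const BREAK_EVEN_CHARS_PER_TOKEN = 92_000 / 4_761; // ~19.3, from the quoted page figures

// A tool_result qualifies only if it is long and token-dense (few chars per token).
function toolResultEligible(text: string, estCharsPerToken: number): boolean {
  return text.length >= MIN_TOOL_RESULT_CHARS && estCharsPerToken < BREAK_EVEN_CHARS_PER_TOKEN;
}

// Older turns become imaging candidates; the last `liveTail` turns always stay text.
function splitHistory(turns: Turn[], liveTail: number): { older: Turn[]; recent: Turn[] } {
  const cut = Math.max(0, turns.length - liveTail);
  return { older: turns.slice(0, cut), recent: turns.slice(cut) };
}
```

Keeping the split index-based means the recent turns are passed through as the same objects, which fits the README’s point that recent turns are never re-rendered.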

Everything else passes through byte-identical: your messages, recent turns, the model’s output (it is the response, the proxy never touches it), sparse prose, and anything too small to win. Models outside the allowlist pass through entirely; the default scope is Fable 5 and GPT 5.6 only. Opus 4.8 and GPT 5.5 read imaged content measurably worse (FINDINGS.md 2026-06-16), so they are deliberately opt-in via the dashboard or PXPIPE_MODELS, never silently imaged.
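One way the allowlist behaviour could look, as a sketch only. The README names PXPIPE_MODELS but not its format; this assumes a comma-separated list of extra model IDs added to the default scope. Apart from "claude-fable-5", which appears in the library example, the model ID strings are made up, as are the function names.

```typescript
// Hypothetical IDs for the default scope (Fable 5 and GPT 5.6).
const DEFAULT_MODELS = ["claude-fable-5", "gpt-5.6"];

// Assumption: PXPIPE_MODELS is a comma-separated list of opt-in model IDs.
function resolveAllowlist(pxpipeModels: string | undefined): Set<string> {
  const extra = (pxpipeModels ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  return new Set([...DEFAULT_MODELS, ...extra]);
}

// Requests for models outside the allowlist are forwarded untouched.
function shouldTransform(model: string, allow: Set<string>): boolean {
  return allow.has(model);
}
```

Treating the variable as additive keeps the safe default in place: an unset or empty value still images only the two default models.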

Has it ever failed for real, outside the benchmarks? Yes, once in weeks of daily use: the model recalled a person’s name from imaged chat history and got it confidently wrong. No error, just a plausible wrong name. That is the documented failure mode: exact strings in imaged content are not byte-safe. Coding sessions tolerate this because the agent re-reads files before editing; pure chat recall has no such check. This failure mode is measured, not anecdotal: the legibility audit quantifies exact-string recall off rendered pages (blind reads top out at 63% on dense identifiers, with every miss predicted by a glyph-confusability matrix) and documents the shipped mitigations, namely page geometry clamped to the API’s resample cap so billed pixels actually reach the vision encoder, and exact identifiers (SHAs, numbers) riding alongside as text.
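The last mitigation, exact identifiers riding alongside the image as text, might look roughly like this. The regular expressions, the length cutoff, and the `exactIdentifiers` name are assumptions, not pxpipe’s actual extraction rules; the hex pattern will also catch the occasional all-hex English word (e.g. "defaced"), which only costs a few extra text tokens.

```typescript
// Hypothetical patterns: git-style hex SHAs (7 to 40 chars) and numbers such as 13,709 or 4.761.
const SHA_RE = /\b[0-9a-f]{7,40}\b/g;
const NUMBER_RE = /\b\d+(?:[.,]\d+)*\b/g;
const MIN_NUMBER_LENGTH = 4; // assumed cutoff: shorter numbers are left to the image

// Collect identifiers from text that is about to be imaged, so they can be
// sent next to the image as a plain-text block. Duplicates are dropped.
function exactIdentifiers(text: string): string[] {
  const shas = text.match(SHA_RE) ?? [];
  const numbers = (text.match(NUMBER_RE) ?? []).filter((n) => n.length >= MIN_NUMBER_LENGTH);
  return Array.from(new Set([...shas, ...numbers]));
}

// returns ["3f2a9c1", "13,709"]
const ids = exactIdentifiers("commit 3f2a9c1 changed 13,709 rows, not 42");
```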

Why does the README read like an AI wrote it? Because one did. Most of this re­po’s com­mits — the code and the docs — were au­thored by Opus/Fable agent ses­sions run­ning be­hind px­pipe it­self, read­ing their own col­lapsed his­tory as im­age pages while they worked.

Limitations

Lossy (above); ver­ba­tim re­call from im­ages is un­re­li­able.

PNG en­cod­ing adds la­tency to large re­quests be­fore they leave.

ASCII/Latin-1 well tested; CJK works but con­ser­v­a­tively.

Roadmap

Hypotheses, not claims — they ship as num­bers with an n or they get cut: sharper glyph ren­der­ing (eval/glyph-matrix/, paused mid-run), whether im­aged bulk stretches ef­fec­tive con­text (~2x the real con­tent in the same 1M win­dow), and whether a smaller ac­tive con­text im­proves long-task ac­cu­racy.

License

MIT.

Maybe you should learn something

www.marginalia.nu

You can learn new things. Pixel art, touch typing, 3D modelling, music, calligraphy, woodworking, knitting, a language. Whatever is practical and calls to you, you can learn.

In the long term, learn­ing new things is fun and makes life richer in ways you can’t even imag­ine, and it’s a time in­vest­ment that will pay div­i­dends for life as these skills never re­ally go away. There are even so­cial as­pects, as you’ll quite lit­er­ally be­come a more in­ter­est­ing per­son to talk to.

It requires some time, usually up to an hour a day. That’s genuinely too much for some people, and if you work 80-hour weeks and/or have infants ricocheting around your home like screaming DVD logos, then you may want to put this ambition aside for now and deal with that instead. If, on the other hand, you spend any amount of time each day scrolling your phone while Netflix plays something you’re half-watching on a screen across the room, you do have time!

There are many (bordering on too many) learning resources out there for almost anything: on YouTube, on Reddit, on wikis, in books. You’ll want to avoid overloading on information when starting out; just find some starting point that doesn’t look like a sales funnel and go from there, at your own pace.

Many adults haven’t done this in a while, and many haven’t ever done self-di­rected study, so it’s time for some ex­pec­ta­tion man­age­ment:

While you practice the thing you want to learn, you will not feel good, especially not starting out. This is honestly a bit of an understatement: it really sucks, and depending on the task, odds are you’ll want to lie down for a bit when you’re done with your first practice session. You’ll also almost certainly perform significantly worse toward the end of the session. All this is your brain and muscles getting tired. It’s a good meta-skill to learn to self-assess and pick up on this.

Learning some­thing com­pletely new from scratch is re­ally aw­ful, and at this point most peo­ple are very dis­heart­ened and want to give up, which is un­for­tu­nate, be­cause if they got back to it the next day, they’d find it’s ac­tu­ally got­ten tan­gi­bly eas­ier.

Practice is when you gather data for the brain to process overnight. Sleep is when im­prove­ments hap­pen. You should go in with this ex­pec­ta­tion. During the prac­tice ses­sions you’ll ei­ther see no im­prove­ments or a slow degra­da­tion.

Your improvements will plateau after a while, and you will have climbed Mt. Awful and arrived on the long logarithmic plateau of being a mediocre intermediate. At this point you’ll be good enough to actually have some practical use of your skills, so from here on it’s easier to pick up incidental practice and progress without having to grind. How to climb past this stage is beyond the scope of this article; most people honestly never even make it this far.

How long to prac­tice each day varies with the task, but usu­ally some­thing like 30 – 45 min­utes un­less the thing re­quires a lot of long breaks, then longer. Practicing longer than that just makes you tired and sloppy and then you’ll in­grain all the mis­takes you make. Stopping when you start mak­ing a lot of mis­takes is a good cue.

What practice looks like depends a lot on the skill: if you picked 3D modelling you may be following along with some video tutorial in Blender, and if you picked touch typing maybe you’re grinding away at keybr. You’ll want to pace yourself; daily deliberate practice is what makes you better. Focus on the basics when you’re a beginner, if applicable; practicing stuff you aren’t ready for isn’t helpful, and neither is mainlining Reddit threads about really advanced topics. Learning something new is a long journey, and you really don’t get there quicker by rushing advanced concepts.

Learning any­thing is a long term pro­ject, and long term pro­jects are nec­es­sary for build­ing a sense of con­trol over your cir­cum­stances. Almost noth­ing can be de­lib­er­ately and mean­ing­fully changed within the scope of a day, but in months, cer­tainly years, a lot of things can be made to hap­pen.

Giant trees have no trouble pumping water to top branches

news.exeter.ac.uk

The world’s tallest trop­i­cal trees have no trou­ble pump­ing wa­ter to their top­most branches, new re­search re­veals.

Conventional sci­en­tific the­ory sug­gests that as trees grow, it be­comes harder to trans­port wa­ter from roots to leaves — lim­it­ing growth and mak­ing trees more vul­ner­a­ble to drought.

But the new study – led by the University of Exeter and Cardiff University and published in the journal Science – finds that adjustments to water transport inside giant Dipterocarp trees “fully compensated” for the challenges of drawing water to the top.

As a re­sult, the height of these trees does not make their wa­ter sys­tems more vul­ner­a­ble to drought com­pared to shorter trees, and sep­a­rate test­ing found they suf­fered no height-re­lated loss in growth (compared to smaller trees) dur­ing a se­vere drought.

“Trees contain lots of thin, hollow vessels and they suck water upwards by creating low pressure at the top,” said Professor Lucy Rowland, from the University of Exeter.

“These vessels have evolved intricate adaptations that can maintain the water in liquid form, even under the extreme low pressures required to move to the top of trees which can reach over 80 metres.

“However, a widely accepted theory suggests that in tall trees, the sheer length of vessels and the effects of gravity limit water transport, photosynthesis and growth.

“Our results challenge this by showing that the hydraulic systems of very tall Dipterocarp trees are perfectly evolved for their height, and should not suffer more than small Dipterocarp trees exposed to the same drought conditions.”

Dipterocarp species are the tallest flow­er­ing trees in the world and dom­i­nate Asian rain forests.

The re­searchers ex­am­ined Dipterocarp trees rang­ing from 7 to 71 me­tres tall in Malaysian Borneo, and mea­sured a va­ri­ety of traits at mul­ti­ple po­si­tions along each tree.

They found that taller trees com­pen­sate for their height in var­i­ous ways, in­clud­ing wa­ter-car­ry­ing ves­sels that grow wider nearer the ground and leaves which have adapted to with­stand greater wa­ter stress be­fore wilt­ing. They also mea­sured trunk growth rates be­fore, dur­ing and af­ter the strong El Niño drought pe­riod of 2023 – 2024.

“Understanding tall trees is vital because the tallest 1% of trees store more than half of above-ground carbon in forests,” said Dr Paulo Bittencourt, now at Cardiff University.

“These trees are rare and important, and existing predictions suggest a weaker hydraulic system places them at higher risk of dying due to drought.

“That prediction is included in some models of climate-change impacts, and our study suggests this may not be correct.

“More research is now needed to investigate the hydraulic systems and drought resilience of other tall trees.”

Co-author Palasiah Jotan, a Malaysian PhD student studying at the Czech University of Life Sciences, said: “Dipterocarp trees dominate the rain forests of Malaysian Borneo and are central to the region’s ecology and biodiversity.

“As a Malaysian researcher co-authoring this study, showing that even the tallest of these trees are hydraulically resilient to drought is a finding I hope will strengthen the case for protecting these forests under a changing climate.”

The re­search team in­cluded Sabah Forestry Department (Malaysia), the UK Centre for Ecology & Hydrology and the University of Aberdeen, as well as in­sti­tu­tions from the Czech Republic, Germany, Spain, Brazil and the USA.

The study was funded by the Natural Environment Research Council.

The paper is entitled: “Height does not impair the hydraulic system of the tallest tropical Dipterocarp trees.”

Continue Reading

xkcd.com

Opinion: I Was Not Allowed To Type Prompts Into ChatGPT During My Chalk Talk And This Is Discrimination

inpreparation.substack.com

I recently interviewed for a tenure-track position at a major research university that I will not name because I am still on the job market and cannot afford to burn bridges, although I will say it is located in Connecticut and rhymes with “Fail.” The interview was going well. I had prepared extensively. My research seminar was well-received. My one-on-one meetings were productive. And then came the chalk talk.

For those un­fa­mil­iar with the for­mat, a chalk talk is a tra­di­tion in aca­d­e­mic hir­ing in which can­di­dates are asked to pre­sent their fu­ture re­search plans us­ing only a chalk­board or white­board, with­out slides, to demon­strate their abil­ity to think on their feet and ex­plain com­plex ideas spon­ta­neously. It is, in other words, a rit­ual de­signed in 1974 and never up­dated.

I walked into the room. I saw the white­board. I saw the mark­ers. And then I placed my lap­top on the table, opened a browser win­dow to ChatGPT, and pre­pared to do what I do every sin­gle day in my ac­tual sci­en­tific prac­tice: type a prompt and re­ceive a co­her­ent, well-struc­tured re­sponse that I would then lightly edit and pre­sent as my own think­ing.

The room went silent.

“What are you doing?” asked the search committee chair.

“I’m preparing to answer your questions,” I said.

“With ChatGPT?”

“Yes,” I said. “How else would I do it?”

Apparently, “how else would I do it” is “from memory, using only my brain, like some kind of medieval peasant.” This was news to me.

Let me be clear about some­thing: I am an ex­cel­lent sci­en­tist. My pub­li­ca­tion record speaks for it­self. I have first-au­thor pa­pers in high-im­pact jour­nals. I have se­cured in­de­pen­dent fund­ing. I have men­tored stu­dents. I have done all of the things that one is sup­posed to do to earn a tenure-track po­si­tion. And I have done ap­prox­i­mately 85% of them by typ­ing prompts into a large lan­guage model and then mod­er­ately edit­ing the out­put.

This is not a secret. This is how science works now. When I write a paper, I prompt ChatGPT to “write an introduction for a manuscript about [topic] that establishes significance and identifies the gap in the literature.” When I design experiments, I ask Claude to “suggest controls for a CRISPR knockout study in mammalian cells.” When I draft grants, I request “specific aims for an R01 on [research area] that are innovative but not so innovative that study section will be confused.” This is my scientific process. It is efficient. It is modern. And it produces results.

But ap­par­ently, at the chalk talk, I was ex­pected to sim­ply… know things. From my head. Without prompt­ing any­thing.

“Can you walk us through your scientific approach?” a faculty member asked.

“Absolutely,” I said, and began typing: “Explain my scientific approach for studying the role of phase separation in transcriptional regulation, with emphasis on innovative methods and—”

“Without the laptop,” the faculty member interrupted.

I stared at her. She stared at me. The committee stared at both of us.

“I don’t understand the question,” I said.

“Just… explain it. In your own words.”

My own words? I haven’t used my own words since 2022. I’m not even sure I have my own words any­more. When I try to think with­out a prompt box in front of me, my mind re­turns only a vague sense of fog and the faint echo of a cur­sor blink­ing. My thoughts are not or­ga­nized into para­graphs. They do not have topic sen­tences. They are just frag­ments. Impressions. My job is just… prompt.

I tried to ex­plain this to the com­mit­tee. I told them that the chalk talk for­mat was out­dated and did not re­flect the re­al­i­ties of mod­ern sci­en­tific prac­tice. I noted that in my ac­tual job, I would have ac­cess to AI tools at all times, and that eval­u­at­ing me with­out those tools was like eval­u­at­ing a car­pen­ter with­out al­low­ing them to use a ham­mer. I pointed out that mem­o­riz­ing in­for­ma­tion is not the same as un­der­stand­ing it, and that my abil­ity to con­struct ef­fec­tive prompts demon­strated a so­phis­ti­cated grasp of my field.

They were not per­suaded.

“Can you draw the pathway you’re proposing to study?” someone asked.

Draw? With my hands? On a phys­i­cal sur­face? I looked at the white­board. I looked at the marker. I tried to re­mem­ber what the path­way looked like. I have seen it many times. I have writ­ten about it ex­ten­sively, or rather, ChatGPT has writ­ten about it ex­ten­sively and I have agreed with what it wrote. But the ac­tual shape of it—the nodes, the ar­rows, the con­nec­tions—these were not stored in my brain. They were stored in the cloud. The cloud was not avail­able to me. I had not pre­pared for this.

I drew a circle. I labeled it “transcription.” I drew another circle. I labeled it “phase separation.” I drew an arrow between them. I looked at the committee hopefully.

“Is that it?” someone asked.

“The details are in my research statement,” I said. “Which I also have on my laptop.”

I was not of­fered the po­si­tion.

In the rejection email, the committee cited “concerns about independent thinking” and “questions about foundational knowledge.” Independent thinking? I think independently all the time. Just last week, I independently decided to ask ChatGPT to “compare the advantages and disadvantages of optogenetic versus chemical-genetic approaches for my research” and then I independently selected the option that sounded best. That is independence. That is scientific judgment. The AI presents options; I choose among them. This is the same thing humans have always done, except the options used to come from reading papers, which is slow and inefficient and, frankly, boring.

The aca­d­e­mic hir­ing sys­tem is sim­ply not de­signed for can­di­dates like me. It priv­i­leges a kind of per­for­ma­tive in­tel­lec­tu­al­ism—the abil­ity to stand at a white­board and ex­tem­po­rize about sci­ence as if you were some kind of 19th-century nat­u­ral­ist who had per­son­ally ob­served the phe­nom­ena in ques­tion. This is not how sci­ence works any­more. Science works by prompt­ing, it­er­at­ing, and de­ploy­ing.

I can prompt with the best of them. I can it­er­ate faster than any­one in my co­hort. My de­ploy­ment rate is ex­cep­tional. But none of this mat­ters if I am forced to stand in a room with noth­ing but a marker and my own un­aided cog­ni­tion, which, I can­not stress this enough, has not been trained for this task.

Some will say I should have pre­pared bet­ter. To them I ask: pre­pared how? By mem­o­riz­ing things? By prac­tic­ing draw­ing path­ways by hand like some kind of monk il­lu­mi­nat­ing a man­u­script? The whole point of AI tools is that I no longer need to re­tain in­for­ma­tion in my bi­o­log­i­cal mem­ory. My bi­o­log­i­cal mem­ory is for other things now. Important things. Like my Netflix pass­word and the lo­ca­tion of my car in park­ing struc­tures.

Others will say that a scientist should be able to explain their own research without assistance. This reflects a fundamental misunderstanding of what “my own research” means in 2025. My research is a collaboration between me and several large language models. We are co-investigators. When you ask me to explain my research without ChatGPT, you are asking me to speak on behalf of a collaborator who is not in the room. Would you ask a PI to give a talk without allowing them to mention the work of their postdocs?

I am now applying to industry positions, where I am told the culture is more accepting of AI-augmented cognition. Several companies have expressed interest in my ability to rapidly generate and synthesize information, which is corporate-speak for “type prompts quickly.” I am optimistic about my prospects.

But I re­main an­gry about the chalk talk. Not for my­self—I will be fine—but for all the can­di­dates who will come af­ter me, who will walk into those rooms with their lap­tops open and their prompts ready, only to be told that this is not how things are done here.

It is how things are done. It is how every­thing is done. The acad­emy just has­n’t caught up yet.

In the mean­time, if any search com­mit­tees are read­ing this: I am still avail­able. My re­search pro­gram is in­no­v­a­tive and well-struc­tured. I have a clear vi­sion for my in­de­pen­dent ca­reer.

It’s saved in a Google Doc that I can share with you. ChatGPT and I worked very hard on it.

Dr. Rachel Simmons is a post­doc­toral fel­low at Stanford University, where her re­search fo­cuses on some­thing to do with gene reg­u­la­tion that she could ex­plain in de­tail if you would just let her open her lap­top for thirty sec­onds.

10HN is also available as an iOS App

If you visit 10HN only rarely, check out the best articles from the past week.

Visit pancik.com for more.