10 interesting stories served every morning and every evening.




1 400 shares, 17 trendiness

Ki Editor

Bridge the gap between coding intent and action: manipulate syntax structures directly, avoiding mouse or keyboard gymnastics. Amplify your coding efficiency: wield multiple cursors for parallel syntax node operations, revolutionizing bulk edits and refactoring. Selection Modes standardize movements across words, lines, syntax nodes, and more, offering unprecedented flexibility and consistency.

...

Read the original on ki-editor.org »

2 347 shares, 17 trendiness

Put the ZIP code first.

A US ZIP code is 5 char­ac­ters. From those 5 char­ac­ters you can de­ter­mine the city, the state, and the coun­try. That’s 3 fields. Autofilled. From one in­put.

But you don’t do that, do you? No. You make me type my street address, then my city, then scroll through a dropdown of 50 states to find Illinois wedged between Idaho and Indiana, then type my ZIP, then — the pièce de résistance — scroll through 200+ countries to find United States, which half the time is filed under “T” because some dipshit thought “The United States of America” was the correct sort key.

It’s 2026. What the fuck are we do­ing.

I type 90210. You now know I’m in Beverly Hills, California, United States. You did­n’t need me to tell you that. You did­n’t need a drop­down. You did­n’t need me to scroll past Turkmenistan. You had the an­swer the en­tire time, in 5 dig­its, and you just… did­n’t use it.

And here’s the bonus: once you know the ZIP, your street ad­dress au­to­com­plete is search­ing a few thou­sand ad­dresses in­stead of 160 mil­lion. It’s faster. It’s more ac­cu­rate. I type less. You get cleaner data. Everyone wins.

This is not new tech­nol­ogy. Free APIs ex­ist. It’s like 4 lines of code. Look:

const res = await fetch(`https://api.zippopotam.us/us/${zip}`)
const data = await res.json()
city.value = data.places[0]["place name"]
state.value = data.places[0]["state"]
country.value = "United States"

That’s it. That’s the whole thing. You could have shipped this in­stead of read­ing this web­site.

See how that works? See how you typed 5 num­bers and 3 fields filled them­selves in? See how you’re now typ­ing your street ad­dress and it al­ready knows what city you’re in? That’s not magic. That’s a lookup table. We’ve had those since the 1960s.

Tier 1: ZIP at the bottom. Street, city, state, ZIP, country. You had the data to autofill 3 fields and you just… put it last. Amazon does this. Target does this. Walmart does this. Basically everyone does this. Billions of collective hours of human life, spent scrolling for “Illinois.”

Tier 2: No autofill at all. You collect the ZIP. You have the ZIP. You do nothing with it. The ZIP just sits there in your database, inert, like a fire extinguisher in a glass case that says “do not break.” What are you saving it for.

Tier 3: The scrollable country dropdown. 240 countries. No search. No type-ahead. Just pure, unfiltered, alphabetical scrolling. Bonus points if the US is under “T.” Extra bonus points if it’s not even alphabetical. You absolute psychopaths.

Tier 4: The form that re­sets when you hit back. I filled out 14 fields. Your pay­ment proces­sor failed. I hit back. Everything is gone. My street. My city. My state. My will to live. All of it. Returned to the void. The de­vel­oper re­spon­si­ble for this sleeps eight hours a night. That’s the part that haunts me.

While we’re here:

Invoke the right keyboard. If you’re asking for a ZIP code, use inputmode="numeric". It’s one HTML attribute. On mobile, I should see a number pad, not a full QWERTY keyboard. This applies to phone numbers, credit cards, and anything else that’s obviously just digits. You already know the input type. Tell the phone.

Work with autofill, not against it. Browsers have had autofill for over a decade. Use the right autocomplete attributes — postal-code, address-line1, country. If your form fights the browser’s autofill, your form is wrong. The browser is trying to save your user 45 seconds. Let it.
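In HTML terms, that is one attribute per field. A minimal sketch (the field names are illustrative; the attribute values are the standard tokens):

<input name="zip" inputmode="numeric" autocomplete="postal-code">
<input name="address" autocomplete="address-line1">
<input name="country" autocomplete="country">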

Fine, maybe country first. The purists in the comments are technically correct — postal codes aren’t globally unique. You could do country first (pre-filled via IP), then postal code, then let the magic happen. The point was never “skip the country field.” The point is: stop making me type things you already know.

Found a site that puts the ZIP code last? A coun­try drop­down sorted by vibes? A form that makes you cry?

Send it to us →

Put the ZIP code first. Autofill the city. Autofill the state. Autofill the coun­try. Let the user type their street ad­dress last, with au­to­com­plete scoped to their ZIP.

It is a solved problem. The API is free. The code is 5 lines. There is genuinely no reason not to do this other than the mass institutional inertia of a million product managers copy-pasting the same address form template from 2009 and never once asking “wait, why is the ZIP code at the bottom?”

Why is the ZIP code at the bot­tom?

Put it first, you an­i­mals.


Share this be­fore you have to fill out an­other ad­dress form.

...

Read the original on zipcodefirst.com »

3 313 shares, 17 trendiness

Merkley, Klobuchar Launch New Effort to Ban Federal Elected Officials Profiting from Prediction Markets

Effort comes af­ter re­ports of in­di­vid­u­als sus­pi­ciously earn­ing mas­sive pay­outs be­fore Iran Strikes, Venezuela Military Actions

Washington, D.C. — Today, Oregon’s U.S. Senator Jeff Merkley and Minnesota’s U.S. Senator Amy Klobuchar launched a new effort to prevent government officials at the highest levels from engaging in prediction markets, cracking down on the potential for any insider trading.

Following mul­ti­ple pub­lic re­ports on the grow­ing in­flu­ence of pre­dic­tion mar­kets and their po­ten­tial for cor­rup­tion, Merkley and Klobuchar in­tro­duced the End Prediction Market Corruption Act—a new bill to ban the President, Vice President, Members of Congress, and other pub­lic of­fi­cials from trad­ing event con­tracts. The bill will en­sure that fed­eral elected of­fi­cials main­tain their oath of of­fice to serve the peo­ple by pre­vent­ing them from trad­ing on in­for­ma­tion that they gained through their role.

“When public officials use non-public information to win a bet, you have the perfect recipe to undermine the public’s belief that government officials are working for the public good, not for their own personal profits,” said Merkley. “Perfectly timed bets on prediction markets have the unmistakable stench of corruption. To protect the public interest, Congress must step up and pass my End Prediction Market Corruption Act to crack down on this bad bet for democracy.”

“At the same time that prediction markets have seen huge growth, we have seen increasing reports of misconduct. This legislation strengthens the Commodity Futures Trading Commission’s ability to go after bad actors and provides rules of the road to prevent those with confidential government or policy information from exploiting their access for financial gain,” said Klobuchar.

Merkley and Klobuchar’s End Prediction Market Corruption Act is cosponsored by U.S. Senators Chris Van Hollen (D-MD), Adam Schiff (D-CA), and Kirsten Gillibrand (D-NY).

Their bill is sup­ported by Public Citizen, Citizens for Responsibility and Ethics in Washington (CREW), and Project On Government Oversight (POGO).

“The American people deserve unwavering ethical standards from their government officials. Officials have a responsibility to avoid not only actual conflicts of interest but even the appearance of impropriety. POGO is pleased to endorse the End Prediction Market Corruption Act, which will further prohibit covered government officials from exploiting nonpublic information for personal gain in prediction markets,” said Janice Luong, Policy Associate for the Project On Government Oversight (POGO).

“It is now more important than ever that prediction markets be governed by ethical constraints, especially when it comes to bets placed by governmental officials. Sen. Merkley’s legislation would appropriately prohibit key government officials from buying or selling on the prediction markets contracts in which they could have insider information on changes in the market. Public Citizen heartily endorses this bill,” said Craig Holman, Ph.D., Public Citizen.

“The rapid rise of retail prediction markets creates the risk that officials across the government could use nonpublic information to trade on and profit off event contracts,” said Debra Perlin, Vice President of Policy of Citizens for Responsibility and Ethics in Washington (CREW). “The American people must be able to trust that their government officials are working on their behalf rather than for personal gain. Senator Merkley’s legislation represents a vital step forward to ensure that those in positions of power, including senior executive branch officials and members of Congress, cannot abuse their access to nonpublic information in order to profit.”

Merkley has been a long-time leader in the push to end pub­lic cor­rup­tion. He has led the charge to crack down on elec­tion gam­bling and dark money in pol­i­tics, pre­vent law­mak­ers from trad­ing stocks, and ban cryp­tocur­rency-re­lated cor­rup­tion by elected of­fi­cials at the high­est lev­els of the fed­eral gov­ern­ment.

Full text of the End Prediction Market Corruption Act can be found by click­ing here.

...

Read the original on www.merkley.senate.gov »

4 296 shares, 14 trendiness

The yoghurt delivery women combatting loneliness in Japan

As lone­li­ness deep­ens in one of the world’s fastest-age­ing na­tions, a net­work of women de­liv­er­ing pro­bi­otic milk drinks has be­come a vi­tal source of rou­tine, con­nec­tion and care.

A woman in a neat navy suit and pow­der-blue shirt cy­cles pur­pose­fully down a quiet res­i­den­tial street in Tokyo. It’s 08:30 but al­ready balmy, and she’s grate­ful for the match­ing vi­sor that shields her eyes from the sum­mer sun.

She ar­rives at her first stop, parks her bike and knocks on the door of a small wooden house with pot­ted plants flank­ing the en­trance. Inside, an el­derly woman waits. Her face breaks into a broad smile as she opens the door — she has been ex­pect­ing this visit.

Japan is the world’s most rapidly age­ing ma­jor econ­omy. Nearly 30% of its pop­u­la­tion is now over 65, and the num­ber of el­derly peo­ple liv­ing alone con­tin­ues to rise. As fam­i­lies shrink and tra­di­tional multi-gen­er­a­tional house­holds de­cline, iso­la­tion has be­come one of the coun­try’s most press­ing so­cial chal­lenges.

The suited woman is a Yakult Lady — one of tens of thou­sands across Japan who de­liver the epony­mous pro­bi­otic drinks di­rectly to peo­ple’s homes. On pa­per they’re de­liv­ery work­ers, but in prac­tice they’re part of the coun­try’s in­for­mal so­cial safety net. In a coun­try grap­pling with a rapidly age­ing pop­u­la­tion and a deep­en­ing lone­li­ness cri­sis, Yakult Ladies have be­come an un­likely source of com­mu­nity, help­ing to re­duce the prob­lem of iso­la­tion one drop-off at a time.

With their distinctive squat plastic bottles and shiny red caps, Yakult pioneered a genre. The probiotic drink was launched in Japan 90 years ago — long before “microbiome” became common parlance. But today, the women who deliver them are as important to the brand’s identity as the product itself.

...

Read the original on www.bbc.com »

5 284 shares, 18 trendiness

0x0mer/CasNum

CasNum (Compass and straightedge Number) is a library that implements arbitrary precision arithmetic using compass and straightedge constructions. Arbitrary precision arithmetic, now with 100% more Euclid. Featuring a functional modified Game Boy emulator where every ALU opcode is implemented entirely through geometric constructions.

This project began with a simple ‘compass-and-straightedge engine’, which can be found under the directory cas/. In compass-and-straightedge constructions, one starts with just two points: the origin, and a unit. Exactly as God intended. The engine then allows us to do what the ancients did:

* Construct the line through two points

* Construct the cir­cle that con­tains one point and has a cen­ter at an­other point

* Construct the point at the in­ter­sec­tion of two (non-parallel) lines

* Construct the one or two points in the in­ter­sec­tion of a line and a cir­cle (if they in­ter­sect)

* Construct the one point or two points in the intersection of two circles (if they intersect). (Which, by the way, turns out to be a nasty 4th degree equation. Check out the formula in circle.py: over 3600 characters, yikes. Good thing we have WolframAlpha.)

These five con­struc­tions are con­sid­ered the ba­sic com­pass and straight­edge con­struc­tions. Think of these as your ISA.
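Sketched as stub signatures (hypothetical names, not necessarily the engine’s actual API), that ISA would look something like:

def line_through(p, q): ...          # the line through two points
def circle_at(center, through): ...  # the circle centered at `center` passing through `through`
def line_x_line(l1, l2): ...         # the intersection point of two non-parallel lines
def line_x_circle(l, c): ...         # one or two intersection points, if they intersect
def circle_x_circle(c1, c2): ...     # one or two intersection points, if they intersect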

On top of the compass-and-straightedge engine, we have the CasNum class. In CasNum, a number x is represented as the point (x,0) in the plane. Now, the fun part: implementing all arithmetic and logical operations. We can construct the addition of two points by finding the midpoint between them and doubling it, which are both standard compass-and-straightedge constructions. Then, we can build the product and quotient of numbers using triangle similarity. The logical operations (AND, OR, XOR) are a little uglier, since they are not a clean “algebraic operation” in the relevant sense, but, hey, it works, right?
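To make the addition construction concrete, here is a minimal numeric sketch. Plain coordinate arithmetic stands in for the construction primitives (the part CasNum actually performs geometrically), and all helper names are hypothetical, not the library’s API:

from fractions import Fraction

def midpoint(p, q):
    # Midpoint of two points: a standard compass-and-straightedge construction.
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def double_from_origin(p):
    # The point q such that p is the midpoint of the origin and q.
    return (2 * p[0], 2 * p[1])

def add(a, b):
    # a and b live on the x-axis as (a, 0) and (b, 0); their midpoint,
    # doubled away from the origin, lands on (a + b, 0).
    m = midpoint((Fraction(a), Fraction(0)), (Fraction(b), Fraction(0)))
    return double_from_origin(m)[0]

print(add(3, 4))  # prints 7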

What I thought was pretty neat is that im­ple­ment­ing all this from scratch leaves a lot of room for op­ti­miza­tion. For ex­am­ple, mul­ti­pli­ca­tion by 2 can be im­ple­mented much more ef­fi­ciently than the generic al­go­rithm for mul­ti­pli­ca­tion us­ing tri­an­gle sim­i­lar­ity. Then, im­ple­ment­ing mod­ulo by first re­mov­ing the high­est power of two times the mod­u­lus from the div­i­dend yielded much bet­ter re­sults than the naive im­ple­men­ta­tion.
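The modulo trick reads roughly like this in plain integers (a sketch of the idea only; in CasNum every doubling and subtraction is itself a construction):

def mod(dividend, modulus):
    # Repeatedly subtract the largest power-of-two multiple of the modulus
    # that still fits: doublings are cheap constructions, so this takes far
    # fewer steps than subtracting the modulus one copy at a time.
    while dividend >= modulus:
        m = modulus
        while m * 2 <= dividend:
            m *= 2
        dividend -= m
    return dividend

print(mod(1000, 7))  # prints 6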

* Integrate into the ALU of a Game Boy em­u­la­tor, thus ob­tain­ing a Game Boy that arith­meti­cally and log­i­cally runs solely on com­pass and straight­edge con­struc­tions

The first two examples were actually implemented and can be found under the examples/ directory. So apparently one cannot square the circle using a compass and a straightedge, but at least one can run Pokémon Red. Man, I’m sure the ancient Greeks would have loved to see this.

Thanks to the great code writ­ten by PyBoy, in­te­grat­ing CasNum within it was pretty seam­less. The only file I needed to edit was op­codes_­gen.py, and the edit was pretty min­i­mal.

As al­ways, please save any im­por­tant work be­fore run­ning any­thing I ever write.

To clone the repo and install requirements:

git clone --recursive git@github.com:0x0mer/CasNum.git
cd CasNum
pip install -r requirements.txt

You can run the rsa and ba­sic ex­am­ples from the re­po’s root di­rec­tory like so:

python3 -m examples.basic
python3 -m examples.rsa

The li­brary comes with a viewer (casnum/cas/viewer.py) that shows the com­pass and straight­edge con­struc­tions. It has an au­to­matic zoom that kinda works, but it goes crazy in the rsa ex­am­ple, so you may want to use man­ual zoom there.

In order to run PyBoy, first you need a ROM. In order to avoid copyright infringement, I included the ROM for 2048, free to distribute under the zlib license. But if, for example, the ROM you have is ‘Pokemon.gb’, then you can place it in examples/PyBoy and run:

cd examples/PyBoy
pip install -r requirements.txt
PYTHONPATH=../.. python

Then, once in python, run:

from pyboy import PyBoy
from casnum import viewer

viewer.start()
pyboy = PyBoy('2048.gb')  # Or whatever ROM you have
while pyboy.tick():
    pass
pyboy.stop()

The viewer.start() call just displays the compass-and-straightedge constructions; it is not strictly needed, but it is fun.

Note, however, that the first run of Pokemon on the Game Boy emulator takes approximately 15 minutes to boot, so playing it may require somewhat increased patience. You see, Euclid wouldn’t have optimized the Game Boy boot screen. He would have spent those 15 minutes in silent appreciation, thinking, “Yeah. That’s about how long that should take.”

After run­ning it once, most cal­cu­la­tions should al­ready be cached if you run it from the same python in­ter­preter in­stance, so on the sec­ond run you should be able to get a de­cent 0.5~1 FPS, which is to­tally al­most playable.

Most mod­ern de­vel­op­ers are con­tent with a + b. They don’t want to work for it. They don’t want to see the mid­point be­ing birthed from the in­ter­sec­tion of two cir­cles.

CasNum is for the de­vel­oper who be­lieves that if you did­n’t have to solve a 4th-degree poly­no­mial just to in­cre­ment a loop counter, you did­n’t re­ally in­cre­ment it.

Python’s lru_cache is used to cache almost any calculation done in the library, as everything is so expensive. Memory usage may blow up; run at your own risk.
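In functools terms the pattern is simply the following (an illustrative stub, not CasNum’s actual code):

from functools import lru_cache

@lru_cache(maxsize=None)
def intersect(obj_a, obj_b):
    # Arguments must be hashable; each distinct pair is computed once, and
    # every later call is a dictionary lookup instead of a new construction.
    ...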

* pyglet (optional but highly recommended. Only needed if you want to display the compass-and-straightedge constructions)

* pytest-lazy-fix­tures (Only needed in or­der to run the tests)

* py­cryptodome (Only needed if you want to run the rsa ex­am­ple)

A: It can’t really “run” anything, it’s a number.

A: Define “fast”. If you mean “faster than copying Euclid by hand”, then yes, dramatically.

Q: Why did you make this?

A: I wanted ar­bi­trary pre­ci­sion arith­metic, but I also wanted to feel some­thing.

The code in the root of this repos­i­tory is li­censed un­der the MIT License.

This pro­ject in­cor­po­rates the fol­low­ing third-party ma­te­ri­als:

PyBoy (Modified): Located in ./examples/PyBoy/. Distributed un­der the GNU Lesser General Public License (LGPL) v3.0.

Notice of Modification: This ver­sion of PyBoy has been mod­i­fied from the orig­i­nal source code to use the CasNum li­brary in­stead of Python’s int.

The orig­i­nal, un­mod­i­fied source code for PyBoy can be found at: https://​github.com/​Baekalfen/​Py­Boy.

The full LGPL li­cense text is avail­able in ./examples/PyBoy/License.md.


2048.gb: This Game Boy ROM bi­nary is dis­trib­uted un­der the zlib License.

Disclaimer: This software is provided ‘as-is’, without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.


...

Read the original on github.com »

6 248 shares, 22 trendiness

Cloud VM benchmarks 2026

Time for the (not exactly) yearly cloud compute VM comparison. I started testing back in October 2025, but the benchmarking scope was increased, not just due to more VM families tested (44), but also due to testing the instances over more regions to capture the possible range of performance, as in many cases not all instances are created equal. I will not spoil much if I tell you that there is one new CPU that dominates the top-end results more clearly than in any previous year.

Like last time, this is all about generic CPU per­for­mance and es­pe­cially what you can ac­tu­ally get per $ spent on com­pute VM in­stances. Due to the fo­cus on CPU work­loads, burstable in­stances are not in­cluded. Single-thread per­for­mance is eval­u­ated sep­a­rately, as there are al­ways work­loads that can­not be fur­ther par­al­lelized. For multi-thread, each in­stance type is tested in a 2vCPU con­fig­u­ra­tion which is usu­ally the min­i­mum unit you can or­der (it cor­re­sponds to a sin­gle core for SMT-enabled sys­tems, like all Intel and most AMD). The more threads your work­load can uti­lize, the more mul­ti­ples of that unit you can or­der.

The com­par­i­son should help you max­i­mize per­for­mance or price de­pend­ing on your re­quire­ments, by ei­ther us­ing the op­ti­mal VM types of your provider, or per­haps by launch­ing on a dif­fer­ent provider.

If you don’t need all the de­tails, you can use the TOC be­low to jump to what’s rel­e­vant to you.

I kept the same 7 providers as last year (which was down from my max 10 providers from the 2023 com­par­i­son), but ex­panded to 44 VM types tested.

New CPUs: AMD EPYC Turin (whose per­for­mance I had ex­plored sep­a­rately) and Intel Granite Rapids are avail­able on the x86 front, while sev­eral new ARM so­lu­tions are tested: Google Axion (also ex­plored sep­a­rately last year), Azure Cobalt 100 and Ampere AmpereOne M.

More test­ing: Some ex­tra bench­marks added. More test­ing across re­gions. In the past I only fo­cused on that for small providers, but the big-three have also shown in­con­sis­tency, so the main per­for­mance and per­for­mance/​price num­bers will show a range.

As men­tioned, I will fo­cus on 2x vCPU in­stances, as that’s the min­i­mum scal­able unit for a mean­ing­ful com­par­i­son (and gen­er­ally min­i­mum for sev­eral VM types), given that most AMD and Intel in­stances use Hyper-Threading (HT) / Simultaneous Multi Threading (SMT). So, for those sys­tems a vCPU is a Hyper-Thread, or half a core, with the 2x vCPU in­stance giv­ing you a full core with 2 threads. This will be­come clear in the scal­a­bil­ity sec­tion.

I am skipping some very old instance types that are obviously uncompetitive. I am still trying to configure at 2GB/vCPU of RAM (which is variably considered “compute-optimized” or “general-purpose”) and a 30GB SSD (not high-IOPS) boot disk for the price comparison to make sense (exceptions will be noted).

The pay-as-you-go/​on-de­mand prices re­fer to the lower cost re­gion in the US (or Europe). For providers with vari­able pric­ing, cheap­est re­gions are al­most al­ways in the US. Unlike last year, I will not in­clude the 100% sus­tained dis­counts for GCP, as they are not tech­ni­cally on-de­mand so I may have been un­fair to other providers.

For providers that of­fer 1 year and 3 year com­mit­ted/​re­served dis­counted prices, the no-down­pay­ment price was listed with that op­tion. The prices were valid for January 2026 - please check for cur­rent prices be­fore mak­ing fi­nal de­ci­sions.

As a guide, here is an overview of the var­i­ous gen­er­a­tions of AMD, Intel and ARM CPUs from older (top) to newer (bottom), roughly grouped hor­i­zon­tally in per-core per­for­mance tiers, based on this and the pre­vi­ous com­par­i­son re­sults:

This should im­me­di­ately give you an idea of roughly what per­for­mance tier to ex­pect based on the CPU type alone, with the im­por­tant note that for SMT-enabled in­stances you get a sin­gle core for every 2x vC­PUs.

A general tip is that you should avoid old CPU generations, as due to their lower efficiency (higher running costs) the cloud providers will actually charge you more for less performance. I will not even include types that were already too old to provide good value last year, to focus on the more relevant products.

Amazon Web Services (AWS) pretty much orig­i­nated the whole cloud provider” busi­ness - even though smaller con­nected VM providers pre­dated it sig­nif­i­cantly (e.g. Linode comes to mind) - and still dom­i­nates the mar­ket. The AWS plat­form of­fers ex­ten­sive ser­vices, but, of course, we are only look­ing at their Elastic Cloud (EC2) VM of­fer­ings for this com­par­i­son.

There are 2 new CPUs introduced since last year. Intel’s Granite Rapids makes an appearance, while the AMD EPYC Turin-powered C8a follows the previous C7a in having SMT disabled (providing a full core per vCPU). I don’t want to spoil much, but if you take the fastest CPU by a margin, and disable SMT, expect some impressive “per-2vCPU” results…

With EC2 in­stances you gen­er­ally know what you are get­ting (instance type cor­re­sponds to spe­cific CPU), al­though there’s a mul­ti­tude of ways to pay/​re­serve/​pre­pay/​etc which makes pric­ing very com­pli­cated, and pric­ing fur­ther varies by re­gion (I used the low­est cost US re­gions). In the 1Y/3Y re­served prices listed, there is no pre­pay­ment in­cluded - you can lower them a bit fur­ther if you do pre­pay. The spot prices vary even more, both by re­gion and are up­dated of­ten (especially for newly in­tro­duced types), so you’d want to keep track of them.

* min_cpu_­plat­form needs to be set to get tested CPU.

** Extrapolated 2x vCPU in­stance - type re­quires 4x vCPU min­i­mum size.

The Google Cloud Platform (GCP) follows AWS quite closely, providing mostly equivalent services, but lags in market share (3rd place, after Microsoft Azure). We are looking at the Google Compute Engine (GCE) VM offerings, which is one of the most interesting with respect to configurability and range of different instance types. However, this variety makes it harder to choose the right one for the task, which is exactly what prompted me to start benchmarking all the available types. To add extra confusion, some types may come with an older (slower) CPU if you don’t set min_cpu_platform to the latest available for the type - so you need the extra configuration to get a faster machine for the same price.

This year, we have the addition of the AMD EPYC Turin (c4d and n4d); they are not yet in all regions/zones, but availability is expanding. We also had the introduction of two Intel-based 4th gen instances (n4 and c4). They both feature Emerald Rapids, however the latter can be configured with a local SSD, in which case they come with the newer Intel Granite Rapids. Until GCP allows setting min_cpu_platform to Granite Rapids (they are thinking about it AFAIK), you have to pay for the extra SSD to get the performance. Last year I covered separately the introduction of the Google Axion-powered c4a ARM type, but it is in a full VM comparison for the first time.

At this point, I should mention that the reason I did more extensive testing this year across different regions is the disappointing performance of Emerald Rapids in practice, compared to its showing on my original benchmarks. It seems that as it started to get used, it exhibited a performance variance that looks consistent with boost behavior + node contention (i.e. more sensitive to noisy neighbors). I suspect this is why GCP offers the option to turn the boost clock off on Emerald Rapids instances for “consistent performance”.

GCP prices vary per region and feature some strange patterns. For example, when you reserve, t2d instances (which give you a full AMD EPYC core per vCPU) and n2d instances (which give you a Simultaneous Multi-Thread, i.e. HALF a core, per vCPU) have the same price per vCPU, but n2d is cheaper on demand and gets a 20% discount for sustained monthly use.

Note that c3, c3d and c4-lssd types have a 4x vCPU minimum. This breaks the price comparison, so I am extrapolating to a 2x vCPU price (half the cost of CPU/RAM + full cost of the 30GB SSD). GCP gives you the option to disable cores (you select “visible” cores), so while you have to pay for the 4x vCPU minimum, you can still run benchmarks on a 2x vCPU instance for a fair comparison.
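In other words, the extrapolation is just this (a toy calculation with made-up prices):

def extrapolated_2vcpu_monthly(cpu_ram_4vcpu, ssd_30gb):
    # Half the CPU/RAM cost of the mandatory 4x vCPU size, plus the full 30GB boot disk.
    return cpu_ram_4vcpu / 2 + ssd_30gb

print(extrapolated_2vcpu_monthly(120.0, 5.0))  # 65.0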

Azure is the #2 overall Cloud provider and, as expected, it’s the best choice for most Microsoft/Windows-based solutions. That said, it does offer many types of Linux VMs, with quite similar abilities as AWS/GCP. The various types are not as easy to use as on AWS/GCP though; for some reason even enterprise accounts start with zero quota on many types, so I had to request quota increases to even test tiny instances.

The v6 in­stances are new for the com­par­i­son, fea­tur­ing AMD EPYC Genoa, Intel Emerald Rapids and Azure’s own Cobalt 100 ARM CPU.

The Azure pric­ing is at least as com­plex as AWS/GCP, plus the pric­ing tool seems worse. They also lag be­hind the other two ma­jor providers in CPU re­leases - Turin and Granite Rapids are still in closed pre­view at the time of writ­ing this.

Oracle Cloud Infrastructure (OCI) was the biggest surprise in my 2023 comparison test. It was a pleasant surprise: not only does Oracle offer by far the most generous free tier (credits for the A1 type ARM VM equivalent to sustained 4x vCPU, 24GB RAM, 200GB disk for free, forever), their paid ARM instances were the best value across all providers - especially for on-demand. The free resources are enough for quite a few hobby projects - they would cost you well over $100/month on the big-3 providers.

Note that registration is a bit draconian to avoid abuse; make sure you are not on a VPN and also don’t use “oracle” anywhere in the email address you use for registration. You start with a “free” account, which gives you access to a limited selection of services, and apart from the free-tier eligible A1 VMs, you’ll struggle to build any other types with the free credit you get at the start.

Upgrading to a regular paid account (which still gives you the free tier credits), you get a selection of VMs. New this year are the AMD EPYC Turin Standard.E6 VMs and the next generation ARM Standard.A4 type powered by the AmpereOne M CPU. If you recall from last year, the AmpereOne A2 instances were slower in quite a few tasks than the older Altra A1. Ampere really needed a step forward, and AmpereOne M (A4) finally delivers meaningful gains in this year’s dataset. I had trouble building older-gen AMD instances, so in the end I did not include them. I also could only build Standard.A4 in one region (Ashburn), even though I tried in Phoenix, which Oracle had in the availability list, to no avail.

Oracle Cloud’s prices are the same across all re­gions, which is nice. They do not of­fer any re­served dis­counts, but do of­fer a 50% dis­count for pre­emptible (spot) in­stances. One com­pli­ca­tion is that their prices are per Oracle CPU (OCPU). This seemed to make sense orig­i­nally, as it cor­re­sponded to phys­i­cal cores - the A1 in­stances had 1 OCPU per core, so 1 OCPU = 1 vCPU, while SMT x86 had 1 OCPU = 2 vCPU (threads). But then, pos­si­bly think­ing that their users are get­ting com­fort­able with it, they threw a wrench by mak­ing 1 OCPU for newer (still non-SMT) ARM types A2 and A4 be equal to 2 vCPU / 2 full Cores. I can’t think of a rea­son for this other than to con­fuse their cus­tomers.

Linode, the ven­er­a­ble cloud provider (predating AWS by sev­eral years), has now been part of Akamai for a few years.

From the pre­vi­ous years we saw that their shared core types (“Linodes”) are the best bang for buck, but it de­pends on what CPU you are as­signed on cre­ation. It seems that cur­rently the most com­mon con­fig­u­ra­tion fea­tures an AMD EPYC Milan. I tried to build quite a few and that’s what you usu­ally get (if you man­age to build an an­cient Intel or AMD Rome, try again), I did not see any newer CPUs pop up. The lat­est EPYC Turin though is avail­able as a ded­i­cated CPU in­stance. They now mark ded­i­cated in­stances with their gen­er­a­tion, so a G8 should al­ways be the same CPU. As al­ways, the ded­i­cated in­stances come with SMT, so you are nor­mally get­ting a core per 2 vC­PUs, while the shared in­stances are vir­tual cores, so twice the vC­PUs gives you twice the multi-thread per­for­mance - the caveat is that per­for­mance per thread varies de­pend­ing on how busy the node that holds your VM is.

It is a bit of an an­noy­ance that with­out test­ing your VM af­ter cre­ation you can’t be sure of what per­for­mance to ex­pect, un­less you go for the more ex­pen­sive ded­i­cated VMs, but oth­er­wise, Akamai/Linode is still easy to set up and main­tain and has fixed, sim­ple pric­ing across re­gions.

DigitalOcean was close to the top of the perf/value charts a few years ago, providing the best value with their shared CPU “Basic droplets”. I am actually using DigitalOcean droplets to help out by hosting a free weather service called 7Timer, so feel free to use my affiliate link to sign up and get $200 free - you will help with the free project’s hosting costs if you end up using the service beyond the free period. Apart from value, I chose them for the simplicity of setup, deployment, snapshots, backups.

However, they seem to have stopped up­grad­ing their fleet for quite a while now, so you end up with some very old CPUs. If you don’t mind the low per-thread per­for­mance, they are still not a bad value, given the low prices. I like their sim­ple, re­gion-in­de­pen­dent and sta­ble pric­ing struc­ture, but I wish they would up­grade their shared core data cen­ters.

Hetzner is a quite old German data cen­ter op­er­a­tor and web host, with a very bud­get-friendly pub­lic cloud of­fer­ing. They are of­ten rec­om­mended as a re­li­able ex­tra-low-bud­get so­lu­tion, and I’ve had much bet­ter luck with them than other sim­i­lar providers.

On the sur­face, their prices seem to be just a frac­tion of those of the larger providers, so I did ex­tended bench­mark runs over days to make sure there is no sig­nif­i­cant over­sub­scrib­ing - ex­cept per­haps the cheap­est vari­ant (CX23). Only the CCX13 claims ded­i­cated cores. Ironically, those ded­i­cated in­stances vary sig­nif­i­cantly in per­for­mance de­pend­ing on which data cen­ter you cre­ate them in. In the end, the CPX22 (AMD) and CAX11 (ARM) shared core in­stances are the most sta­ble in per­for­mance across in­stances and re­gions.

Note that the cheap shared-core types are not widely avail­able, not found in the US re­gions and they even show no avail­abil­ity at times in the European re­gions. And while I in­cluded a CX23 with EPYC Rome, you will nor­mally get a slower Skylake. I will not in­clude the shared in­stances in the price/​per­for­mance charts this time around, as I am think­ing that the lim­ited avail­abil­ity does not make them equal con­tenders.

In order to have many more test runs, I streamlined the test suite into a docker image which you can run yourself. Almost all instances were on 64-bit Debian 13, although I had to use Ubuntu 24.04 on a couple, and Oracle’s ARM were only compatible with Oracle Linux. To run the entire suite on a system with docker you would do:

As every year, the main weight is on my own benchmark suite, which you can now also run in its own docker image. It has proven very good at approximating real-world performance differences in the type of workloads we use at SpareRoom, and is also good at comparing single and multi-threaded performance (with scaling to hundreds of threads if needed). To run DKbench by itself on a system with docker:

I cre­ated mul­ti­ple in­stances in dif­fer­ent re­gions and recorded min and max of all runs (both sin­gle-thread and dual-thread).

I have kept Geekbench, both because it can help you compare results from previous years and because Geekbench 6 seems to be much worse - especially in multi-threaded testing (I’d go as far as to say it looks broken to me).

I simply kept the best of 2 runs; you can browse the results here. There’s an Arm version too at https://cdn.geekbench.com/Geekbench-5.4.0-LinuxARMPreview.tar.gz.

Apart from be­ing pop­u­lar, Phoronix bench­marks can help bench­mark some spe­cific things (e.g. AVX512 ex­ten­sions) and also re­sults are openly avail­able.

Very com­mon ap­pli­ca­tion and very com­mon bench­mark - av­er­age com­pres­sion/​de­com­pres­sion scores are recorded.

Select op­tion 1. This bench­mark uses SSE/AVX up to AVX512, which might be im­por­tant for some peo­ple. Older CPUs that lack the lat­est ex­ten­sions are at a dis­ad­van­tage.

Blender’s Big Buck Bunny video was transcoded to an H264 mp4 via FFmpeg, both in sin­gle and dual-thread mode.
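The post does not list the exact invocation; a typical single-thread libx264 transcode along those lines would be (input filename assumed, the flags are standard FFmpeg options):

ffmpeg -i big_buck_bunny.mp4 -c:v libx264 -threads 1 bbb_h264.mp4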

The raw re­sults can be ac­cessed on this spread­sheet (or here for the full Geekbench re­sults).

In the graphs that fol­low, the y-axis lists the names of the in­stances, with the CPU type in paren­the­sis:

Single-thread per­for­mance can be cru­cial for many work­loads. If you have highly par­al­leliz­able tasks you can add more vC­PUs to your de­ploy­ment, but there are many com­mon types of tasks where that is not al­ways a so­lu­tion. For ex­am­ple, a web server can be scaled to ser­vice any num­ber of re­quests in par­al­lel, how­ever the vCPU’s thread speed de­ter­mines the min­i­mum re­sponse time of each re­quest.

We start with the lat­est DKbench, run­ning the 19 de­fault bench­marks (Perl & C/XS) which cover a va­ri­ety of com­mon server work­loads. I tried to build 2-3 in­stances at dif­fer­ent times across at least 3 re­gions (if the provider al­lowed), to get a min/​max range of per­for­mance. Here are the re­sults for sin­gle thread:

I think it’s the first time in my series of comparisons where a CPU has had this clear a performance lead. AMD’s EPYC Turin is simply a tier above anything else. AWS has the fastest setup with that CPU, while GCP’s more expensive C4d seems to vary a lot in performance, where their own cheaper N4d gave more consistent results. Overall, if you are looking for maximum performance per thread, EPYC Turin seems to be the answer if your cloud provider has it.

In the 2024 comparison Intel Emerald Rapids did quite well, but it turns out that is only on non-busy nodes, where the CPU allows for a generous boost - at least on GCP. This is reflected in the range you see on the graph. The new Granite Rapids seems to fix this, providing a bit higher, but mainly more stable, performance. So, a solid step forward from Intel; it’s just that Turin is really impressive.

As we are waiting for AWS to release Graviton5 publicly, GCP’s Axion is the leader among ARM solutions, impressively offering EPYC Genoa-level performance per thread. I tested Azure’s own Cobalt 100 for the first time - it sits between Graviton3 and Graviton4 in performance. Ampere’s new AmpereOne M finally offers some tangible improvement over the aging Altra, but only matches AWS’s older Graviton3.

Lastly, among the lower-cost providers, DigitalOcean has lagged behind in performance, signaling that their fleet is due for an upgrade. Both Akamai and Hetzner offer some fast Milan instances, although with both providers you are not guaranteed what performance level you are going to get when creating an instance - there is the variation shown in the chart. It’s not oversubscribing, the performance is stable; it’s just that groups of servers are set up differently.

DKbench runs the benchmark suite single-threaded and multi-threaded (2 threads in this comparison as we use 2x vCPU instances) and calculates a scalability percentage. The benchmark obviously uses highly parallelizable workloads (if that’s not what you are running, you’d have to rely more on the single-thread benchmarking). In the following graph 100% scalability means that if you run 2 parallel threads, they will both run at 100% speed compared to how they would run in isolation. For systems where each vCPU is 1 core (e.g. all ARM systems), or for “shared” CPU systems where each vCPU is a thread among a shared pool, you should expect scalability near 100% - what is running on one vCPU should not affect the other when it comes to CPU-only workloads.

Most Intel/AMD sys­tems though give you a sin­gle core that has 2x threads (Hyper-Threads / HT in Intel lingo - or Simultaneous Multi Threads / SMT if you pre­fer) as a 2x vCPU unit. Those will give you scal­a­bil­ity well be­low 100%. A 50% scal­a­bil­ity would mean you have the equiv­a­lent of just 1x vCPU, which would be very dis­ap­point­ing. Hence, the far­ther up you are from 50%, the more per­for­mance your 2x vC­PUs give you over run­ning on a sin­gle vCPU.
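Computed from the definition above, the figure would simply be (a hypothetical helper mirroring the text, not DKbench’s own code):

def scalability_pct(single_thread_score, two_thread_total, threads=2):
    # 100% = each of the 2 threads runs as fast as it would alone;
    # 50% = the pair together performs like a single thread.
    per_thread = two_thread_total / threads
    return 100 * per_thread / single_thread_score

print(scalability_pct(100, 140))  # 70.0, typical of an SMT core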

As expected, the ARM and shared CPUs are near 100%, i.e. you are getting twice the multithreaded performance going from 1x to 2x vCPUs. You also get that from three x86 types: AWS’s Genoa C7a and Turin C8a alongside GCP’s older Milan t2d.

From the rest we note that, tra­di­tion­ally, AMD does SMT bet­ter than Intel, al­though the lat­ter has im­proved from the dis­mal Ice Lake days when it barely man­aged over 50%.

Bizarrely, the Akamai AMD Turin gives an unusually high (given SMT) scalability of 71.9%. I have verified the result several times, and I can’t figure out what their setup is - the single-threaded performance at the same time is very low compared to every other Turin.

From the sin­gle-thread per­for­mance and scal­a­bil­ity re­sults we can guess how run­ning DKbench mul­ti­threaded will turn out, but in any case here it is:

Give the clearly fastest in­stance two full cores in­stead of threads and you get the Turin-powered AWS C8a com­pletely dom­i­nat­ing the chart. Interestingly, the Google Axion seems at least as good here as the leader from the pre­vi­ous com­par­i­son, the Genoa C7a - with Graviton4 very close and Cobalt 100 trail­ing not far be­hind.

The SMT-enabled Turin instances follow, with the top 10 completed by the venerable Milan in a non-SMT Tau instance. Long-time followers of these comparisons may remember this was the top of the chart in the 2023 edition.

At the bottom, as expected, we have the very old Intel Broadwell/Skylake, the not-as-old Ice Lake, and AMD Rome.

The old Geekbench 5 is pro­vided for com­par­i­son rea­sons (and I don’t trust Geekbench 6):

Both for sin­gle and multi-core, the re­sults are very close to what we get with DKbench. Which is a good thing, as both suites try a range of bench­marks to get a bal­anced generic CPU score.

Moving on to some pop­u­lar spe­cific bench­marks - start­ing with 7zip which is sen­si­tive to mem­ory la­tency and cache:

While Turin still leads over­all, Axion and Graviton4 are im­pres­sive and ac­tu­ally even beat it in the de­com­press part of the bench­mark. In fact, Cobalt 100 is the top per­former for de­com­pres­sion, but over­all the ARM so­lu­tions show great per­for­mance.

Another Turin show­case, with the non-SMT AWS C8a in par­tic­u­lar al­most dou­bling the score of the sec­ond and tripling the score of the C7a. Granite Rapids is also mak­ing a great show­ing.

It’s the first time I am run­ning this pop­u­lar bench­mark, and I am a bit puz­zled about some of the Milan types com­ing last.

Another first for this com­par­i­son is video com­pres­sion us­ing FFmpeg and libx264. Results for both sin­gle and dual-thread mode:

Once more, EPYC Turin comes first. If we look at single-thread performance, only Granite Rapids comes somewhat close. When using 2 full cores, Axion can pull ahead of all SMT (i.e. single core) instances except Turin.

Lastly, in case you have soft­ware that can be ac­cel­er­ated by AVX512, I am in­clud­ing an OpenSSL RSA4096 bench­mark. They are Intel’s ex­ten­sions so they are on all their CPUs since Skylake, whereas Genoa was the first AMD CPU to im­ple­ment them. Older AMD CPUs and ARM ar­chi­tec­tures will be at a dis­ad­van­tage in this bench­mark:

Like in our pre­vi­ous com­par­i­son, AMD out­per­forms Intel at their own game. It’s quite a mar­gin for Turin and even Genoa is ahead of any­thing Intel. Intel does not seem to be pri­ori­tis­ing vec­tor per­for­mance, as even the lat­est Granite Rapids does not bring much im­prove­ment over the ag­ing Ice Lake.

As ex­pected, ARM and older AMD CPUs that don’t sup­port AVX512 are slower than Intel Skylake and newer.

One fac­tor that is of­ten even more im­por­tant than per­for­mance it­self is the per­for­mance-to-price ra­tio.

I will start with the “on-demand” price quoted by every provider. While I listed monthly costs in the tables, these prices are actually charged per minute or hour, so there’s no need to reserve for a full month.

The first chart is for sin­gle-thread per­for­mance/​price. I will have to sep­a­rate Hetzner’s shared in­stances be­cause they are not avail­able in the US and some­times run out even in Europe (esp. CX23), so I feel they are not ex­act com­pe­ti­tion - CCX13 though is avail­able and is in­cluded.

Hetzner and Oracle top the list like last year. However, thanks to the incredible performance of Turin, Oracle pretty much matches Hetzner’s dedicated instance in performance to cost. They are followed by Linode and also GCP’s n4d. The latter, again thanks to the leading single-thread performance of AMD’s latest CPU, even manages to bring better value than DigitalOcean, which is then followed by in-house ARM solutions like Google Axion and Azure Cobalt 100.

AWS is def­i­nitely the worst value on-de­mand. Their Turin is the best they can do, while their pre­vi­ous gen and older CPUs are the worst val­ues on the table. Unlike the pre­vi­ous com­par­i­son, even Azure seems to do bet­ter in value.

At this point I think we should see the lim­ited avail­abil­ity Hetzner VMs in com­par­i­son to the best value ded­i­cated:

The in­ex­pen­sive shared-cpu types of­fer un­beat­able value - if you man­age to get them. The top one over­all (Rome CX23) is ac­tu­ally the hard­est to pro­vi­sion, as the CX23 type usu­ally gives you a slow Skylake.

Moving on to 2x threads for eval­u­at­ing multi-threaded per­for­mance:

All the non-SMT VMs get a bump here, hence Oracle’s ARM takes the lead with the new AmpereOne M, with Hetzner and shared-core Linode following closely. The second tier consists of Google Axion and Azure Cobalt 100, as well as DigitalOcean droplets. AWS’s non-SMT Turin is not that far behind this time, although their older gen 5/6 x86 are again at the very bottom of the chart.

The Hetzner shared-core instances get the bump as well; they provide superb on-demand value compared to the competition:

The three largest (and most ex­pen­sive) providers of­fer sig­nif­i­cant 1-year reser­va­tion dis­counts. To get the max­i­mum dis­count you have to lock into a spe­cific VM type, which is why it is ex­tra im­por­tant to know what you are get­ting out of each. Also, for AWS you can ac­tu­ally au­to­mat­i­cally ap­ply the 1 year prices to most on-de­mand in­stances by us­ing third party ser­vices like DoIT’s Flexsave (included in their free tier!), so this seg­ment may still be rel­e­vant even if you don’t want to re­serve.

The first chart is again for sin­gle-thread per­for­mance/​price.

The 1-year discount is enough for GCP’s Turin to match Oracle near the top of the value ranking. On Azure you get some good value running Cobalt 100 or Genoa. If you are on AWS, your best bet is the latest C8 family.

Moving on to eval­u­at­ing multi-threaded per­for­mance us­ing 2x vC­PUs:

OCI ARM in­stances are still at the top, joined by Azure Cobalt 100 with Axion al­most keep­ing up. This is the first in­stance where AWS can of­fer sim­i­lar value, thanks to the C8a with the fast Turin of­fer­ing twice the phys­i­cal cores, mak­ing up for the higher price.

Finally, for very long term com­mit­ments, AWS, GCP and Azure pro­vide 3-year re­served dis­counts:

GCP with its Turin instances finally comes just ahead of Oracle and even Hetzner’s dedicated VM. Azure also provides good value with their Cobalt 100 and Turin types. It should be noted that even if AWS lags behind the others, at a 3-year commitment it still offers better value than the “classic” value providers Akamai and DigitalOcean.

Switching to multi-thread, the num­ber of phys­i­cal cores per vCPU makes the dif­fer­ence:

I didn’t expect this, but Azure Cobalt 100 tops the chart! It is followed by GCP and OCI ARM solutions, but AWS’s and GCP’s Turin are not far behind.

The large providers (AWS, GCP, Azure, OCI) offer their spare VM capacity at an - often heavy - discount, with the understanding that these instances can be reclaimed at any time when needed by other customers. This “spot” or “preemptible” VM instance pricing is by far the most cost-effective way to add compute to your cloud. Obviously, it is not applicable to all use cases, but if you have a fault-tolerant workload or can gracefully interrupt your processing and rebuild your server to continue, this might be for you.

AWS and OCI will give you a 2-minute warn­ing be­fore your in­stance is ter­mi­nated. Azure and GCP will give you 30 sec­onds, which should still be enough for many use cases (e.g. web servers, batch pro­cess­ing etc).

The discount for Oracle’s instances is fixed at 50%, but varies wildly for the other providers per region and can change often, so you have to be on top of it to adjust your instance types accordingly.

For a longer discussion on spot instances see 2023’s spot performance/price comparison. Then you can come back to this year’s results below.

Applying the low­est January 2026 US spot prices we get:

...

Read the original on devblog.ecuadors.net »

7 230 shares, 15 trendiness

Insider Trading Is Going to Get People Killed

Ayatollah Ali Khamenei was not, it’s safe to assume, a devoted Polymarket user. If he had been, the Iranian leader might still be alive. Hours before Khamenei’s compound in Tehran was reduced to rubble last week, an account under the username “magamyman” bet about $20,000 that the supreme leader would no longer be in power by the end of March. Polymarket placed the odds at just 14 percent, netting “magamyman” a profit of more than $120,000.

Everyone knew that an at­tack might be in the works—some American air­craft car­ri­ers had al­ready been de­ployed to the Middle East weeks ago—but the Iranian gov­ern­ment was caught off guard by the tim­ing. Although the ay­a­tol­lah surely was aware of the risks to his life, he pre­sum­ably did not know that he would be tar­geted on this par­tic­u­lar Saturday morn­ing. Yet on Polymarket, plenty of warn­ing signs pointed to an im­pend­ing at­tack. The day be­fore, 150 users bet at least $1,000 that the United States would strike Iran within the next 24 hours, ac­cord­ing to a New York Times analy­sis. Until then, few peo­ple on the plat­form were bet­ting that kind of money on an im­me­di­ate at­tack.

Maybe all of this sounds eerily fa­mil­iar. In January, some­one on Polymarket made a se­ries of sus­pi­ciously well-timed bets right be­fore the U. S. at­tacked a for­eign coun­try and de­posed its leader. By the time Nicolás Maduro was ex­tracted from Venezuela and flown to New York, the user had pock­eted more than $400,000. Perhaps this trader and the Iran bet­tors who are now flush with cash sim­ply had the luck of a life­time—the gam­bling equiv­a­lent of mak­ing a half-court shot. Or maybe they knew what was hap­pen­ing ahead of time and flipped it for easy money. We sim­ply do not know.

Polymarket traders swap crypto, not cash, and con­ceal their iden­ti­ties through the blockchain. Even so, in­ves­ti­ga­tions into in­sider trad­ing are al­ready un­der­way: Last month, Israel charged a mil­i­tary re­servist for al­legedly us­ing clas­si­fied in­for­ma­tion to make un­spec­i­fied bets on Polymarket.

The plat­form for­bids il­le­gal ac­tiv­ity, which in­cludes in­sider trad­ing in the U. S. But with a few taps on a smart­phone, any­one with priv­i­leged knowl­edge can now make a quick buck (or a hun­dred thou­sand). Polymarket and other pre­dic­tion mar­kets—the san­i­tized, in­dus­try-fa­vored term for sites that let you wa­ger on just about any­thing—have been dogged by ac­cu­sa­tions of in­sider trad­ing in mar­kets of all fla­vors. How did a Polymarket user know that Lady Gaga, Cardi B, and Ricky Martin would make sur­prise ap­pear­ances dur­ing the Super Bowl half­time show, but that Drake and Travis Scott would­n’t? Shady bets on war are even stranger and more dis­turb­ing. They risk un­leash­ing an en­tirely new kind of na­tional-se­cu­rity threat. The U.S. caught a break: The Venezuela and Iran strikes were not thwarted by in­sider traders whose bets could have prompted swift re­tal­i­a­tion. The next time, we may not be so lucky.

The at­tacks in Venezuela and Iran—like so many mil­i­tary cam­paigns—were con­ducted un­der the guise of se­crecy. You don’t swoop in on an ad­ver­sary when they know you are com­ing. The Venezuela raid was re­port­edly so con­fi­den­tial that Pentagon of­fi­cials did not know about its ex­act tim­ing un­til a few hours be­fore President Trump gave the or­ders.

Any in­sid­ers who put money down on im­pend­ing war may not have thought that they were giv­ing any­thing away. An anony­mous bet that reeks of in­sider trad­ing is not al­ways easy to spot in the mo­ment. After the sus­pi­cious Polymarket bets on the Venezuela raid, the site’s fore­cast placed the odds that Maduro would be ousted at roughly 10 per­cent. Even if Maduro and his team had been glued to Polymarket, it’s hard to imag­ine that such long odds would have com­pelled him to flee in the mid­dle of the night. And even with so many peo­ple bet­ting last Friday on an im­mi­nent strike in Iran, Polymarket fore­casted only a 26 per­cent chance, at most, of an at­tack the next day. What’s the sig­nal, and what’s the noise?

In both cases, someone adept at parsing prediction markets could have known that something was up. “It’s possible to spot these bets ahead of time,” Rajiv Sethi, a Barnard College economist who studies prediction markets, told me. There are some telltale behaviors that could help distinguish a military contractor betting off a state secret from a college student mindlessly scrolling on his phone after one too many cans of Celsius. Someone who’s using a newly created account to wager a lot of money against the conventional wisdom is probably the former, not the latter. And spotting these kinds of suspicious bettors is only getting easier. The prediction-market boom has created a cottage industry of tools that instantaneously flag potential insider trading—not for legal purposes but so that you, too, can profit off of what the select few already know.

Unlike Kalshi, the other big prediction-market platform, Polymarket can be used in the U.S. only through a virtual private network, or VPN. In effect, the site is able to skirt regulations that require tracking the identities of its customers and reporting shady bets to the government. In some ways, insider trading seems to be the whole point: “What’s cool about Polymarket is that it creates this financial incentive for people to go and divulge the information to the market,” Shayne Coplan, the company’s 27-year-old CEO, said in an interview last year. (Polymarket did not respond to a request for comment.)

Consider if the Islamic Revolutionary Guard Corps had paid the monthly fee for a ser­vice that flagged rel­e­vant ac­tiv­ity on Polymarket two hours be­fore the strike. The supreme leader might not have hosted in-per­son meet­ings with his top ad­vis­ers where they were easy tar­gets for mis­siles. Perhaps Iran would have launched its own pre­emp­tive strikes, tar­get­ing mil­i­tary bases across the Middle East. Six American ser­vice mem­bers have al­ready died from Iran’s drone at­tacks in the re­gion; the death toll could have been higher if Iran had struck first. In other words, some­one’s idea of a get-rich-quick scheme may have ended with a mil­i­tary raid gone hor­ri­bly awry. (The Department of Defense did not re­spond to a re­quest for com­ment.)

Maybe this all sounds far-fetched, but it shouldn’t. “Any advance notice to an adversary is problematic,” Alex Goldenberg, a fellow at the Rutgers Miller Center who has written about war markets, told me. “And these predictive markets, as they stand, are designed to leak out this information.” In all likelihood, he added, intelligence agencies across the world are already paying attention to Polymarket. Last year, the military’s bulletin for intelligence professionals published an article advocating for the armed forces to integrate data from Polymarket to “more fully anticipate national security threats.” After all, the Pentagon already has some experience with prediction markets. During the War on Terror, DARPA toyed with creating what it billed the “Policy Analysis Market,” a site that would let anonymous traders bet on world events to forecast terrorist attacks and coups. (Democrats in Congress revolted, and the site was quickly canned.)

Now every ad­ver­sary and ter­ror­ist group in the world can eas­ily ac­cess war mar­kets that are far more ad­vanced than what the DOD ginned up two decades ago. What makes Polymarket’s en­trance into war­fare so trou­bling is not just po­ten­tial in­sider trad­ing from users like magamyman.” If gov­ern­ments are eye­ing Polymarket for signs of an im­pend­ing at­tack, they can also be led astray. A gov­ern­ment or an­other so­phis­ti­cated ac­tor would­n’t need to spend much money to mas­sively swing the Polymarket odds on whether a Gulf state will im­mi­nently strike Iran—breeding panic and para­noia. More fun­da­men­tally, pre­dic­tion mar­kets risk warp­ing the ba­sic in­cen­tives of war, Goldenberg said. He gave the ex­am­ple of a Ukrainian mil­i­tary com­man­der mak­ing less than $1,000 a month, who could place bets that go against his own mil­i­tary’s ob­jec­tive. Maybe you choose to re­treat a day early be­cause you can dou­ble, triple, or quadru­ple your money and then send that back to your fam­ily,” he said.

Again, we don't know for sure whether any of this is happening. That may be the scariest part. As long as Polymarket lets anyone bet on war anonymously, we may never know. Last Saturday, the day of the initial Iran attack, Polymarket processed a record $478 million in bets, according to one analysis. All the while, Polymarket continues to wedge itself into the mainstream. Substack recently struck a partnership with Polymarket to incorporate the platform's forecasts into its newsletters. ("Journalism is better when it's backed by live markets," Polymarket posted on X in announcing the deal.) All of this makes the site even more valuable as an intelligence asset, and even more destructive for the rest of us. Polymarket keeps launching more war markets: Will the U.S. strike Iraq? Will Israel strike Beirut? Will Iran strike Cyprus? Somewhere out there, someone likely already knows the answers.

...

Read the original on www.theatlantic.com »

8 228 shares, 10 trendiness

Filesystems are having a moment

I used to work at a vec­tor data­base com­pany. My en­tire job was help­ing peo­ple un­der­stand why they needed a data­base pur­pose-built for AI; em­bed­dings, se­man­tic search, the whole thing. So it’s a lit­tle funny that I’m writ­ing this. But here I am, watch­ing every­one in the AI ecosys­tem sud­denly re­dis­cover the hum­ble filesys­tem, and I think they might be onto some­thing big­ger than most peo­ple re­al­ize.

Not big­ger than data­bases. Different from data­bases. I need to say that up­front be­cause I al­ready know some­one is go­ing to read this and think I’m say­ing files good, data­bases bad.” I’m not. Stay with me.

If you’ve been pay­ing any at­ten­tion to the AI agent space over the last few months, you’ve no­ticed some­thing strange. LlamaIndex pub­lished Files Are All You Need.” LangChain wrote about how agents can use filesys­tems for con­text en­gi­neer­ing. Oracle, yes Oracle (who is cook­ing btw), put out a piece com­par­ing filesys­tems and data­bases for agent mem­ory. Dan Abramov wrote about a so­cial filesys­tem built on the AT Protocol. Archil is build­ing cloud vol­umes specif­i­cally be­cause agents want POSIX file sys­tems.

Jerry Liu from LlamaIndex put it bluntly: instead of one agent with hundreds of tools, we're moving toward a world where the agent has access to a filesystem and maybe 5-10 tools. That's it. Filesystem, code interpreter, web access. And that's as general as, if not more general than, an agent with 100+ MCP tools.

Karpathy made the ad­ja­cent ob­ser­va­tion that stuck with me. He pointed out that Claude Code works be­cause it runs on your com­puter, with your en­vi­ron­ment, your data, your con­text. It’s not a web­site you go to — it’s a lit­tle spirit that lives on your ma­chine. OpenAI got this wrong, he ar­gued, by fo­cus­ing on cloud de­ploy­ments in con­tain­ers or­ches­trated from ChatGPT in­stead of sim­ply run­ning on lo­cal­host.

And here’s the thing that makes all of this mat­ter com­mer­cially: cod­ing agents make up the ma­jor­ity of ac­tual AI use cases right now. Anthropic is re­port­edly ap­proach­ing prof­itabil­ity, and a huge chunk of that is dri­ven by Claude Code, a CLI tool. Not a chat­bot. A tool that reads and writes files on your filesys­tem.

Here’s where I think most of the dis­course misses the deeper point.

Memory, in the human, psychological sense, is fundamental to how we function. We don't re-read our entire life story every time we make a decision. We have long-term storage, selective recall, the ability to forget things that don't matter and surface things that do. Context windows in LLMs are none of that. They're more like a whiteboard that someone keeps erasing.

If you've used Claude Code for any real project, you know the dread of watching that "context left until auto-compact" notification creep closer. Your entire conversation, all the context the agent has built up about your codebase, your preferences, your decisions: all of it about to be compressed or lost.

Filesystems solve this in the most boring, obvious way possible. Write things down. Put them in files. Read them back when you need them. Claude's CLAUDE.md file gives the agent persistent context about your project. Cursor stores past chat history as searchable files. People are writing aboutme.md files that act as portable identity descriptors any agent can read: your preferences, your skills, your working style, all in a file that moves between applications without anyone needing to coordinate an API.

Except! It might not be quite that sim­ple.

A re­cent pa­per from ETH Zürich eval­u­ated whether these repos­i­tory-level con­text files ac­tu­ally help cod­ing agents com­plete tasks. The find­ing was coun­ter­in­tu­itive: across mul­ti­ple agents and mod­els, con­text files tended to re­duce task suc­cess rates while in­creas­ing in­fer­ence cost by over 20%. Agents given con­text files ex­plored more broadly, ran more tests, tra­versed more files — but all that thor­ough­ness de­layed them from ac­tu­ally reach­ing the code that needed fix­ing. The files acted like a check­list that agents took too se­ri­ously.

This sounds like it un­der­mines the whole premise. But I think it ac­tu­ally sharp­ens it. The pa­per’s con­clu­sion was­n’t don’t use con­text files.” It was that un­nec­es­sary re­quire­ments make tasks harder, and con­text files should de­scribe only min­i­mal re­quire­ments. The prob­lem is­n’t the filesys­tem as a per­sis­tence layer. The prob­lem is peo­ple treat­ing CLAUDE.md like a 2,000-word on­board­ing doc­u­ment in­stead of a con­cise set of con­straints. Which brings us to the ques­tion of stan­dards.

Right now we have CLAUDE.md, AGENTS.md, copi­lot-in­struc­tions.md, .cursorrules, and prob­a­bly five more by the time you read this. Everyone agrees that agents need per­sis­tent filesys­tem-based con­text. Nobody agrees on what the file should be called or what should go in it. I see ef­forts to con­sol­i­date, this is good.

Dan Abramov's piece on a social filesystem crystallized something important here. He describes how the AT Protocol treats user data as files in a personal repository: structured, owned by the user, readable by any app that speaks the format. The critical design choice is that different apps don't need to agree on what a "post" is. They just need to namespace their formats (using domain names, like Java packages) so they don't collide. Apps are reactive to files. Every app's database becomes derived data: a cached materialized view of everybody's folders.

The same ten­sion ex­ists in the agent con­text file space. We don’t need CLAUDE.md and AGENTS.md and copi­lot-in­struc­tions.md to con­verge into one file. We need them to co­ex­ist with­out col­li­sion. And to be fair, some con­ver­gence is hap­pen­ing. Anthropic re­leased Agent Skills as an open stan­dard, a SKILL.md for­mat that Microsoft, OpenAI, Atlassian, GitHub, and Cursor have all adopted. A skill you write for Claude Code works in Codex, works in Copilot. The file for­mat is the API.

NanoClaw, a light­weight per­sonal AI as­sis­tant frame­work, takes this to its log­i­cal con­clu­sion. Instead of build­ing an ever-ex­pand­ing fea­ture set, it uses a skills over fea­tures” model. Want Telegram sup­port? There’s no Telegram mod­ule. There’s a /add-telegram skill, es­sen­tially a mark­down file that teaches Claude Code how to rewrite your in­stal­la­tion to add the in­te­gra­tion. Skills are just files. They’re portable, au­ditable, and com­pos­able. No MCP server re­quired. No plu­gin mar­ket­place to browse. Just a folder with a SKILL.md in it.
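To make that concrete, here is a minimal sketch of what one of those folders can contain: a single SKILL.md. The YAML frontmatter fields follow the published convention as I understand it, and the skill itself is invented for illustration:

---
name: changelog-writer
description: Drafts a CHANGELOG entry from the staged git diff.
---

# Changelog Writer

1. Run `git diff --staged` and summarize what changed.
2. Group the changes under Added / Changed / Fixed.
3. Prepend the entry to CHANGELOG.md with today's date.

That's the entire artifact. Any agent that understands the format can pick it up from a folder; there's no registration step and no runtime dependency.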

This is in­ter­op­er­abil­ity with­out co­or­di­na­tion. And I want to be spe­cific about what I mean by that, be­cause it’s a strong claim. In tech, get­ting two com­pet­ing prod­ucts to work to­gether usu­ally re­quires ei­ther a for­mal stan­dard that takes years to rat­ify, or a dom­i­nant plat­form that forces com­pat­i­bil­ity. Files side­step both. If two apps can read mark­down, they can share con­text. If they both un­der­stand the SKILL.md for­mat, they can share ca­pa­bil­i­ties. Nobody had to sign a part­ner­ship agree­ment. Nobody had to at­tend a stan­dards body meet­ing. The file for­mat does the co­or­di­nat­ing.

There’s a use­ful anal­ogy from in­fra­struc­ture. Traditional data ar­chi­tec­tures were de­signed around the as­sump­tion that stor­age was the bot­tle­neck. The CPU waited for data from mem­ory or disk, and com­pu­ta­tion was es­sen­tially re­ac­tive to what­ever stor­age made avail­able. But as pro­cess­ing power out­paced stor­age I/O, the par­a­digm shifted. The in­dus­try moved to­ward de­cou­pling stor­age and com­pute, let­ting each scale in­de­pen­dently, which is how we ended up with ar­chi­tec­tures like S3 plus ephemeral com­pute clus­ters. The bot­tle­neck moved, and every­thing re­or­ga­nized around the new con­straint.

Something sim­i­lar is hap­pen­ing with AI agents. The bot­tle­neck is­n’t model ca­pa­bil­ity or com­pute. It’s con­text. Models are smart enough. They’re just for­get­ful. And filesys­tems, for all their sim­plic­ity, are an in­cred­i­bly ef­fec­tive way to man­age per­sis­tent con­text at the ex­act point where the agent runs — on the de­vel­op­er’s ma­chine, in their en­vi­ron­ment, with their data al­ready there.

Now, I'd be a fraud if I didn't acknowledge the tension here. Someone on Twitter joked that "all of you saying you don't need a graph for agents while using the filesystem are just in denial about using a graph." And… they're not wrong. A filesystem is a tree structure. Directories, subdirectories, files: a directed acyclic graph. When your agent runs ls, grep, reads a file, follows a reference to another file, it's traversing a graph.

Richmond in Oracle’s piece made the sharpest dis­tinc­tion I’ve seen: filesys­tems are win­ning as an in­ter­face, data­bases are win­ning as a sub­strate. The mo­ment you want con­cur­rent ac­cess, se­man­tic search at scale, dedu­pli­ca­tion, re­cency weight­ing — you end up build­ing your own in­dexes. Which is, let’s be hon­est, ba­si­cally a data­base.

Having worked at Weaviate, I can tell you that this is­n’t an ei­ther/​or sit­u­a­tion. The file in­ter­face is pow­er­ful be­cause it’s uni­ver­sal and LLMs al­ready un­der­stand it. The data­base sub­strate is pow­er­ful be­cause it pro­vides the guar­an­tees you need when things get real. The in­ter­est­ing fu­ture is­n’t files ver­sus data­bases. It’s files as the in­ter­face hu­mans and agents in­ter­act with, backed by what­ever sub­strate makes sense for the use case.

Here’s my ac­tual take on all of this, the thing I think peo­ple are danc­ing around but not say­ing di­rectly.

Filesystems can re­de­fine what per­sonal com­put­ing means in the age of AI.

Not in the everything runs lo­cally” sense (but maybe?). In the sense that your data, your con­text, your pref­er­ences, your skills, your mem­ory — lives in a for­mat you own, that any agent can read, that is­n’t locked in­side a spe­cific ap­pli­ca­tion. Your aboutme.md works with your flavour of OpenClaw/NanoClaw to­day and what­ever comes to­mor­row. Your skills files are portable. Your pro­ject con­text per­sists across tools.

This is what per­sonal com­put­ing was sup­posed to be be­fore every­thing moved into walled-gar­den SaaS apps and pro­pri­etary data­bases. Files are the orig­i­nal open pro­to­col. And now that AI agents are be­com­ing the pri­mary in­ter­face to com­put­ing, files are be­com­ing the in­ter­op­er­abil­ity layer that makes it pos­si­ble to switch tools, com­pose work­flows, and main­tain con­ti­nu­ity across ap­pli­ca­tions, all with­out any­one’s per­mis­sion.

I'll admit this is a bit idealistic. The history of open formats is littered with standards that won on paper and lost in practice. Companies have strong incentives to make their context files just different enough that switching costs remain high. The fact that we already have CLAUDE.md and AGENTS.md and .cursorrules coexisting rather than one universal format is evidence that fragmentation is the default, not the exception. And the ETH Zürich paper is a reminder that even when the format exists, writing good context files is harder than it sounds. Most people will write bad ones, and bad context files are apparently worse than none at all.

But I keep com­ing back to some­thing Dan Abramov wrote: our mem­o­ries, our thoughts, our de­signs should out­live the soft­ware we used to cre­ate them. That’s not a tech­ni­cal ar­gu­ment. It’s a val­ues ar­gu­ment. And it’s one that the filesys­tem, for all its age and sim­plic­ity, is uniquely po­si­tioned to serve. Not be­cause it’s the best tech­nol­ogy. But be­cause it’s the one tech­nol­ogy that al­ready be­longs to you.

...

Read the original on madalitso.me »

9 209 shares, 16 trendiness

tropes.md

A single file containing all cataloged AI writing tropes. Add it to your AI's system prompt to help it avoid these patterns (let's play cat and mouse!).

Disclaimer: Creation of this file was AI-assisted. If you thought I was go­ing to write out a .md file for AI my­self you must be mad. AI for AI. Human for Human.

# AI Writing Tropes to Avoid

Add this file to your AI as­sis­tan­t’s sys­tem prompt or con­text to help it avoid

com­mon AI writ­ing pat­terns. Source: tropes.fyi

## Word Choice

### Quietly” and Other Magic Adverbs

Overuse of quietly” and sim­i­lar ad­verbs to con­vey sub­tle im­por­tance or un­der­stated power. AI reaches for these ad­verbs to make mun­dane de­scrip­tions feel sig­nif­i­cant. Also in­cludes: deeply”, fundamentally”, remarkably”, arguably”.

**Avoid pat­terns like:**

- quietly or­ches­trat­ing work­flows, de­ci­sions, and in­ter­ac­tions”

- the one that qui­etly suf­fo­cates every­thing else”

- a quiet in­tel­li­gence be­hind it”

### Delve” and Friends

Used to be the most in­fa­mous AI tell. Delve” went from an un­com­mon English word to ap­pear­ing in a stag­ger­ing per­cent­age of AI-generated text. Part of a fam­ily of overused AI vo­cab­u­lary in­clud­ing certainly”, utilize”, leverage” (as a verb), robust”, streamline”, and harness”.

**Avoid pat­terns like:**

- Let’s delve into the de­tails…”

- Delving deeper into this topic…”

- We cer­tainly need to lever­age these ro­bust frame­works…”

### Tapestry” and Landscape”

Overuse of or­nate or grandiose nouns where sim­pler words would do. Tapestry” is used to de­scribe any­thing in­ter­con­nected. Landscape” is used to de­scribe any field or do­main. Other of­fend­ers: paradigm”, synergy”, ecosystem”, framework”.

**Avoid pat­terns like:**

- The rich ta­pes­try of hu­man ex­pe­ri­ence…”

- Navigating the com­plex land­scape of mod­ern AI…”

- The ever-evolv­ing land­scape of tech­nol­ogy…”

### The Serves As” Dodge

Replacing sim­ple is” or are” with pompous al­ter­na­tives like serves as”, stands as”, marks”, or represents”. AI avoids ba­sic cop­u­las be­cause its rep­e­ti­tion penalty pushes it to­ward fancier con­struc­tions (I’ve stud­ied this!).

**Avoid pat­terns like:**

- The build­ing serves as a re­minder of the city’s her­itage.”

- "Gallery 825 serves as LAAA's exhibition space for contemporary art."

- The sta­tion marks a piv­otal mo­ment in the evo­lu­tion of re­gional tran­sit.”

## Sentence Structure

### Negative Parallelism

The It’s not X — it’s Y” pat­tern, of­ten with an em dash. The sin­gle most com­monly iden­ti­fied AI writ­ing tell. Man I f*ck­ing hate it. AI uses this to cre­ate false pro­fun­dity by fram­ing every­thing as a sur­pris­ing re­frame. One in a piece can be ef­fec­tive; ten in a blog post is a gen­uine in­sult to the reader. Before LLMs, peo­ple sim­ply did not write like this at scale. Includes the causal vari­ant not be­cause X, but be­cause Y” where every ex­pla­na­tion is framed as a sur­prise re­veal.

**Avoid pat­terns like:**

- It’s not bold. It’s back­wards.”

- Feeding is­n’t nu­tri­tion. It’s dial­y­sis.”

- Half the bugs you chase aren’t in your code. They’re in your head.”

### Not X. Not Y. Just Z.”

The dra­matic count­down pat­tern. AI builds ten­sion by negat­ing two or more things be­fore re­veal­ing the ac­tual point. Creates a false sense of nar­row­ing down to the truth.

**Avoid pat­terns like:**

- Not a bug. Not a fea­ture. A fun­da­men­tal de­sign flaw.”

- Not ten. Not fifty. Five hun­dred and twenty-three lint vi­o­la­tions across 67 files.”

- not reck­lessly, not com­pletely, but enough”

### The X? A Y.”

Self-posed rhetor­i­cal ques­tions an­swered im­me­di­ately in the next sen­tence or clause. The model asks a ques­tion no­body was ask­ing, then an­swers it for dra­matic ef­fect. Thinks this is the epit­ome of great writ­ing.

**Avoid pat­terns like:**

- The re­sult? Devastating.”

- The worst part? Nobody saw it com­ing.”

- The scary part? This at­tack vec­tor is per­fect for de­vel­op­ers.”

### Anaphora Abuse

Repeating the same sen­tence open­ing mul­ti­ple times in quick suc­ces­sion.

**Avoid pat­terns like:**

- They as­sume that users will pay… They as­sume that de­vel­op­ers will build… They as­sume that ecosys­tems will emerge… They as­sume that…”

- They could ex­pose… They could of­fer… They could pro­vide… They could cre­ate… They could let… They could un­lock…”

- They have built en­gines, but not ve­hi­cles. They have built power, but not lever­age. They have built walls, but not doors.”

### Tricolon Abuse

Overuse of the rule-of-three pat­tern, of­ten ex­tended to four or five. A sin­gle tri­colon is el­e­gant; three back-to-back tri­colons are a pat­tern recog­ni­tion fail­ure.

**Avoid pat­terns like:**

- Products im­press peo­ple; plat­forms em­power them. Products solve prob­lems; plat­forms cre­ate worlds. Products scale lin­early; plat­forms scale ex­po­nen­tially.”

- identity, pay­ments, com­pute, dis­tri­b­u­tion”

- workflows, de­ci­sions, and in­ter­ac­tions”

### It’s Worth Noting”

Filler tran­si­tions that sig­nal noth­ing. AI uses these phrases to in­tro­duce new points with­out ac­tu­ally con­nect­ing them to the pre­vi­ous ar­gu­ment. Also in­cludes: It bears men­tion­ing”, Importantly”, Interestingly”, Notably”.

**Avoid pat­terns like:**

- It’s worth not­ing that this ap­proach has lim­i­ta­tions.”

- Importantly, we must con­sider the broader im­pli­ca­tions.”

- Interestingly, this pat­tern re­peats across in­dus­tries.”

### Superficial Analyses

Tacking a pre­sent par­tici­ple (“-ing”) phrase onto the end of a sen­tence to in­ject shal­low analy­sis that says noth­ing. The model at­taches sig­nif­i­cance, legacy, or broader mean­ing to mun­dane facts us­ing phrases like highlighting its im­por­tance”, reflecting broader trends”, or contributing to the de­vel­op­ment of…”.

**Avoid pat­terns like:**

- contributing to the re­gion’s rich cul­tural her­itage”

- This et­y­mol­ogy high­lights the en­dur­ing legacy of the com­mu­ni­ty’s re­sis­tance and the trans­for­ma­tive power of unity in shap­ing its iden­tity.”

- underscoring its role as a dy­namic hub of ac­tiv­ity and cul­ture”

### False Ranges

Using from X to Y” con­struc­tions where X and Y aren’t on any real scale. In le­git­i­mate use, from X to Y” im­plies a spec­trum with a mean­ing­ful mid­dle. AI uses it as a fancy way to list two loosely re­lated things. From in­no­va­tion to cul­tural trans­for­ma­tion” — what’s in be­tween???? Nothing!

**Avoid pat­terns like:**

- From in­no­va­tion to im­ple­men­ta­tion to cul­tural trans­for­ma­tion.”

- From the sin­gu­lar­ity of the Big Bang to the grand cos­mic web.”

- From prob­lem-solv­ing and tool-mak­ing to sci­en­tific dis­cov­ery, artis­tic ex­pres­sion, and tech­no­log­i­cal in­no­va­tion.”

### Gerund Fragment Litany

After mak­ing a claim, AI il­lus­trates it with a stream of verb­less gerund frag­ments — stand­alone sen­tences with no gram­mat­i­cal sub­ject. Fixing small bugs. Writing straight­for­ward fea­tures. Implementing well-de­fined tick­ets.” The first sen­tence al­ready said every­thing. The frag­ments add noth­ing ex­cept word count and that fa­mil­iar AI ca­dence. Humans don’t write first drafts this way. It’s a pure struc­tural tic.

**Avoid pat­terns like:**

- Fixing small bugs. Writing straight­for­ward fea­tures. Implementing well-de­fined tick­ets.”

- Reviewing pull re­quests. Debugging edge cases. Attending ar­chi­tec­ture meet­ings.”

- Shipping faster. Moving quicker. Delivering more.”

## Paragraph Structure

### Short Punchy Fragments

Excessive use of very short sen­tences or sen­tence frag­ments as stand­alone para­graphs for man­u­fac­tured em­pha­sis. RLHF train­ing has pushed mod­els to­ward writing for read­abil­ity” aimed at the low­est com­mon de­nom­i­na­tor: one thought per sen­tence, no men­tal state-keep­ing re­quired. It’s an in­hu­man style. No real per­son writes first drafts this way be­cause it does­n’t match how hu­mans think or speak.

**Avoid pat­terns like:**

- He pub­lished this. Openly. In a book. As a priest.”

- These weren’t just prod­ucts. And the soft­ware side matched. Then it pro­fes­sion­alised. But I adapted.”

- Platforms do.”

### Listicle in a Trench Coat

Numbered or la­beled points dressed up as con­tin­u­ous prose. The model writes what is es­sen­tially a lis­ti­cle but wraps each point in a para­graph that starts with The first… The sec­ond… The third…” to dis­guise the for­mat. Perhaps you told it to stop gen­er­at­ing lists and it de­cided to do this in­stead… still very com­mon.

**Avoid pat­terns like:**

- The first wall is the ab­sence of a free, scoped API… The sec­ond wall is the lack of del­e­gated ac­cess… The third wall is the ab­sence of scoped per­mis­sions…”

- The sec­ond take­away is that… The third take­away is that… The fourth take­away is that…”

## Tone

### Here’s the Kicker”

False sus­pense tran­si­tions that promise a rev­e­la­tion but de­liver a point that did NOT need the buildup. The model uses these phrases to man­u­fac­ture drama be­fore an oth­er­wise un­re­mark­able ob­ser­va­tion LOL. Also in­cludes: Here’s the thing”, Here’s where it gets in­ter­est­ing”, Here’s what most peo­ple miss”.

...

Read the original on tropes.fyi »

10 207 shares, 12 trendiness

Dumping Lego NXT firmware off of an existing brick

I’ve re­cently been con­tribut­ing to the Pybricks pro­ject, a com­mu­nity-run port of MicroPython to Lego Mindstorms hard­ware. As part of that, I ob­tained a used Lego NXT which just so hap­pened to still be run­ning the orig­i­nal ver­sion 1.01 firmware from when it launched in 2006. I wanted to archive a copy of this firmware, and do­ing so hap­pened to in­volve the dis­cov­ery of ar­bi­trary code ex­e­cu­tion.

The NXT is a rel­a­tively sim­ple ex­ploita­tion tar­get and can serve as a good in­tro­duc­tion to ARM and em­bed­ded ex­ploit de­vel­op­ment.

Or, in the words of a much more innocent era, "Google is your friend".

Surely some­body must’ve al­ready archived a copy of this firmware, right?” I thought to my­self. Unfortunately, this does not ap­pear to have been the case. I searched but never came across a copy of this par­tic­u­lar firmware ver­sion de­spite the ex­ten­sive NXT en­thu­si­ast com­mu­nity.

I did come across a men­tion of a 1.03 firmware which ap­pears to have been re­leased on or very close to launch day. I sus­pect that en­thu­si­asts and ad­vanced users likely ea­gerly switched to newer and/​or com­mu­nity-mod­i­fied firmwares when they wanted newer fea­tures.

The NXT is also old enough that, de­spite be­ing part of the Internet era”, re­sources are start­ing to bi­trot.

Looks like I’m go­ing to have to fig­ure out how to re­trieve a copy my­self!

The first idea which came to mind for back­ing up firmware is does the tool which is used to down­load new firmware to the NXT also al­low re­triev­ing the pre­ex­ist­ing firmware?”

From sources in­clud­ing the Wikipedia page, we find that the NXT is built around a Microchip (formerly Atmel) AT91SAM7S256 mi­cro­con­troller, a dis­tant an­ces­tor of the SAM D parts that now power sev­eral Arduino, MicroPython, and CircuitPython boards. This chip con­tains a built-in boot­loader pro­gram called SAM-BA which sup­ports sim­ple read from mem­ory” (traditionally known as PEEK) and write to mem­ory” (traditionally known as POKE) com­mands. This (deceptively!) seems like it’d work!

Fortunately, while re­search­ing, I found out that some­body did try this al­ready and was un­suc­cess­ful. Attempting to en­ter the SAM-BA boot­loader ap­pears to au­to­mat­i­cally over­write part of the firmware which we want to back up. Good thing I did my re­search first! We have to find a dif­fer­ent ap­proach that does­n’t in­volve en­ter­ing firmware up­date mode.

JTAG is a hardware interface used for providing all sorts of "debug" and "test" functionality for circuit boards and chips. Precisely what can be done using JTAG varies greatly, but the microcontroller in the NXT allows JTAG to read and modify all of the CPU's state for debugging. This can be used to read back data stored inside the chip.

Is this re­lated to us­ing JTAG to hack an Xbox or a mo­bile phone?

Yes! Those de­vices also use the same low-level pro­to­col known as JTAG. However, the de­bug and test com­mands which can be used on top of JTAG are com­pletely dif­fer­ent. Think of JTAG as be­ing sim­i­lar to TCP or UDP while the chip-spe­cific com­mands are higher-level pro­to­cols such as HTTP or SSH.

Unfortunately, since this is a hardware interface, using it involves taking apart the NXT and soldering to it (since the necessary connectors are not installed). Additionally, this chip is so old that its debug interface is cumbersome to set up and use (it doesn't support any of the interfaces and protocols that the cheap modern tools are designed for).

I con­sid­ered this method a last re­sort but re­ally wanted to find a soft­ware-only so­lu­tion. Software-only so­lu­tions are gen­er­ally eas­ier to share and de­ploy, so find­ing one would al­low many other peo­ple to also back up the firmware of bricks in their pos­ses­sion.

For a de­vice like the NXT which al­ready al­lows for lim­ited user-pro­gram­ma­bil­ity, the first in­stinct is usu­ally to ex­plore what this lim­ited or sandboxed” en­vi­ron­ment al­lows you to do. How do NXT pro­grams work? Can we just write an NXT pro­gram that dumps the firmware and sends it to the com­puter?

If we hunt around, we can find the "LEGO MINDSTORMS NXT Executable File Specification", which explains that NXT programs run in a bytecode virtual machine and don't have the ability to read/write arbitrary memory. Variables are restricted to a "data segment" of fixed size, and all memory accesses must be inside it. This means that we cannot "just" write an NXT program (unless we find a bug in the VM which allows us to access memory we're not supposed to).

What is the dif­fer­ence be­tween a VM and native” code?

Native” code refers to code which a CPU can di­rectly run. A vir­tual ma­chine is a way of adding a layer of in­di­rec­tion be­tween a pro­gram and the real CPU. Computer sci­en­tists love solv­ing prob­lems by adding in­di­rec­tion, and a vir­tual ma­chine can be used to solve prob­lems such as in­com­pat­i­bil­ity, con­ve­nience, and/​or se­cu­rity.

For ex­am­ple, a vir­tual ma­chine can be used to take code de­signed for one type of CPU and run it on a dif­fer­ent type of CPU. This is of­ten called an em­u­la­tor, and they can be use­ful when it is­n’t pos­si­ble to re­com­pile the code for the new CPU (such as if the orig­i­nal pro­gram is a closed-source video game for a pro­pri­etary game con­sole but you want to run it on a desk­top PC).

Java and .NET run on vir­tual ma­chines which are specif­i­cally de­signed so that man­ag­ing mem­ory is more con­ve­nient (such as by hav­ing garbage col­lec­tion). They can also be used to im­ple­ment se­cu­rity by fun­nel­ing dangerous” op­er­a­tions into spe­cific, lim­ited path­ways. The NXTs vir­tual ma­chine is a vir­tual ma­chine of this type.

For those who aren’t aware, the source code of the NXT firmware is pub­licly avail­able! However, many links to it have bi­trot­ted, source code only seems to have been re­leased for some ver­sions (certainly not every), and it’s not even clear which ver­sions of the code have been archived and still ex­ist. (For ex­am­ple, the seem­ingly-of­fi­cial LEGO-Robotics/NXT-Firmware repos­i­tory on GitHub… is ac­tu­ally a com­mu­nity-mod­i­fied firmware! Its his­tory also only con­tains ver­sions 1.05 and 1.29 specif­i­cally and not, for ex­am­ple, the fi­nal 1.31 or the orig­i­nal 1.01.)

Nonetheless, we can still study it to see if we can find any­thing in­ter­est­ing. At the same time, we can also study a copy of the NXT Bluetooth Developer Kit in or­der to un­der­stand how the com­puter com­mu­ni­cates with the brick. (Despite be­ing the Bluetooth” de­vel­oper kit, the doc­u­mented pro­to­col and com­mands are used over USB as well.)

From read­ing through the LEGO MINDSTORMS NXT Communication Protocol” and LEGO MINDSTORMS NXT Direct Commands” doc­u­ments, we start to see the fol­low­ing high-level overview:

The pro­to­col con­tains two cat­e­gories of com­mands, system” and direct”. System com­mands vaguely re­late to operating sys­tem” func­tion­al­ity, and di­rect com­mands vaguely re­late to actually op­er­at­ing a ro­bot”. In gen­eral, this pro­to­col also seems to specif­i­cally not al­low per­form­ing ar­bi­trary op­er­a­tions and bad­ness such as ac­cess­ing the firmware or get­ting na­tive code ex­e­cu­tion out­side of the VM. It ap­pears to be de­signed to give friendly ac­cess to only the NXTs vir­tual filesys­tem and byte­code in­ter­preter.

Since both the VM and the com­mu­ni­ca­tions pro­to­col ap­pear to be de­signed to keep us out, it’s start­ing to look like we’re go­ing to need to find some kind of ex­ploit.

While look­ing through all of these doc­u­ments, I gen­er­ally fo­cused my at­ten­tion on low-level” func­tion­al­ity, as it is much more likely to con­tain the abil­ity to ac­cess the firmware and/​or ar­bi­trary mem­ory. One fea­ture, IO-Maps”, im­me­di­ately stood out.

In the NXT Communication Protocol doc­u­ment, IO-Maps are de­scribed as the well-de­scribed layer be­tween the dif­fer­ent con­trollers/​dri­vers stack and the VM run­ning the user’s code.” That sounds po­ten­tially in­ter­est­ing if it al­lows ac­cess to dri­vers in ways which aren’t nor­mally al­lowed. Also, if this is an in­ter­face which is­n’t nor­mally used, it is a po­ten­tial lo­ca­tion for un­ex­pected and ex­ploitable bugs.

So… where does one find the so-called well-described” de­scrip­tion of what IO-Maps can do?

One of the best ex­pla­na­tions I found was an old copy of the NXC pro­gram­mer’s guide. NXC (Not eX­actly C) is an al­ter­na­tive fron­tend for cre­at­ing NXT pro­grams for the stock firmware in a C-like lan­guage rather than graph­i­cal blocks. This pro­gram­mer’s guide lists all of the IO-Map off­sets for each firmware mod­ule, and the ex­pla­na­tions make it clear that IO-Maps con­tain es­sen­tially all of each mod­ule’s in­ter­nal state.

Further search­ing finds this blog post ex­plain­ing how it’s pos­si­ble to watch and plot vari­ables in the user pro­gram by read­ing from the VM mod­ule’s IO-Map. It def­i­nitely feels like we could be on to some­thing here!

How do you find the IO-Map struc­tures in the firmware source code? That blog post lists a struct, but where is said struct?

It turns out that all IO-Maps are defined in .iom files in the firmware, with the VM's being defined in c_cmd.iom.

Without even hav­ing to look at any other mod­ules, we can al­ready spot some­thing: the VM IO-Map con­tains a func­tion pointer pR­CHan­dler! What does this func­tion pointer do?

It turns out that this is the com­mand han­dler for direct” com­mands!

Is this… re­ally just a na­tive code func­tion pointer sit­ting in­side this IO-Map struc­ture which is both read­able and writable over USB?

What is a func­tion pointer? Why is find­ing a func­tion pointer such a big deal?

A func­tion pointer is a piece of data which stores the lo­ca­tion of some code. A pro­gram uses this data to de­cide what code to run next. Programs them­selves can mod­ify func­tion point­ers in or­der to al­ter their func­tion­al­ity as they run, but, if we can mod­ify the func­tion pointer, we can also al­ter what the pro­gram does, in­clud­ing in ways that may be un­in­tended.

In order to try out whether this even has a chance of working, we will need to send commands to the NXT over USB. This can be done in many different ways, but here we will use the Python programming language. Python is very suitable for testing and prototyping because it has a REPL and many third-party libraries implementing functionality that we can reuse. In this case, we will use the PyUSB library to talk to the NXT.

Setting up Python, cre­at­ing a vir­tualenv, in­stalling PyUSB, in­stalling USB dri­vers, and con­fig­ur­ing USB per­mis­sions will all be left as an ex­er­cise for the reader. This is all very im­por­tant, but setting up and con­fig­ur­ing a de­vel­op­ment en­vi­ron­ment” is a huge task all on its own, re­quir­ing tons of of­ten-poorly-doc­u­mented im­plicit knowl­edge, and I wanted to get this ar­ti­cle done in a rea­son­able amount of time.

First we need to open a con­nec­tion to the NXT:
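Something like the following should do, using PyUSB. The vendor/product IDs are the usual ones for the NXT (0x0694 is the LEGO Group), but verify against your own brick with lsusb if the device isn't found:

import usb.core

# Find the NXT by its USB vendor/product IDs and prepare it for use.
dev = usb.core.find(idVendor=0x0694, idProduct=0x0002)
assert dev is not None, "NXT not found; is it plugged in and turned on?"
dev.set_configuration()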

Then we need to see if we can in­deed ac­cess the VM (or command”) mod­ule’s IO-Map:
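In sketch form, the first attempt looks roughly like this (the meaning of every byte is unpacked below):

# "Read IO Map" system command: read 16 bytes from offset 0 of the
# VM module's IO-Map, then read back the reply from the NXT.
dev.write(0x01, b"\x01\x94\x01\x00\x01\x00\x00\x00\x10\x00")
resp = bytes(dev.read(0x82, 64))
print(resp)
# b'\x02\x94\x00\x01\x00\x01\x00\x10\x00MindstormsNXT\x00\x00\x00'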

Ah yes. Most peo­ple have not in­vested years into skills such as star­ing at hex dumps and raw data. I’ll have to give a more de­tailed ex­pla­na­tion.

We want to send the Read IO Map Command” to the NXT. This com­mand is doc­u­mented on page 20 of the LEGO MINDSTORMS NXT Communication Protocol” doc­u­ment, and the re­quest is doc­u­mented to take 10 bytes. Here we’re man­u­ally in­putting each of the bytes us­ing a hexa­dec­i­mal es­cape se­quence.

The first two bytes are re­quired to be 0x01 and 0x94: \x01\x94.

This is fol­lowed by the mod­ule ID in lit­tle-en­dian for­mat: \x01\x00\x01\x00. This cor­re­sponds to a mod­ule ID of 0x00010001 which is the ID of the VM mod­ule.

When a value is stored us­ing more than one byte, the bytes have to be stored in a par­tic­u­lar or­der, just like how dec­i­mal num­bers with mul­ti­ple dig­its have to be writ­ten in a par­tic­u­lar or­der. Little-endian” is the opposite” or backwards” or­der from how Arabic nu­mer­als are writ­ten, mean­ing that the first” or leftmost” byte has the low­est place value. This byte is called the LSB or least-significant byte”. The last” or rightmost” byte has the high­est place value and is called the MSB or most-significant byte”.

Big-endian” is the op­po­site of lit­tle-en­dian and matches the or­der of Arabic nu­mer­als. The endian” names are a his­tor­i­cal ar­ti­fact.

TL;DR it means you have to flip the bytes around

The next two bytes \x00\x00 cor­re­spond to an off­set of 0.

Finally, the last two bytes \x10\x00 cor­re­spond to a length of 0x10 or 16.

In sum­mary, this com­mand means read 16 bytes from off­set 0 of the VM mod­ule’s IO-Map”.

To ac­tu­ally send the com­mand to the NXT, we write it to USB end­point 1. To read the re­sponse, we send a read com­mand to USB end­point 0x82 (don’t worry about it).

But I am wor­ried about it!

Understanding this re­quires a min­i­mal un­der­stand­ing of how the USB de­vice frame­work works. An ex­cel­lent overview can be found here. In short, when talk­ing to a USB de­vice, data needs to be sent to or re­ceived from spe­cific end­points. A de­vice can have mul­ti­ple end­points of dif­fer­ent types and di­rec­tions. Each end­point is iden­ti­fied by an ad­dress, which can be found in the USB de­scrip­tors. The NXT uses two bulk” end­points, one in each di­rec­tion, and their ad­dresses are 0x01 and 0x82.

If we de­code the re­sponse ac­cord­ing to the doc­u­men­ta­tion, we find that the first bytes \x02\x94 are ex­actly as spec­i­fied un­der the return pack­age” head­ing.

The next byte, \x00, means that the com­mand suc­ceeded.

This is fol­lowed by a re­peat of the mod­ule ID \x01\x00\x01\x00 and the re­quested length \x10\x00.

Finally, we have the data which was read: MindstormsNXT\x00\x00\x00. This data cor­re­sponds to FormatString in the code, and here it is ini­tial­ized to the MindstormsNXT value that we see.

Let’s try read­ing that func­tion pointer now:
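Again as a sketch, changing only the offset and length:

# Same command, but offset 0x10 and length 4 (both little-endian).
dev.write(0x01, b"\x01\x94\x01\x00\x01\x00\x10\x00\x04\x00")
resp = dev.read(0x82, 64)       # left as a PyUSB array this time
print(resp[-4:])                # array('B', [61, 13, 16, 0]) on this brick
hex(int.from_bytes(resp[-4:], "little"))   # '0x100d3d'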

It helps to see the dif­fer­ence if we line up the two com­mands:
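b"\x01\x94\x01\x00\x01\x00\x00\x00\x10\x00"   # read offset 0,    length 16
b"\x01\x94\x01\x00\x01\x00\x10\x00\x04\x00"   # read offset 0x10, length 4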

We’ve changed the off­set from \x00\x00 to \x10\x00 (from 0 to 16). We’ve changed the length from \x10\x00 to \x04\x00 (from 16 to 4). (Remember that all the num­bers are in lit­tle-en­dian!)

Instead of turn­ing the re­sponse into a bytes ob­ject, we leave it as an ar­ray. In or­der to find the actual data” which was read, we can ei­ther man­u­ally count all the bytes again, or we can re­al­ize that the data is go­ing to be the last 4 bytes: [61, 13, 16, 0]. The fi­nal line of code con­verts this into the value of 0x100d3d. This is our func­tion pointer, but what does this num­ber mean?

If we look at the datasheet for the AT91SAM7S256 mi­cro­con­troller and look at Figure 8-1 SAM7S512/256/128/64/321/32/161/16 Memory Mapping”, we can see that mem­ory ad­dresses in the range 0x001xxxxx cor­re­spond to the in­ter­nal flash mem­ory of the chip. The value that we read, 0x100d3d, is 0xd3d bytes or about 3 KiB past the be­gin­ning of the in­ter­nal flash mem­ory. This cer­tainly looks like a rea­son­able func­tion pointer! If we mod­ify this func­tion pointer, we should be able to redi­rect code ex­e­cu­tion for direct” com­mands to some­thing else.

What, specif­i­cally, can we mod­ify this pointer to in or­der to gain ar­bi­trary code ex­e­cu­tion? On a mod­ern sys­tem with mem­ory pro­tec­tions and ad­vanced ex­ploit mit­i­ga­tions, this part of the puz­zle may end up be­ing a chal­leng­ing task. However, this mi­cro­con­troller has none of these fea­tures. We should be able to put in any valid ad­dress and have the mi­cro­con­troller ex­e­cute that ad­dress as code (as long as we’ve put valid code there).

How do you put valid code some­where”? What does that ac­tu­ally mean?

Many mod­ern com­put­ers are de­signed so that the com­put­er’s in­struc­tions can be ac­cessed and ma­nip­u­lated as data. Likewise, data can be treated as in­struc­tions. This is cer­tainly the case for the mi­cro­con­troller in ques­tion here. This idea is crit­i­cally im­por­tant. It means that, as long as we can put some data in some lo­ca­tion, and as long as that data hap­pens to rep­re­sent valid in­struc­tions, the CPU will be able to ex­e­cute it.

This is not the case on every system. For example, the AVR architecture does not treat instruction memory and data memory as interchangeable. Modern operating systems such as Windows or Android also typically prevent accessing data as instructions without going through some extra steps. This helps protect against… exactly what we're doing here.

The fact that we have a sim­ple tar­get which can freely in­ter­change data and code and which does­n’t have mod­ern pro­tec­tions makes this an ex­cel­lent learn­ing tar­get.

What ad­dresses can we ac­tu­ally mod­ify the func­tion pointer to? We don’t know what the code looks like (that’s the whole point of this ex­er­cise!), and we don’t know pre­cisely how the data mem­ory is laid out ei­ther. We can only put in one ad­dress, so what do we do?

Here’s where we get very lucky.

Inside the VM's IO-Map, there is a MemoryPool variable corresponding to the data segment of the running NXT program. This variable is 32 KiB in size, which means that we have 32 KiB of space that we can safely fill with whatever we want (as long as no program is running).

That means that the firmware will not crash if we mod­ify or cor­rupt the mem­ory pool, since it does­n’t get ac­cessed if no user pro­gram is run­ning.

The NXTs mi­cro­con­troller has a to­tal of 64 KiB of RAM. Observe that 32 KiB is half of that to­tal. If we as­sume that the firmware lays out RAM start­ing from the low­est ad­dress and go­ing up, and that the firmware uses more than 0 bytes of RAM (both very rea­son­able as­sump­tions), there is no pos­si­ble lo­ca­tion the firmware could put this mem­ory pool that does­n’t in­ter­sect with the ad­dress 32 KiB past the start of RAM, 0x00208000.

Since we don’t know ex­actly where the buffer sits in RAM, we can fill the ini­tial part of the buffer with nop (no op­er­a­tion) in­struc­tions. We put our ex­ploit code at the very end of the buffer. As long as 0x00208000 is­n’t too close to the end of the mem­ory pool, it will end up point­ing some­where in the pile of nops.

If we cause the CPU to jump to this ad­dress, the CPU will keep ex­e­cut­ing the nops un­til it fi­nally hits our code. This ex­ploita­tion tech­nique is called a NOP slide” or NOP sled”.

In or­der to test this out, we need to build a bunch of scaf­fold­ing:
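A sketch of that scaffolding, nxtpwn.py, is below. The IOMapWrite opcode (0x95) comes from the same protocol document as the read command; the MemoryPool offset within the IO-Map and the chunk size are assumptions worth double-checking against c_cmd.iom for your firmware version:

import struct
import subprocess
import usb.core

# Assemble nxtpwn.s into raw ARM machine code (any ARM-targeting GNU
# toolchain works; the prefix here is the common bare-metal one).
subprocess.run(["arm-none-eabi-as", "-o", "nxtpwn.o", "nxtpwn.s"], check=True)
subprocess.run(["arm-none-eabi-objcopy", "-O", "binary",
                "nxtpwn.o", "nxtpwn.bin"], check=True)
with open("nxtpwn.bin", "rb") as f:
    code = f.read()
assert len(code) % 4 == 0   # ARM instructions are word-aligned

# Fill the 32 KiB MemoryPool with ARM nops (mov r0, r0) and put our
# code at the very end, as described above.
NOP = b"\x00\x00\xa0\xe1"
pool = NOP * ((0x8000 - len(code)) // 4) + code

dev = usb.core.find(idVendor=0x0694, idProduct=0x0002)
assert dev is not None
dev.set_configuration()

MODULE_ID = 0x00010001   # the VM ("command") module
POOL_OFFSET = 0x34       # MemoryPool's offset in the IO-Map; check c_cmd.iom!

def iomap_write(offset, data):
    # "Write IO Map" system command (0x95), mirroring the read command.
    cmd = b"\x01\x95" + struct.pack("<IHH", MODULE_ID, offset, len(data)) + data
    dev.write(0x01, cmd)
    dev.read(0x82, 64)   # drain the status reply

# USB packets are small, so upload the pool in little chunks. Slow but simple.
for i in range(0, len(pool), 32):
    iomap_write(POOL_OFFSET + i, pool[i:i + 32])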

This code in­vokes an ARM as­sem­bler to as­sem­ble code writ­ten in nxtpwn.s into bi­nary data, fills most of the MemoryPool with nops, and then writes the as­sem­bled code at the end.

You will need to some­how in­stall a copy of GCC and Binutils tar­get­ing ARM. Any rea­son­able ver­sion should do, but this is also part of environment setup”.

To test this, we can write the most ba­sic as­sem­bly code in nxtpwn.s:
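@ nxtpwn.s: the most basic function possible, which returns immediately.
@ (.arm because our even RAM address should land the CPU in ARM state,
@ assuming the firmware calls the pointer with an interworking branch.)
.arm
bx      lr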

This is an empty func­tion which does­n’t do any­thing. If we redi­rect the di­rect com­mand han­dler to it, all di­rect com­mands should stop work­ing.

How do you learn ARM as­sem­bly lan­guage?

I per­son­ally learned ARM as­sem­bly from this tu­to­r­ial a long time ago. I gen­er­ally think of learning as­sem­bly” as con­sist­ing of at least two parts: learn­ing how all CPUs work at a high level, and learn­ing how one par­tic­u­lar CPU ar­chi­tec­ture works.

For the first part, I started by learn­ing x86 as­sem­bly in or­der to hack PC soft­ware. It’s also pos­si­ble to learn from academic” com­puter sci­ence ma­te­ri­als, in­clud­ing free cur­ric­ula fo­cused around the RISC-V ar­chi­tec­ture. Here is an ex­am­ple of one I have found. It is also pos­si­ble to learn this by do­ing retro­com­put­ing for his­tor­i­cal 8-bit com­puter sys­tems, al­though those will have more dif­fer­ences from mod­ern CPUs.

Given suf­fi­cient fa­mil­iar­ity with the ba­sics of CPUs, it’s pos­si­ble to study and un­der­stand doc­u­men­ta­tion spe­cific to ARM or an­other ar­chi­tec­ture. Looking at the out­put of a C com­piler re­ally helps to build fa­mil­iar­ity and ex­pe­ri­ence.

We can use python3 -i nxtpwn.py to load the ex­ploit code be­fore drop­ping us into the Python REPL.

Before we ac­tu­ally trig­ger the ex­ploit, let’s try run­ning a direct” com­mand to make sure it works:
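# Direct command 0x03 is PLAYTONE: a 440 Hz tone for 500 ms. Frequency
# and duration are little-endian UWORDs per the direct-commands document.
dev.write(0x01, b"\x00\x03" + struct.pack("<HH", 440, 500))
print(bytes(dev.read(0x82, 64)))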

This should make the NXT beep.

To trig­ger the ex­ploit, we can en­ter the fol­low­ing:
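# Overwrite pRCHandler (at offset 16 in the IO-Map, right after the
# 16-byte FormatString) with our chosen address in the middle of RAM.
iomap_write(0x10, struct.pack("<I", 0x00208000))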

This re­places that pR­CHan­dler func­tion pointer with an ad­dress in RAM as de­scribed above. Now let’s try to make the NXT beep again:
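dev.write(0x01, b"\x00\x03" + struct.pack("<HH", 440, 500))
print(bytes(dev.read(0x82, 64)))   # silence, and a garbage reply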

This time the NXT does­n’t beep (because we’ve re­placed the func­tion which han­dles di­rect com­mands with an empty func­tion) and re­turns dif­fer­ent (garbage) data (because our empty func­tion does­n’t set the out­put length prop­erly ei­ther).

We have suc­cess­fully achieved na­tive ARM code ex­e­cu­tion on the NXT, on an un­mod­i­fied firmware. This means that we are now free from all of the re­stric­tions the firmware nor­mally im­poses.

Native code ex­e­cu­tion means we can ac­cess any data in­side the mi­cro­con­troller, in­clud­ing the firmware. To ac­tu­ally ac­cess it, we need to re­place the di­rect com­mand han­dler with a func­tion which lets us read ar­bi­trary mem­ory ad­dresses. The di­rect com­mand han­dler turns out to be an ex­cel­lent lo­ca­tion to hi­jack be­cause it is al­ready hooked up to all the in­fra­struc­ture needed to com­mu­ni­cate to and from the PC. This greatly sim­pli­fies the work we need to do.

In the firmware source code, we can see that the original command handler normally takes three arguments: the input buffer, the output buffer, and a pointer to the length of the output. According to the ARM ABI, these values will be stored in CPU registers r0, r1, and r2 respectively.

That func­tion is writ­ten in C. How can you re­place it with a func­tion writ­ten in as­sem­bly? What is an ABI??

C code is turned into as­sem­bly code by a com­piler. If we’re not us­ing a com­piler, we can still write as­sem­bly code by hand.

When a C com­piler turns code into as­sem­bly, it has to fol­low cer­tain con­ven­tions in or­der for dif­fer­ent parts of the pro­gram to work to­gether prop­erly. For ex­am­ple, code which needs to make a func­tion call needs to agree with the func­tion be­ing called about where to put the func­tion ar­gu­ments. This in­for­ma­tion (as well as lots of other stuff we don’t care about right now) is spec­i­fied as part of the ABI.

Because the ARM architecture is a RISC architecture with "comparatively many" CPU registers, functions with 4 or fewer arguments ≤ 32 bits in size will have the arguments placed into registers r0-r3.

As long as our as­sem­bly code fol­lows the same con­ven­tions as the C code (follows the ABI), the ex­ist­ing firmware can call our code with no prob­lems.
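To give a flavor of where this goes, here is one possible shape for a "peek" replacement handler in nxtpwn.s, following the r0/r1/r2 convention described above. The request layout (a little-endian address in bytes 2-5 of the command, a fixed 16-byte read, a one-byte reply length) is my own invention for illustration, not necessarily how the original exploit lays things out:

@ r0 = incoming command, r1 = reply buffer, r2 = pointer to reply length.
.arm
    ldrb    r3, [r0, #2]            @ build the target address byte by
    ldrb    r12, [r0, #3]           @ byte (the ARM7TDMI cannot do
    orr     r3, r3, r12, lsl #8     @ unaligned word loads)
    ldrb    r12, [r0, #4]
    orr     r3, r3, r12, lsl #16
    ldrb    r12, [r0, #5]
    orr     r3, r3, r12, lsl #24
    mov     r0, #0
loop:
    ldrb    r12, [r3, r0]           @ copy 16 bytes from the target
    strb    r12, [r1, r0]           @ address into the reply buffer
    add     r0, r0, #1
    cmp     r0, #16
    blo     loop
    strb    r0, [r2]                @ reply length = 16 (assuming a byte field)
    mov     r0, #0                  @ return a "success" status
    bx      lr

With something like that in place, each direct command becomes a 16-byte window into any address we like, and dumping the 256 KiB of flash is just a loop away.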

...

Read the original on arcanenibble.github.io »
