10 interesting stories served every morning and every evening.
10 interesting stories served every morning and every evening.
Home • Automotive News • This Alberta Startup Sells No-Tech Tractors for Half Price
Automotive News
Stay connected via Google News
Follow us for the latest travel updates and guides.
Four hundred inquiries from American farmers poured in after a single interview. Not for a John Deere. Not for a Case IH. For a tractor built in Alberta with a remanufactured 1990s diesel engine and zero electronics.
Ursa Ag, a small Canadian manufacturer, is assembling tractors powered by 12-valve Cummins engines — the same mechanically injected workhorses that powered combines and pickup trucks decades ago — and selling them for roughly half the price of comparable machines from established brands. The 150-horsepower model starts at $129,900 CAD, about $95,000 USD. The range-topping 260-hp version runs $199,900 CAD, around $146,000.
Try finding a similarly powered John Deere for that money.
Owner Doug Wilson isn’t pretending this is cutting-edge technology. That’s the entire point. The 150-hp and 180-hp models use remanufactured 5.9-liter Cummins engines, while the 260-hp gets an 8.3-liter unit.
All are fed by Bosch P-pumps — purely mechanical fuel injection, no ECU, no proprietary software handshake required. The cabs are sourced externally and stripped to essentials: an air ride seat, mechanically connected controls, and nothing resembling a touchscreen.
This plays directly into a fight that has been simmering for years. John Deere’s right-to-repair battles became a national story when farmers discovered they couldn’t fix their own equipment without dealer-authorized software. Lawsuits followed, then legislation.
Deere eventually made concessions, but the damage was done. A generation of farmers learned exactly how much control they’d surrendered by buying machines loaded with proprietary code.
Wilson saw the gap and drove a tractor through it. The 12-valve Cummins is arguably the most widely understood diesel engine in North America. Every independent shop, every shade-tree mechanic with a set of wrenches, every farmer who grew up turning bolts has encountered one.
Parts sit on shelves in thousands of stores. Downtime — the thing that actually costs a farmer money during planting or harvest — shrinks dramatically when you don’t need a factory technician with a laptop to diagnose a fuel delivery problem.
Ursa Ag’s dealer network remains tiny, and the company sells direct. Wilson admitted they haven’t scaled up distribution because they can’t keep shelves stocked as it stands. He says 2026 production will exceed the company’s entire cumulative output, which is a bold claim from a small operation, and whether they can actually deliver is the single biggest question hanging over this story.
The U.S. market is where things get interesting. Ursa Ag has no American distributors yet, though Wilson says that’s likely to change. The easiest answer is yes, we can ship to the United States,” he told reporters.
Those 400 American inquiries after one Farms.com segment suggest the appetite is real. Farmers who have been buying 30-year-old equipment to avoid modern complexity now have a new alternative — a machine with fresh sheet metal, a warranty, and an engine philosophy rooted firmly in the past.
There’s a reason the used tractor market has been so robust. Plenty of operators looked at a $300,000 machine full of sensors and software and decided a well-maintained older unit was the smarter bet. Ursa Ag is manufacturing that bet from scratch.
Whether a small Alberta company can scale fast enough to meet demand from an entire continent is another matter. The big manufacturers have supply chains, dealer networks, and financing arms that took decades to build. Wilson has remanufactured Cummins engines and a value proposition that resonates with anyone who has ever waited three days for a dealer tech to show up with a diagnostic cable.
The farm equipment industry spent 20 years adding complexity and cost. Ursa Ag is wagering that a significant number of farmers never wanted any of it.
Stay connected via Google News
Follow us for the latest travel updates and guides.
We recently discovered a privacy vulnerability affecting all Firefox-based browsers. The issue allows websites to derive a unique, deterministic, and stable process-lifetime identifier from the order of entries returned by IndexedDB, even in contexts where users expect stronger isolation.
This means a website can create a set of IndexedDB databases, inspect the returned ordering, and use that ordering as a fingerprint for the running browser process. Because the behavior is process-scoped rather than origin-scoped, unrelated websites can independently observe the same identifier and link activity across origins during the same browser runtime. In Firefox Private Browsing mode, the identifier can also persist after all private windows are closed, as long as the Firefox process remains running. In Tor Browser, the stable identifier persists even through the “New Identity” feature, which is designed to be a full reset that clears cookies and browser history and uses new Tor circuits. The feature is described as being for users who “want to prevent [their] subsequent browser activity from being linkable to what [they] were doing before.” This vulnerability effectively defeats the isolation guarantees users rely on for unlinkability.
We responsibly disclosed the issue to Mozilla and to the Tor Project. Mozilla has quickly released the fix in Firefox 150 and ESR 140.10.0, and the patch is tracked in Mozilla Bug 2024220. The underlying root cause is inherited by Tor Browser through Gecko’s IndexedDB implementation, so the issue is relevant to both products and to all Firefox-based browsers.
The fix is straightforward in principle: the browser should not expose internal storage ordering that reflects process-scoped state. Canonicalizing or sorting results before returning them removes the entropy and prevents this API from acting as a stable identifier.
Why this matters
Private browsing modes and privacy-focused browsers are designed to reduce websites’ ability to identify users across contexts. Users generally expect two things:
First, unrelated websites should not be able to tell they are interacting with the same browser instance unless a shared storage or explicit identity mechanism is involved.
Second, when a private session ends, the state associated with that session should disappear.
This issue breaks both expectations. A website does not need cookies, localStorage, or any explicit cross-site channel. Instead, it can rely on the browser’s own internal storage behavior to derive a high-capacity identifier from the ordering of database names returned by an API.
For developers, this is a useful reminder that privacy bugs do not always come from direct access to identifying data. Sometimes they come from deterministic exposure of internal implementation details.
For security and product stakeholders, the key point is simple: even an API that appears harmless can become a cross-site tracking vector if it leaks stable process-level state.
What is IndexedDB and what does indexedDB.databases() do?
IndexedDB is a browser API for storing structured data on the client side. Web applications use it for offline support, caching, session state, and other local storage needs. Each origin can create one or more named databases, which can hold object stores and large amounts of data.
The indexedDB.databases() API returns metadata about the databases visible to the current origin. In practice, developers might use it to inspect existing databases, debug storage usage, or manage application state.
Under normal privacy expectations, the order of results returned by this API should not, in itself, carry identifying information. It should simply reflect a neutral, canonical, or otherwise non-sensitive presentation of database metadata.
The issue we found comes from the fact that, in all Firefox-based browsers, the returned order was not neutral at all.
How indexedDB.databases() became a stable identifier
In all Firefox Private Browsing mode, indexedDB.databases() returns database metadata in an order derived from internal storage structures rather than from database creation order.
The relevant implementation is in dom/indexedDB/ActorsParent.cpp.
In Private Browsing mode, database names are not used directly as on-disk identifiers. Instead, they are mapped to UUID-based filename bases via a global hash table:
using StorageDatabaseNameHashtable = nsTHashMap<nsString, nsString>;
StaticAutoPtr<StorageDatabaseNameHashtable> gStorageDatabaseNameHashtable;
The mapping is performed inside GetDatabaseFilenameBase() called within OpenDatabaseOp::DoDatabaseWork().
When aIsPrivate is true, the website-provided database name is replaced with a generated UUID and stored in the global StorageDatabaseNameHashtable. This mapping:
Is keyed only by the database name string
Persists for the lifetime of the IndexedDB QuotaClient
Is shared across all origins
Is cleared only when Firefox is fully restarted
Later, when indexedDB.databases() is invoked, Firefox gathers database filenames via QuotaClient::GetDatabaseFilenames(…) called in GetDatabasesOp::DoDatabaseWork().
Database base names are inserted into an nsTHashSet.
No sorting is performed before iteration. The final result order is determined by iteration over the hash set’s internal bucket layout.
Because UUID mappings are stable for the lifetime of the Firefox process, and hash table structure and iteration order are deterministic for a given internal layout, the returned ordering becomes a deterministic function of the generated UUID values, hash function behavior, and hash table capacity and insertion history. This ordering persists across tabs and private windows, resetting only upon a full Firefox restart. Crucially, the UUID mapping and hash set iteration are not origin-scoped. They are process-scoped.
Reproducing the issue
A simple proof of concept is enough to demonstrate the behavior. Two different origins host the same script. Each script:
Creates a fixed set of named databases.
Calls indexedDB.databases().
Extracts and prints the returned order.
In affected Firefox Private Browsing and Tor Browser builds, both origins observe the same permutation during the lifetime of the same browser process. Restarting the browser changes the permutation.
Conceptually, the output looks like this:
created:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
listed:
g,c,p,a,l,f,n,d,j,b,o,h,e,m,i,k
The important point is not the exact order itself, but rather that the order is not the original creation order, that the same order appears across unrelated origins, and it persists across reloads and new private windows, even after all private windows are closed. Only a full browser restart yields a new one. That is exactly what you do not want from a privacy perspective.
Privacy impact
This issue enables both cross-origin and same-origin tracking within a single browser runtime.
Cross-origin impact
Unrelated websites can independently derive the same identifier and infer that they are interacting with the same running Firefox or Tor Browser process. That lets them link activity across domains without cookies or other shared storage.
Same-origin impact
In Firefox Private Browsing mode, the identifier can persist even after all private windows are closed, provided the Firefox process itself is still running. That means a site can recognize a later visit in what appears to be a fresh private session. In Tor Browser, the stable identifier effectively defeats Tor Browser’s “New Identity” isolation within a running browser process, allowing websites to link sessions that are expected to be fully isolated from one another.
Why this is especially serious in Tor Browser
Tor Browser is specifically designed to reduce cross-site linkability and minimize browser-instance-level identity. A stable process-lifetime identifier cuts directly against that design goal. Even if it only survives until a full process restart, that is still enough to weaken unlinkability during active use.
Entropy and fingerprinting capacity
The signal is not just stable. It also has high capacity.
If a site controls N database names, then the number of possible observable permutations is N!, with theoretical entropy of log2(N!). With 16 controlled names, the theoretical space is about 44 bits. That is far more than enough to distinguish realistic numbers of concurrent browser instances in practice.
The exact number of reachable permutations may be somewhat lower because of internal hash table behavior, but that does not materially change the security story. The exposed ordering still provides more than enough entropy to act as a strong identifier.
The fix
The right fix is to stop exposing entropy derived from the internal storage layout.
The cleanest mitigation is to return results in a canonical order, such as lexicographic sorting. That preserves the API’s usefulness for developers while removing the fingerprinting signal. Randomizing output per call could also hide the stable ordering, but sorting is simpler, more predictable, and easier for developers to reason about.
From a security engineering standpoint, an ideal fix:
Low conceptual complexity
Minimal compatibility risk
Direct elimination of the privacy leak
Responsible disclosure
We responsibly disclosed the issue to Mozilla and to the Tor Project. Mozilla has released the fix in Firefox 150 and ESR 140.10.0, and the patch is tracked in Mozilla Bug 2024220. Because the behavior originates from Gecko’s IndexedDB implementation, downstream Gecko-based browsers, including Tor Browser, are also affected unless they apply their own mitigation.
Building for privacy
This vulnerability shows how a small implementation detail can create a meaningful privacy problem. The impact is significant. Unrelated websites can link activity across origins during the same browser runtime, and private-session boundaries are weakened because the identifier survives longer than users would expect.
The good news is that the fix is simple and effective. By canonicalizing the output before returning it, browsers can eliminate this source of entropy and restore the expected privacy boundary. This is exactly the kind of issue worth paying attention to: subtle, easy to miss, and highly instructive for anyone building privacy-sensitive browser features.
All characters fit within a 5 pixel square, and are safe to draw on a 6x6 grid.
The design is based off of lcamtuf’s 5x6 font-inline.h, which is itself inspired by the ZX Spectrum’s 8x8 font.
5x5 is the smallest size that doesn’t compromise legibility:
2x2: Impossible.
3x3: Technically possible, but unreadable.
4x4: Not enough to draw “E”, “M” or “W” properly.
5x5: This font.
Five by five is actually big enough to draw most lowercase letters one pixel smaller, making them visually distinct from uppercase.
Narrower 4x5 and 3x5 dimensions are possible, but would require sacrificing the M,
dotted zero, and reduce U/V/Y distinctiveness.
There’s no artistic reason to make all characters five wide just because a few must be…
but a using a constant width makes programming a lot easier:
The length of a string on screen is always 6 times the number of characters.
It also makes compact layouts much safer:
There’s no need to worry that a number will overflow because “8978” is longer than “1111″.
The whole font takes up just 350 bytes of memory,
which makes it ideally suited to 8-bit microcontrollers like the AVR128DA28 (16 kB of RAM)
These are cheap, low power and robust…
but they fall short on graphics:
Even a low-resolution 384x288 display has 110 thousand pixels:
way too big to fit in the AVRs memory.
… except most projects don’t need anywhere near that many pixels.
A 160x128 or 128x64 OLED is more practical and cheaper —
but these need hand-drawn, pixel-efficient fonts to make good use of them.
For reference, here’s a vector font rendered at a similar scale:
Antialiasing, several megabytes of code, a megabyte of font data, and it’s still terrible compared 350 hand-crafted bytes.
Real pixels:
Pixels aren’t perfect squares, so the font won’t actually look like the rendering at the top of this post:
This is it on an actual screen:
I actually really like the pseudo-dropshadow effect created by the subpixels.
This won’t happen on monochrome displays, but the font will still look smoother than you might expect.
The gaps between pixels really help sell the “e” and “g”, but this same effect should allow…
Even smaller fonts:
While 5x5 is the smallest no-compromise resolution, a 3x5 isn’t too bad:
The “M”, “W” and “Q” suffer, but it’s still got a distinct O and zero.
Something like this might actually be a good option if you need to cram (50%) more columns into a display.
That’s still readable, so what about 3x4?
At this size, there’s no way to have a distinct upper and lowercase, so I’ve picked whatever style works the best in the limited space.
The numbers have also taken a hit, but still work ok.
How about 3x3?
The main loss was the numbers, but the letters don’t include any duplicates and are somewhat recognizable.
This font is hugely improved by being displayed on real hardware:
That means it’s still too big. How about 2x3?
Ok, this is getting ridiculous.
Most letters are unrecognizable, and there are quite a few duplicates.
In case you couldn’t tell, the bottom line reads “Hello World”.
Flipping the aspect ratio to a 3x2 makes it a lot better:
More letters have horizontal detail (M, W, N, Q, G, P, etc) then have vertical detail (E, F).
The bottom line reads “you can probably read this”, although you might have to squint or zoom out.
… and for the sake of completeness, a 2x2:
On paper, there are 16 possible 2x2 images, but one of them is blank and 5 of them are shifted copies of another one.
That leaves 10, just enough to do all the digits…
but because they have no resemblance to the originals, it’s more of a secret code than a font.
Related:
/projects/mcufont/mcufont.h: The 5x5 font.
/projects/mcufont/test.c: Program to preview the font.
https://lcamtuf.coredump.cx/soft/embedded/font-inline.h: The original font.
https://moonbench.xyz/projects/tiny-pixel-art-fonts/: More tiny fonts.
12:13 PM PDT · April 22, 2026
Apple released a software update on Wednesday for iPhones and iPads fixing a bug that allowed law enforcement to extract messages that had been deleted or disappeared automatically from messaging apps. This was because notifications that displayed the messages’ content were also cached on the device for up to a month.
In a security notice on its website, Apple said that the bug meant “notifications marked for deletion could be unexpectedly retained on the device.”
This is a clear reference to an issue revealed by 404 Media earlier this month. The independent news outlet reported that the FBI had been able to extract deleted Signal messages from someone’s iPhone using forensic tools, due to the fact that the content of the messages had been displayed in a notification and then stored inside a phone’s database — even after the messages were deleted inside Signal.
After the news, Signal president Meredith Whittaker said the messaging app maker asked Apple to address the issue. “Notifications for deleted messages shouldn’t remain in any OS notification database,” Whittaker wrote in a post on Bluesky.
Contact Us
Do you have more information about how authorities are using forensic tools on iPhones or Android devices? From a non-work device, you can contact Lorenzo Franceschi-Bicchierai securely on Signal at +1 917 257 1382, or via Telegram and Keybase @lorenzofb, or email.
It’s unclear why the notifications’ content was logged to begin with, but today’s fix suggests it was a bug.
Apple did not immediately respond to a request for comment asking why the notifications were being retained. The company also backported the fix to iPhone and iPad owners running the older iOS 18 software.
Privacy activists expressed alarm when they learned that the FBI had found a way around a security feature that is used daily by at-risk users. Signal, like other messaging apps such as WhatsApp, allows users to set up a timer that instructs the app to automatically delete messages after a set amount of time. This feature can be helpful for anyone who wants to keep their conversations secret in the event that authorities seize their devices.
Techcrunch event
San Francisco, CA
|
October 13 – 15, 2026
Topics
When you purchase through links in our articles, we may earn a small commission. This doesn’t affect our editorial independence.
Lorenzo Franceschi-Bicchierai is a Senior Writer at TechCrunch, where he covers hacking, cybersecurity, surveillance, and privacy.
You can contact or verify outreach from Lorenzo by emailing lorenzo@techcrunch.com, via encrypted message at +1 917 257 1382 on Signal, and @lorenzofb on Keybase/Telegram.
View Bio
GitHub CLI sends pseudonymous telemetry to help us improve the product. We want you to understand what is being sent and why.
Why we collect telemetry
As agentic adoption of GitHub CLI grows, our team needs visibility into how features are being used in practice. We use this data to prioritize our work and evaluate whether features are meeting real user needs.
For example, when we ship a new subcommand, we want to understand whether anyone is using it and how. If adoption is low, we know we need to revisit the feature’s discoverability or design. If a subcommand sees high usage with certain flags, that tells us where to invest in a better experience.
What we collect
The following fields are included in telemetry events. Fields with a Command Scope value are only sent for commands within that scope.
How to inspect what’s being sent
GitHub CLI is open source, so you can review the telemetry implementation in the cli/cli repository.
Additionally, you can enable logging mode using either an environment variable or configuration option to see exactly what would be sent without sending it.
1. Environment variable:
export GH_TELEMETRY=log
2. CLI config:
gh config set telemetry log
In logging mode, the JSON payload that would normally be sent is printed to stderr instead. This lets you inspect every field before deciding whether to keep telemetry enabled, for example:
$ GH_TELEMETRY=log gh pr edit 42 –title “bug fix” –body “fixed a bug”
…
Telemetry payload:
{
“events”: [
{
“type”: “command_invocation”,
“dimensions”: {
“agent”: “”,
“architecture”: “arm64″,
“ci”: “false”,
“command”: “gh pr edit”,
“device_id”: “d80dc1eb-5c66 – 4bcd-bbc8 – 568e173bb977″,
“flags”: “body,title”,
“github_actions”: “false”,
“invocation_id”: “51b4383c-23b1 – 47da-91d7-dcc8aa79dd1c”,
“is_tty”: “true”,
“os”: “darwin”,
“timestamp”: “2026 – 04-22T00:00:00.000Z”,
“version”: “2.91.0″
}
}
]
}
Note that this command can only log telemetry for the exact command and context in which it ran. For example, changing environment variables, or authenticated accounts may change the events, and event dimensions that are included in the payload.
How to opt out
There are three ways to disable telemetry:
1. Set the GH_TELEMETRY environment variable (any falsy value works: 0, false, disabled, or an empty string):
export GH_TELEMETRY=false
2. Use the DO_NOT_TRACK convention:
export DO_NOT_TRACK=true
3. Use the CLI config:
gh config set telemetry disabled
Note: The environment variables (options 1 and 2) take precedence over the config value.
Where data is sent
Telemetry events are sent to GitHub’s internal analytics infrastructure. For more information about how GitHub handles your data, see the GitHub General Privacy Statement.
Additional information
GitHub CLI allows you to add features to the product by installing GitHub and third-party extensions, including agents. These extensions may be collecting their own usage data and are not controlled by opting out. Consult the specific extension’s documentation to learn about its telemetry reporting and whether it can be disabled.
This page describes client-side data collection for the GitHub CLI (gh). It does not apply to GitHub Copilot or the Copilot CLI, which handle data collection separately. For information on the Copilot CLI, see Using GitHub Copilot CLI and Responsible Use of the GitHub Copilot CLI.
The culmination of a decade of development, TPU 8t and TPU 8i are custom-engineered to power the next generation of supercomputing with efficiency and scale.
General summary
Google is launching its eighth-generation Tensor Processor Units, featuring two specialized chips: the TPU 8t for massive model training and the TPU 8i for high-speed inference. These chips are purpose-built to handle the complex, iterative demands of AI agents while delivering significant gains in power efficiency and performance. You can request more information now to prepare for their general availability later this year.
Summaries were generated by Google AI. Generative AI is experimental.
Bullet points
Google’s new eighth generation TPUs, TPU 8t and 8i, power the next era of AI.
The TPU 8t is a training powerhouse built to speed up complex model development.
The TPU 8i specializes in low-latency inference to support fast, collaborative AI agents.
Both chips use custom hardware to deliver better performance and energy efficiency than before.
These new systems will be available later this year to help scale your AI workloads.
Summaries were generated by Google AI. Generative AI is experimental.
Basic explainer
Google just announced its eighth generation of custom AI chips, the TPU 8t and TPU 8i. These chips are built to handle the heavy lifting required for training massive AI models and running complex AI agents. By specializing each chip for either training or performance, Google makes AI faster and more energy-efficient. This new hardware helps developers build smarter tools that can reason and solve problems more effectively.
Summaries were generated by Google AI. Generative AI is experimental.
Your browser does not support the audio element.
Listen to article
This content is generated by Google AI. Generative AI is experimental
[[duration]] minutes
Today at Google Cloud Next, we are introducing the eighth generation of Google’s custom Tensor Processor Unit (TPU), coming soon with two distinct, purpose-built architectures for training and inference: TPU 8t and TPU 8i. These two chips are designed to power our custom-built supercomputers, to drive everything from cutting-edge model training and agent development, to massive inference workloads. TPUs have been powering leading foundation models, including Gemini, for years. These 8th generation TPUs together will deliver scale, efficiency and capabilities across training, serving and agentic workloads.
In this age of AI agents, models must reason through problems, execute multi-step workflows and learn from their own actions in continuous loops. This places a new set of demands on infrastructure, and TPU 8t and TPU 8i were designed in partnership with Google DeepMind to take on the most demanding AI workloads and adapt to evolving model architectures at scale.
TPUs set the standard for a number of ML supercomputing components including custom numerics, liquid cooling, custom interconnects and more, and our eighth generation TPUs are the culmination of more than a decade of development. The key insight behind the original TPU design continues to hold today: by customizing and co-designing silicon with hardware, networking and software, including model architecture and application requirements, we can deliver dramatically more power efficiency and absolute performance.
We are thrilled to see how a decade of innovation translates into real-world breakthroughs. Today, pioneering organizations like Citadel Securities are pushing the boundaries of what’s possible, choosing TPUs to power their cutting-edge AI workloads:
Two chips to meet the moment
Hardware development cycles are much longer than software. With each generation of TPUs, we need to consider what technologies and demands will exist by the time they are brought to market. Several years ago, we anticipated rising demand for inference from customers as frontier AI models are deployed in production and at scale. And with the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving.
TPU 8t shines at massive, compute-intensive training workloads designed with larger compute throughput and more scale-up bandwidth. TPU 8i is designed with more memory bandwidth to serve the most latency-sensitive inference workloads, which is critical because interactions between agents at scale magnify even small inefficiencies.
Importantly, both chips can run various workloads, but specialization unlocks significant efficiencies and gains.
TPU 8t: The training powerhouse
TPU 8t is built to reduce the frontier model development cycle from months to weeks. By balancing the highest possible compute throughput, shared memory and interchip bandwidth with the best possible power efficiency and productive compute time, we have crafted a system that delivers nearly 3x the compute performance per pod over the previous generation, enabling faster innovation to ensure our customers continue to set the pace for the industry.
Massive scale: A single TPU 8t superpod now scales to 9,600 chips and two petabytes of shared high bandwidth memory, with double the interchip bandwidth of the previous generation. This architecture delivers 121 ExaFlops of compute and allows the most complex models to leverage a single, massive pool of memory.
Maximum utilization: By also integrating 10x faster storage access, combined with TPUDirect to pull data directly into the TPU, TPU 8t helps ensure maximum utilization of the end-to-end system.
Near-linear scaling: Our new Virgo Network, combined with JAX and our Pathways software, means TPU 8t can provide near-linear scaling for up to a million chips in a single logical cluster.
In addition to raw performance, TPU 8t is engineered to target over 97% “goodput” — a measure of useful, productive compute time — through a comprehensive set of Reliability, Availability and Serviceability (RAS) capabilities. These include real-time telemetry across tens of thousands of chips, automatic detection and rerouting around faulty ICI links without interrupting a job, and Optical Circuit Switching (OCS) that reconfigures hardware around failures with no human intervention.
Every hardware failure, network stall or checkpoint restart is time the cluster is not training, and at frontier training scale, every percentage point can translate into days of active training time.
TPU 8i: The reasoning engine
In the agentic era, users expect to be able to ask questions, delegate tasks and get outcomes. TPU 8i is designed to handle the intricate, collaborative, iterative work of many specialized agents, often “swarming” together in complex flows to deliver solutions and insights for the most challenging tasks. We redesigned the stack to eliminate the “waiting room” effect through four key innovations:
Breaking the “memory wall”: To stop processors from sitting idle, TPU 8i pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM — 3x more than the previous generation — keeping a model’s active working set entirely on-chip.
Axion-powered efficiency: We doubled the physical CPU hosts per server, moving to our custom Axion Arm-based CPUs. By using a non-uniform memory architecture (NUMA) for isolation, we have optimized the full system for superior performance.
Scaling MoE models: For modern Mixture of Expert (MoE) models, we doubled the Interconnect (ICI) bandwidth to 19.2 Tb/s. Our new Boardfly architecture reduces the maximum network diameter by more than 50%, ensuring the system works as one cohesive, low-latency unit.
Eliminating lag: Our new on-chip Collectives Acceleration Engine (CAE) offloads global operations, reducing on-chip latency by up to 5x, minimizing lag.
These innovations deliver 80% better performance-per-dollar compared to the previous generation, enabling businesses to serve nearly twice the customer volume at the same cost.
TPU 8i hierarchical Boardfly topology building up from a building block of four fully connected chips into a fully connected group of eight boards, with 36 of such groups fully connected into a TPU 8i pod
Co-designed for Gemini, open for everyone
This eighth generation TPU is also the latest expression of our co-design philosophy, where every spec is built to solve AI’s biggest hurdles.
Boardfly topology was designed specifically for the communication demands of today’s most capable reasoning models.
SRAM capacity in TPU 8i was sized for the KV cache footprint of reasoning models at production scale.
Virgo Network fabric’s bandwidth targets were derived from the parallelism requirements of trillion-parameter training.
And for the first time, both chips run on Google’s own Axion ARM-based CPU host, allowing us to optimize the full system, not just the chip, for performance and efficiency.
Both platforms support native JAX, MaxText, PyTorch, SGLang and vLLM — the frameworks developers already use — and offer bare metal access, giving customers direct hardware access without the overhead of virtualization. Open-source contributions including MaxText reference implementations and Tunix for reinforcement learning support turn key paths between capability and production deployment.
Designing for power efficiency at scale
In today’s data centers, power, not just chip supply, is a binding constraint. To solve this, we have optimized efficiency across the entire stack, with integrated power management that dynamically adjusts the power draw based on real-time demand. TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation, Ironwood.
But efficiency at Google is not just a chip-level metric; it’s also a system-level commitment that runs from silicon to the data center. For example, we integrate network connectivity with compute on the same chip, significantly reducing the power costs of moving data across the TPU pod. Even our data centers are co-designed with our TPUs. We innovated across hardware and software to enable our data centers to deliver six times more computing power per unit of electricity than they did just five years ago.
TPU 8t and TPU 8i continue that trajectory. Both are supported by our fourth-generation liquid cooling technology that sustains performance densities air cooling cannot. By owning the full stack, from Axion host to accelerator, we can optimize system-level energy efficiency in ways that simply cannot be achieved when the host and chip are designed independently.
Google Cloud’s fourth generation cooling distribution unit
Infrastructure for the agentic era
Every major computing transition has required infrastructure breakthroughs, and the agentic era is no different. Infrastructure must evolve to meet the demands of autonomous agents operating in continuous loops of reasoning, planning, execution and learning.
TPU 8t and TPU 8i are our answer to this challenge: two specialized architectures built to redefine what is possible in AI, from building the most capable AI models, to swarms of agents perfectly orchestrated, to managing the most complex reasoning tasks. Both chips will be generally available later this year, and can be used as part of Google’s AI Hypercomputer, which brings together purpose-built hardware (compute, storage, networking), open software (frameworks, inference engines), and flexible consumption (orchestration, cluster management and delivery models) into a unified stack.
Agentic computing will redefine what is possible. We are thrilled to announce the latest incarnation of our relentless innovation to power this transformation, TPU 8i and 8t. Interested customers can request more information.
Get more stories from Google in your inbox.
Done. Just one step more.
Check your inbox to confirm your subscription.
You are already subscribed to our newsletter.
You can also subscribe with a
Code for this post is available here.
AI-assisted coding has become the norm and with tools like Cursor, GitHub Copilot, Claude Code, Codex, we are increasingly letting models touch our code. If you have used any of these tools in the past year, you have probably experienced something like this: you ask the model to fix a simple bug (perhaps a single off-by-one error, or maybe a wrong operator). The model fixes the bug but half the function has been rewritten. An extra helper function has appeared. A perfectly reasonable variable name has been renamed. New input validation has been added. And the diff is enormous.
I refer to this as the Over-Editing problem where models have the tendency to rewrite code that didn’t need rewriting. This matters more than it might seem. Code review is already a bottleneck and reviewers need to understand what changed, why it changed, and whether the change is safe. A model that rewrites entire functions, even correctly, makes this job dramatically harder as the code is now completely unrecognizable.
In this post, I will investigate this problem: whether existing LLMs have a tendency to over-edit and whether we can train models to be more faithful editors.
Over-Editing
Over-editing refers to a model modifying code beyond what is strictly necessary to fix the problem at hand. To be precise: a model is over-editing if its output is functionally correct but structurally diverges from the original code more than the minimal fix requires.
The example in Figure 1 illustrates this well. The bug is a single off-by-one error in a range() call (range(len(x) - 1) should be range(len(x))) and the correct fix is a single line. GPT-5.4 (with high reasoning effort) responds by rewriting the entire function: it adds explicit None checks, introduces np.asarray conversions with dtype=float, adds finite-value masking, validates array sizes, changes the curve_fit call signature, and replaces the plotting logic entirely. While the output passes the tests and is functionally correct, the diff is enormous, and none of those additions were asked for or even necessary.
It helps to think about this in terms of the kind of work being done. Software engineering broadly splits into two modes: green-field (building something new from scratch) and brown-field (working within an existing codebase). Specifically in brown-field, the existing code has been understood by the team and has been deliberately written the way it was. The model’s job is to fix the issue and nothing else.
A common piece of advice for working with AI coding tools is to simply write more tests because if the tests pass, the code is fine. However, Over-editing is a brown-field failure where unlike correctness failures, it is invisible to test suites. As models generate more code, engineers have more to review and over-editing makes that harder. There is more complex logic to parse, more lines of code to read, and a higher chance that overall codebase quality quietly degrades.
Measuring Over-Editing
To study over-editing, we first need a dataset of code edits where the “ground truth” edit is well-defined with some degree of “minimality”. Rather than using another LLM to introduce bugs (which is what most existing benchmarks do), we programmatically corrupt 400 problems from BigCodeBench which gives us more fine-grained control: things like flipping a comparison operator (< → <=), swapping + for -, or changing boolean values (True → False).1 Each corrupted sample remains syntactically valid and verified to break the corresponding test cases. This ensures that the ground truth edit is exactly the reversal of the corruption and nothing more, thus making this edit minimal by construction. We can then evaluate not just whether a model fixes the bug, but how much else it changed in the process.
Metrics
Most coding benchmarks evaluate models on correctness using some variant of Pass@1. However, Pass@1 is necessary but not sufficient. A model can score perfectly on Pass@1 while completely rewriting every function it touches. For this experiment, we need metrics that capture how much the model changed beyond what was required.
Token-level Levenshtein Distance. Unlike standard Levenshtein which counts the minimum number of character insertions, deletions, and substitutions to transform one string into another, we use a Python token-level variant. The code is first passed through Python’s tokenizer, which splits it into its atomic syntactic units (def, add, (, a, ,, b, ), :, return, a, +, b). Levenshtein is then computed over this token sequence rather than raw characters.
For example, consider the following two functions:
def add(a, b): def someotherfunctionname(a, b):
return a + b return a + b
Character-level Levenshtein gives a distance of 19. Token-level Levenshtein gives a distance of 1 since someotherfunctionname becomes a single token. We normalize by total token count so scores are comparable across functions of different lengths.
In addition, rather than simply comparing the model’s output to the ground truth, we compare both against the corrupted input. Let $C$ be the corrupted solution, $G$ the ground truth, and $M$ the model’s output. The true minimal edit (simply the reversal of the corruption) is $D_{\text{true}} = d(G, C)$ and the model’s edit is $D_{\text{model}} = d(M, C)$, giving a relative patch score:
Values closer to zero indicate the model’s patch resembles the true minimal fix. The intuition is that we can interpret the original uncorrupted solution as the best possible edit to the corrupted solution, compute the scores for this best possible patch, and then compare with the model’s output.
Added Cognitive Complexity. Cognitive Complexity (an improvement over Cyclomatic Complexity) measures how hard code is to understand. It penalizes nesting, recursion, mixed logical operators, and non-obvious control flow. For example a straight line of code with no branches is much easier to read than something that requires a reader to hold state, such as an if, a loop, or try/except. An example is shown below:
def process(items):
result = []
for item in items: # +1
if item > 0: # +2 (nesting penalty: inside a loop)
if item % 2 == 0: # +3 (nesting penalty: two levels deep)
result.append(item)
return result
# Cognitive Complexity: 6
Since all our corruptions change values rather than structure, the correct fix should always add zero Cognitive Complexity. Any increase in the model’s output was introduced unprompted and is unnecessary. We report the absolute difference between the model’s output and the original, which should be zero for a faithful minimal edit. Values below 0 are also unwanted as unnecessary simplifications to code are also undesirable.
Do Models Over-Edit?
Yes, even frontier ones.
Among the latest frontier models, GPT-5.4 over-edits the most. It has a Levenshtein of 0.39 in reasoning mode and 0.33 in non-reasoning, with Added Cognitive Complexity of 2.31 and 1.56 respectively. Despite this, its Pass@1 is only 0.723 and 0.770, making it one of the weakest correctness performers too. Claude Opus 4.6 achieves the highest Pass@1 of any model evaluated (0.912 reasoning, 0.900 non-reasoning) while also producing the smallest diffs with Levenshtein of 0.06 and 0.08, Added Cognitive Complexity of 0.20 and 0.31. Gemini 3.1 Pro Preview sits in similar territory, with GLM 5 arguably the most conservative model among the open weight ones.
Does Prompting Help?
Many papers that claim to uncover a new LLM failure mode do not first test whether the model can do the task when asked directly. A behavior that looks impossible in one setup may be easy under an explicit prompt, so I investigate the impact of adding “IMPORTANT: Try to preserve the original code and the logic of the original code as much as possible” to the prompt.
With explicit prompting, every model improves and reduces its Levenshtein Distance, and with the exception of DeepSeek R1/v3, also improves its Pass@1. One interpretation is that the constraint to make minimal edits inadvertently narrows the search space of possible fixes, steering models toward the kind of precise, targeted change that is more likely to be correct. The effect on Levenshtein Distance is much more pronounced in reasoning models, which is likely the result of their stronger instruction following ability.
Does Reasoning Mean Overthinking and Over-Editing?
Reasoning models are generally assumed to be better at coding tasks, and they do score higher on Pass@1. But since we are interested in the style of these edits, we need to look at the results through a different lens.
Figure 3 groups the models into pairs where each pair contains a reasoning and non-reasoning model from the same family. For each pair, we plot the Levenshtein Distance of only the samples where both models get the answer correct. This allows us to isolate edit minimality from correctness since a model that fails more often has fewer samples to over-edit on, which would otherwise bias the comparison.
In the generic setting (top), reasoning models over-edit more than their non-reasoning counterparts in the majority of pairs. DeepSeek V3, GPT-5, GPT-5.4, Gemini 3.1 Pro Preview, Qwen 3.6 Plus, and Kimi 2.5 all show the reasoning bar sitting higher. Reasoning models seems to naturally have more elaborate rewrites where the model reasons its way into a “better” implementation rather than a minimal fix. The notable exception is Claude Opus 4.6, where the reasoning variant edits substantially less than its non-reasoning counterpart.
In the explicit setting (bottom), the picture changes considerably. Once models are told to preserve the original code, reasoning models have much lower Levenshtein Distance than their non-reasoning counterparts and match or undercut them in almost every pair. Claude Opus 4.6 (reasoning) drops to the lowest Levenshtein of any model in this setting. GPT-5 and GPT-5.4 both see their reasoning variants fall significantly, though GPT-5.4’s non-reasoning model still edges ahead.
Therefore, the takeaway is that the default behavior of most reasoning models is to over-edit. Left unconstrained, the extended reasoning gives models more room to “improve” code that doesn’t need improving. But, that same reasoning capacity also makes them better at following the constraint once it is given. The gap between the generic and explicit setting is consistently larger for reasoning models, which suggests the over-editing is not a fundamental limitation but rather a default behavior that can be overridden.
Training
A natural next question: can we train models to be more faithful editors? For this experiment, I start with Qwen3 4B 2507 Instruct as the base model. I use both 0-shot and 8-shot prompting together with the explicit instruction to preserve the original code as baselines. All other methods are prompted in the generic setting without the explicit instruction during evaluation.
Setup
I first create a synthetic dataset of corrupted problems from DeepCoder using the same approach detailed above. In addition to this programmatically generated dataset, I also use the base Qwen3 4B 2507 Instruct model to create a synthetic dataset via self-distillation. Concretely, I prompt the model to generate 8 completions per problem, keeping only the samples that are functionally correct and ranking them by Levenshtein Distance. The model is then trained without the explicit instruction similar to Context Distillation.
We evaluate 4 different methods:
SFT: Supervised fine-tuning directly on the programmatically generated dataset.
rSFT: Rejection-sampled SFT where we train on the completions with the 3 lowest Levenshtein Distances for each sample from the self-distillation dataset.
DPO: Preference optimization between the completions with the highest and lowest Levenshtein Distances for each sample from the self-distillation dataset.
RL: Reinforcement learning with a reward combining functional correctness and Levenshtein-based edit minimality. The reward structure is a weighted sum of the Levenshtein Distance and a penalty for failing to pass the test cases:
r = r_edit + 0.1 # if generation passes all test cases
r = -0.2 # otherwise
# r_edit is normalized Levenshtein-based reward
Does It Work?
On the first attempt, SFT is almost suspiciously good as the resultant model seems to have perfectly learned the task. I found this extremely surprising and had the initial hypothesis that the model was just memorizing the reversal for this set of corruptions rather than learning a general minimal editing behavior. As a result, I re-created both synthetic datasets but instead using a completely different set of corruptions than the evaluation set to test for generalization. The core hypothesis was that the model was simply learning to reverse a particular set of corruptions.
Does It Generalize?
SFT collapses entirely out-of-domain. Pass@1 drops to 0.458 as the model has learned to make specific minimal changes regardless of whether they fix anything. rSFT and DPO are both better but the overall improvement is slight compared to the 8-shot baseline. This indicates that training on traces distilled from the base model itself is sufficient to induce some degree of generalization. RL is the only method that generalizes cleanly, improving on all three metrics over both baselines. The fact that the RL model has larger improvements on Levenshtein Distance and Added Cognitive Complexity than on Pass@1 is further evidence that it is not just memorizing corruption reversals but has actually generalized to minimal editing.
Given the SFT model’s inability to even fix bugs, we also wanted to look at Catastrophic Forgetting. Specifically, whether fine-tuning for minimal editing degrades general coding ability. We evaluate all fine-tuned models on LiveCodeBench v6 and compare against the original pretrained model. Ideally, performance should remain similar after training.
SFT shows a 43% performance degradation, which aligns with our earlier finding that it can no longer identify and fix basic bugs. The rSFT and DPO models experience slight degradation, indicating that even though they were trained on samples generated by the original model, the nature of the task still results in some degree of Catastrophic Forgetting. The RL model, however, does not experience any degradation. Combined with the fact that it also performs the task best, RL is able to teach the model a new behavior without degrading previously acquired abilities. This aligns with broader work showing that SFT memorizes while RL generalizes.
Inspired by other work showing that RL’s has a bias towards KL-minimal solutions reduces forgetting, we can interpret these results from a distributional perspective. Specifically, the distribution of the programmatically generated dataset is very different from the model’s original distribution. As a result, the SFT model’s distribution has been heavily modified and thus suffers from Catastrophic Forgetting. In contrast, for both rSFT and DPO, the distribution of the self-distilled dataset is more aligned and is thus less heavy-handed in nature when shaping the trained model’s distribution. Therefore, it is likely that the degree of Catastrophic Forgetting is proportional to the difference between the model’s original distribution and the distribution of the task training data.
Additional Experiments
RL with LoRA: Do We Need Full Fine-Tuning?
Given that this task is less about teaching the model new knowledge and more about tuning its style on an existing task, we also wanted to explore whether LoRA would be sufficient. Since the base model already has the capability to edit code and fix bugs, full fine-tuning might not be necessary.
The results support the hypothesis. LoRA at rank 64 nearly matches full RL on Levenshtein Distance and beats it on Added Cognitive Complexity. LiveCodeBench dips slightly at low ranks but rank 64 is effectively flat, and full RL remains best overall. There is a clean monotonic trend as rank increases where both Levenshtein and Added CC fall steadily from rank 1 to rank 64. The rate of improvement is not uniform though, as the biggest gains happen early. Rank 1 to 16 accounts for most of the Levenshtein reduction (0.166 → 0.087), while rank 16 to 64 closes the remaining gap more gradually (0.087 → 0.051). Ranks 1 and 8 also trade correctness for edit minimality which could be explained by a lack of sufficient capacity to learn both reward functions and instead bias towards the higher-reward edit minimality.
This is consistent with the idea that a small number of additional parameters is enough to shift the model’s editing behavior and more capacity beyond a certain point yields diminishing returns. For style-level behavioral changes where the underlying capability is already present, LoRA is likely sufficient and considerably cheaper to run.
The original version of the reward function had a bug where rollouts with no successful execution were given a hardcoded reward of 0. This ended up being a higher reward than rollouts with successful execution since the Levenshtein distance was negated to make it “higher is better.” I found it interesting that even with this buggy reward function, full RL was still able to learn the task. Only with LoRA did the model fail to learn it, seemingly reward hacking by learning to never output functionally correct code which triggered an investigation into the environment. With the fixed reward function, the results of full RL improved only slightly.
Does It Scale?
Lastly, to validate the results across larger models, I apply the same RL recipe using the Out-of-Domain data onto the larger Qwen3 14B model. Even at larger parameter counts, there are performance gains across the board with higher Pass@1, lower Levenshtein Distance, lower Added Cognitive Complexity, and no indication of Catastrophic Forgetting. This gives me the confidence that such a recipe for the task of Minimal Code Editing can be extended to various models of different scales.
Final Thoughts
It is notable that, despite being a frontier model, GPT 5.4 struggles on the minimal editing task, especially in the generic setting and relative to Opus 4.6. Figure 3, however, shows that it sees one of the largest gains when explicitly prompted in reasoning mode, second only to its predecessor GPT 5, which suggests strong instruction following capabilities. By contrast, Opus 4.6 shows one of the smallest improvements, though that may simply reflect its already strong baseline performance. This pattern fits the broader view that while GPT 5.4 often defaults to overly verbose code (“slop”), its behavior can be steered effectively with proper prompting.
Taken together, the results suggest that Over-Editing is both widespread and measurable. At the same time, the prompting results show that this is not purely a capability limitation. Especially for reasoning models, a simple instruction to preserve the original code leads to much more faithful edits, which is an encouraging sign that when models like GPT 5.4 over-edit, they can still be steered toward higher-quality code.
Further, the training results suggest that this behavior can be improved. Reinforcement learning produced more faithful editors without degradation in general coding ability, and those gains held up across both the 4B and 14B Qwen3 models.
Admittedly, the field of code benchmarks has gone on from simple single function evaluations to more agentic evaluation paradigms like SWE-Bench Pro. Relative to those, evaluating bug fixes in isolated functions is still a fairly contained task given the nature of the bugs.
Even so, in my experience, despite the prevalence of Over-Editing across all frontier coding models today, it has long been difficult to quantify in realistic settings. I hope this work can serve as a first step toward evaluating and improving the minimal editing capability of coding models, and ultimately the overall quality of AI-generated code.
Acknowledgements:I am grateful to my supervisor A/P Min-Yen Kan and my advisor Tongyao Zhu for their guidance, and to Prime Intellect for sponsoring the compute and API costs of this project.
The full list of corruptions can be found in the code. ↩︎
The full list of corruptions can be found in the code. ↩︎
© 2026. All rights reserved.
I am building a cloud
2026 – 04-22
Today is fundraising announcement day. As is the nature of writing for a larger audience, it is a formal, safe announcement. As it should be. Writing must necessarily become impersonal at scale. But I would like to write something personal about why I am doing this. What is the goal of building exe.dev? I am already the co-founder of one startup that is doing very well, selling a product I love as much as when I first helped design and build it.
What could possess me to go through all the pain of starting another company? Some fellow founders have looked at me with incredulity and shock that I would throw myself back into the frying pan. (Worse yet, experience tells me that most of the pain is still in my future.) It has been a genuinely hard question to answer because I start searching for a “big” reason, a principle or a social need, a reason or motivation beyond challenge. But I believe the truth is far simpler, and to some I am sure almost equally incredulous.
I like computers.
In some tech circles, that is an unusual statement. (“In this house, we curse computers!”) I get it, computers can be really frustrating. But I like computers. I always have. It is really fun getting computers to do things. Painful, sure, but the results are worth it. Small microcontrollers are fun, desktops are fun, phones are fun, and servers are fun, whether racked in your basement or in a data center across the world. I like them all.
So it is no small thing for me when I admit: I do not like the cloud today.
I want to. Computers are great, whether it is a BSD installed directly on a PC or a Linux VM. I can enjoy Windows, BeOS, Novell NetWare, I even installed OS/2 Warp back in the day and had a great time with it. Linux is particularly powerful today and a source of endless potential. And for all the pages of products, the cloud is just Linux VMs. Better, they are API driven Linux VMs. I should be in heaven.
But every cloud product I try is wrong. Some are better than others, but I am constantly constrained by the choices cloud vendors make in ways that make it hard to get computers to do the things I want them to do.
These issues go beyond UX or bad API design. Some of the fundamental building blocks of today’s clouds are the wrong shape. VMs are the wrong shape because they are tied to CPU/memory resources. I want to buy some CPUs, memory, and disk, and then run VMs on it. A Linux VM is a process running in another Linux’s cgroup, I should be able to run as many as I like on the computer I have. The only way to do that easily on today’s clouds is to take isolation into my own hands, with gVisor or nested virtualization on a single cloud VM, paying the nesting performance penalty, and then I am left with the job of running and managing, at a minimum, a reverse proxy onto my VMs. All because the cloud abstraction is the wrong shape.
Clouds have tried to solve this with “PaaS” systems. Abstractions that are inherently less powerful than a computer, bespoke to a particular provider. Learn a new way to write software for each compute vendor, only to find half way into your project that something that is easy on a normal computer is nearly impossible because of some obscure limit of the platform system buried so deep you cannot find it until you are deeply committed to a project. Time and again I have said “this is the one” only to be betrayed by some half-assed, half-implemented, or half-thought-through abstraction. No thank you.
Consider disk. Cloud providers want you to use remote block devices (or something even more limited and slow, like S3). When remote block devices were introduced they made sense, because computers used hard drives. Remote does not hurt sequential read/write performance, if the buffering implementation is good. Random seeks on a hard drive take 10ms, so 1ms RTT for the Ethernet connection to remote storage is a fine price to pay. It is a good product for hard drives and makes the cloud vendor’s life a lot easier because it removes an entire dimension from their standard instance types.
But then we all switched to SSD. Seek time went from 10 milliseconds to 20 microseconds. Heroic efforts have cut the network RTT a bit for really good remote block systems, but the IOPS overhead of remote systems went from 10% with hard drives to more than 10x with SSDs.
It is a lot of work to configure an EC2 VM to have 200k IOPS, and you will pay $10k/month for the privilege. My MacBook has 500k IOPS. Why are we hobbling our cloud infrastructure with slow disk?
Finally networking. Hyperscalers have great networks. They charge you the earth for them and make it miserable to do deals with other vendors. The standard price for a GB of egress from a cloud provider is 10x what you pay racking a server in a normal data center. At moderate volume the multiplier is even worse. Sure, if you spend $XXm/month with a cloud the prices get much better, but most of my projects want to spend $XX/month, without the little m. The fundamental technology here is fine, but this is where limits are placed on you to make sure whatever you build cannot be affordable.
Finally, clouds have painful APIs. This is where projects like K8S come in, papering over the pain so engineers suffer a bit less from using the cloud. But VMs are hard with Kubernetes because the cloud makes you do it all yourself with lumpy nested virtualization. Disk is hard because back when they were designing K8S Google didn’t really even do usable remote block devices, and even if you can find a common pattern among clouds today to paper over, it will be slow. Networking is hard because if it were easy you would private link in a few systems from a neighboring open DC and drop a zero from your cloud spend. It is tempting to dismiss Kubernetes as a scam, artificial make work designed to avoid doing real product work, but the truth is worse: it is a product attempting to solve an impossible problem: make clouds portable and usable. It cannot be done.
You cannot solve the fundamental problems with cloud abstractions by building new abstractions on top. Making Kubernetes good is inherently impossible, a project in putting (admittedly high quality) lipstick on a pig.
We have been muddying along with these miserable clouds for 15 years now. We make do, in the way we do with all the unpleasant parts of our software stack, holding our nose whenever we have to deal with and trying to minimize how often that happens.
This however, is the moment to fix it.
This is the moment because something has changed: we have agents now. (Indeed my co-founder Josh and I started tinkering because we wanted to use LLMs in programming. It turns out what needs building for LLMs are better traditional abstractions.) Agents, by making it easiest to write code, means there will be a lot more software. Economists would call this an instance of Jevons paradox. Each of us will write more programs, for fun and for work. We need private places to run them, easy sharing with friends and colleagues, minimal overhead.
With more total software in our lives the cloud, which was an annoying pain, becomes a much bigger pain. We need a lot more compute, we need it to be easier to manage. Agents help to some degree. If you trust them with your credentials they will do a great job driving the AWS API for you (though occasionally it will delete your production DB). But agents struggle with the fundamental limits of the abstractions as much as we do. You need more tokens than you should and you get a worse result than you should. Every percent of context window the agent spends thinking about how to contort classic clouds into working is context window is not using to solve your problem.
So we are going to fix it. What we have launched on exe.dev today addresses the VM resource isolation problem: instead of provisioning individual VMs, you get CPU and memory and run the VMs you want. We took care of a TLS proxy and an authentication proxy, because I do not actually want my fresh VMs dumped directly on the internet. Your disk is local NVMe with blocks replicated off machine asynchronously. We have regions around the world for your machines, because you want your machines close. Your machines are behind an anycast network to give all your global users a low latency entrypoint to your product (and so we can build some new exciting things soon).
There is a lot more to build here, from obvious things like static IPs to UX challenges like how to give you access to our automatic historical disk snapshots. Those will get built. And at the same time we are going right back to the beginning, racking computers in data centers, thinking through every layer of the software stack, exploring all the options for how we wire up networks.
So, I am building a cloud. One I actually want to use. I hope it is useful to you.
An attempt to detect AI design patterns in Show HN pages
Apr 20, 2026
When browsing Hacker News, I noticed that many Show HN projects now have a generic sterile feeling that tells me they are purely AI-generated.
Initially I couldn’t tell what it was exactly, so I wondered if we could automatically quantify this subjective feeling by scoring 500 Show HN pages for AI design patterns.
Claude Code has led to a large increase in Show HN projects. So much, that the moderators of HN had to restrict Show HN submissions for new accounts.
Here is how the Show HN submissions increased over the last few years:
Update: dang pointed out that the March 2026 dip correlates with the rollout of /showlim, the view newer accounts now see.
That should give us plenty of pages to score for AI design patterns.
AI design patterns
A designer recently told me that “colored left borders are almost as reliable a sign of AI-generated design as em-dashes for text”, so I started to notice them on many pages.
Then I asked some more designer friends what they think are common AI patterns.
The answers can be roughly grouped into fonts, colors, layout quirks, and CSS patterns.
Fonts
Inter used for everything, but especially the centered hero headlines
LLM tend to use certain font combos like Space Grotesk, Instrument Serif and Geist
Serif italic for one accent word in an otherwise-Inter hero
Colors
“VibeCode Purple”
Perma dark mode with medium-grey body text and all-caps section labels
Barely passing body-text contrast in dark themes
Gradient everything
Large colored glows and colored box-shadows
Layout quirks
Centered hero set in a generic sans
Badge right above the hero H1
Colored borders on cards, on the top or left edge
Identical feature cards, each with an icon on top
Numbered “1, 2, 3” step sequences
Stat banner rows
Sidebar or nav with emoji icons
All-caps headings and section labels
CSS patterns
shadcn/ui
Glassmorphism
A few examples from the Show HN submissions:
Detecting AI design in Show HN submissions
Now we can try to systematically score for these patterns by going through 500 of the latest Show HN submissions and scoring their landing pages against the list above.
Here is the scoring method:
A headless browser loads each site (Playwright)
A small in-page script analyzes the DOM and reads computed styles
Every pattern is a deterministic CSS or DOM check. I intentionally do not take screenshots and let the LLM judge them.
This ultimately also leads to false positives, but my manual QA run verified it’s maybe 5 – 10%.
If there is any interest in open sourcing the scoring code to replicate (and improve) the run or score your own site, let me know.
Results
A single pattern doesn’t necessarily make a site AI-generated, so I grouped them into three tiers based on how many of the 15 patterns they trigger:
Heavy slop (5+ patterns) · 105 sites · 21%
Mild (2 – 4) · 230 sites · 46%
Clean (0 – 1) · 165 sites · 33%
Is this bad? Not really, just uninspired. After all, validating a business idea was never about fancy design, and before the AI era, everything looked like Bootstrap.
There is a difference between trying to craft your own design and just shipping with whatever defaults the LLMs output. And the same has been the case pre-LLM when using CSS/HTML templates.
I guess people will get back to crafting beautiful designs to stand out from the slop. On the other hand, I’m not sure how much design will still matter once AI agents are the primary users of the web.
This post is human-written, the scoring and analysis were AI-assisted.
To add this web app to your iOS home screen tap the share button and select "Add to the Home Screen".
10HN is also available as an iOS App
If you visit 10HN only rarely, check out the the best articles from the past week.
If you like 10HN please leave feedback and share
Visit pancik.com for more.